mirror of
https://github.com/NomaDamas/k-skill.git
synced 2026-06-24 02:04:11 +00:00
Namu Wiki's current HTML layout uses build-time-obfuscated CSS class
names (e.g. _36R8DWTn, OZVChh+l) and has no <article>/<main>/<section>
tags, so all six MAIN_CONTENT_CLASSES anchors fail to match and
extract_summary() returns an empty summary with a 'Main content region
not detected' warning on every live page.
Replace the single class-based strategy with a three-tier fallback
chain that tries the most structurally stable anchor first and falls
back to progressively weaker ones:
1. First h2 section boundary. Namu Wiki articles consistently open
   with '<h2>1. 개요[편집]</h2>' ("1. Overview[edit]") and mark
   subsequent sections with numbered h2 headings. Extracting the text
   between the first and second h2 reliably captures the overview
   section on every page sampled (중꺾마, 갓생, 럭키비키, 어쩔티비).
2. MAIN_CONTENT_CLASSES / <article>. Kept as a legacy fallback for
   older Namu Wiki layouts and for third-party fixtures.
3. og:description meta tag. Final safety net before returning empty;
   gives the agent at least a ~64-char preview when the article has
   an unusual structure.
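
The fallback order above could be sketched roughly as follows. This is a minimal illustration, not the code in slang_lookup.py: LEGACY_CLASSES is a hypothetical stand-in for MAIN_CONTENT_CLASSES (its six real values are not listed here), every helper name is invented, and the regex-based parsing is an assumption made to keep the sketch stdlib-only.

```python
import html
import re

# Hypothetical stand-in for MAIN_CONTENT_CLASSES; the real values are not shown.
LEGACY_CLASSES = ("wiki-content", "article-body")

TAG_RE = re.compile(r"<[^>]+>")


def _text(fragment: str) -> str:
    """Drop tags, decode entities, and collapse whitespace."""
    return " ".join(html.unescape(TAG_RE.sub(" ", fragment)).split())


def _between_first_two_h2(page: str) -> str:
    # Tier 1: everything after the first </h2> up to the next <h2> (or end of input).
    m = re.search(r"<h2\b[^>]*>.*?</h2>(.*?)(?=<h2\b|\Z)", page, re.S | re.I)
    return _text(m.group(1)) if m else ""


def _legacy_region(page: str) -> str:
    # Tier 2: <article>, then a class-based <div>. Crude: stops at the first </div>.
    m = re.search(r"<article\b[^>]*>(.*?)</article>", page, re.S | re.I)
    if m:
        return _text(m.group(1))
    for cls in LEGACY_CLASSES:
        pattern = r'<div\b[^>]*class="[^"]*\b%s\b[^"]*"[^>]*>(.*?)</div>' % re.escape(cls)
        m = re.search(pattern, page, re.S | re.I)
        if m:
            return _text(m.group(1))
    return ""


def _og_description(page: str) -> str:
    # Tier 3: og:description preview. Assumes property= appears before content=.
    m = re.search(
        r'<meta\b[^>]*property="og:description"[^>]*content="([^"]*)"', page, re.I
    )
    return html.unescape(m.group(1)).strip() if m else ""


def extract_summary_sketch(page: str) -> str:
    """Return the first non-empty result from the three tiers, else ''."""
    for tier in (_between_first_two_h2, _legacy_region, _og_description):
        summary = tier(page)
        if summary:
            return summary
    return ""
```

Each tier returns an empty string on a miss, so the loop only falls through to a weaker anchor when the stronger one yields nothing.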
Strip '[편집]' edit-affordance markers and numbered section prefixes
(e.g. '1.2.') from the extracted text so headings don't leak through
as noise.
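
A minimal sketch of that cleanup step, assuming the numbered prefixes only need to be removed at the start of a line. The function name and both patterns are illustrative and may differ from what the script actually uses.

```python
import re

# The literal edit-affordance marker that Namu Wiki appends to headings.
EDIT_MARKER_RE = re.compile(r"\[편집\]")
# A numbered section prefix such as "1." or "1.2." at the start of a line,
# followed by whitespace so that decimals like "3.14" are left alone.
SECTION_PREFIX_RE = re.compile(r"^[ \t]*\d+(?:\.\d+)*\.\s+", re.M)


def clean_heading_noise(text: str) -> str:
    """Remove '[편집]' markers and leading section numbers, then tidy whitespace."""
    text = EDIT_MARKER_RE.sub("", text)
    text = SECTION_PREFIX_RE.sub("", text)
    return " ".join(text.split())
```

For example, clean_heading_noise("1.2. 유래[편집]") leaves only the heading word "유래".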
Live verification (text format):
slang_lookup.py 중꺾마 -> Title + 286-char summary
slang_lookup.py 갓생 -> Title + 96-char summary
slang_lookup.py 럭키비키 -> Title + 59-char summary
slang_lookup.py 어쩔티비 -> Title + 20-char summary
All four were previously empty. Not-found, blocked, and upstream-error
paths and exit codes are unchanged.
| File | | |
|---|---|---|
| fixtures | ||
| build-region-codes.js | ||
| check-setup.sh | ||
| fine_dust.py | ||
| geeknews_search.py | ||
| kakaotalk_mac.py | ||
| korean_character_count.js | ||
| korean_spell_check.py | ||
| ktx_booking.py | ||
| mfds_drug_safety.py | ||
| mfds_food_safety.py | ||
| patent_search.py | ||
| run-k-skill-proxy.sh | ||
| sillok_search.py | ||
| skill-docs.test.js | ||
| subway_lost_property.py | ||
| test_coupang_partners_mcp_wrapper.py | ||
| test_fine_dust.py | ||
| test_geeknews_search.py | ||
| test_kakaotalk_mac.py | ||
| test_korean_character_count.js | ||
| test_korean_slang_writing.py | ||
| test_korean_spell_check.py | ||
| test_ktx_booking.py | ||
| test_mfds_drug_safety.py | ||
| test_mfds_food_safety.py | ||
| test_naver_blog_search.py | ||
| test_patent_search.py | ||
| test_sillok_search.py | ||
| test_subway_lost_property.py | ||
| test_zipcode_search.py | ||
| validate-skills.sh | ||
| zipcode_search.py | ||