k-skill/scripts/test_naver_blog_search.py
owen 4f015c5680
feat: add Naver blog research skill (#107)
* feat: add Naver blog research skill

A skill that searches Naver blogs, reads the original posts, and downloads images using only the python3 standard library, with no API key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Restore Naver blog search paging and newest-sort behavior

Naver's current blog search surface does not honor the older
where=blog + sort query pattern used by this skill. The request
now targets the blog tab surface, uses the observed NSO sort
controls, and trims each parsed page to the visible 15-result
window so count-based pagination returns distinct results.

Constraint: Must keep using stdlib-only HTTP scraping without adding dependencies
Constraint: Current Naver blog tab behavior requires ssc/tab parameters plus nso sort controls
Rejected: Keep where=blog and tune start values only | still returned repeated first-page results
Rejected: Leave sort=date as-is | current endpoint ignored it and returned relevance ordering
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Re-verify request params against live Naver markup before changing paging or sort semantics again
Tested: python3 -m py_compile on naver-blog-research scripts and new regression test; PYTHONPATH=.:scripts python3 -m unittest scripts.test_naver_blog_search; npm run lint; live naver_search.py --count 20/30 --sort sim; live naver_search.py --count 10/20 --sort date
Not-tested: Full npm run test remains blocked by unrelated local pyexpat/libexpat environment failures in patent-search tests
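
The request shape described in this commit can be sketched roughly as follows. This is a minimal illustration, not the shipped `build_search_params`: the `ssc`, `sm`, `start`, and `nso` values are the ones the regression test asserts, while the function name `sketch_build_search_params`, the `NSO_BY_SORT` table, and the `query` key are assumptions made for the example.

```python
from urllib.parse import urlencode

# nso values as asserted in the regression test; the lookup table itself is hypothetical.
NSO_BY_SORT = {
    "sim": "so:r,p:all,a:all",    # relevance ordering
    "date": "so:dd,p:all,a:all",  # newest first
}


def sketch_build_search_params(query: str, start: int = 1, sort: str = "sim") -> dict[str, str]:
    """Build query parameters for the blog tab surface (illustrative only)."""
    return {
        "ssc": "tab.blog.all",                         # target the blog tab
        "sm": "tab_jum" if start == 1 else "tab_pge",  # first page vs. paging request
        "query": query,                                # assumed key name, not confirmed by the test
        "start": str(start),                           # 1-based offset, sent as a string
        "nso": NSO_BY_SORT[sort],
    }


if __name__ == "__main__":
    # Second page, newest first.
    print(urlencode(sketch_build_search_params("서울 맛집", start=16, sort="date")))
```

Keeping the sort mapping in a single table means a future change in the observed NSO controls would touch one place, which fits the directive to re-verify parameters against live markup before changing them.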

* Surface the new Naver blog skill in the main README

PR 107 adds the skill and feature guide, but the repository landing page
still omitted it from the user-facing capability list. This commit keeps the
README aligned with the actual shipped skill set so users can discover the
new entry point from the main docs.

Constraint: README capability tables and feature lists should stay aligned with docs/features entries
Rejected: Leave README unchanged until merge | hides the new skill from the main index during PR review
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When adding a new skill guide, update both the summary table and the included-features list together
Tested: README diff review; verified docs/features/naver-blog-research.md link target exists
Not-tested: Full npm run ci (docs-only change)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jeffrey (Dongkyu) Kim <vkehfdl1@gmail.com>
2026-04-13 00:06:18 +09:00

91 lines
3.7 KiB
Python

import importlib.util
import pathlib
import unittest
from unittest import mock


MODULE_PATH = pathlib.Path(__file__).resolve().parents[1] / "naver-blog-research" / "scripts" / "naver_search.py"
MODULE_SPEC = importlib.util.spec_from_file_location("naver_search", MODULE_PATH)
naver_search = importlib.util.module_from_spec(MODULE_SPEC)
assert MODULE_SPEC.loader is not None
MODULE_SPEC.loader.exec_module(naver_search)


def make_result(index: int) -> dict[str, str]:
    return {
        "url": f"https://blog.naver.com/author{index}/{200000000000 + index}",
        "mobile_url": f"https://m.blog.naver.com/author{index}/{200000000000 + index}",
        "author": f"author{index}",
        "title": f"title-{index}",
        "snippet": f"snippet-{index}",
    }


class RequestBuilderTest(unittest.TestCase):
    def test_build_search_params_target_blog_tab_and_switch_sm_for_paging(self):
        page_one = naver_search.build_search_params("서울 맛집", start=1, sort="sim")
        page_two = naver_search.build_search_params("서울 맛집", start=16, sort="date")

        self.assertEqual(page_one["ssc"], "tab.blog.all")
        self.assertEqual(page_one["sm"], "tab_jum")
        self.assertEqual(page_one["start"], "1")
        self.assertEqual(page_one["nso"], "so:r,p:all,a:all")

        self.assertEqual(page_two["ssc"], "tab.blog.all")
        self.assertEqual(page_two["sm"], "tab_pge")
        self.assertEqual(page_two["start"], "16")
        self.assertEqual(page_two["nso"], "so:dd,p:all,a:all")


class SearchWorkflowTest(unittest.TestCase):
    def test_search_uses_15_result_pages_and_ignores_extra_anchors_beyond_page_window(self):
        fetch_starts: list[int] = []
        parsed_pages = {
            "page-1": [make_result(index) for index in range(1, 16)] + [make_result(101), make_result(102)],
            "page-16": [make_result(index) for index in range(16, 31)] + [make_result(101), make_result(102)],
        }

        def fake_fetch(query: str, start: int = 1, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> str:
            self.assertEqual(query, "서울 맛집")
            self.assertEqual(sort, "sim")
            self.assertEqual(timeout, 15)
            self.assertFalse(insecure)
            fetch_starts.append(start)
            return f"page-{start}"

        def fake_parse(html: str) -> list[dict]:
            return parsed_pages[html]

        with (
            mock.patch.object(naver_search, "fetch_search_page", side_effect=fake_fetch),
            mock.patch.object(naver_search, "parse_search_results", side_effect=fake_parse),
            mock.patch.object(naver_search.time, "sleep"),
        ):
            result = naver_search.search("서울 맛집", count=20)

        self.assertEqual(fetch_starts, [1, 16])
        self.assertEqual(result["total_results"], 20)
        self.assertEqual(
            [item["url"] for item in result["results"]],
            [make_result(index)["url"] for index in range(1, 21)],
        )

    def test_search_passes_date_sort_through_to_fetcher(self):
        captured_sorts: list[str] = []

        def fake_fetch(query: str, start: int = 1, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> str:
            captured_sorts.append(sort)
            return "page-1"

        with (
            mock.patch.object(naver_search, "fetch_search_page", side_effect=fake_fetch),
            mock.patch.object(naver_search, "parse_search_results", return_value=[make_result(1)]),
        ):
            result = naver_search.search("서울 맛집", count=1, sort="date")

        self.assertEqual(captured_sorts, ["date"])
        self.assertEqual(result["results"][0]["url"], make_result(1)["url"])


if __name__ == "__main__":
    unittest.main()
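
For readers without access to `naver_search.py`, the paging behavior that `SearchWorkflowTest` exercises might look something like the sketch below. It is an assumption-laden illustration rather than the module's actual `search` function: `PAGE_SIZE`, `sketch_collect`, and `fake_page` are hypothetical names, and the real code also handles fetching, parsing, sleeping between requests, and the `total_results` field.

```python
from typing import Callable

PAGE_SIZE = 15  # visible results per page, per the commit message


def sketch_collect(fetch_page: Callable[[int], list[dict]], count: int) -> list[dict]:
    """Collect up to `count` results, keeping only the first PAGE_SIZE items of each page."""
    results: list[dict] = []
    start = 1
    while len(results) < count:
        page = fetch_page(start)[:PAGE_SIZE]  # drop extra anchors beyond the page window
        if not page:
            break  # no more results available
        results.extend(page)
        start += PAGE_SIZE  # next request starts at 16, 31, ...
    return results[:count]


def fake_page(start: int) -> list[dict]:
    """Stand-in for fetch + parse: 15 in-window results plus two stray anchors."""
    ids = list(range(start, start + PAGE_SIZE)) + [101, 102]
    return [{"url": f"https://blog.naver.com/author{i}/{200000000000 + i}"} for i in ids]


if __name__ == "__main__":
    for item in sketch_collect(fake_page, 20):
        print(item["url"])
```

Trimming each parsed page before advancing `start` is what keeps the stray anchors (101 and 102 in the test data) out of the combined list, so a 20-result request yields results 1 through 20 from exactly two fetches.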