Restore Naver blog search paging and newest-sort behavior

Naver's current blog search surface does not honor the older where=blog + sort query pattern used by this skill. The request now targets the blog tab surface, uses the observed NSO sort controls, and trims each parsed page to the visible 15-result window so count-based pagination returns distinct results.

Constraint: Must keep using stdlib-only HTTP scraping without adding dependencies
Constraint: Current Naver blog tab behavior requires ssc/tab parameters plus nso sort controls
Rejected: Keep where=blog and tune start values only | still returned repeated first-page results
Rejected: Leave sort=date as-is | current endpoint ignored it and returned relevance ordering
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Re-verify request params against live Naver markup before changing paging or sort semantics again
Tested: python3 -m py_compile on naver-blog-research scripts and new regression test; PYTHONPATH=.:scripts python3 -m unittest scripts.test_naver_blog_search; npm run lint; live naver_search.py --count 20/30 --sort sim; live naver_search.py --count 10/20 --sort date
Not-tested: Full npm run test remains blocked by unrelated local pyexpat/libexpat environment failures in patent-search tests
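
The paging change described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: `page_starts` and `search_url` are hypothetical helper names, and the parameter values (`ssc`, `sm`, `nso`, the 15-result window) are taken from this commit's observations of Naver's current behavior, which may change.

```python
from __future__ import annotations

from urllib.parse import urlencode

RESULTS_PER_PAGE = 15  # visible result window per page, as observed in this commit
# Sort controls as observed in this commit; assumed to be subject to change.
NSO_BY_SORT = {"sim": "so:r,p:all,a:all", "date": "so:dd,p:all,a:all"}


def page_starts(count: int) -> list[int]:
    """Return the 1-based `start` offsets needed to cover `count` results."""
    pages = -(-count // RESULTS_PER_PAGE)  # ceiling division
    return [1 + page * RESULTS_PER_PAGE for page in range(pages)]


def search_url(query: str, start: int, sort: str) -> str:
    """Build a blog-tab search URL for one page (hypothetical helper)."""
    params = {
        "query": query,
        "ssc": "tab.blog.all",  # blog tab surface instead of where=blog
        "sm": "tab_jum" if start <= 1 else "tab_pge",  # first page vs. later pages
        "start": str(start),
        "nso": NSO_BY_SORT[sort],
    }
    return "https://search.naver.com/search.naver?" + urlencode(params)


for start in page_starts(20):
    print(search_url("naver blog", start, "date"))
```

Because each parsed page is trimmed to 15 results, advancing `start` by 15 keeps consecutive pages from overlapping. The real script also de-duplicates URLs across pages and allows a few spare pages, which this sketch leaves out.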
feat: add Naver blog research skill
2026-06-24 02:04:11 +00:00 · 2026-04-12 23:36:08 +09:00 · 2026-04-12 02:30:14 +09:00
9 changed files with 1036 additions and 2 deletions
--- a/docs/features/naver-blog-research.md
+++ b/docs/features/naver-blog-research.md
@@ -0,0 +1,63 @@
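+# Naver Blog Research Guide
+
+## What this feature can do
+
+- Search Naver blogs by keyword (sorted by relevance or newest first)
+- Extract the full text of a blog post
+- Extract image URLs from a blog post and download them locally
+- Cross-check Korean-language content research alongside Google search
+
+## Prerequisites
+
+- `python3` 3.8+
+- Internet connection
+- No API key required
+
+## Inputs
+
+- Search: a query string (e.g. `"서울 맛집 추천"`)
+- Read post: a Naver blog post URL (PC or mobile)
+- Image download: a list of image URLs, or piped `naver_read.py` output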
+
+## Official surfaces
+
+- Search: `https://search.naver.com/search.naver?ssc=tab.blog.all&query={query}`
+- Blog post (mobile): `https://m.blog.naver.com/{userId}/{postId}`
+- Image CDN: `blogfiles.naver.net`, `postfiles.pstatic.net`
+
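+## Basic flow
+
+1. Run a Naver blog search with `naver_search.py`
+2. Pick the top 3-5 posts from the search results
+3. Read the selected posts in full with `naver_read.py`
+4. If needed, save images locally with `naver_download_images.py`
+5. Cross-check against Google search (WebSearch) results to establish how reliable the information is
+
+## Examples
+
+Blog search: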
+
+```bash
+python3 scripts/naver_search.py "제주도 여행 코스" --count 5 --sort sim
+```
+
+Read a blog post:
+
+```bash
+python3 scripts/naver_read.py "https://blog.naver.com/user123/224212849946"
+```
+
+Download images:
+
+```bash
+python3 scripts/naver_read.py "https://blog.naver.com/user123/224212849946" \
+  | python3 scripts/naver_download_images.py --output ./images/ --max 5
+```
+
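+## Caveats
+
+- Requests go directly to Naver's search engine, so heavy automation can get the IP blocked. Avoid excessive requests in a single session.
+- This skill is designed for low-volume, non-commercial content research.
+- Parsing can fail when Naver changes its HTML structure. If errors occur, the scripts need updating.
+- The PC version (`blog.naver.com`) is iframe-based, so the mobile version (`m.blog.naver.com`) is used instead.
+- Always give the user the blog source (URL, author) along with the content.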
--- a/naver-blog-research/.gitignore
+++ b/naver-blog-research/.gitignore
@@ -0,0 +1,3 @@
+__pycache__/
+*.pyc
+naver-images/
--- a/naver-blog-research/SKILL.md
+++ b/naver-blog-research/SKILL.md
@@ -0,0 +1,138 @@
+---
+name: naver-blog-research
+description: Search Naver blogs, read full post content, and download images using only python3 stdlib — no API key required.
+license: MIT
+metadata:
+  category: research
+  locale: ko-KR
+  phase: v1
+---
+
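+# Naver Blog Research
+
+## What this skill does
+
+Searches Naver blogs, reads the full text of individual posts, and downloads images locally.
+
+- Runs on the `python3` standard library alone, with no API key.
+- Outputs search results as structured JSON.
+- Uses the mobile version (`m.blog.naver.com`) to extract the body directly, without an iframe.
+- Downloads images from the blog image CDNs (`blogfiles.naver.net`, `postfiles.pstatic.net`).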
+
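+## When to use
+
+- "네이버 블로그에서 결혼식 체크리스트 검색해줘" ("Search Naver blogs for a wedding checklist")
+- "네이버 블로그 리서치 해줘" ("Do some Naver blog research")
+- "한국 블로그에서 관련 정보 조사해줘" ("Look up related information on Korean blogs")
+- "네이버 블로그 글 읽어줘" ("Read this Naver blog post")
+- "이 네이버 블로그 포스트에서 이미지 다운로드해줘" ("Download the images from this Naver blog post")
+- Korean-language content research that needs Naver blog sources in addition to Google
+
+## When not to use
+
+- Searching Naver services other than blogs, such as Naver News, Cafe, or Knowledge iN (지식iN)
+- Bulk crawling/scraping (dozens of requests or more in one session)
+- Commercial data collection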
+
+## Prerequisites
+
+- Internet connection
+- `python3` 3.8+
+- The helper scripts included under `scripts/` in this skill directory
+
+## Workflow
+
+### 1. Search Naver blogs
+
+```bash
+python3 scripts/naver_search.py "검색어" --count 10 --sort sim
+```
+
+| Argument | Required | Description | Default |
+|------|------|------|--------|
+| query | Yes | Search query | - |
+| --count | No | Number of results (max 30) | 10 |
+| --sort | No | sim (relevance), date (newest) | sim |
+| --timeout | No | Request timeout (seconds) | 15 |
+
+Example output:
+
+```json
+{
+  "query": "결혼식 체크리스트",
+  "total_results": 7,
+  "results": [
+    {
+      "title": "결혼식 체크리스트 총정리",
+      "url": "https://blog.naver.com/user123/224212849946",
+      "mobile_url": "https://m.blog.naver.com/user123/224212849946",
+      "snippet": "결혼식 1주일 전에 반드시 확인해야 할...",
+      "author": "user123"
+    }
+  ]
+}
+```
+
+### 2. Read a blog post
+
+Pick the URL of a post of interest from the search results and read its full text.
+
+```bash
+python3 scripts/naver_read.py "https://blog.naver.com/user123/224212849946"
+```
+
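+| Argument | Required | Description | Default |
+|------|------|------|--------|
+| url | Yes | Blog post URL (PC or mobile) | - |
+| --no-images | No | Exclude image URLs | false |
+| --max-length | No | Maximum body length in characters (0 = unlimited) | 0 |
+| --timeout | No | Request timeout (seconds) | 20 |
+
+A PC URL is automatically converted to the mobile URL before the request is made.
+
+### 3. Download images (if needed)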
+
+```bash
+python3 scripts/naver_download_images.py --urls "url1,url2,url3" --output ./images/
+```
+
+Or pipe the `naver_read.py` output in:
+
+```bash
+python3 scripts/naver_read.py "https://..." | python3 scripts/naver_download_images.py --output ./images/
+```
+
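+| Argument | Required | Description | Default |
+|------|------|------|--------|
+| --urls | No | Comma-separated image URLs | - |
+| --output | No | Output directory | ./naver-images/ |
+| --max | No | Maximum number of downloads | 10 |
+| --timeout | No | Request timeout (seconds) | 15 |
+
+### Recommended workflow
+
+1. Search with `naver_search.py`, then review the top 3-5 results
+2. Read the most relevant posts in full with `naver_read.py`
+3. If needed, save images with `naver_download_images.py`
+4. Cross-check against WebSearch (Google) results to raise confidence in the information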
+
+## Response policy
+
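+- Summarize search results and post bodies for the user.
+- Always include the blog source (URL, author).
+- Avoid excessive requests (dozens or more) in one session.
+- When downloading images, tell the user where they were saved.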
+
+## Done when
+
+- Search results are output correctly as JSON.
+- The blog post text is extracted.
+- The required images are saved locally.
+- Sources are cited.
+
+## Notes
+
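+- Requests go directly to Naver's search engine, so bulk or automated use can get the IP blocked.
+- This skill is designed for low-volume, non-commercial content research.
+- Naver's HTML structure can change. If parsing fails, check the error message; the scripts may need updating.
+- The PC version (`blog.naver.com`) is iframe-based, so the mobile version (`m.blog.naver.com`) is used instead.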
--- a/naver-blog-research/scripts/_naver_http.py
+++ b/naver-blog-research/scripts/_naver_http.py
@@ -0,0 +1,58 @@
+"""Shared HTTP utilities for Naver blog scripts (SSL handling, URL validation, urlopen wrapper)."""
+
+from __future__ import annotations
+
+import re
+import ssl
+import sys
+import urllib.error
+import urllib.parse
+import urllib.request
+
+
+TAG_RE = re.compile(r"<[^>]+>")
+
+_ssl_ctx_secure: ssl.SSLContext | None = None
+_ssl_ctx_insecure: ssl.SSLContext | None = None
+
+
+def _get_ssl_context(*, insecure: bool = False) -> ssl.SSLContext:
+    global _ssl_ctx_secure, _ssl_ctx_insecure
+    if insecure:
+        if _ssl_ctx_insecure is None:
+            ctx = ssl.create_default_context()
+            ctx.check_hostname = False
+            ctx.verify_mode = ssl.CERT_NONE
+            _ssl_ctx_insecure = ctx
+        return _ssl_ctx_insecure
+    if _ssl_ctx_secure is None:
+        _ssl_ctx_secure = ssl.create_default_context()
+    return _ssl_ctx_secure
+
+
+_NAVER_DOMAINS = (".naver.com", ".naver.net", ".pstatic.net")
+
+
+def is_naver_url(url: str) -> bool:
+    host = urllib.parse.urlparse(url).hostname or ""
+    return any(host == d.lstrip(".") or host.endswith(d) for d in _NAVER_DOMAINS)
+
+
+def urlopen(request: urllib.request.Request, timeout: int, *, insecure: bool = False):
+    """urlopen with explicit SSL insecure mode for Naver domains.
+
+    When *insecure* is True and the target is a Naver domain, SSL certificate
+    verification is skipped.  A warning is printed to stderr on every call so
+    the caller is always aware.
+    """
+    if insecure:
+        if not is_naver_url(request.full_url):
+            raise ValueError("insecure 모드는 네이버 도메인에만 사용할 수 있습니다.")
+        print(
+            "[warn] SSL 인증서 검증이 비활성화되었습니다. 연결이 안전하지 않을 수 있습니다.",
+            file=sys.stderr,
+        )
+        return urllib.request.urlopen(
+            request, timeout=timeout, context=_get_ssl_context(insecure=True),
+        )
+    return urllib.request.urlopen(request, timeout=timeout, context=_get_ssl_context())
--- a/naver-blog-research/scripts/naver_download_images.py
+++ b/naver-blog-research/scripts/naver_download_images.py
@@ -0,0 +1,233 @@
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import sys
+import urllib.error
+import urllib.request
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from _naver_http import is_naver_url, urlopen
+
+DEFAULT_OUTPUT_DIR = "./naver-images"
+DEFAULT_MAX = 10
+DEFAULT_TIMEOUT = 15
+
+DEFAULT_HEADERS = {
+    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
+    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8",
+    "Referer": "https://m.blog.naver.com/",
+    "User-Agent": (
+        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
+    ),
+}
+
+CONTENT_TYPE_TO_EXT = {
+    "image/jpeg": ".jpg",
+    "image/png": ".png",
+    "image/gif": ".gif",
+    "image/webp": ".webp",
+    "image/bmp": ".bmp",
+    "image/svg+xml": ".svg",
+}
+
+
+_MAGIC_BYTES = (
+    (b"\x89PNG\r\n\x1a\n", ".png"),
+    (b"GIF87a", ".gif"),
+    (b"GIF89a", ".gif"),
+    (b"RIFF", ".webp"),  # WebP: RIFF....WEBP (check first 4 bytes)
+    (b"BM", ".bmp"),
+)
+
+
+def guess_extension(url: str, content_type: str | None = None, data: bytes | None = None) -> str:
+    if content_type:
+        ct = content_type.split(";")[0].strip().lower()
+        if ct in CONTENT_TYPE_TO_EXT:
+            return CONTENT_TYPE_TO_EXT[ct]
+
+    lower_url = url.lower().split("?")[0]
+    for ext in (".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg"):
+        if lower_url.endswith(ext):
+            return ".jpg" if ext == ".jpeg" else ext
+
+    if data:
+        for magic, ext in _MAGIC_BYTES:
+            if data[:len(magic)] == magic:
+                if ext == ".webp" and data[8:12] != b"WEBP":
+                    continue
+                return ext
+        if data[:2] in (b"\xff\xd8",):
+            return ".jpg"
+
+    return ".jpg"
+
+
+def download_image(url: str, output_path: str, output_dir: str, timeout: int = DEFAULT_TIMEOUT, *, insecure: bool = False) -> dict:
+    """Download a single image from a Naver CDN URL.
+
+    *output_dir* is used solely for path-traversal protection: the resolved
+    *output_path* must reside inside *output_dir*.
+    """
+    if not is_naver_url(url):
+        return {"url": url, "error": "Not a Naver CDN URL. Skipped."}
+
+    real_dir = os.path.realpath(output_dir)
+    if not os.path.realpath(output_path).startswith(real_dir + os.sep):
+        return {"url": url, "error": "Output path escapes target directory. Skipped."}
+
+    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
+
+    try:
+        with urlopen(request, timeout, insecure=insecure) as response:
+            data = response.read()
+            content_type = response.headers.get("Content-Type", "")
+    except (urllib.error.HTTPError, urllib.error.URLError, OSError) as error:
+        return {"url": url, "error": str(error)}
+
+    ext = guess_extension(url, content_type, data)
+    if not os.path.splitext(output_path)[1]:
+        output_path += ext
+
+    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
+
+    with open(output_path, "wb") as f:
+        f.write(data)
+
+    size_kb = round(len(data) / 1024, 1)
+    return {"url": url, "path": output_path, "size_kb": size_kb}
+
+
+def download_images(
+    urls: list[str],
+    output_dir: str = DEFAULT_OUTPUT_DIR,
+    max_count: int = DEFAULT_MAX,
+    timeout: int = DEFAULT_TIMEOUT,
+    *,
+    insecure: bool = False,
+) -> dict:
+    os.makedirs(output_dir, exist_ok=True)
+
+    max_count = max(1, max_count)
+    targets = urls[:max_count]
+    downloaded: list[dict] = []
+    failed: list[dict] = []
+
+    # Map index → result in a dict so the original order can be restored
+    results_by_index: dict[int, dict] = {}
+
+    with ThreadPoolExecutor(max_workers=min(4, max(1, len(targets)))) as executor:
+        future_to_index = {}
+        for i, url in enumerate(targets, start=1):
+            filename = f"{i:03d}"
+            output_path = os.path.join(output_dir, filename)
+            future = executor.submit(download_image, url, output_path, output_dir, timeout, insecure=insecure)
+            future_to_index[future] = i
+
+        for future in as_completed(future_to_index):
+            idx = future_to_index[future]
+            try:
+                results_by_index[idx] = future.result()
+            except Exception as exc:
+                results_by_index[idx] = {"url": targets[idx - 1], "error": str(exc)}
+
+    # Sort back into the original order
+    for idx in sorted(results_by_index):
+        result = results_by_index[idx]
+        if "error" in result:
+            failed.append(result)
+        else:
+            downloaded.append(result)
+
+    return {
+        "downloaded": len(downloaded),
+        "files": downloaded,
+        "failed": failed,
+    }
+
+
+def parse_args(argv: list[str]) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Download images from Naver blog CDN URLs."
+    )
+    parser.add_argument(
+        "--urls", type=str, default="",
+        help="Comma-separated image URLs.",
+    )
+    parser.add_argument(
+        "--output", type=str, default=DEFAULT_OUTPUT_DIR,
+        help=f"Output directory. Default: {DEFAULT_OUTPUT_DIR}",
+    )
+    parser.add_argument(
+        "--max", type=int, default=DEFAULT_MAX,
+        help=f"Maximum number of images to download. Default: {DEFAULT_MAX}",
+    )
+    parser.add_argument(
+        "--timeout", type=int, default=DEFAULT_TIMEOUT,
+        help=f"HTTP request timeout in seconds. Default: {DEFAULT_TIMEOUT}",
+    )
+    parser.add_argument(
+        "--insecure", action="store_true",
+        help="Skip SSL certificate verification (use only when certificate errors occur).",
+    )
+    return parser.parse_args(argv)
+
+
+def read_urls_from_stdin() -> list[str]:
+    try:
+        data = json.load(sys.stdin)
+        if isinstance(data, dict) and "images" in data:
+            return [img["url"] for img in data["images"] if isinstance(img, dict) and img.get("url")]
+        if isinstance(data, list):
+            return [
+                u for item in data
+                if (u := (item if isinstance(item, str) else item.get("url", "")))
+            ]
+        if isinstance(data, dict):
+            print(
+                "[warn] stdin JSON에 'images' 키가 없습니다. "
+                "naver_read.py 실행 시 --no-images 플래그를 사용하지 않았는지 확인하세요.",
+                file=sys.stderr,
+            )
+    except (json.JSONDecodeError, KeyError, TypeError) as exc:
+        print(f"[warn] stdin JSON 파싱 실패: {exc}", file=sys.stderr)
+        return []
+    return []
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = parse_args(argv or sys.argv[1:])
+
+    urls: list[str] = []
+
+    if args.urls:
+        urls = [u.strip() for u in args.urls.split(",") if u.strip()]
+
+    if not urls and not sys.stdin.isatty():
+        urls = read_urls_from_stdin()
+
+    if not urls:
+        print(
+            json.dumps({"error": "No image URLs provided. Use --urls or pipe naver_read.py output via stdin."}, ensure_ascii=False),
+            file=sys.stderr,
+        )
+        return 1
+
+    result = download_images(
+        urls,
+        output_dir=args.output,
+        max_count=args.max,
+        timeout=args.timeout,
+        insecure=args.insecure,
+    )
+
+    print(json.dumps(result, ensure_ascii=False, indent=2))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/naver-blog-research/scripts/naver_read.py
+++ b/naver-blog-research/scripts/naver_read.py
@@ -0,0 +1,256 @@
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+import urllib.error
+import urllib.request
+from html import unescape
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from _naver_http import TAG_RE, is_naver_url, urlopen
+
+MOBILE_UA = (
+    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
+    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
+)
+
+DEFAULT_HEADERS = {
+    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
+    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8",
+    "User-Agent": MOBILE_UA,
+}
+
+BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
+BLOCK_END_RE = re.compile(r"</(p|div|li)>", re.IGNORECASE)
+WHITESPACE_RE = re.compile(r"[ \t]+")
+BLANK_LINES_RE = re.compile(r"\n{3,}")
+
+_IMG_CDN_HOSTS = r"(?:blogfiles\.naver\.net|postfiles\.pstatic\.net|mblogthumb-phinf\.pstatic\.net)"
+
+IMAGE_LAZY_PATTERN = re.compile(
+    rf'data-lazy-src="(https?://{_IMG_CDN_HOSTS}[^"]+)"'
+)
+IMAGE_SRC_PATTERN = re.compile(
+    rf'src="(https?://{_IMG_CDN_HOSTS}[^"]+)"'
+)
+IMAGE_ALT_PATTERN = re.compile(
+    r'alt="([^"]*)"'
+)
+
+TITLE_PATTERN = re.compile(
+    r'<title[^>]*>(.*?)</title>', re.DOTALL | re.IGNORECASE
+)
+
+SCRIPT_STYLE_RE = re.compile(r"<(script|style|noscript)[^>]*>.*?</\1>", re.DOTALL | re.IGNORECASE)
+
+PC_BLOG_RE = re.compile(r"^https?://blog\.naver\.com/")
+BLOG_ID_RE = re.compile(r"blog\.naver\.com/([a-zA-Z0-9_]+)/(\d+)")
+
+
+def to_mobile_url(url: str) -> str:
+    url = url.strip()
+    url = PC_BLOG_RE.sub("https://m.blog.naver.com/", url)
+    if not url.startswith("https://m.blog.naver.com/"):
+        match = BLOG_ID_RE.search(url)
+        if match:
+            url = f"https://m.blog.naver.com/{match.group(1)}/{match.group(2)}"
+    return url
+
+
+def fetch_blog_page(url: str, timeout: int = 20, *, insecure: bool = False) -> str:
+    mobile_url = to_mobile_url(url)
+    if not is_naver_url(mobile_url):
+        raise ValueError(f"Not a Naver blog URL: {url}")
+    request = urllib.request.Request(mobile_url, headers=DEFAULT_HEADERS)
+
+    try:
+        with urlopen(request, timeout, insecure=insecure) as response:
+            return response.read().decode("utf-8", "ignore")
+    except urllib.error.HTTPError as error:
+        raise RuntimeError(
+            f"Naver blog returned HTTP {error.code} for {mobile_url}. "
+            "The post may not exist or access may be restricted."
+        ) from error
+
+
+def extract_title(html: str) -> str:
+    match = TITLE_PATTERN.search(html)
+    if not match:
+        return ""
+    title = unescape(TAG_RE.sub("", match.group(1))).strip()
+    title = re.sub(r"\s*[-:|]?\s*네이버\s*블로그$", "", title).strip()
+    return title
+
+
+def _extract_div_block(html: str, start_pos: int) -> str:
+    tag_start = html.rfind("<div", 0, start_pos)
+    if tag_start < 0:
+        tag_start = start_pos
+
+    depth = 0
+    pos = tag_start
+    started = False
+    length = len(html)
+    while pos < length:
+        # Skip HTML comments
+        if html[pos : pos + 4] == "<!--":
+            end = html.find("-->", pos + 4)
+            pos = end + 3 if end >= 0 else length
+            continue
+        if html[pos : pos + 4] == "<div" and (pos + 4 >= length or html[pos + 4] in (" ", ">", "\t", "\n", "/")):
+            depth += 1
+            started = True
+        elif html[pos : pos + 6] == "</div>":
+            depth -= 1
+            if started and depth == 0:
+                return html[tag_start : pos + 6]
+        pos += 1
+
+    return html[tag_start:]
+
+
+def extract_content_area(html: str) -> str:
+    cleaned = SCRIPT_STYLE_RE.sub("", html)
+
+    match = re.search(r'class="[^"]*\bse-main-container\b[^"]*"', cleaned)
+    if match:
+        return _extract_div_block(cleaned, match.start())
+
+    for class_name in ("post_ct", "postViewArea", "post-view"):
+        match = re.search(rf'class="[^"]*\b{re.escape(class_name)}\b[^"]*"', cleaned)
+        if match:
+            return _extract_div_block(cleaned, match.start())
+
+    marker = cleaned.find('id="viewTypeSelector"')
+    if marker >= 0:
+        return _extract_div_block(cleaned, marker)
+
+    return ""
+
+
+def extract_text(html_fragment: str) -> str:
+    text = BR_RE.sub("\n", html_fragment)
+    text = BLOCK_END_RE.sub("\n", text)
+    text = TAG_RE.sub("", text)
+    text = unescape(text)
+
+    lines = []
+    for line in text.split("\n"):
+        stripped = WHITESPACE_RE.sub(" ", line).strip()
+        if stripped:
+            lines.append(stripped)
+
+    result = "\n".join(lines)
+    result = BLANK_LINES_RE.sub("\n\n", result)
+    return result.strip()
+
+
+def extract_images(html_fragment: str) -> list[dict]:
+    images: list[dict] = []
+    seen_base: set[str] = set()
+
+    img_tags = re.finditer(r"<img\s[^>]+>", html_fragment, re.IGNORECASE)
+    for img_match in img_tags:
+        img_tag = img_match.group(0)
+
+        lazy_match = IMAGE_LAZY_PATTERN.search(img_tag)
+        src_match = IMAGE_SRC_PATTERN.search(img_tag)
+        url_match = lazy_match or src_match
+        if not url_match:
+            continue
+
+        url = url_match.group(1)
+
+        base_url = re.sub(r"\?type=.*$", "", url)
+        if base_url in seen_base:
+            continue
+        seen_base.add(base_url)
+
+        if "?type=" not in url:
+            url = base_url
+        elif "_blur" in url:
+            url = re.sub(r"\?type=w\d+_blur", "?type=w800", url)
+
+        alt_match = IMAGE_ALT_PATTERN.search(img_tag)
+        alt = unescape(alt_match.group(1)).strip() if alt_match else ""
+
+        images.append({"url": url, "alt": alt})
+
+    return images
+
+
+def read_blog(url: str, include_images: bool = True, max_length: int = 0, timeout: int = 20, *, insecure: bool = False) -> dict:
+    html = fetch_blog_page(url, timeout=timeout, insecure=insecure)
+    mobile_url = to_mobile_url(url)
+
+    title = extract_title(html)
+    content_area = extract_content_area(html)
+    content = extract_text(content_area)
+
+    if max_length > 0 and len(content) > max_length:
+        content = content[:max_length] + "..."
+
+    result: dict = {
+        "url": mobile_url,
+        "title": title,
+        "content": content,
+        "char_count": len(content),
+    }
+
+    if not content:
+        result["warning"] = "본문 영역을 찾지 못했습니다. 네이버 HTML 구조가 변경되었을 수 있습니다."
+
+    if include_images:
+        result["images"] = extract_images(content_area)
+
+    return result
+
+
+def parse_args(argv: list[str]) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Read a Naver blog post and extract text content and images."
+    )
+    parser.add_argument("url", help="Naver blog post URL (PC or mobile).")
+    parser.add_argument(
+        "--no-images", action="store_true",
+        help="Exclude image URLs from output.",
+    )
+    parser.add_argument(
+        "--max-length", type=int, default=0,
+        help="Maximum content length in characters (0 = unlimited). Default: 0.",
+    )
+    parser.add_argument(
+        "--timeout", type=int, default=20,
+        help="HTTP request timeout in seconds. Default: 20.",
+    )
+    parser.add_argument(
+        "--insecure", action="store_true",
+        help="Skip SSL certificate verification (use only when certificate errors occur).",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = parse_args(argv or sys.argv[1:])
+
+    try:
+        result = read_blog(
+            args.url,
+            include_images=not args.no_images,
+            max_length=args.max_length,
+            timeout=args.timeout,
+            insecure=args.insecure,
+        )
+    except (RuntimeError, ValueError) as error:
+        print(json.dumps({"error": str(error)}, ensure_ascii=False), file=sys.stderr)
+        return 1
+
+    print(json.dumps(result, ensure_ascii=False, indent=2))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/naver-blog-research/scripts/naver_search.py
+++ b/naver-blog-research/scripts/naver_search.py
@@ -0,0 +1,192 @@
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+import time
+import urllib.parse
+import urllib.request
+from html import unescape
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from _naver_http import TAG_RE, urlopen
+
+SEARCH_URL = "https://search.naver.com/search.naver"
+DEFAULT_COUNT = 10
+MAX_COUNT = 30
+FIRST_PAGE_START = 1
+RESULTS_PER_PAGE = 15
+
+DEFAULT_HEADERS = {
+    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
+    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8",
+    "User-Agent": (
+        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
+    ),
+}
+
+BLOG_ANCHOR_PATTERN = re.compile(
+    r'<a[^>]*href="(https?://blog\.naver\.com/([a-zA-Z0-9_]+)/(\d+))"[^>]*>(.*?)</a>',
+    re.DOTALL,
+)
+
+
+def strip_html(text: str) -> str:
+    return unescape(TAG_RE.sub("", text)).strip()
+
+
+def build_search_params(query: str, start: int = FIRST_PAGE_START, sort: str = "sim") -> dict[str, str]:
+    return {
+        "query": query,
+        "ssc": "tab.blog.all",
+        "sm": "tab_jum" if start <= FIRST_PAGE_START else "tab_pge",
+        "start": str(start),
+        "nso": {"sim": "so:r,p:all,a:all", "date": "so:dd,p:all,a:all"}.get(sort, "so:r,p:all,a:all"),
+    }
+
+
+def fetch_search_page(query: str, start: int = 1, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> str:
+    params = build_search_params(query, start=start, sort=sort)
+    url = f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
+    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
+
+    try:
+        with urlopen(request, timeout, insecure=insecure) as response:
+            return response.read().decode("utf-8", "ignore")
+    except urllib.error.HTTPError as error:
+        raise RuntimeError(
+            f"Naver search returned HTTP {error.code}. "
+            "The request may have been blocked. Retry later or reduce request volume."
+        ) from error
+
+
+def parse_search_results(html: str) -> list[dict]:
+    results: list[dict] = []
+    anchors = BLOG_ANCHOR_PATTERN.findall(html)
+
+    pending: dict[str, dict] = {}
+
+    for full_url, user_id, post_id, inner_html in anchors:
+        if full_url not in pending:
+            pending[full_url] = {
+                "url": full_url,
+                "mobile_url": f"https://m.blog.naver.com/{user_id}/{post_id}",
+                "author": user_id,
+                "title": "",
+                "snippet": "",
+            }
+
+        text = strip_html(inner_html)
+        if not text:
+            continue
+
+        entry = pending[full_url]
+
+        if "headline1" in inner_html or "text-type-headline" in inner_html:
+            if not entry["title"]:
+                entry["title"] = text
+        elif "body1" in inner_html or "text-type-body" in inner_html:
+            if not entry["snippet"]:
+                entry["snippet"] = text
+        else:
+            if not entry["title"]:
+                entry["title"] = text
+
+    for entry in pending.values():
+        results.append(entry)
+
+    return results
+
+
+def search(query: str, count: int = DEFAULT_COUNT, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> dict:
+    count = max(1, min(count, MAX_COUNT))
+    all_results: list[dict] = []
+    seen_urls: set[str] = set()
+    start = FIRST_PAGE_START
+    # Naver may not return exactly RESULTS_PER_PAGE results per page, so allow a few spare pages
+    max_pages = (count // RESULTS_PER_PAGE) + 3
+
+    for page_num in range(max_pages):
+        if len(all_results) >= count:
+            break
+
+        if page_num > 0:
+            time.sleep(0.5)
+
+        html = fetch_search_page(query, start=start, sort=sort, timeout=timeout, insecure=insecure)
+        page_results = parse_search_results(html)[:RESULTS_PER_PAGE]
+
+        if not page_results:
+            if start == FIRST_PAGE_START:
+                print("[warn] 검색 결과 파싱 실패. 네이버 HTML 구조가 변경되었을 수 있습니다.", file=sys.stderr)
+            break
+
+        new_count = 0
+        for result in page_results:
+            if result["url"] not in seen_urls:
+                seen_urls.add(result["url"])
+                all_results.append(result)
+                new_count += 1
+                if len(all_results) >= count:
+                    break
+
+        if new_count == 0:
+            break
+
+        start += RESULTS_PER_PAGE
+
+    return {
+        "query": query,
+        "total_results": len(all_results),
+        "results": all_results,
+    }
+
+
+def parse_args(argv: list[str]) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Search Naver blogs and return structured JSON results."
+    )
+    parser.add_argument("query", help="Search query string.")
+    parser.add_argument(
+        "--count", type=int, default=DEFAULT_COUNT,
+        help=f"Number of results to return (max {MAX_COUNT}, default {DEFAULT_COUNT}).",
+    )
+    parser.add_argument(
+        "--sort", choices=["sim", "date"], default="sim",
+        help="Sort order: sim (relevance) or date (newest first). Default: sim.",
+    )
+    parser.add_argument(
+        "--timeout", type=int, default=15,
+        help="HTTP request timeout in seconds. Default: 15.",
+    )
+    parser.add_argument(
+        "--insecure", action="store_true",
+        help="Skip SSL certificate verification (use only when certificate errors occur).",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = parse_args(argv or sys.argv[1:])
+
+    try:
+        result = search(
+            args.query,
+            count=args.count,
+            sort=args.sort,
+            timeout=args.timeout,
+            insecure=args.insecure,
+        )
+    except RuntimeError as error:
+        print(json.dumps({"error": str(error)}, ensure_ascii=False), file=sys.stderr)
+        return 1
+
+    print(json.dumps(result, ensure_ascii=False, indent=2))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/package.json
+++ b/package.json
@@ -9,9 +9,9 @@
  ],
  "scripts": {
    "build": "npm run build --workspaces --if-present",
-    "lint": "node --check scripts/skill-docs.test.js && python3 -m py_compile scripts/fine_dust.py scripts/test_fine_dust.py scripts/ktx_booking.py scripts/test_ktx_booking.py scripts/sillok_search.py scripts/test_sillok_search.py scripts/korean_spell_check.py scripts/test_korean_spell_check.py scripts/patent_search.py scripts/test_patent_search.py && npm run lint --workspaces --if-present && ./scripts/validate-skills.sh",
+    "lint": "node --check scripts/skill-docs.test.js && python3 -m py_compile scripts/fine_dust.py scripts/test_fine_dust.py scripts/ktx_booking.py scripts/test_ktx_booking.py scripts/sillok_search.py scripts/test_sillok_search.py scripts/korean_spell_check.py scripts/test_korean_spell_check.py scripts/patent_search.py scripts/test_patent_search.py scripts/test_naver_blog_search.py naver-blog-research/scripts/_naver_http.py naver-blog-research/scripts/naver_search.py naver-blog-research/scripts/naver_read.py naver-blog-research/scripts/naver_download_images.py && npm run lint --workspaces --if-present && ./scripts/validate-skills.sh",
    "typecheck": "tsc --noEmit",
-    "test": "node --test scripts/skill-docs.test.js && PYTHONPATH=.:scripts python3 -m unittest scripts.test_fine_dust scripts.test_ktx_booking scripts.test_sillok_search scripts.test_korean_spell_check scripts.test_patent_search && npm run test --workspaces --if-present && ./scripts/validate-skills.sh",
+    "test": "node --test scripts/skill-docs.test.js && PYTHONPATH=.:scripts python3 -m unittest scripts.test_fine_dust scripts.test_ktx_booking scripts.test_sillok_search scripts.test_korean_spell_check scripts.test_patent_search scripts.test_naver_blog_search && npm run test --workspaces --if-present && ./scripts/validate-skills.sh",
    "pack:dry-run": "npm pack --workspace k-lotto --dry-run && npm pack --workspace daiso-product-search --dry-run && npm pack --workspace blue-ribbon-nearby --dry-run && npm pack --workspace kakao-bar-nearby --dry-run && npm pack --workspace cheap-gas-nearby --dry-run && npm pack --workspace kleague-results --dry-run && npm pack --workspace lck-analytics --dry-run && npm pack --workspace toss-securities --dry-run && npm pack --workspace hipass-receipt --dry-run && npm pack --workspace used-car-price-search --dry-run",
    "ci": "npm run lint && npm run typecheck && npm run test && npm run pack:dry-run",
    "version-packages": "changeset version",
--- a/scripts/test_naver_blog_search.py
+++ b/scripts/test_naver_blog_search.py
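@@ -0,0 +1,93 @@
+from __future__ import annotations  # keeps dict[str, str] / list[dict] annotations importable on Python 3.8
+
+import importlib.util
+import pathlib
+import unittest
+from unittest import mock
+
+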
+MODULE_PATH = pathlib.Path(__file__).resolve().parents[1] / "naver-blog-research" / "scripts" / "naver_search.py"
+MODULE_SPEC = importlib.util.spec_from_file_location("naver_search", MODULE_PATH)
+naver_search = importlib.util.module_from_spec(MODULE_SPEC)
+assert MODULE_SPEC.loader is not None
+MODULE_SPEC.loader.exec_module(naver_search)
+
+
+def make_result(index: int) -> dict[str, str]:
+    return {
+        "url": f"https://blog.naver.com/author{index}/{200000000000 + index}",
+        "mobile_url": f"https://m.blog.naver.com/author{index}/{200000000000 + index}",
+        "author": f"author{index}",
+        "title": f"title-{index}",
+        "snippet": f"snippet-{index}",
+    }
+
+
+class RequestBuilderTest(unittest.TestCase):
+    def test_build_search_params_target_blog_tab_and_switch_sm_for_paging(self):
+        page_one = naver_search.build_search_params("서울 맛집", start=1, sort="sim")
+        page_two = naver_search.build_search_params("서울 맛집", start=16, sort="date")
+
+        self.assertEqual(page_one["ssc"], "tab.blog.all")
+        self.assertEqual(page_one["sm"], "tab_jum")
+        self.assertEqual(page_one["start"], "1")
+        self.assertEqual(page_one["nso"], "so:r,p:all,a:all")
+
+        self.assertEqual(page_two["ssc"], "tab.blog.all")
+        self.assertEqual(page_two["sm"], "tab_pge")
+        self.assertEqual(page_two["start"], "16")
+        self.assertEqual(page_two["nso"], "so:dd,p:all,a:all")
+
+
+class SearchWorkflowTest(unittest.TestCase):
+    def test_search_uses_15_result_pages_and_ignores_extra_anchors_beyond_page_window(self):
+        fetch_starts: list[int] = []
+        parsed_pages = {
+            "page-1": [make_result(index) for index in range(1, 16)] + [make_result(101), make_result(102)],
+            "page-16": [make_result(index) for index in range(16, 31)] + [make_result(101), make_result(102)],
+        }
+
+        def fake_fetch(query: str, start: int = 1, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> str:
+            self.assertEqual(query, "서울 맛집")
+            self.assertEqual(sort, "sim")
+            self.assertEqual(timeout, 15)
+            self.assertFalse(insecure)
+            fetch_starts.append(start)
+            return f"page-{start}"
+
+        def fake_parse(html: str) -> list[dict]:
+            return parsed_pages[html]
+
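+        # Stub out the network fetch, the HTML parser, and the inter-page delay.
+        # Backslash continuations keep this `with` valid on Python 3.8, which lacks parenthesized context managers.
+        with mock.patch.object(naver_search, "fetch_search_page", side_effect=fake_fetch), \
+                mock.patch.object(naver_search, "parse_search_results", side_effect=fake_parse), \
+                mock.patch.object(naver_search.time, "sleep"):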
+            result = naver_search.search("서울 맛집", count=20)
+
+        self.assertEqual(fetch_starts, [1, 16])
+        self.assertEqual(result["total_results"], 20)
+        self.assertEqual(
+            [item["url"] for item in result["results"]],
+            [make_result(index)["url"] for index in range(1, 21)],
+        )
+
+    def test_search_passes_date_sort_through_to_fetcher(self):
+        captured_sorts: list[str] = []
+
+        def fake_fetch(query: str, start: int = 1, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> str:
+            captured_sorts.append(sort)
+            return "page-1"
+
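+        # Stub out the network fetch and the HTML parser.
+        # Backslash continuation keeps this `with` valid on Python 3.8, which lacks parenthesized context managers.
+        with mock.patch.object(naver_search, "fetch_search_page", side_effect=fake_fetch), \
+                mock.patch.object(naver_search, "parse_search_results", return_value=[make_result(1)]):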
+            result = naver_search.search("서울 맛집", count=1, sort="date")
+
+        self.assertEqual(captured_sorts, ["date"])
+        self.assertEqual(result["results"][0]["url"], make_result(1)["url"])
+
+
+if __name__ == "__main__":
+    unittest.main()
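
Note on the paging behavior covered by the regression test: the commit message says each parsed page is trimmed to the visible 15-result window, and the test expects start offsets 1 and 16 for `count=20`. A minimal sketch of that windowing follows, assuming a hypothetical `fetch_page(start)` callable that stands in for fetch plus parse. `collect_results` and `fake_page` are illustrative names, not functions from `naver_search.py`, and the real implementation may differ.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 15  # visible results per page, per the commit message and the test


def collect_results(
    fetch_page: Callable[[int], List[Dict[str, str]]],
    count: int,
    page_size: int = PAGE_SIZE,
) -> List[Dict[str, str]]:
    """Gather up to `count` results by walking 1-based start offsets.

    `fetch_page` is a hypothetical stand-in for fetching and parsing one page.
    """
    results: List[Dict[str, str]] = []
    start = 1
    while len(results) < count:
        # Drop any anchors parsed beyond the visible page window.
        page = fetch_page(start)[:page_size]
        if not page:
            break  # no further results, so stop instead of looping forever
        results.extend(page)
        start += page_size
    return results[:count]


def fake_page(start: int) -> List[Dict[str, str]]:
    """Hypothetical page source: 15 in-window items plus one stray extra anchor."""
    window = [{"url": f"https://example.com/{i}"} for i in range(start, start + PAGE_SIZE)]
    return window + [{"url": "https://example.com/extra"}]


urls = [item["url"] for item in collect_results(fake_page, count=20)]
```

Trimming before extending is what keeps the second page from repeating or interleaving stray anchors. Without the `[:page_size]` slice, the extra anchor would land in position 16 and shift every later result.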