Polish naver-news: preflight, link canonicalization, /health docs (#143)

Address the three non-blocking items flagged in the round 1/2 reviews. All
were explicitly deferred by the reviewer as "follow-up if the maintainer
wants" — picking them up now so the feature lands with a tighter surface.

1) Preflight 400 for start + display - 1 > 1000
   Naver's official news endpoint only exposes the first 1000 items
   (start 1..1000, display 1..100). Asking for start=1000 & display=100
   would send a request that silently returns no usable items, wasting
   an upstream quota call. Reject the combination before calling upstream
   with a 400 bad_request and a message that tells the caller which item
   the request would have needed and what the cap is. Boundary values
   (start + display - 1 === 1000) are still accepted.

2) Canonical link dedup
   The previous dedup key was link.toLowerCase(), which already ignored
   letter casing but failed to merge the same article when Naver's
   redirect URLs differed only by query-param order, trailing slash, or
   fragment. Added canonicalizeLinkForDedup(), which parses the URL,
   sorts search params by key, strips trailing pathname slashes, drops
   the fragment, and lowercases the result. It is conservative on
   purpose, so different paths or different query values stay as
   distinct articles. The visible items[].link value is still the
   original URL returned by Naver; only the dedup key is canonicalized.

3) Clarify the naverSearchApiConfigured vs naverNewsApiConfigured split
   The two flags currently evaluate to the same boolean, but their semantic
   contracts differ: naverSearchApiConfigured reports "are the Naver
   Open API keys configured" (which is advisory for the shopping route
   since shopping has a BFF fallback), while naverNewsApiConfigured
   reports "is the news route operational end-to-end" (no fallback — 503
   when false). Hoist the shared expression into a local, and add a
   `/health 업스트림 플래그 의미` ("/health upstream flag semantics")
   section to packages/k-skill-proxy/README.md documenting the split.
   Also update naver-news-search SKILL.md and
   docs/features/naver-news-search.md to mention the new preflight and
   the canonical-link dedup behavior.
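
The window arithmetic behind item 1 can be sketched as follows. This is only an illustration: `lastRequestedItem` and `exceedsSearchWindow` are hypothetical helper names, and only the 1000-item cap and the `start + display - 1` rule come from the description above.

```javascript
// Hypothetical helpers illustrating the preflight rule; not the proxy's actual code.
const MAX_SEARCH_WINDOW = 1000;

// Index (1-based) of the last item a start/display pair asks for.
function lastRequestedItem(start, display) {
  return start + display - 1;
}

// True when the request reaches past the 1000-item window and should be rejected.
function exceedsSearchWindow(start, display) {
  return lastRequestedItem(start, display) > MAX_SEARCH_WINDOW;
}

const cases = [
  [1000, 1],   // last item 1000: boundary, accepted
  [901, 100],  // last item 1000: boundary, accepted
  [902, 100],  // last item 1001: rejected
  [1000, 100]  // last item 1099: rejected
];

for (const [start, display] of cases) {
  const last = lastRequestedItem(start, display);
  const rejected = exceedsSearchWindow(start, display);
  console.log(`start=${start} display=${display} last=${last} rejected=${rejected}`);
}
```

Expressing the rule through the last requested item keeps the boundary unambiguous: a request is valid exactly when that item is 1000 or lower.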

TDD verification: added 4 new node:test cases exercising the boundary,
overflow, and URL-dedup paths; ran the full k-skill-proxy workspace
suite (202/202 pass) plus the root `npm run ci` (exit 0). Manual QA on
a proxy started from this commit reproduces every round-1 case plus the
new preflight: start=1000 & display=100 → 400 bad_request before
upstream; start=1000 & display=1 and start=901 & display=100 → 503 (or
200/401 depending on keys), confirming the boundary passes preflight.
Jeffrey (Dongkyu) Kim 2026-04-22 13:17:51 +09:00
commit 71d577b24d
7 changed files with 260 additions and 39 deletions

@@ -2,4 +2,4 @@
 "k-skill-proxy": minor
 ---

-Add `/v1/naver-news/search` route plus matching `naver-news-search` skill. Proxies the official Naver Search Open API news endpoint (`openapi.naver.com/v1/search/news.json`), reuses the existing `NAVER_SEARCH_CLIENT_ID`/`NAVER_SEARCH_CLIENT_SECRET` credentials, and keeps the user-facing credential surface empty ("불필요"). Strips `<b>` highlight tags and decodes HTML entities in titles/descriptions, parses RFC822 `pubDate` into ISO-8601, deduplicates results by `link`, caches successes for 5 minutes (failures are not cached), and exposes `naverNewsApiConfigured` on `/health`. Closes #143.
+Add `/v1/naver-news/search` route plus matching `naver-news-search` skill. Proxies the official Naver Search Open API news endpoint (`openapi.naver.com/v1/search/news.json`), reuses the existing `NAVER_SEARCH_CLIENT_ID`/`NAVER_SEARCH_CLIENT_SECRET` credentials, and keeps the user-facing credential surface empty ("불필요"). Strips `<b>` highlight tags and decodes HTML entities in titles/descriptions, parses RFC822 `pubDate` into ISO-8601, deduplicates results by canonicalized `link` (query-param order, trailing slash, host casing and fragments are ignored; different paths or query values are preserved), caches successes for 5 minutes (failures are not cached), and exposes `naverNewsApiConfigured` on `/health`. The route rejects `start + display - 1 > 1000` with a `400 bad_request` preflight before calling upstream, so requests outside Naver's 1000-item search window fail fast with a clear message instead of returning empty results. Closes #143.

@@ -38,7 +38,7 @@ description: k-skill-proxy 경유 네이버 검색 Open API 뉴스 검색으로
 | --- | --- | --- | --- |
 | `q` / `query` / `keyword` | string | (required) | Search term. At least 2 characters |
 | `display` / `limit` / `size` | int | 10 | Number of results. Clamped to 1 ~ 100 |
-| `start` / `offset` | int | 1 | Search start position (1-indexed). Max 1000 |
+| `start` / `offset` | int | 1 | Search start position (1-indexed). Max 1000. If `start + display - 1 > 1000`, the proxy rejects the request with `400 bad_request` before calling upstream |
 | `sort` | string | `sim` | `sim` (by relevance) or `date` (newest first). Any other value falls back to `sim` |

 ## Basic call
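@@ -125,7 +125,8 @@ curl -fsS --get 'http://127.0.0.1:4020/v1/naver-news/search' \
 - If the user asks for "today" or "latest", calling with `sort=date` usually gives a more satisfying result.
 - The larger `display` is, the faster the Naver API quota is consumed. In many cases there is no need to move away from the default of 10.
-- The Naver API returns no results for positions where the `start + display` combination exceeds 1000. When looking for very old articles, it is better to narrow the search term.
+- Combinations where `start + display - 1` exceeds 1000 (e.g. `start=1000&display=100`) are rejected by the proxy with `400 bad_request` (`"start + display exceeds Naver's 1000-item search window"`) before calling upstream. The Naver API only exposes items up to the 1000th, so when looking for older articles it is better to narrow the search term.
+- Deduplication of news `link` values is performed on a **canonical URL** that ignores query-parameter order and trailing slashes (`?a=1&b=2` and `?b=2&a=1` are treated as the same article). The `link` field in the actual payload is exposed exactly as Naver returned it.
 - `pub_date` is in RFC822 format and `pub_date_iso` is UTC ISO-8601. Convert to KST (UTC+9) when showing it to the user.
 - The proxy route is public/read-only/no-auth and limits abuse with a 5-minute cache plus a rate limit of 60 requests per minute.
 - If the full text of an article is needed, this skill cannot provide it. Guide the user to visit the link directly.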

@@ -58,7 +58,7 @@ curl -fsS --get "${KSKILL_PROXY_BASE_URL:-https://k-skill-proxy.nomadamas.org}/v
 - `q` or `query` — search term. At least 2 characters.
 - `display` — number of results. Default 10, range 1~100.
-- `start` — search start position (1-indexed). Default 1, max 1000. Through the Naver API, `start + display` can reach at most 1000.
+- `start` — search start position (1-indexed). Default 1, max 1000. **`start + display - 1` cannot exceed 1000**: for example, `start=1000 & display=100` asks for item `1099`, so the proxy rejects it with `400 bad_request` ("start + display exceeds Naver's 1000-item search window") before calling upstream. To find very old articles, it is better to narrow the search term.
 - `sort` — `sim` (by similarity, default) or `date` (newest first). Any other value falls back to `sim`.

 Main response fields:
@@ -91,7 +91,7 @@ curl -fsS --get "${KSKILL_PROXY_BASE_URL:-https://k-skill-proxy.nomadamas.org}/v
 ## Failure modes

-- `400 bad_request` — missing search term, fewer than 2 characters, or a disallowed parameter. Show the error message to the user as-is.
+- `400 bad_request` — missing search term, fewer than 2 characters, a disallowed parameter, or a `start + display - 1 > 1000` combination (beyond Naver's 1000-item search window). Show the error message to the user as-is.
 - `503 upstream_not_configured` — the proxy server has no `NAVER_SEARCH_CLIENT_ID`/`NAVER_SEARCH_CLIENT_SECRET`. The operator must register the keys. Tell the user something like "please try again later".
 - `401 upstream_error` — the proxy server's Client ID/Secret is wrong (`errorCode: 024`). The operator must reissue it.
 - `429 upstream_error` — the Naver Search API daily quota (25,000 calls/day) is exceeded (`errorCode: 010`). Retry loops are forbidden. Advise the user to try again later.

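@@ -28,6 +28,16 @@
 - `GET /v1/lh-notice/search` — LH housing-subscription notice list (`DATA_GO_KR_API_KEY`)
 - `GET /v1/lh-notice/detail` — LH housing-subscription notice detail (`DATA_GO_KR_API_KEY`)

+## `/health` 업스트림 플래그 의미 (`/health` upstream flag semantics)
+
+The `upstreams` object of `/health` reports whether each route is **operational**, and even routes that share the same environment variables **carry different meanings depending on whether a fallback exists**:
+
+- `naverShoppingConfigured` — the Naver Shopping route has a public BFF JSON fallback, so it is **always `true`**. Even without keys, responses are served through the public BFF path.
+- `naverSearchApiConfigured` — whether the Naver Search Open API keys (`NAVER_SEARCH_CLIENT_ID` + `NAVER_SEARCH_CLIENT_SECRET`) are configured. The Naver Shopping route prefers the official API when this is `true` and automatically switches to the BFF fallback when it is `false`. In other words, this flag is **advisory on the shopping side**.
+- `naverNewsApiConfigured` — whether the Naver News route is **operational**. News has no fallback, so without keys the news route returns `503 upstream_not_configured`.
+
+`naverSearchApiConfigured` and `naverNewsApiConfigured` depend on the same environment variables, so their boolean values currently always match, but **their meaning (semantic contract) differs**: the former reports "whether the official keys exist", while the latter reports "whether the news route can actually return a response". Even if the search keys are split or the fallback policy changes in the future, these two flags will remain separate.
+
 ## Environment variables

 - `AIR_KOREA_OPEN_API_KEY` — upstream AirKorea key on the proxy server side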
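
As a rough illustration of the flag split documented in the README hunk above, a monitoring client might interpret the `/health` payload like this. The function name `describeNaverRoutes` and its return labels are assumptions made for this sketch; only the three flag names and their meanings come from the README text.

```javascript
// Hypothetical consumer of the /health payload; not part of k-skill-proxy.
function describeNaverRoutes(health) {
  const upstreams = (health && health.upstreams) || {};
  const keysPresent = upstreams.naverSearchApiConfigured === true;
  return {
    // Shopping always answers: official API when keys exist, public BFF fallback otherwise.
    shopping: keysPresent ? "official-api" : "bff-fallback",
    // News has no fallback, so a false flag means the route answers 503 upstream_not_configured.
    news: upstreams.naverNewsApiConfigured === true ? "available" : "unavailable-503"
  };
}

// Sample payload shaped like the upstreams block, with the Naver keys missing.
const withoutKeys = describeNaverRoutes({
  upstreams: {
    naverShoppingConfigured: true,
    naverSearchApiConfigured: false,
    naverNewsApiConfigured: false
  }
});
console.log(withoutKeys);
```

Reading the news flag separately from the search-key flag means such a client would keep working if the two ever stop sharing one expression.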

@@ -5,6 +5,7 @@ const MAX_DISPLAY = 100;
 const DEFAULT_START = 1;
 const MIN_START = 1;
 const MAX_START = 1000;
+const MAX_SEARCH_WINDOW = 1000;
 const ALLOWED_SORTS = new Set(["sim", "date"]);

 function parseInteger(value, fallback) {
@@ -64,6 +65,22 @@ function normalizeUrl(value) {
   return null;
 }

+function canonicalizeLinkForDedup(link) {
+  try {
+    const u = new URL(link);
+    const params = [...u.searchParams.entries()];
+    params.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
+    u.search = params.length ? new URLSearchParams(params).toString() : "";
+    if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
+      u.pathname = u.pathname.replace(/\/+$/, "") || "/";
+    }
+    u.hash = "";
+    return u.toString().toLowerCase();
+  } catch {
+    return String(link).toLowerCase();
+  }
+}
+
 function parsePubDateIso(rfc822) {
   if (!rfc822) {
     return null;
@@ -89,10 +106,21 @@ function normalizeNaverNewsSearchQuery(query) {
   const requestedSort = trimOrNull(query.sort) || "sim";
   const sort = ALLOWED_SORTS.has(requestedSort) ? requestedSort : "sim";

+  const display = clamp(rawDisplay, MIN_DISPLAY, MAX_DISPLAY);
+  const start = clamp(rawStart, MIN_START, MAX_START);
+
+  if (start + display - 1 > MAX_SEARCH_WINDOW) {
+    throw new Error(
+      `start + display exceeds Naver's ${MAX_SEARCH_WINDOW}-item search window ` +
+        `(start=${start} + display=${display} would fetch item ${start + display - 1}, ` +
+        `max accessible item is ${MAX_SEARCH_WINDOW}). Narrow the search or reduce start/display.`
+    );
+  }
+
   return {
     query: q,
-    display: clamp(rawDisplay, MIN_DISPLAY, MAX_DISPLAY),
-    start: clamp(rawStart, MIN_START, MAX_START),
+    display,
+    start,
     sort
   };
 }
@@ -127,7 +155,7 @@ function normalizeNaverNewsSearchPayload(
     const pubDate = trimOrNull(item.pubDate);
     const pubDateIso = parsePubDateIso(pubDate);

-    const dedupKey = link.toLowerCase();
+    const dedupKey = canonicalizeLinkForDedup(link);
     if (seenLinks.has(dedupKey)) {
       continue;
     }
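
To experiment with the dedup key outside the proxy, the sketch below copies `canonicalizeLinkForDedup` from the hunk above and runs it against illustrative URLs (the `news.example.com` links are made up for the demonstration). One consequence worth noting: because the whole serialized URL is lowercased, paths and query values that differ only in letter case also collapse to the same key.

```javascript
// Standalone copy of the canonicalization logic from the diff, for experimentation only.
function canonicalizeLinkForDedup(link) {
  try {
    const u = new URL(link);
    // Sort query parameters by key so that ordering does not affect the key.
    const params = [...u.searchParams.entries()];
    params.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    u.search = params.length ? new URLSearchParams(params).toString() : "";
    // Drop trailing slashes from the path, but keep the root path "/".
    if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
      u.pathname = u.pathname.replace(/\/+$/, "") || "/";
    }
    u.hash = "";
    return u.toString().toLowerCase();
  } catch {
    // Unparseable input falls back to plain lowercasing.
    return String(link).toLowerCase();
  }
}

// Four spellings of the same article URL (illustrative values).
const variants = [
  "https://news.example.com/articles/42?a=1&b=2",
  "https://news.example.com/articles/42?b=2&a=1",
  "https://news.example.com/articles/42/?a=1&b=2",
  "https://NEWS.example.com/articles/42?a=1&b=2#comments"
];
const keys = new Set(variants.map(canonicalizeLinkForDedup));
console.log(keys.size); // 1

// A different path stays distinct; a case-only difference in a query value does not.
const otherPath = canonicalizeLinkForDedup("https://news.example.com/articles/43?a=1&b=2");
const upperValue = canonicalizeLinkForDedup("https://news.example.com/articles/42?id=AbC");
const lowerValue = canonicalizeLinkForDedup("https://news.example.com/articles/42?id=abc");
console.log(keys.has(otherPath));       // false
console.log(upperValue === lowerValue); // true
```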

@@ -1276,32 +1276,35 @@ function buildServer({ env = process.env, provider = null, now = () => new Date(
     }
   });

-  app.get("/health", async () => ({
-    ok: true,
-    service: config.proxyName,
-    port: config.port,
-    upstreams: {
-      airKoreaConfigured: Boolean(config.airKoreaApiKey),
-      kmaOpenApiConfigured: Boolean(config.kmaOpenApiKey),
-      blueRibbonConfigured: Boolean(config.blueRibbonSessionId),
-      seoulOpenApiConfigured: Boolean(config.seoulOpenApiKey),
-      hrfcoConfigured: Boolean(config.hrfcoApiKey),
-      opinetConfigured: Boolean(config.opinetApiKey),
-      molitConfigured: Boolean(config.molitApiKey),
-      lhNoticeConfigured: Boolean(config.molitApiKey),
-      data4libraryConfigured: Boolean(config.data4libraryAuthKey),
-      foodsafetyKoreaConfigured: Boolean(config.foodsafetyKoreaApiKey),
-      neisSchoolMealConfigured: Boolean(config.keduInfoKey),
-      krxConfigured: Boolean(config.krxApiKey),
-      naverShoppingConfigured: true,
-      naverSearchApiConfigured: Boolean(config.naverSearchClientId && config.naverSearchClientSecret),
-      naverNewsApiConfigured: Boolean(config.naverSearchClientId && config.naverSearchClientSecret)
-    },
-    auth: {
-      tokenRequired: false
-    },
-    timestamp: new Date().toISOString()
-  }));
+  app.get("/health", async () => {
+    const naverSearchKeysPresent = Boolean(config.naverSearchClientId && config.naverSearchClientSecret);
+    return {
+      ok: true,
+      service: config.proxyName,
+      port: config.port,
+      upstreams: {
+        airKoreaConfigured: Boolean(config.airKoreaApiKey),
+        kmaOpenApiConfigured: Boolean(config.kmaOpenApiKey),
+        blueRibbonConfigured: Boolean(config.blueRibbonSessionId),
+        seoulOpenApiConfigured: Boolean(config.seoulOpenApiKey),
+        hrfcoConfigured: Boolean(config.hrfcoApiKey),
+        opinetConfigured: Boolean(config.opinetApiKey),
+        molitConfigured: Boolean(config.molitApiKey),
+        lhNoticeConfigured: Boolean(config.molitApiKey),
+        data4libraryConfigured: Boolean(config.data4libraryAuthKey),
+        foodsafetyKoreaConfigured: Boolean(config.foodsafetyKoreaApiKey),
+        neisSchoolMealConfigured: Boolean(config.keduInfoKey),
+        krxConfigured: Boolean(config.krxApiKey),
+        naverShoppingConfigured: true,
+        naverSearchApiConfigured: naverSearchKeysPresent,
+        naverNewsApiConfigured: naverSearchKeysPresent
+      },
+      auth: {
+        tokenRequired: false
+      },
+      timestamp: new Date().toISOString()
+    };
+  });

   app.get("/B552584/:service/:operation", async (request, reply) => {
     const { service, operation } = request.params;

@@ -15,10 +15,20 @@ test("normalizeNaverNewsSearchQuery validates q/query and clamps display/start/s
   assert.throws(() => normalizeNaverNewsSearchQuery({ q: "a" }), /at least 2/);

   assert.deepEqual(
-    normalizeNaverNewsSearchQuery({ query: " 인공지능 ", display: "999", start: "9999", sort: "date" }),
+    normalizeNaverNewsSearchQuery({ query: " 인공지능 ", display: "999", start: "1", sort: "date" }),
     {
       query: "인공지능",
       display: 100,
+      start: 1,
+      sort: "date"
+    }
+  );
+
+  assert.deepEqual(
+    normalizeNaverNewsSearchQuery({ query: "주식", display: "1", start: "9999", sort: "date" }),
+    {
+      query: "주식",
+      display: 1,
       start: 1000,
       sort: "date"
     }
@@ -55,16 +65,60 @@ test("normalizeNaverNewsSearchQuery validates q/query and clamps display/start/s
   );
 });

-test("normalizeNaverNewsSearchQuery aliases keyword and caps start+display at 1000 window", () => {
+test("normalizeNaverNewsSearchQuery accepts keyword as an alias for q/query", () => {
   assert.deepEqual(
-    normalizeNaverNewsSearchQuery({ keyword: "스타트업", display: "50", start: "1000" }),
+    normalizeNaverNewsSearchQuery({ keyword: "스타트업" }),
     {
       query: "스타트업",
-      display: 50,
-      start: 1000,
+      display: 10,
+      start: 1,
       sort: "sim"
     }
   );
+
+  assert.deepEqual(
+    normalizeNaverNewsSearchQuery({ keyword: "반도체", display: "20", start: "5", sort: "date" }),
+    {
+      query: "반도체",
+      display: 20,
+      start: 5,
+      sort: "date"
+    }
+  );
 });
+
+test("normalizeNaverNewsSearchQuery rejects start+display combinations exceeding Naver's 1000-item window", () => {
+  // Boundary values that are still valid (start + display - 1 === 1000 or below) must pass.
+  assert.deepEqual(
+    normalizeNaverNewsSearchQuery({ q: "경계", start: "1000", display: "1" }),
+    { query: "경계", display: 1, start: 1000, sort: "sim" }
+  );
+  assert.deepEqual(
+    normalizeNaverNewsSearchQuery({ q: "경계", start: "901", display: "100" }),
+    { query: "경계", display: 100, start: 901, sort: "sim" }
+  );
+  assert.deepEqual(
+    normalizeNaverNewsSearchQuery({ q: "경계", start: "500", display: "100" }),
+    { query: "경계", display: 100, start: 500, sort: "sim" }
+  );
+
+  // One past the boundary (start + display - 1 > 1000) must throw preflight 400.
+  assert.throws(
+    () => normalizeNaverNewsSearchQuery({ q: "초과", start: "902", display: "100" }),
+    /1000-item|window|Naver/i
+  );
+  assert.throws(
+    () => normalizeNaverNewsSearchQuery({ q: "초과", start: "1000", display: "2" }),
+    /1000-item|window|Naver/i
+  );
+  assert.throws(
+    () => normalizeNaverNewsSearchQuery({ q: "초과", start: "1000", display: "100" }),
+    /1000-item|window|Naver/i
+  );
+  assert.throws(
+    () => normalizeNaverNewsSearchQuery({ q: "초과", start: "950", display: "60" }),
+    /1000-item|window|Naver/i
+  );
+});

 test("buildNaverNewsSearchUrl constructs the official Naver Search news endpoint URL", () => {
@@ -186,6 +240,99 @@ test("normalizeNaverNewsSearchPayload skips items without title or link and dedu
   assert.equal(result.items[0].title, "정상 기사");
 });

+test("normalizeNaverNewsSearchPayload dedupes links that differ only in query-param order, trailing slash, or host casing", () => {
+  const result = normalizeNaverNewsSearchPayload(
+    {
+      lastBuildDate: "Mon, 22 Apr 2026 00:00:00 +0900",
+      total: 5,
+      start: 1,
+      display: 5,
+      items: [
+        {
+          title: "원본 기사",
+          originallink: "https://publisher.example.com/42",
+          link: "https://news.example.com/articles/42?a=1&b=2",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        },
+        {
+          title: "쿼리 파라미터 순서만 다른 중복",
+          originallink: "https://publisher.example.com/42-b",
+          link: "https://news.example.com/articles/42?b=2&a=1",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        },
+        {
+          title: "trailing slash 만 다른 중복",
+          originallink: "https://publisher.example.com/42-c",
+          link: "https://news.example.com/articles/42/?a=1&b=2",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        },
+        {
+          title: "host 대소문자·fragment 만 다른 중복",
+          originallink: "https://publisher.example.com/42-d",
+          link: "https://NEWS.example.com/articles/42?a=1&b=2#comments",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        },
+        {
+          title: "진짜 다른 기사",
+          originallink: "https://publisher.example.com/43",
+          link: "https://news.example.com/articles/43?a=1&b=2",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        }
+      ]
+    },
+    { query: "중복", display: 10, start: 1, sort: "sim" }
+  );
+
+  assert.equal(result.items.length, 2);
+  assert.equal(result.items[0].title, "원본 기사");
+  assert.equal(result.items[1].title, "진짜 다른 기사");
+});
+
+test("normalizeNaverNewsSearchPayload preserves items that differ by path or by a non-redundant query param", () => {
+  const result = normalizeNaverNewsSearchPayload(
+    {
+      lastBuildDate: "Mon, 22 Apr 2026 00:00:00 +0900",
+      total: 3,
+      start: 1,
+      display: 3,
+      items: [
+        {
+          title: "첫 번째",
+          originallink: "https://publisher.example.com/1",
+          link: "https://news.example.com/articles/42?a=1",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        },
+        {
+          title: "다른 쿼리 값",
+          originallink: "https://publisher.example.com/2",
+          link: "https://news.example.com/articles/42?a=2",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        },
+        {
+          title: "다른 경로",
+          originallink: "https://publisher.example.com/3",
+          link: "https://news.example.com/articles/42/related?a=1",
+          description: "본문",
+          pubDate: "Mon, 22 Apr 2026 00:00:00 +0900"
+        }
+      ]
+    },
+    { query: "서로다른", display: 10, start: 1, sort: "sim" }
+  );
+
+  assert.equal(result.items.length, 3);
+  assert.equal(result.items[0].title, "첫 번째");
+  assert.equal(result.items[1].title, "다른 쿼리 값");
+  assert.equal(result.items[2].title, "다른 경로");
+});
+
 test("normalizeNaverNewsSearchPayload handles missing optional fields gracefully", () => {
   const result = normalizeNaverNewsSearchPayload(
     {
@@ -248,6 +395,38 @@ test("naver news search endpoint returns 503 when proxy credentials are missing"
   assert.match(body.message, /NAVER_SEARCH_CLIENT_ID/);
 });

+test("naver news search endpoint returns 400 preflight when start+display exceeds Naver's 1000-item window", async (t) => {
+  const originalFetch = global.fetch;
+  let fetchCount = 0;
+  global.fetch = async () => {
+    fetchCount += 1;
+    return new Response("{}", { status: 200, headers: { "content-type": "application/json" } });
+  };
+  const app = buildServer({
+    env: {
+      NAVER_SEARCH_CLIENT_ID: "client-id",
+      NAVER_SEARCH_CLIENT_SECRET: "client-secret",
+      KSKILL_PROXY_CACHE_TTL_MS: "60000"
+    }
+  });
+
+  t.after(async () => {
+    global.fetch = originalFetch;
+    await app.close();
+  });
+
+  const response = await app.inject({
+    method: "GET",
+    url: "/v1/naver-news/search?q=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90&start=1000&display=100"
+  });
+
+  assert.equal(response.statusCode, 400);
+  const body = response.json();
+  assert.equal(body.error, "bad_request");
+  assert.match(body.message, /1000-item|window|Naver/i);
+  assert.equal(fetchCount, 0, "must not call upstream when preflight fails");
+});
+
 test("naver news search endpoint proxies to official API with correct headers and params", async (t) => {
   const originalFetch = global.fetch;
   const fetchCalls = [];