fix(kstartup-search): implement promised client-side filter to deliver on SKILL.md L121

Live data revealed two unmet contracts in the kstartup-search helper:

1. SKILL.md L121 promised the helper re-applies supt_regin / aply_trgt /
   biz_enyy filters on the client side because K-Startup upstream ignores
   them server-side. The helper had no such logic — calling
   `--supt-regin 서울특별시 --rcrt-prgs-yn Y` returned 경북/충북/충남
   announcements as-is, silently misleading callers.

2. The upstream `supt_regin` field is stored as the short form
   (`서울`, `경기`, `충북`, ...) but every CLI example in the skill used
   the standard 광역지자체 long form (`서울특별시`), which would never
   substring-match even after a client filter was added.

Add `apply_client_filters()` that runs after `urlopen` returns. It honors
the SKILL.md contract literally: substring match per token, AND-joined
across comma-separated user values, with a 17-region (+`전국`) shortname
normalisation table so both `--supt-regin 서울특별시` and
`--supt-regin 서울` resolve to upstream's `서울`. Filtered responses
expose a new `client_filter: {fields, upstream_returned, after_filter}`
metadata block so callers can detect "first page was depleted by filter"
and page through.
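
The "page through" step could look roughly like the sketch below. It is a minimal illustration, not part of this commit: `collect_filtered` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real helper call. Only the `client_filter` keys come from the change above.

```python
from typing import Any, Callable, Dict, List


def collect_filtered(
    fetch_page: Callable[[int], Dict[str, Any]],
    wanted: int,
    per_page: int,
    max_pages: int = 5,
) -> List[Dict[str, Any]]:
    """Fetch pages until `wanted` filtered rows are collected or upstream runs dry."""
    rows: List[Dict[str, Any]] = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        data = payload.get("data", [])
        rows.extend(data)
        meta = payload.get("client_filter")
        # Without the metadata block no client filter ran, so count rows directly.
        upstream = meta["upstream_returned"] if meta else len(data)
        # Stop when enough rows survived, or when upstream sent a short (last) page.
        if len(rows) >= wanted or upstream < per_page:
            break
    return rows[:wanted]


def fake_fetch(page: int) -> Dict[str, Any]:
    """Hypothetical stand-in for the helper: two full pages, then a short one."""
    upstream = {1: ["서울", "경북", "충북"], 2: ["충남", "서울", "서울"], 3: ["경기"]}[page]
    kept = [{"supt_regin": region, "page": page} for region in upstream if region == "서울"]
    return {
        "data": kept,
        "client_filter": {"upstream_returned": len(upstream), "after_filter": len(kept)},
    }


result = collect_filtered(fake_fetch, wanted=3, per_page=3)
```

Comparing `upstream_returned` with the page size, rather than `after_filter`, is what distinguishes "the filter emptied this page" from "upstream has no more rows".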

Tests: 9 new ClientFilterTests + 2 normalisation tests on top of the
existing 14 (25 total, all passing).

Live smoke (against a dev proxy with DATA_GO_KR_API_KEY activated for
dataset 15125364): `--supt-regin 서울특별시 --rcrt-prgs-yn Y --per-page 10`
now returns 4 actual 서울 announcements (upstream returned 10 mixed-region
rows; client filter narrowed to 4), with detl_pg_url to k-startup.go.kr.

Confidence: high. Scope-risk: narrow — purely additive on the response
path; other endpoints (business-info / contents / statistics) pass
through unchanged.
Jeffrey (Dongkyu) Kim 2026-05-19 00:21:21 +09:00
commit 2f68b1ab4b
3 changed files with 234 additions and 1 deletions

@ -118,7 +118,12 @@ python3 scripts/run_kstartup.py announcements \
### 4. Filter on the client side for richer questions
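The API supports only simple field matching, and **some filters, such as `supt_regin`, have been observed not to be applied server-side by upstream.** A call with `--supt-regin 서울특별시` can still return announcements from other regions, so for fields where answer accuracy matters, such as `supt_regin`, `aply_trgt`, and `biz_enyy`, the helper filters the received response JSON once more on the client. `pbanc_rcpt_end_dt` is a `YYYY-MM-DD HH:MM:SS` string, so compare it directly in KST. Compound conditions such as "closing this week", "for people in their 30s", or "contains a specific keyword" are also handled on the client.
The API supports only simple field matching, and **some filters, such as `supt_regin`, have been observed not to be applied server-side by upstream.** A call with `--supt-regin 서울특별시` can still return announcements from other regions, so for the `supt_regin`, `aply_trgt`, and `biz_enyy` fields the helper filters the received response once more on the client.
- Upstream returns the response `supt_regin` in abbreviated form (`서울`, `경기`, `충북`). Even when the user passes a standard metropolitan or provincial name such as `--supt-regin 서울특별시`, the helper normalises it through a mapping table of the 17 metropolitan cities and provinces (plus `전국`) before matching.
- When the client filter is applied, a `client_filter: {fields, upstream_returned, after_filter}` block is attached to the response JSON. If `upstream_returned` is still a full page but `after_filter` is small, the first page is not enough, so increase `--page` to fetch additional pages.
- Comma-separated values are AND-matched (`--aply-trgt 예비창업자,1년미만` → both tokens must be present in the row to pass).
- `pbanc_rcpt_end_dt` is a `YYYYMMDD` string, so compare it directly in KST. The helper does not filter compound conditions such as "closing this week", "for people in their 30s", or "contains a specific keyword", so the agent handles them directly on the response JSON.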
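
As an illustration of the agent-side date handling described in the last bullet, here is a minimal sketch. It assumes the `YYYYMMDD` format stated above and approximates "closing this week" as "within the next 7 days"; `closes_within` and the sample rows are hypothetical and not part of the helper.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

KST = timezone(timedelta(hours=9))  # Korea Standard Time, fixed UTC+9 offset


def closes_within(
    rows: List[Dict[str, Any]],
    days: int,
    today: Optional[date] = None,
) -> List[Dict[str, Any]]:
    """Keep rows whose pbanc_rcpt_end_dt falls between today and today + days (KST)."""
    today = today or datetime.now(KST).date()
    cutoff = today + timedelta(days=days)
    kept = []
    for row in rows:
        raw = str(row.get("pbanc_rcpt_end_dt") or "")
        try:
            end = datetime.strptime(raw, "%Y%m%d").date()
        except ValueError:
            continue  # missing value or unexpected format: skip rather than guess
        if today <= end <= cutoff:
            kept.append(row)
    return kept


# Made-up sample rows for illustration only.
sample_rows = [
    {"biz_pbanc_nm": "a", "pbanc_rcpt_end_dt": "20260520"},
    {"biz_pbanc_nm": "b", "pbanc_rcpt_end_dt": "20260601"},
    {"biz_pbanc_nm": "c", "pbanc_rcpt_end_dt": None},
    {"biz_pbanc_nm": "d", "pbanc_rcpt_end_dt": "2026-05-20"},
]
closing_soon = closes_within(sample_rows, days=7, today=date(2026, 5, 19))
```

Passing `today` explicitly keeps the comparison reproducible; when omitted, the current date is taken in KST rather than in the machine's local time zone.
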
### 5. Cite the source

@ -49,6 +49,44 @@ OPERATIONS: Dict[str, Dict[str, Any]] = {
YN_FIELDS = {"intg_pbanc_yn", "rcrt_prgs_yn"}
DATE_FIELDS = {"pbanc_rcpt_bgng_dt", "pbanc_rcpt_end_dt"}

# Fields where the K-Startup upstream is observed to ignore the server-side
# filter and return non-matching rows. SKILL.md L121 promises that the helper
# re-applies these filters on the client side after receiving the response.
#
#   - supt_regin: upstream returns mixed regions even when supt_regin is set.
#   - aply_trgt:  upstream returns rows whose aply_trgt does not contain the
#                 requested target (e.g. asking for "예비창업자" returns rows
#                 with only "일반인,일반기업").
#   - biz_enyy:   upstream returns rows whose biz_enyy does not include the
#                 requested founding period bucket.
#
# Matching policy: substring match against the comma-separated list inside
# each row's field. Multiple requested values (comma-separated by the user)
# are AND-joined: every requested token must appear somewhere in the row.
# This mirrors how the K-Startup web UI narrows results.
CLIENT_FILTER_FIELDS = {"supt_regin", "aply_trgt", "biz_enyy"}

REGION_SHORTNAME = {
    "서울특별시": "서울", "서울시": "서울", "서울": "서울",
    "부산광역시": "부산", "부산시": "부산", "부산": "부산",
    "대구광역시": "대구", "대구시": "대구", "대구": "대구",
    "인천광역시": "인천", "인천시": "인천", "인천": "인천",
    "광주광역시": "광주", "광주시": "광주", "광주": "광주",
    "대전광역시": "대전", "대전시": "대전", "대전": "대전",
    "울산광역시": "울산", "울산시": "울산", "울산": "울산",
    "세종특별자치시": "세종", "세종시": "세종", "세종": "세종",
    "경기도": "경기", "경기": "경기",
    "강원특별자치도": "강원", "강원도": "강원", "강원": "강원",
    "충청북도": "충북", "충북": "충북",
    "충청남도": "충남", "충남": "충남",
    "전북특별자치도": "전북", "전라북도": "전북", "전북": "전북",
    "전라남도": "전남", "전남": "전남",
    "경상북도": "경북", "경북": "경북",
    "경상남도": "경남", "경남": "경남",
    "제주특별자치도": "제주", "제주도": "제주", "제주": "제주",
    "전국": "전국",
}


class HelperError(RuntimeError):
    """User-facing CLI error."""
@ -183,6 +221,65 @@ def http_get(url: str, *, timeout: int) -> Tuple[int, str, str]:
        raise HelperError(f"network error: {exc.reason}") from exc


def _normalise_filter_token(field: str, token: str) -> str:
    if field == "supt_regin":
        return REGION_SHORTNAME.get(token, token)
    return token


def _row_matches_token(row: Dict[str, Any], field: str, token: str) -> bool:
    raw = row.get(field)
    if raw is None:
        return False
    haystack = str(raw)
    needle = _normalise_filter_token(field, token)
    return needle in haystack


def _row_matches_field(row: Dict[str, Any], field: str, requested: str) -> bool:
    tokens = [t.strip() for t in requested.split(",") if t.strip()]
    if not tokens:
        return True
    return all(_row_matches_token(row, field, token) for token in tokens)


def apply_client_filters(
    payload: Dict[str, Any],
    args: argparse.Namespace,
    operation: str,
) -> Dict[str, Any]:
    if operation != "announcements":
        return payload

    requested: Dict[str, str] = {}
    for field in CLIENT_FILTER_FIELDS:
        value = getattr(args, field, None)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            requested[field] = text

    if not requested:
        return payload

    data = payload.get("data")
    if not isinstance(data, list):
        return payload

    upstream_count = len(data)
    filtered = [
        row for row in data
        if isinstance(row, dict)
        and all(_row_matches_field(row, field, value) for field, value in requested.items())
    ]

    payload["data"] = filtered
    payload["currentCount"] = len(filtered)
    payload["client_filter"] = {
        "fields": requested,
        "upstream_returned": upstream_count,
        "after_filter": len(filtered),
        "note": "Applied after upstream response because K-Startup ignores some server-side filters.",
    }
    return payload


def summarise(operation: str, payload: Dict[str, Any]) -> str:
    items: Iterable[Dict[str, Any]] = []
    if isinstance(payload, dict):
@ -297,6 +394,8 @@ def run(argv: Optional[List[str]] = None) -> int:
        payload = {"raw": payload}
    payload.setdefault("query", query)

    payload = apply_client_filters(payload, args, operation)

    if args.text:
        print(summarise(operation, payload))
    else:

@ -186,5 +186,134 @@ class DryRunIntegrationTests(unittest.TestCase):
        self.assertNotIn("super-secret", payload["url"])


class ClientFilterTests(unittest.TestCase):
    @staticmethod
    def _payload(rows):
        return {
            "currentCount": len(rows),
            "data": list(rows),
            "totalCount": 999,
            "page": 1,
            "perPage": len(rows),
        }

    def test_supt_regin_drops_other_regions(self):
        payload = self._payload([
            {"biz_pbanc_nm": "서울 청년창업", "supt_regin": "서울"},
            {"biz_pbanc_nm": "경북 모집", "supt_regin": "경북"},
            {"biz_pbanc_nm": "충북 K-바이오", "supt_regin": "충북"},
        ])
        args = make_args("announcements", supt_regin="서울특별시")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual(result["currentCount"], 1)
        self.assertEqual(result["data"][0]["biz_pbanc_nm"], "서울 청년창업")
        self.assertEqual(result["client_filter"]["upstream_returned"], 3)
        self.assertEqual(result["client_filter"]["after_filter"], 1)
        self.assertEqual(result["client_filter"]["fields"]["supt_regin"], "서울특별시")

    def test_supt_regin_normalises_long_official_names(self):
        rows = [
            ("서울특별시", "서울"),
            ("부산광역시", "부산"),
            ("경기도", "경기"),
            ("강원특별자치도", "강원"),
            ("전북특별자치도", "전북"),
            ("제주특별자치도", "제주"),
            ("세종특별자치시", "세종"),
        ]
        for long_name, short_name in rows:
            payload = self._payload([
                {"biz_pbanc_nm": "match", "supt_regin": short_name},
                {"biz_pbanc_nm": "other", "supt_regin": "전국"},
            ])
            args = make_args("announcements", supt_regin=long_name)
            result = run_kstartup.apply_client_filters(payload, args, "announcements")
            self.assertEqual(
                [row["biz_pbanc_nm"] for row in result["data"]],
                ["match"],
                f"long name {long_name!r} should match upstream short form {short_name!r}",
            )

    def test_supt_regin_short_form_also_works(self):
        payload = self._payload([
            {"biz_pbanc_nm": "match", "supt_regin": "서울"},
            {"biz_pbanc_nm": "other", "supt_regin": "경기"},
        ])
        args = make_args("announcements", supt_regin="서울")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual([row["biz_pbanc_nm"] for row in result["data"]], ["match"])

    def test_supt_regin_handles_nationwide_rows_explicitly(self):
        payload = self._payload([
            {"biz_pbanc_nm": "전국 공모", "supt_regin": "전국"},
            {"biz_pbanc_nm": "서울 공모", "supt_regin": "서울특별시"},
        ])
        args = make_args("announcements", supt_regin="서울특별시")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual([row["biz_pbanc_nm"] for row in result["data"]], ["서울 공모"])

    def test_aply_trgt_substring_match_in_comma_list(self):
        payload = self._payload([
            {"biz_pbanc_nm": "예비창업자 대상", "aply_trgt": "일반인,일반기업,예비창업자"},
            {"biz_pbanc_nm": "일반 대상", "aply_trgt": "일반인,일반기업"},
        ])
        args = make_args("announcements", aply_trgt="예비창업자")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual(len(result["data"]), 1)
        self.assertEqual(result["data"][0]["biz_pbanc_nm"], "예비창업자 대상")

    def test_multiple_filters_are_anded(self):
        payload = self._payload([
            {"biz_pbanc_nm": "ok", "supt_regin": "서울특별시", "aply_trgt": "예비창업자"},
            {"biz_pbanc_nm": "wrong-region", "supt_regin": "경기도", "aply_trgt": "예비창업자"},
            {"biz_pbanc_nm": "wrong-target", "supt_regin": "서울특별시", "aply_trgt": "일반인"},
        ])
        args = make_args(
            "announcements",
            supt_regin="서울특별시",
            aply_trgt="예비창업자",
        )
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual([row["biz_pbanc_nm"] for row in result["data"]], ["ok"])

    def test_comma_separated_request_requires_all_tokens(self):
        payload = self._payload([
            {"biz_pbanc_nm": "match-all", "biz_enyy": "예비창업자,1년미만,2년미만"},
            {"biz_pbanc_nm": "missing-one", "biz_enyy": "예비창업자"},
        ])
        args = make_args("announcements", biz_enyy="예비창업자,1년미만")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual([row["biz_pbanc_nm"] for row in result["data"]], ["match-all"])

    def test_no_client_filter_args_is_passthrough(self):
        payload = self._payload([{"biz_pbanc_nm": "x", "supt_regin": "전국"}])
        args = make_args("announcements")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual(result["currentCount"], 1)
        self.assertNotIn("client_filter", result)

    def test_non_announcements_operations_are_passthrough(self):
        payload = self._payload([{"titl_nm": "공모전 공지"}])
        args = make_args("contents")
        result = run_kstartup.apply_client_filters(payload, args, "contents")
        self.assertEqual(result["currentCount"], 1)
        self.assertNotIn("client_filter", result)

    def test_empty_filter_value_is_treated_as_unset(self):
        payload = self._payload([{"supt_regin": "경기도"}])
        args = make_args("announcements", supt_regin=" ")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertNotIn("client_filter", result)

    def test_missing_field_in_row_is_not_matched(self):
        payload = self._payload([
            {"biz_pbanc_nm": "has-field", "supt_regin": "서울특별시"},
            {"biz_pbanc_nm": "no-field"},
        ])
        args = make_args("announcements", supt_regin="서울특별시")
        result = run_kstartup.apply_client_filters(payload, args, "announcements")
        self.assertEqual([row["biz_pbanc_nm"] for row in result["data"]], ["has-field"])


if __name__ == "__main__":
    unittest.main()