k-skill/hwp/SKILL.md
Jeffrey (Dongkyu) Kim 68e6829052
Sync dev → main: scholarship, public restroom, KBL, Hola Poke + HWP/stock proxy upgrades (#136)
* Add a guided Hola Poke Yeoksam skill without widening repo scope

Issue #120 only needs a repository skill payload, discoverability docs,
and regression coverage. This change adds the new skill, wires it into
existing docs surfaces, and locks the remote-MCP-only contract in tests
so future edits keep the phone-only event flow and verbatim message
relay behavior.

Constraint: The upstream Hola Poke flow lives on a remote MCP server, so this repo should not add proxy/runtime code
Constraint: Tests must be written before refining the new docs/skill wording
Rejected: Add local package or proxy support for Hola Poke | would over-scope a docs-only skill addition
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep this skill limited to 올라포케 역삼점 and treat the MCP response message as the event source of truth
Tested: node --test scripts/skill-docs.test.js --test-name-pattern='hola-poke-yeoksam'
Tested: npm run ci
Tested: Live MCP initialize/tools/list/get_menu/get_shop_info/enter_event(phone_format) smoke checks against https://hola-poke-yeoksam-skill.onrender.com/mcp
Not-tested: Successful live event entry with a real phone number

* Help users find nearby public restrooms from Korean location queries

This adds a new public-restroom-nearby skill and reusable package that resolves a user-provided location, narrows the official 공중화장실정보 dataset by region when possible, and ranks nearby restroom results with opening-time hints and map links.

Constraint: Must use free official/open surfaces without introducing new dependencies
Constraint: Must follow TDD and keep release/docs metadata aligned in the same change
Rejected: Add a proxy route first | direct official CSV access already works and keeps scope narrower
Rejected: Use nationwide-only ranking without regional narrowing | too much noisy data for dense urban anchors
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If Kakao place-panel or localdata CSV schema changes, update parser fixtures before broad logic changes
Tested: npm run ci; live smoke via searchNearbyPublicRestroomsByLocationQuery('광화문', { limit: 3 }); architect review APPROVED
Not-tested: Non-Seoul live smoke across every regional orgCode

* Pin the Hola Poke MCP contract in repo-owned regression fixtures

The earlier issue #120 regression only matched prose, so this follow-up records the verified remote MCP tool/result snapshot in a checked-in fixture and makes both docs surfaces byte-align to it. That keeps the discoverability docs honest while turning the review claim into a real contract lock for tools/list, get_menu, get_shop_info, and the invalid-phone event flow.

Constraint: The upstream remote MCP server can change independently of this repo
Rejected: Keep prose-only regex checks | would not catch contract drift
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Refresh the fixture, both JSON fences, and the live-smoke evidence together whenever the upstream contract changes
Tested: node --test scripts/skill-docs.test.js --test-name-pattern='hola-poke-yeoksam'; npm run ci; live MCP smoke check against https://hola-poke-yeoksam-skill.onrender.com/mcp (initialize, tools/list, get_menu, get_shop_info, invalid enter_event)
Not-tested: Successful enter_event with a real phone number (intentionally avoided to prevent live event participation)

* Keep nearby restroom lookups resilient to flaky Kakao place panels

The review caught two regressions in the new public-restroom-nearby package: a single broken Kakao panel aborted anchor resolution, and coordinate search dropped maxDistanceMeters before normalization. This change adds targeted regression coverage first, keeps per-candidate HTTP failures recoverable, and hardens request errors with explicit status/url metadata so fallback logic no longer depends on parsing error strings.

Constraint: Must preserve the published package surface and keep the fix scoped to PR #123 follow-up
Rejected: Swallow all panel errors | would hide non-HTTP failures like network faults
Rejected: Parse request error messages for status codes | brittle coupling to string formatting
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep recoverable Kakao panel handling aligned with request() error annotations if request() changes again
Tested: npm test --workspace public-restroom-nearby
Tested: npm run ci
Tested: live smoke searchNearbyPublicRestroomsByLocationQuery('광화문', { limit: 3 })
Tested: LSP diagnostics on packages/public-restroom-nearby/src/index.js and test/index.test.js
Not-tested: Live Kakao fallback against a real upstream 5xx place-panel response

* Keep the Hola Poke contract claims aligned with verified coverage

The reviewed fixture-based regression already locks the documented remote
snapshot, but the docs still implied the enter_event success path had
live proof. Narrow the docs and the regression so they explicitly say the
success fields are pinned by the recorded snapshot while the live smoke
only verifies the invalid-phone retry path.

Constraint: Live success-path verification would trigger a real event entry and is intentionally avoided
Rejected: Leave the broader wording in place | review feedback showed it overstated the live evidence
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If a safe non-mutating success-path probe becomes available, update the docs and fixture wording together
Tested: node --test scripts/skill-docs.test.js --test-name-pattern='hola-poke-yeoksam'; npm run ci; live MCP smoke against https://hola-poke-yeoksam-skill.onrender.com/mcp (initialize, tools/list, get_menu subset, get_shop_info subset, invalid enter_event)
Not-tested: Real enter_event success-path invocation

* Document the restroom distance-cap contract with regression coverage

The approved issue-117 code fix already restored maxDistanceMeters behavior, but the published docs did not lock or explain that contract. This follow-up adds a failing-first doc regression, then updates the feature guide and package README with the verified 100m example so users and future reviewers see the same behavior the package now ships.

Constraint: Must stay scoped to the existing PR #123 follow-up without reopening the implementation surface
Rejected: Leave the behavior implicit in code/tests only | published docs would lag the verified contract
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep the public-restroom-nearby docs and skill-docs regression aligned with live maxDistanceMeters smoke evidence if the sample query changes
Tested: node --test scripts/skill-docs.test.js (red then green)
Tested: npm test --workspace public-restroom-nearby
Tested: npm run ci
Tested: live smoke searchNearbyPublicRestroomsByLocationQuery('광화문', { limit: 3 })
Tested: live smoke searchNearbyPublicRestroomsByLocationQuery('광화문', { limit: 3, maxDistanceMeters: 100 })
Tested: architect review APPROVED
Not-tested: Alternative landmark queries with a non-zero maxDistanceMeters hit set

* Expose KRX partial failures instead of misreporting stock lookups

The Korean stock proxy used to silently drop failed market snapshots during
search and could turn an empty holiday trade snapshot into a 502 by falling
back into base-info lookup.

This change surfaces degraded market metadata on partial search success,
short-circuits empty trade snapshots to not_found, and refreshes the user
docs to use a real trading day in examples.

Constraint: KOSPI base-info approval is granted separately from other KRX routes
Constraint: Healthy markets should still return usable search results during a partial outage
Rejected: Return 502 on every partial search failure | hides still-usable markets and breaks current clients unnecessarily
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep degraded search metadata when any market snapshot fetch fails so partial outages stay visible
Tested: npm test --workspace k-skill-proxy
Tested: node --test scripts/skill-docs.test.js
Tested: npm run ci
Not-tested: Live KOSPI base-info behavior after the new KRX permission is approved

* Adopt kordoc for the hwp skill workflow

Issue #119 replaces the previous HWP guidance with kordoc so the skill matches the newer agent-native document flow. The docs and regression tests now center the HWP skill on kordoc parsing, JSON extraction, diffing, form filling, and Markdown-to-HWPX round-tripping, while the install/source references stay in sync.

Constraint: The repository treats skill behavior as documentation contracts backed by regression tests
Constraint: The requested branch/PR flow must target dev with TDD and verified execution evidence
Rejected: Keep @ohah/hwpjs or hwp-mcp as fallback guidance | issue #119 explicitly approves replacing the prior stack with kordoc
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future hwp skill/docs/tests aligned to a single kordoc-first contract unless a new issue explicitly reintroduces multi-backend routing
Tested: node --test scripts/skill-docs.test.js; npm run ci; temp-dir kordoc roundtrip via markdownToHwpx -> sample.hwpx -> kordoc CLI markdown output; architect review APPROVED
Not-tested: Live parsing of user-provided proprietary HWP/HWPX samples outside the generated roundtrip fixture

* Prevent degraded stock search outages from sticking in cache

Reviewer feedback showed that partial KRX market failures could be cached as full search answers, masking recovery on the next identical request. This change adds a regression that fails first, skips route-level caching for degraded search payloads, and keeps the trade-info empty-snapshot contract documented alongside the partial-failure response semantics.

Constraint: Existing PR #124 already targets dev and must remain the follow-up lane for issue #99
Constraint: Proxy behavior must stay read-only and dependency-free
Rejected: Cache degraded search payloads for a short TTL | still risks transient false negatives during the TTL window
Rejected: Broaden trade-info fallback behavior | empty snapshots should stay explicit not_found results
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep degraded search responses out of the long-lived route cache unless a future design adds explicit revalidation semantics
Tested: npm test --workspace k-skill-proxy; node --test scripts/skill-docs.test.js; npm run ci; explicit buildServer degraded-search recovery repro
Not-tested: Live KRX production endpoints from this branch

* Align HWP docs with the published kordoc surface

The issue #119 follow-up needs the repository contract to match what the
currently published kordoc package actually supports. This narrows the
HWP skill/docs/tests to the verified install requirement and supported
CLI/Node API surfaces, and removes unsupported fill/mcp claims.

Constraint: Published kordoc CLI fails at startup without pdfjs-dist
Constraint: Docs/tests must reflect the current npm package behavior, not intended future features
Rejected: Keep fill/mcp examples with caveats | still documents unsupported entrypoints
Confidence: high
Scope-risk: narrow
Directive: Reintroduce fill/mcp docs only after verifying the published package exposes them in both CLI and Node API
Tested: node --test scripts/skill-docs.test.js; npm run ci; temp-dir clean install smoke; temp-dir kordoc+pdfjs-dist watch/parse/extractFormFields/compare/markdownToHwpx/roundtrip smoke; Claude architect review
Not-tested: Real-world HWPX template that produces non-empty extractFormFields output

* Keep HWP docs runnable against the published kordoc package

The follow-up closes the last runnable-contract gaps from review by documenting the working one-shot npx form and separating Node API examples into a local project install path. The regression suite now locks both install notes so future edits do not drift back to broken command shapes.

Constraint: Published kordoc CLI still requires pdfjs-dist at startup
Constraint: Global NODE_PATH does not make ESM imports from kordoc resolvable in the documented examples
Rejected: Keep bare `npx kordoc` examples | fails in a clean environment
Rejected: Keep global-install Node API guidance | ESM import remains unresolved
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep HWP docs aligned to verified published kordoc surfaces until the package contract changes upstream
Tested: node --test scripts/skill-docs.test.js
Tested: npm run ci
Tested: temp-dir local npm install kordoc pdfjs-dist plus markdownToHwpx -> sample.hwpx -> one-shot kordoc roundtrip smoke
Not-tested: upstream unpublished kordoc features beyond the verified CLI and Node API surfaces

* Add Korean scholarship search skill and reporting workflow (#116)

* Add nationwide scholarship search skill workflow

* Rename scholarship skill to 장학금 주세요 쮜에발

* Fix scholarship skill validation in CI

* Trigger GitHub PR diff refresh after dev rebase on main

* Fix scholarship helper status handling and test coverage

* Use KST as scholarship helper default date basis

* Rename scholarship skill display name

---------

Co-authored-by: Jeffrey (Dongkyu) Kim <vkehfdl1@gmail.com>

* Feature/#121 (#127)

* Recover KakaoTalk mac skill auth when upstream user_id detection fails

Issue #121 reproduces on a real MacBook because `kakaocli auth` can fail even when the encrypted hex-named DB exists. This change adds a thin repo-owned helper that recovers the active user_id from plist revision hashes, caches the validated DB/key tuple, and reuses it for read-only `kakaocli` commands. The skill and feature docs now steer users to the helper when upstream auto-detection stops at candidate key mismatch, and regression tests lock the recovery flow before the implementation.

Constraint: Must stay a thin adapter around upstream kakaocli rather than forking the CLI
Constraint: Must verify on a real local macOS KakaoTalk install where issue #121 reproduces
Rejected: Full kakaocli reimplementation inside k-skill | too broad for the user_id/key-derivation failure scope
Rejected: Docs-only workaround | does not actually fix the broken auth path for users
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep this helper limited to auth/key recovery and read-only passthrough unless upstream gaps widen materially
Tested: python3 -m unittest scripts.test_kakaotalk_mac
Tested: node --test scripts/skill-docs.test.js
Tested: npm run ci
Tested: python3 scripts/kakaotalk_mac.py auth --refresh --max-user-id 800000000 --workers 8 --chunk-size 2000000
Tested: python3 scripts/kakaotalk_mac.py chats --limit 1 --json
Not-tested: Other kakaocli subcommands beyond auth/chats/messages/search/query/schema

* Protect the KakaoTalk helper's safe recovery path

Address the PR follow-up by treating malformed auth cache files as cache misses,
removing write-capable passthrough from the wrapper surface, and redacting
human-readable auth output so the cached SQLCipher key is not echoed back into
terminal history. The docs and regression suite now describe and enforce the
read-only contract that the helper is meant to preserve.

Constraint: Helper must remain a read-only recovery wrapper around local kakaocli access
Rejected: Keep query support with SQL validation | still leaves a risky write-capable escape hatch
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not re-expose arbitrary SQL passthrough or print the SQLCipher key in default text output
Tested: python3 -m unittest scripts.test_kakaotalk_mac; node --test scripts/skill-docs.test.js; npm run ci; python3 scripts/kakaotalk_mac.py auth --refresh --max-user-id 800000000 --workers 8 --chunk-size 2000000; python3 scripts/kakaotalk_mac.py chats --limit 1 --json; python3 scripts/kakaotalk_mac.py auth --cache-path <bad-json>; python3 scripts/kakaotalk_mac.py query --help
Not-tested: External automation consumers that depend on shell/json auth output beyond the documented helper flows

* Lock the helper CLI surface against accidental regressions

The approved issue #121 fixes already hardened the KakaoTalk Mac helper, but the test suite still only exercised the passthrough validator directly. Add an explicit parser-level regression so the public CLI contract stays read-only and `query` cannot quietly reappear in future edits.

Constraint: Follow-up is on the existing feature/#121 PR branch and must stay minimal
Rejected: Re-open helper implementation changes | current code already satisfies the approved review findings
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep parser exposure tests aligned with READ_ONLY_COMMANDS whenever helper subcommands change
Tested: python3 -m unittest scripts.test_kakaotalk_mac; node --test scripts/skill-docs.test.js; npm run ci; python3 scripts/kakaotalk_mac.py auth --refresh --max-user-id 800000000 --workers 8 --chunk-size 2000000; python3 scripts/kakaotalk_mac.py chats --limit 1 --json; python3 scripts/kakaotalk_mac.py auth --cache-path <bad-json>
Not-tested: No new production code paths changed in this follow-up

* Honor explicit Kakao auth recovery overrides

The helper now treats manual auth overrides as a cache-bypassing recovery request and rejects invalid brute-force tuning flags at the CLI boundary so users get deterministic behavior instead of stale cached tuples or Python tracebacks. Regression coverage locks both paths before the PR follow-up lands.

Constraint: The helper must remain a thin read-only wrapper around kakaocli auth recovery
Rejected: Require --refresh whenever --user-id/--uuid is passed | worse UX than honoring overrides directly
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep explicit auth overrides ahead of cache reuse unless the CLI contract is redesigned and documented
Tested: python3 -m unittest scripts.test_kakaotalk_mac; node --test scripts/skill-docs.test.js; npm run ci; python3 scripts/kakaotalk_mac.py auth --refresh --max-user-id 800000000 --workers 8 --chunk-size 2000000; python3 scripts/kakaotalk_mac.py chats --limit 1 --json; python3 scripts/kakaotalk_mac.py auth --cache-path <bad-json>; python3 scripts/kakaotalk_mac.py auth --refresh --max-user-id -1; python3 scripts/kakaotalk_mac.py auth --refresh --workers 2 --chunk-size 0 --max-user-id 10; python3 scripts/kakaotalk_mac.py auth --cache-path <temp-cache> --user-id 999; python3 scripts/kakaotalk_mac.py auth --cache-path <temp-cache> --uuid <live-uuid>
Not-tested: Manual override success with a truly alternate valid user_id/uuid pair on a multi-account local install

* Feature/#129 (#131)

* Add official KBL results support so basketball queries use live league data

Issue #129 needs a read-only skill and reusable package for KBL schedules, results, and standings. The implementation follows the existing sports package pattern and uses the league's live JSON APIs after verifying they respond successfully in real requests.

Constraint: Must use official KBL JSON surfaces before considering scraping
Constraint: Packaging changes must pass npm run ci and include docs plus Changesets updates
Rejected: Browser scraping first | official api.kbl.or.kr endpoints are live and simpler to maintain
Rejected: Reuse KBO/K League package shapes verbatim | KBL payload and team/status fields differ materially
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep seasonGrade=1 as the default KBL path unless future docs/tests explicitly widen to D-League flows
Tested: npm run ci; npm run lint --workspace kbl-results; npm test --workspace kbl-results; live getKBLSummary("2026-04-01", { team: "KCC", includeStandings: true })
Not-tested: Historical standings snapshots for past seasons via alternative KBL endpoints

* Prevent optional standings lookups from over-fetching the KBL API

The new kbl-results summary helper exposes includeStandings=false, so the
regression suite now proves that path stays schedule-only and never calls
the standings endpoint when the caller opts out.

Constraint: The KBL package should preserve the caller's no-standings contract
Rejected: Rely on manual inspection of the helper options | a targeted test is cheaper and safer
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep includeStandings=false side-effect free unless the public API contract changes explicitly
Tested: npm test --workspace kbl-results; npm run lint --workspace kbl-results
Not-tested: Full-repo CI before stacking this commit onto the rebased branch

---------

Co-authored-by: minsing-jin <ironman0722@naver.com>
2026-04-18 11:50:47 +09:00

7 KiB

name description license metadata
hwp Use kordoc for agent-native HWP/HWPX document parsing, JSON extraction, diffing, form-field extraction, and Markdown→HWPX reverse conversion. MIT
category locale phase
documents ko-KR v1

HWP

What this skill does

kordoc으로 .hwp / .hwpx / .hwpml 문서를 AI가 읽기 좋은 Markdown 또는 JSON으로 바꾸고, 필요하면 문서 비교, 양식 필드 추출, Markdown→HWPX 역변환까지 수행한다.

이 스킬의 기본 엔진은 항상 kordoc 이다. 문서 변환, 비교, 필드 추출, 역변환까지 같은 도구로 일관되게 처리한다.

When to use

  • "이 HWP 파일을 Markdown으로 바꿔줘"
  • "공문서를 JSON 구조로 뽑아서 AI가 읽게 해줘"
  • "두 버전 문서 차이점을 보고 싶어"
  • "신청서 HWPX 안에 어떤 양식 필드가 있는지 뽑아줘"
  • "AI가 만든 Markdown을 다시 HWPX로 저장해줘"
  • "폴더 안 문서를 한 번에 변환해줘"

When not to use

  • OCR이 필수인데 OCR provider 연결이 전혀 없는 이미지 기반 PDF만 있는 경우
  • .docx, .xlsx, .pdf 만 다루더라도 문서 파싱 자체가 아니라 편집기 GUI 자동화가 필요한 경우
  • 원본 프로그램의 실시간 UI 제어가 반드시 필요한 경우

Prerequisites

  • Node.js 18+
  • 출력 경로 쓰기 권한
  • kordocpdfjs-dist를 같은 전역/로컬 환경에 설치했거나, 둘 다 포함된 npx --yes --package kordoc --package pdfjs-dist kordoc ... 실행 환경
  • 현재 배포된 kordoc CLI는 시작 시 pdfjs-dist를 바로 로드하므로 PDF를 안 써도 함께 설치해야 한다

Inputs

  • 원본 .hwp, .hwpx, .hwpml 파일 경로 또는 폴더/글롭 경로
  • 원하는 결과 형태: markdown, json, hwpx
  • 출력 파일/디렉터리 경로
  • 페이지 범위 지정 여부
  • 비교 / 양식 필드 추출 / 역변환 여부

Routing policy

Default: kordoc

다음 작업은 모두 기본적으로 kordoc으로 처리한다.

  • HWP/HWPX/HWPML → Markdown
  • HWP/HWPX/HWPML → JSON (blocks, metadata)
  • 배치 변환
  • 페이지 범위 파싱
  • 이미지/표/양식이 포함된 공문서 구조 추출
  • 디렉터리 감시 변환 (watch)
  • Markdown→HWPX 역변환
  • HWPX 양식 필드 추출

Optional library path

CLI만으로 부족하면 Node API를 사용한다.

  • parse() — Markdown + 구조화 블록
  • compare() — 신구 문서 비교
  • extractFormFields() — 파싱된 블록에서 양식 필드 추출
  • markdownToHwpx() — Markdown→HWPX 역변환

Workflow

1. Prepare the CLI runtime

일회성 변환이면 둘 다 포함한 npx 형태를 바로 쓴다.

npx --yes --package kordoc --package pdfjs-dist kordoc --help

반복 실행용 전역 설치가 필요하면:

npm install -g kordoc pdfjs-dist

현재 배포된 kordoc CLI는 pdfjs-dist가 없으면 kordoc --help 단계부터 실패하므로 깨끗한 환경에서는 두 패키지를 같이 설치한 뒤 실행한다.

2. Prepare a local project for Node API examples

parse(), compare(), extractFormFields(), markdownToHwpx() 같은 ESM 예시는 전역 NODE_PATH가 아니라 로컬 프로젝트 설치 기준으로 실행한다.

mkdir -p ./kordoc-local && cd ./kordoc-local
npm init -y
npm install kordoc pdfjs-dist

이미 package.json이 있는 작업 디렉터리라면 npm install kordoc pdfjs-dist만 추가로 실행하면 된다.

3. Convert a document to Markdown

npx --yes --package kordoc --package pdfjs-dist kordoc 보고서.hwp -o 보고서.md

여러 문서를 한 번에 처리하려면:

npx --yes --package kordoc --package pdfjs-dist kordoc ./문서함/* -d ./변환결과

특정 페이지 범위만 읽고 싶으면:

npx --yes --package kordoc --package pdfjs-dist kordoc 보고서.hwp --pages 1-3

4. Extract structured JSON for AI/automation

npx --yes --package kordoc --package pdfjs-dist kordoc 검토서.hwpx --format json > 검토서.json

JSON 결과에서는 success, markdown, blocks, metadata를 우선 확인한다. 표나 이미지가 중요하면 blocks 안의 table, image 타입을 확인한다.

5. Inspect HWPX form fields from parsed blocks

node --input-type=module - <<'EOF'
import { parse, extractFormFields } from "kordoc";

const result = await parse("신청서.hwpx");
if (!result.success) {
  console.error(result.error);
  process.exit(1);
}

const fields = extractFormFields(result.blocks);
console.log(JSON.stringify(fields, null, 2));
EOF

자동 변환이 계속 들어오는 폴더면 CLI의 watch 명령을 쓴다.

npx --yes --package kordoc --package pdfjs-dist kordoc watch ./문서함

6. Reverse-convert Markdown back to HWPX

node --input-type=module - <<'EOF'
import { markdownToHwpx } from "kordoc";
import { writeFileSync } from "node:fs";

const hwpx = await markdownToHwpx("# 제목\n\n본문\n\n| 항목 | 값 |\n| --- | --- |\n| 성명 | 홍길동 |");
writeFileSync("출력.hwpx", Buffer.from(hwpx));
EOF

7. Compare two document versions when diff matters

node --input-type=module - <<'EOF'
import { compare } from "kordoc";
import { readFileSync } from "node:fs";

const before = readFileSync("이전버전.hwp");
const after = readFileSync("최신버전.hwpx");
const diff = await compare(before, after);
console.log(diff.stats);
EOF

Verify outputs after every run

  • Markdown: 파일이 생성되었고 제목/본문/표 구조가 깨지지 않았는지 확인
  • JSON: success: trueblocks / metadata 존재 여부 확인
  • 배치 처리: 입력 수와 출력 수가 크게 어긋나지 않는지 확인
  • 양식 필드 추출: extractFormFields(result.blocks) 결과가 비어 있지 않은지 확인
  • 역변환: 생성된 .hwpx 파일이 열리고 기본 서식/테이블 구조가 유지되는지 확인
  • 비교: diff.stats 에 added / removed / modified 값이 합리적인지 확인

Done when

  • 요청한 Markdown / JSON / HWPX 결과물이 생성되어 있다
  • 공문서 표·이미지·메타데이터가 필요한 수준으로 확인되어 있다
  • 양식 필드 추출이나 역변환 요청이 있었다면 결과/출력 구조까지 검증되어 있다
  • 배치 요청이면 처리 범위와 실패 건수가 정리되어 있다

Failure modes

  • 손상된 HWP/HWPX/HWPML 파일
  • 암호화/배포 제한 문서에서 일부 파싱 한계 발생
  • 이미지 기반 PDF인데 OCR provider가 없음
  • 출력 디렉터리 권한 부족
  • 양식 라벨이 템플릿 안에서 예상과 다르게 배치되어 일부 필드가 인식되지 않음

Notes

  • kordoc은 HWP/HWPX뿐 아니라 HWPML, PDF, XLSX, DOCX도 함께 다룬다.
  • 기본 목적은 AI가 읽을 수 있는 Markdown/JSON 변환 이다.
  • 현재 배포본 기준으로 문서화된 CLI 명령은 기본 변환과 watch 이며, 양식 처리는 extractFormFields() 같은 Node API로 연결한다.