mirror of
https://github.com/NomaDamas/k-skill.git
synced 2026-06-24 02:04:11 +00:00
3 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c563ef535b |
rhwp-edit (#155): guard replace-all case-insensitive path against UTF-16 length-drift
Round 2 review flagged a latent Unicode safety bug: when replaceAll's
caseSensitive=false branch encounters characters whose toLowerCase()
changes UTF-16 length (e.g. Turkish İ U+0130 → i + U+0307 combining dot
above), offsets taken in the lowercased haystack drift by the expansion
delta for every subsequent match and silently corrupt the document.
Reviewer repro: 'ABCİABCİXYZ' + case-insensitive İ→Z reported
{ok:true,count:2} but rendered 'ABCZABCİZYZ' instead of 'ABCZABCZXYZ'
(the X at index 8 was corrupted while the second İ survived).
Surface a descriptive error rather than silently drift:
- findAllMatchOffsets: in the case-insensitive branch, verify that the
paragraph text and the query each preserve UTF-16 length under
toLowerCase; otherwise throw with an actionable message pointing the
user to --case-sensitive or input normalization.
- This is strictly a safety guard: the 2025→2026 headline workflow,
ASCII, Hangul, and every existing test are unaffected.
Tests (TDD red → green, net +4 in packages/k-skill-rhwp):
- 'replaceAll refuses case-insensitive matching when source text
contains case-folding length-changing chars (e.g. Turkish İ U+0130)'
reproduces the exact reviewer input and asserts rejection + no output
file
- 'replaceAll refuses case-insensitive matching when the query itself
contains case-folding length-changing chars' covers the query-side path
- 'replaceAll with --case-sensitive succeeds on inputs containing İ'
confirms the guard only fires in the case-insensitive path and that
case-sensitive produces ABCZABCZXYZ with no X corruption
- 'replaceAll case-insensitive still works for normal ASCII/Hangul'
regression-guards against the fix over-rejecting the common case
Doc disclosure in all 4 surfaces called out by the reviewer:
- rhwp-edit/SKILL.md: new failure-mode bullet naming U+0130 specifically
- docs/features/rhwp-edit.md: Unicode 대소문자 무시 주의 paragraph
under scenario 3 (replace-all)
- packages/k-skill-rhwp/README.md: extended Scope section
- packages/k-skill-rhwp/src/cli.js: USAGE 'Scope note' appended
- scripts/skill-docs.test.js: 2 new assertions locking the SKILL.md and
feature-doc disclosure so they can't be silently removed
- .changeset: note the guard in the pending v0.1.0 release notes
Manual QA (end-to-end via the published CLI):
$ k-skill-rhwp replace-all … --query İ --replacement Z
→ exit 1 + 'case-insensitive matching is unsafe because case folding
changes the UTF-16 length …'
→ no output file written
$ k-skill-rhwp replace-all … --query İ --replacement Z --case-sensitive
→ {ok:true,count:2}, render shows 'ABCZABCZXYZ', search İ ⇒ found:false
$ replace-all '2025'→'2026' on '2025 2025 2025' ⇒ {ok:true,count:3}
$ replace-all 'hello'→'hi' (case-insens.) on 'hello WORLD 안녕 HELLO'
⇒ {ok:true,count:2}
Verification:
- npm test --workspace k-skill-rhwp: 35 pass / 0 fail (+4 vs Round 2)
- node --test scripts/skill-docs.test.js: 114 pass / 0 fail
- npm run ci: exit 0 (lint + typecheck + all workspace tests +
pack:dry-run + validate-skills.sh all green)
Refs PR #162 Round 2 review 'Non-blocking residual risk — Unicode
case-insensitive offset drift'.
|
||
|
|
6dbdeb1912 |
rhwp-edit (#155): fix replace-all silent no-op and document body-only scope
Upstream @rhwp/core HwpDocument.replaceAll returns {ok:true, count:N} but
does not persist the mutation into exportHwp() serialization, so the output
bytes are byte-identical to the input. This is confirmed against
@rhwp/core@0.7.3 with SHA diffing and round-trip searchText.
Rewrite the Node wrapper replaceAll to compose engine primitives that do
persist: for each body paragraph, read the full text via getTextRange,
compute all non-overlapping match offsets in JS, then apply replaceText
right-to-left so earlier offsets are unaffected by length changes. This
restores the documented '2025 → 2026 일괄 치환' headline workflow.
Guard rails in the new replaceAll:
- Reject replacements containing newline or paragraph-break characters
(\n, \r, U+2028, U+2029) with a descriptive error. Splitting a paragraph
via replaceText would invalidate subsequent offsets.
- Non-overlapping semantics against the original text, so
--query a --replacement aa against 'aaa' yields 'aaaaaa' (3 replacements)
instead of looping on the freshly inserted 'a' characters.
Tighten the regression tests to assert content, not just length:
- Same-length replacement: output SHA must differ from input, searchText
must find the replacement and must NOT find the original query.
- Longer-length replacement: paragraph length must grow by the correct
amount and output SHA must differ.
- Shorter-length replacement: paragraph length must shrink by the correct
amount and output SHA must differ.
- Empty replacement: deletes every match and output no longer contains
the query.
- Replacement contains query (a→aa on aaa): expects count 3 and length 6.
- Zero matches: count 0, output still written.
- Case-sensitive flag skips mismatched case.
- Newline replacement is rejected synchronously.
Document the body-only scope of search and replace-all in the SKILL.md
routing policy, failure-modes, CLI USAGE text, feature doc, and package
README so users know to use set-cell-text for cell content. This matches
the upstream searchText contract, which does not descend into table cells,
headers, footers, or footnotes.
Add a matching regression assertion to scripts/skill-docs.test.js so the
body-only scope note cannot be silently removed from SKILL.md or the
feature doc.
Closes review round 1 for PR #162.
|
||
|
|
dadc5f4ffa |
Add rhwp-edit and rhwp-advanced skills with k-skill-rhwp CLI
Splits HWP handling into three focused skills per issue #155: - hwp (kept): kordoc-based read/convert (Markdown, JSON, diffing, form fields, Markdown->HWPX). Description narrowed to 'read-only' to make the routing policy explicit. - rhwp-edit (new): HWP binary editing via new k-skill-rhwp npm package that wraps the @rhwp/core WASM bindings as CLI subcommands: info, list-paragraphs, search, insert-text, delete-text, replace-all, create-table, set-cell-text, create-blank, and render. - rhwp-advanced (new): guidance for the upstream Rust rhwp CLI (export-svg --debug-overlay, dump, dump-pages, ir-diff, thumbnail, convert) for layout debugging, IR inspection, version comparison, and read-only-document unlocking. The new k-skill-rhwp package under packages/ ships a Node.js 18+ CLI and library that round-trips HWP 5.x documents entirely in-process; no Rust toolchain is required. It auto-installs the WASM-required globalThis.measureTextWidth shim for headless Node, and all editing subcommands always write to a distinct output path so the source file is never mutated. HWPX save remains disabled per the upstream rhwp #196 data-safety gate; HWPX input is accepted but output is written as HWP 5.x. Includes 24 node:test cases covering init, round-trip insertText, replaceAll, createTable + setCellText, deleteText, searchText, listParagraphs, renderPage (SVG/HTML), and full CLI arg-parse + end-to-end round-trip through the CLI layer. Wires README feature table (3 rows for hwp / rhwp-edit / rhwp-advanced), docs/install.md optional-install list, docs/roadmap.md (marks HWP advanced editing as shipped while keeping Windows/security-module automation out of scope), docs/sources.md (adds rhwp upstream, CLI source, @rhwp/core, @rhwp/editor, and rhwp #196 references), and the root pack:dry-run script. Adds a Changesets entry for k-skill-rhwp minor. Closes #155. |