Round 2 review flagged a latent Unicode safety bug: when replaceAll's
caseSensitive=false branch encounters characters whose toLowerCase()
changes UTF-16 length (e.g. Turkish İ U+0130 → i + U+0307 combining dot
above), offsets taken in the lowercased haystack drift by the expansion
delta for every subsequent match and silently corrupt the document.
Reviewer repro: 'ABCİABCİXYZ' + case-insensitive İ→Z reported
{ok:true,count:2} but rendered 'ABCZABCİZYZ' instead of 'ABCZABCZXYZ'
(the X at index 8 was corrupted while the second İ survived).
Surface a descriptive error rather than silently drift:
- findAllMatchOffsets: in the case-insensitive branch, verify that the
paragraph text and the query each preserve UTF-16 length under
toLowerCase; otherwise throw with an actionable message pointing the
user to --case-sensitive or input normalization.
- This is strictly a safety guard: the 2025→2026 headline workflow,
ASCII, Hangul, and every existing test are unaffected.
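A minimal sketch of the guard described above, not the shipped code. The helper names preservesLengthUnderLowerCase and assertCaseFoldSafe are hypothetical, and the error text is abbreviated from the Manual QA output; the real findAllMatchOffsets may structure the check differently.

```javascript
// Hypothetical helper: true when lowercasing keeps the UTF-16 length intact.
function preservesLengthUnderLowerCase(text) {
  return text.toLowerCase().length === text.length;
}

// Hypothetical guard for the case-insensitive branch: reject before any
// offsets are computed, so no output can be produced from drifted offsets.
function assertCaseFoldSafe(paragraphText, query) {
  const safe =
    preservesLengthUnderLowerCase(paragraphText) &&
    preservesLengthUnderLowerCase(query);
  if (!safe) {
    throw new Error(
      "case-insensitive matching is unsafe because case folding changes " +
        "the UTF-16 length; rerun with --case-sensitive or normalize the input"
    );
  }
}

// ASCII and Hangul keep their length, so the common path passes untouched.
assertCaseFoldSafe("hello WORLD 안녕 HELLO", "hello");
```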
Tests (TDD red → green, net +4 in packages/k-skill-rhwp):
- 'replaceAll refuses case-insensitive matching when source text
contains case-folding length-changing chars (e.g. Turkish İ U+0130)'
reproduces the exact reviewer input and asserts rejection + no output
file
- 'replaceAll refuses case-insensitive matching when the query itself
contains case-folding length-changing chars' covers the query-side path
- 'replaceAll with --case-sensitive succeeds on inputs containing İ'
confirms the guard only fires in the case-insensitive path and that
case-sensitive produces ABCZABCZXYZ with no X corruption
- 'replaceAll case-insensitive still works for normal ASCII/Hangul'
regression-guards against the fix over-rejecting the common case
Doc disclosure in all 4 surfaces called out by the reviewer, plus a
locking test and a changeset note:
- rhwp-edit/SKILL.md: new failure-mode bullet naming U+0130 specifically
- docs/features/rhwp-edit.md: 'Unicode 대소문자 무시 주의' (Unicode
case-insensitivity caution) paragraph under scenario 3 (replace-all)
- packages/k-skill-rhwp/README.md: extended Scope section
- packages/k-skill-rhwp/src/cli.js: USAGE 'Scope note' appended
- scripts/skill-docs.test.js: 2 new assertions locking the SKILL.md and
feature-doc disclosure so they can't be silently removed
- .changeset: note the guard in the pending v0.1.0 release notes
Manual QA (end-to-end via the published CLI):
$ k-skill-rhwp replace-all … --query İ --replacement Z
→ exit 1 + 'case-insensitive matching is unsafe because case folding
changes the UTF-16 length …'
→ no output file written
$ k-skill-rhwp replace-all … --query İ --replacement Z --case-sensitive
→ {ok:true,count:2}, render shows 'ABCZABCZXYZ', search İ ⇒ found:false
$ replace-all '2025'→'2026' on '2025 2025 2025' ⇒ {ok:true,count:3}
$ replace-all 'hello'→'hi' (case-insens.) on 'hello WORLD 안녕 HELLO'
⇒ {ok:true,count:2}
Verification:
- npm test --workspace k-skill-rhwp: 35 pass / 0 fail (+4 vs Round 2)
- node --test scripts/skill-docs.test.js: 114 pass / 0 fail
- npm run ci: exit 0 (lint + typecheck + all workspace tests +
pack:dry-run + validate-skills.sh all green)
Refs PR #162 Round 2 review 'Non-blocking residual risk — Unicode
case-insensitive offset drift'.
k-skill-rhwp
Node-side HWP editing CLI that wraps @rhwp/core
(Rust + WebAssembly, MIT, by Edward Kim) as subcommands.
- Ships the k-skill-rhwp binary for the rhwp-edit skill in NomaDamas/k-skill.
- Round-trip safe HWP 5.x editing: insert/delete text, replace-all, create tables, set cell text, and render pages to SVG or HTML.
- Node 18+ only. No Rust toolchain required; the shipped WASM does the work.
For debugging the upstream rhwp Rust CLI (export-svg --debug-overlay,
dump, ir-diff, thumbnail, convert), see the rhwp-advanced skill —
this package does not wrap those commands.
For .hwp → Markdown / JSON / form-field extraction, see the hwp skill
(kordoc-based). This package is editing-only.
Install
npm install k-skill-rhwp
# or run one-off
npx --yes k-skill-rhwp --help
CLI
# Metadata / structure
k-skill-rhwp info <input.hwp>
k-skill-rhwp list-paragraphs <input.hwp> [--section N]
k-skill-rhwp search <input.hwp> --query TEXT [--from-section N] [--from-paragraph N] [--from-char N] [--case-sensitive]
# Body editing
k-skill-rhwp insert-text <input> <output> --section N --paragraph N --offset N --text TEXT
k-skill-rhwp delete-text <input> <output> --section N --paragraph N --offset N --count N
k-skill-rhwp replace-all <input> <output> --query TEXT --replacement TEXT [--case-sensitive]
# Tables
k-skill-rhwp create-table <input> <output> --section N --paragraph N --offset N --rows N --cols N
k-skill-rhwp set-cell-text <input> <output> --section N --parent-paragraph N --control N --cell N --text TEXT [--cell-paragraph N] [--no-replace]
# Rendering / creation
k-skill-rhwp create-blank <output.hwp>
k-skill-rhwp render <input.hwp> [--page N] [--format svg|html]
Every editing subcommand writes a brand-new HWP file (never overwrites the
input) and prints a JSON summary including ok, post-edit cursor position,
bytesWritten, and the resolved outputPath.
Scope of search and replace-all
Both search and replace-all operate on body paragraphs only. Text
inside table cells, headers/footers, or footnotes is not scanned. This
mirrors the upstream @rhwp/core searchText scope. For cell text, use
info or list-paragraphs to locate the table and then set-cell-text to
write. replace-all also rejects any --replacement that contains newline
or paragraph-break characters (\n, \r, U+2028, U+2029) because they would
split a paragraph — split those into multiple insert-text calls instead.
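As a caller-side illustration of the newline rule, the sketch below checks a replacement string before invoking replace-all and splits multi-paragraph text into segments that could each go to its own insert-text call. The helper names are hypothetical and this is not the package's own validation code; only the four rejected characters come from the description above.

```javascript
// The four characters replace-all rejects in --replacement.
const PARAGRAPH_BREAK = /[\n\r\u2028\u2029]/;

// Hypothetical pre-check: would replace-all refuse this replacement?
function containsParagraphBreak(text) {
  return PARAGRAPH_BREAK.test(text);
}

// Hypothetical splitter: one segment per intended paragraph. CRLF is
// treated as a single break so it does not yield an empty segment.
function splitOnParagraphBreaks(text) {
  return text.split(/\r\n|[\n\r\u2028\u2029]/);
}

console.log(containsParagraphBreak("2026"));
console.log(splitOnParagraphBreaks("first line\r\nsecond line"));
```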
replace-all uses non-overlapping replacement semantics: matches are
computed against the original text before any replacement runs, so
--query a --replacement aa against aaa replaces the three original a
characters and yields aaaaaa. Replacement output is never rescanned, so
there is no infinite loop.
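The non-overlapping semantics can be sketched on a plain string as follows. This is an illustration only, with hypothetical function names; the real command works on HWP paragraphs through @rhwp/core rather than on JavaScript strings.

```javascript
// Collect match offsets against the ORIGINAL text only, skipping past each
// match so that matches never overlap.
function findNonOverlappingOffsets(text, query) {
  const offsets = [];
  if (query.length === 0) return offsets; // an empty query would never advance
  let from = 0;
  let at;
  while ((at = text.indexOf(query, from)) !== -1) {
    offsets.push(at);
    from = at + query.length;
  }
  return offsets;
}

// Apply replacements right to left so earlier offsets stay valid even when
// the replacement is longer or shorter than the query.
function replaceAllNonOverlapping(text, query, replacement) {
  const offsets = findNonOverlappingOffsets(text, query);
  let result = text;
  for (let i = offsets.length - 1; i >= 0; i--) {
    const at = offsets[i];
    result = result.slice(0, at) + replacement + result.slice(at + query.length);
  }
  return { count: offsets.length, text: result };
}

console.log(replaceAllNonOverlapping("aaa", "a", "aa")); // { count: 3, text: 'aaaaaa' }
```

Because the offsets are fixed before the first replacement, the newly inserted text is never matched again, which is what keeps a → aa from looping.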
Case-insensitive matching (the default) relies on String.prototype.toLowerCase()
preserving UTF-16 length so offsets taken in the lowercased haystack still apply
to the original text. A handful of Unicode characters (notably Turkish İ
U+0130, which lowercases to i + combining dot above U+0307) violate that
invariant. When either the query or a paragraph contains such a character,
replace-all refuses the operation with exit code 1 and a message beginning
'case-insensitive matching is unsafe because case folding changes the UTF-16 length'
rather than silently drifting every subsequent offset. Rerun with
--case-sensitive, or normalize the input. ASCII, Hangul, and the common HWP
use cases (e.g. 2025 → 2026) are not affected.
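To see why the invariant matters, the sketch below reproduces the drift on a plain string, without the package. It assumes an unguarded implementation that finds offsets in the lowercased text and then replaces query.length code units of the original at each offset; the helper names are hypothetical.

```javascript
// Offsets of every non-overlapping occurrence of needle in haystack.
function offsetsOf(haystack, needle) {
  const found = [];
  if (needle.length === 0) return found;
  let from = 0;
  let at;
  while ((at = haystack.indexOf(needle, from)) !== -1) {
    found.push(at);
    from = at + needle.length;
  }
  return found;
}

// Replace `length` code units at each offset, right to left.
function replaceAt(text, offsets, length, replacement) {
  let out = text;
  for (let i = offsets.length - 1; i >= 0; i--) {
    out = out.slice(0, offsets[i]) + replacement + out.slice(offsets[i] + length);
  }
  return out;
}

const original = "ABC\u0130ABC\u0130XYZ"; // 'ABCİABCİXYZ', 11 UTF-16 code units
const query = "\u0130"; // 'İ'

// Lowercasing turns each İ into i + U+0307, so the haystack grows to 13 units.
const driftedOffsets = offsetsOf(original.toLowerCase(), query.toLowerCase());
const trueOffsets = offsetsOf(original, query);

console.log(driftedOffsets, trueOffsets); // [ 3, 8 ] [ 3, 7 ]
console.log(replaceAt(original, driftedOffsets, query.length, "Z")); // ABCZABCİZYZ
console.log(replaceAt(original, trueOffsets, query.length, "Z")); // ABCZABCZXYZ
```

The second drifted offset lands on X (index 8) instead of the second İ (index 7), which is the kind of silent corruption the guard prevents.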
Node API
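// createTable and setCellText are exported too; they are unused in this example.
const { insertText, createTable, setCellText, getDocumentInfo } = require("k-skill-rhwp");

// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
  await insertText({
    input: "./draft.hwp",
    output: "./draft-with-title.hwp",
    section: 0,
    paragraph: 0,
    offset: 0,
    text: "2026년 신청서"
  });
  console.log(await getDocumentInfo("./draft-with-title.hwp"));
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});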
The first call loads @rhwp/core WASM once per process. The WASM requires a
globalThis.measureTextWidth(font, text) callback for text layout; this
package auto-installs a deterministic approximation shim on first use so it
works headless on Node without canvas. Replace the shim before the first
call if you need pixel-accurate metrics.
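A rough sketch of installing a custom callback before the first call. Only the globalThis.measureTextWidth(font, text) signature comes from the description above. The format of the font argument (treated here as a CSS-like string such as "12px serif"), the width heuristic, and the expectation that the package leaves an already-installed callback in place are all assumptions. The body is a placeholder approximation; for pixel-accurate metrics it would delegate to a real measurer, for example a canvas 2D context's measureText.

```javascript
// Placeholder measurer. ASSUMPTION: `font` looks like "12px serif"; when no
// px size can be parsed, fall back to 10.
function customMeasureTextWidth(font, text) {
  const match = /(\d+(?:\.\d+)?)px/.exec(String(font));
  const sizePx = match ? Number(match[1]) : 10;
  let width = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Treat Hangul syllables and common CJK ranges as full-width glyphs,
    // everything else as half-width. This is a heuristic, not real metrics.
    const isWide =
      (cp >= 0x1100 && cp <= 0x11ff) ||
      (cp >= 0x2e80 && cp <= 0xa4cf) ||
      (cp >= 0xac00 && cp <= 0xd7a3) ||
      (cp >= 0xf900 && cp <= 0xfaff);
    width += isWide ? sizePx : sizePx * 0.5;
  }
  return width;
}

// Install before the first k-skill-rhwp call in the process.
globalThis.measureTextWidth = customMeasureTextWidth;
```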
Known limitations
- HWPX round-trip is disabled upstream (rhwp #196). HWPX input is accepted, but output is always written as HWP 5.x binary.
- rhwp v0.7.x is beta. Complex tables, images, charts, or form fields may occasionally lose fidelity on round-trip; verify with info and visual render after non-trivial edits.
- Windows security modules, Hancom GUI automation, read-only distribution documents beyond rhwp convert are out of scope.
Upstream references
- rhwp (Rust): https://github.com/edwardkim/rhwp
- @rhwp/core (npm): https://www.npmjs.com/package/@rhwp/core
- k-skill repo: https://github.com/NomaDamas/k-skill
License
MIT