k-skill/packages/k-skill-rhwp/src
Jeffrey (Dongkyu) Kim c563ef535b rhwp-edit (#155): guard replace-all case-insensitive path against UTF-16 length-drift
Round 2 review flagged a latent Unicode safety bug: when replaceAll's
caseSensitive=false branch encounters characters whose toLowerCase()
changes UTF-16 length (e.g. Turkish İ U+0130 → i + U+0307 combining dot
above), offsets taken in the lowercased haystack drift by the expansion
delta for every subsequent match and silently corrupt the document.
Reviewer repro: 'ABCİABCİXYZ' + case-insensitive İ→Z reported
{ok:true,count:2} but rendered 'ABCZABCİZYZ' instead of 'ABCZABCZXYZ'
(the X at index 8 was corrupted while the second İ survived).

Surface a descriptive error rather than silently drift:
- findAllMatchOffsets: in the case-insensitive branch, verify that the
  paragraph text and the query each preserve UTF-16 length under
  toLowerCase; otherwise throw with an actionable message pointing the
  user to --case-sensitive or input normalization.
- This is strictly a safety guard: the 2025→2026 headline workflow,
  ASCII, Hangul, and every existing test are unaffected.

Tests (TDD red → green, net +4 in packages/k-skill-rhwp):
- 'replaceAll refuses case-insensitive matching when source text
  contains case-folding length-changing chars (e.g. Turkish İ U+0130)'
  reproduces the exact reviewer input and asserts rejection + no output
  file
- 'replaceAll refuses case-insensitive matching when the query itself
  contains case-folding length-changing chars' covers the query-side path
- 'replaceAll with --case-sensitive succeeds on inputs containing İ'
  confirms the guard only fires in the case-insensitive path and that
  case-sensitive produces ABCZABCZXYZ with no X corruption
- 'replaceAll case-insensitive still works for normal ASCII/Hangul'
  regression-guards against the fix over-rejecting the common case

Doc disclosure in all 4 surfaces called out by the reviewer:
- rhwp-edit/SKILL.md: new failure-mode bullet naming U+0130 specifically
- docs/features/rhwp-edit.md: Unicode 대소문자 무시 주의 paragraph
  under scenario 3 (replace-all)
- packages/k-skill-rhwp/README.md: extended Scope section
- packages/k-skill-rhwp/src/cli.js: USAGE 'Scope note' appended
- scripts/skill-docs.test.js: 2 new assertions locking the SKILL.md and
  feature-doc disclosure so they can't be silently removed
- .changeset: note the guard in the pending v0.1.0 release notes

Manual QA (end-to-end via the published CLI):
  $ k-skill-rhwp replace-all … --query İ --replacement Z
  → exit 1 + 'case-insensitive matching is unsafe because case folding
    changes the UTF-16 length …'
  → no output file written
  $ k-skill-rhwp replace-all … --query İ --replacement Z --case-sensitive
  → {ok:true,count:2}, render shows 'ABCZABCZXYZ', search İ ⇒ found:false
  $ replace-all '2025'→'2026' on '2025 2025 2025' ⇒ {ok:true,count:3}
  $ replace-all 'hello'→'hi' (case-insens.) on 'hello WORLD 안녕 HELLO'
    ⇒ {ok:true,count:2}

Verification:
- npm test --workspace k-skill-rhwp: 35 pass / 0 fail (+4 vs Round 2)
- node --test scripts/skill-docs.test.js: 114 pass / 0 fail
- npm run ci: exit 0 (lint + typecheck + all workspace tests +
  pack:dry-run + validate-skills.sh all green)

Refs PR #162 Round 2 review 'Non-blocking residual risk — Unicode
case-insensitive offset drift'.
2026-04-22 15:23:23 +09:00
..
cli.js rhwp-edit (#155): guard replace-all case-insensitive path against UTF-16 length-drift 2026-04-22 15:23:23 +09:00
index.js rhwp-edit (#155): guard replace-all case-insensitive path against UTF-16 length-drift 2026-04-22 15:23:23 +09:00
wasm-init.js Add rhwp-edit and rhwp-advanced skills with k-skill-rhwp CLI 2026-04-22 12:45:13 +09:00