mirror of
https://github.com/NomaDamas/k-skill.git
synced 2026-06-24 02:04:11 +00:00
Round 2 review flagged a latent Unicode safety bug: when replaceAll's
caseSensitive=false branch encounters characters whose toLowerCase()
changes UTF-16 length (e.g. Turkish İ U+0130 → i + U+0307 combining dot
above), offsets computed in the lowercased haystack no longer line up
with the original text: each expansion shifts every subsequent match by
its delta, and applying those offsets silently corrupts the document.
Reviewer repro: 'ABCİABCİXYZ' + case-insensitive İ→Z reported
{ok:true,count:2} but rendered 'ABCZABCİZYZ' instead of 'ABCZABCZXYZ'
(the X at index 8 was corrupted while the second İ survived).
Surface a descriptive error instead of drifting silently:
- findAllMatchOffsets: in the case-insensitive branch, verify that the
paragraph text and the query each preserve UTF-16 length under
toLowerCase; otherwise throw with an actionable message pointing the
user to --case-sensitive or input normalization.
- This is strictly a safety guard: the 2025→2026 headline workflow,
ASCII, Hangul, and every existing test are unaffected.
Tests (TDD red → green, net +4 in packages/k-skill-rhwp):
- 'replaceAll refuses case-insensitive matching when source text
contains case-folding length-changing chars (e.g. Turkish İ U+0130)'
reproduces the exact reviewer input and asserts rejection + no output
file
- 'replaceAll refuses case-insensitive matching when the query itself
contains case-folding length-changing chars' covers the query-side path
- 'replaceAll with --case-sensitive succeeds on inputs containing İ'
confirms the guard only fires in the case-insensitive path and that
case-sensitive produces ABCZABCZXYZ with no X corruption
- 'replaceAll case-insensitive still works for normal ASCII/Hangul'
regression-guards against the fix over-rejecting the common case
Doc disclosure in all 4 surfaces called out by the reviewer, plus a doc
test and a changeset note:
- rhwp-edit/SKILL.md: new failure-mode bullet naming U+0130 specifically
- docs/features/rhwp-edit.md: 'Unicode 대소문자 무시 주의' ("Unicode
case-insensitivity caution") paragraph under scenario 3 (replace-all)
- packages/k-skill-rhwp/README.md: extended Scope section
- packages/k-skill-rhwp/src/cli.js: USAGE 'Scope note' appended
- scripts/skill-docs.test.js: 2 new assertions locking the SKILL.md and
feature-doc disclosure so they can't be silently removed
- .changeset: note the guard in the pending v0.1.0 release notes
Manual QA (end-to-end via the published CLI):
$ k-skill-rhwp replace-all … --query İ --replacement Z
→ exit 1 + 'case-insensitive matching is unsafe because case folding
changes the UTF-16 length …'
→ no output file written
$ k-skill-rhwp replace-all … --query İ --replacement Z --case-sensitive
→ {ok:true,count:2}, render shows 'ABCZABCZXYZ', search İ ⇒ found:false
$ replace-all '2025'→'2026' on '2025 2025 2025' ⇒ {ok:true,count:3}
$ replace-all 'hello'→'hi' (case-insensitive) on 'hello WORLD 안녕 HELLO'
⇒ {ok:true,count:2}
Verification:
- npm test --workspace k-skill-rhwp: 35 pass / 0 fail (+4 vs Round 2)
- node --test scripts/skill-docs.test.js: 114 pass / 0 fail
- npm run ci: exit 0 (lint + typecheck + all workspace tests +
pack:dry-run + validate-skills.sh all green)
Refs PR #162 Round 2 review 'Non-blocking residual risk — Unicode
case-insensitive offset drift'.