Align cleaner helper contract with selective installs

The cleanup helper is now shipped inside the k-skill-cleaner skill payload while the repo-root script remains as a compatibility wrapper. The CLI also enforces the interview-selected usage window with --days/--since and reports the effective cutoff so recommendations match the documented contract.

Constraint: Selective skill installs only receive files under the skill directory.
Rejected: Remove k-skill-cleaner from selective install docs | the feature is intended to be installable standalone.
Confidence: high
Scope-risk: narrow
Directive: Keep the skill-local helper as the canonical implementation; the root script should stay a thin wrapper.
Tested: PYTHONPATH=scripts python3 -m unittest scripts.test_k_skill_cleaner
Tested: node --test scripts/skill-docs.test.js
Tested: CLI smoke via k-skill-cleaner/scripts/k_skill_cleaner.py with --days 90
Tested: npm run lint
Tested: npm run typecheck && npm test
Tested: npm run ci
Tested: Architect verification APPROVED
Not-tested: Live agent transcript schemas beyond fixture-style local log samples
Jeffrey (Dongkyu) Kim 2026-04-28 17:50:54 +09:00
commit 10cb419212
7 changed files with 454 additions and 259 deletions

@ -5,9 +5,10 @@
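## Basic flow
1. Start with an interview to confirm which skills to keep, which skills are never used, which agents are mainly used, and the analysis window.
2. Run `python3 scripts/k_skill_cleaner.py` to find root-level `SKILL.md` directories and scan the user-provided usage JSON or local logs.
3. Read `candidates` from the result JSON and separate `remove` from `review`.
4. Delete only after the recommendation, and only when the user has explicitly approved.
2. In a standalone skill install, run `python3 scripts/k_skill_cleaner.py` from inside the `k-skill-cleaner` skill directory. In a full repository checkout, use `python3 k-skill-cleaner/scripts/k_skill_cleaner.py` or the compatibility wrapper `python3 scripts/k_skill_cleaner.py`.
3. The helper finds root-level `SKILL.md` directories and scans the user-provided usage JSON or local logs.
4. Read `candidates` from the result JSON and separate `remove` from `review`.
5. Delete only after the recommendation, and only when the user has explicitly approved.
## How to check trigger counts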
@ -25,8 +26,9 @@
python3 scripts/k_skill_cleaner.py \
--skills-root . \
--scan-default-logs \
--days 90 \
--never-use blue-ribbon-nearby,lotto-results \
--keep k-skill-setup,k-skill-cleaner
```
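The output is a JSON report; no files are deleted. Do not delete entries whose only reasons are `zero_triggers` or `low_usage`; keep them as review candidates. Report entries that include `interview_never_use` as removal candidates whose intent the user has confirmed.
`--days 90` counts only the most recent 90-day window. Log lines without a timestamp are included or excluded by the file mtime. The output is a JSON report; no files are deleted. Do not delete entries whose only reasons are `zero_triggers` or `low_usage`; keep them as review candidates. Report entries that include `interview_never_use` as removal candidates whose intent the user has confirmed.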
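
The window rule can be sketched in a few lines. This is a minimal illustration, not the helper's own code: `in_window` and its parameters are hypothetical names, and the dates are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def in_window(line_time, file_mtime, cutoff):
    """Return True when a log line counts toward the usage window.

    line_time is None for lines without a parseable timestamp; in that
    case the file's modification time decides (hypothetical helper).
    """
    effective = line_time if line_time is not None else file_mtime
    return effective >= cutoff


# Assumed "now" for the example; a real run would use the current time.
now = datetime(2026, 4, 28, tzinfo=timezone.utc)
cutoff = now - timedelta(days=90)  # what --days 90 would mean on that date

recent = datetime(2026, 4, 20, tzinfo=timezone.utc)
stale = datetime(2026, 1, 10, tzinfo=timezone.utc)

print(in_window(recent, stale, cutoff))  # timestamped line inside the window
print(in_window(None, stale, cutoff))    # untimestamped line in an old file
```

A timestamped line is judged by its own time even when the file is old, so only lines with no usable timestamp depend on the mtime.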

@ -21,7 +21,7 @@ Ask a compact interview before scanning or recommending deletion:
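1. Which agents do you mainly use? (Claude Code, Codex, OpenCode, OpenClaw/ClawHub, Hermes Agent, other)
2. Which skills must never be deleted?
3. Which skills are you certain you never use?
4. Which window of usage traces should be checked first: the last 30, 90, or 180 days?
4. Which window of usage traces should be checked first: the last 30, 90, or 180 days? Apply the answer with `--days` or `--since` when running the helper.
5. Do you want recommendations only, or actual deletion after approval?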
## Trigger count sources by agent
@ -36,12 +36,13 @@ Ask a compact interview before scanning or recommending deletion:
## Local helper
From the repository root, run the deterministic helper to combine interview answers and local logs:
From an installed standalone skill, run the deterministic helper from the `k-skill-cleaner` skill directory. In a full repository checkout, the compatibility wrapper at `scripts/k_skill_cleaner.py` accepts the same options.
```bash
python3 scripts/k_skill_cleaner.py \
--skills-root . \
--scan-default-logs \
--days 90 \
--never-use blue-ribbon-nearby,lotto-results \
--keep k-skill-setup,k-skill-cleaner
```
@ -49,7 +50,7 @@ python3 scripts/k_skill_cleaner.py \
For agent exports or hand-curated counts, pass a JSON object mapping skill name to trigger count:
```bash
python3 scripts/k_skill_cleaner.py --skills-root . --usage-json usage-counts.json
python3 scripts/k_skill_cleaner.py --skills-root . --usage-json usage-counts.json --days 90
```
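
The usage file is a flat JSON object keyed by skill name. Below is a minimal sketch of producing one, assuming made-up counts; the file is written to a temporary directory only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical hand-curated counts; keys are root-level skill directory names.
usage_counts = {
    "k-skill-setup": 12,
    "blue-ribbon-nearby": 0,
    "lotto-results": 1,
}

with tempfile.TemporaryDirectory() as tmp:
    usage_path = Path(tmp) / "usage-counts.json"
    # One JSON object whose values can be read as integers.
    usage_path.write_text(json.dumps(usage_counts, indent=2), encoding="utf-8")
    loaded = json.loads(usage_path.read_text(encoding="utf-8"))

print(loaded)
```

Note that counts supplied this way are taken as given: in the helper shown in this commit, the `--days`/`--since` window filters scanned log lines, not the imported totals.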
The helper prints JSON with:
@ -57,6 +58,7 @@ The helper prints JSON with:
- `skill_count`: number of root-level skills discovered.
- `candidates`: ranked `remove` or `review` candidates with `trigger_count` and `reasons`.
- `agent_usage_sources`: the agent-specific paths and caveats above.
- `time_window`: the effective `--since`/`--days` cutoff and mtime fallback caveat.
- `safety`: reminder that no files were deleted.
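
A consumer of the report might split the candidates by action as sketched below. The report text is a hypothetical sample shaped like the fields listed above, not real output, and the skill names and values in it are illustrative.

```python
import json

# Hypothetical report text; real output comes from the helper's stdout.
report_text = """
{
  "skill_count": 3,
  "candidates": [
    {"skill": "lotto-results", "action": "remove", "trigger_count": 0,
     "score": 150, "reasons": ["interview_never_use", "zero_triggers"]},
    {"skill": "kbo-results", "action": "review", "trigger_count": 1,
     "score": 20, "reasons": ["low_usage"]}
  ],
  "time_window": {"since": "2026-01-28T00:00:00+00:00", "days": 90},
  "safety": "No files were deleted."
}
"""

report = json.loads(report_text)

# Keep confirmed removals apart from entries that still need a human look.
remove = [c["skill"] for c in report["candidates"] if c["action"] == "remove"]
review = [c["skill"] for c in report["candidates"] if c["action"] == "review"]

print("remove:", remove)
print("review:", review)
```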
## Recommendation policy

@ -0,0 +1,367 @@
#!/usr/bin/env python3
"""Utilities for the k-skill-cleaner skill.
The helper intentionally stays dependency-free: it scans root-level skill
folders, best-effort local agent logs, and optional interview choices to produce
a conservative cleanup shortlist. It never deletes files by itself.
"""
from __future__ import annotations
import argparse
import json
import os
import re
from collections.abc import Iterable, Mapping
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any
EXCLUDED_ROOT_DIRS = {
".changeset",
".claude",
".codex",
".cursor",
".git",
".github",
".omx",
".ouroboros",
".vscode",
"docs",
"examples",
"node_modules",
"packages",
"python-packages",
"scripts",
}
AGENT_USAGE_SOURCES = [
{
"agent": "Claude Code",
"paths": ["~/.claude/projects/**/*.jsonl", "~/.claude/transcripts/**/*.jsonl"],
"method": "Scan JSONL transcript lines for skill-trigger events, $skill mentions, and SKILL.md load markers.",
"confidence": "best-effort",
},
{
"agent": "Codex",
"paths": ["~/.codex/sessions/**/*.jsonl", "~/.codex/log/**/*.log", ".omx/logs/**/*.log"],
"method": "Scan Codex session/log lines for routed skill names, $skill invocations, and SKILL.md reads.",
"confidence": "best-effort",
},
{
"agent": "OpenCode",
"paths": ["~/.local/share/opencode/**/*.jsonl", "~/.config/opencode/**/*.jsonl"],
"method": "Scan OpenCode data/config logs when available; ask for an exported transcript otherwise.",
"confidence": "best-effort",
},
{
"agent": "OpenClaw/ClawHub",
"paths": ["~/.openclaw/**/*.jsonl", "~/.clawhub/**/*.jsonl"],
"method": "No stable public trigger-count schema is assumed; use local logs if present or imported JSON counts.",
"confidence": "manual-confirm",
"fallback": "Ask the user to export trigger stats or provide a usage JSON file.",
},
{
"agent": "Hermes Agent",
"paths": ["~/.hermes/**/*.jsonl", "~/.config/hermes/**/*.jsonl"],
"method": "No stable public trigger-count schema is assumed; use local logs if present or imported JSON counts.",
"confidence": "manual-confirm",
"fallback": "Ask the user to export trigger stats or provide a usage JSON file.",
},
]
def find_skill_dirs(root: Path | str) -> list[str]:
    """Return root-level directories that look like installable skills."""
    root_path = Path(root)
    skills: list[str] = []
    for child in root_path.iterdir():
        if not child.is_dir() or child.name in EXCLUDED_ROOT_DIRS:
            continue
        if (child / "SKILL.md").is_file():
            skills.append(child.name)
    return sorted(skills)


def _walk_strings(value: Any, key_hint: str | None = None) -> Iterable[tuple[str | None, str]]:
    if isinstance(value, str):
        yield key_hint, value
    elif isinstance(value, Mapping):
        for key, child in value.items():
            yield from _walk_strings(child, str(key))
    elif isinstance(value, list):
        for child in value:
            yield from _walk_strings(child, key_hint)


def _line_mentions_skill(line: str, skill: str) -> bool:
    escaped = re.escape(skill)
    patterns = [
        rf"(?<![\w-])\${escaped}(?![\w-])",
        rf"(?i)\bskill(?:[_ -]?name|[_ -]?id)?\s*[:=]\s*['\"]?{escaped}(?![\w-])",
        rf"(?<![\w-]){escaped}/SKILL\.md\b",
        rf"(?i)\bloaded skill\s*[:=]?\s*['\"]?{escaped}(?![\w-])",
        rf"(?i)\busing\s+\${escaped}(?![\w-])",
    ]
    return any(re.search(pattern, line) for pattern in patterns)


def _json_mentions_skill(record: Any, skill: str) -> bool:
    key_names = {"skill", "skillname", "skill_name", "skillid", "skill_id", "name"}
    for key, value in _walk_strings(record):
        normalized_key = (key or "").replace("-", "").replace("_", "").lower()
        if normalized_key in key_names and value == skill:
            return True
        if _line_mentions_skill(value, skill):
            return True
    return False


def _parse_datetime(value: str | datetime | None) -> datetime | None:
    if value is None or isinstance(value, datetime):
        parsed = value
    else:
        raw = value.strip()
        if not raw:
            return None
        if raw.endswith("Z"):
            raw = f"{raw[:-1]}+00:00"
        try:
            parsed = datetime.fromisoformat(raw)
        except ValueError:
            try:
                parsed = datetime.fromisoformat(f"{raw}T00:00:00")
            except ValueError as exc:
                raise ValueError("since must be an ISO date or datetime") from exc
    if parsed is None:
        return None
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def _line_datetime_from_json(record: Any) -> datetime | None:
    timestamp_keys = {"timestamp", "time", "created_at", "createdat", "date", "datetime", "ts"}
    if not isinstance(record, Mapping):
        return None
    for key, value in record.items():
        normalized_key = str(key).replace("-", "").replace("_", "").lower()
        if normalized_key in timestamp_keys and isinstance(value, str):
            try:
                return _parse_datetime(value)
            except ValueError:
                return None
    return None


def _line_datetime_from_text(line: str) -> datetime | None:
    match = re.search(r"\b\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)?\b", line)
    if not match:
        return None
    raw = match.group(0)
    if "T" not in raw and " " not in raw:
        raw = f"{raw}T00:00:00"
    if re.search(r"[+-]\d{4}$", raw):
        raw = f"{raw[:-2]}:{raw[-2:]}"
    try:
        return _parse_datetime(raw.replace(" ", "T", 1))
    except ValueError:
        return None


def _mtime_datetime(path: Path) -> datetime:
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def _line_is_in_window(path: Path, line: str, parsed: Any | None, since: datetime | None) -> bool:
    if since is None:
        return True
    line_dt = _line_datetime_from_json(parsed) if parsed is not None else None
    if line_dt is None:
        line_dt = _line_datetime_from_text(line)
    if line_dt is None:
        line_dt = _mtime_datetime(path)
    return line_dt >= since


def collect_skill_usage(
    log_paths: Iterable[Path | str],
    skill_names: Iterable[str],
    since: str | datetime | None = None,
) -> dict[str, int]:
    """Best-effort count of skill trigger mentions across local agent logs.

    When ``since`` is provided, timestamped records older than the cutoff are
    skipped. Lines without parseable timestamps fall back to the log file mtime,
    which keeps the selected interview window enforceable even for mixed log
    formats.
    """
    since_dt = _parse_datetime(since)
    skills = sorted(set(skill_names))
    counts = {skill: 0 for skill in skills}
    for raw_path in log_paths:
        path = Path(raw_path).expanduser()
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
            parsed: Any | None = None
            try:
                parsed = json.loads(line)
            except json.JSONDecodeError:
                parsed = None
            if not _line_is_in_window(path, line, parsed, since_dt):
                continue
            for skill in skills:
                if (parsed is not None and _json_mentions_skill(parsed, skill)) or _line_mentions_skill(line, skill):
                    counts[skill] += 1
    return counts


def load_usage_json(path: Path | str | None) -> dict[str, int]:
    if path is None:
        return {}
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, Mapping):
        raise ValueError("usage JSON must be an object mapping skill names to counts")
    counts: dict[str, int] = {}
    for key, value in data.items():
        try:
            counts[str(key)] = int(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"usage count for {key!r} must be an integer") from exc
    return counts


def rank_cleanup_candidates(
    skill_names: Iterable[str],
    usage_counts: Mapping[str, int] | None = None,
    never_use: Iterable[str] | None = None,
    keep: Iterable[str] | None = None,
    low_usage_threshold: int = 1,
) -> list[dict[str, Any]]:
    """Rank deletion/review candidates without touching the filesystem."""
    counts = usage_counts or {}
    never = set(never_use or [])
    protected = set(keep or [])
    candidates: list[dict[str, Any]] = []
    for skill in sorted(set(skill_names)):
        if skill in protected:
            continue
        count = int(counts.get(skill, 0))
        reasons: list[str] = []
        score = 0
        action = "keep"
        if skill in never:
            reasons.append("interview_never_use")
            score += 100
            action = "remove"
        if count == 0:
            reasons.append("zero_triggers")
            score += 50
        elif count <= low_usage_threshold:
            reasons.append("low_usage")
            score += 20
        if not reasons:
            continue
        if action != "remove":
            action = "review"
        candidates.append(
            {
                "skill": skill,
                "action": action,
                "trigger_count": count,
                "score": score,
                "reasons": reasons,
            }
        )
    return sorted(candidates, key=lambda item: (-item["score"], item["skill"]))


def expand_default_log_paths() -> list[Path]:
    paths: list[Path] = []
    for source in AGENT_USAGE_SOURCES:
        for pattern in source.get("paths", []):
            paths.extend(Path().glob(os.path.expanduser(pattern)) if not pattern.startswith("~") else Path.home().glob(pattern[2:]))
    return sorted({path for path in paths if path.is_file()})


def parse_csv(value: str | None) -> set[str]:
    if not value:
        return set()
    return {item.strip() for item in value.split(",") if item.strip()}


def _resolve_since(days: int | None, since: str | None, now: datetime | None = None) -> datetime | None:
    explicit_since = _parse_datetime(since)
    if explicit_since is not None:
        return explicit_since
    if days is None:
        return None
    if days < 0:
        raise ValueError("days must be zero or greater")
    base = now or datetime.now(timezone.utc)
    if base.tzinfo is None:
        base = base.replace(tzinfo=timezone.utc)
    else:
        base = base.astimezone(timezone.utc)
    return base - timedelta(days=days)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Suggest K-skill cleanup candidates from interviews and usage logs.")
    parser.add_argument("--skills-root", default=".", help="Repository root containing root-level skill directories")
    parser.add_argument("--usage-json", help="Optional JSON object mapping skill names to trigger counts")
    parser.add_argument("--log", action="append", default=[], help="Agent log file to scan; repeatable")
    parser.add_argument("--scan-default-logs", action="store_true", help="Best-effort scan known local agent log locations")
    parser.add_argument("--never-use", default="", help="Comma-separated skills the user says they never use")
    parser.add_argument("--keep", default="", help="Comma-separated skills to protect from suggestions")
    parser.add_argument("--low-usage-threshold", type=int, default=1, help="Counts at or below this threshold are review candidates")
    parser.add_argument("--days", type=int, help="Only count log records from the last N days; untimestamped lines use file mtime fallback")
    parser.add_argument("--since", help="Only count log records on or after this ISO date/datetime; overrides --days")
    return parser


def main(argv: list[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    skill_names = find_skill_dirs(args.skills_root)
    usage_counts = {skill: 0 for skill in skill_names}
    usage_counts.update(load_usage_json(args.usage_json))

    log_paths = [Path(path) for path in args.log]
    if args.scan_default_logs:
        log_paths.extend(expand_default_log_paths())
    since = _resolve_since(args.days, args.since)
    log_counts = collect_skill_usage(log_paths, skill_names, since=since)
    for skill, count in log_counts.items():
        usage_counts[skill] = usage_counts.get(skill, 0) + count

    report = {
        "skill_count": len(skill_names),
        "candidates": rank_cleanup_candidates(
            skill_names=skill_names,
            usage_counts=usage_counts,
            never_use=parse_csv(args.never_use),
            keep=parse_csv(args.keep),
            low_usage_threshold=args.low_usage_threshold,
        ),
        "agent_usage_sources": AGENT_USAGE_SOURCES,
        "time_window": {
            "since": since.isoformat() if since is not None else None,
            "days": args.days if args.since is None else None,
            "fallback": "Untimestamped log lines are included or skipped by log file mtime.",
        },
        "safety": "No files were deleted. Review candidates and remove skills in a separate explicit edit.",
    }
    print(json.dumps(report, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
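
The precedence between `--since` and `--days` can be illustrated with a simplified stand-in. `resolve_cutoff` is a hypothetical name, it accepts only full ISO datetimes, and the fixed `now` is an assumption chosen so the results are reproducible; it is a sketch of the rule, not a replacement for `_resolve_since`.

```python
from datetime import datetime, timedelta, timezone


def resolve_cutoff(days, since, now):
    """Simplified stand-in: an explicit ``since`` wins over ``days``."""
    if since:
        parsed = datetime.fromisoformat(since)
        if parsed.tzinfo is None:
            # Naive input is read as UTC, matching the helper's convention.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    if days is None:
        return None  # no window requested: every record counts
    return now - timedelta(days=days)


now = datetime(2026, 4, 28, 8, 50, 54, tzinfo=timezone.utc)  # assumed clock

from_days = resolve_cutoff(90, None, now)
from_since = resolve_cutoff(90, "2026-04-01T00:00:00+09:00", now)

print(from_days.isoformat())
print(from_since.isoformat())
```

Normalizing both paths to UTC keeps the later `line_dt >= since` comparison between timezone-aware values.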

@ -9,7 +9,7 @@
],
"scripts": {
"build": "npm run build --workspaces --if-present",
"lint": "node --check scripts/skill-docs.test.js scripts/korean_character_count.js scripts/test_korean_character_count.js && python3 -m py_compile scripts/k_skill_cleaner.py scripts/test_k_skill_cleaner.py scripts/fine_dust.py scripts/test_fine_dust.py scripts/ktx_booking.py scripts/test_ktx_booking.py scripts/sillok_search.py scripts/test_sillok_search.py scripts/korean_spell_check.py scripts/test_korean_spell_check.py scripts/patent_search.py scripts/test_patent_search.py scripts/mfds_drug_safety.py scripts/test_mfds_drug_safety.py scripts/mfds_food_safety.py scripts/test_mfds_food_safety.py scripts/zipcode_search.py scripts/test_zipcode_search.py scripts/subway_lost_property.py scripts/test_subway_lost_property.py scripts/geeknews_search.py scripts/test_geeknews_search.py scripts/test_naver_blog_search.py scripts/test_korean_slang_writing.py scripts/kakaotalk_mac.py scripts/test_kakaotalk_mac.py scripts/test_coupang_partners_mcp_wrapper.py coupang-product-search/scripts/coupang_partners_mcp.py kakaotalk-mac/scripts/kakaotalk_mac.py naver-blog-research/scripts/_naver_http.py naver-blog-research/scripts/naver_search.py naver-blog-research/scripts/naver_read.py naver-blog-research/scripts/naver_download_images.py korean-slang-writing/scripts/_slang_http.py korean-slang-writing/scripts/slang_search.py korean-slang-writing/scripts/slang_lookup.py korean-scholarship-search/scripts/scholarship_filter.py korean-scholarship-search/scripts/test_scholarship_filter.py korean-scholarship-search/scripts/university_search_plan.py && npm run lint --workspaces --if-present && ./scripts/validate-skills.sh",
"lint": "node --check scripts/skill-docs.test.js scripts/korean_character_count.js scripts/test_korean_character_count.js && python3 -m py_compile scripts/k_skill_cleaner.py scripts/test_k_skill_cleaner.py k-skill-cleaner/scripts/k_skill_cleaner.py scripts/fine_dust.py scripts/test_fine_dust.py scripts/ktx_booking.py scripts/test_ktx_booking.py scripts/sillok_search.py scripts/test_sillok_search.py scripts/korean_spell_check.py scripts/test_korean_spell_check.py scripts/patent_search.py scripts/test_patent_search.py scripts/mfds_drug_safety.py scripts/test_mfds_drug_safety.py scripts/mfds_food_safety.py scripts/test_mfds_food_safety.py scripts/zipcode_search.py scripts/test_zipcode_search.py scripts/subway_lost_property.py scripts/test_subway_lost_property.py scripts/geeknews_search.py scripts/test_geeknews_search.py scripts/test_naver_blog_search.py scripts/test_korean_slang_writing.py scripts/kakaotalk_mac.py scripts/test_kakaotalk_mac.py scripts/test_coupang_partners_mcp_wrapper.py coupang-product-search/scripts/coupang_partners_mcp.py kakaotalk-mac/scripts/kakaotalk_mac.py naver-blog-research/scripts/_naver_http.py naver-blog-research/scripts/naver_search.py naver-blog-research/scripts/naver_read.py naver-blog-research/scripts/naver_download_images.py korean-slang-writing/scripts/_slang_http.py korean-slang-writing/scripts/slang_search.py korean-slang-writing/scripts/slang_lookup.py korean-scholarship-search/scripts/scholarship_filter.py korean-scholarship-search/scripts/test_scholarship_filter.py korean-scholarship-search/scripts/university_search_plan.py && npm run lint --workspaces --if-present && ./scripts/validate-skills.sh",
"typecheck": "tsc --noEmit",
"test": "node --test scripts/skill-docs.test.js scripts/test_korean_character_count.js && PYTHONPATH=.:scripts python3 -m unittest scripts.test_k_skill_cleaner scripts.test_fine_dust scripts.test_ktx_booking scripts.test_sillok_search scripts.test_korean_spell_check scripts.test_patent_search scripts.test_mfds_drug_safety scripts.test_mfds_food_safety scripts.test_zipcode_search scripts.test_subway_lost_property scripts.test_geeknews_search scripts.test_naver_blog_search scripts.test_korean_slang_writing scripts.test_kakaotalk_mac scripts.test_coupang_partners_mcp_wrapper && PYTHONPATH=.:scripts:korean-scholarship-search/scripts python3 -m unittest discover -s korean-scholarship-search/scripts -p 'test_scholarship_filter.py' && npm run test --workspaces --if-present && ./scripts/validate-skills.sh",
"pack:dry-run": "npm pack --workspace k-lotto --dry-run && npm pack --workspace daiso-product-search --dry-run && npm pack --workspace market-kurly-search --dry-run && npm pack --workspace blue-ribbon-nearby --dry-run && npm pack --workspace kakao-bar-nearby --dry-run && npm pack --workspace cheap-gas-nearby --dry-run && npm pack --workspace public-restroom-nearby --dry-run && npm pack --workspace parking-lot-search --dry-run && npm pack --workspace kbl-results --dry-run && npm pack --workspace kleague-results --dry-run && npm pack --workspace lck-analytics --dry-run && npm pack --workspace toss-securities --dry-run && npm pack --workspace hipass-receipt --dry-run && npm pack --workspace used-car-price-search --dry-run && npm pack --workspace k-skill-rhwp --dry-run",

@ -1,262 +1,43 @@
#!/usr/bin/env python3
"""Utilities for the k-skill-cleaner skill.
"""Compatibility wrapper for the k-skill-cleaner skill-local helper.
The helper intentionally stays dependency-free: it scans root-level skill
folders, best-effort local agent logs, and optional interview choices to produce
a conservative cleanup shortlist. It never deletes files by itself.
The standalone skill install includes ``k-skill-cleaner/scripts/k_skill_cleaner.py``.
This repository-root wrapper preserves existing checkout workflows and tests while
keeping the executable payload inside the skill directory.
"""
from __future__ import annotations
import argparse
import json
import os
import re
from collections.abc import Iterable, Mapping
import importlib.util
from pathlib import Path
from typing import Any
_HELPER_PATH = Path(__file__).resolve().parents[1] / "k-skill-cleaner" / "scripts" / "k_skill_cleaner.py"
_SPEC = importlib.util.spec_from_file_location("_k_skill_cleaner_impl", _HELPER_PATH)
if _SPEC is None or _SPEC.loader is None:  # pragma: no cover - importlib defensive guard
    raise ImportError(f"Unable to load k-skill-cleaner helper from {_HELPER_PATH}")
_MODULE = importlib.util.module_from_spec(_SPEC)
_SPEC.loader.exec_module(_MODULE)
EXCLUDED_ROOT_DIRS = {
".changeset",
".claude",
".codex",
".cursor",
".git",
".github",
".omx",
".ouroboros",
".vscode",
"docs",
"examples",
"node_modules",
"packages",
"python-packages",
"scripts",
}
AGENT_USAGE_SOURCES = _MODULE.AGENT_USAGE_SOURCES
collect_skill_usage = _MODULE.collect_skill_usage
find_skill_dirs = _MODULE.find_skill_dirs
rank_cleanup_candidates = _MODULE.rank_cleanup_candidates
load_usage_json = _MODULE.load_usage_json
expand_default_log_paths = _MODULE.expand_default_log_paths
parse_csv = _MODULE.parse_csv
build_parser = _MODULE.build_parser
main = _MODULE.main
AGENT_USAGE_SOURCES = [
{
"agent": "Claude Code",
"paths": ["~/.claude/projects/**/*.jsonl", "~/.claude/transcripts/**/*.jsonl"],
"method": "Scan JSONL transcript lines for skill-trigger events, $skill mentions, and SKILL.md load markers.",
"confidence": "best-effort",
},
{
"agent": "Codex",
"paths": ["~/.codex/sessions/**/*.jsonl", "~/.codex/log/**/*.log", ".omx/logs/**/*.log"],
"method": "Scan Codex session/log lines for routed skill names, $skill invocations, and SKILL.md reads.",
"confidence": "best-effort",
},
{
"agent": "OpenCode",
"paths": ["~/.local/share/opencode/**/*.jsonl", "~/.config/opencode/**/*.jsonl"],
"method": "Scan OpenCode data/config logs when available; ask for an exported transcript otherwise.",
"confidence": "best-effort",
},
{
"agent": "OpenClaw/ClawHub",
"paths": ["~/.openclaw/**/*.jsonl", "~/.clawhub/**/*.jsonl"],
"method": "No stable public trigger-count schema is assumed; use local logs if present or imported JSON counts.",
"confidence": "manual-confirm",
"fallback": "Ask the user to export trigger stats or provide a usage JSON file.",
},
{
"agent": "Hermes Agent",
"paths": ["~/.hermes/**/*.jsonl", "~/.config/hermes/**/*.jsonl"],
"method": "No stable public trigger-count schema is assumed; use local logs if present or imported JSON counts.",
"confidence": "manual-confirm",
"fallback": "Ask the user to export trigger stats or provide a usage JSON file.",
},
__all__ = [
    "AGENT_USAGE_SOURCES",
    "collect_skill_usage",
    "find_skill_dirs",
    "rank_cleanup_candidates",
    "load_usage_json",
    "expand_default_log_paths",
    "parse_csv",
    "build_parser",
    "main",
]
def find_skill_dirs(root: Path | str) -> list[str]:
"""Return root-level directories that look like installable skills."""
root_path = Path(root)
skills: list[str] = []
for child in root_path.iterdir():
if not child.is_dir() or child.name in EXCLUDED_ROOT_DIRS:
continue
if (child / "SKILL.md").is_file():
skills.append(child.name)
return sorted(skills)
def _walk_strings(value: Any, key_hint: str | None = None) -> Iterable[tuple[str | None, str]]:
if isinstance(value, str):
yield key_hint, value
elif isinstance(value, Mapping):
for key, child in value.items():
yield from _walk_strings(child, str(key))
elif isinstance(value, list):
for child in value:
yield from _walk_strings(child, key_hint)
def _line_mentions_skill(line: str, skill: str) -> bool:
escaped = re.escape(skill)
patterns = [
rf"(?<![\w-])\${escaped}(?![\w-])",
rf"(?i)\bskill(?:[_ -]?name|[_ -]?id)?\s*[:=]\s*['\"]?{escaped}(?![\w-])",
rf"(?<![\w-]){escaped}/SKILL\.md\b",
rf"(?i)\bloaded skill\s*[:=]?\s*['\"]?{escaped}(?![\w-])",
rf"(?i)\busing\s+\${escaped}(?![\w-])",
]
return any(re.search(pattern, line) for pattern in patterns)
def _json_mentions_skill(record: Any, skill: str) -> bool:
key_names = {"skill", "skillname", "skill_name", "skillid", "skill_id", "name"}
for key, value in _walk_strings(record):
normalized_key = (key or "").replace("-", "").replace("_", "").lower()
if normalized_key in key_names and value == skill:
return True
if _line_mentions_skill(value, skill):
return True
return False
def collect_skill_usage(log_paths: Iterable[Path | str], skill_names: Iterable[str]) -> dict[str, int]:
"""Best-effort count of skill trigger mentions across local agent logs."""
skills = sorted(set(skill_names))
counts = {skill: 0 for skill in skills}
for raw_path in log_paths:
path = Path(raw_path).expanduser()
if not path.is_file():
continue
for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
parsed: Any | None = None
try:
parsed = json.loads(line)
except json.JSONDecodeError:
parsed = None
for skill in skills:
if (parsed is not None and _json_mentions_skill(parsed, skill)) or _line_mentions_skill(line, skill):
counts[skill] += 1
return counts
def load_usage_json(path: Path | str | None) -> dict[str, int]:
if path is None:
return {}
data = json.loads(Path(path).read_text(encoding="utf-8"))
if not isinstance(data, Mapping):
raise ValueError("usage JSON must be an object mapping skill names to counts")
counts: dict[str, int] = {}
for key, value in data.items():
try:
counts[str(key)] = int(value)
except (TypeError, ValueError) as exc:
raise ValueError(f"usage count for {key!r} must be an integer") from exc
return counts
def rank_cleanup_candidates(
skill_names: Iterable[str],
usage_counts: Mapping[str, int] | None = None,
never_use: Iterable[str] | None = None,
keep: Iterable[str] | None = None,
low_usage_threshold: int = 1,
) -> list[dict[str, Any]]:
"""Rank deletion/review candidates without touching the filesystem."""
counts = usage_counts or {}
never = set(never_use or [])
protected = set(keep or [])
candidates: list[dict[str, Any]] = []
for skill in sorted(set(skill_names)):
if skill in protected:
continue
count = int(counts.get(skill, 0))
reasons: list[str] = []
score = 0
action = "keep"
if skill in never:
reasons.append("interview_never_use")
score += 100
action = "remove"
if count == 0:
reasons.append("zero_triggers")
score += 50
elif count <= low_usage_threshold:
reasons.append("low_usage")
score += 20
if not reasons:
continue
if action != "remove":
action = "review"
candidates.append(
{
"skill": skill,
"action": action,
"trigger_count": count,
"score": score,
"reasons": reasons,
}
)
return sorted(candidates, key=lambda item: (-item["score"], item["skill"]))
def expand_default_log_paths() -> list[Path]:
paths: list[Path] = []
for source in AGENT_USAGE_SOURCES:
for pattern in source.get("paths", []):
paths.extend(Path().glob(os.path.expanduser(pattern)) if not pattern.startswith("~") else Path.home().glob(pattern[2:]))
return sorted({path for path in paths if path.is_file()})
def parse_csv(value: str | None) -> set[str]:
if not value:
return set()
return {item.strip() for item in value.split(",") if item.strip()}
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description="Suggest K-skill cleanup candidates from interviews and usage logs.")
parser.add_argument("--skills-root", default=".", help="Repository root containing root-level skill directories")
parser.add_argument("--usage-json", help="Optional JSON object mapping skill names to trigger counts")
parser.add_argument("--log", action="append", default=[], help="Agent log file to scan; repeatable")
parser.add_argument("--scan-default-logs", action="store_true", help="Best-effort scan known local agent log locations")
parser.add_argument("--never-use", default="", help="Comma-separated skills the user says they never use")
parser.add_argument("--keep", default="", help="Comma-separated skills to protect from suggestions")
parser.add_argument("--low-usage-threshold", type=int, default=1, help="Counts at or below this threshold are review candidates")
return parser
def main(argv: list[str] | None = None) -> int:
args = build_parser().parse_args(argv)
skill_names = find_skill_dirs(args.skills_root)
usage_counts = {skill: 0 for skill in skill_names}
usage_counts.update(load_usage_json(args.usage_json))
log_paths = [Path(path) for path in args.log]
if args.scan_default_logs:
log_paths.extend(expand_default_log_paths())
log_counts = collect_skill_usage(log_paths, skill_names)
for skill, count in log_counts.items():
usage_counts[skill] = usage_counts.get(skill, 0) + count
report = {
"skill_count": len(skill_names),
"candidates": rank_cleanup_candidates(
skill_names=skill_names,
usage_counts=usage_counts,
never_use=parse_csv(args.never_use),
keep=parse_csv(args.keep),
low_usage_threshold=args.low_usage_threshold,
),
"agent_usage_sources": AGENT_USAGE_SOURCES,
"safety": "No files were deleted. Review candidates and remove skills in a separate explicit edit.",
}
print(json.dumps(report, ensure_ascii=False, indent=2))
return 0
if __name__ == "__main__":
raise SystemExit(main())
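
The wrapper's load-by-path technique can be tried in isolation. The sketch below is an assumption-laden miniature: `impl.py`, `_impl`, and the one-line `main` are hypothetical stand-ins for the skill-local helper, written to a temporary directory so the example runs anywhere.

```python
import importlib.util
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the canonical implementation file.
    impl_path = Path(tmp) / "impl.py"
    impl_path.write_text("def main(argv=None):\n    return 0\n", encoding="utf-8")

    # Load the module from an explicit file path instead of sys.path.
    spec = importlib.util.spec_from_file_location("_impl", impl_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Unable to load helper from {impl_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

# Re-export so callers of the wrapper get the same callable object.
main = module.main
exit_code = main([])
print(exit_code)
```

Loading by path avoids depending on `PYTHONPATH` or on the hyphenated `k-skill-cleaner` directory being importable as a package.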

@ -3348,8 +3348,10 @@ test("repository docs advertise the k-skill-cleaner skill and agent usage source
const install = read(path.join("docs", "install.md"));
const featureDocPath = path.join(repoRoot, "docs", "features", "k-skill-cleaner.md");
const skillPath = path.join(repoRoot, "k-skill-cleaner", "SKILL.md");
const skillLocalHelperPath = path.join(repoRoot, "k-skill-cleaner", "scripts", "k_skill_cleaner.py");
assert.ok(fs.existsSync(skillPath), "expected k-skill-cleaner/SKILL.md to exist");
assert.ok(fs.existsSync(skillLocalHelperPath), "expected k-skill-cleaner/scripts/k_skill_cleaner.py to be included in standalone skill installs");
assert.ok(fs.existsSync(featureDocPath), "expected docs/features/k-skill-cleaner.md to exist");
const skill = read(path.join("k-skill-cleaner", "SKILL.md"));
@ -3362,6 +3364,9 @@ test("repository docs advertise the k-skill-cleaner skill and agent usage source
assert.match(skill, /OpenClaw\/ClawHub/);
assert.match(skill, /Hermes Agent/);
assert.match(skill, /python3 scripts\/k_skill_cleaner\.py/);
assert.match(skill, /--days 90/);
assert.match(featureDoc, /k-skill-cleaner\/scripts\/k_skill_cleaner\.py/);
assert.match(featureDoc, /--days 90/);
assert.match(featureDoc, /인터뷰/);
assert.match(featureDoc, /트리거 횟수/);
assert.match(readme, /\| K-스킬 클리너 \| `k-skill-cleaner` \|/);

@ -44,6 +44,44 @@ class KSkillCleanerTest(unittest.TestCase):
self.assertEqual(counts["korean-law-search"], 2)
self.assertEqual(counts["unused"], 0)

    def test_collects_usage_with_since_window_and_mtime_fallback(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            recent_log = root / "recent.jsonl"
            recent_log.write_text(
                "\n".join(
                    [
                        json.dumps({"timestamp": "2026-04-20T12:00:00+09:00", "skill": "kbo-results"}),
                        json.dumps({"timestamp": "2026-01-10T12:00:00+09:00", "skill": "korean-law-search"}),
                        "loaded skill: fallback-skill",
                    ]
                ),
                encoding="utf-8",
            )
            old_log = root / "old.log"
            old_log.write_text("loaded skill: old-fallback", encoding="utf-8")
            # Lines without parseable timestamps use file mtime as the fallback signal.
            recent_mtime = 1_776_643_200  # 2026-04-20T00:00:00Z
            old_mtime = 1_766_275_200  # 2025-12-21T00:00:00Z
            recent_log.touch()
            old_log.touch()
            import os

            os.utime(recent_log, (recent_mtime, recent_mtime))
            os.utime(old_log, (old_mtime, old_mtime))

            counts = collect_skill_usage(
                [recent_log, old_log],
                ["kbo-results", "korean-law-search", "fallback-skill", "old-fallback"],
                since="2026-04-01T00:00:00+09:00",
            )

            self.assertEqual(counts["kbo-results"], 1)
            self.assertEqual(counts["korean-law-search"], 0)
            self.assertEqual(counts["fallback-skill"], 1)
            self.assertEqual(counts["old-fallback"], 0)

def test_ranks_deletion_candidates_with_interview_and_usage_reasons(self):
candidates = rank_cleanup_candidates(
skill_names=["unused", "rare", "protected", "active"],