k-skill-qa-bot
Automated QA daemon for the k-skill skill library. Runs every 3 days via macOS launchd, tests every suitable skill via codex exec --json --dangerously-bypass-approvals-and-sandbox, has a read-only/no-approval LLM judge grade pass/fail/skip, and files dedup'd GitHub issues for skills that have broken.
What it does
- Refreshes a shallow clone of NomaDamas/k-skill main every 3 days.
- Discovers every <skill>/SKILL.md.
- Classifies each skill (read-only / location / login / destructive / api-key / proxy-dependent / deprecated).
- Runs each suitable skill through codex exec --json --dangerously-bypass-approvals-and-sandbox with a smoke-test prompt synthesized from the skill's ## When to use bullets. The daemon runs as a dedicated LaunchAgent with non-interactive approvals; avoiding the Codex sandbox prevents false DNS/network failures during skill smoke tests.
- Judges the result via a second read-only/no-approval codex exec call using the configured judge model and a strict JSON Schema.
- Files dedup'd issues on NomaDamas/k-skill for true failures (with auto-qa label). Skipped skills (deprecated, login-required, missing API key) never create issues.
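The issue-filing rule above (only true failures, deduplicated, never for skipped skills) can be sketched roughly as follows. This is an illustration, not the bot's actual code: the `issue_title` format, the `plan_issues` helper, and the idea of deduplicating by exact title are all assumptions, and `example-skill` is a made-up skill name.

```python
def issue_title(skill: str) -> str:
    # Hypothetical title format; the bot's real titles are not documented here.
    return f"[auto-qa] {skill} smoke test failed"


def plan_issues(verdicts: dict, open_titles: set, create_issues: bool) -> list:
    """Return the titles of issues that would be filed for this run."""
    if not create_issues:
        # CREATE_ISSUES=false: report only, never file.
        return []
    planned = []
    for skill, verdict in sorted(verdicts.items()):
        if verdict != "fail":
            # pass and skip verdicts never create issues.
            continue
        title = issue_title(skill)
        if title in open_titles:
            # Dedup: an open issue for this skill already exists.
            continue
        planned.append(title)
    return planned
```

With verdicts of `fail` for `kbo-results` and `example-skill`, and an issue already open for `example-skill`, only the `kbo-results` title would be planned.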
The k-skill repo itself is never modified by the bot — it is read-only SSOT. Test prompts are synthesized from each SKILL.md.
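As a rough illustration of synthesizing a prompt from a SKILL.md, the sketch below collects the bullets under the ## When to use heading and turns the first one into a request. The function names and the prompt wording are hypothetical; the bot's real prompt template may differ, and this simple parser does not account for headings inside fenced code.

```python
import re


def when_to_use_bullets(skill_md: str) -> list:
    """Collect bullet items that sit under the '## When to use' heading."""
    bullets = []
    in_section = False
    for line in skill_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Every heading either opens the target section or closes it.
            in_section = re.fullmatch(r"##\s+when to use", stripped, re.IGNORECASE) is not None
            continue
        if in_section and stripped.startswith(("- ", "* ")):
            bullets.append(stripped[2:].strip())
    return bullets


def smoke_prompt(skill_name: str, skill_md: str) -> str:
    """Build a one-line smoke-test prompt (hypothetical wording)."""
    bullets = when_to_use_bullets(skill_md)
    if not bullets:
        return f"Use the {skill_name} skill for one simple, read-only request and report the result."
    return f"Use the {skill_name} skill to handle this request: {bullets[0]}"
```

Because the prompt is derived at run time, nothing has to be stored in or written back to the k-skill repo.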
Install
Prereqs (one-time):
brew install bats-core coreutils gh jq python@3
pip3 install pyyaml jsonschema pytest
codex --version # codex-cli >= 0.130
codex login # one-time
gh auth login # one-time, needs `repo` scope
Then:
cd /path/to/k-skill
bash tools/k-skill-qa-bot/install.sh
Re-run install.sh to upgrade — it is idempotent and preserves state/.
Configure
The default CREATE_ISSUES=false means the first run does NOT file any issues. After reviewing the first summary.md, opt in:
echo 'CREATE_ISSUES=true' >> ~/.local/share/k-skill-qa-bot/.env
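Appending a line works as an override only if later assignments win over earlier ones. The sketch below shows that merge rule under the assumption that .env holds plain KEY=VALUE lines; the bot itself may simply source the file from shell, and `apply_env` is a hypothetical helper.

```python
DEFAULTS = {"CREATE_ISSUES": "false", "TIMEOUT_SECS": "180", "MAX_PARALLEL": "4"}


def apply_env(defaults: dict, env_text: str) -> dict:
    """Overlay KEY=VALUE lines on the defaults; the last assignment wins."""
    merged = dict(defaults)
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            # Skip blanks, comments, and anything that is not an assignment.
            continue
        key, value = line.split("=", 1)
        merged[key.strip()] = value.strip().strip("'\"")
    return merged
```

For example, a file containing `CREATE_ISSUES=false` followed by an appended `CREATE_ISSUES=true` resolves to `true`, while untouched keys keep their defaults.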
Overridable variables (see config/defaults.sh):
| Var | Default | Meaning |
|---|---|---|
| CREATE_ISSUES | false | File GH issues for failures |
| CODEX_MODEL | gpt-5.5 | Model for skill exec |
| JUDGE_MODEL | gpt-5.5 | Model for LLM judge |
| CODEX_PROVIDER | openai | Codex model provider for skill exec and judge calls |
| TIMEOUT_SECS | 180 | Per-skill timeout |
| JUDGE_TIMEOUT_SECS | 60 | Per-judge timeout |
| MAX_PARALLEL | 4 | Concurrent skill tests |
| LAST_RUN_MIN_AGE | 259200 | Min seconds between runs (72h) |
| GH_REPO | NomaDamas/k-skill | Where to file issues |
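The LAST_RUN_MIN_AGE default is 72 hours expressed in seconds (72 × 3600 = 259200). A minimal sketch of such a minimum-age gate follows; `should_run` is a hypothetical helper, and treating --force as a bypass of the gate is an assumption based on the flag's name.

```python
LAST_RUN_MIN_AGE = 72 * 60 * 60  # 259200 seconds


def should_run(last_run_epoch, now_epoch, min_age=LAST_RUN_MIN_AGE, force=False):
    """Decide whether enough time has passed since the previous run."""
    if force or last_run_epoch is None:
        # Assumed: a forced run or a first-ever run skips the age check.
        return True
    return now_epoch - last_run_epoch >= min_age
```

A gate like this lets a scheduler wake the job more often than every 3 days without producing extra runs.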
config/skill-overrides.yml controls per-skill force_skip and category overrides. Destructive booking flows (ktx-booking, srt-booking, catchtable-sniper, etc.) and session-required skills (kakaotalk-mac, hipass-receipt, toss-securities, iros-registry-automation) are force-skipped by default so the bot never abuses an account.
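To make the override behaviour concrete, here is a sketch of how a parsed overrides file might be applied to one skill. The dictionary shape, the `category` key name, and the `resolve` helper are assumptions; only force_skip and the idea of a category override come from the description above.

```python
# Hypothetical parsed form of config/skill-overrides.yml.
OVERRIDES = {
    "srt-booking": {"force_skip": True, "category": "destructive"},
    "kakaotalk-mac": {"force_skip": True, "category": "login"},
}


def resolve(skill, detected_category, overrides=OVERRIDES):
    """Return (category, skip) for a skill after applying its override entry."""
    entry = overrides.get(skill, {})
    # An explicit category in the override beats the auto-detected one.
    category = entry.get("category", detected_category)
    # force_skip short-circuits the skill before any codex call.
    skip = bool(entry.get("force_skip", False))
    return category, skip
```

A skill with no entry, such as `kbo-results`, keeps its detected category and is not skipped.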
Logs and inspection
tail -f ~/Library/Logs/k-skill-qa-bot/stderr.log
cat ~/.local/share/k-skill-qa-bot/state/runs/$(ls -t ~/.local/share/k-skill-qa-bot/state/runs/ | head -1)/summary.md
The bot keeps the most recent 12 runs and purges older ones.
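The retention rule can be sketched as below, assuming run directory names sort chronologically (for example, timestamp-based names). Both that naming assumption and the `runs_to_purge` helper are illustrative rather than taken from the bot.

```python
def runs_to_purge(run_ids, keep=12):
    """Return the run IDs that fall outside the newest `keep` runs."""
    # Assumed: lexicographic order of IDs matches chronological order.
    newest_first = sorted(run_ids, reverse=True)
    return newest_first[keep:]
```

With 15 runs on disk, the three oldest would be selected for removal; with 12 or fewer, nothing is purged.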
Force a run
~/.local/share/k-skill-qa-bot/bin/run-qa.sh --force
~/.local/share/k-skill-qa-bot/bin/run-qa.sh --force --only kbo-results
~/.local/share/k-skill-qa-bot/bin/run-qa.sh --force --dry-run # no issues regardless of CREATE_ISSUES
Uninstall
bash ~/.local/share/k-skill-qa-bot/uninstall.sh
bash ~/.local/share/k-skill-qa-bot/uninstall.sh --yes --purge --purge-logs
Safety
- Skill smoke tests use --dangerously-bypass-approvals-and-sandbox because the Codex sandbox can block legitimate DNS/network lookups for public skill endpoints exercised by smoke tests.
- A dedicated LaunchAgent is scheduling isolation only; it is not a separate OS user, container, or filesystem sandbox.
- The bot-managed clone is not write-protected from the unsandboxed smoke agent; treat it as mutable bot state and judge only against inputs whose provenance is understood.
- The LLM judge stays on the safer -s read-only path with approval_policy="never"; read-only/no-approval limits writes and approval prompts, but does not make the judge a no-tools or file-isolated model call. Treat transcript and skill Markdown as untrusted input.
- 10 destructive/login-required skills are force-skipped before any codex call is issued.
- Deprecated skills (~~name~~ ⚠️ 지원 중단, i.e. "support discontinued", in README) are detected and skipped.
- update-clone.sh refuses any K_SKILL_CLONE outside K_QA_HOME/k-skill-clone unless ALLOW_EXTERNAL_CLONE_TARGET=1 (prevents the script from git-reset'ing the wrong directory).
- CREATE_ISSUES=false first-run default prevents accidental issue spam.
- Local state only: ~/.local/share/k-skill-qa-bot/. Expected network egress is limited to git fetch, codex API, gh API, k-skill-proxy health checks, and the public skill endpoints exercised by smoke tests.
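The clone-target guard described above can be approximated as follows. This is a sketch of the idea, not a port of update-clone.sh: `clone_target_allowed` is a hypothetical helper, and treating only the exact K_QA_HOME/k-skill-clone path as in-bounds is an assumption about what "outside" means.

```python
import os


def clone_target_allowed(k_skill_clone, k_qa_home, allow_external=False):
    """Refuse clone targets other than <K_QA_HOME>/k-skill-clone unless opted in."""
    if allow_external:
        # Mirrors ALLOW_EXTERNAL_CLONE_TARGET=1.
        return True
    # Resolve symlinks and '..' segments so equivalent spellings compare equal.
    expected = os.path.realpath(os.path.join(k_qa_home, "k-skill-clone"))
    target = os.path.realpath(k_skill_clone)
    return target == expected
```

Checking the resolved path before any destructive git operation is what keeps a mistyped K_SKILL_CLONE from pointing a reset at a working checkout.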
Troubleshooting
- codex: command not found → check the plist's EnvironmentVariables.PATH. Default is /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin.
- gh: not authenticated → run gh auth login with repo scope.
- gtimeout: command not found → brew install coreutils.
- LaunchAgent state via launchctl print "gui/$(id -u)/org.nomadamas.k-skill-qa-bot" | head.
- Force a re-run: launchctl kickstart -k "gui/$(id -u)/org.nomadamas.k-skill-qa-bot".