Mirror of https://github.com/NomaDamas/k-skill.git, synced 2026-06-24 02:04:11 +00:00
Document the QA bot trust boundary precisely
Lock the PR #263 review follow-up with Bats coverage so future wording does not overstate judge isolation or network boundaries.

Constraint: PR review requested documentation or isolation changes without changing the current split execution boundary.
Rejected: Claiming the read-only judge only reads provided prompts | codex exec remains a tool-capable read-only agent.
Confidence: high
Scope-risk: narrow
Directive: Do not describe LaunchAgent scheduling as OS, container, or filesystem isolation unless runtime isolation is actually added.
Tested: bats tools/k-skill-qa-bot/test/bats/; shellcheck -e SC1091,SC2016,SC2012 tools/k-skill-qa-bot/bin/*.sh tools/k-skill-qa-bot/bin/lib/*.sh tools/k-skill-qa-bot/install.sh tools/k-skill-qa-bot/uninstall.sh; python3 -m py_compile tools/k-skill-qa-bot/bin/*.py tools/k-skill-qa-bot/bin/lib/*.py; git diff --check
Not-tested: Live LaunchAgent run and real external skill endpoint smoke tests
Parent: e9b6f03812
Commit: e7dbaacce9
3 changed files with 37 additions and 5 deletions
tools/k-skill-qa-bot/AGENTS.md

```diff
@@ -18,6 +18,12 @@ After running `install.sh`, the runtime lives at `~/.local/share/k-skill-qa-bot/
 
 The k-skill repository itself is **never modified** by the bot — it is read-only SSOT. Test prompts are synthesized from each `SKILL.md`.
 
+## Trust-boundary notes
+
+- Smoke tests intentionally run unsandboxed and may contact public skill endpoints, plus git, Codex, GitHub, and k-skill-proxy health-check endpoints.
+- A dedicated LaunchAgent is scheduling isolation only; it is not a separate OS user, container, or filesystem sandbox.
+- The judge uses read-only/no-approval Codex settings, but is still a tool-capable Codex agent over untrusted transcripts and skill Markdown. Do not describe it as a no-tools or file-isolated model call unless the implementation changes to enforce that boundary.
+
 ## Design rules
 
 - **SSOT**: All test prompts and skill metadata come from `SKILL.md` files in the bot's own shallow clone of `NomaDamas/k-skill` `main`. The k-skill repo gets no QA-bot-specific edits.
```
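The split boundary in these notes comes down to two differently flagged `codex exec` calls. The sketch below only prints the two command lines, so it runs without Codex installed. The smoke-test flags are the ones the README documents; the function names, the `JUDGE_MODEL`/`JUDGE_SCHEMA` defaults, and passing the policy as `-c 'approval_policy="never"'` and the JSON Schema via `--output-schema` are assumptions about how the real scripts spell it.

```shell
#!/usr/bin/env bash
# Sketch only: builds (but does not run) the two `codex exec` invocations behind
# the split trust boundary. Function names, JUDGE_MODEL, JUDGE_SCHEMA, and the
# `-c approval_policy=...` / `--output-schema` spellings are assumptions.
set -euo pipefail

# Print an argv vector as one shell-quoted line instead of executing it.
print_cmd() {
  printf '%q ' "$@"
  printf '\n'
}

# Smoke test: no Codex sandbox and no approval prompts, so the skill can reach
# public endpoints. Flags as documented in the README.
smoke_test_cmd() {
  print_cmd codex exec --json --dangerously-bypass-approvals-and-sandbox "$1"
}

# Judge: read-only sandbox plus approval_policy="never". This limits writes and
# approval prompts; the judge is still a tool-capable agent that can read files,
# so the transcript and SKILL.md it grades remain untrusted input.
judge_cmd() {
  local model="${JUDGE_MODEL:-judge-model}"          # hypothetical default
  local schema="${JUDGE_SCHEMA:-judge.schema.json}"  # hypothetical path
  print_cmd codex exec -s read-only -c 'approval_policy="never"' \
    -m "$model" --output-schema "$schema" "$1"
}

smoke_test_cmd "Smoke-test the weather skill"
judge_cmd "Grade this transcript as pass, fail, or skip"
```

Because each argument vector lives in one function, a unit test can assert that the judge path never picks up the bypass flag, which is the same property the commit's Bats file guards at the documentation level.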
tools/k-skill-qa-bot/README.md

```diff
@@ -1,6 +1,6 @@
 # k-skill-qa-bot
 
-Automated QA daemon for the **k-skill** skill library. Runs every 3 days via macOS launchd, tests every skill via `codex exec --json --dangerously-bypass-approvals-and-sandbox`, has a read-only LLM judge grade pass/fail/skip, and files dedup'd GitHub issues for skills that have broken.
+Automated QA daemon for the **k-skill** skill library. Runs every 3 days via macOS launchd, tests every suitable skill via `codex exec --json --dangerously-bypass-approvals-and-sandbox`, has a read-only/no-approval LLM judge grade pass/fail/skip, and files dedup'd GitHub issues for skills that have broken.
 
 ## What it does
 
```
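The README intro says the daemon runs every 3 days via macOS launchd. A minimal LaunchAgent property list for that cadence might look like the one generated below, assuming a `StartInterval` of 3 × 24 × 3600 = 259200 seconds. The label and program path are hypothetical placeholders, and the bot's real plist may use different keys. Nothing in these keys changes the user account, container, or filesystem view the job runs with, which is why the trust-boundary notes call a LaunchAgent scheduling isolation only.

```shell
#!/usr/bin/env bash
# Sketch: emit a minimal LaunchAgent plist that starts a job every 3 days.
# The label and program path are hypothetical, not the bot's real values.
set -euo pipefail

INTERVAL_SECONDS=$((3 * 24 * 60 * 60))  # 3 days in seconds

write_plist() {
  local label="$1" program="$2"
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>${label}</string>
  <key>ProgramArguments</key>
  <array>
    <string>${program}</string>
  </array>
  <key>StartInterval</key>
  <integer>${INTERVAL_SECONDS}</integer>
</dict>
</plist>
EOF
}

write_plist com.example.k-skill-qa-bot /path/to/qa-bot-run.sh
```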
```diff
@@ -8,7 +8,7 @@ Automated QA daemon for the **k-skill** skill library. Runs every 3 days via mac
 2. **Discovers** every `<skill>/SKILL.md`.
 3. **Classifies** each skill (read-only / location / login / destructive / api-key / proxy-dependent / deprecated).
 4. **Runs** each suitable skill through `codex exec --json --dangerously-bypass-approvals-and-sandbox` with a smoke-test prompt synthesized from the skill's `## When to use` bullets. The daemon runs as a dedicated LaunchAgent with non-interactive approvals; avoiding the Codex sandbox prevents false DNS/network failures during skill smoke tests.
-5. **Judges** the result via a second read-only `codex exec` call using the configured judge model and a strict JSON Schema.
+5. **Judges** the result via a second read-only/no-approval `codex exec` call using the configured judge model and a strict JSON Schema.
 6. **Files** dedup'd issues on `NomaDamas/k-skill` for true failures (with `auto-qa` label). Skipped skills (deprecated, login-required, missing API key) never create issues.
 
 The k-skill repo itself is **never modified** by the bot — it is read-only SSOT. Test prompts are synthesized from each `SKILL.md`.
```
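Steps 2 and 4 in this hunk (discover each `<skill>/SKILL.md`, then synthesize a smoke-test prompt from its `## When to use` bullets) can be sketched as follows. The helper names, the awk section rule, taking only the first bullet, and the prompt wording are assumptions for illustration, not the bot's actual synthesis logic.

```shell
#!/usr/bin/env bash
# Sketch: discover <skill>/SKILL.md files and derive one smoke-test prompt each
# from the "## When to use" bullets. Helper names, the one-bullet rule, and the
# prompt template are hypothetical.
set -euo pipefail

# Print bullet text under "## When to use", stopping at the next "## " heading.
when_to_use_bullets() {
  awk '
    /^## / { in_section = ($0 ~ /^## When to use *$/); next }
    in_section && /^[-*] / { sub(/^[-*] /, ""); print }
  ' "$1"
}

# Emit "<skill><TAB><prompt>" for every skill directory directly under $1.
synthesize_prompts() {
  local root="$1" skill_md skill bullet
  for skill_md in "$root"/*/SKILL.md; do
    [ -f "$skill_md" ] || continue   # unmatched glob stays literal; skip it
    skill="$(basename "$(dirname "$skill_md")")"
    # sed reads all input, so awk never hits SIGPIPE under pipefail.
    bullet="$(when_to_use_bullets "$skill_md" | sed -n '1p')"
    [ -n "$bullet" ] || continue     # no trigger bullets: nothing to test
    printf '%s\t%s\n' "$skill" "Use the $skill skill to: $bullet"
  done
}

# Example: synthesize_prompts "$K_QA_HOME/k-skill-clone"
```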
```diff
@@ -86,13 +86,14 @@ bash ~/.local/share/k-skill-qa-bot/uninstall.sh --yes --purge --purge-logs
 
 ## Safety
 
-- Skill smoke tests use `--dangerously-bypass-approvals-and-sandbox` because this bot runs unattended as a dedicated LaunchAgent and the Codex sandbox can block legitimate DNS/network lookups.
-- The LLM judge stays on the safer `-s read-only` path with `approval_policy="never"`; it only reads transcripts/prompts and emits JSON.
+- Skill smoke tests use `--dangerously-bypass-approvals-and-sandbox` because the Codex sandbox can block legitimate DNS/network lookups for public skill endpoints exercised by smoke tests.
+- A dedicated LaunchAgent is scheduling isolation only; it is not a separate OS user, container, or filesystem sandbox.
+- The LLM judge stays on the safer `-s read-only` path with `approval_policy="never"`; read-only/no-approval limits writes and approval prompts, but does not make the judge a no-tools or file-isolated model call. Treat transcript and skill Markdown as untrusted input.
 - 10 destructive/login-required skills are force-skipped before any codex call is issued.
 - Deprecated skills (`~~name~~ ⚠️ 지원 중단` in README) are detected and skipped.
 - `update-clone.sh` refuses any `K_SKILL_CLONE` outside `K_QA_HOME/k-skill-clone` unless `ALLOW_EXTERNAL_CLONE_TARGET=1` (prevents the script from git-reset'ing the wrong directory).
 - `CREATE_ISSUES=false` first-run default prevents accidental issue spam.
-- Local state only: `~/.local/share/k-skill-qa-bot/`. No network egress except git fetch, codex API, gh API, k-skill-proxy health check.
+- Local state only: `~/.local/share/k-skill-qa-bot/`. Expected network egress is limited to git fetch, codex API, gh API, k-skill-proxy health checks, and the public skill endpoints exercised by smoke tests.
 
 ## Troubleshooting
 
```
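The `update-clone.sh` guard listed under Safety can be sketched as a small check that runs before any destructive git command. The function name is hypothetical, and this version conservatively treats every path other than `$K_QA_HOME/k-skill-clone` (after trimming trailing slashes) as external; the real script may compare paths differently.

```shell
#!/usr/bin/env bash
# Sketch of the clone-target guard the Safety section attributes to update-clone.sh.
# check_clone_target is a hypothetical name. This version conservatively treats
# any path other than "$K_QA_HOME/k-skill-clone" as external.
set -euo pipefail

check_clone_target() {
  local home="${K_QA_HOME:?K_QA_HOME must be set}"
  local expected="${home%/}/k-skill-clone"
  local target="${K_SKILL_CLONE:-$expected}"

  # Drop trailing slashes so ".../k-skill-clone/" compares equal.
  while [[ "$target" == */ && "$target" != / ]]; do
    target="${target%/}"
  done

  if [[ "$target" == "$expected" ]]; then
    return 0
  fi
  if [[ "${ALLOW_EXTERNAL_CLONE_TARGET:-0}" == 1 ]]; then
    echo "warning: using external clone target $target" >&2
    return 0
  fi
  echo "refusing K_SKILL_CLONE=$target (expected $expected)" >&2
  return 1
}

# Call before any destructive git operation, e.g.:
#   check_clone_target || exit 1
#   git -C "$K_SKILL_CLONE" reset --hard origin/main
```

Exact-match comparison is stricter than a prefix check, but it also avoids accepting paths such as `.../k-skill-clone/../..` that a naive "starts with" test would let through.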
tools/k-skill-qa-bot/test/bats/docs_trust_boundary.bats

```diff
@@ -0,0 +1,25 @@
+#!/usr/bin/env bats
+
+setup() {
+  QA_BOT_ROOT="$(cd "$BATS_TEST_DIRNAME/../.." && pwd)"
+  README="$QA_BOT_ROOT/README.md"
+  AGENTS="$QA_BOT_ROOT/AGENTS.md"
+}
+
+@test "README accurately documents judge trust boundary" {
+  run grep -F 'it only reads transcripts/prompts and emits JSON' "$README"
+  [ "$status" -ne 0 ]
+
+  grep -Fq 'read-only/no-approval limits writes and approval prompts, but does not make the judge a no-tools or file-isolated model call' "$README"
+  grep -Fq 'Treat transcript and skill Markdown as untrusted input' "$README"
+}
+
+@test "README accurately documents smoke-test egress and LaunchAgent boundary" {
+  grep -Fq 'public skill endpoints exercised by smoke tests' "$README"
+  grep -Fq 'A dedicated LaunchAgent is scheduling isolation only; it is not a separate OS user, container, or filesystem sandbox' "$README"
+}
+
+@test "QA-bot AGENTS guidance preserves split trust boundary" {
+  grep -Fq 'Smoke tests intentionally run unsandboxed and may contact public skill endpoints' "$AGENTS"
+  grep -Fq 'The judge uses read-only/no-approval Codex settings, but is still a tool-capable Codex agent over untrusted transcripts and skill Markdown' "$AGENTS"
+}
```
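The Bats file pins exact wording with `grep -F`, plus one negative check (`run grep ...` followed by `[ "$status" -ne 0 ]`) that the old overstated sentence is gone. The same checks can be sketched as plain bash helpers for a quick run without Bats; the helper names are hypothetical. One difference is deliberate: `grep` exits 2 when the file cannot be read, and `-ne 0` accepts that as a pass, so this sketch checks readability before the negative match.

```shell
#!/usr/bin/env bash
# Sketch: the Bats file's fixed-string wording checks as plain bash helpers.
# require_phrase and forbid_phrase are hypothetical names, not part of the bot.
set -euo pipefail

# Succeed only if the file contains the exact phrase (grep -F: no regex).
require_phrase() {
  local file="$1" phrase="$2"
  if ! grep -Fq -- "$phrase" "$file"; then
    echo "missing from $file: $phrase" >&2
    return 1
  fi
}

# Succeed only if the file is readable and does NOT contain the phrase.
# The -r check matters: grep exits 2 on a missing file, which a bare
# "status != 0" assertion would count as a pass.
forbid_phrase() {
  local file="$1" phrase="$2"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  if grep -Fq -- "$phrase" "$file"; then
    echo "overstated wording still in $file: $phrase" >&2
    return 1
  fi
}

# Example, run from tools/k-skill-qa-bot:
#   forbid_phrase README.md 'it only reads transcripts/prompts and emits JSON'
#   require_phrase README.md 'Treat transcript and skill Markdown as untrusted input'
```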