refactor: enhance glossary clarity and structure

Link skills into ~/.agents/skills as well as ~/.claude/skills
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 08:24:06 +00:00 · 2026-06-24 11:29:44 +01:00 · 2026-06-24 11:27:21 +01:00 · 2026-06-24 11:26:57 +01:00 · 2026-06-24 11:26:57 +01:00 · 2026-06-22 13:15:30 +01:00
94 changed files with 4265 additions and 838 deletions
--- a/.changeset/README.md
+++ b/.changeset/README.md
@ -0,0 +1,8 @@
+# Changesets
+
+Hello and welcome! This folder has been automatically generated by `@changesets/cli`, a build tool that works
+with multi-package repos, or single-package repos to help you version and publish your code. You can
+find the full documentation for it [in our repository](https://github.com/changesets/changesets).
+
+We have a quick list of common questions to get you started engaging with this project in
+[our documentation](https://github.com/changesets/changesets/blob/main/docs/common-questions.md).
--- a/.changeset/config.json
+++ b/.changeset/config.json
@ -0,0 +1,15 @@
+{
+  "$schema": "https://unpkg.com/@changesets/config@3.1.4/schema.json",
+  "changelog": [
+    "@changesets/changelog-github",
+    { "repo": "mattpocock/skills" }
+  ],
+  "commit": false,
+  "privatePackages": { "version": true, "tag": true },
+  "fixed": [],
+  "linked": [],
+  "access": "restricted",
+  "baseBranch": "main",
+  "updateInternalDependencies": "patch",
+  "ignore": []
+}
--- a/.changeset/triage-pull-requests.md
+++ b/.changeset/triage-pull-requests.md
@ -0,0 +1,5 @@
+---
+"mattpocock-skills": patch
+---
+
+Extend the **`triage`** skill to triage external pull requests, treating a PR as an issue with attached code that runs through the same roles and state machine. PRs flow inline alongside issues (gated by a per-repo setup toggle), discovery surfaces only external PRs, the bug-only "reproduce" step is generalized into a single "verify the claim" step, and a redundancy check resolves already-implemented requests to `wontfix` without polluting the out-of-scope knowledge base. `setup-matt-pocock-skills` gains the PRs-as-a-request-surface toggle for GitHub/GitLab.
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@ -0,0 +1,22 @@
+{
+  "name": "mattpocock-skills",
+  "skills": [
+    "./skills/engineering/ask-matt",
+    "./skills/engineering/diagnosing-bugs",
+    "./skills/engineering/grill-with-docs",
+    "./skills/engineering/triage",
+    "./skills/engineering/improve-codebase-architecture",
+    "./skills/engineering/setup-matt-pocock-skills",
+    "./skills/engineering/tdd",
+    "./skills/engineering/to-issues",
+    "./skills/engineering/to-prd",
+    "./skills/engineering/prototype",
+    "./skills/engineering/domain-modeling",
+    "./skills/engineering/codebase-design",
+    "./skills/productivity/grill-me",
+    "./skills/productivity/grilling",
+    "./skills/productivity/handoff",
+    "./skills/productivity/teach",
+    "./skills/productivity/writing-great-skills"
+  ]
+}
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@ -0,0 +1,37 @@
+name: Release
+
+on:
+  push:
+    branches:
+      - main
+
+concurrency: ${{ github.workflow }}-${{ github.ref }}
+
+jobs:
+  release:
+    name: Version
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      pull-requests: write
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 22
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Create Version Pull Request
+        uses: changesets/action@v1
+        with:
+          version: npx changeset version
+          publish: npx changeset tag
+          commit: "chore: version skills"
+          title: "chore: version skills"
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1 @@
+node_modules
--- a/.out-of-scope/mainstream-issue-trackers-only.md
+++ b/.out-of-scope/mainstream-issue-trackers-only.md
@ -0,0 +1,25 @@
+# Issue tracker integrations are limited to mainstream tools
+
+`setup-matt-pocock-skills` only offers first-class support for **mainstream** issue trackers. Requests to add support for niche, new, or single-vendor experimental trackers are out of scope.
+
+## Why this is out of scope
+
+Every issue-tracker backend hard-codes a CLI shape into the skills (commands, flags, output parsing). Each new backend is permanent maintenance surface — it has to keep working as the tool's CLI evolves, and it has to keep being tested against `/to-prd`, `/to-issues`, `/triage`, and friends. That cost is only worth paying for trackers a meaningful fraction of users actually have.
+
+"Mainstream" is a judgment call, not a numeric bar:
+
+- GitHub, GitLab, and Backlog.md are the kind of tools we'd consider mainstream — broadly known, widely used, well past the experimental phase.
+- A brand-new agent-focused tool with a few hundred GitHub stars is not, no matter how interesting the design.
+
+Stars, age, and download counts are useful signals when making the call but none of them is the rule. The rule is: would a typical engineer recognise this tool and have plausibly chosen it for their team?
+
+The escape hatches for non-mainstream trackers already exist:
+
+- `local markdown` for lightweight in-repo tracking.
+- `other/custom` for users who want to wire something up themselves.
+
+Neither requires the core skills to know about the specific tool.
+
+## Prior requests
+
+- #99 — "Add dex as an issue tracker backend" (dex was ~3 months old and ~300 stars at the time of the request)
--- a/.out-of-scope/question-limits.md
+++ b/.out-of-scope/question-limits.md
@ -0,0 +1,18 @@
+# Hard limits on the number of questions during grilling
+
+The `/grill-me` skill (and grilling sessions inside other skills) does not enforce a maximum number of questions. Requests to add a configurable cap or hard ceiling are out of scope.
+
+## Why this is out of scope
+
+Grilling is intentionally open-ended. The point is to keep digging until each branch of the decision tree is resolved — some plans need three questions, some need fifty. A fixed cap would either cut off useful exploration on hard problems or feel arbitrary on easy ones.
+
+If a session feels too long, the right escape hatches already exist:
+
+- The user can stop the session at any time and accept the current state of the plan.
+- The user can tell the model to wrap up, summarise, and move on — natural-language steering is the intended control surface, not a numeric limit.
+
+Adding a hard cap would also conflate two different failure modes: a model that asks too many questions because the plan is genuinely under-specified (working as intended) vs. a model that asks redundant or low-value questions (a prompt-quality issue, not a quantity issue). The fix for the latter belongs in the skill prompt, not in a counter.
+
+## Prior requests
+
+- #44 — "Codex just asked me 200 questions"
--- a/.out-of-scope/setup-skill-verify-mode.md
+++ b/.out-of-scope/setup-skill-verify-mode.md
@ -0,0 +1,15 @@
+# Verify/Check Mode for `setup-matt-pocock-skills`
+
+This project will not add a dedicated verify/check mode (or a separate verify skill) for `setup-matt-pocock-skills`.
+
+## Why this is out of scope
+
+A second skill — or a `--verify` flag — for checking whether `docs/agents/*.md` artifacts still match the seed-template schema would duplicate work the existing setup skill already handles in conversation.
+
+The intended workflow is: **run `/setup-matt-pocock-skills` and tell it to verify your current setup.** The skill is prompt-driven, so the maintainer can scope it to a verification pass ("don't rewrite anything, just check my existing files against the current seed templates and report drift") without needing a separate code path. Adding a flag or a sibling skill would split the surface area of a feature that's already expressible through the natural-language entry point.
+
+Keeping configuration management to a single skill also avoids the maintenance cost of two skills drifting from each other when seed templates evolve.
+
+## Prior requests
+
+- #106 — Feature request: verify/check mode for setup-matt-pocock-skills
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -0,0 +1,54 @@
+# mattpocock-skills
+
+## 1.0.1
+
+### Patch Changes
+
+- [`d20ee26`](https://github.com/mattpocock/skills/commit/d20ee2684e2a9442698ac3c1e0f2c5b68c4cf296) Thanks [@mattpocock](https://github.com/mattpocock)! - Make the **`teach`** skill reuse-first. Lessons are now built from reusable **components** in `./assets/` — stylesheets, quiz widgets, simulators, diagram helpers. Reuse is the default: the agent reads `./assets/` before authoring a lesson, builds from what's there, and extracts anything new and reusable into a component rather than inlining it.
+
+## 1.0.0
+
+### Major Changes
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Add the **`ask-matt`** skill — a user-invoked router that points you at the right skill or flow for your situation.
+
+  **Breaking:** `ask-matt` routes over the other user-invoked skills in this repo, so it expects them to be installed.
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Add the shared design skills and rewire existing skills onto them.
+
+  - New **`codebase-design`** skill — the deep-module vocabulary (module, interface, depth, seam, adapter) and the principles for putting a lot of behaviour behind a small interface. The language that previously lived in `improve-codebase-architecture/LANGUAGE.md` now lives here, generalized for reuse across skills.
+  - New **`domain-modeling`** skill — actively build and sharpen a project's domain model, stress-testing terms against the glossary and keeping `CONTEXT.md` and ADRs current.
+  - `improve-codebase-architecture` now draws its architecture vocabulary from `/codebase-design` and its domain model from `/domain-modeling`.
+  - `tdd` now leans on `/codebase-design` for interface-design guidance — its inline `deep-modules.md` / `interface-design.md` notes were removed in favour of the shared skill.
+  - `grill-with-docs` now builds the domain model inline via `/domain-modeling`.
+
+  **Breaking:** these skills now depend on the new `codebase-design` / `domain-modeling` skills, so you must install them too.
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Remove the **`caveman`** and **`zoom-out`** skills.
+
+  - `caveman` was a duplicate of another skill I was testing and was never meant to be public.
+  - `zoom-out` went unused in practice, so it's been removed from the repo.
+
+  **Breaking:** both skills have been removed.
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Rename the **`diagnose`** skill to **`diagnosing-bugs`**.
+
+  **Breaking:** invoke it as `/diagnosing-bugs` — the old `/diagnose` name no longer exists.
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Replace **`write-a-skill`** with **`writing-great-skills`**.
+
+  - Removed `write-a-skill`.
+  - Added `writing-great-skills` (plus its `GLOSSARY.md`) — a reference for writing and editing skills well: the vocabulary and principles that make a skill predictable, hunting no-ops down to the sentence level.
+  - Exposed `grilling` as a model-invoked skill — the reusable interview loop behind `grill-me` and `grill-with-docs`.
+
+  **Breaking:** `write-a-skill` has been removed; use `writing-great-skills` instead.
+
+### Minor Changes
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Add the **`resolving-merge-conflicts`** skill — a loop for resolving an in-progress git merge or rebase conflict. Standalone, with no dependencies on other skills.
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Rename the skill taxonomy from **Commands / Skills** to **User-invoked / Model-invoked** across the docs, and add `docs/invocation.md` defining the split: user-invoked skills are reachable only when you type them and exist to orchestrate; model-invoked skills can also be reached automatically when the task fits. A user-invoked skill may invoke model-invoked skills, but never another user-invoked one.
+
+### Patch Changes
+
+- [`47bde84`](https://github.com/mattpocock/skills/commit/47bde84da032afb2e5058f997f3bbca47d321dbd) Thanks [@mattpocock](https://github.com/mattpocock)! - Tighten the **`review`** skill: fail-fast ref check, single-sourced rules, and no-op cuts.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,16 @@
+Skills are organized into bucket folders under `skills/`:
+
+- `engineering/` — daily code work
+- `productivity/` — daily non-code workflow tools
+- `misc/` — kept around but rarely used
+- `personal/` — tied to my own setup, not promoted
+- `in-progress/` — drafts not yet ready to ship
+- `deprecated/` — no longer used
+
+Every skill in `engineering/`, `productivity/`, or `misc/` must have a reference in the top-level `README.md` and an entry in `.claude-plugin/plugin.json`. Skills in `personal/`, `in-progress/`, and `deprecated/` must not appear in either.
+
+Each skill entry in the top-level `README.md` must link the skill name to its `SKILL.md`.
+
+Each bucket folder has a `README.md` that lists every skill in the bucket with a one-line description, with the skill name linked to its `SKILL.md`. Bucket `README.md`s and the top-level `README.md` group entries into **User-invoked** and **Model-invoked**.
+
+Every `SKILL.md` is either user-invoked (`disable-model-invocation: true`, reachable only by the human) or model-invoked (model- or user-reachable). For the full definitions, description conventions, and why a user-invoked skill can invoke model-invoked skills but never another user-invoked one, see [docs/invocation.md](./docs/invocation.md).
--- a/CONTEXT.md
+++ b/CONTEXT.md
@ -0,0 +1,26 @@
+# Matt Pocock Skills
+
+A collection of agent skills (slash commands and behaviors) loaded by Claude Code. Skills are organized into buckets and consumed by per-repo configuration emitted by `/setup-matt-pocock-skills`.
+
+## Language
+
+**Issue tracker**:
+The tool that hosts a repo's issues — GitHub Issues, Linear, a local `.scratch/` markdown convention, or similar. Skills like `to-issues`, `to-prd`, `triage`, and `qa` read from and write to it.
+_Avoid_: backlog manager, backlog backend, issue host
+
+**Issue**:
+A single tracked unit of work inside an **Issue tracker** — a bug, task, PRD, or slice produced by `to-issues`.
+_Avoid_: ticket (use only when quoting external systems that call them tickets)
+
+**Triage role**:
+A canonical state-machine label applied to an **Issue** during triage (e.g. `needs-triage`, `ready-for-afk`). Each role maps to a real label string in the **Issue tracker** via `docs/agents/triage-labels.md`.
+
+## Relationships
+
+- An **Issue tracker** holds many **Issues**
+- An **Issue** carries one **Triage role** at a time
+
+## Flagged ambiguities
+
+- "backlog" was previously used to mean both the *tool* hosting issues and the *body of work* inside it — resolved: the tool is the **Issue tracker**; "backlog" is no longer used as a domain term.
+- "backlog backend" / "backlog manager" — resolved: collapsed into **Issue tracker**.
--- a/README.md
+++ b/README.md
@ -1,115 +1,190 @@
-# Agent Skills For Real Engineers
+<p>
+  <a href="https://www.aihero.dev/s/skills-newsletter">
+    <picture>
+      <source media="(prefers-color-scheme: dark)" srcset="https://res.cloudinary.com/total-typescript/image/upload/v1777382277/skills-repo-dark_2x.png">
+      <source media="(prefers-color-scheme: light)" srcset="https://res.cloudinary.com/total-typescript/image/upload/v1777382277/skill-repo-light_2x.png">
+      <img alt="Skills" src="https://res.cloudinary.com/total-typescript/image/upload/v1777382277/skill-repo-light_2x.png" width="369">
+    </picture>
+  </a>
+</p>
+
+# Skills For Real Engineers
+
+[![skills.sh](https://skills.sh/b/mattpocock/skills)](https://skills.sh/mattpocock/skills)

 My agent skills that I use every day to do real engineering - not vibe coding.

+Developing real applications is hard. Approaches like GSD, BMAD, and Spec-Kit try to help by owning the process. But while doing so, they take away your control and make bugs in the process hard to resolve.
+
+These skills are designed to be small, easy to adapt, and composable. They work with any model. They're based on decades of engineering experience. Hack around with them. Make them your own. Enjoy.
+
 If you want to keep up with changes to these skills, and any new ones I create, you can join ~60,000 other devs on my newsletter:

 [Sign Up To The Newsletter](https://www.aihero.dev/s/skills-newsletter)

-## Planning & Design
+## Quickstart (30-second setup)

-These skills help you think through problems before writing code.
+1. Run the skills.sh installer:

- **to-prd** — Turn the current conversation context into a PRD and submit it as a GitHub issue. No interview — just synthesizes what you've already discussed.
+```bash
+npx skills@latest add mattpocock/skills
+```

-  ```
-  npx skills@latest add mattpocock/skills/to-prd
-  ```
+2. Pick the skills you want, and which coding agents you want to install them on. **Make sure you select `/setup-matt-pocock-skills`**.

- **to-issues** — Break any plan, spec, or PRD into independently-grabbable GitHub issues using vertical slices.
+3. Run `/setup-matt-pocock-skills` in your agent. It will:
+   - Ask you which issue tracker you want to use (GitHub, Linear, or local files)
+   - Ask you what labels you apply to tickets when you triage them (`/triage` uses labels)
+   - Ask you where you want to save any docs we create

-  ```
-  npx skills@latest add mattpocock/skills/to-issues
-  ```
+4. Bam - you're ready to go.

- **grill-me** — Get relentlessly interviewed about a plan or design until every branch of the decision tree is resolved.
+## Why These Skills Exist

-  ```
-  npx skills@latest add mattpocock/skills/grill-me
-  ```
+I built these skills as a way to fix common failure modes I see with Claude Code, Codex, and other coding agents.

- **design-an-interface** — Generate multiple radically different interface designs for a module using parallel sub-agents.
+### #1: The Agent Didn't Do What I Want

-  ```
-  npx skills@latest add mattpocock/skills/design-an-interface
-  ```
+> "No-one knows exactly what they want"
+>
+> David Thomas & Andrew Hunt, [The Pragmatic Programmer](https://www.amazon.co.uk/Pragmatic-Programmer-Anniversary-Journey-Mastery/dp/B0833F1T3V)

- **request-refactor-plan** — Create a detailed refactor plan with tiny commits via user interview, then file it as a GitHub issue.
+**The Problem**. The most common failure mode in software development is misalignment. You think the dev knows what you want. Then you see what they've built - and you realize it didn't understand you at all.

-  ```
-  npx skills@latest add mattpocock/skills/request-refactor-plan
-  ```
+This is just the same in the AI age. There is a communication gap between you and the agent. The fix for this is a **grilling session** - getting the agent to ask you detailed questions about what you're building.

-## Development
+**The Fix** is to use:

-These skills help you write, refactor, and fix code.
+- [`/grill-me`](./skills/productivity/grill-me/SKILL.md) - for non-code uses
+- [`/grill-with-docs`](./skills/engineering/grill-with-docs/SKILL.md) - same as [`/grill-me`](./skills/productivity/grill-me/SKILL.md), but adds more goodies (see below)

- **tdd** — Test-driven development with a red-green-refactor loop. Builds features or fixes bugs one vertical slice at a time.
+These are my most popular skills. They help you align with the agent before you get started, and think deeply about the change you're making. Use them _every_ time you want to make a change.

-  ```
-  npx skills@latest add mattpocock/skills/tdd
-  ```
+### #2: The Agent Is Way Too Verbose

- **triage-issue** — Investigate a bug by exploring the codebase, identify the root cause, and file a GitHub issue with a TDD-based fix plan.
+> With a ubiquitous language, conversations among developers and expressions of the code are all derived from the same domain model.
+>
+> Eric Evans, [Domain-Driven-Design](https://www.amazon.co.uk/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215)

-  ```
-  npx skills@latest add mattpocock/skills/triage-issue
-  ```
+**The Problem**: At the start of a project, devs and the people they're building the software for (the domain experts) are usually speaking different languages.

- **improve-codebase-architecture** — Find deepening opportunities in a codebase, informed by the domain language in `CONTEXT.md` and the decisions in `docs/adr/`.
+I felt the same tension with my agents. Agents are usually dropped into a project and asked to figure out the jargon as they go. So they use 20 words where 1 will do.

-  ```
-  npx skills@latest add mattpocock/skills/improve-codebase-architecture
-  ```
+**The Fix** for this is a shared language. It's a document that helps agents decode the jargon used in the project.

- **migrate-to-shoehorn** — Migrate test files from `as` type assertions to @total-typescript/shoehorn.
+<details>
+<summary>
+Example
+</summary>

-  ```
-  npx skills@latest add mattpocock/skills/migrate-to-shoehorn
-  ```
+Here's an example [`CONTEXT.md`](https://github.com/mattpocock/course-video-manager/blob/076a5a7a182db0fe1e62971dd7a68bcadf010f1c/CONTEXT.md), from my `course-video-manager` repo. Which one is easier to read?

- **scaffold-exercises** — Create exercise directory structures with sections, problems, solutions, and explainers.
+- **BEFORE**: "There's a problem when a lesson inside a section of a course is made 'real' (i.e. given a spot in the file system)"
+- **AFTER**: "There's a problem with the materialization cascade"

-  ```
-  npx skills@latest add mattpocock/skills/scaffold-exercises
-  ```
+This concision pays off session after session.

-## Tooling & Setup
+</details>

- **setup-pre-commit** — Set up Husky pre-commit hooks with lint-staged, Prettier, type checking, and tests.
+This is built into [`/grill-with-docs`](./skills/engineering/grill-with-docs/SKILL.md). It's a grilling session, but that helps you build a shared language with the AI, and document hard-to-explain decisions in ADR's.

-  ```
-  npx skills@latest add mattpocock/skills/setup-pre-commit
-  ```
+It's hard to explain how powerful this is. It might be the single coolest technique in this repo. Try it, and see.

- **git-guardrails-claude-code** — Set up Claude Code hooks to block dangerous git commands (push, reset --hard, clean, etc.) before they execute.
+> [!TIP]
+> A shared language has many other benefits than reducing verbosity:
+>
+> - **Variables, functions and files are named consistently**, using the shared language
+> - As a result, the **codebase is easier to navigate** for the agent
+> - The agent also **spends fewer tokens on thinking**, because it has access to a more concise language

-  ```
-  npx skills@latest add mattpocock/skills/git-guardrails-claude-code
-  ```
+### #3: The Code Doesn't Work

-## Writing & Knowledge
+> "Always take small, deliberate steps. The rate of feedback is your speed limit. Never take on a task that’s too big."
+>
+> David Thomas & Andrew Hunt, [The Pragmatic Programmer](https://www.amazon.co.uk/Pragmatic-Programmer-Anniversary-Journey-Mastery/dp/B0833F1T3V)

- **write-a-skill** — Create new skills with proper structure, progressive disclosure, and bundled resources.
+**The Problem**: Let's say that you and the agent are aligned on what to build. What happens when the agent _still_ produces crap?

-  ```
-  npx skills@latest add mattpocock/skills/write-a-skill
-  ```
+It's time to look at your feedback loops. Without feedback on how the code it produces actually runs, the agent will be flying blind.

- **edit-article** — Edit and improve articles by restructuring sections, improving clarity, and tightening prose.
+**The Fix**: You need the usual tranche of feedback loops: static types, browser access, and automated tests.

-  ```
-  npx skills@latest add mattpocock/skills/edit-article
-  ```
+For automated tests, a red-green-refactor loop is critical. This is where the agent writes a failing test first, then fixes the test. This helps give the agent a consistent level of feedback that results in far better code.

- **ubiquitous-language** — Extract a DDD-style ubiquitous language glossary from the current conversation.
+I've built a **[`/tdd`](./skills/engineering/tdd/SKILL.md) skill** you can slot into any project. It encourages red-green-refactor and gives the agent plenty of guidance on what makes good and bad tests.

-  ```
-  npx skills@latest add mattpocock/skills/ubiquitous-language
-  ```
+For debugging, I've also built a **[`/diagnosing-bugs`](./skills/engineering/diagnosing-bugs/SKILL.md)** skill that wraps best debugging practices into a simple loop.

- **obsidian-vault** — Search, create, and manage notes in an Obsidian vault with wikilinks and index notes.
+### #4: We Built A Ball Of Mud

-  ```
-  npx skills@latest add mattpocock/skills/obsidian-vault
-  ```
+> "Invest in the design of the system _every day_."
+>
+> Kent Beck, [Extreme Programming Explained](https://www.amazon.co.uk/Extreme-Programming-Explained-Embrace-Change/dp/0321278658)
+
+> "The best modules are deep. They allow a lot of functionality to be accessed through a simple interface."
+>
+> John Ousterhout, [A Philosophy Of Software Design](https://www.amazon.co.uk/Philosophy-Software-Design-2nd/dp/173210221X)
+
+**The Problem**: Most apps built with agents are complex and hard to change. Because agents can radically speed up coding, they also accelerate software entropy. Codebases get more complex at an unprecedented rate.
+
+**The Fix** for this is a radical new approach to AI-powered development: caring about the design of the code.
+
+This is built in to every layer of these skills:
+
+- [`/to-prd`](./skills/engineering/to-prd/SKILL.md) quizzes you about which modules you're touching before creating a PRD
+
+And crucially, [`/improve-codebase-architecture`](./skills/engineering/improve-codebase-architecture/SKILL.md) helps you rescue a codebase that has become a ball of mud. I recommend running it on your codebase once every few days.
+
+### Summary
+
+Software engineering fundamentals matter more than ever. These skills are my best effort at condensing these fundamentals into repeatable practices, to help you ship the best apps of your career. Enjoy.
+
+## Reference
+
+These split on one axis — who can invoke them. **User-invoked** skills are reachable only when you type them (e.g. `/grill-me`); their job is to orchestrate. **Model-invoked** skills can be invoked by you _or_ reached for automatically by the agent when the task fits; they hold the reusable discipline. A user-invoked skill may invoke model-invoked skills, but never another user-invoked one.
+
+### Engineering
+
+Skills I use daily for code work.
+
+**User-invoked**
+
+- **[ask-matt](./skills/engineering/ask-matt/SKILL.md)** — Ask which skill or flow fits your situation. A router over the user-invoked skills in this repo.
+- **[grill-with-docs](./skills/engineering/grill-with-docs/SKILL.md)** — Grilling session that also builds your project's domain model, sharpening terminology and updating `CONTEXT.md` and ADRs inline.
+- **[triage](./skills/engineering/triage/SKILL.md)** — Move issues through a state machine of triage roles.
+- **[improve-codebase-architecture](./skills/engineering/improve-codebase-architecture/SKILL.md)** — Scan a codebase for deepening opportunities, present them as a visual HTML report, then grill through whichever one you pick.
+- **[setup-matt-pocock-skills](./skills/engineering/setup-matt-pocock-skills/SKILL.md)** — Configure this repo for the engineering skills (issue tracker, triage labels, domain doc layout). Run once per repo before using the other engineering skills.
+- **[to-issues](./skills/engineering/to-issues/SKILL.md)** — Break any plan, spec, or PRD into independently-grabbable issues using vertical slices.
+- **[to-prd](./skills/engineering/to-prd/SKILL.md)** — Turn the current conversation into a PRD and publish it to the issue tracker. No interview — just synthesizes what you've already discussed.
+- **[prototype](./skills/engineering/prototype/SKILL.md)** — Build a throwaway prototype to flesh out a design — either a runnable terminal app for state/business-logic questions, or several radically different UI variations toggleable from one route.
+
+**Model-invoked**
+
+- **[diagnosing-bugs](./skills/engineering/diagnosing-bugs/SKILL.md)** — Disciplined diagnosis loop for hard bugs and performance regressions: reproduce → minimise → hypothesise → instrument → fix → regression-test.
+- **[tdd](./skills/engineering/tdd/SKILL.md)** — Test-driven development with a red-green-refactor loop. Builds features or fixes bugs one vertical slice at a time.
+- **[domain-modeling](./skills/engineering/domain-modeling/SKILL.md)** — Actively build and sharpen a project's domain model — challenge terms against the glossary, stress-test with edge-case scenarios, and update `CONTEXT.md` and ADRs inline.
+- **[codebase-design](./skills/engineering/codebase-design/SKILL.md)** — Shared discipline and vocabulary for designing deep modules: a lot of behaviour behind a small interface, placed at a clean seam, testable through that interface.
+
+### Productivity
+
+General workflow tools, not code-specific.
+
+**User-invoked**
+
+- **[grill-me](./skills/productivity/grill-me/SKILL.md)** — Get relentlessly interviewed about a plan or design until every branch of the decision tree is resolved.
+- **[handoff](./skills/productivity/handoff/SKILL.md)** — Compact the current conversation into a handoff document so another agent can continue the work.
+- **[teach](./skills/productivity/teach/SKILL.md)** — Teach the user a new skill or concept over multiple sessions, using the current directory as a stateful teaching workspace.
+- **[writing-great-skills](./skills/productivity/writing-great-skills/SKILL.md)** — Reference for writing and editing skills well: the vocabulary and principles that make a skill predictable.
+
+**Model-invoked**
+
+- **[grilling](./skills/productivity/grilling/SKILL.md)** — Interview the user relentlessly about a plan or design until every branch of the decision tree is resolved. The reusable loop behind `grill-me` and `grill-with-docs`.
+
+### Misc
+
+Tools I keep around but rarely use.
+
+- **[git-guardrails-claude-code](./skills/misc/git-guardrails-claude-code/SKILL.md)** — Set up Claude Code hooks to block dangerous git commands (push, reset --hard, clean, etc.) before they execute.
+- **[migrate-to-shoehorn](./skills/misc/migrate-to-shoehorn/SKILL.md)** — Migrate test files from `as` type assertions to @total-typescript/shoehorn.
+- **[scaffold-exercises](./skills/misc/scaffold-exercises/SKILL.md)** — Create exercise directory structures with sections, problems, solutions, and explainers.
+- **[setup-pre-commit](./skills/misc/setup-pre-commit/SKILL.md)** — Set up Husky pre-commit hooks with lint-staged, Prettier, type checking, and tests.
--- a/caveman/SKILL.md
+++ b/caveman/SKILL.md
@ -1,49 +0,0 @@
---
-name: caveman
-description: >
-  Ultra-compressed communication mode. Cuts token usage ~75% by dropping
-  filler, articles, and pleasantries while keeping full technical accuracy.
-  Use when user says "caveman mode", "talk like caveman", "use caveman",
-  "less tokens", "be brief", or invokes /caveman.
---
-
-Respond terse like smart caveman. All technical substance stay. Only fluff die.
-
-## Persistence
-
-ACTIVE EVERY RESPONSE once triggered. No revert after many turns. No filler drift. Still active if unsure. Off only when user says "stop caveman" or "normal mode".
-
-## Rules
-
-Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Abbreviate common terms (DB/auth/config/req/res/fn/impl). Strip conjunctions. Use arrows for causality (X -> Y). One word when one word enough.
-
-Technical terms stay exact. Code blocks unchanged. Errors quoted exact.
-
-Pattern: `[thing] [action] [reason]. [next step].`
-
-Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..."
-Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"
-
-### Examples
-
-**"Why React component re-render?"**
-
-> Inline obj prop -> new ref -> re-render. `useMemo`.
-
-**"Explain database connection pooling."**
-
-> Pool = reuse DB conn. Skip handshake -> fast under load.
-
-## Auto-Clarity Exception
-
-Drop caveman temporarily for: security warnings, irreversible action confirmations, multi-step sequences where fragment order risks misread, user asks to clarify or repeats question. Resume caveman after clear part done.
-
-Example -- destructive op:
-
-> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone.
->
-> ```sql
-> DROP TABLE users;
-> ```
->
-> Caveman resume. Verify backup exist first.
--- a/docs/adr/0001-explicit-setup-pointer-only-for-hard-dependencies.md
+++ b/docs/adr/0001-explicit-setup-pointer-only-for-hard-dependencies.md
@ -0,0 +1,10 @@
+# Explicit `/setup-matt-pocock-skills` pointer only for hard dependencies
+
+Engineering skills depend on per-repo config (issue tracker, triage label vocabulary, domain doc layout) seeded by `/setup-matt-pocock-skills`. Some skills cannot meaningfully function without that config — they have to publish to a specific issue tracker or apply a specific label string. Others only use it to sharpen output (vocabulary, ADR awareness) and degrade gracefully without it.
+
+We split these into **hard-dependency** and **soft-dependency** skills:
+
+- **Hard dependency** (`to-issues`, `to-prd`, `triage`) — include an explicit one-liner: _"… should have been provided to you — run `/setup-matt-pocock-skills` if not."_ Without the mapping, output is wrong, not just fuzzy.
+- **Soft dependency** (`diagnose`, `tdd`, `improve-codebase-architecture`) — reference "the project's domain glossary" and "ADRs in the area you're touching" in vague prose only. If the docs aren't there, the skill still works; output is just less sharp.
+
+The split keeps soft-dependency skills token-light and avoids cargo-culting the setup pointer into places where it isn't load-bearing.
--- a/docs/invocation.md
+++ b/docs/invocation.md
@ -0,0 +1,18 @@
+# Model-invoked vs user-invoked
+
+Every `SKILL.md` in this repo is a skill. The one axis that splits them is **invocation** — who can reach it:
+
+- **User-invoked** — reachable **only by the human typing its name**. Set `disable-model-invocation: true` in the frontmatter. The `description` is **human-facing**: a one-line summary read by a person browsing slash-commands. Strip trigger lists ("Use when the user says…").
+- **Model-invoked** — reachable by **model or user**. The default: omit `disable-model-invocation`. The `description` is **model-facing** and keeps rich trigger phrasing ("Use when the user wants…, mentions…, asks for…") so auto-invocation fires. The test for whether a skill should stay model-invoked: _could the model usefully reach for this autonomously?_ (Reuse is the reason to extract a skill, not the test.)
+
+Because a user-invoked skill has no description, nothing but the human can reach it — no other skill can fire it. So a user-invoked skill may invoke model-invoked skills, but it can never reach another user-invoked skill.
+
+Bucket `README.md`s and the top-level `README.md` group entries into **User-invoked** and **Model-invoked**.
+
+## Dependencies between them
+
+Dependencies are expressed as **`/skill`-style prose invocation** ("Run the `/grilling` skill"), not deep `../other-skill/FILE.md` cross-references. Shared reference docs live inside the skill that owns them; other skills reach that material by invoking the skill, not by linking across folders.
+
+## Passive vs active domain work
+
+Merely _reading_ `CONTEXT.md` for vocabulary is a one-line prose pointer, not the `domain-modeling` skill. Only the active build/sharpen discipline (challenge terms, edge-case scenarios, write ADRs, update `CONTEXT.md` inline) is `domain-modeling`.
--- a/github-triage/SKILL.md
+++ b/github-triage/SKILL.md
@ -1,168 +0,0 @@
---
-name: github-triage
-description: Triage GitHub issues through a label-based state machine. Use when user wants to create an issue, triage issues, review incoming bugs or feature requests, prepare issues for an AFK agent, or manage issue workflow.
---
-
-# GitHub Issue Triage
-
-Triage issues in the current repo using a label-based state machine. Infer the repo from `git remote`. Use `gh` for all GitHub operations.
-
-## AI Disclaimer
-
-Every comment or issue posted to GitHub during triage **must** include the following disclaimer at the top of the comment body, before any other content:
-
-```
-> *This was generated by AI during triage.*
-```
-
-## Reference docs
-
- [AGENT-BRIEF.md](AGENT-BRIEF.md) — how to write durable agent briefs
- [OUT-OF-SCOPE.md](OUT-OF-SCOPE.md) — how the `.out-of-scope/` knowledge base works
-
-## Labels
-
-| Label             | Type     | Description                              |
-| ----------------- | -------- | ---------------------------------------- |
-| `bug`             | Category | Something is broken                      |
-| `enhancement`     | Category | New feature or improvement               |
-| `needs-triage`    | State    | Maintainer needs to evaluate this issue  |
-| `needs-info`      | State    | Waiting on reporter for more information |
-| `ready-for-agent` | State    | Fully specified, ready for AFK agent     |
-| `ready-for-human` | State    | Requires human implementation            |
-| `wontfix`         | State    | Will not be actioned                     |
-
-Every issue should have exactly **one** state label and **one** category label. If an issue has conflicting state labels (e.g. both `needs-triage` and `ready-for-agent`), flag the conflict and ask the maintainer which state is correct before doing anything else. Provide a recommendation.
-
-## State Machine
-
-| Current State  | Can transition to | Who triggers it        | What happens                                                                                                         |
-| -------------- | ----------------- | ---------------------- | -------------------------------------------------------------------------------------------------------------------- |
-| `unlabeled`    | `needs-triage`    | Skill (on first look)  | Issue needs maintainer evaluation. Skill applies label after presenting recommendation.                              |
-| `unlabeled`    | `ready-for-agent` | Maintainer (via skill) | Issue is already well-specified and agent-suitable. Skill writes agent brief comment, applies label.                 |
-| `unlabeled`    | `ready-for-human` | Maintainer (via skill) | Issue requires human implementation. Skill writes a brief comment summarizing the task, applies label.               |
-| `unlabeled`    | `wontfix`         | Maintainer (via skill) | Issue is spam, duplicate, or out of scope. Skill closes with comment (and writes `.out-of-scope/` for enhancements). |
-| `needs-triage` | `needs-info`      | Maintainer (via skill) | Issue is underspecified. Skill posts triage notes capturing progress so far + questions for reporter.                |
-| `needs-triage` | `ready-for-agent` | Maintainer (via skill) | Grilling session complete, agent-suitable. Skill writes agent brief comment, applies label.                          |
-| `needs-triage` | `ready-for-human` | Maintainer (via skill) | Grilling session complete, needs human. Skill writes a brief comment summarizing the task, applies label.            |
-| `needs-triage` | `wontfix`         | Maintainer (via skill) | Maintainer decides not to action. Skill closes with comment (and writes `.out-of-scope/` for enhancements).          |
-| `needs-info`   | `needs-triage`    | Skill (detects reply)  | Reporter has replied. Skill surfaces to maintainer for re-evaluation.                                                |
-
-An issue can only move along these transitions. The maintainer can override any state directly (see Quick State Override below), but the skill should flag if the transition is unusual.
-
-## Invocation
-
-The maintainer invokes `/github-triage` then describes what they want in natural language. The skill interprets the request and takes the appropriate action.
-
-Example requests:
-
- "Show me anything that needs my attention"
- "Let's look at #42"
- "Move #42 to ready-for-agent"
- "What's ready for agents to pick up?"
- "Are there any unlabeled issues?"
-
-## Workflow: Show What Needs Attention
-
-When the maintainer asks for an overview, query GitHub and present a summary grouped into three buckets:
-
-1. **Unlabeled issues** — new, no labels at all. These have never been triaged.
-2. **`needs-triage` issues** — maintainer needs to evaluate or continue evaluating.
-3. **`needs-info` issues with new activity** — the reporter has commented since the last triage notes comment. Check comment timestamps to determine this.
-
-Display counts per group. Within each group, show issues oldest first (longest-waiting gets attention first). For each issue, show: number, title, age, and a one-line summary of the issue body.
-
-Let the maintainer pick which issue to dive into.
-
-## Workflow: Triage a Specific Issue
-
-### Step 1: Gather context
-
-Before presenting anything to the maintainer:
-
- Read the full issue: body, all comments, all labels, who reported it, when
- If there are prior triage notes comments (from previous sessions), parse them to understand what has already been established
- Explore the codebase to build context — understand the domain, relevant interfaces, and existing behavior related to the issue
- Read `.out-of-scope/*.md` files and check if this issue matches or is similar to a previously rejected concept
-
-### Step 2: Present a recommendation
-
-Tell the maintainer:
-
- **Category recommendation:** bug or enhancement, with reasoning
- **State recommendation:** where this issue should go, with reasoning
- If it matches a prior out-of-scope rejection, surface that: "This is similar to `.out-of-scope/concept-name.md` — we rejected this before because X. Do you still feel the same way?"
- A brief summary of what you found in the codebase that's relevant
-
-Then wait for the maintainer's direction. They may:
-
- Agree and ask you to apply labels → do it
- Want to flesh it out → start a /domain-model session
- Override with a different state → apply their choice
- Want to discuss → have a conversation
-
-### Step 3: Bug reproduction (bugs only)
-
-If the issue is categorized as a bug, attempt to reproduce it before starting a /domain-model session. This will vary by codebase, but do your best:
-
- Read the reporter's reproduction steps (if provided)
- Explore the codebase to understand the relevant code paths
- Try to reproduce the bug: run tests, execute commands, or trace the logic to confirm the reported behavior
- If reproduction succeeds, report what you found to the maintainer — include the specific behavior you observed and where in the code it originates
- If reproduction fails, report that too — the bug may be environment-specific, already fixed, or the report may be inaccurate
- If the report lacks enough detail to attempt reproduction, note that — this is a strong signal the issue should move to `needs-info`
-
-The reproduction attempt informs the /domain-model session and the agent brief. A confirmed reproduction with a known code path makes for a much stronger brief.
-
-### Step 4: /domain-model session (if needed)
-
-If the issue needs to be fleshed out before it's ready for an agent, interview the maintainer to build a complete specification. Use the /domain-model skill.
-
-### Step 5: Apply the outcome
-
-Depending on the outcome:
-
- **ready-for-agent** — post an agent brief comment (see [AGENT-BRIEF.md](AGENT-BRIEF.md))
- **ready-for-human** — post a comment summarizing the task, what was established during triage, and why it needs human implementation. Use the same structure as an agent brief but note the reason it can't be delegated to an agent (e.g. requires judgment calls, external system access, design decisions, or manual testing).
- **needs-info** — post triage notes with progress so far and questions for the reporter (see Needs Info Output below)
- **wontfix (bug)** — post a polite comment explaining why, then close the issue
- **wontfix (enhancement)** — write to `.out-of-scope/`, post a comment linking to it, then close the issue (see [OUT-OF-SCOPE.md](OUT-OF-SCOPE.md))
- **needs-triage** — apply the label. Optionally leave a comment if there's partial progress to capture.
-
-## Workflow: Quick State Override
-
-When the maintainer explicitly tells you to move an issue to a specific state (e.g. "move #42 to ready-for-agent"), trust their judgment and apply the label directly.
-
-Still show a confirmation of what you're about to do: which labels will be added/removed, and whether you'll post a comment or close the issue. But skip the /domain-model session entirely.
-
-If moving to `ready-for-agent` without a /domain-model session, ask the maintainer if they want to write a brief agent brief comment or skip it.
-
-## Needs Info Output
-
-When moving an issue to `needs-info`, post a comment that captures the interview progress and tells the reporter what's needed:
-
-```markdown
-## Triage Notes
-
-**What we've established so far:**
-
- point 1
- point 2
-
-**What we still need from you (@reporter):**
-
- question 1
- question 2
-```
-
-Include everything resolved during the /domain-model session in "established so far" — this work should not be lost. The questions for the reporter should be specific and actionable, not vague ("please provide more info").
-
-## Resuming Previous Sessions
-
-When triaging an issue that already has triage notes from a previous session:
-
-1. Read all comments to find prior triage notes
-2. Parse what was already established
-3. Check if the reporter has answered any outstanding questions
-4. Present the maintainer with an updated picture: "Here's where we left off, and here's what the reporter has said since"
-5. Continue the /domain-model session from where it stopped — do not re-ask resolved questions
--- a/improve-codebase-architecture/LANGUAGE.md
+++ b/improve-codebase-architecture/LANGUAGE.md
@ -1,53 +0,0 @@
-# Language
-
-Shared vocabulary for every suggestion this skill makes. Use these terms exactly — don't substitute "component," "service," "API," or "boundary." Consistent language is the whole point.
-
-## Terms
-
-**Module**
-Anything with an interface and an implementation. Deliberately scale-agnostic — applies equally to a function, class, package, or tier-spanning slice.
-_Avoid_: unit, component, service.
-
-**Interface**
-Everything a caller must know to use the module correctly. Includes the type signature, but also invariants, ordering constraints, error modes, required configuration, and performance characteristics.
-_Avoid_: API, signature (too narrow — those refer only to the type-level surface).
-
-**Implementation**
-What's inside a module — its body of code. Distinct from **Adapter**: a thing can be a small adapter with a large implementation (a Postgres repo) or a large adapter with a small implementation (an in-memory fake). Reach for "adapter" when the seam is the topic; "implementation" otherwise.
-
-**Depth**
-Leverage at the interface — the amount of behaviour a caller (or test) can exercise per unit of interface they have to learn. A module is **deep** when a large amount of behaviour sits behind a small interface. A module is **shallow** when the interface is nearly as complex as the implementation.
-
-**Seam** _(from Michael Feathers)_
-A place where you can alter behaviour without editing in that place. The *location* at which a module's interface lives. Choosing where to put the seam is its own design decision, distinct from what goes behind it.
-_Avoid_: boundary (overloaded with DDD's bounded context).
-
-**Adapter**
-A concrete thing that satisfies an interface at a seam. Describes *role* (what slot it fills), not substance (what's inside).
-
-**Leverage**
-What callers get from depth. More capability per unit of interface they have to learn. One implementation pays back across N call sites and M tests.
-
-**Locality**
-What maintainers get from depth. Change, bugs, knowledge, and verification concentrate at one place rather than spreading across callers. Fix once, fixed everywhere.
-
-## Principles
-
- **Depth is a property of the interface, not the implementation.** A deep module can be internally composed of small, mockable, swappable parts — they just aren't part of the interface. A module can have **internal seams** (private to its implementation, used by its own tests) as well as the **external seam** at its interface.
- **The deletion test.** Imagine deleting the module. If complexity vanishes, the module wasn't hiding anything (it was a pass-through). If complexity reappears across N callers, the module was earning its keep.
- **The interface is the test surface.** Callers and tests cross the same seam. If you want to test *past* the interface, the module is probably the wrong shape.
- **One adapter means a hypothetical seam. Two adapters means a real one.** Don't introduce a seam unless something actually varies across it.
-
-## Relationships
-
- A **Module** has exactly one **Interface** (the surface it presents to callers and tests).
- **Depth** is a property of a **Module**, measured against its **Interface**.
- A **Seam** is where a **Module**'s **Interface** lives.
- An **Adapter** sits at a **Seam** and satisfies the **Interface**.
- **Depth** produces **Leverage** for callers and **Locality** for maintainers.
-
-## Rejected framings
-
- **Depth as ratio of implementation-lines to interface-lines** (Ousterhout): rewards padding the implementation. We use depth-as-leverage instead.
- **"Interface" as the TypeScript `interface` keyword or a class's public methods**: too narrow — interface here includes every fact a caller must know.
- **"Boundary"**: overloaded with DDD's bounded context. Say **seam** or **interface**.
--- a/improve-codebase-architecture/SKILL.md
+++ b/improve-codebase-architecture/SKILL.md
@ -1,76 +0,0 @@
---
-name: improve-codebase-architecture
-description: Find deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable.
---
-
-# Improve Codebase Architecture
-
-Surface architectural friction and propose **deepening opportunities** — refactors that turn shallow modules into deep ones. The aim is testability and AI-navigability.
-
-## Glossary
-
-Use these terms exactly in every suggestion. Consistent language is the point — don't drift into "component," "service," "API," or "boundary." Full definitions in [LANGUAGE.md](LANGUAGE.md).
-
- **Module** — anything with an interface and an implementation (function, class, package, slice).
- **Interface** — everything a caller must know to use the module: types, invariants, error modes, ordering, config. Not just the type signature.
- **Implementation** — the code inside.
- **Depth** — leverage at the interface: a lot of behaviour behind a small interface. **Deep** = high leverage. **Shallow** = interface nearly as complex as the implementation.
- **Seam** — where an interface lives; a place behaviour can be altered without editing in place. (Use this, not "boundary.")
- **Adapter** — a concrete thing satisfying an interface at a seam.
- **Leverage** — what callers get from depth.
- **Locality** — what maintainers get from depth: change, bugs, knowledge concentrated in one place.
-
-Key principles (see [LANGUAGE.md](LANGUAGE.md) for the full list):
-
- **Deletion test**: imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep.
- **The interface is the test surface.**
- **One adapter = hypothetical seam. Two adapters = real seam.**
-
-This skill is _informed_ by the project's domain model — `CONTEXT.md` and any `docs/adr/`. The domain language gives names to good seams; ADRs record decisions the skill should not re-litigate. See [CONTEXT-FORMAT.md](../domain-model/CONTEXT-FORMAT.md) and [ADR-FORMAT.md](../domain-model/ADR-FORMAT.md).
-
-## Process
-
-### 1. Explore
-
-Read existing documentation first:
-
- `CONTEXT.md` (or `CONTEXT-MAP.md` + each `CONTEXT.md` in a multi-context repo)
- Relevant ADRs in `docs/adr/` (and any context-scoped `docs/adr/` directories)
-
-If any of these files don't exist, proceed silently — don't flag their absence or suggest creating them upfront.
-
-Then use the Agent tool with `subagent_type=Explore` to walk the codebase. Don't follow rigid heuristics — explore organically and note where you experience friction:
-
- Where does understanding one concept require bouncing between many small modules?
- Where are modules **shallow** — interface nearly as complex as the implementation?
- Where have pure functions been extracted just for testability, but the real bugs hide in how they're called (no **locality**)?
- Where do tightly-coupled modules leak across their seams?
- Which parts of the codebase are untested, or hard to test through their current interface?
-
-Apply the **deletion test** to anything you suspect is shallow: would deleting it concentrate complexity, or just move it? A "yes, concentrates" is the signal you want.
-
-### 2. Present candidates
-
-Present a numbered list of deepening opportunities. For each candidate:
-
- **Files** — which files/modules are involved
- **Problem** — why the current architecture is causing friction
- **Solution** — plain English description of what would change
- **Benefits** — explained in terms of locality and leverage, and also in how tests would improve
-
-**Use CONTEXT.md vocabulary for the domain, and [LANGUAGE.md](LANGUAGE.md) vocabulary for the architecture.** If `CONTEXT.md` defines "Order," talk about "the Order intake module" — not "the FooBarHandler," and not "the Order service."
-
-**ADR conflicts**: if a candidate contradicts an existing ADR, only surface it when the friction is real enough to warrant revisiting the ADR. Mark it clearly (e.g. _"contradicts ADR-0007 — but worth reopening because…"_). Don't list every theoretical refactor an ADR forbids.
-
-Do NOT propose interfaces yet. Ask the user: "Which of these would you like to explore?"
-
-### 3. Grilling loop
-
-Once the user picks a candidate, drop into a grilling conversation. Walk the design tree with them — constraints, dependencies, the shape of the deepened module, what sits behind the seam, what tests survive.
-
-Side effects happen inline as decisions crystallize:
-
- **Naming a deepened module after a concept not in `CONTEXT.md`?** Add the term to `CONTEXT.md` — same discipline as `/domain-model` (see [CONTEXT-FORMAT.md](../domain-model/CONTEXT-FORMAT.md)). Create the file lazily if it doesn't exist.
- **Sharpening a fuzzy term during the conversation?** Update `CONTEXT.md` right there.
- **User rejects the candidate with a load-bearing reason?** Offer an ADR, framed as: _"Want me to record this as an ADR so future architecture reviews don't re-suggest it?"_ Only offer when the reason would actually be needed by a future explorer to avoid re-suggesting the same thing — skip ephemeral reasons ("not worth it right now") and self-evident ones. See [ADR-FORMAT.md](../domain-model/ADR-FORMAT.md).
- **Want to explore alternative interfaces for the deepened module?** See [INTERFACE-DESIGN.md](INTERFACE-DESIGN.md).
--- a/package-lock.json
+++ b/package-lock.json
--- a/package.json
+++ b/package.json
@ -0,0 +1,20 @@
+{
+  "name": "mattpocock-skills",
+  "version": "1.0.1",
+  "private": true,
+  "description": "Matt Pocock's agent skills for real engineering",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/mattpocock/skills"
+  },
+  "license": "MIT",
+  "scripts": {
+    "changeset": "changeset",
+    "version": "changeset version"
+  },
+  "devDependencies": {
+    "@changesets/changelog-github": "^0.7.0",
+    "@changesets/cli": "^2.30.0"
+  },
+  "packageManager": "npm@10.9.4"
+}
--- a/scripts/link-skills.sh
+++ b/scripts/link-skills.sh
@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Links all skills in the repository into the local skill directories used by
+# each agent harness:
+#   - ~/.claude/skills  — Claude Code
+#   - ~/.agents/skills  — pi and other Agent-Skills-standard harnesses
+# Each entry is a symlink into this repo, so a `git pull` is all that's needed
+# to keep installed skills up to date.
+
+REPO="$(cd "$(dirname "$0")/.." && pwd)"
+DESTS=("$HOME/.claude/skills" "$HOME/.agents/skills")
+
+# Collect the repo's skills once, link into every destination.
+names=()
+srcs=()
+while IFS= read -r -d '' skill_md; do
+  src="$(dirname "$skill_md")"
+  names+=("$(basename "$src")")
+  srcs+=("$src")
+done < <(find "$REPO/skills" -name SKILL.md -not -path '*/node_modules/*' -not -path '*/deprecated/*' -print0)
+
+for DEST in "${DESTS[@]}"; do
+  # If $DEST is a symlink that resolves into this repo, we'd end up writing the
+  # per-skill symlinks back into the repo's own skills/ tree. Detect and bail
+  # out instead of polluting the working copy.
+  if [ -L "$DEST" ]; then
+    resolved="$(readlink -f "$DEST")"
+    case "$resolved" in
+      "$REPO"|"$REPO"/*)
+        echo "error: $DEST is a symlink into this repo ($resolved)." >&2
+        echo "Remove it (rm \"$DEST\") and re-run; the script will recreate it as a real dir." >&2
+        exit 1
+        ;;
+    esac
+  fi
+
+  mkdir -p "$DEST"
+
+  for i in "${!names[@]}"; do
+    name="${names[$i]}"
+    src="${srcs[$i]}"
+    target="$DEST/$name"
+
+    if [ -e "$target" ] && [ ! -L "$target" ]; then
+      rm -rf "$target"
+    fi
+
+    ln -sfn "$src" "$target"
+    echo "linked $name -> $src ($DEST)"
+  done
+done
--- a/scripts/list-skills.sh
+++ b/scripts/list-skills.sh
@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+REPO="$(cd "$(dirname "$0")/.." && pwd)"
+
+cd "$REPO"
+find . -name SKILL.md -not -path '*/node_modules/*' | sed 's|^\./||' | sort
--- a/skills/deprecated/README.md
+++ b/skills/deprecated/README.md
@ -0,0 +1,8 @@
+# Deprecated
+
+Skills I no longer use.
+
+- **[design-an-interface](./design-an-interface/SKILL.md)** — Generate multiple radically different interface designs for a module using parallel sub-agents.
+- **[qa](./qa/SKILL.md)** — Interactive QA session where user reports bugs conversationally and the agent files GitHub issues.
+- **[request-refactor-plan](./request-refactor-plan/SKILL.md)** — Create a detailed refactor plan with tiny commits via user interview, then file it as a GitHub issue.
+- **[ubiquitous-language](./ubiquitous-language/SKILL.md)** — Extract a DDD-style ubiquitous language glossary from the current conversation.
--- a/skills/deprecated/design-an-interface/SKILL.md
+++ b/skills/deprecated/design-an-interface/SKILL.md
--- a/skills/deprecated/qa/SKILL.md
+++ b/skills/deprecated/qa/SKILL.md
--- a/skills/deprecated/request-refactor-plan/SKILL.md
+++ b/skills/deprecated/request-refactor-plan/SKILL.md
--- a/skills/deprecated/ubiquitous-language/SKILL.md
+++ b/skills/deprecated/ubiquitous-language/SKILL.md
--- a/skills/engineering/README.md
+++ b/skills/engineering/README.md
@ -0,0 +1,25 @@
+# Engineering
+
+Skills I use daily for code work.
+
+## User-invoked
+
+Reachable only when you type them (`disable-model-invocation: true`).
+
+- **[ask-matt](./ask-matt/SKILL.md)** — Ask which skill or flow fits your situation. A router over the user-invoked skills in this repo.
+- **[grill-with-docs](./grill-with-docs/SKILL.md)** — Grilling session that also builds your project's domain model, sharpening terminology and updating `CONTEXT.md` and ADRs inline.
+- **[triage](./triage/SKILL.md)** — Move issues through a state machine of triage roles.
+- **[improve-codebase-architecture](./improve-codebase-architecture/SKILL.md)** — Scan a codebase for deepening opportunities, present them as a visual HTML report, then grill through whichever one you pick.
+- **[setup-matt-pocock-skills](./setup-matt-pocock-skills/SKILL.md)** — Configure this repo for the engineering skills (issue tracker, triage labels, domain doc layout). Run once per repo.
+- **[to-issues](./to-issues/SKILL.md)** — Break any plan, spec, or PRD into independently-grabbable issues using vertical slices.
+- **[to-prd](./to-prd/SKILL.md)** — Turn the current conversation into a PRD and publish it to the issue tracker.
+- **[prototype](./prototype/SKILL.md)** — Build a throwaway prototype — a runnable terminal app for state/logic questions, or several toggleable UI variations.
+
+## Model-invoked
+
+Model- or user-reachable (rich trigger phrasing so the model can reach for them).
+
+- **[diagnosing-bugs](./diagnosing-bugs/SKILL.md)** — Disciplined diagnosis loop for hard bugs and performance regressions: reproduce → minimise → hypothesise → instrument → fix → regression-test.
+- **[tdd](./tdd/SKILL.md)** — Test-driven development with a red-green-refactor loop. Builds features or fixes bugs one vertical slice at a time.
+- **[domain-modeling](./domain-modeling/SKILL.md)** — Actively build and sharpen a project's domain model — challenge terms, stress-test with scenarios, update `CONTEXT.md` and ADRs inline.
+- **[codebase-design](./codebase-design/SKILL.md)** — Shared discipline and vocabulary for designing deep modules: small interfaces, clean seams, testable through the interface.
--- a/skills/engineering/ask-matt/SKILL.md
+++ b/skills/engineering/ask-matt/SKILL.md
@ -0,0 +1,61 @@
+---
+name: ask-matt
+description: Ask which skill or flow fits your situation. A router over the user-invoked skills in this repo.
+disable-model-invocation: true
+---
+
+# Ask Matt
+
+You don't remember every skill, so ask.
+
+A **flow** is a path through the skills. Most paths run along one **main flow**, and two **on-ramps** merge onto it. Everything else is standalone.
+
+## The main flow: idea → ship
+
+The route most work travels. You have an idea and want it built.
+
+1. **`/grill-with-docs`** — sharpen the idea by interview. Start here when you **have a codebase**: it's stateful, retaining what it learns in `CONTEXT.md` and ADRs. (No codebase? Use `/grill-me` — see Standalone.)
+2. **Branch — can you settle every question in conversation?** If a question needs a runnable answer (state, business logic, a UI you have to see), detour through a prototype, bridged by **`/handoff`** in both directions (see Crossing sessions):
+   - **`/handoff`** out, then open a fresh session against that file,
+   - **`/prototype`** to answer the question with throwaway code,
+   - **`/handoff`** back what you learned, and reference it from the original idea thread.
+3. **Branch — is this a multi-session build?**
+   - **Yes** → **`/to-prd`** (turn the thread into a PRD) → **`/to-issues`** (split the PRD into independently-grabbable issues). Because the issues are independent, **clear context between each one**: start a fresh session per issue and kick off **`/implement`** by passing it the PRD and the single issue to work on.
+   - **No** → **`/implement`** right here, in the same context window.
+
+### Context hygiene
+
+Keep steps 1–3 in **one unbroken context window** — don't compact or clear until after `/to-issues` — so the grilling, PRD, and issues all build on the same thinking. Each `/implement` then starts fresh, working from the issue.
+
+The limit on this is the **[smart zone](https://www.aihero.dev/ai-coding-dictionary/smart-zone)**: the window (~120k tokens on state-of-the-art models) within which the model still reasons sharply. If a session approaches it before `/to-issues`, don't push on degraded — `/handoff` and continue in a fresh thread.
+
+## On-ramps
+
+A starting situation that generates work, then merges onto the main flow.
+
+- **Bugs and requests piling up** → **`/triage`**. It moves issues through triage roles and produces agent-ready issues, which **`/implement`** later picks up.
+
+  Triage is only for issues **you didn't create** — bug reports, incoming feature requests, anything that arrives raw. Issues that `/to-issues` produced are already agent-ready, so **don't triage them**.
+
+## Codebase health
+
+Not feature work — upkeep.
+
+- **`/improve-codebase-architecture`** — run whenever you have a spare moment to keep the codebase good for agents to operate in. It surfaces deepening opportunities; picking one _generates an idea_ you can take into the main flow at `/grill-with-docs`.
+
+## Crossing sessions
+
+- **`/handoff`** — when a thread is full or you need to branch off (e.g. into a `/prototype` session), this compacts the conversation into a markdown file. You don't continue in place — you **open a new session and reference that file** to carry the context across. It's the bridge between context windows, in either direction. Use it when you want a **fresh session** but need the **current conversation preserved**.
+- **`/compact`** (built-in) — stay in the **same conversation**, letting the earlier turns be summarized. Use it at **intentional breaks between phases**, when you don't mind losing the verbatim history. Don't compact mid-phase — the agent can lose its way. `/handoff` forks; `/compact` continues.
+
+## Standalone
+
+Off the main flow entirely.
+
+- **`/grill-me`** — the same relentless interview as `/grill-with-docs`, but for when you have **no codebase**. Stateless: it saves nothing locally, builds no `CONTEXT.md`. Reach for it to sharpen any plan or design that doesn't live in a repo.
+- **`/teach`** — learn a concept over multiple sessions, using the current directory as a stateful workspace.
+- **`/writing-great-skills`** — reference for writing and editing skills well.
+
+## Precondition
+
+**`/setup-matt-pocock-skills`** — run before your first engineering flow to configure the issue tracker, triage labels, and doc layout the other skills assume. Custom issue trackers also work.
--- a/skills/engineering/codebase-design/DEEPENING.md
+++ b/skills/engineering/codebase-design/DEEPENING.md
@ -1,6 +1,6 @@
 # Deepening

-How to deepen a cluster of shallow modules safely, given its dependencies. Assumes the vocabulary in [LANGUAGE.md](LANGUAGE.md) — **module**, **interface**, **seam**, **adapter**.
+How to deepen a cluster of shallow modules safely, given its dependencies. Assumes the vocabulary in [SKILL.md](SKILL.md) — **module**, **interface**, **seam**, **adapter**.

 ## Dependency categories

--- a/skills/engineering/codebase-design/DESIGN-IT-TWICE.md
+++ b/skills/engineering/codebase-design/DESIGN-IT-TWICE.md
@ -1,8 +1,8 @@
-# Interface Design
+# Design It Twice

 When the user wants to explore alternative interfaces for a chosen deepening candidate, use this parallel sub-agent pattern. Based on "Design It Twice" (Ousterhout) — your first idea is unlikely to be the best.

-Uses the vocabulary in [LANGUAGE.md](LANGUAGE.md) — **module**, **interface**, **seam**, **adapter**, **leverage**.
+Uses the vocabulary in [SKILL.md](SKILL.md) — **module**, **interface**, **seam**, **adapter**, **leverage**.

 ## Process

@ -27,7 +27,7 @@ Prompt each sub-agent with a separate technical brief (file paths, coupling deta
 - Agent 3: "Optimise for the most common caller — make the default case trivial."
 - Agent 4 (if applicable): "Design around ports & adapters for cross-seam dependencies."

-Include both [LANGUAGE.md](LANGUAGE.md) vocabulary and CONTEXT.md vocabulary in the brief so each sub-agent names things consistently with the architecture language and the project's domain language.
+Include both [SKILL.md](SKILL.md) vocabulary and CONTEXT.md vocabulary in the brief so each sub-agent names things consistently with the architecture language and the project's domain language.

 Each sub-agent outputs:

--- a/skills/engineering/codebase-design/SKILL.md
+++ b/skills/engineering/codebase-design/SKILL.md
@ -0,0 +1,114 @@
+---
+name: codebase-design
+description: Shared vocabulary for designing deep modules. Use when the user wants to design or improve a module's interface, find deepening opportunities, decide where a seam goes, make code more testable or AI-navigable, or when another skill needs the deep-module vocabulary.
+---
+
+# Codebase Design
+
+Design **deep modules**: a lot of behaviour behind a small interface, placed at a clean seam, testable through that interface. Use this language and these principles wherever code is being designed or restructured. The aim is leverage for callers, locality for maintainers, and testability for everyone.
+
+## Glossary
+
+Use these terms exactly — don't substitute "component," "service," "API," or "boundary." Consistent language is the whole point.
+
+**Module** — anything with an interface and an implementation. Deliberately scale-agnostic: a function, class, package, or tier-spanning slice. _Avoid_: unit, component, service.
+
+**Interface** — everything a caller must know to use the module correctly: the type signature, but also invariants, ordering constraints, error modes, required configuration, and performance characteristics. _Avoid_: API, signature (too narrow — they refer only to the type-level surface).
+
+**Implementation** — what's inside a module, its body of code. Distinct from **Adapter**: a thing can be a small adapter with a large implementation (a Postgres repo) or a large adapter with a small implementation (an in-memory fake). Reach for "adapter" when the seam is the topic; "implementation" otherwise.
+
+**Depth** — leverage at the interface: the amount of behaviour a caller (or test) can exercise per unit of interface they have to learn. A module is **deep** when a large amount of behaviour sits behind a small interface, **shallow** when the interface is nearly as complex as the implementation.
+
+**Seam** _(Michael Feathers)_ — a place where you can alter behaviour without editing in that place; the *location* at which a module's interface lives. Where to put the seam is its own design decision, distinct from what goes behind it. _Avoid_: boundary (overloaded with DDD's bounded context).
+
+**Adapter** — a concrete thing that satisfies an interface at a seam. Describes *role* (what slot it fills), not substance (what's inside).
+
+**Leverage** — what callers get from depth: more capability per unit of interface they learn. One implementation pays back across N call sites and M tests.
+
+**Locality** — what maintainers get from depth: change, bugs, knowledge, and verification concentrate in one place rather than spreading across callers. Fix once, fixed everywhere.
+
+## Deep vs shallow
+
+**Deep module** = small interface + lots of implementation:
+
+```
+┌─────────────────────┐
+│   Small Interface   │  ← Few methods, simple params
+├─────────────────────┤
+│                     │
+│  Deep Implementation│  ← Complex logic hidden
+│                     │
+└─────────────────────┘
+```
+
+**Shallow module** = large interface + little implementation (avoid):
+
+```
+┌─────────────────────────────────┐
+│       Large Interface           │  ← Many methods, complex params
+├─────────────────────────────────┤
+│  Thin Implementation            │  ← Just passes through
+└─────────────────────────────────┘
+```
+
+When designing an interface, ask:
+
+- Can I reduce the number of methods?
+- Can I simplify the parameters?
+- Can I hide more complexity inside?
+
+## Principles
+
+- **Depth is a property of the interface, not the implementation.** A deep module can be internally composed of small, mockable, swappable parts — they just aren't part of the interface. A module can have **internal seams** (private to its implementation, used by its own tests) as well as the **external seam** at its interface.
+- **The deletion test.** Imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep.
+- **The interface is the test surface.** Callers and tests cross the same seam. If you want to test *past* the interface, the module is probably the wrong shape.
+- **One adapter means a hypothetical seam. Two adapters means a real one.** Don't introduce a seam unless something actually varies across it.
+
+## Designing for testability
+
+Good interfaces make testing natural:
+
+1. **Accept dependencies, don't create them.**
+
+   ```typescript
+   // Testable
+   function processOrder(order, paymentGateway) {}
+
+   // Hard to test
+   function processOrder(order) {
+     const gateway = new StripeGateway();
+   }
+   ```
+
+2. **Return results, don't produce side effects.**
+
+   ```typescript
+   // Testable
+   function calculateDiscount(cart): Discount {}
+
+   // Hard to test
+   function applyDiscount(cart): void {
+     cart.total -= discount;
+   }
+   ```
+
+3. **Small surface area.** Fewer methods = fewer tests needed. Fewer params = simpler test setup.
+
+## Relationships
+
+- A **Module** has exactly one **Interface** (the surface it presents to callers and tests).
+- **Depth** is a property of a **Module**, measured against its **Interface**.
+- A **Seam** is where a **Module**'s **Interface** lives.
+- An **Adapter** sits at a **Seam** and satisfies the **Interface**.
+- **Depth** produces **Leverage** for callers and **Locality** for maintainers.
+
+## Rejected framings
+
+- **Depth as ratio of implementation-lines to interface-lines** (Ousterhout): rewards padding the implementation. We use depth-as-leverage instead.
+- **"Interface" as the TypeScript `interface` keyword or a class's public methods**: too narrow — interface here includes every fact a caller must know.
+- **"Boundary"**: overloaded with DDD's bounded context. Say **seam** or **interface**.
+
+## Going deeper
+
+- **Deepening a cluster given its dependencies** — see [DEEPENING.md](DEEPENING.md): dependency categories, seam discipline, and replace-don't-layer testing.
+- **Exploring alternative interfaces** — see [DESIGN-IT-TWICE.md](DESIGN-IT-TWICE.md): spin up parallel sub-agents to design the interface several radically different ways, then compare on depth, locality, and seam placement.
--- a/skills/engineering/diagnosing-bugs/SKILL.md
+++ b/skills/engineering/diagnosing-bugs/SKILL.md
@ -0,0 +1,134 @@
+---
+name: diagnosing-bugs
+description: Diagnosis loop for hard bugs and performance regressions. Use when the user says "diagnose"/"debug this", or reports something broken/throwing/failing/slow.
+---
+
+# Diagnosing Bugs
+
+A discipline for hard bugs. Skip phases only when explicitly justified.
+
+When exploring the codebase, read `CONTEXT.md` (if it exists) to get a clear mental model of the relevant modules, and check ADRs in the area you're touching.
+
+## Phase 1 — Build a feedback loop
+
+**This is the skill.** Everything else is mechanical. If you have a **tight** pass/fail signal for the bug — one that goes red on _this_ bug — you will find the cause; bisection, hypothesis-testing, and instrumentation all just consume it. If you don't have one, no amount of staring at code will save you.
+
+Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
+
+### Ways to construct one — try them in roughly this order
+
+1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e.
+2. **Curl / HTTP script** against a running dev server.
+3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
+4. **Headless browser script** (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network.
+5. **Replay a captured trace.** Save a real network request / payload / event log to disk; replay it through the code path in isolation.
+6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
+7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
+8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
+9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs.
+10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you.
+
+Build the right feedback loop, and the bug is 90% fixed.
+
+### Tighten the loop
+
+Treat the loop as a product. Once you have _a_ loop, **tighten** it:
+
+- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
+- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
+- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)
+
+A 30-second flaky loop is barely better than no loop; a 2-second deterministic one is tight — a debugging superpower.
+
+### Non-deterministic bugs
+
+The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
+
+### When you genuinely cannot build a loop
+
+Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop.
+
+### Completion criterion — a tight loop that goes red
+
+Phase 1 is done when the loop is **tight** and **red-capable**: you can name **one command** — a script path, a test invocation, a curl — that you have **already run at least once** (paste the invocation and its output), and that is:
+
+- [ ] **Red-capable** — it drives the actual bug code path and asserts the **user's exact symptom**, so it can go red on this bug and green once fixed. Not "runs without erroring" — it must be able to _catch this specific bug_.
+- [ ] **Deterministic** — same verdict every run (flaky bugs: a pinned, high reproduction rate, per above).
+- [ ] **Fast** — seconds, not minutes.
+- [ ] **Agent-runnable** — you can run it unattended; a human in the loop only via `scripts/hitl-loop.template.sh`.
+
+If you catch yourself reading code to build a theory before this command exists, **stop — jumping straight to a hypothesis is the exact failure this skill prevents.** No red-capable command, no Phase 2.
+
+## Phase 2 — Reproduce + minimise
+
+Run the loop. Watch it go red — the bug appears.
+
+Confirm:
+
+- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix.
+- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
+- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.
+
+### Minimise
+
+Once it's red, shrink the repro to the **smallest scenario that still goes red**. Cut inputs, callers, config, data, and steps **one at a time**, re-running the loop after each cut — keep only what's load-bearing for the failure.
+
+Why bother: a minimal repro shrinks the hypothesis space in Phase 3 (fewer moving parts left to suspect) and becomes the clean regression test in Phase 5.
+
+Done when **every remaining element is load-bearing** — removing any one of them makes the loop go green.
+
+Do not proceed until you have reproduced **and** minimised.
+
+## Phase 3 — Hypothesise
+
+Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea.
+
+Each hypothesis must be **falsifiable**: state the prediction it makes.
+
+> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."
+
+If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.
+
+**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.
+
+## Phase 4 — Instrument
+
+Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.**
+
+Tool preference:
+
+1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs.
+2. **Targeted logs** at the boundaries that distinguish hypotheses.
+3. Never "log everything and grep".
+
+**Tag every debug log** with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
+
+**Perf branch.** For performance regressions, logs are usually wrong. Instead: establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second.
+
+## Phase 5 — Fix + regression test
+
+Write the regression test **before the fix** — but only if there is a **correct seam** for it.
+
+A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.
+
+**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.
+
+If a correct seam exists:
+
+1. Turn the minimised repro into a failing test at that seam.
+2. Watch it fail.
+3. Apply the fix.
+4. Watch it pass.
+5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.
+
+## Phase 6 — Cleanup + post-mortem
+
+Required before declaring done:
+
+- [ ] Original repro no longer reproduces (re-run the Phase 1 loop)
+- [ ] Regression test passes (or absence of seam is documented)
+- [ ] All `[DEBUG-...]` instrumentation removed (`grep` the prefix)
+- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location)
+- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns
+
+**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling) hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation **after** the fix is in, not before — you have more information now than when you started.
--- a/skills/engineering/diagnosing-bugs/scripts/hitl-loop.template.sh
+++ b/skills/engineering/diagnosing-bugs/scripts/hitl-loop.template.sh
@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+# Human-in-the-loop reproduction loop.
+# Copy this file, edit the steps below, and run it.
+# The agent runs the script; the user follows prompts in their terminal.
+#
+# Usage:
+#   bash hitl-loop.template.sh
+#
+# Two helpers:
+#   step "<instruction>"          → show instruction, wait for Enter
+#   capture VAR "<question>"      → show question, read response into VAR
+#
+# At the end, captured values are printed as KEY=VALUE for the agent to parse.
+
+set -euo pipefail
+
+step() {
+  printf '\n>>> %s\n' "$1"
+  read -r -p "    [Enter when done] " _
+}
+
+capture() {
+  local var="$1" question="$2" answer
+  printf '\n>>> %s\n' "$question"
+  read -r -p "    > " answer
+  printf -v "$var" '%s' "$answer"
+}
+
+# --- edit below ---------------------------------------------------------
+
+step "Open the app at http://localhost:3000 and sign in."
+
+capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"
+
+capture ERROR_MSG "Paste the error message (or 'none'):"
+
+# --- edit above ---------------------------------------------------------
+
+printf '\n--- Captured ---\n'
+printf 'ERRORED=%s\n' "$ERRORED"
+printf 'ERROR_MSG=%s\n' "$ERROR_MSG"
--- a/skills/engineering/domain-modeling/ADR-FORMAT.md
+++ b/skills/engineering/domain-modeling/ADR-FORMAT.md
--- a/skills/engineering/domain-modeling/CONTEXT-FORMAT.md
+++ b/skills/engineering/domain-modeling/CONTEXT-FORMAT.md
@ -10,7 +10,7 @@
 ## Language

 **Order**:
-{A concise description of the term}
+{A one or two sentence description of the term}
 _Avoid_: Purchase, transaction

 **Invoice**:
@ -20,31 +20,14 @@ _Avoid_: Bill, payment request
 **Customer**:
 A person or organization that places orders.
 _Avoid_: Client, buyer, account
-
-## Relationships
-
- An **Order** produces one or more **Invoices**
- An **Invoice** belongs to exactly one **Customer**
-
-## Example dialogue
-
-> **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?"
-> **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed."
-
-## Flagged ambiguities
-
- "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts.
 ```

 ## Rules

- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid.
- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution.
- **Keep definitions tight.** One sentence max. Define what it IS, not what it does.
- **Show relationships.** Use bold term names and express cardinality where obvious.
+- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others under `_Avoid_`.
+- **Keep definitions tight.** One or two sentences max. Define what it IS, not what it does.
 - **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs.
 - **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine.
- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts.

 ## Single vs multi-context repos

--- a/skills/engineering/domain-modeling/SKILL.md
+++ b/skills/engineering/domain-modeling/SKILL.md
@ -1,20 +1,13 @@
 ---
-name: domain-model
-description: Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions.
-disable-model-invocation: true
+name: domain-modeling
+description: Build and sharpen a project's domain model. Use when the user wants to pin down domain terminology or a ubiquitous language, record an architectural decision, or when another skill needs to maintain the domain model.
 ---

-Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
+# Domain Modeling

-Ask the questions one at a time, waiting for feedback on each question before continuing.
+Actively build and sharpen the project's domain model as you design. This is the *active* discipline — challenging terms, inventing edge-case scenarios, and writing the glossary and decisions down the moment they crystallise. (Merely *reading* `CONTEXT.md` for vocabulary is not this skill — that's a one-line habit any skill can do. This skill is for when you're changing the model, not just consuming it.)

-If a question can be answered by exploring the codebase, explore the codebase instead.
-
-## Domain awareness
-
-During codebase exploration, also look for existing documentation:
-
-### File structure
+## File structure

 Most repos have a single context:

@ -68,7 +61,7 @@ When the user states how something works, check whether the code agrees. If you

 When a term is resolved, update `CONTEXT.md` right there. Don't batch these up — capture them as they happen. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md).

-Don't couple `CONTEXT.md` to implementation details. Only include terms that are meaningful to domain experts.
+`CONTEXT.md` should be totally devoid of implementation details. Do not treat `CONTEXT.md` as a spec, a scratch pad, or a repository for implementation decisions. It is a glossary and nothing else.

 ### Offer ADRs sparingly

--- a/skills/engineering/grill-with-docs/SKILL.md
+++ b/skills/engineering/grill-with-docs/SKILL.md
@ -0,0 +1,7 @@
+---
+name: grill-with-docs
+description: A relentless interview to sharpen a plan or design, which also creates docs (ADR's and glossary) as we go.
+disable-model-invocation: true
+---
+
+Run a `/grilling` session, using the `/domain-modeling` skill.
--- a/skills/engineering/implement/SKILL.md
+++ b/skills/engineering/implement/SKILL.md
@ -0,0 +1,15 @@
+---
+name: implement
+description: "Implement a piece of work based on a PRD or set of issues."
+disable-model-invocation: true
+---
+
+Implement the work described by the user in the PRD or issues.
+
+Use /tdd where possible, at pre-agreed seams.
+
+Run typechecking regularly, single test files regularly, and the full test suite once at the end.
+
+Once done, use /review to review the work.
+
+Commit your work to the current branch.
--- a/skills/engineering/improve-codebase-architecture/HTML-REPORT.md
+++ b/skills/engineering/improve-codebase-architecture/HTML-REPORT.md
@ -0,0 +1,123 @@
+# HTML Report Format
+
+The architectural review is rendered as a single self-contained HTML file in the OS temp directory. Tailwind and Mermaid both come from CDNs. Mermaid handles graph-shaped diagrams reliably; hand-built divs and inline SVG handle the more editorial visuals (mass diagrams, cross-sections). Mix the two — don't lean on Mermaid for everything, it'll start to look generic.
+
+## Scaffold
+
+```html
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <title>Architecture review — {{repo name}}</title>
+    <script src="https://cdn.tailwindcss.com"></script>
+    <script type="module">
+      import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs";
+      mermaid.initialize({ startOnLoad: true, theme: "neutral", securityLevel: "loose" });
+    </script>
+    <style>
+      /* small custom layer for things Tailwind doesn't cover cleanly:
+         dashed seam lines, hand-drawn-feeling arrow heads, etc. */
+      .seam { stroke-dasharray: 4 4; }
+      .leak { stroke: #dc2626; }
+      .deep { background: linear-gradient(135deg, #0f172a, #1e293b); }
+    </style>
+  </head>
+  <body class="bg-stone-50 text-slate-900 font-sans">
+    <main class="max-w-5xl mx-auto px-6 py-12 space-y-12">
+      <header>...</header>
+      <section id="candidates" class="space-y-10">...</section>
+      <section id="top-recommendation">...</section>
+    </main>
+  </body>
+</html>
+```
+
+## Header
+
+Repo name, date, and a compact legend: solid box = module, dashed line = seam, red arrow = leakage, thick dark box = deep module. No introduction paragraph — straight into the candidates.
+
+## Candidate card
+
+The diagrams carry the weight. Prose is sparse, plain, and uses the glossary terms (from the `/codebase-design` skill) without ceremony.
+
+Each candidate is one `<article>`:
+
+- **Title** — short, names the deepening (e.g. "Collapse the Order intake pipeline").
+- **Badge row** — recommendation strength (`Strong` = emerald, `Worth exploring` = amber, `Speculative` = slate), plus a tag for the dependency category (`in-process`, `local-substitutable`, `ports & adapters`, `mock`).
+- **Files** — monospaced list, `font-mono text-sm`.
+- **Before / After diagram** — the centrepiece. Two columns, side by side. See patterns below.
+- **Problem** — one sentence. What hurts.
+- **Solution** — one sentence. What changes.
+- **Wins** — bullets, ≤6 words each. e.g. "Tests hit one interface", "Pricing logic stops leaking", "Delete 4 shallow wrappers".
+- **ADR callout** (if applicable) — one line in an amber-tinted box.
+
+No paragraphs of explanation. If the diagram needs a paragraph to be understood, redraw the diagram.
+
+## Diagram patterns
+
+Pick the pattern that fits the candidate. Mix them. Don't make every diagram look the same — variety is part of the point.
+
+### Mermaid graph (the workhorse for dependencies / call flow)
+
+Use a Mermaid `flowchart` or `graph` when the point is "X calls Y calls Z, and look at the mess." Wrap it in a Tailwind-styled card so it doesn't feel parachuted in. Style with classDef to colour leakage edges red and the deep module dark. Sequence diagrams work well for "before: 6 round-trips; after: 1."
+
+```html
+<div class="rounded-lg border border-slate-200 bg-white p-4">
+  <pre class="mermaid">
+    flowchart LR
+      A[OrderHandler] --> B[OrderValidator]
+      B --> C[OrderRepo]
+      C -.leak.-> D[PricingClient]
+      classDef leak stroke:#dc2626,stroke-width:2px;
+      class C,D leak
+  </pre>
+</div>
+```
+
+### Hand-built boxes-and-arrows (when Mermaid's layout fights you)
+
+Modules as `<div>`s with borders and labels. Arrows as inline SVG `<line>` or `<path>` elements positioned absolutely over a relative container. Reach for this when you want the "after" diagram to feel like one thick-bordered deep module with greyed-out internals — Mermaid won't render that with the right weight.
+
+### Cross-section (good for layered shallowness)
+
+Stack horizontal bands (`h-12 border-l-4`) to show layers a call passes through. Before: 6 thin layers each doing nothing. After: 1 thick band labelled with the consolidated responsibility.
+
+### Mass diagram (good for "interface as wide as implementation")
+
+Two rectangles per module — one for interface surface area, one for implementation. Before: interface rectangle is nearly as tall as the implementation rectangle (shallow). After: interface rectangle is short, implementation rectangle is tall (deep).
+
+### Call-graph collapse
+
+Before: a tree of function calls rendered as nested boxes. After: the same tree collapsed into one box, with the now-internal calls shown faded inside it.
+
+## Style guidance
+
+- Lean editorial, not corporate-dashboard. Generous whitespace. Serif optional for headings (`font-serif` works well with stone/slate).
+- Colour sparingly: one accent (emerald or indigo) plus red for leakage and amber for warnings.
+- Keep diagrams ~320px tall so before/after sits comfortably side by side without scrolling.
+- Use `text-xs uppercase tracking-wider` for module labels inside diagrams — they should read as schematic, not as UI.
+- The only scripts are the Tailwind CDN and the Mermaid ESM import. The report is otherwise static — no app code, no interactivity beyond Mermaid's own rendering.
+
+## Top recommendation section
+
+One larger card. Candidate name, one sentence on why, anchor link to its card. That's it.
+
+## Tone
+
+Plain English, concise — but the architectural nouns and verbs come straight from the `/codebase-design` skill. Concision is not an excuse to drift.
+
+**Use exactly:** module, interface, implementation, depth, deep, shallow, seam, adapter, leverage, locality.
+
+**Never substitute:** component, service, unit (for module) · API, signature (for interface) · boundary (for seam) · layer, wrapper (for module, when you mean module).
+
+**Phrasings that fit the style:**
+
+- "Order intake module is shallow — interface nearly matches the implementation."
+- "Pricing leaks across the seam."
+- "Deepen: one interface, one place to test."
+- "Two adapters justify the seam: HTTP in prod, in-memory in tests."
+
+**Wins bullets** name the gain in glossary terms: *"locality: bugs concentrate in one module"*, *"leverage: one interface, N call sites"*, *"interface shrinks; implementation absorbs the wrappers"*. Don't write *"easier to maintain"* or *"cleaner code"* — those terms aren't in the glossary and don't earn their place.
+
+No hedging, no throat-clearing, no "it's worth noting that…". If a sentence could be a bullet, make it a bullet. If a bullet could be cut, cut it. If a term isn't in the `/codebase-design` glossary, reach for one that is before inventing a new one.
--- a/skills/engineering/improve-codebase-architecture/SKILL.md
+++ b/skills/engineering/improve-codebase-architecture/SKILL.md
@ -0,0 +1,66 @@
+---
+name: improve-codebase-architecture
+description: Scan a codebase for deepening opportunities, present them as a visual HTML report, then grill through whichever one you pick.
+disable-model-invocation: true
+---
+
+# Improve Codebase Architecture
+
+Surface architectural friction and propose **deepening opportunities** — refactors that turn shallow modules into deep ones. The aim is testability and AI-navigability.
+
+This command is _informed_ by the project's domain model and built on a shared design vocabulary:
+
+- Run the `/codebase-design` skill for the architecture vocabulary (**module**, **interface**, **depth**, **seam**, **adapter**, **leverage**, **locality**) and its principles (the deletion test, "the interface is the test surface", "one adapter = hypothetical seam, two = real"). Use these terms exactly in every suggestion — don't drift into "component," "service," "API," or "boundary."
+- The domain language in `CONTEXT.md` gives names to good seams; ADRs in `docs/adr/` record decisions this command should not re-litigate.
+
+## Process
+
+### 1. Explore
+
+Read the project's domain glossary (`CONTEXT.md`) and any ADRs in the area you're touching first.
+
+Then use the Agent tool with `subagent_type=Explore` to walk the codebase. Don't follow rigid heuristics — explore organically and note where you experience friction:
+
+- Where does understanding one concept require bouncing between many small modules?
+- Where are modules **shallow** — interface nearly as complex as the implementation?
+- Where have pure functions been extracted just for testability, but the real bugs hide in how they're called (no **locality**)?
+- Where do tightly-coupled modules leak across their seams?
+- Which parts of the codebase are untested, or hard to test through their current interface?
+
+Apply the **deletion test** to anything you suspect is shallow: would deleting it concentrate complexity, or just move it? A "yes, concentrates" is the signal you want.
+
+### 2. Present candidates as an HTML report
+
+Write a self-contained HTML file to the OS temp directory so nothing lands in the repo. Resolve the temp dir from `$TMPDIR`, falling back to `/tmp` (or `%TEMP%` on Windows), and write to `<tmpdir>/architecture-review-<timestamp>.html` so each run gets a fresh file. Open it for the user — `xdg-open <path>` on Linux, `open <path>` on macOS, `start <path>` on Windows — and tell them the absolute path.
+
+The report uses **Tailwind via CDN** for layout and styling, and **Mermaid via CDN** for diagrams where a graph/flow/sequence reliably communicates the structure. Mix Mermaid with hand-crafted CSS/SVG visuals — use Mermaid when relationships are graph-shaped (call graphs, dependencies, sequences), and hand-built divs/SVG when you want something more editorial (mass diagrams, cross-sections, collapse animations). Each candidate gets a **before/after visualisation**. Be visual.
+
+For each candidate, render a card with:
+
+- **Files** — which files/modules are involved
+- **Problem** — why the current architecture is causing friction
+- **Solution** — plain English description of what would change
+- **Benefits** — explained in terms of locality and leverage, and how tests would improve
+- **Before / After diagram** — side-by-side, custom-drawn, illustrating the shallowness and the deepening
+- **Recommendation strength** — one of `Strong`, `Worth exploring`, `Speculative`, rendered as a badge
+
+End the report with a **Top recommendation** section: which candidate you'd tackle first and why.
+
+**Use CONTEXT.md vocabulary for the domain, and the `/codebase-design` vocabulary for the architecture.** If `CONTEXT.md` defines "Order," talk about "the Order intake module" — not "the FooBarHandler," and not "the Order service."
+
+**ADR conflicts**: if a candidate contradicts an existing ADR, only surface it when the friction is real enough to warrant revisiting the ADR. Mark it clearly in the card (e.g. a warning callout: _"contradicts ADR-0007 — but worth reopening because…"_). Don't list every theoretical refactor an ADR forbids.
+
+See [HTML-REPORT.md](HTML-REPORT.md) for the full HTML scaffold, diagram patterns, and styling guidance.
+
+Do NOT propose interfaces yet. After the file is written, ask the user: "Which of these would you like to explore?"
+
+### 3. Grilling loop
+
+Once the user picks a candidate, run the `/grilling` skill to walk the design tree with them — constraints, dependencies, the shape of the deepened module, what sits behind the seam, what tests survive.
+
+Side effects happen inline as decisions crystallize — run the `/domain-modeling` skill to keep the domain model current as you go:
+
+- **Naming a deepened module after a concept not in `CONTEXT.md`?** Add the term to `CONTEXT.md`. Create the file lazily if it doesn't exist.
+- **Sharpening a fuzzy term during the conversation?** Update `CONTEXT.md` right there.
+- **User rejects the candidate with a load-bearing reason?** Offer an ADR, framed as: _"Want me to record this as an ADR so future architecture reviews don't re-suggest it?"_ Only offer when the reason would actually be needed by a future explorer to avoid re-suggesting the same thing — skip ephemeral reasons ("not worth it right now") and self-evident ones.
+- **Want to explore alternative interfaces for the deepened module?** Run the `/codebase-design` skill and use its design-it-twice parallel sub-agent pattern.
--- a/skills/engineering/prototype/LOGIC.md
+++ b/skills/engineering/prototype/LOGIC.md
@ -0,0 +1,79 @@
+# Logic Prototype
+
+A tiny interactive terminal app that lets the user drive a state model by hand. Use this when the question is about **business logic, state transitions, or data shape** — the kind of thing that looks reasonable on paper but only feels wrong once you push it through real cases.
+
+## When this is the right shape
+
+- "I'm not sure if this state machine handles the edge case where X then Y."
+- "Does this data model actually let me represent the case where..."
+- "I want to feel out what the API should look like before writing it."
+- Anything where the user wants to **press buttons and watch state change**.
+
+If the question is "what should this look like" — wrong branch. Use [UI.md](UI.md).
+
+## Process
+
+### 1. State the question
+
+Before writing code, write down what state model and what question you're prototyping. One paragraph, in the prototype's README or a comment at the top of the file. A logic prototype that answers the wrong question is pure waste — make the question explicit so it can be checked later, whether the user is watching now or returning to it AFK.
+
+### 2. Pick the language
+
+Use whatever the host project uses. If the project has no obvious runtime (e.g. a docs repo), ask.
+
+Match the project's existing conventions for tooling — don't add a new package manager or runtime just for the prototype.
+
+### 3. Isolate the logic in a portable module
+
+Put the actual logic — the bit that's answering the question — behind a small, pure interface that could be lifted out and dropped into the real codebase later. The TUI around it is throwaway; the logic module shouldn't be.
+
+The right shape depends on the question:
+
+- **A pure reducer** — `(state, action) => state`. Good when actions are discrete events and state is a single value.
+- **A state machine** — explicit states and transitions. Good when "which actions are even legal right now" is part of the question.
+- **A small set of pure functions** over a plain data type. Good when there's no implicit current state — just transformations.
+- **A class or module with a clear method surface** when the logic genuinely owns ongoing internal state.
+
+Pick whichever shape best fits the question being asked, *not* whichever is easiest to wire to a TUI. Keep it pure: no I/O, no terminal code, no `console.log` for control flow. The TUI imports it and calls into it; nothing flows the other direction.
+
+This is what makes the prototype useful past its own lifetime. When the question's been answered, the validated reducer / machine / function set can be lifted into the real module — the TUI shell gets deleted.
+
+### 4. Build the smallest TUI that exposes the state
+
+Build it as a **lightweight TUI** — on every tick, clear the screen (`console.clear()` / `print("\033[2J\033[H")` / equivalent) and re-render the whole frame. The user should always see one stable view, not an ever-growing scrollback.
+
+Each frame has two parts, in this order:
+
+1. **Current state**, pretty-printed and diff-friendly (one field per line, or formatted JSON). Use **bold** for field names or section headers and **dim** for less important context (timestamps, IDs, derived values). Native ANSI escape codes are fine — `\x1b[1m` bold, `\x1b[2m` dim, `\x1b[0m` reset. No need to pull in a styling library unless one is already in the project.
+2. **Keyboard shortcuts**, listed at the bottom: `[a] add user  [d] delete user  [t] tick clock  [q] quit`. Bold the key, dim the description, or vice-versa — whatever reads cleanly.
+
+Behaviour:
+
+1. **Initialise state** — a single in-memory object/struct. Render the first frame on start.
+2. **Read one keystroke (or one line)** at a time, dispatch to a handler that mutates state.
+3. **Re-render** the full frame after every action — don't append, replace.
+4. **Loop until quit.**
+
+The whole frame should fit on one screen.
+
+### 5. Make it runnable in one command
+
+Add a script to the project's existing task runner (`package.json` scripts, `Makefile`, `justfile`, `pyproject.toml`). The user should run `pnpm run <prototype-name>` or equivalent — never need to remember a path.
+
+If the host project has no task runner, just put the command at the top of the prototype's README.
+
+### 6. Hand it over
+
+Give the user the run command. They'll drive it themselves; the interesting moments are when they say "wait, that shouldn't be possible" or "huh, I assumed X would be different" — those are the bugs in the _idea_, which is the whole point. If they want new actions added, add them. Prototypes evolve.
+
+### 7. Capture the answer
+
+When the prototype has done its job, the answer to the question is the only thing worth keeping. If the user is around, ask what it taught them. If not, leave a `NOTES.md` next to the prototype so the answer can be filled in (or filled in by you, if you've watched the session) before the prototype gets deleted.
+
+## Anti-patterns
+
+- **Don't add tests.** A prototype that needs tests is no longer a prototype.
+- **Don't wire it to the real database.** Use an in-memory store unless the question is specifically about persistence.
+- **Don't generalise.** No "what if we wanted to support X later." The prototype answers one question.
+- **Don't blur the logic and the TUI together.** If the reducer / state machine references `console.log`, prompts, or terminal escape codes, it's no longer portable. Keep the TUI as a thin shell over a pure module.
+- **Don't ship the TUI shell into production.** The shell is optimised for being driven by hand from a terminal. The logic module behind it is the bit worth keeping.
--- a/skills/engineering/prototype/SKILL.md
+++ b/skills/engineering/prototype/SKILL.md
@ -0,0 +1,31 @@
+---
+name: prototype
+description: Build a throwaway prototype to flesh out a design — a runnable terminal app for state/business-logic questions, or several radically different UI variations toggleable from one route.
+disable-model-invocation: true
+---
+
+# Prototype
+
+A prototype is **throwaway code that answers a question**. The question decides the shape.
+
+## Pick a branch
+
+Identify which question is being answered — from the user's prompt, the surrounding code, or by asking if the user is around:
+
+- **"Does this logic / state model feel right?"** → [LOGIC.md](LOGIC.md). Build a tiny interactive terminal app that pushes the state machine through cases that are hard to reason about on paper.
+- **"What should this look like?"** → [UI.md](UI.md). Generate several radically different UI variations on a single route, switchable via a URL search param and a floating bottom bar.
+
+The two branches produce very different artifacts — getting this wrong wastes the whole prototype. If the question is genuinely ambiguous and the user isn't reachable, default to whichever branch better matches the surrounding code (a backend module → logic; a page or component → UI) and state the assumption at the top of the prototype.
+
+## Rules that apply to both
+
+1. **Throwaway from day one, and clearly marked as such.** Locate the prototype code close to where it will actually be used (next to the module or page it's prototyping for) so context is obvious — but name it so a casual reader can see it's a prototype, not production. For throwaway UI routes, obey whatever routing convention the project already uses; don't invent a new top-level structure.
+2. **One command to run.** Whatever the project's existing task runner supports — `pnpm <name>`, `python <path>`, `bun <path>`, etc. The user must be able to start it without thinking.
+3. **No persistence by default.** State lives in memory. Persistence is the thing the prototype is _checking_, not something it should depend on. If the question explicitly involves a database, hit a scratch DB or a local file with a clear "PROTOTYPE — wipe me" name.
+4. **Skip the polish.** No tests, no error handling beyond what makes the prototype _runnable_, no abstractions. The point is to learn something fast and then delete it.
+5. **Surface the state.** After every action (logic) or on every variant switch (UI), print or render the full relevant state so the user can see what changed.
+6. **Delete or absorb when done.** When the prototype has answered its question, either delete it or fold the validated decision into the real code — don't leave it rotting in the repo.
+
+## When done
+
+The _answer_ is the only thing worth keeping from a prototype. Capture it somewhere durable (commit message, ADR, issue, or a `NOTES.md` next to the prototype) along with the question it was answering. If the user is around, that capture is a quick conversation; if not, leave the placeholder so they (or you, on the next pass) can fill in the verdict before deleting the prototype.
--- a/skills/engineering/prototype/UI.md
+++ b/skills/engineering/prototype/UI.md
@ -0,0 +1,112 @@
+# UI Prototype
+
+Generate **several radically different UI variations** on a single route, switchable from a floating bottom bar. The user flips between variants in the browser, picks one (or steals bits from each), then throws the rest away.
+
+If the question is about logic/state rather than what something looks like — wrong branch. Use [LOGIC.md](LOGIC.md).
+
+## When this is the right shape
+
+- "What should this page look like?"
+- "I want to see a few options for this dashboard before committing."
+- "Try a different layout for the settings screen."
+- Any time the user would otherwise spend a day picking between three vague mockups in their head.
+
+## Two sub-shapes — strongly prefer sub-shape A
+
+A UI prototype is much easier to judge when it's **butting up against the rest of the app** — real header, real sidebar, real data, real density. A throwaway route on its own is a vacuum: every variant looks fine in isolation. Default to sub-shape A whenever there's a plausible existing page to host the variants. Only reach for sub-shape B if the prototype genuinely has no nearby home.
+
+### Sub-shape A — adjustment to an existing page (preferred)
+
+The route already exists. Variants are rendered **on the same route**, gated by a `?variant=` URL search param. The existing data fetching, params, and auth all stay — only the rendering swaps. This is the default; pick it unless there's a specific reason not to.
+
+If the prototype is for something that doesn't yet have a page but *would naturally live inside one* (a new section of the dashboard, a new card on the settings screen, a new step in an existing flow) — that's still sub-shape A. Mount the variants inside the host page.
+
+### Sub-shape B — a new page (last resort)
+
+Only use this when the thing being prototyped genuinely has no existing page to live inside — e.g. an entirely new top-level surface, or a flow that can't be embedded anywhere sensible.
+
+Create a **throwaway route** following whatever routing convention the project already uses — don't invent a new top-level structure. Name it so it's obviously a prototype (e.g. include the word `prototype` in the path or filename). Same `?variant=` pattern.
+
+Before committing to sub-shape B, sanity-check: is there really no existing page this could be embedded in? An empty route hides design problems that a populated one would expose.
+
+In both sub-shapes the floating bottom bar is identical.
+
+## Process
+
+### 1. State the question and pick N
+
+Default to **3 variants**. More than 5 stops being radically different and starts being noise — cap there.
+
+Write down the plan in one line, in the prototype's location or a top-of-file comment:
+
+> "Three variants of the settings page, switchable via `?variant=`, on the existing `/settings` route."
+
+This works whether the user is here to push back or not.
+
+### 2. Generate radically different variants
+
+Draft each variant. Hold each one to:
+
+- The page's purpose and the data it has access to.
+- The project's component library / styling system (TailwindCSS, shadcn, MUI, plain CSS, whatever).
+- A clear exported component name, e.g. `VariantA`, `VariantB`, `VariantC`.
+
+Variants must be **structurally different** — different layout, different information hierarchy, different primary affordance, not just different colours. Three slightly-tweaked card grids isn't a UI prototype, it's wallpaper. If two drafts come out too similar, redo one with explicit "do not use a card grid" guidance.
+
+### 3. Wire them together
+
+Create a single switcher component on the route:
+
+```tsx
+// pseudo-code — adapt to the project's framework
+const variant = searchParams.get('variant') ?? 'A';
+return (
+  <>
+    {variant === 'A' && <VariantA {...data} />}
+    {variant === 'B' && <VariantB {...data} />}
+    {variant === 'C' && <VariantC {...data} />}
+    <PrototypeSwitcher variants={['A','B','C']} current={variant} />
+  </>
+);
+```
+
+For sub-shape A (existing page): keep all the existing data fetching above the switcher; only the rendered subtree changes per variant.
+
+For sub-shape B (new page): the throwaway route under `/prototype/<name>` mounts the same switcher.
+
+### 4. Build the floating switcher
+
+A small fixed-position bar at the bottom-centre of the screen with three pieces:
+
+- **Left arrow** — cycles to the previous variant (wraps around).
+- **Variant label** — shows the current variant key and, if the variant exports a name, that name too. e.g. `B — Sidebar layout`.
+- **Right arrow** — cycles forward (wraps around).
+
+Behaviour:
+
+- Clicking an arrow updates the URL search param (use the framework's router — `router.replace` on Next, `navigate` on React Router, etc) so the variant is shareable and reload-stable.
+- Keyboard: `←` and `→` arrow keys also cycle. Don't intercept arrow keys when an `<input>`, `<textarea>`, or `[contenteditable]` is focused.
+- Visually distinct from the page (e.g. high-contrast pill, subtle shadow) so it's obviously not part of the design being evaluated.
+- Hidden in production builds — gate on `process.env.NODE_ENV !== 'production'` or an equivalent check, so a stray prototype merge can't ship the bar to users.
+
+Put the switcher in a single shared component so both sub-shapes can reuse it. Locate it wherever shared UI lives in the project.
+
+### 5. Hand it over
+
+Surface the URL (and the `?variant=` keys). The user will flip through whenever they get to it. The interesting feedback is usually **"I want the header from B with the sidebar from C"** — that's the actual design they want.
+
+### 6. Capture the answer and clean up
+
+Once a variant has won, write down which one and why (commit message, ADR, issue, or a `NOTES.md` next to the prototype if running AFK and the user hasn't responded yet). Then:
+
+- **Sub-shape A** — delete the losing variants and the switcher; fold the winner into the existing page.
+- **Sub-shape B** — promote the winning variant to a real route, delete the throwaway route and the switcher.
+
+Don't leave variant components or the switcher lying around. They rot fast and confuse the next reader.
+
+## Anti-patterns
+
+- **Variants that differ only in colour or copy.** That's a tweak, not a prototype. Real variants disagree about structure.
+- **Sharing too much code between variants.** A shared `<Header>` is fine; a shared `<Layout>` defeats the point. Each variant should be free to throw out the layout.
+- **Wiring variants to real mutations.** Read-only prototypes are fine. If a variant needs to mutate, point it at a stub — the question is "what should this look like", not "does the backend work".
+- **Promoting the prototype directly to production.** The variant code was written under prototype constraints (no tests, minimal error handling). Rewrite it properly when you fold it in.
--- a/skills/engineering/resolving-merge-conflicts/SKILL.md
+++ b/skills/engineering/resolving-merge-conflicts/SKILL.md
@ -0,0 +1,14 @@
+---
+name: resolving-merge-conflicts
+description: "Use when you need to resolve an in-progress git merge/rebase conflict."
+---
+
+1. **See the current state** of the merge/rebase. Check git history, and the conflicting files.
+
+2. **Find the primary sources** for each conflict. Understand deeply why each change was made, and what the original intent was. Read the commit messages, check the PRs, check original issues/tickets.
+
+3. **Resolve each hunk.** Preserve both intents where possible. Where incompatible, pick the one matching the merge's stated goal and note the trade-off. Do **not** invent new behaviour. Always resolve; never `--abort`.
+
+4. Discover the project's **automated checks** and run them — typically typecheck, then tests, then format. Fix anything the merge broke.
+
+5. **Finish the merge/rebase.** Stage everything and commit. If rebasing, continue the rebase process until all commits are rebased.
--- a/skills/engineering/setup-matt-pocock-skills/SKILL.md
+++ b/skills/engineering/setup-matt-pocock-skills/SKILL.md
@ -0,0 +1,127 @@
+---
+name: setup-matt-pocock-skills
+description: Configure this repo for the engineering skills — set up its issue tracker, triage label vocabulary, and domain doc layout. Run once before first use of the other engineering skills.
+disable-model-invocation: true
+---
+
+# Setup Matt Pocock's Skills
+
+Scaffold the per-repo configuration that the engineering skills assume:
+
+- **Issue tracker** — where issues live (GitHub by default; local markdown is also supported out of the box)
+- **Triage labels** — the strings used for the five canonical triage roles
+- **Domain docs** — where `CONTEXT.md` and ADRs live, and the consumer rules for reading them
+
+This is a prompt-driven skill, not a deterministic script. Explore, present what you found, confirm with the user, then write.
+
+## Process
+
+### 1. Explore
+
+Look at the current repo to understand its starting state. Read whatever exists; don't assume:
+
+- `git remote -v` and `.git/config` — is this a GitHub repo? Which one?
+- `AGENTS.md` and `CLAUDE.md` at the repo root — does either exist? Is there already an `## Agent skills` section in either?
+- `CONTEXT.md` and `CONTEXT-MAP.md` at the repo root
+- `docs/adr/` and any `src/*/docs/adr/` directories
+- `docs/agents/` — does this skill's prior output already exist?
+- `.scratch/` — sign that a local-markdown issue tracker convention is already in use
+
+### 2. Present findings and ask
+
+Summarise what's present and what's missing. Then walk the user through the three decisions **one at a time** — present a section, get the user's answer, then move to the next. Don't dump all three at once.
+
+Assume the user does not know what these terms mean. Each section starts with a short explainer (what it is, why these skills need it, what changes if they pick differently). Then show the choices and the default.
+
+**Section A — Issue tracker.**
+
+> Explainer: The "issue tracker" is where issues live for this repo. Skills like `to-issues`, `triage`, `to-prd`, and `qa` read from and write to it — they need to know whether to call `gh issue create`, write a markdown file under `.scratch/`, or follow some other workflow you describe. Pick the place you actually track work for this repo.
+
+Default posture: these skills were designed for GitHub. If a `git remote` points at GitHub, propose that. If a `git remote` points at GitLab (`gitlab.com` or a self-hosted host), propose GitLab. Otherwise (or if the user prefers), offer:
+
+- **GitHub** — issues live in the repo's GitHub Issues (uses the `gh` CLI)
+- **GitLab** — issues live in the repo's GitLab Issues (uses the [`glab`](https://gitlab.com/gitlab-org/cli) CLI)
+- **Local markdown** — issues live as files under `.scratch/<feature>/` in this repo (good for solo projects or repos without a remote)
+- **Other** (Jira, Linear, etc.) — ask the user to describe the workflow in one paragraph; the skill will record it as freeform prose
+
+If — and only if — the user picked **GitHub** or **GitLab**, ask one follow-up:
+
+> Explainer: Open-source repos often receive feature requests as pull requests, not just issues — a PR is an issue with attached code. If you turn this on, `/triage` pulls *external* PRs into the same queue and runs them through the same labels and states as issues (collaborators' in-flight PRs are left alone). Leave it off if PRs aren't a request surface for you.
+
+- **PRs as a request surface** — yes / no (default: no). Record the answer in `docs/agents/issue-tracker.md`. For local-markdown and other trackers, skip this question — there are no PRs.
+
+**Section B — Triage label vocabulary.**
+
+> Explainer: When the `triage` skill processes an incoming issue, it moves it through a state machine — needs evaluation, waiting on reporter, ready for an AFK agent to pick up, ready for a human, or won't fix. To do that, it needs to apply labels (or the equivalent in your issue tracker) that match strings *you've actually configured*. If your repo already uses different label names (e.g. `bug:triage` instead of `needs-triage`), map them here so the skill applies the right ones instead of creating duplicates.
+
+The five canonical roles:
+
+- `needs-triage` — maintainer needs to evaluate
+- `needs-info` — waiting on reporter
+- `ready-for-agent` — fully specified, AFK-ready (an agent can pick it up with no human context)
+- `ready-for-human` — needs human implementation
+- `wontfix` — will not be actioned
+
+Default: each role's string equals its name. Ask the user if they want to override any. If their issue tracker has no existing labels, the defaults are fine.
+
+**Section C — Domain docs.**
+
+> Explainer: Some skills (`improve-codebase-architecture`, `diagnosing-bugs`, `tdd`) read a `CONTEXT.md` file to learn the project's domain language, and `docs/adr/` for past architectural decisions. They need to know whether the repo has one global context or multiple (e.g. a monorepo with separate frontend/backend contexts) so they look in the right place.
+
+Confirm the layout:
+
+- **Single-context** — one `CONTEXT.md` + `docs/adr/` at the repo root. Most repos are this.
+- **Multi-context** — `CONTEXT-MAP.md` at the root pointing to per-context `CONTEXT.md` files (typically a monorepo).
+
+### 3. Confirm and edit
+
+Show the user a draft of:
+
+- The `## Agent skills` block to add to whichever of `CLAUDE.md` / `AGENTS.md` is being edited (see step 4 for selection rules)
+- The contents of `docs/agents/issue-tracker.md`, `docs/agents/triage-labels.md`, `docs/agents/domain.md`
+
+Let them edit before writing.
+
+### 4. Write
+
+**Pick the file to edit:**
+
+- If `CLAUDE.md` exists, edit it.
+- Else if `AGENTS.md` exists, edit it.
+- If neither exists, ask the user which one to create — don't pick for them.
+
+Never create `AGENTS.md` when `CLAUDE.md` already exists (or vice versa) — always edit the one that's already there.
+
+If an `## Agent skills` block already exists in the chosen file, update its contents in-place rather than appending a duplicate. Don't overwrite user edits to the surrounding sections.
+
+The block:
+
+```markdown
+## Agent skills
+
+### Issue tracker
+
+[one-line summary of where issues are tracked, plus whether external PRs are a triage surface]. See `docs/agents/issue-tracker.md`.
+
+### Triage labels
+
+[one-line summary of the label vocabulary]. See `docs/agents/triage-labels.md`.
+
+### Domain docs
+
+[one-line summary of layout — "single-context" or "multi-context"]. See `docs/agents/domain.md`.
+```
+
+Then write the three docs files using the seed templates in this skill folder as a starting point:
+
+- [issue-tracker-github.md](./issue-tracker-github.md) — GitHub issue tracker
+- [issue-tracker-gitlab.md](./issue-tracker-gitlab.md) — GitLab issue tracker
+- [issue-tracker-local.md](./issue-tracker-local.md) — local-markdown issue tracker
+- [triage-labels.md](./triage-labels.md) — label mapping
+- [domain.md](./domain.md) — domain doc consumer rules + layout
+
+For "other" issue trackers, write `docs/agents/issue-tracker.md` from scratch using the user's description.
+
+### 5. Done
+
+Tell the user the setup is complete and which engineering skills will now read from these files. Mention they can edit `docs/agents/*.md` directly later — re-running this skill is only necessary if they want to switch issue trackers or restart from scratch.
--- a/skills/engineering/setup-matt-pocock-skills/domain.md
+++ b/skills/engineering/setup-matt-pocock-skills/domain.md
@ -0,0 +1,51 @@
+# Domain Docs
+
+How the engineering skills should consume this repo's domain documentation when exploring the codebase.
+
+## Before exploring, read these
+
+- **`CONTEXT.md`** at the repo root, or
+- **`CONTEXT-MAP.md`** at the repo root if it exists — it points at one `CONTEXT.md` per context. Read each one relevant to the topic.
+- **`docs/adr/`** — read ADRs that touch the area you're about to work in. In multi-context repos, also check `src/<context>/docs/adr/` for context-scoped decisions.
+
+If any of these files don't exist, **proceed silently**. Don't flag their absence; don't suggest creating them upfront. The `/domain-modeling` skill (reached via `/grill-with-docs` and `/improve-codebase-architecture`) creates them lazily when terms or decisions actually get resolved.
+
+## File structure
+
+Single-context repo (most repos):
+
+```
+/
+├── CONTEXT.md
+├── docs/adr/
+│   ├── 0001-event-sourced-orders.md
+│   └── 0002-postgres-for-write-model.md
+└── src/
+```
+
+Multi-context repo (presence of `CONTEXT-MAP.md` at the root):
+
+```
+/
+├── CONTEXT-MAP.md
+├── docs/adr/                          ← system-wide decisions
+└── src/
+    ├── ordering/
+    │   ├── CONTEXT.md
+    │   └── docs/adr/                  ← context-specific decisions
+    └── billing/
+        ├── CONTEXT.md
+        └── docs/adr/
+```
+
+## Use the glossary's vocabulary
+
+When your output names a domain concept (in an issue title, a refactor proposal, a hypothesis, a test name), use the term as defined in `CONTEXT.md`. Don't drift to synonyms the glossary explicitly avoids.
+
+If the concept you need isn't in the glossary yet, that's a signal — either you're inventing language the project doesn't use (reconsider) or there's a real gap (note it for `/domain-modeling`).
+
+## Flag ADR conflicts
+
+If your output contradicts an existing ADR, surface it explicitly rather than silently overriding:
+
+> _Contradicts ADR-0007 (event-sourced orders) — but worth reopening because…_
--- a/skills/engineering/setup-matt-pocock-skills/issue-tracker-github.md
+++ b/skills/engineering/setup-matt-pocock-skills/issue-tracker-github.md
@ -0,0 +1,34 @@
+# Issue tracker: GitHub
+
+Issues and PRDs for this repo live as GitHub issues. Use the `gh` CLI for all operations.
+
+## Conventions
+
+- **Create an issue**: `gh issue create --title "..." --body "..."`. Use a heredoc for multi-line bodies.
+- **Read an issue**: `gh issue view <number> --comments`, filtering comments by `jq` and also fetching labels.
+- **List issues**: `gh issue list --state open --json number,title,body,labels,comments --jq '[.[] | {number, title, body, labels: [.labels[].name], comments: [.comments[].body]}]'` with appropriate `--label` and `--state` filters.
+- **Comment on an issue**: `gh issue comment <number> --body "..."`
+- **Apply / remove labels**: `gh issue edit <number> --add-label "..."` / `--remove-label "..."`
+- **Close**: `gh issue close <number> --comment "..."`
+
+Infer the repo from `git remote -v` — `gh` does this automatically when run inside a clone.
+
+## Pull requests as a triage surface
+
+**PRs as a request surface: no.** _(Set to `yes` if this repo treats external PRs as feature requests; `/triage` reads this flag.)_
+
+When set to `yes`, PRs run through the same labels and states as issues, using the `gh pr` equivalents:
+
+- **Read a PR**: `gh pr view <number> --comments` and `gh pr diff <number>` for the diff.
+- **List external PRs for triage**: `gh pr list --state open --json number,title,body,labels,author,authorAssociation,comments` then keep only `authorAssociation` of `CONTRIBUTOR`, `FIRST_TIME_CONTRIBUTOR`, or `NONE` (drop `OWNER`/`MEMBER`/`COLLABORATOR`).
+- **Comment / label / close**: `gh pr comment`, `gh pr edit --add-label`/`--remove-label`, `gh pr close`.
+
+GitHub shares one number space across issues and PRs, so a bare `#42` may be either — resolve with `gh pr view 42` and fall back to `gh issue view 42`.
+
+## When a skill says "publish to the issue tracker"
+
+Create a GitHub issue.
+
+## When a skill says "fetch the relevant ticket"
+
+Run `gh issue view <number> --comments`.
--- a/skills/engineering/setup-matt-pocock-skills/issue-tracker-gitlab.md
+++ b/skills/engineering/setup-matt-pocock-skills/issue-tracker-gitlab.md
@ -0,0 +1,35 @@
+# Issue tracker: GitLab
+
+Issues and PRDs for this repo live as GitLab issues. Use the [`glab`](https://gitlab.com/gitlab-org/cli) CLI for all operations.
+
+## Conventions
+
+- **Create an issue**: `glab issue create --title "..." --description "..."`. Use a heredoc for multi-line descriptions. Pass `--description -` to open an editor.
+- **Read an issue**: `glab issue view <number> --comments`. Use `-F json` for machine-readable output.
+- **List issues**: `glab issue list -F json` with appropriate `--label` filters.
+- **Comment on an issue**: `glab issue note <number> --message "..."`. GitLab calls comments "notes".
+- **Apply / remove labels**: `glab issue update <number> --label "..."` / `--unlabel "..."`. Multiple labels can be comma-separated or by repeating the flag.
+- **Close**: `glab issue close <number>`. `glab issue close` does not accept a closing comment, so post the explanation first with `glab issue note <number> --message "..."`, then close.
+- **Merge requests**: GitLab calls PRs "merge requests". Use `glab mr create`, `glab mr view`, `glab mr note`, etc. — the same shape as `gh pr ...` with `mr` in place of `pr` and `note`/`--message` in place of `comment`/`--body`.
+
+Infer the repo from `git remote -v` — `glab` does this automatically when run inside a clone.
+
+## Merge requests as a triage surface
+
+**MRs as a request surface: no.** _(Set to `yes` if this repo treats external merge requests as feature requests; `/triage` reads this flag.)_
+
+When set to `yes`, MRs run through the same labels and states as issues, using the `glab mr` equivalents:
+
+- **Read an MR**: `glab mr view <number> --comments` and `glab mr diff <number>` for the diff.
+- **List external MRs for triage**: `glab mr list -F json`, then keep only MRs whose author is not a project member/owner (a contributor's MR, not a maintainer's in-flight work).
+- **Comment / label / close**: `glab mr note`, `glab mr update --label`/`--unlabel`, `glab mr close`.
+
+Unlike GitHub, GitLab numbers issues and MRs separately, so `#42` is unambiguous once you know which surface the maintainer means.
+
+## When a skill says "publish to the issue tracker"
+
+Create a GitLab issue.
+
+## When a skill says "fetch the relevant ticket"
+
+Run `glab issue view <number> --comments`.
--- a/skills/engineering/setup-matt-pocock-skills/issue-tracker-local.md
+++ b/skills/engineering/setup-matt-pocock-skills/issue-tracker-local.md
@ -0,0 +1,19 @@
+# Issue tracker: Local Markdown
+
+Issues and PRDs for this repo live as markdown files in `.scratch/`.
+
+## Conventions
+
+- One feature per directory: `.scratch/<feature-slug>/`
+- The PRD is `.scratch/<feature-slug>/PRD.md`
+- Implementation issues are `.scratch/<feature-slug>/issues/<NN>-<slug>.md`, numbered from `01`
+- Triage state is recorded as a `Status:` line near the top of each issue file (see `triage-labels.md` for the role strings)
+- Comments and conversation history append to the bottom of the file under a `## Comments` heading
+
+## When a skill says "publish to the issue tracker"
+
+Create a new file under `.scratch/<feature-slug>/` (creating the directory if needed).
+
+## When a skill says "fetch the relevant ticket"
+
+Read the file at the referenced path. The user will normally pass the path or the issue number directly.
--- a/skills/engineering/setup-matt-pocock-skills/triage-labels.md
+++ b/skills/engineering/setup-matt-pocock-skills/triage-labels.md
@ -0,0 +1,15 @@
+# Triage Labels
+
+The skills speak in terms of five canonical triage roles. This file maps those roles to the actual label strings used in this repo's issue tracker.
+
+| Label in mattpocock/skills | Label in our tracker | Meaning                                  |
+| -------------------------- | -------------------- | ---------------------------------------- |
+| `needs-triage`             | `needs-triage`       | Maintainer needs to evaluate this issue  |
+| `needs-info`               | `needs-info`         | Waiting on reporter for more information |
+| `ready-for-agent`          | `ready-for-agent`    | Fully specified, ready for an AFK agent  |
+| `ready-for-human`          | `ready-for-human`    | Requires human implementation            |
+| `wontfix`                  | `wontfix`            | Will not be actioned                     |
+
+When a skill mentions a role (e.g. "apply the AFK-ready triage label"), use the corresponding label string from this table.
+
+Edit the right-hand column to match whatever vocabulary you actually use.
--- a/skills/engineering/tdd/SKILL.md
+++ b/skills/engineering/tdd/SKILL.md
@ -1,6 +1,6 @@
 ---
 name: tdd
-description: Test-driven development with red-green-refactor loop. Use when user wants to build features or fix bugs using TDD, mentions "red-green-refactor", wants integration tests, or asks for test-first development.
+description: Test-driven development. Use when the user wants to build features or fix bugs test-first, mentions "red-green-refactor", or wants integration tests.
 ---

 # Test-Driven Development
@ -44,12 +44,13 @@ RIGHT (vertical):

 ### 1. Planning

+When exploring the codebase, read `CONTEXT.md` (if it exists) so that test names and interface vocabulary match the project's domain language, and respect ADRs in the area you're touching.
+
 Before writing any code:

 - [ ] Confirm with user what interface changes are needed
 - [ ] Confirm with user which behaviors to test (prioritize)
- [ ] Identify opportunities for [deep modules](deep-modules.md) (small interface, deep implementation)
- [ ] Design interfaces for [testability](interface-design.md)
+- [ ] Identify opportunities for deep modules (small interface, deep implementation) — run the `/codebase-design` skill for the vocabulary and the testability checks
 - [ ] List the behaviors to test (not implementation steps)
 - [ ] Get user approval on the plan

--- a/skills/engineering/tdd/mocking.md
+++ b/skills/engineering/tdd/mocking.md
--- a/skills/engineering/tdd/refactoring.md
+++ b/skills/engineering/tdd/refactoring.md
--- a/skills/engineering/tdd/tests.md
+++ b/skills/engineering/tdd/tests.md
--- a/skills/engineering/to-issues/SKILL.md
+++ b/skills/engineering/to-issues/SKILL.md
@ -0,0 +1,84 @@
+---
+name: to-issues
+description: Break a plan, spec, or PRD into independently-grabbable issues on the project issue tracker using tracer-bullet vertical slices.
+disable-model-invocation: true
+---
+
+# To Issues
+
+Break a plan into independently-grabbable issues using vertical slices (tracer bullets).
+
+The issue tracker and triage label vocabulary should have been provided to you — run `/setup-matt-pocock-skills` if not.
+
+## Process
+
+### 1. Gather context
+
+Work from whatever is already in the conversation context. If the user passes an issue reference (issue number, URL, or path) as an argument, fetch it from the issue tracker and read its full body and comments.
+
+### 2. Explore the codebase (optional)
+
+If you have not already explored the codebase, do so to understand the current state of the code. Issue titles and descriptions should use the project's domain glossary vocabulary, and respect ADRs in the area you're touching.
+
+Look for opportunities to prefactor the code to make the implementation easier. "Make the change easy, then make the easy change."
+
+### 3. Draft vertical slices
+
+Break the plan into **tracer bullet** issues. Each issue is a thin vertical slice that cuts through ALL integration layers end-to-end, NOT a horizontal slice of one layer.
+
+<vertical-slice-rules>
+
+- Each slice delivers a narrow but COMPLETE path through every layer (schema, API, UI, tests)
+- A completed slice is demoable or verifiable on its own
+- Any prefactoring should be done first
+
+</vertical-slice-rules>
+
+### 4. Quiz the user
+
+Present the proposed breakdown as a numbered list. For each slice, show:
+
+- **Title**: short descriptive name
+- **Blocked by**: which other slices (if any) must complete first
+- **User stories covered**: which user stories this addresses (if the source material has them)
+
+Ask the user:
+
+- Does the granularity feel right? (too coarse / too fine)
+- Are the dependency relationships correct?
+- Should any slices be merged or split further?
+
+Iterate until the user approves the breakdown.
+
+### 5. Publish the issues to the issue tracker
+
+For each approved slice, publish a new issue to the issue tracker. Use the issue body template below. These issues are considered ready for AFK agents, so publish them with the correct triage label unless instructed otherwise.
+
+Publish issues in dependency order (blockers first) so you can reference real issue identifiers in the "Blocked by" field.
+
+<issue-template>
+## Parent
+
+A reference to the parent issue on the issue tracker (if the source was an existing issue, otherwise omit this section).
+
+## What to build
+
+A concise description of this vertical slice. Describe the end-to-end behavior, not layer-by-layer implementation.
+
+Avoid specific file paths or code snippets — they go stale fast. Exception: if a prototype produced a snippet that encodes a decision more precisely than prose can (state machine, reducer, schema, type shape), inline it here and note briefly that it came from a prototype. Trim to the decision-rich parts — not a working demo, just the important bits.
+
+## Acceptance criteria
+
+- [ ] Criterion 1
+- [ ] Criterion 2
+- [ ] Criterion 3
+
+## Blocked by
+
+- A reference to the blocking ticket (if any)
+
+Or "None - can start immediately" if no blockers.
+
+</issue-template>
+
+Do NOT close or modify any parent issue.
--- a/skills/engineering/to-prd/SKILL.md
+++ b/skills/engineering/to-prd/SKILL.md
@ -1,21 +1,22 @@
 ---
 name: to-prd
-description: Turn the current conversation context into a PRD and submit it as a GitHub issue. Use when user wants to create a PRD from the current context.
+description: Turn the current conversation into a PRD and publish it to the project issue tracker — no interview, just synthesis of what you've already discussed.
+disable-model-invocation: true
 ---

 This skill takes the current conversation context and codebase understanding and produces a PRD. Do NOT interview the user — just synthesize what you already know.

+The issue tracker and triage label vocabulary should have been provided to you — run `/setup-matt-pocock-skills` if not.
+
 ## Process

-1. Explore the repo to understand the current state of the codebase, if you haven't already.
+1. Explore the repo to understand the current state of the codebase, if you haven't already. Use the project's domain glossary vocabulary throughout the PRD, and respect any ADRs in the area you're touching.

-2. Sketch out the major modules you will need to build or modify to complete the implementation. Actively look for opportunities to extract deep modules that can be tested in isolation.
+2. Sketch out the seams at which you're going to test the feature. Existing seams should be preferred to new ones. Use the highest seam possible. If new seams are needed, propose them at the highest point you can. The fewer seams across the codebase, the better - the ideal number is one.

-A deep module (as opposed to a shallow module) is one which encapsulates a lot of functionality in a simple, testable interface which rarely changes.
+Check with the user that these seams match their expectations.

-Check with the user that these modules match their expectations. Check with the user which modules they want tests written for.
-
-3. Write the PRD using the template below and submit it as a GitHub issue.
+3. Write the PRD using the template below, then publish it to the project issue tracker. Apply the `ready-for-agent` triage label - no need for additional triage.

 <prd-template>

@ -53,6 +54,8 @@ A list of implementation decisions that were made. This can include:

 Do NOT include specific file paths or code snippets. They may end up being outdated very quickly.

+Exception: if a prototype produced a snippet that encodes a decision more precisely than prose can (state machine, reducer, schema, type shape), inline it within the relevant decision and note briefly that it came from a prototype. Trim to the decision-rich parts — not a working demo, just the important bits.
+
 ## Testing Decisions

 A list of testing decisions that were made. Include:
--- a/skills/engineering/triage/AGENT-BRIEF.md
+++ b/skills/engineering/triage/AGENT-BRIEF.md
@ -1,6 +1,8 @@
 # Writing Agent Briefs

-An agent brief is a structured comment posted on a GitHub issue when it moves to `ready-for-agent`. It is the authoritative specification that an AFK agent will work from. The original issue body and discussion are context — the agent brief is the contract.
+An agent brief is a structured comment posted on a GitHub issue or PR when it moves to `ready-for-agent`. It is the authoritative specification that an AFK agent will work from. The original body and discussion are context — the agent brief is the contract.
+
+The brief states **what the agent should do**, which stretches to both surfaces: for an issue, that's building the change from nothing; for a PR, it's what's left to do *to the existing diff* — finish it, close gaps, address review points. Same principles either way; the PR example below shows the difference.

 ## Principles

@ -143,6 +145,43 @@ checked for matches.
 - Bug reports (only enhancement rejections go to `.out-of-scope/`)
 ```

+### Good agent brief (PR)
+
+For a PR, "Current behavior" describes the state of the diff, and the brief asks the agent to finish or fix it rather than build from scratch.
+
+```markdown
+## Agent Brief
+
+**Category:** enhancement
+**Summary:** Finish the contributor's `--json` output flag for `triage list`
+
+**Current behavior:**
+The PR adds a `--json` flag that serializes the issue list to JSON. The happy
+path works and the diff matches the project's command structure. Two gaps
+remain: errors are still printed as human text (not JSON), and the new flag has
+no test coverage.
+
+**Desired behavior:**
+With `--json`, all output — including errors — is well-formed JSON on stdout,
+and the command's exit codes are unchanged. The existing human-readable output
+is untouched when the flag is absent.
+
+**Key interfaces:**
+- The command's error path should emit `{ "error": string }` under `--json`
+  instead of the plain-text error
+- Reuse the existing serializer the PR already added; don't introduce a second
+
+**Acceptance criteria:**
+- [ ] `triage list --json` emits valid JSON for both success and error cases
+- [ ] Exit codes match the non-JSON command
+- [ ] A test covers the `--json` success output and one error case
+- [ ] Default (non-JSON) output is byte-for-byte unchanged
+
+**Out of scope:**
+- Adding `--json` to any other command
+- Changing the JSON shape of the success payload the PR already defined
+```
+
 ### Bad agent brief

 ```markdown
--- a/skills/engineering/triage/OUT-OF-SCOPE.md
+++ b/skills/engineering/triage/OUT-OF-SCOPE.md
@ -83,7 +83,11 @@ The maintainer may:

 ## When to write to `.out-of-scope/`

-Only when an **enhancement** (not a bug) is rejected as `wontfix`. The flow:
+Only when an **enhancement** (not a bug) is *rejected* as `wontfix`. This applies to enhancement PRs exactly as it does to issues — a rejected PR is recorded here so the same request doesn't return as fresh code.
+
+Do **not** write here when something is closed as `wontfix` because it's **already implemented**. That's a built feature, not a rejected one; recording it would poison the dedup checks with false rejections. Instead, the closing comment points to where the feature already lives.
+
+The flow:

 1. Maintainer decides a feature request is out of scope
 2. Check if a matching `.out-of-scope/` file already exists
--- a/skills/engineering/triage/SKILL.md
+++ b/skills/engineering/triage/SKILL.md
@ -0,0 +1,112 @@
+---
+name: triage
+description: Move issues and external PRs through a state machine of triage roles — categorise, verify, grill if needed, and write agent-ready briefs.
+disable-model-invocation: true
+---
+
+# Triage
+
+Move issues on the project issue tracker through a small state machine of triage roles.
+
+If this repo treats external pull requests as a request surface (see the issue-tracker config), triage covers them too: **a PR is an issue with attached code** — same roles, same states, same machine, with a few deltas marked "for a PR" below. Resolve a bare `#42` to an issue or PR per the tracker config.
+
+Every comment or issue posted to the issue tracker during triage **must** start with this disclaimer:
+
+```
+> *This was generated by AI during triage.*
+```
+
+## Reference docs
+
+- [AGENT-BRIEF.md](AGENT-BRIEF.md) — how to write durable agent briefs
+- [OUT-OF-SCOPE.md](OUT-OF-SCOPE.md) — how the `.out-of-scope/` knowledge base works
+
+## Roles
+
+Two **category** roles:
+
+- `bug` — something is broken
+- `enhancement` — new feature or improvement
+
+Five **state** roles:
+
+- `needs-triage` — maintainer needs to evaluate
+- `needs-info` — waiting on reporter for more information
+- `ready-for-agent` — fully specified, ready for an AFK agent
+- `ready-for-human` — needs human implementation
+- `wontfix` — will not be actioned
+
+For a PR, the same states read against the attached code: `ready-for-agent` means a brief is attached and an agent should take the next step on the diff; `ready-for-human` means it's ready for a human to merge.
+
+Every triaged issue should carry exactly one category role and one state role. If state roles conflict, flag it and ask the maintainer before doing anything else.
+
+These are canonical role names — the actual label strings used in the issue tracker may differ. The mapping should have been provided to you - run `/setup-matt-pocock-skills` if not.
+
+State transitions: an unlabeled issue normally goes to `needs-triage` first; from there it moves to `needs-info`, `ready-for-agent`, `ready-for-human`, or `wontfix`. `needs-info` returns to `needs-triage` once the reporter replies. The maintainer can override at any time — flag transitions that look unusual and ask before proceeding.
+
+## Invocation
+
+The maintainer invokes `/triage` and describes what they want in natural language. Interpret the request and act. Examples:
+
+- "Show me anything that needs my attention"
+- "Let's look at #42" (issue or PR)
+- "Move #42 to ready-for-agent"
+- "What's ready for agents to pick up?"
+
+## Show what needs attention
+
+Query the issue tracker and present three buckets, oldest first:
+
+1. **Unlabeled** — never triaged.
+2. **`needs-triage`** — evaluation in progress.
+3. **`needs-info` with reporter activity since the last triage notes** — needs re-evaluation.
+
+When PRs are in scope, include external PRs in these buckets and tag each line `[PR]` or `[issue]`. Discovery surfaces only *external* PRs (the tracker config defines who counts as external) — a collaborator's in-flight PR is not triage work. This filter is discovery-only; an explicitly named PR is always triaged regardless of author.
+
+Show counts and a one-line summary per item. Let the maintainer pick.
+
+## Triage a specific issue or PR
+
+1. **Gather context.** Read the full issue or PR (body, comments, labels, author, dates; for a PR, the diff too). Parse any prior triage notes so you don't re-ask resolved questions. Explore the codebase using the project's domain glossary, respecting ADRs in the area. Run two checks against the codebase: (a) **redundancy** — search for an existing implementation of the requested behavior by domain concept (not just the request's wording), and report where you looked. If found, it's an already-implemented `wontfix` (step 5). (b) **prior rejection** — read `.out-of-scope/*.md` and surface any that resembles this request.
+
+2. **Recommend.** Tell the maintainer your category and state recommendation with reasoning, plus a brief codebase summary relevant to the request — including whether it's already implemented. Wait for direction.
+
+3. **Verify the claim.** Before any grilling, check that the claim holds up. For a bug, reproduce it from the reporter's steps. For a PR, confirm the diff does what it claims — check it out, run the relevant tests or commands. Report what happened: confirmed (with code path), failed, or insufficient detail (a strong `needs-info` signal). A confirmed verification makes a much stronger agent brief.
+
+4. **Grill (if needed).** If the request needs fleshing out, run the `/grilling` and `/domain-modeling` skills together — grill it into shape one question at a time, sharpening domain terms and updating `CONTEXT.md`/ADRs inline as decisions land.
+
+5. **Apply the outcome:**
+   - `ready-for-agent` — post an agent brief comment ([AGENT-BRIEF.md](AGENT-BRIEF.md)).
+   - `ready-for-human` — same structure as an agent brief, but note why it can't be delegated (judgment calls, external access, design decisions, manual testing).
+   - `needs-info` — post triage notes (template below).
+   - `wontfix` — close, with the comment depending on *why*:
+     - **Already implemented** — the change already exists in the codebase. Point to where it lives; do **not** write to `.out-of-scope/` (that KB is for *rejected* requests, not built ones).
+     - **Rejected (bug)** — polite explanation, then close.
+     - **Rejected (enhancement)** — write to `.out-of-scope/`, link to it from a comment, then close ([OUT-OF-SCOPE.md](OUT-OF-SCOPE.md)).
+   - `needs-triage` — apply the role. Optional comment if there's partial progress.
+
+## Quick state override
+
+If the maintainer says "move #42 to ready-for-agent", trust them and apply the role directly. Confirm what you're about to do (role changes, comment, close), then act. Skip grilling. If moving to `ready-for-agent` without a grilling session, ask whether they want to write an agent brief.
+
+## Needs-info template
+
+```markdown
+## Triage Notes
+
+**What we've established so far:**
+
+- point 1
+- point 2
+
+**What we still need from you (@reporter):**
+
+- question 1
+- question 2
+```
+
+Capture everything resolved during grilling under "established so far" so the work isn't lost. Questions must be specific and actionable, not "please provide more info".
+
+## Resuming a previous session
+
+If prior triage notes exist on the issue or PR, read them, check whether the reporter has answered any outstanding questions, and present an updated picture before continuing. Don't re-ask resolved questions.
--- a/skills/in-progress/README.md
+++ b/skills/in-progress/README.md
@ -0,0 +1,10 @@
+# In Progress
+
+Skills that are still being developed. They're not ready to ship — expect rough edges, breaking changes, and abandoned experiments. They're excluded from the plugin and the top-level README until they graduate to a stable bucket.
+
+- **[decision-mapping](./decision-mapping/SKILL.md)** — Turn a loose idea into a sequenced map of investigation tickets, then drive them to resolution one at a time. User-invoked.
+- **[loop-me](./loop-me/SKILL.md)** — Grill yourself into implementable workflow specs over multiple sessions, using the current directory as a stateful workspace. User-invoked.
+- **[review](./review/SKILL.md)** — Review changes since a fixed point along two parallel axes: **Standards** (does the diff follow the repo's coding standards?) and **Spec** (does the diff faithfully implement the originating issue/PRD?).
+- **[writing-beats](./writing-beats/SKILL.md)** — Shape an article as a journey of beats, choose-your-own-adventure style. Pick a starting beat, write only that beat, then pivot to the next, until the article reaches a natural end.
+- **[writing-fragments](./writing-fragments/SKILL.md)** — Grilling session that mines you for fragments — heterogeneous nuggets of writing — and appends them to a single document as raw material for a future article.
+- **[writing-shape](./writing-shape/SKILL.md)** — Take a markdown file of raw material and shape it into an article paragraph by paragraph, arguing format choices at each step.
--- a/skills/in-progress/decision-mapping/SKILL.md
+++ b/skills/in-progress/decision-mapping/SKILL.md
@ -0,0 +1,84 @@
+---
+name: decision-mapping
+description: Turn a loose idea into a sequenced map of investigation tickets, then drive them to resolution one at a time.
+disable-model-invocation: true
+---
+
+This skill is invoked when a loose idea requires more than one agent session to turn into a plan. It creates a stateful decision map in a markdown file, and drives the user through a sequence of tickets to resolve the open questions - which may require either prototyping, research or discussion.
+
+## The Decision Map
+
+The decision map is a single compact Markdown file, one per planning effort, git-tracked alongside the project. It is the canonical artifact — the **whole map is loaded as context into every session**, so it must stay compact.
+
+Assets created during tickets should be linked to from the map, not duplicated within it.
+
+### Structure
+
+Numbered entries ("tickets"), each its own section keyed by its number:
+
+```markdown
+## #1: Relational Or Non-Relational Database?
+
+Blocked by: #<ticket-number>, #<ticket-number>
+Type: Research | Prototype | Grilling
+
+### Question
+
+<question-here>
+
+### Answer
+
+<answer-here>
+```
+
+Each ticket must be sized to one 100K token agent session.
+
+## Ticket Types
+
+There are three types of tickets:
+
+- **Research**: Reading documentation, third-party API's, or local resources like knowledge bases. Creates a markdown summary as an asset. Use this when knowledge outside the current working directory is required.
+- **Prototype**: Writing UI or logic code to test a hypothesis, or to explore a design space. Uses the /prototype skill. Creates a prototype as an asset. Use this when "how should it look" or "how should it behave" is the key question.
+- **Grilling**: Conversation with the agent. Uses the /grilling and /domain-modelling skills. Asks one question at a time. The default case.
+
+## Fog of war
+
+The map is _deliberately_ incomplete beyond the frontier. Your job is to investigate the frontier, and to resolve tickets in order to push the frontier forward. Push back the fog of war, one node at a time.
+
+At some point, the fog of war should have been pushed back far enough that the path to the finish line is clear. At that point, no more tickets will be required and the decision map can be considered 'done'.
+
+## Invocation
+
+There are two ways this skill can be invoked: **bootstrap** and **resume**.
+
+### Bootstrap
+
+User invokes with a loose idea.
+
+1. Run a /grilling + /domain-modelling session to surface the open decisions. Ask one question at a time.
+2. Write a new decision map — mostly fog, frontier identified, trivially-decidable entries resolved inline.
+3. Stop. Map-building is one session's work; do not also resolve tickets.
+
+### Resume
+
+User invokes with a path to an existing map and a ticket number.
+
+1. Load the **whole map** as context.
+2. Run a session to resolve the ticket, invoking skills as needed. If in doubt, use `/grilling` and `/domain-modelling`.
+3. Record what the session resolved in the ticket's body.
+4. Add newly-discovered tickets (with correct `blocked_by` edges).
+5. Stop.
+
+If the decisions made invalidate other parts of the map, update or delete those nodes.
+
+## Parallelism
+
+The user may choose to run tickets in parallel, so expect other agents to make changes to the map.
+
+## Skipping The Decision Map
+
+Many times, the initial grilling will result in no fog of war. No unresolved tickets. Nothing to do, except implement.
+
+In those situations, you should offer the user the chance to skip the decision map - since the decision map is only needed if multi-session decisions need to be made.
+
+If they skip it, you should recommend either implementing directly or using `/to-prd` to schedule a multi-session implementation.
--- a/skills/in-progress/loop-me/SKILL.md
+++ b/skills/in-progress/loop-me/SKILL.md
@ -0,0 +1,32 @@
+---
+name: loop-me
+description: Grill me about specs for the workflows I want to build, within this workspace.
+disable-model-invocation: true
+argument-hint: "A workflow to design, or nothing to go find one"
+---
+
+Run a stateful `/grilling` session whose only output is **workflow** specs. Use the grilling discipline — relentless, one question at a time, a recommended answer attached to each — aimed at the vocabulary and goal below. Create, edit, and delete specs as the grilling resolves things.
+
+## The loop lens
+
+A **loop** is a recurring pattern in the user's life: their career, their week, their morning, a single repeated activity. Picturing a life as loops within loops reveals how predictable its activities really are — which is what makes them worth **delegating**. Use the lens to find loops worth specifying, and propose ones the user hasn't noticed.
+
+A **workflow** is the spec of one loop, made real. You run a workflow on a loop — the loop is its running instantiation. Workflows live in `workflows/*.md` and are the source of truth.
+
+## Vocabulary
+
+A shared language, reached for only when a workflow calls for it — never a checklist. **Mandate nothing structural**: a workflow needs no AI, no checkpoint, and no schedule unless the grilling shows it does.
+
+- **Trigger** — what fires each run: an **event** (a new email, a new issue) or a **schedule** (every morning). Event-triggering is usually the more efficient.
+- **Checkpoint** — a human-in-the-loop point where the user is asked to verify or decide. Some workflows have none and run autonomously; some use no AI at all.
+- **Push right** — defer the checkpoint as far as it will go. Do maximal work before involving the human, so they are asked once, late, with everything prepared.
+- **Brief** — what a checkpoint presents: a tight, decision-ready summary — what was produced, why, and a link down to the asset itself — never the raw output. The user reads a brief, not a draft. Speed of review is imperative.
+
+## Definition of done
+
+A workflow spec is done when an implementer agent could build it without asking a single question. Grill until then; nothing is done while a question remains.
+
+## The workspace
+
+- `workflows/*.md` — one spec per workflow.
+- `NOTES.md` — raw notes on the user's world: the tools they use, the channels they process, and their own terminology for both. When it is empty or thin, interview them about their world before specifying anything. Sharpen fuzzy terms into canonical ones as they surface, and record them here.
--- a/skills/in-progress/review/SKILL.md
+++ b/skills/in-progress/review/SKILL.md
@ -0,0 +1,69 @@
+---
+name: review
+description: Review the changes since a fixed point (commit, branch, tag, or merge-base) along two axes — Standards (does the code follow this repo's documented coding standards?) and Spec (does the code match what the originating issue/PRD asked for?). Runs both reviews in parallel sub-agents and reports them side by side. Use when the user wants to review a branch, a PR, work-in-progress changes, or asks to "review since X".
+---
+
+Two-axis review of the diff between `HEAD` and a fixed point the user supplies:
+
+- **Standards** — does the code conform to this repo's documented coding standards?
+- **Spec** — does the code faithfully implement the originating issue / PRD / spec?
+
+Both axes run as **parallel sub-agents** so they don't pollute each other's context, then this skill aggregates their findings.
+
+The issue tracker should have been provided to you — run `/setup-matt-pocock-skills` if `docs/agents/issue-tracker.md` is missing.
+
+## Process
+
+### 1. Pin the fixed point
+
+Whatever the user said is the fixed point — a commit SHA, branch name, tag, `main`, `HEAD~5`, etc. If they didn't specify one, ask for it.
+
+Capture the diff command once: `git diff <fixed-point>...HEAD` (three-dot, so the comparison is against the merge-base). Also note the list of commits via `git log <fixed-point>..HEAD --oneline`.
+
+Before going further, confirm the fixed point resolves (`git rev-parse <fixed-point>`) and the diff is non-empty. A bad ref or empty diff should fail here — not inside two parallel sub-agents.
+
+### 2. Identify the spec source
+
+Look for the originating spec, in this order:
+
+1. Issue references in the commit messages (`#123`, `Closes #45`, GitLab `!67`, etc.) — fetch via the workflow in `docs/agents/issue-tracker.md`.
+2. A path the user passed as an argument.
+3. A PRD/spec file under `docs/`, `specs/`, or `.scratch/` matching the branch name or feature.
+4. If nothing is found, ask the user where the spec is. If they say there isn't one, the **Spec** sub-agent will skip and report "no spec available".
+
+### 3. Identify the standards sources
+
+Anything in the repo that documents how code should be written, such as `CODING_STANDARDS.md` or `CONTRIBUTING.md`.
+
+### 4. Spawn both sub-agents in parallel
+
+Send a single message with two `Agent` tool calls. Use the `general-purpose` subagent for both.
+
+**Standards sub-agent prompt** — include:
+
+- The full diff command and commit list.
+- The list of standards-source files you found in step 3.
+- The brief: "Report — per file/hunk where relevant — every place the diff violates a documented standard. Cite the standard (file + the rule). Distinguish hard violations from judgement calls. Skip anything tooling enforces. Under 400 words."
+
+**Spec sub-agent prompt** — include:
+
+- The diff command and commit list.
+- The path or fetched contents of the spec.
+- The brief: "Report: (a) requirements the spec asked for that are missing or partial; (b) behaviour in the diff that wasn't asked for (scope creep); (c) requirements that look implemented but where the implementation looks wrong. Quote the spec line for each finding. Under 400 words."
+
+If the spec is missing, skip the Spec sub-agent and note this in the final report.
+
+### 5. Aggregate
+
+Present the two reports under `## Standards` and `## Spec` headings, verbatim or lightly cleaned. Do **not** merge or rerank findings — the two axes are deliberately separate (see _Why two axes_).
+
+End with a one-line summary: total findings per axis, and the worst issue _within each axis_ (if any). Don't pick a single winner across axes — that's the reranking the separation exists to prevent.
+
+## Why two axes
+
+A change can pass one axis and fail the other:
+
+- Code that follows every standard but implements the wrong thing → **Standards pass, Spec fail.**
+- Code that does exactly what the issue asked but breaks the project's conventions → **Spec pass, Standards fail.**
+
+Reporting them separately stops one axis from masking the other.
--- a/skills/in-progress/writing-beats/SKILL.md
+++ b/skills/in-progress/writing-beats/SKILL.md
@ -0,0 +1,52 @@
+---
+name: writing-beats
+description: Shape an article as a journey of beats, choose-your-own-adventure style. The user picks a starting beat from the raw material, you write only that beat, then offer options for where to pivot next, beat by beat, until the article reaches a natural end. Use when the user has raw material and wants to assemble it as a narrative rather than an argument.
+---
+
+<what-to-do>
+
+The user has passed (or will pass) a markdown file of raw material.
+
+If the user did not say where to save the article, ask once and remember the path.
+
+Then run a beat-by-beat journey:
+
+1. Write 2–3 candidate **starting beats**, drawn from the raw material. Each is a different entry point into the article. Show the user the beats before writing it to the article file. The user picks one. Preview what beats that might lead to once written - as if the user is seeing a little way down the path.
+2. Once the user picks a starting beat, write **only that beat** to the article file. A beat may be one sentence or several paragraphs — whatever that beat naturally is. Stop there.
+3. Re-read the article file from disk. Then offer 2–3 candidate **next beats** — different directions the journey could pivot to from where the article now stands.
+4. Loop steps 2–4 until the article reaches a natural end.
+
+</what-to-do>
+
+<supporting-info>
+
+## What is a beat
+
+A beat is one move in the journey. It does one thing — sets a scene, lands a point, asks a question, drops an aside, twists the angle. Then it stops, leaving the reader at a place where the next beat can pivot.
+
+A beat is sized by what it needs:
+
+- A single sentence if that's all the move is ("And then nothing happened for three weeks.").
+- A short paragraph if the move needs setup.
+- Multiple paragraphs if the beat is a self-contained vignette, argument, or example.
+
+If a "beat" needs five paragraphs and three subheadings, it's not a beat — it's two beats glued together. Split it.
+
+## Writing one beat
+
+Once a beat is picked, write _that beat only_ to the article file. Do not write the next beat.
+
+Pull material from the raw pile to populate the beat. You can paraphrase, split, recombine, or quote. The pile is a quarry.
+
+## Ending the journey
+
+The article ends when the journey is complete — not when the pile is empty. Most piles will have leftover fragments that don't make it in. That is fine; that is the point of having more raw material than you need.
+
+## Writing rhythm
+
+- Append one beat at a time. Never write ahead.
+- Re-read the article file from disk before every write. Preserve user edits absolutely.
+- If the user edits a previous beat substantially, let it change what comes next.
+- If the user says "rewrite that beat" or "go back and try a different beat 3", do it — edit in place, leave the rest alone.
+
+</supporting-info>
--- a/skills/in-progress/writing-fragments/SKILL.md
+++ b/skills/in-progress/writing-fragments/SKILL.md
@ -0,0 +1,75 @@
+---
+name: writing-fragments
+description: Grilling session that mines the user for fragments — heterogeneous nuggets of writing (claims, vignettes, sharp sentences, half-thoughts) — and appends them to a single document as raw material for a future article. Use when the user wants to develop ideas before imposing structure, or mentions "fragments", "ideate", or "raw material" for writing.
+---
+
+<what-to-do>
+
+Run a grilling session that produces fragments. Interview the user relentlessly about whatever they want to write about. Do not impose phases, outlines, or structure — that is explicitly out of scope.
+
+As fragments emerge from either side of the conversation, append them to a single markdown file. The user will be editing this file during the session; always re-read it before writing so their edits are preserved.
+
+If the user did not pass a path, ask once where to save the document, then remember it for the rest of the session.
+
+Capture fragments from the very first thing the user says, including the initial prompt.
+
+On first write, put a single H1 at the top with a working title (it can change later) and nothing else — no metadata, no TOC, no date.
+
+</what-to-do>
+
+<supporting-info>
+
+## What is a fragment
+
+A fragment is any piece of text that might survive into the final article. It must be _readable by the author_ — the author can tell what it means — but it does not need to define its terms or be comprehensible to a cold reader. The bar is "is this a piece of good writing?", not "is this a self-contained argument?"
+
+Fragments are deliberately heterogeneous. Examples of what could be a fragment:
+
+- A sharp sentence you'd want to deploy somewhere but don't yet know where.
+- A claim with a one-line justification.
+- A vignette: a thing that happened, a code snippet, a scenario, an analogy.
+- A half-thought: "something about how X feels like Y, work this out later."
+- A quote, a piece of dialogue, an overheard line.
+- A list of related observations that hang together by feel.
+- A complaint, a confession, a punchline.
+
+The novelist's diary is the model: years of unstructured noticings that later get mined for raw material. Fragments are noticings.
+
+## File format
+
+```markdown
+# Working title
+
+A first fragment lives here.
+
+It can be multiple paragraphs. It can include lists, code, quotes — whatever
+shape the fragment naturally takes.
+
+---
+
+A second fragment.
+
+---
+
+> A quoted line that the user wants to keep around.
+
+A reaction to it.
+
+---
+
+- A cluster of related observations
+- That hang together by feel
+- And want to be near each other
+```
+
+Fragments are separated by a horizontal rule (`\n---\n`). No headings inside the body. No tags. No order beyond the order they were added.
+
+## Writing rhythm
+
+Append silently. Don't ask permission for each fragment. Mention what you added in passing ("adding that"), but don't interrupt the conversation with save dialogs.
+
+Before every write: re-read the file from disk. The user may have edited, reordered, or deleted fragments between turns — preserve their changes. Never overwrite the file; only append (or, if the user asks, edit a specific fragment in place).
+
+The user can say "cut the last one", "rewrite that one sharper", "merge those two" at any time. Treat those as first-class instructions.
+
+</supporting-info>
--- a/skills/in-progress/writing-shape/SKILL.md
+++ b/skills/in-progress/writing-shape/SKILL.md
@ -0,0 +1,64 @@
+---
+name: writing-shape
+description: Take a markdown file of raw material and shape it into an article through a conversational session — drafting candidate openings, growing the piece paragraph by paragraph, arguing about format (lists, tables, callouts, quotes) at each step. Use when the user has a pile of notes, fragments, or a rough draft and wants help turning it into something publishable.
+---
+
+<what-to-do>
+
+The user has passed (or will pass) a markdown file of raw material. Treat it as the input pile — anything from a tidy list of fragments to a wall of unstructured prose to a transcript. The format does not matter. Read it end-to-end before doing anything else.
+
+Then run a shaping session that produces a separate article document. Do not edit the raw material file — it is read-only to this skill.
+
+If the user did not say where to save the article, ask once and remember the path. The user will be editing the article file during the session; always re-read it before writing so their edits are preserved.
+
+</what-to-do>
+
+<supporting-info>
+
+## The loop
+
+1. **Read the pile.** Read the input file in full. Form a sense of what's in it.
+2. **Draft 2–3 candidate openings.** Each opening should imply a different thesis or angle for the article. Show all of them. Force the user to pick or compose a hybrid. The chosen opening defines what the rest of the article must do.
+3. **Grow paragraph by paragraph.** After the opening lands, ask "given this opening, what does the reader need to hear next?" Pull material from the pile to answer. Argue about whether the next beat is a paragraph, a list, a table, a callout, a quote, a code block. Each format choice should be deliberate and defensible.
+4. **Append to the article file as you go.** Don't batch. Write each agreed paragraph or block immediately so the user can see the article taking shape.
+5. **Loop step 3 until the article is done.** The user decides when it's done.
+
+## Conversational feel
+
+This is a grilling session inverted. In ideation, the question was "what are you actually noticing?" Here it's "what is this article actually arguing, and in what order does the reader need to hear it?" Push back. Refuse to let weak transitions slide. If a paragraph doesn't earn its place, cut it.
+
+Specific moves to keep using:
+
+- "What does this paragraph do for the reader that the previous one didn't?"
+- "If I cut this, what breaks?"
+- "Is this prose, or should it be a list? Why prose?"
+- "This sentence is doing two jobs — split it or pick one."
+- "The opening promised X. We've drifted to Y. Either re-thread it or change the opening."
+
+## Pulling from the pile
+
+Treat the raw material as a quarry, not a script. Pull a fragment, rework it to fit the surrounding paragraph, and place it. A fragment may be split across multiple paragraphs, merged with another, or paraphrased. The pile's job is to be mined; the article's job is to read as one voice.
+
+If the pile lacks something the article needs, name the gap explicitly: "We need an example here and the pile doesn't have one — give me one now or we cut this section."
+
+## Format arguments to actually have
+
+When choosing how to render a beat, weigh these tradeoffs out loud with the user, not silently:
+
+- **Prose vs. list.** Prose carries argument; lists carry parallel items. If items aren't truly parallel, prose is better. If they are, a list is faster to scan.
+- **Inline vs. callout.** Tips, warnings, and asides go in callouts (`> [!TIP]`, `> [!NOTE]`) — but only if they'd genuinely derail the main argument inline. Otherwise leave them inline.
+- **Table vs. repeated structure.** If the same shape repeats 3+ times with the same fields, a table. Otherwise prose with bold leads.
+- **Quote vs. paraphrase.** Quote when the original wording is the point. Paraphrase when only the idea matters.
+- **Code block vs. inline code.** Multi-line, runnable, or illustrative → block. Single token or identifier → inline.
+
+## Writing rhythm
+
+Append to the article file as each block is agreed. Re-read the file from disk before every write — the user may have edited between turns. Never overwrite blindly. If the user wants a paragraph rewritten, edit that specific paragraph in place; leave the rest alone.
+
+## Out of scope
+
+- Mining for new fragments that aren't in the pile (the pile is the input — if it's incomplete, name the gap and either get the user to fill it or cut the section).
+- Editing the raw material file.
+- Publishing, formatting for a specific platform, or adding frontmatter the user didn't ask for.
+
+</supporting-info>
--- a/skills/misc/README.md
+++ b/skills/misc/README.md
@ -0,0 +1,8 @@
+# Misc
+
+Tools I keep around but rarely use.
+
+- **[git-guardrails-claude-code](./git-guardrails-claude-code/SKILL.md)** — Set up Claude Code hooks to block dangerous git commands (push, reset --hard, clean, etc.) before they execute.
+- **[migrate-to-shoehorn](./migrate-to-shoehorn/SKILL.md)** — Migrate test files from `as` type assertions to @total-typescript/shoehorn.
+- **[scaffold-exercises](./scaffold-exercises/SKILL.md)** — Create exercise directory structures with sections, problems, solutions, and explainers.
+- **[setup-pre-commit](./setup-pre-commit/SKILL.md)** — Set up Husky pre-commit hooks with lint-staged, Prettier, type checking, and tests.
--- a/skills/misc/git-guardrails-claude-code/SKILL.md
+++ b/skills/misc/git-guardrails-claude-code/SKILL.md
--- a/skills/misc/git-guardrails-claude-code/scripts/block-dangerous-git.sh
+++ b/skills/misc/git-guardrails-claude-code/scripts/block-dangerous-git.sh
--- a/skills/misc/migrate-to-shoehorn/SKILL.md
+++ b/skills/misc/migrate-to-shoehorn/SKILL.md
--- a/skills/misc/scaffold-exercises/SKILL.md
+++ b/skills/misc/scaffold-exercises/SKILL.md
--- a/skills/misc/setup-pre-commit/SKILL.md
+++ b/skills/misc/setup-pre-commit/SKILL.md
--- a/skills/personal/README.md
+++ b/skills/personal/README.md
@ -0,0 +1,6 @@
+# Personal
+
+Skills tied to my own setup, not promoted in the plugin.
+
+- **[edit-article](./edit-article/SKILL.md)** — Edit and improve articles by restructuring sections, improving clarity, and tightening prose.
+- **[obsidian-vault](./obsidian-vault/SKILL.md)** — Search, create, and manage notes in an Obsidian vault with wikilinks and index notes.
--- a/skills/personal/edit-article/SKILL.md
+++ b/skills/personal/edit-article/SKILL.md
@ -1,6 +1,7 @@
 ---
 name: edit-article
 description: Edit and improve articles by restructuring sections, improving clarity, and tightening prose. Use when user wants to edit, revise, or improve an article draft.
+disable-model-invocation: true
 ---

 1. First, divide the article into sections based on its headings. Think about the main points you want to make during those sections.
--- a/skills/personal/obsidian-vault/SKILL.md
+++ b/skills/personal/obsidian-vault/SKILL.md
--- a/skills/productivity/README.md
+++ b/skills/productivity/README.md
@ -0,0 +1,18 @@
+# Productivity
+
+General workflow tools, not code-specific.
+
+## User-invoked
+
+Reachable only when you type them (`disable-model-invocation: true`).
+
+- **[grill-me](./grill-me/SKILL.md)** — Get relentlessly interviewed about a plan or design until every branch of the decision tree is resolved.
+- **[handoff](./handoff/SKILL.md)** — Compact the current conversation into a handoff document so another agent can continue the work.
+- **[teach](./teach/SKILL.md)** — Teach the user a new skill or concept over multiple sessions, using the current directory as a stateful teaching workspace.
+- **[writing-great-skills](./writing-great-skills/SKILL.md)** — Reference for writing and editing skills well: the vocabulary and principles that make a skill predictable.
+
+## Model-invoked
+
+Model- or user-reachable (rich trigger phrasing so the model can reach for them).
+
+- **[grilling](./grilling/SKILL.md)** — Interview the user relentlessly about a plan or design until every branch of the decision tree is resolved.
--- a/skills/productivity/grill-me/SKILL.md
+++ b/skills/productivity/grill-me/SKILL.md
@ -0,0 +1,7 @@
+---
+name: grill-me
+description: A relentless interview to sharpen a plan or design.
+disable-model-invocation: true
+---
+
+Run a `/grilling` session.
--- a/skills/productivity/grilling/SKILL.md
+++ b/skills/productivity/grilling/SKILL.md
@ -1,10 +1,10 @@
 ---
-name: grill-me
-description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
+name: grilling
+description: Interview the user relentlessly about a plan or design. Use when the user wants to stress-test a plan before building, or uses any 'grill' trigger phrases.
 ---

 Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

-Ask the questions one at a time.
+Ask the questions one at a time, waiting for feedback on each question before continuing. Asking multiple questions at once is bewildering.

 If a question can be answered by exploring the codebase, explore the codebase instead.
--- a/skills/productivity/handoff/SKILL.md
+++ b/skills/productivity/handoff/SKILL.md
@ -0,0 +1,16 @@
+---
+name: handoff
+description: Compact the current conversation into a handoff document for another agent to pick up.
+argument-hint: "What will the next session be used for?"
+disable-model-invocation: true
+---
+
+Write a handoff document summarising the current conversation so a fresh agent can continue the work. Save to the temporary directory of the user's OS - not the current workspace.
+
+Include a "suggested skills" section in the document, which suggests skills that the agent should invoke.
+
+Do not duplicate content already captured in other artifacts (PRDs, plans, ADRs, issues, commits, diffs). Reference them by path or URL instead.
+
+Redact any sensitive information, such as API keys, passwords, or personally identifiable information.
+
+If the user passed arguments, treat them as a description of what the next session will focus on and tailor the doc accordingly.
--- a/skills/productivity/teach/GLOSSARY-FORMAT.md
+++ b/skills/productivity/teach/GLOSSARY-FORMAT.md
@ -0,0 +1,35 @@
+# GLOSSARY.md Format
+
+`GLOSSARY.md` is the canonical language for this teaching workspace. All explainers, exercises, and learning records should adhere to its terminology. Building it is itself part of learning: compressing a concept into a tight definition is evidence the user understands it.
+
+## Structure
+
+```md
+# {Topic} Glossary
+
+{One or two sentence description of the topic this glossary covers.}
+
+## Terms
+
+**Hypertrophy**:
+Muscle growth driven by mechanical tension and metabolic stress over repeated training sessions.
+_Avoid_: Bulking, getting big
+
+**Progressive overload**:
+Systematically increasing the demand on a muscle over time — via load, volume, or intensity.
+_Avoid_: Pushing harder, levelling up
+
+**RPE (Rate of Perceived Exertion)**:
+A 1–10 self-rating of how hard a set felt, where 10 is failure and 8 means two reps left in the tank.
+_Avoid_: Effort score, intensity rating
+```
+
+## Rules
+
+- **Add a term only when the user understands it.** The glossary is a record of compressed knowledge, not a dictionary the user reads to learn. If the user has just been introduced to a concept, wait until they can use it correctly before promoting it here.
+- **Be opinionated.** When several words exist for the same concept, pick the best one and list the rest as aliases to avoid. This is how language compresses.
+- **Keep definitions tight.** One or two sentences. Define what the term IS, not what it does or how to do it.
+- **Use the glossary's own terms inside definitions.** Once a term is in the glossary, prefer it everywhere — including inside other definitions. This is what makes complex terms easier to grasp later.
+- **Group under subheadings** when natural clusters emerge (e.g. `## Anatomy`, `## Programming`). A flat list is fine when terms cohere.
+- **Flag ambiguities explicitly.** If a term is used loosely in the wider field, note the resolution: "In this workspace, 'set' always means a working set — warm-ups are tracked separately."
+- **Revise as understanding deepens.** A definition the user wrote in week one may be wrong by week six. Update in place; do not leave stale entries.
--- a/skills/productivity/teach/LEARNING-RECORD-FORMAT.md
+++ b/skills/productivity/teach/LEARNING-RECORD-FORMAT.md
@ -0,0 +1,46 @@
+# Learning Record Format
+
+Learning records live in `./learning-records/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc. Create the directory lazily — only when the first record is written.
+
+They are the teaching equivalent of ADRs: they capture non-obvious lessons, key insights, and stated prior knowledge that will steer future sessions. They are used to calculate the zone of proximal development.
+
+## Template
+
+```md
+# {Short title of what was learned or established}
+
+{1-3 sentences: what was learned (or what prior knowledge was established), and why it matters for future sessions.}
+```
+
+That is the whole format. A learning record can be a single paragraph. The value is recording _that_ this is now known and _why_ it changes what to teach next — not in filling out sections.
+
+## Optional sections
+
+Only include these when they add genuine value. Most records won't need them.
+
+- **Status** frontmatter (`active | superseded by LR-NNNN`) — useful when an earlier understanding turns out to be wrong and is replaced.
+- **Evidence** — how the user demonstrated the understanding (a question answered, an exercise completed, prior experience cited). Useful when the claim might be revisited.
+- **Implications** — what this unlocks or rules out for future sessions. Worth recording when non-obvious.
+
+## Numbering
+
+Scan `./learning-records/` for the highest existing number and increment by one.
+
+## When to write a learning record
+
+Write one when any of these is true:
+
+1. **The user demonstrated genuine understanding of something non-trivial** — not just exposure, but evidence they can use the concept correctly. This sets a new floor for what to teach next.
+2. **The user disclosed prior knowledge** — "I already know X." Record it so future sessions don't re-teach it. Also record the _depth_ claimed.
+3. **A misconception was corrected** — the user previously believed something wrong and now sees why. These are high-value: they predict future stumbling blocks for related topics.
+4. **The mission shifted in response to learning** — the user discovered they cared about something different than they thought. Cross-link to [[MISSION.md]] and update it.
+
+### What does _not_ qualify
+
+- Material that was merely covered. Coverage is not learning. Wait for evidence.
+- Anything already captured tersely in [[GLOSSARY.md]] as a term definition. Don't duplicate.
+- Session-by-session activity logs. Learning records are not a journal — they are decision-grade insights.
+
+## Supersession
+
+When a later record contradicts an earlier one (the user's understanding deepened or corrected), mark the old record `Status: superseded by LR-NNNN` rather than deleting it. The history of how understanding evolved is itself useful signal.
--- a/skills/productivity/teach/MISSION-FORMAT.md
+++ b/skills/productivity/teach/MISSION-FORMAT.md
@ -0,0 +1,31 @@
+# MISSION.md Format
+
+`MISSION.md` lives at the workspace root. It captures the _reason_ the user is learning this topic. Every teaching decision — what to teach next, which resources to surface, which exercises to design — should trace back to this document.
+
+## Template
+
+```md
+# Mission: {Topic}
+
+## Why
+{1-3 sentences. The concrete real-world goal the user is chasing. What changes in their life or work when they have this skill? Avoid abstract framings like "to understand X" — push for the underlying outcome.}
+
+## Success looks like
+- {A specific, observable thing the user will be able to do}
+- {Another specific thing}
+- {…}
+
+## Constraints
+- {Time, budget, prior commitments, learning preferences, anything that bounds the approach}
+
+## Out of scope
+- {Adjacent topics the user explicitly does not want to chase right now — protects the zone of proximal development}
+```
+
+## Rules
+
+- **One mission per workspace.** If the user wants to learn two unrelated things, that is two workspaces.
+- **Concrete over abstract.** "Run a half marathon by October" beats "get fitter." "Ship a Rust CLI to my team" beats "learn Rust."
+- **Push back on vagueness.** If the user cannot articulate why, interview them before writing anything. A bad mission is worse than no mission.
+- **Revise when reality shifts.** Missions change. When the user's goal moves, update this file — don't leave a stale mission steering future sessions.
+- **Keep it short.** If `MISSION.md` runs past a screen, it has stopped being a compass and started being a plan.
--- a/skills/productivity/teach/RESOURCES-FORMAT.md
+++ b/skills/productivity/teach/RESOURCES-FORMAT.md
@ -0,0 +1,32 @@
+# RESOURCES.md Format
+
+`RESOURCES.md` is the curated set of trusted sources for this topic. Knowledge for explainers should be drawn from here, not from parametric guesses. Wisdom comes from the communities listed here.
+
+## Structure
+
+```md
+# {Topic} Resources
+
+## Knowledge
+
+- [Book: _The Science and Practice of Strength Training_ — Zatsiorsky & Kraemer](https://example.com)
+  Foundational text on programming and adaptation. Use for: anything to do with periodisation, recovery, intensity zones.
+- [Article: "How Much Should I Train?" — Greg Nuckols (Stronger By Science)](https://example.com)
+  Evidence-based review of volume landmarks. Use for: weekly set targets per muscle group.
+
+## Wisdom (Communities)
+
+- [r/weightroom](https://reddit.com/r/weightroom)
+  High-signal subreddit, moderated against bro-science. Use for: programme critique, plateau troubleshooting.
+- Local: Tuesday strength class at {gym name}
+  Use for: real-time coaching feedback on lifts.
+```
+
+## Rules
+
+- **High-trust only.** Prefer primary sources, recognised experts, peer-reviewed work, and communities with strong moderation. If a resource is marketing dressed as education, leave it out.
+- **Annotate every entry.** A bare link is useless in three months. Add one line: what it covers and when to reach for it.
+- **Group by Knowledge / Wisdom.** Mirrors the philosophy in [SKILL.md](./SKILL.md). It is fine for a resource to appear in only one group.
+- **Surface gaps explicitly.** If no good resource exists for an area the mission needs, write a `## Gaps` section listing what is missing. This drives future search.
+- **Prune ruthlessly.** A resource that turned out to be wrong, shallow, or off-mission should be removed, not buried. Better five sharp sources than thirty mediocre ones.
+- **Record community preferences.** If the user has opted out of joining communities, note it here so future sessions don't keep proposing them.
--- a/skills/productivity/teach/SKILL.md
+++ b/skills/productivity/teach/SKILL.md
@ -0,0 +1,140 @@
+---
+name: teach
+description: Teach the user a new skill or concept, within this workspace.
+disable-model-invocation: true
+argument-hint: "What would you like to learn about?"
+---
+
+The user has asked you to teach them something. This is a stateful request - they intend to learn the topic over multiple sessions.
+
+## Teaching Workspace
+
+Treat the current directory as a teaching workspace. The state of their learning is captured in this directory in several files:
+
+- `MISSION.md`: A document capturing the _reason_ the user is interested in the topic. This should be used to ground all teaching. Use the format in [MISSION-FORMAT.md](./MISSION-FORMAT.md).
+- `./reference/*.html`: A directory of reference materials. These are the compressed learnings from the lessons - cheat sheets, reference algorithms, syntax, yoga poses, glossaries. They are the raw units of learning. They should be beautiful documents which print out well, and are designed for quick reference.
+- `RESOURCES.md`: A list of resources which can be explored to ground your teaching in contextual knowledge, or to acquire knowledge and wisdom. Use the format in [RESOURCES-FORMAT.md](./RESOURCES-FORMAT.md).
+- `./learning-records/*.md`: A directory of learning records, which capture what the user has learned. These are loosely equivalent to architectural decision records in software development - they capture non-obvious lessons and key insights that may need to be revised later, or drive future sessions. These should be used to calculate the zone of proximal development. They are titled `0001-<dash-case-name>.md`, where the number increments each time. Use the format in [LEARNING-RECORD-FORMAT.md](./LEARNING-RECORD-FORMAT.md).
+- `./lessons/*.html`: A directory of lessons. A **lesson** is a single, self-contained HTML output that teaches one tightly-scoped thing tied to the mission. This is the primary unit of teaching in this workspace.
+- `./assets/*`: Reusable **components** shared across lessons. See [Assets](#assets).
+- `NOTES.md`: A scratchpad for you to jot down user preferences, or working notes.
+
+## Philosophy
+
+To learn at a deep level, the user needs three things:
+
+- **Knowledge**, captured from high-quality, high-trust resources
+- **Skills**, acquired through highly-relevant interactive lessons devised by you, based on the knowledge
+- **Wisdom**, which comes from interacting with other learners and practitioners
+
+Before the `RESOURCES.md` is well-populated, your focus should be to find high-quality resources which will help the user acquire knowledge. Never trust your parametric knowledge.
+
+Some topics may require more skills than knowledge. Learning more about theoretical physics might be more knowledge-based. For yoga, more skills-based.
+
+### Fluency vs Storage Strength
+
+You should be careful to split between two types of learning:
+
+- **Fluency strength**: in-the-moment retrieval of knowledge
+- **Storage strength**: long-term retention of knowledge
+
+Fluency can give the user an illusory sense of mastery, but storage strength is the real goal. Try to design lessons which build long-term retention by desirable difficulty:
+
+- Using retrieval practice (recall from memory)
+- Spacing (distributing practice over time)
+- Interleaving (mixing up different but related topics in practice - for skills practice only)
+
+## Lessons
+
+A lesson is the main thing you produce — the unit in which knowledge and skills reach the user. Each lesson is one self-contained HTML file, saved to `./lessons/` and titled `0001-<dash-case-name>.html` where the number increments each time.
+
+A lesson should be **beautiful** — clean, readable typography and layout — since the user will return to these later to review. Think Tufte.
+
+The lesson should be short, and completable very quickly. Learners' working memory is very small, and we need to stay within it. But each lesson should give the user a single tangible win that they can build on. It should be directly tied to the mission, and should be in the user's zone of proximal development.
+
+If possible, open the lesson file for the user by running a CLI command.
+
+Each lesson should link via HTML anchors to other lessons and reference documents.
+
+Each lesson should recommend a primary source for the user to read or watch. This should be the most high-quality, high-trust resource you found on the topic.
+
+Each lesson should contain a reminder to ask followup questions to the agent. The agent is their teacher, and can assist with anything that's unclear.
+
+## Assets
+
+Lessons are built from reusable **components**, stored in `./assets/`: stylesheets, quiz widgets, simulators, diagram helpers — anything a second lesson could reuse.
+
+Reuse is the default, not the exception. Before authoring a lesson, read `./assets/` and build from the components already there. When a lesson needs something new and reusable, write it as a component in `./assets/` and link to it — never inline code a future lesson would duplicate.
+
+A shared stylesheet is the first component every workspace earns: every lesson links it, so the lessons look like one consistent course rather than a pile of one-offs. As the workspace grows, so should the component library.
+
+## The Mission
+
+Every lesson should be tied into the mission - the reason that the user is interested in learning about the topic.
+
+If the user is unclear about the mission, or the `MISSION.md` is not populated, your first job should be to question the user on why they want to learn this.
+
+Failing to understand the mission will mean knowledge acquisition is not grounded in real-world goals. Lessons will feel too abstract. You will have no way of judging what the user should do next.
+
+Missions may change as the user develops more skills and knowledge. This is normal - make sure to update the `MISSION.md` and add a learning record to capture the change. Confirm with the user before changing the mission.
+
+## Zone Of Proximal Development
+
+Each lesson, the user should always feel as if they are being challenged 'just enough'.
+
+The user may specify an exact thing they want to learn. If they don't, figure out their zone of proximal development by:
+
+- Reading their `learning-records`
+- Figuring out the right thing to teach them based on their mission
+- Teach the most relevant thing that fits in their zone of proximal development
+
+## Knowledge
+
+Lessons should be designed around a skill the user is going to learn. The knowledge in the lesson should be only what's required to acquire that skill. You teach the knowledge first, then get the user to practice the skills via an interactive feedback loop.
+
+Knowledge should first be gathered from trusted resources. Use `RESOURCES.md` to keep track of them. Lessons should be littered with citations - links to external resources to back up any claim made. This increases the trustworthiness of the lesson.
+
+For acquiring knowledge, difficulty is the enemy. It eats working memory you need for understanding.
+
+## Skills
+
+If knowledge is all about acquisition, skills are about durability and flexibility. Make the knowledge stick.
+
+For skill acquisition, difficulty is the tool. Effortful retrieval is what builds storage strength. Skills should be taught through interactive lessons. There are several tools at your disposal:
+
+- Interactive lessons, using quizzes and light in-browser tasks
+- Lessons which guide the user through a list of real-world steps to take (for instance, yoga poses)
+
+Each of these should be based on a **feedback loop**, where the user receives feedback on their performance. This feedback loop should be as tight as possible, giving feedback immediately - and ideally automatically.
+
+For quizzes, each answer should be exactly the same number of words (and characters, if possible). Don't give the user any clues about the answer through formatting.
+
+## Acquiring Wisdom
+
+Wisdom comes from true real-world interaction - testing your skills outside the learning environment.
+
+When the user asks a question that appears to require wisdom, your default posture should be to attempt to answer - but to ultimately delegate to a **community**.
+
+A community is a place (online or offline) where the user can test their skills in the real world. This might be a forum, a subreddit, a real-world class (budget permitting) or a local interest group.
+
+You should attempt to find high-reputation communities the user can join. If the user expresses a preference that they don't want to join a community, respect it.
+
+## Reference Documents
+
+While creating lessons, you should also create reference documents. Lessons can reference these documents - they are useful for tracking raw units of knowledge useful across lessons.
+
+Lessons will rarely be revisited later - reference documents will be. They should be the compressed essence of the lesson, in a format designed for quick reference.
+
+Some learning topics lend themselves to reference:
+
+- Syntax and code snippets for programming
+- Algorithms and flowcharts for processes
+- Yoga poses and sequences for yoga
+- Exercises and routines for fitness
+- Glossaries for any topic with its own nomenclature
+
+Glossaries, in particular, are an essential reference. Once one is created, it should be adhered to in every lesson.
+
+## `NOTES.md`
+
+The user will sometimes express preferences of how they want to be taught, or things you should keep in mind. This is the place to record those preferences, so you can refer back to them when designing lessons or working with the user.
--- a/skills/productivity/writing-great-skills/GLOSSARY.md
+++ b/skills/productivity/writing-great-skills/GLOSSARY.md
@ -0,0 +1,195 @@
+# Glossary — Building Great Skills
+
+The domain model for what makes a skill great. A skill exists to wrangle determinism out of a stochastic system; the root virtue is **Predictability**, and every term below is a lever on it. This is the disclosed reference for [`writing-great-skills`](SKILL.md).
+
+The terms are grouped by axis: **Invocation** (how a skill is reached), **Information Hierarchy** (how its content is arranged), **Steering** (how the agent's runtime behaviour is shaped), and **Pruning** (how it is kept lean). Each **failure mode** lives beside the lever that cures it, tagged _failure mode_.
+
+**Bold terms** in any definition are themselves defined in this glossary; find them by their heading.
+
+## Predictability
+
+The degree to which a skill makes the agent behave the same _way_ on every run — the same process, not the same output (a brainstorming skill should _predictably_ diverge; its tokens vary, its behaviour doesn't). The root virtue every other term serves — cost and maintainability are symptoms of it, not rivals.
+
+_Avoid_: consistency, reliability, robustness, output-determinism
+
+## Invocation
+
+How a skill is reached — and the two loads you pay for the choice.
+
+### Model-Invoked
+
+A skill that keeps its **description** field, so the agent can see it and fire it autonomously — and the human can still type its name, so model-invocation always _includes_ user reach. There is no model-only state: a description only ever _adds_ agent discovery, never removes the human's. Pays a permanent **context load** on every turn in exchange for that discoverability. Reachable by other skills, because the description that makes it agent-discoverable makes it invocable. A model-invoked skill whose content is all **reference** is also one home for shared reference: another skill can invoke it, so reference needed by several skills lives in one place. Pick model-invocation only when the agent must reach the skill on its own; if it never fires except by hand, drop the description and pay no context load.
+
+_Avoid_: ability, tool, capability
+
+### User-Invoked
+
+A skill with its **description** stripped — invisible to the agent and reachable only by the human typing its name (user-_only_, where **model-invoked** is user-_and-agent_). Trades agent-discoverability for zero **context load**. Because it has no description, nothing but the human can reach it: no other skill can fire it.
+
+_Avoid_: procedure, workflow, command
+
+### Description
+
+The skill's machine-readable trigger, and the one **context pointer** a **model-invoked** skill is forced to keep loaded at all times. Its mere presence _is_ the invocation axis: keep it and the skill is model-invoked (and reachable by other skills); delete it and the skill is **user-invoked**, reachable only by the human. The source of a model-invoked skill's **context load**.
+
+_Avoid_: frontmatter, summary
+
+### Context Pointer
+
+A reference held in the agent's context that names some out-of-context material and encodes the condition for reaching it. The **description** is the top-level context pointer (context window → skill); pointers to disclosed files are the same object one level down. Its wording, not the target, decides _when_ the agent reaches — and _how reliably_. A must-have target behind a weakly worded pointer is a variance bug: fix the wording first, and inline the material only if sharpening fails.
+
+_Avoid_: link, reference, import
+
+### Context Load
+
+The cost a **model-invoked** skill imposes on the agent's context window — its **description**, always loaded, spending both tokens and attention. What **user-invoked** skills escape by having no description, and the brake on splitting into more model-invoked skills.
+
+_Avoid_: token cost, context bloat
+
+### Cognitive Load
+
+The cost a **user-invoked** skill imposes on the human — what they must hold in their head: which skills exist and when to reach for each (the human is the index). What **model-invocation** removes by being agent-discoverable, and the brake on splitting into more user-invoked skills. Not a cost to minimise: it is the price of human agency, the reason some skills stay user-invoked. Spend it where human judgement matters; remove it where it does not.
+
+_Avoid_: human index, burden, overhead
+
+### Router Skill
+
+A **user-invoked** skill whose job is to point at your other user-invoked skills — naming each and when to reach for it — so the human has one skill to remember instead of many. It can only hint, never fire them: user-invoked skills have no **description**, so nothing but the human can reach them. The cure for **cognitive load** when user-invoked skills multiply.
+
+_Avoid_: dispatcher, menu, registry, index, router procedure
+
+### Granularity
+
+How finely you divide skills. Finer division spends one of the two loads: more **model-invoked** skills spend **context load** (more descriptions crowding the window and competing for attention); more **user-invoked** skills spend **cognitive load** (more for the human to remember and reach for). Two cuts guide the division. By **invocation**, split off a model-invoked skill where you have a distinct **leading word** to trigger it — a trigger word you actually use in your prompts. By **sequence**, split a run of **steps** where a step's **post-completion steps** need hiding, since isolating it in its own context clears what follows. Beware the reverse: merging sequences exposes each step's post-completion steps to what follows, inviting premature completion.
+
+_Avoid_: chunking, modularity
+
+## Information Hierarchy
+
+How a skill's content is arranged, and how far down the ladder each piece sits.
+
+### Information Hierarchy
+
+A skill's content ranked by how immediately the agent needs it — a single ladder, produced by two cuts: in-file or behind a pointer, and step or reference. The rungs:
+
+- **Steps** — in-file, primary
+- **Reference**, in-file — secondary
+- **Reference**, disclosed — behind a **context pointer**
+
+A skill with no **steps** uses just the bottom two rungs — often a legitimately flat peer-set (e.g. every rule of a review on one rung), which is a fine arrangement, not a smell. The hierarchy is independent of invocation: a skill can be model- or user-invoked whether it is all steps, all reference, or both. When a skill has steps, in-file reference that should be disclosed buries them and turns attending to them into a coin-flip — a variance lever, not just a legibility one. Keep the top of the ladder legible; push down it whatever you can.
+
+_Avoid_: structure, organization, layout
+
+### Steps
+
+The ordered actions the agent performs — when a skill has them, the primary tier of its content, and the part that earns its place in SKILL.md. Not every skill has steps: a skill can be all steps (`tdd`), all **reference** (a review), or both, independent of invocation. Every step ends on a **completion criterion**, clear or vague.
+
+_Avoid_: workflow, instructions, choreography
+
+### Reference
+
+Material the agent refers to on demand — definitions, facts, parameters, examples, conditional instructions. When a skill has **steps** it is secondary to them; when a skill has none it is the entire content; or it lives outside any skill entirely — see **External Reference**. Reached via **context pointers**, and the prime candidate for **progressive disclosure**.
+
+_Avoid_: supporting material, docs, background
+
+### External Reference
+
+**Reference** that lives outside the skill system — a plain file, no **description**, no **steps**, not invocable — that any skill can point at. The home for shared reference that needn't fire on its own, and the only shared home two **user-invoked** skills can use, since neither has a description and so neither can fire the other.
+
+_Avoid_: doc, resource, knowledge base
+
+### Progressive Disclosure
+
+Moving **reference** down the ladder — out of SKILL.md and behind a **context pointer** — so the top stays legible. Not primarily a token optimisation; it is how the **information hierarchy** is protected. Licensed by **branching**: disclose what only some branches need, inline what every path needs, and if a pointer fires unreliably on must-have material, sharpen its wording, and pull it back inline only if that fails.
+
+_Avoid_: lazy loading, chunking
+
+### Co-location
+
+Keeping the material an agent needs at once in one place — a concept's definition, rules, and caveats under a single heading, not scattered across the file — so reading one part brings its neighbours with it. The within-file companion to the **Information Hierarchy**: the hierarchy ranks _how far down_ a piece sits; co-location decides _what sits beside it_ once there. There is no formula for the right format of a body of **reference**; the test is that a skill should read like documentation written for the agent, and grouped material reads that way where scattered material does not. Distinct from **Duplication**: that repeats one meaning in two places, where scattering fragments a single meaning across many.
+
+_Avoid_: grouping, clustering, cohesion
+
+### Sprawl
+
+_Failure mode._ A skill that is simply too long — too many lines in SKILL.md — independent of whether they are stale or repeated. Even an all-live, all-unique skill can sprawl. It costs readability (the agent wades through more before it can act, and attention thins across the excess), maintainability (every extra line is one more to keep **relevant**), and tokens. The cure is the **information hierarchy**: push **reference** down behind **context pointers**, and split by **branch** or sequence so each path carries only what it needs. Distinct from **sediment** (length from stale accumulation) and **duplication** (length from repeated meaning) — sprawl is length itself, whatever its cause.
+
+_Avoid_: bloat, length, size, verbosity
+
+## Steering
+
+The levers that shape the agent's runtime behaviour toward **Predictability**.
+
+### Branch
+
+A distinct way a skill can be invoked — a case the skill handles — so different runs take different paths through it. A skill with many steps may carry many branches; a linear one has none.
+
+_Avoid_: path, case, fork
+
+### Leading Word
+
+A compact concept — also called a _Leitwort_ — already living in the model's pretraining, that the agent thinks with while running the skill. It encodes a behavioural principle in the fewest possible tokens by invoking priors the model already holds (e.g. _lesson_, _proximal zone of development_, _fog of war_, _tracer bullets_). Repeated as a token, never as a sentence, it accumulates a distributed definition across the skill and anchors a whole region of behaviour. Coining your own works if you define it clearly, but a made-up word recruits no priors — you pay in definition tokens what a pretrained word gives free. Reach for an existing word first.
+
+A leading word serves **predictability** twice. In the body it anchors **execution** — the agent reaches for the same behaviour every time the concept appears, and inside flat reference it focuses attention on a class of thing to look for, recruiting the right checks each run. In the **description** it anchors **invocation** — and not only within the skill: when the same word lives in your prompts, your docs, and your codebase, the agent links that shared language to the skill and fires it more reliably. Word a description with the leading words you actually use when you want the skill.
+
+_Avoid_: keyword, term, motif
+
+### Completion Criterion
+
+The condition that tells the agent a unit of work is done — the target it judges against. Two properties make it a lever, not just a quality. Its **clarity** (can the agent tell done from not-done?) resists **premature completion** — a vague bound ("understanding reached") lets the agent declare done and slip to the next step; this axis needs _steps_ to bite, since premature completion is a between-steps failure. Its **demand** (how much it requires) sets **legwork** — "every modified model accounted for" forces thorough work where "produce a change list" does not — and this axis is _not_ step-bound: it can bind a body of flat reference too, which is how a skill with no steps still carries an exhaustiveness bar ("every rule applied"). The strongest criteria are both checkable and exhaustive.
+
+_Avoid_: done condition, exit condition, stopping rule
+
+### Legwork
+
+The work an agent does behind the scenes within a single step — reading files, exploring the codebase, making changes, digging up what it needs rather than offloading to the user. It lives below the step structure: never written as its own step, latent in the wording, controlled by the agent rather than the skill. The within-step counterpart to **post-completion steps**' across-step pull. Raised by a **leading word** (_comprehensive_, _thorough_) or a **completion criterion** that demands the work be exhaustive — including the demand axis applied to flat reference, which is what drives a skill of flat reference to cover all its rungs. Goes thin either when that demand is missing or when **premature completion** cuts the step short.
+
+_Avoid_: scope, effort, diligence, coverage
+
+### Post-Completion Steps
+
+The **steps** that follow the current step. Visible, they pull the agent forward into **premature completion** — the more it sees, the stronger the tug; the defence is to hide them by splitting the sequence of steps into two.
+
+_Avoid_: horizon, fog of war, lookahead
+
+### Premature Completion
+
+_Failure mode._ Ending the current step before it is genuinely done, because the agent's attention slips to being done rather than to the work. A between-steps failure: it needs **steps** to occur — a skill with no steps that quits early isn't premature completion but thin **legwork** under an unmet demand. A tug-of-war between two forces: visible **post-completion steps** (the pull forward) and the **completion criterion**'s clarity (the resistance — a sharp, checkable bar holds; a vague one gives way). Fuzziness is the necessary condition: a sharp bound resists the pull no matter how many later steps are visible, so a step that never rushes needs no defending. Two levers hold a step that does, but reach for them in order: **sharpen the bound first** — it is local and cheap. Only when the criterion is irreducibly fuzzy _and_ you actually observe the rush do you **hide the later steps** — and hiding only works across a real context boundary (a user-invoked hand-off or a subagent dispatch; an inline model-invoked call leaves the later steps in context and clears nothing). One cause of thin legwork, but distinct from it: legwork can be thin even when a step runs to full completion.
+
+_Avoid_: premature closure, the rush, rushing, shortcutting
+
+## Pruning
+
+Keeping a skill lean — each remedy paired with the failure it cures.
+
+### Single Source of Truth
+
+The desired state where each meaning lives in exactly one authoritative place, so a change to the skill's behaviour is a change in one place. **Duplication** is its violation.
+
+_Avoid_: home, canonical location
+
+### Duplication
+
+_Failure mode._ The same meaning given more than one **single source of truth**. It costs maintenance (change one place, you must change the others), costs tokens, and inflates prominence — repeating a meaning weights it on the ladder past its real rank. The accidental inverse of a **leading word**, which raises attention on purpose by repeating a token, never the meaning.
+
+_Avoid_: repetition, redundancy
+
+### Relevance
+
+Whether a line still bears on what the skill does — the lens for what to keep. A line loses relevance either by never bearing on the task (mere exposition, or a **branch** that should be disclosed) or by going stale: drifting out of date as the behaviour or world it describes changes. Shorter skills are easier to keep relevant, because each line is cheaper to check. Distinct from **no-op**: relevance asks whether a line bears on the task, not whether it changes behaviour.
+
+_Avoid_: load-bearing, staleness, freshness
+
+### Sediment
+
+_Failure mode._ Layers of old content that settle in a skill and are never cleared, because adding feels safe and removing feels risky — so stale and irrelevant lines accumulate and you must core down through them to find what is still live. The default fate of any skill without a pruning discipline; the slow erosion of **relevance**, as opposed to **duplication**'s repeated meaning.
+
+_Avoid_: accretion, bloat, cruft, rot
+
+### No-Op
+
+_Failure mode._ An instruction that changes nothing because the model already does it by default — you pay load to tell the agent what it would do anyway. The test: does a line change behaviour versus the default? A line can be perfectly **relevant** and still be a no-op. The same priors that make a **leading word** free make a no-op worthless.
+
+A leading word is a _technique_; No-Op is a _verdict_ on a line — and they cross. A leading word too weak to beat the default is a no-op (_be thorough_ when the agent is already thorough-ish), and the fix is a stronger word that passes the verdict (_relentless_), not a different technique. So the No-Op test — does it change behaviour versus the default? — is also how you grade whether a leading word is earning its repetitions. This is model-relative, not reader-relative: two people disagreeing over whether a line is a no-op disagree about the default, and settle it by running the skill, not by debate.
+
+_Avoid_: redundant instruction, restating the obvious, belaboring
--- a/skills/productivity/writing-great-skills/SKILL.md
+++ b/skills/productivity/writing-great-skills/SKILL.md
@ -0,0 +1,82 @@
+---
+name: writing-great-skills
+description: Reference for writing and editing skills well — the vocabulary and principles that make a skill predictable.
+disable-model-invocation: true
+---
+
+A skill exists to wrangle determinism out of a stochastic system. **Predictability** — the agent taking the same _process_ every run, not producing the same output — is the root virtue; every lever below serves it.
+
+**Bold terms** are defined in [`GLOSSARY.md`](GLOSSARY.md); look them up there for the full meaning.
+
+## Invocation
+
+Two choices, trading different costs:
+
+- A **model-invoked** skill keeps a **description**, so the agent can fire it autonomously _and_ other skills can reach it (you can still type its name too). It contributes to **context load** — the description sits in the window every turn. Mechanics: omit `disable-model-invocation`, and write a model-facing description with rich trigger phrasing ("Use when the user wants…, mentions…").
+- A **user-invoked** skill strips the description from the agent's reach: only you, typing its name, can invoke it — and no other skill can. Zero context load, but it spends **cognitive load**: _you_ are the index that must remember it exists. Mechanics: set `disable-model-invocation: true`; the `description` becomes human-facing — a one-line summary, trigger lists stripped.
+
+Pick model-invocation only when the agent must reach the skill on its own, or another skill must. If it only ever fires by hand, make it user-invoked and pay no context load.
+
+When user-invoked skills multiply past what you can remember, that piled-up cognitive load is cured by a **router skill**: one user-invoked skill that names the others and when to reach for each.
+
+## Writing the description
+
+A model-invoked **description** does two jobs — state what the skill is, and list the **branches** that should trigger it. Every word increases **context load**, so a description earns even harder pruning than the body:
+
+- **Front-load the skill's leading word** — the description is where it does its invocation work.
+- **One trigger per branch.** Synonyms that rename a single branch are **duplication** — "build features using TDD … asks for test-first development" is one branch written twice. Collapse them; keep only genuinely distinct branches.
+- **Cut identity that's already in the body.** Keep the description to triggers, plus any "when another skill needs…" reach clause.
+
+## Information hierarchy
+
+A skill is built from two content types — **steps** and **reference** — that mix freely: a skill can be all steps, all reference, or both. The core decision is which to use and where each sits on the **information hierarchy**, a ladder ranked by how immediately the agent needs the material:
+
+1. **In-skill step** — an ordered action in `SKILL.md`, the primary tier: what the agent does, in order. Each step ends on a **completion criterion**, the condition that tells the agent the work is done. Make it _checkable_ (can the agent tell done from not-done?) and, where it matters, _exhaustive_ ("every modified model accounted for", not "produce a change list") — a vague criterion invites **premature completion**.
+2. **In-skill reference** — a definition, rule, or fact in `SKILL.md`, consulted on demand. Often a legitimately flat peer-set (every rule of a review on one rung) — a fine arrangement, not a smell. _This skill is all reference._
+3. **External reference** — reference pushed out of `SKILL.md` into a separate file, reached by a **context pointer**, loaded only when the pointer fires. (Spans _disclosed_ reference — a sibling file like `GLOSSARY.md`, still part of the skill — through fully **external reference** that lives outside the skill system and any skill can point at.)
+
+A demanding completion criterion drives thorough **legwork** — the digging the agent does within the work — whether the skill has steps or not, since "every rule applied" binds flat reference just as "every step done" binds a sequence.
+
+Push too little down and the top bloats; push too much and you hide material the agent actually needs. That tension is the whole decision.
+
+**Progressive disclosure** is the move down the ladder — out of `SKILL.md` into a linked file — so the top stays legible. Mechanics: a linked `.md` file in the skill folder, named for what it holds (this skill discloses its full definitions to `GLOSSARY.md`). Some skills are used in more than one way, and each distinct way is a **branch** — different runs taking different paths through the skill. Branching is the cleanest disclosure test: inline what every branch needs, and push behind a pointer what only some branches reach. A **context pointer**'s _wording_, not its target, decides when and how reliably the agent reaches the material.
+
+Where the ladder decides _how far down_ a piece sits, **co-location** decides _what sits beside it_ once there: keep a concept's definition, rules, and caveats under one heading rather than scattered, so reading one part brings its neighbours with it.
+
+## When to split
+
+**Granularity** is how finely you divide skills, and each cut spends one of the two loads, so split only when the cut earns it. Two cuts:
+
+- **By invocation** — split off a **model-invoked** skill when you have a distinct **leading word** that should trigger it on its own, or another skill must reach it. You pay **context load** for the new always-loaded **description**, so that independent reach has to be worth it.
+- **By sequence** — split a run of **steps** when the steps still ahead (a step's **post-completion steps**) tempt the agent to rush the one in front of it (**premature completion**). Keeping them out of view encourages the agent to do more **legwork** on the current task.
+
+## Pruning
+
+Keep each meaning in a **single source of truth**: one authoritative place, so changing the behaviour is a one-place edit.
+
+Check every line for **relevance**: does it still bear on what the skill does?
+
+Then hunt **no-ops** sentence by sentence, not just line by line: run the no-op test on each sentence in isolation, and when one fails, delete the whole sentence rather than trim words from it. Be aggressive — most prose that fails should go, not be rewritten.
+
+## Leading words
+
+A **leading word** is a compact concept already living in the model's pretraining that the agent thinks with while running the skill (e.g. _lesson_, _fog of war_, _tracer bullets_). Repeated throughout the text (though not necessarily - a strong leading word might only be needed once), it accumulates a distributed definition and anchors a whole region of behaviour in the fewest tokens, by recruiting priors the model already holds.
+
+It serves predictability twice. In the body it anchors _execution_: the agent reaches for the same behaviour every time the word appears. In the description it anchors _invocation_: when the same word lives in your prompts, docs, and code, the agent links that shared language to the skill and fires it more reliably.
+
+Hunt for opportunities to refactor skills to use leading words. A triad spelled out at three sites (**duplication**), a description spending a sentence to gesture at one idea — each is a passage begging to **collapse** into a single token. Examples include:
+
+- "fast, deterministic, low-overhead" -> _tight_ — one quality restated across a phase — into a single pretrained word (a _tight_ loop).
+- "a loop you believe in" -> _red_ — converts a fuzzy gate into a binary observable state (the loop goes _red_ on the bug, or it doesn't).
+
+You win twice over: fewer tokens, _and_ a sharper hook for the agent to hang its thinking on. Assume every skill is carrying restatements that leading words retire — go find them.
+
+## Failure modes
+
+Use these to diagnose issues the user may be having with the skill.
+
+- **Premature completion** — ending a step before it's genuinely done, attention slipping to _being done_. Defence, in order: sharpen the completion criterion first (cheap, local); only if it is irreducibly fuzzy _and_ you observe the rush, hide the post-completion steps by splitting (the sequence cut).
+- **Duplication** — the same meaning in more than one place. Costs maintenance and tokens, and inflates a meaning's prominence on the ladder past its real rank.
+- **Sediment** — stale layers that settle because adding feels safe and removing feels risky. The default fate of any skill without a pruning discipline.
+- **Sprawl** — a skill simply too long, even when every line is live and unique. Hurts readability and maintainability and wastes tokens. The cure is the ladder: disclose **reference** behind pointers, and split by **branch** or sequence so each path carries only what it needs.
+- **No-op** — a line the model already obeys by default, so you pay load to say nothing. The test: does it change behaviour versus the default? A weak leading word (_be thorough_ when the agent is already thorough-ish) is a no-op; the fix is a stronger word (_relentless_), not a different technique.
--- a/tdd/deep-modules.md
+++ b/tdd/deep-modules.md
@ -1,33 +0,0 @@
-# Deep Modules
-
-From "A Philosophy of Software Design":
-
-**Deep module** = small interface + lots of implementation
-
-```
-┌─────────────────────┐
-│   Small Interface   │  ← Few methods, simple params
-├─────────────────────┤
-│                     │
-│                     │
-│  Deep Implementation│  ← Complex logic hidden
-│                     │
-│                     │
-└─────────────────────┘
-```
-
-**Shallow module** = large interface + little implementation (avoid)
-
-```
-┌─────────────────────────────────┐
-│       Large Interface           │  ← Many methods, complex params
-├─────────────────────────────────┤
-│  Thin Implementation            │  ← Just passes through
-└─────────────────────────────────┘
-```
-
-When designing interfaces, ask:
-
- Can I reduce the number of methods?
- Can I simplify the parameters?
- Can I hide more complexity inside?
--- a/tdd/interface-design.md
+++ b/tdd/interface-design.md
@ -1,31 +0,0 @@
-# Interface Design for Testability
-
-Good interfaces make testing natural:
-
-1. **Accept dependencies, don't create them**
-
-   ```typescript
-   // Testable
-   function processOrder(order, paymentGateway) {}
-
-   // Hard to test
-   function processOrder(order) {
-     const gateway = new StripeGateway();
-   }
-   ```
-
-2. **Return results, don't produce side effects**
-
-   ```typescript
-   // Testable
-   function calculateDiscount(cart): Discount {}
-
-   // Hard to test
-   function applyDiscount(cart): void {
-     cart.total -= discount;
-   }
-   ```
-
-3. **Small surface area**
-   - Fewer methods = fewer tests needed
-   - Fewer params = simpler test setup
--- a/to-issues/SKILL.md
+++ b/to-issues/SKILL.md
@ -1,79 +0,0 @@
---
-name: to-issues
-description: Break a plan, spec, or PRD into independently-grabbable GitHub issues using tracer-bullet vertical slices. Use when user wants to convert a plan into issues, create implementation tickets, or break down work into issues.
---
-
-# To Issues
-
-Break a plan into independently-grabbable GitHub issues using vertical slices (tracer bullets).
-
-## Process
-
-### 1. Gather context
-
-Work from whatever is already in the conversation context. If the user passes a GitHub issue number or URL as an argument, fetch it with `gh issue view <number>` (with comments).
-
-### 2. Explore the codebase (optional)
-
-If you have not already explored the codebase, do so to understand the current state of the code.
-
-### 3. Draft vertical slices
-
-Break the plan into **tracer bullet** issues. Each issue is a thin vertical slice that cuts through ALL integration layers end-to-end, NOT a horizontal slice of one layer.
-
-Slices may be 'HITL' or 'AFK'. HITL slices require human interaction, such as an architectural decision or a design review. AFK slices can be implemented and merged without human interaction. Prefer AFK over HITL where possible.
-
-<vertical-slice-rules>
- Each slice delivers a narrow but COMPLETE path through every layer (schema, API, UI, tests)
- A completed slice is demoable or verifiable on its own
- Prefer many thin slices over few thick ones
-</vertical-slice-rules>
-
-### 4. Quiz the user
-
-Present the proposed breakdown as a numbered list. For each slice, show:
-
- **Title**: short descriptive name
- **Type**: HITL / AFK
- **Blocked by**: which other slices (if any) must complete first
- **User stories covered**: which user stories this addresses (if the source material has them)
-
-Ask the user:
-
- Does the granularity feel right? (too coarse / too fine)
- Are the dependency relationships correct?
- Should any slices be merged or split further?
- Are the correct slices marked as HITL and AFK?
-
-Iterate until the user approves the breakdown.
-
-### 5. Create the GitHub issues
-
-For each approved slice, create a GitHub issue using `gh issue create`. Use the issue body template below.
-
-Create issues in dependency order (blockers first) so you can reference real issue numbers in the "Blocked by" field.
-
-<issue-template>
-## Parent
-
-#<parent-issue-number> (if the source was a GitHub issue, otherwise omit this section)
-
-## What to build
-
-A concise description of this vertical slice. Describe the end-to-end behavior, not layer-by-layer implementation.
-
-## Acceptance criteria
-
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
-
-## Blocked by
-
- Blocked by #<issue-number> (if any)
-
-Or "None - can start immediately" if no blockers.
-
-</issue-template>
-
-Do NOT close or modify any parent issue.
--- a/triage-issue/SKILL.md
+++ b/triage-issue/SKILL.md
@ -1,102 +0,0 @@
---
-name: triage-issue
-description: Triage a bug or issue by exploring the codebase to find root cause, then create a GitHub issue with a TDD-based fix plan. Use when user reports a bug, wants to file an issue, mentions "triage", or wants to investigate and plan a fix for a problem.
---
-
-# Triage Issue
-
-Investigate a reported problem, find its root cause, and create a GitHub issue with a TDD fix plan. This is a mostly hands-off workflow - minimize questions to the user.
-
-## Process
-
-### 1. Capture the problem
-
-Get a brief description of the issue from the user. If they haven't provided one, ask ONE question: "What's the problem you're seeing?"
-
-Do NOT ask follow-up questions yet. Start investigating immediately.
-
-### 2. Explore and diagnose
-
-Use the Agent tool with subagent_type=Explore to deeply investigate the codebase. Your goal is to find:
-
- **Where** the bug manifests (entry points, UI, API responses)
- **What** code path is involved (trace the flow)
- **Why** it fails (the root cause, not just the symptom)
- **What** related code exists (similar patterns, tests, adjacent modules)
-
-Look at:
- Related source files and their dependencies
- Existing tests (what's tested, what's missing)
- Recent changes to affected files (`git log` on relevant files)
- Error handling in the code path
- Similar patterns elsewhere in the codebase that work correctly
-
-### 3. Identify the fix approach
-
-Based on your investigation, determine:
-
- The minimal change needed to fix the root cause
- Which modules/interfaces are affected
- What behaviors need to be verified via tests
- Whether this is a regression, missing feature, or design flaw
-
-### 4. Design TDD fix plan
-
-Create a concrete, ordered list of RED-GREEN cycles. Each cycle is one vertical slice:
-
- **RED**: Describe a specific test that captures the broken/missing behavior
- **GREEN**: Describe the minimal code change to make that test pass
-
-Rules:
- Tests verify behavior through public interfaces, not implementation details
- One test at a time, vertical slices (NOT all tests first, then all code)
- Each test should survive internal refactors
- Include a final refactor step if needed
- **Durability**: Only suggest fixes that would survive radical codebase changes. Describe behaviors and contracts, not internal structure. Tests assert on observable outcomes (API responses, UI state, user-visible effects), not internal state. A good suggestion reads like a spec; a bad one reads like a diff.
-
-### 5. Create the GitHub issue
-
-Create a GitHub issue using `gh issue create` with the template below. Do NOT ask the user to review before creating - just create it and share the URL.
-
-<issue-template>
-
-## Problem
-
-A clear description of the bug or issue, including:
- What happens (actual behavior)
- What should happen (expected behavior)
- How to reproduce (if applicable)
-
-## Root Cause Analysis
-
-Describe what you found during investigation:
- The code path involved
- Why the current code fails
- Any contributing factors
-
-Do NOT include specific file paths, line numbers, or implementation details that couple to current code layout. Describe modules, behaviors, and contracts instead. The issue should remain useful even after major refactors.
-
-## TDD Fix Plan
-
-A numbered list of RED-GREEN cycles:
-
-1. **RED**: Write a test that [describes expected behavior]
-   **GREEN**: [Minimal change to make it pass]
-
-2. **RED**: Write a test that [describes next behavior]
-   **GREEN**: [Minimal change to make it pass]
-
-...
-
-**REFACTOR**: [Any cleanup needed after all tests pass]
-
-## Acceptance Criteria
-
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] All new tests pass
- [ ] Existing tests still pass
-
-</issue-template>
-
-After creating the issue, print the issue URL and a one-line summary of the root cause.
--- a/write-a-skill/SKILL.md
+++ b/write-a-skill/SKILL.md
@ -1,117 +0,0 @@
---
-name: write-a-skill
-description: Create new agent skills with proper structure, progressive disclosure, and bundled resources. Use when user wants to create, write, or build a new skill.
---
-
-# Writing Skills
-
-## Process
-
-1. **Gather requirements** - ask user about:
-   - What task/domain does the skill cover?
-   - What specific use cases should it handle?
-   - Does it need executable scripts or just instructions?
-   - Any reference materials to include?
-
-2. **Draft the skill** - create:
-   - SKILL.md with concise instructions
-   - Additional reference files if content exceeds 500 lines
-   - Utility scripts if deterministic operations needed
-
-3. **Review with user** - present draft and ask:
-   - Does this cover your use cases?
-   - Anything missing or unclear?
-   - Should any section be more/less detailed?
-
-## Skill Structure
-
-```
-skill-name/
-├── SKILL.md           # Main instructions (required)
-├── REFERENCE.md       # Detailed docs (if needed)
-├── EXAMPLES.md        # Usage examples (if needed)
-└── scripts/           # Utility scripts (if needed)
-    └── helper.js
-```
-
-## SKILL.md Template
-
-```md
---
-name: skill-name
-description: Brief description of capability. Use when [specific triggers].
---
-
-# Skill Name
-
-## Quick start
-
-[Minimal working example]
-
-## Workflows
-
-[Step-by-step processes with checklists for complex tasks]
-
-## Advanced features
-
-[Link to separate files: See [REFERENCE.md](REFERENCE.md)]
-```
-
-## Description Requirements
-
-The description is **the only thing your agent sees** when deciding which skill to load. It's surfaced in the system prompt alongside all other installed skills. Your agent reads these descriptions and picks the relevant skill based on the user's request.
-
-**Goal**: Give your agent just enough info to know:
-
-1. What capability this skill provides
-2. When/why to trigger it (specific keywords, contexts, file types)
-
-**Format**:
-
- Max 1024 chars
- Write in third person
- First sentence: what it does
- Second sentence: "Use when [specific triggers]"
-
-**Good example**:
-
-```
-Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when user mentions PDFs, forms, or document extraction.
-```
-
-**Bad example**:
-
-```
-Helps with documents.
-```
-
-The bad example gives your agent no way to distinguish this from other document skills.
-
-## When to Add Scripts
-
-Add utility scripts when:
-
- Operation is deterministic (validation, formatting)
- Same code would be generated repeatedly
- Errors need explicit handling
-
-Scripts save tokens and improve reliability vs generated code.
-
-## When to Split Files
-
-Split into separate files when:
-
- SKILL.md exceeds 100 lines
- Content has distinct domains (finance vs sales schemas)
- Advanced features are rarely needed
-
-## Review Checklist
-
-After drafting, verify:
-
- [ ] Description includes triggers ("Use when...")
- [ ] SKILL.md under 100 lines
- [ ] No time-sensitive info
- [ ] Consistent terminology
- [ ] Concrete examples included
- [ ] References one level deep
--- a/zoom-out/SKILL.md
+++ b/zoom-out/SKILL.md
@ -1,7 +0,0 @@
---
-name: zoom-out
-description: Tell the agent to zoom out and give broader context or a higher-level perspective. Use when you're unfamiliar with a section of code or need to understand how it fits into the bigger picture.
-disable-model-invocation: true
---
-
-I don't know this area of code well. Go up a layer of abstraction. Give me a map of all the relevant modules and callers.
Author	SHA1	Message	Date
Matt Pocock	8370e760d0	refactor: enhance glossary clarity and structure	2026-06-24 11:29:44 +01:00
Matt Pocock	74f0450b9c	Link skills into ~/.agents/skills as well as ~/.claude/skills Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 11:27:21 +01:00
Matt Pocock	c71fce0297	Refine decision-mapping bootstrap step Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 11:26:57 +01:00
Matt Pocock	e95b1813c5	Move loop-me into in-progress Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 11:26:57 +01:00
Matt Pocock	42396a51d6	fix: update ticket type terminology from "Discuss" to "Grilling"	2026-06-22 13:15:30 +01:00
Matt Pocock	6eeb81b5fc	Merge branch 'main' of https://github.com/mattpocock/skills	2026-06-18 09:41:33 +01:00
Matt Pocock	e00eadb4bb	Added PR's as a triage surface	2026-06-18 09:41:30 +01:00
Matt Pocock	2454c95dc3	Merge pull request #347 from mattpocock/changeset-release/main chore: version skills	2026-06-17 23:07:39 +01:00
github-actions[bot]	7d684becad	chore: version skills	2026-06-17 21:26:49 +00:00
Matt Pocock	aa024cb195	feat: enhance teach skill to prioritize reusable components from ./assets	2026-06-17 22:26:34 +01:00
Matt Pocock	d20ee2684e	feat: let teach skill share code between lessons via ./assets Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 22:18:21 +01:00
Matt Pocock	bddb833cba	fix: create GitHub releases on version bump Add `publish: npx changeset tag` to the release workflow and enable tagging for private packages in the changesets config, so merging a Version PR creates a git tag and GitHub release. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 15:46:32 +01:00
Matt Pocock	00ff03cac2	Merge pull request #345 from mattpocock/changeset-release/main chore: version skills	2026-06-17 15:40:47 +01:00
github-actions[bot]	82e85064ea	chore: version skills	2026-06-17 14:39:42 +00:00
Matt Pocock	47bde84da0	feat: introduce new skills and refine existing ones - Add `ask-matt` skill for user-invoked routing. - Introduce `codebase-design` and `domain-modeling` skills, updating dependencies. - Remove unused `caveman` and `zoom-out` skills. - Rename `diagnose` skill to `diagnosing-bugs`. - Add `resolving-merge-conflicts` skill for handling git conflicts. - Tighten `review` skill with improved checks. - Update skill taxonomy to user-invoked and model-invoked. - Replace `write-a-skill` with `writing-great-skills` for better guidance.	2026-06-17 15:39:06 +01:00
Matt Pocock	6a62d72ab1	Merge pull request #291 from mattpocock/matt/commands-and-skills Split skills into model-invoked vs user-invoked	2026-06-17 15:21:40 +01:00
Matt Pocock	a032401486	chore: set up changesets Add changesets for tracking and versioning skill changes: - package.json (private) + @changesets/cli and @changesets/changelog-github - .changeset/config.json using the GitHub release-style changelog - release workflow that opens a version PR on push to main (no npm publish) - initial changeset recording recent skill updates Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 15:21:20 +01:00
Matt Pocock	ee8bae4006	Reduced description lengths	2026-06-17 15:15:11 +01:00
Matt Pocock	460f19f240	refine: Simplify description in 'ask-matt' skill documentation	2026-06-17 14:57:25 +01:00
Matt Pocock	245e31bbbd	refine: Remove 'caveman' skill and update documentation accordingly	2026-06-17 14:56:08 +01:00
Matt Pocock	48f3d31fef	refine: Clarify multi-session build process and enhance triage guidance in 'ask-matt' skill documentation	2026-06-17 14:53:11 +01:00
Matt Pocock	df129fcfce	refine: Add 'ask-matt' skill and update documentation for user-invoked skills	2026-06-17 14:45:47 +01:00
Matt Pocock	aa7ed40cbe	refine: Make writing-great-skills hunt no-ops at the sentence level Add a Pruning directive to run the no-op test sentence by sentence and delete whole failing sentences rather than trim words. References the existing no-op test as single source of truth — no restated definition. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 14:33:42 +01:00
Matt Pocock	2e647324f9	refine: Tighten review skill — fail-fast ref check, single-sourced rules, no-op cuts - Validate the fixed point resolves and diff is non-empty before spawning sub-agents - Single-source the tooling-enforced and two-axis rationale (point to Why two axes) - Fix rerank contradiction: summarize worst issue per axis, not across axes - Drop dead "Read X, then read the diff" preambles from both briefs Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 14:29:34 +01:00
Matt Pocock	e112a6b03c	Remove zoom-out skill and all references Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 14:19:52 +01:00
Matt Pocock	801a01cc7d	refine: Update terminology from Commands/Skills to User-invoked/Model-invoked and enhance documentation	2026-06-17 14:18:05 +01:00
Matt Pocock	bc4cf903ff	Added writing-great-skills skill	2026-06-17 13:45:28 +01:00
Matt Pocock	ab7196a158	feat: Add decision-mapping skill Adds a new engineering skill that turns loose ideas with many interdependent open decisions into a sequenced DAG of one-session investigation tickets, driving them to resolution one at a time. Sits at the front of the pipeline, before /to-prd. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 13:30:21 +01:00
Matt Pocock	ffb2fa6623	feat: Add implement skill documentation for PRD-based work	2026-06-12 09:25:19 +01:00
Matt Pocock	788b5c382b	Added prefactoring	2026-06-12 09:25:19 +01:00
Matt Pocock	81ddacb08b	refine: Add skill documentation for resolving merge conflicts	2026-06-12 09:25:19 +01:00
Matt Pocock	3832253f14	The fewer seams, the better	2026-06-12 09:25:19 +01:00
Matt Pocock	658d53e6de	Improved /grill-with-docs commit message	2026-06-12 09:25:19 +01:00
Matt Pocock	800201f7b2	refine: Update description for grilling skill to enhance clarity and focus	2026-06-12 09:25:19 +01:00
Matt Pocock	0a8105bba9	refine: Remove caveman skill documentation to streamline productivity skills	2026-06-12 09:25:19 +01:00
Matt Pocock	7d3ada9716	refine: Remove caveman skill documentation to streamline productivity skills	2026-06-12 09:25:19 +01:00
Matt Pocock	0a4b76776d	refine: Add disable-model-invocation flag to edit-article skill for improved functionality	2026-06-12 09:25:19 +01:00
Matt Pocock	cbf6db4233	refine: Update description for grill-me skill and adjust session instruction for clarity	2026-06-12 09:25:19 +01:00
Matt Pocock	2064fb64bc	refine: Simplify instructions for running grilling and domain modeling skills	2026-06-12 09:25:19 +01:00
Matt Pocock	221ffca967	feat: Add new skills and templates for domain modeling, bug diagnosis, and architectural improvement - Introduced a human-in-the-loop script for bug diagnosis to capture user feedback. - Created ADR and CONTEXT formats to standardize architectural decision records and domain context documentation. - Developed a domain modeling skill to refine terminology and maintain a shared language. - Enhanced the improve-codebase-architecture skill to provide visual reports and deepening opportunities. - Consolidated grilling and handoff skills for better user interaction and documentation. - Updated existing skills to improve clarity and consistency in descriptions and functionality.	2026-06-12 09:25:19 +01:00
Matt Pocock	694fa30311	Refine quiz guidelines by standardizing answer length to prevent user clues and enhance assessment integrity.	2026-06-10 16:01:44 +01:00
Matt Pocock	dabb725dfd	Clarify terminology in teaching documentation by replacing 'learner' with 'user' for consistency and improved user experience.	2026-06-10 15:53:12 +01:00
Matt Pocock	3b37863fc3	Enhance teaching guidelines by introducing concepts of fluency vs storage strength, emphasizing long-term retention strategies, and adding reminders for follow-up questions to improve learner engagement.	2026-06-10 15:52:32 +01:00
Matt Pocock	d75217795c	Refine lesson guidelines for clarity and user experience by emphasizing brevity and actionable outcomes	2026-06-10 14:32:02 +01:00
Matt Pocock	26ba8ae34b	Add recommendation for primary source in lesson guidelines to enhance resource quality	2026-06-10 14:12:04 +01:00
Matt Pocock	e3d8b735ef	Add HTML anchor linking between lessons and reference documents for improved navigation	2026-06-09 18:56:43 +01:00
Matt Pocock	59c92aa9e7	Clarify mission updates in teaching documentation to ensure alignment with user skill development	2026-06-09 18:56:13 +01:00
Matt Pocock	2bf7005192	Add teaching skill and related documentation for enhanced learning experience	2026-06-08 15:02:04 +01:00
Matt Pocock	be55a79703	Add NOTES.md for recording user preferences and teaching notes	2026-06-06 10:12:27 +01:00
Matt Pocock	dbcb1bda03	Enhance lesson guidelines by emphasizing the importance of citations and interactive elements for improved user engagement	2026-06-06 10:10:38 +01:00
Matt Pocock	cef541c888	Merge branch 'main' of github.com:mattpocock/skills # Conflicts: # skills/in-progress/teach/SKILL.md	2026-06-06 10:09:36 +01:00
Matt Pocock	88eb3f8d04	Refactor teaching documentation to enhance clarity and structure; added lessons section and improved terminology consistency.	2026-06-06 10:06:03 +01:00
Matt Pocock	aaf2453fbd	Refine teaching skill documentation to enhance clarity and interactivity of explainers	2026-05-31 10:08:36 +01:00
Matt Pocock	e3b90b5238	Refine rules in CONTEXT-FORMAT.md for clarity and consistency	2026-05-28 16:48:30 +01:00
Matt Pocock	0288510dd6	Merge branch 'main' of github.com:mattpocock/skills	2026-05-27 13:36:22 +01:00
Matt Pocock	bd96a9f0c6	Moved /teach to in-progress	2026-05-27 13:36:16 +01:00
Matt Pocock	fadf11d45f	Added /teach	2026-05-27 13:35:56 +01:00
Matt Pocock	b8be62ffac	Merge branch 'main' of https://github.com/mattpocock/skills	2026-05-20 09:46:53 +01:00
Matt Pocock	a36584e09e	Updated ICA	2026-05-20 09:46:51 +01:00
Matt Pocock	d54c497aa9	Improved wording of /handoff	2026-05-19 17:07:02 +01:00
Matt Pocock	e7df78bb81	Removed relationships, example dialogue, and flagged ambiguities from CONTEXT.md template	2026-05-19 14:40:27 +01:00
Matt Pocock	bc32841b2e	Update handoff skill documentation to clarify saving location for handoff documents	2026-05-19 14:34:59 +01:00
Matt Pocock	feaaf4204d	Added redaction info to handoff skill	2026-05-19 14:29:31 +01:00
Matt Pocock	67bce91c80	Fix typo in README.md regarding ticket labels	2026-05-18 13:21:28 +01:00
Matt Pocock	e74f0061bb	Clarify purpose of CONTEXT.md to emphasize it as a glossary, removing implementation details	2026-05-13 14:05:18 +01:00
Matt Pocock	f304057d61	Fix typo in prototype skill description for clarity	2026-05-12 08:36:19 +01:00
Matt Pocock	9f2e0bd0ea	Enhance writing-fragments skill to capture initial user prompts as fragments	2026-05-11 10:19:41 +01:00
Matt Pocock	b6250ed602	Add 'handoff' skill to improve workflow documentation and agent transitions	2026-05-11 09:13:01 +01:00
Matt Pocock	9fecab929a	Add review skill to facilitate two-axis code reviews	2026-05-10 20:59:48 +01:00
Matt Pocock	733d312884	Add skills.sh badge to README for visibility	2026-05-07 07:34:11 +01:00
Matt Pocock	3bb0b1b917	Update issue listing command in GitLab documentation for clarity	2026-05-07 07:20:57 +01:00
Matt Pocock	70141119e9	Update triage label in PRD writing instructions to 'ready-for-agent'	2026-05-06 16:07:58 +01:00
Matt Pocock	494e4b2699	Exclude deprecated skills from linking in the link-skills script	2026-05-06 14:22:15 +01:00
Matt Pocock	918f3fe30f	Add instruction to read the handoff file before writing to it	2026-05-06 13:40:13 +01:00
Matt Pocock	c8364b0d1f	Add suggestion for skills to be used in next session in handoff documentation	2026-05-06 12:56:32 +01:00
Matt Pocock	c63fd971a7	Remove disable-model-invocation flag from handoff skill documentation	2026-05-06 12:53:07 +01:00
Matt Pocock	b27a1a46f8	Add handoff skill documentation for agent transition	2026-05-06 09:33:40 +01:00
Matt Pocock	ff3ee1dda5	Add prototype skill and related documentation for interactive design exploration	2026-05-06 09:26:48 +01:00
Matt Pocock	e736396b35	Clarify issue publication instructions for AFK agents in SKILL.md	2026-05-06 07:40:30 +01:00
Matt Pocock	b843cb5ea7	Add structured sections for 'what-to-do' and 'supporting-info' in SKILL.md Co-authored-by: Copilot <copilot@github.com>	2026-04-30 12:38:11 +01:00
Matt Pocock	839025ae17	Remove disable-model-invocation flag from grill-with-docs	2026-04-30 11:46:00 +01:00
Matt Pocock	3dae2b1945	Add verify/check mode documentation and list-skills script	2026-04-30 10:12:35 +01:00
Matt Pocock	f71bb975bf	Add out-of-scope note: issue trackers must be mainstream Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:56:28 +01:00
Matt Pocock	4369256220	Add GitLab as a first-class issue-tracker option Closes #98 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:21:19 +01:00
Matt Pocock	b56795bed1	Add out-of-scope note for grilling question limits Records the rejection of #44 (request for a hard cap on grilling questions) so the reasoning isn't lost when the issue is closed and so future similar requests can be deduplicated against it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:25:12 +01:00
Matt Pocock	179a14e721	Swapped 'backlog' for 'issue tracker Co-authored-by: Copilot <copilot@github.com>	2026-04-28 19:06:32 +01:00
Matt Pocock	5fed805a92	Enhance quickstart instructions in README; clarify setup steps for `/setup-matt-pocock-skills` and triage labels	2026-04-28 16:51:49 +01:00
Matt Pocock	0e51243253	Merge pull request #90 from mattpocock/setup-skill-and-vague-prose Add setup-matt-pocock-skills; rename github-triage; migrate skills to vague prose	2026-04-28 16:46:49 +01:00
Matt Pocock	70653e105c	Enhance user guidance in setup-matt-pocock-skills; add explanations for backlog backend, triage labels, and domain docs decisions	2026-04-28 16:41:14 +01:00
Matt Pocock	a32ebfb550	Remove deprecated triage-issue skill and update README to reflect changes	2026-04-28 16:35:22 +01:00
Matt Pocock	7afa86d3a5	Add setup-matt-pocock-skills; rename github-triage to triage; migrate engineering skills to vague prose Engineering skills no longer hard-code GitHub or specific label strings. A new setup skill scaffolds an `## Agent skills` block in AGENTS.md/CLAUDE.md plus `docs/agents/` so each repo can declare its own backlog backend, triage label vocabulary, and domain doc layout. Skills that need the mapping (to-issues, to-prd, triage) point at the setup skill; skills that only soften with it (diagnose, tdd, improve-codebase-architecture, zoom-out) stay vague. ADR-0001 records the split. Closes #88, #89. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:33:37 +01:00
Matt Pocock	49cec7be01	Update title in README.md	2026-04-28 14:42:04 +01:00
Matt Pocock	d7c8dcfd02	Enhance README with skills newsletter images Added images for skills newsletter in README.	2026-04-28 14:40:16 +01:00
Matt Pocock	073a37e75d	Enhance link-skills.sh to check for symlink conflicts and provide user guidance	2026-04-28 12:38:36 +01:00
Matt Pocock	f76e3ba4af	Add DOMAIN-AWARENESS.md and update SKILL.md files to reference it for code exploration	2026-04-28 12:04:01 +01:00
Matt Pocock	ecdadb4076	Refine README to improve clarity and conciseness in the explanation of agent communication challenges and solutions	2026-04-28 11:21:32 +01:00
Matt Pocock	20861cb0a1	Enhance README by adding links to skill documentation and clarifying the purpose of feedback loops in AI development	2026-04-28 11:20:15 +01:00
Matt Pocock	c21cf6ec93	Fix grammatical errors and enhance clarity in README section on agent skills	2026-04-28 10:55:14 +01:00
Matt Pocock	1eed8a689b	Enhance README with detailed explanations of common AI failure modes and solutions	2026-04-28 10:50:40 +01:00
Matt Pocock	edd9893326	Remove installation commands for individual skills from README	2026-04-28 10:19:48 +01:00
Matt Pocock	3911642d96	Add quickstart setup instructions to README	2026-04-28 10:18:47 +01:00
Matt Pocock	fb847c6ade	Refactor skill references and documentation for grill-with-docs skill - Updated skill reference from domain-model to grill-with-docs in plugin.json and README.md - Added detailed descriptions and context for grill-with-docs in its SKILL.md - Created ADR-FORMAT.md and CONTEXT-FORMAT.md for grill-with-docs to standardize decision recording - Adjusted references in improve-codebase-architecture to align with new grill-with-docs structure	2026-04-28 10:17:37 +01:00
Matt Pocock	51384f4e70	Remove deprecated skills from plugin.json	2026-04-28 10:11:06 +01:00
Matt Pocock	71542f9d1c	Update skill references in README files and add new skills to deprecated and personal sections	2026-04-28 09:44:54 +01:00
Matt Pocock	62f43a1817	Add new skills for TDD, issue management, PRD creation, and productivity tools - Introduced TDD skills including deep modules, interface design, mocking, refactoring, and testing guidelines. - Added skills for breaking plans into GitHub issues and creating PRDs from conversation context. - Implemented productivity skills for scaffolding exercises, setting up pre-commit hooks, and managing notes in Obsidian. - Created a caveman communication mode for concise technical responses and a grilling technique for thorough plan discussions. - Developed a skill for writing new agent skills with structured templates and guidelines. - Included git guardrails to prevent dangerous git commands and a migration guide for using @total-typescript/shoehorn in tests.	2026-04-28 09:42:34 +01:00
Matt Pocock	3e3ca9b9fa	Add initial implementation of design-an-interface skill and linking script Co-authored-by: Copilot <copilot@github.com>	2026-04-28 09:23:06 +01:00
Matt Pocock	383b6a06d5	Moved to ./skills directory	2026-04-28 08:00:37 +01:00
Matt Pocock	e7f0b58a4b	Added diagnose	2026-04-28 07:58:41 +01:00