Added writing-great-skills skill

This commit is contained in:
Matt Pocock 2026-06-17 13:45:28 +01:00
commit bc4cf903ff
9 changed files with 310 additions and 186 deletions

View file

@ -19,6 +19,6 @@
"./skills/productivity/grilling",
"./skills/productivity/handoff",
"./skills/productivity/teach",
"./skills/productivity/write-a-skill"
"./skills/productivity/writing-great-skills"
]
}

View file

@ -177,7 +177,7 @@ General workflow tools, not code-specific.
- **[grill-me](./skills/productivity/grill-me/SKILL.md)** — Get relentlessly interviewed about a plan or design until every branch of the decision tree is resolved.
- **[handoff](./skills/productivity/handoff/SKILL.md)** — Compact the current conversation into a handoff document so another agent can continue the work.
- **[teach](./skills/productivity/teach/SKILL.md)** — Teach the user a new skill or concept over multiple sessions, using the current directory as a stateful teaching workspace.
- **[write-a-skill](./skills/productivity/write-a-skill/SKILL.md)** — Create new skills with proper structure, progressive disclosure, and bundled resources.
- **[writing-great-skills](./skills/productivity/writing-great-skills/SKILL.md)** — Reference for writing and editing skills well: the vocabulary and principles that make a skill predictable.
**Skills**

View file

@ -1,77 +1,60 @@
---
name: decision-mapping
description: Turn a loose idea with many interdependent open decisions into a sequenced map of one-session investigation tickets, then drive them to resolution one at a time. Use when the user's idea is too loose to plan in one sitting, there are many dependent design decisions, the user wants to map or sequence open decisions, plan prototyping or spike sessions, "plan the planning", or resume work on an existing decision map.
description: Turn a loose idea into a sequenced map of investigation tickets, then drive them to resolution one at a time. Use when the user's idea is too loose to plan in one sitting, they want to plan prototyping sessions, or resume work on an existing decision map.
---
# Decision Mapping
This skill is invoked when a loose idea requires more than one agent session to turn into a plan. It creates a stateful decision map in a markdown file, and drives the user through a sequence of tickets to resolve the open questions - which may require either prototyping, research or discussion.
This skill bridges loose idea and actionable plan. When a concept has too many open, interdependent design decisions to grill to a conclusion in one session — some unresolvable by discussion alone, others blocked on spikes — it maps those decisions into a sequenced DAG of one-session investigation tickets, then drives them to resolution one at a time.
It is the planning-phase mirror of `/to-issues`: `/to-issues` breaks a *stable* design into implementation tickets; this breaks a *loose* one into investigation tickets. It sits at the front of the pipeline, before `/to-prd`.
## The decision map
## The Decision Map
The decision map is a single compact Markdown file, one per planning effort, git-tracked alongside the project. It is the canonical artifact — the **whole map is loaded as context into every session**, so it must stay compact.
Assets created during tickets should be linked to from the map, not duplicated within it.
### Structure
Numbered entries ("tickets"), each its own section keyed by its number:
```markdown
## 1. Should we use a relational or document store?
## #1: Relational Or Non-Relational Database?
blocked_by: []
Blocked by: #<ticket-number>, #<ticket-number>
Type: Research | Prototype | Discuss
**Question.** ...
### Question
**Answer.** ... (once resolved)
<question-here>
**Reasoning.** ...
### Answer
<answer-here>
```
Each ticket carries:
Each ticket must be sized to one 100K token agent session.
- **Number** — unique, stable, never reused.
- **`blocked_by: [numbers]`** — the dependency edges, forming a DAG. Omit or leave empty when unblocked.
- **Free-form body** — the decision phrased as a question, and once resolved, the answer and reasoning. Keep it minimal but non-lossy. No rigid extra fields (no risk scores, no expiry fields).
## Ticket Types
Each ticket must be sized to **one context-window session** — resolvable before the agent drifts out of the smart zone. Same sizing constraint as a Ticket in `/to-issues`. Each ticket owns its own section so parallel sessions editing different tickets produce clean diffs.
There are three types of tickets:
## Two traversals
**Breadth-first** — used to build or update the map. Fan across the frontier of open decisions. Resolve trivially-decidable ones inline. Tag the rest by how they'll be resolved:
- **discuss** — run `/grilling` for a focused interview.
- **spike** — build throwaway code via `/prototype` to learn something the discussion can't settle. The build step may be delegated to a subagent.
- **defer** — record the question but mark it out-of-scope until a later session unlocks it.
Do not descend into or resolve decisions deeply during breadth-first traversal.
**Depth-first** — used inside a session to drive one ticket as far as it goes until its question resolves. Invoke `/grilling` for the interview itself.
These compose: breadth-first to see the tree, depth-first to drive one node to ground.
- **Research**: Reading documentation, third-party API's, or local resources like knowledge bases. Creates a markdown summary as an asset. Use this when knowledge outside the current working directory is required.
- **Prototype**: Writing UI or logic code to test a hypothesis, or to explore a design space. Uses the /prototype skill. Creates a prototype as an asset. Use this when "how should it look" or "how should it behave" is the key question.
- **Discuss**: Conversation with the agent. Uses the /grilling and /domain-modelling skills. The default case.
## Fog of war
The map is *deliberately* incomplete beyond the frontier — only map as deep as the next spike, because a spike's outcome determines which decisions even exist downstream.
The map is _deliberately_ incomplete beyond the frontier. Your job is to investigate the frontier, and to resolve tickets in order to push the frontier forward. Push back the fog of war, one node at a time.
Every session ends with a **re-map**:
1. Record what the session resolved in the ticket's body.
2. Add newly-discovered tickets (with correct `blocked_by` edges).
3. Reopen any previously-resolved ticket the session invalidated, and propagate suspicion to anything `blocked_by` it.
Never try to prove the map complete. The only validation target is that the **next session is correctly chosen**.
At some point, the fog of war should have been pushed back far enough that the path to the finish line is clear. At that point, no more tickets will be required and the decision map can be considered 'done'.
## Invocation
There is no auto-discovery. The user always points the skill at what to work on.
There are two ways this skill can be invoked: **bootstrap** and **resume**.
### Bootstrap
User invokes with a loose idea.
1. Run breadth-first grilling (via `/grilling`) to surface the open decisions.
1. Run a /grilling and /domain-modelling session to surface the open decisions.
2. Write a new decision map — mostly fog, frontier identified, trivially-decidable entries resolved inline.
3. Stop. Map-building is one session's work; do not also resolve tickets.
@ -80,24 +63,21 @@ User invokes with a loose idea.
User invokes with a path to an existing map and a ticket number.
1. Load the **whole map** as context.
2. Depth-first grill that ticket to resolution (via `/grilling`).
3. Write the answer and reasoning back into that ticket's section.
4. Re-map: add new tickets, reopen invalidated ones, update `blocked_by` edges.
2. Run a session to resolve the ticket, invoking skills as needed. If in doubt, use `/grilling` and `/domain-modelling`.
3. Record what the session resolved in the ticket's body.
4. Add newly-discovered tickets (with correct `blocked_by` edges).
5. Stop.
The user picks the ticket. You may advise which tickets are currently unblocked, but do not choose. The process is entirely user-driven and never autonomous.
If the decisions made invalidate other parts of the map, update or delete those nodes.
A pair of independent (unblocked, non-overlapping) tickets may be run in parallel as separate sessions; any cross-cutting edits are a git merge the user reconciles.
## Parallelism
## Lifecycle
The user may choose to run tickets in parallel, so expect other agents to make changes to the map.
```
loose idea
→ bootstrap map (this skill)
→ resume sessions (this skill, one ticket per session)
→ concept stabilises
→ /to-prd absorbs the map
→ /to-issues
→ /implement
→ map archived/deleted (durable conclusions now live in the PRD)
```
## Skipping The Decision Map
Many times, the initial grilling will result in no fog of war. No unresolved tickets. Nothing to do, except implement.
In those situations, you should offer the user the chance to skip the decision map - since the decision map is only needed if multi-session decisions need to be made.
If they skip it, you should recommend either implementing directly or using `/to-prd` to schedule a multi-session implementation.

View file

@ -11,7 +11,7 @@ When exploring the codebase, read `CONTEXT.md` (if it exists) to get a clear men
## Phase 1 — Build a feedback loop
**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.
**This is the skill.** Everything else is mechanical. If you have a **tight** pass/fail signal for the bug — one that goes red on _this_ bug — you will find the cause; bisection, hypothesis-testing, and instrumentation all just consume it. If you don't have one, no amount of staring at code will save you.
Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
@ -30,15 +30,15 @@ Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give
Build the right feedback loop, and the bug is 90% fixed.
### Iterate on the loop itself
### Tighten the loop
Treat the loop as a product. Once you have _a_ loop, ask:
Treat the loop as a product. Once you have _a_ loop, **tighten** it:
- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)
A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
A 30-second flaky loop is barely better than no loop; a 2-second deterministic one is tight — a debugging superpower.
### Non-deterministic bugs
@ -48,11 +48,20 @@ The goal is not a clean repro but a **higher reproduction rate**. Loop the trigg
Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop.
Do not proceed to Phase 2 until you have a loop you believe in.
### Completion criterion — a tight loop that goes red
## Phase 2 — Reproduce
Phase 1 is done when the loop is **tight** and **red-capable**: you can name **one command** — a script path, a test invocation, a curl — that you have **already run at least once** (paste the invocation and its output), and that is:
Run the loop. Watch the bug appear.
- [ ] **Red-capable** — it drives the actual bug code path and asserts the **user's exact symptom**, so it can go red on this bug and green once fixed. Not "runs without erroring" — it must be able to _catch this specific bug_.
- [ ] **Deterministic** — same verdict every run (flaky bugs: a pinned, high reproduction rate, per above).
- [ ] **Fast** — seconds, not minutes.
- [ ] **Agent-runnable** — you can run it unattended; a human in the loop only via `scripts/hitl-loop.template.sh`.
If you catch yourself reading code to build a theory before this command exists, **stop — jumping straight to a hypothesis is the exact failure this skill prevents.** No red-capable command, no Phase 2.
## Phase 2 — Reproduce + minimise
Run the loop. Watch it go red — the bug appears.
Confirm:
@ -60,7 +69,15 @@ Confirm:
- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.
Do not proceed until you reproduce the bug.
### Minimise
Once it's red, shrink the repro to the **smallest scenario that still goes red**. Cut inputs, callers, config, data, and steps **one at a time**, re-running the loop after each cut — keep only what's load-bearing for the failure.
Why bother: a minimal repro shrinks the hypothesis space in Phase 3 (fewer moving parts left to suspect) and becomes the clean regression test in Phase 5.
Done when **every remaining element is load-bearing** — removing any one of them makes the loop go green.
Do not proceed until you have reproduced **and** minimised.
## Phase 3 — Hypothesise

View file

@ -9,7 +9,7 @@ User-invoked entry points (`disable-model-invocation: true`).
- **[grill-me](./grill-me/SKILL.md)** — Get relentlessly interviewed about a plan or design until every branch of the decision tree is resolved.
- **[handoff](./handoff/SKILL.md)** — Compact the current conversation into a handoff document so another agent can continue the work.
- **[teach](./teach/SKILL.md)** — Teach the user a new skill or concept over multiple sessions, using the current directory as a stateful teaching workspace.
- **[write-a-skill](./write-a-skill/SKILL.md)** — Create new skills with proper structure, progressive disclosure, and bundled resources.
- **[writing-great-skills](./writing-great-skills/SKILL.md)** — Reference for writing and editing skills well: the vocabulary and principles that make a skill predictable.
## Skills

View file

@ -5,6 +5,6 @@ description: Interview the user relentlessly about a plan or design. Use when th
Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
Ask the questions one at a time, waiting for feedback on each question before continuing.
Ask the questions one at a time, waiting for feedback on each question before continuing. Asking multiple questions at once is bewildering.
If a question can be answered by exploring the codebase, explore the codebase instead.

View file

@ -1,118 +0,0 @@
---
name: write-a-skill
description: Create a new agent skill with proper structure, progressive disclosure, and bundled resources.
disable-model-invocation: true
---
# Writing Skills
## Process
1. **Gather requirements** - ask user about:
- What task/domain does the skill cover?
- What specific use cases should it handle?
- Does it need executable scripts or just instructions?
- Any reference materials to include?
2. **Draft the skill** - create:
- SKILL.md with concise instructions
- Additional reference files if content exceeds 500 lines
- Utility scripts if deterministic operations needed
3. **Review with user** - present draft and ask:
- Does this cover your use cases?
- Anything missing or unclear?
- Should any section be more/less detailed?
## Skill Structure
```
skill-name/
├── SKILL.md # Main instructions (required)
├── REFERENCE.md # Detailed docs (if needed)
├── EXAMPLES.md # Usage examples (if needed)
└── scripts/ # Utility scripts (if needed)
└── helper.js
```
## SKILL.md Template
```md
---
name: skill-name
description: Brief description of capability. Use when [specific triggers].
---
# Skill Name
## Quick start
[Minimal working example]
## Workflows
[Step-by-step processes with checklists for complex tasks]
## Advanced features
[Link to separate files: See [REFERENCE.md](REFERENCE.md)]
```
## Description Requirements
The description is **the only thing your agent sees** when deciding which skill to load. It's surfaced in the system prompt alongside all other installed skills. Your agent reads these descriptions and picks the relevant skill based on the user's request.
**Goal**: Give your agent just enough info to know:
1. What capability this skill provides
2. When/why to trigger it (specific keywords, contexts, file types)
**Format**:
- Max 1024 chars
- Write in third person
- First sentence: what it does
- Second sentence: "Use when [specific triggers]"
**Good example**:
```
Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when user mentions PDFs, forms, or document extraction.
```
**Bad example**:
```
Helps with documents.
```
The bad example gives your agent no way to distinguish this from other document skills.
## When to Add Scripts
Add utility scripts when:
- Operation is deterministic (validation, formatting)
- Same code would be generated repeatedly
- Errors need explicit handling
Scripts save tokens and improve reliability vs generated code.
## When to Split Files
Split into separate files when:
- SKILL.md exceeds 100 lines
- Content has distinct domains (finance vs sales schemas)
- Advanced features are rarely needed
## Review Checklist
After drafting, verify:
- [ ] Description includes triggers ("Use when...")
- [ ] SKILL.md under 100 lines
- [ ] No time-sensitive info
- [ ] Consistent terminology
- [ ] Concrete examples included
- [ ] References one level deep

View file

@ -0,0 +1,175 @@
# Glossary — Building Great Skills
The domain model for what makes a skill great. A skill exists to wrangle determinism out of a stochastic system; every term below is a lever on that goal. This is the disclosed reference for [`writing-great-skills`](SKILL.md).
**Bold terms** in any definition are themselves defined in this glossary; find them by their heading.
## Language
### Predictability
The degree to which a skill makes the agent behave the same *way* on every run — the same process, not the same output (a brainstorming skill should *predictably* diverge; its tokens vary, its behaviour doesn't). The root virtue every other term serves — cost and maintainability are symptoms of it, not rivals.
_Avoid_: consistency, reliability, robustness, output-determinism
### Model-Invoked
A skill that keeps its **description** field, so the agent can see it and fire it autonomously — and the human can still type its name, so model-invocation always *includes* user reach. There is no model-only state: a description only ever *adds* agent discovery, never removes the human's. Pays a permanent **context load** on every turn in exchange for that discoverability. Reachable by other skills, because the description that makes it agent-discoverable makes it invocable. A model-invoked skill whose content is all **reference** is also one home for shared reference: another skill can invoke it, so reference needed by several skills lives in one place. Pick model-invocation only when the agent must reach the skill on its own; if it never fires except by hand, drop the description and pay no context load.
_Avoid_: ability, tool, capability
### User-Invoked
A skill with its **description** stripped — invisible to the agent and reachable only by the human typing its name (user-*only*, where **model-invoked** is user-*and-agent*). Trades agent-discoverability for zero **context load**. Because it has no description, nothing but the human can reach it: no other skill can fire it.
_Avoid_: procedure, workflow, command
### Description
The skill's machine-readable trigger, and the one **context pointer** a **model-invoked** skill is forced to keep loaded at all times. Its mere presence *is* the invocation axis: keep it and the skill is model-invoked (and reachable by other skills); delete it and the skill is **user-invoked**, reachable only by the human. The source of a model-invoked skill's **context load**.
_Avoid_: frontmatter, summary
### Context Pointer
A reference held in the agent's context that names some out-of-context material and encodes the condition for reaching it. The **description** is the top-level context pointer (context window → skill); pointers to disclosed files are the same object one level down. Its wording, not the target, decides *when* the agent reaches — and *how reliably*. A must-have target behind a weakly worded pointer is a variance bug: fix the wording first, and inline the material only if sharpening fails.
_Avoid_: link, reference, import
### Context Load
The cost a **model-invoked** skill imposes on the agent's context window — its **description**, always loaded, spending both tokens and attention. What **user-invoked** skills escape by having no description, and the brake on splitting into more model-invoked skills.
_Avoid_: token cost, context bloat
### Cognitive Load
The cost a **user-invoked** skill imposes on the human — what they must hold in their head: which skills exist and when to reach for each (the human is the index). What **model-invocation** removes by being agent-discoverable, and the brake on splitting into more user-invoked skills. Not a cost to minimise: it is the price of human agency, the reason some skills stay user-invoked. Spend it where human judgement matters; remove it where it does not.
_Avoid_: human index, burden, overhead
### Granularity
How finely you divide skills. Finer division spends one of the two loads: more **model-invoked** skills spend **context load** (more descriptions crowding the window and competing for attention); more **user-invoked** skills spend **cognitive load** (more for the human to remember and reach for). Two cuts guide the division. By **invocation**, split off a model-invoked skill where you have a distinct **leading word** to trigger it — a trigger word you actually use in your prompts. By **sequence**, split a run of **steps** where a step's **post-completion steps** need hiding, since isolating it in its own context clears what follows. Beware the reverse: merging sequences exposes each step's post-completion steps to what follows, inviting premature completion.
_Avoid_: chunking, modularity
### Router Skill
A **user-invoked** skill whose job is to point at your other user-invoked skills — naming each and when to reach for it — so the human has one skill to remember instead of many. It can only hint, never fire them: user-invoked skills have no **description**, so nothing but the human can reach them. The cure for **cognitive load** when user-invoked skills multiply.
_Avoid_: dispatcher, menu, registry, index, router procedure
### Information Hierarchy
A skill's content ranked by how immediately the agent needs it — a single ladder, produced by two cuts: in-file or behind a pointer, and step or reference. The rungs:
- **Steps** — in-file, primary
- **Reference**, in-file — secondary
- **Reference**, disclosed — behind a **context pointer**
A skill with no **steps** uses just the bottom two rungs — often a legitimately flat peer-set (e.g. every rule of a review on one rung), which is a fine arrangement, not a smell. The hierarchy is independent of invocation: a skill can be model- or user-invoked whether it is all steps, all reference, or both. When a skill has steps, in-file reference that should be disclosed buries them and turns attending to them into a coin-flip — a variance lever, not just a legibility one. Keep the top of the ladder legible; push down it whatever you can.
_Avoid_: structure, organization, layout
### Branch
A distinct way a skill can be invoked — a case the skill handles — so different runs take different paths through it. A skill with many steps may carry many branches; a linear one has none.
_Avoid_: path, case, fork
### Progressive Disclosure
Moving **reference** down the ladder — out of SKILL.md and behind a **context pointer** — so the top stays legible. Not primarily a token optimisation; it is how the **information hierarchy** is protected. Licensed by **branching**: disclose what only some branches need, inline what every path needs, and if a pointer fires unreliably on must-have material, sharpen its wording, and pull it back inline only if that fails.
_Avoid_: lazy loading, chunking
### Steps
The ordered actions the agent performs — when a skill has them, the primary tier of its content, and the part that earns its place in SKILL.md. Not every skill has steps: a skill can be all steps (`tdd`), all **reference** (a review), or both, independent of invocation. Every step ends on a **completion criterion**, clear or vague.
_Avoid_: workflow, instructions, choreography
### Completion Criterion
The condition that tells the agent a unit of work is done — the target it judges against. Two properties make it a lever, not just a quality. Its **clarity** (can the agent tell done from not-done?) resists **premature completion** — a vague bound ("understanding reached") lets the agent declare done and slip to the next step; this axis needs *steps* to bite, since premature completion is a between-steps failure. Its **demand** (how much it requires) sets **legwork** — "every modified model accounted for" forces thorough work where "produce a change list" does not — and this axis is *not* step-bound: it can bind a body of flat reference too, which is how a skill with no steps still carries an exhaustiveness bar ("every rule applied"). The strongest criteria are both checkable and exhaustive.
_Avoid_: done condition, exit condition, stopping rule
### Post-Completion Steps
The **steps** that follow the current step. Visible, they pull the agent forward into **premature completion** — the more it sees, the stronger the tug; the defence is to hide them by splitting the sequence of steps into two.
_Avoid_: horizon, fog of war, lookahead
### Legwork
The work an agent does behind the scenes within a single step — reading files, exploring the codebase, making changes, digging up what it needs rather than offloading to the user. It lives below the step structure: never written as its own step, latent in the wording, controlled by the agent rather than the skill. The within-step counterpart to **post-completion steps**' across-step pull. Raised by a **leading word** (_comprehensive_, _thorough_) or a **completion criterion** that demands the work be exhaustive — including the demand axis applied to flat reference, which is what drives a skill of flat reference to cover all its rungs. Goes thin either when that demand is missing or when **premature completion** cuts the step short.
_Avoid_: scope, effort, diligence, coverage
### Reference
Material the agent refers to on demand — definitions, facts, parameters, examples, conditional instructions. When a skill has **steps** it is secondary to them; when a skill has none it is the entire content; or it lives outside any skill entirely — see **External Reference**. Reached via **context pointers**, and the prime candidate for **progressive disclosure**.
_Avoid_: supporting material, docs, background
### External Reference
**Reference** that lives outside the skill system — a plain file, no **description**, no **steps**, not invocable — that any skill can point at. The home for shared reference that needn't fire on its own, and the only shared home two **user-invoked** skills can use, since neither has a description and so neither can fire the other.
_Avoid_: doc, resource, knowledge base
### Leading Word
A compact concept — also called a *Leitwort* — already living in the model's pretraining, that the agent thinks with while running the skill. It encodes a behavioural principle in the fewest possible tokens by invoking priors the model already holds (e.g. _lesson_, _proximal zone of development_, _fog of war_, _tracer bullets_). Repeated as a token, never as a sentence, it accumulates a distributed definition across the skill and anchors a whole region of behaviour. Coining your own works if you define it clearly, but a made-up word recruits no priors — you pay in definition tokens what a pretrained word gives free. Reach for an existing word first.
A leading word serves **predictability** twice. In the body it anchors **execution** — the agent reaches for the same behaviour every time the concept appears, and inside flat reference it focuses attention on a class of thing to look for, recruiting the right checks each run. In the **description** it anchors **invocation** — and not only within the skill: when the same word lives in your prompts, your docs, and your codebase, the agent links that shared language to the skill and fires it more reliably. Word a description with the leading words you actually use when you want the skill.
_Avoid_: keyword, term, motif
### Single Source of Truth
The desired state where each meaning lives in exactly one authoritative place, so a change to the skill's behaviour is a change in one place. **Duplication** is its violation.
_Avoid_: home, canonical location
### Relevance
Whether a line still bears on what the skill does — the lens for what to keep. A line loses relevance either by never bearing on the task (mere exposition, or a **branch** that should be disclosed) or by going stale: drifting out of date as the behaviour or world it describes changes. Shorter skills are easier to keep relevant, because each line is cheaper to check. Distinct from **no-op**: relevance asks whether a line bears on the task, not whether it changes behaviour.
_Avoid_: load-bearing, staleness, freshness
## Failure Modes
### Premature Completion
Ending the current step before it is genuinely done, because the agent's attention slips to being done rather than to the work. A between-steps failure: it needs **steps** to occur — a skill with no steps that quits early isn't premature completion but thin **legwork** under an unmet demand. A tug-of-war between two forces: visible **post-completion steps** (the pull forward) and the **completion criterion**'s clarity (the resistance — a sharp, checkable bar holds; a vague one gives way). Fuzziness is the necessary condition: a sharp bound resists the pull no matter how many later steps are visible, so a step that never rushes needs no defending. Two levers hold a step that does, but reach for them in order: **sharpen the bound first** — it is local and cheap. Only when the criterion is irreducibly fuzzy *and* you actually observe the rush do you **hide the later steps** — and hiding only works across a real context boundary (a user-invoked hand-off or a subagent dispatch; an inline model-invoked call leaves the later steps in context and clears nothing). One cause of thin legwork, but distinct from it: legwork can be thin even when a step runs to full completion.
_Avoid_: premature closure, the rush, rushing, shortcutting
### Duplication
The same meaning given more than one **single source of truth**. It costs maintenance (change one place, you must change the others), costs tokens, and inflates prominence — repeating a meaning weights it on the ladder past its real rank. The accidental inverse of a **leading word**, which raises attention on purpose by repeating a token, never the meaning.
_Avoid_: repetition, redundancy
### Sediment
Layers of old content that settle in a skill and are never cleared, because adding feels safe and removing feels risky — so stale and irrelevant lines accumulate and you must core down through them to find what is still live. The default fate of any skill without a pruning discipline; the slow erosion of **relevance**, as opposed to **duplication**'s repeated meaning.
_Avoid_: accretion, bloat, cruft, rot
### Sprawl
A skill that is simply too long — too many lines in SKILL.md — independent of whether they are stale or repeated. Even an all-live, all-unique skill can sprawl. It costs readability (the agent wades through more before it can act, and attention thins across the excess), maintainability (every extra line is one more to keep **relevant**), and tokens. The cure is the **information hierarchy**: push **reference** down behind **context pointers**, and split by **branch** or sequence so each path carries only what it needs. Distinct from **sediment** (length from stale accumulation) and **duplication** (length from repeated meaning) — sprawl is length itself, whatever its cause.
_Avoid_: bloat, length, size, verbosity
### No-Op
An instruction that changes nothing because the model already does it by default — you pay load to tell the agent what it would do anyway. The test: does a line change behaviour versus the default? A line can be perfectly **relevant** and still be a no-op. The same priors that make a **leading word** free make a no-op worthless.
A leading word is a *technique*; No-Op is a *verdict* on a line — and they cross. A leading word too weak to beat the default is a no-op (_be thorough_ when the agent is already thorough-ish), and the fix is a stronger word that passes the verdict (_relentless_), not a different technique. So the No-Op test — does it change behaviour versus the default? — is also how you grade whether a leading word is earning its repetitions. This is model-relative, not reader-relative: two people disagreeing over whether a line is a no-op disagree about the default, and settle it by running the skill, not by debate.
_Avoid_: redundant instruction, restating the obvious, belaboring

View file

@ -0,0 +1,70 @@
---
name: writing-great-skills
description: Reference for writing and editing skills well — the vocabulary and principles that make a skill predictable.
disable-model-invocation: true
---
A skill exists to wrangle determinism out of a stochastic system. **Predictability** — the agent taking the same _process_ every run, not producing the same output — is the root virtue; every lever below serves it.
**Bold terms** are defined in [`GLOSSARY.md`](GLOSSARY.md); look them up there for the full meaning.
## Invocation
Two choices, trading different costs:
- A **model-invoked** skill keeps a **description**, so the agent can fire it autonomously _and_ other skills can reach it (you can still type its name too). It contributes to **context load** — the description sits in the window every turn. Mechanics: omit `disable-model-invocation`, and write a model-facing description with rich trigger phrasing ("Use when the user wants…, mentions…").
- A **user-invoked** skill strips the description from the agent's reach: only you, typing its name, can invoke it — and no other skill can. Zero context load, but it spends **cognitive load**: _you_ are the index that must remember it exists. Mechanics: set `disable-model-invocation: true`; the `description` becomes human-facing — a one-line summary, trigger lists stripped.
Pick model-invocation only when the agent must reach the skill on its own, or another skill must. If it only ever fires by hand, make it user-invoked and pay no context load.
When user-invoked skills multiply past what you can remember, that piled-up cognitive load is cured by a **router skill**: one user-invoked skill that names the others and when to reach for each.
## Information hierarchy
A skill is built from two content types — **steps** and **reference** — that mix freely: a skill can be all steps, all reference, or both. The core decision is which to use and where each sits on the **information hierarchy**, a ladder ranked by how immediately the agent needs the material:
1. **In-skill step** — an ordered action in `SKILL.md`, the primary tier: what the agent does, in order. Each step ends on a **completion criterion**, the condition that tells the agent the work is done. Make it _checkable_ (can the agent tell done from not-done?) and, where it matters, _exhaustive_ ("every modified model accounted for", not "produce a change list") — a vague criterion invites **premature completion**.
2. **In-skill reference** — a definition, rule, or fact in `SKILL.md`, consulted on demand. Often a legitimately flat peer-set (every rule of a review on one rung) — a fine arrangement, not a smell. _This skill is all reference._
3. **External reference** — reference pushed out of `SKILL.md` into a separate file, reached by a **context pointer**, loaded only when the pointer fires. (Spans _disclosed_ reference — a sibling file like `GLOSSARY.md`, still part of the skill — through fully **external reference** that lives outside the skill system and any skill can point at.)
A demanding completion criterion drives thorough **legwork** — the digging the agent does within the work — whether the skill has steps or not, since "every rule applied" binds flat reference just as "every step done" binds a sequence.
Push too little down and the top bloats; push too much and you hide material the agent actually needs. That tension is the whole decision.
**Progressive disclosure** is the move down the ladder — out of `SKILL.md` into a linked file — so the top stays legible. Mechanics: a linked `.md` file in the skill folder, named for what it holds (this skill discloses its full definitions to `GLOSSARY.md`). Some skills are used in more than one way, and each distinct way is a **branch** — different runs taking different paths through the skill. Branching is the cleanest disclosure test: inline what every branch needs, and push behind a pointer what only some branches reach. A **context pointer**'s _wording_, not its target, decides when and how reliably the agent reaches the material.
## When to split
**Granularity** is how finely you divide skills, and each cut spends one of the two loads, so split only when the cut earns it. Two cuts:
- **By invocation** — split off a **model-invoked** skill when you have a distinct **leading word** that should trigger it on its own, or another skill must reach it. You pay **context load** for the new always-loaded **description**, so that independent reach has to be worth it.
- **By sequence** — split a run of **steps** when the steps still ahead (a step's **post-completion steps**) tempt the agent to rush the one in front of it (**premature completion**). Keeping them out of view encourages the agent to do more **legwork** on the current task.
## Pruning
Keep each meaning in a **single source of truth**: one authoritative place, so changing the behaviour is a one-place edit.
Check every line for **relevance**: does it still bear on what the skill does?
## Leading words
A **leading word** is a compact concept already living in the model's pretraining that the agent thinks with while running the skill (e.g. _lesson_, _fog of war_, _tracer bullets_). Repeated throughout the text (though not necessarily - a strong leading word might only be needed once), it accumulates a distributed definition and anchors a whole region of behaviour in the fewest tokens, by recruiting priors the model already holds.
It serves predictability twice. In the body it anchors _execution_: the agent reaches for the same behaviour every time the word appears. In the description it anchors _invocation_: when the same word lives in your prompts, docs, and code, the agent links that shared language to the skill and fires it more reliably.
Hunt for opportunities to refactor skills to use leading words. A triad spelled out at three sites (**duplication**), a description spending a sentence to gesture at one idea — each is a passage begging to **collapse** into a single token. Examples include:
- "fast, deterministic, low-overhead" -> _tight_ — one quality restated across a phase — into a single pretrained word (a _tight_ loop).
- "a loop you believe in" -> _red_ — converts a fuzzy gate into a binary observable state (the loop goes _red_ on the bug, or it doesn't).
You win twice over: fewer tokens, _and_ a sharper hook for the agent to hang its thinking on. Assume every skill is carrying restatements that leading words retire — go find them.
## Failure modes
Use these to diagnose issues the user may be having with the skill.
- **Premature completion** — ending a step before it's genuinely done, attention slipping to _being done_. Defence, in order: sharpen the completion criterion first (cheap, local); only if it is irreducibly fuzzy _and_ you observe the rush, hide the post-completion steps by splitting (the sequence cut).
- **Duplication** — the same meaning in more than one place. Costs maintenance and tokens, and inflates a meaning's prominence on the ladder past its real rank.
- **Sediment** — stale layers that settle because adding feels safe and removing feels risky. The default fate of any skill without a pruning discipline.
- **Sprawl** — a skill simply too long, even when every line is live and unique. Hurts readability and maintainability and wastes tokens. The cure is the ladder: disclose **reference** behind pointers, and split by **branch** or sequence so each path carries only what it needs.
- **No-op** — a line the model already obeys by default, so you pay load to say nothing. The test: does it change behaviour versus the default? A weak leading word (_be thorough_ when the agent is already thorough-ish) is a no-op; the fix is a stronger word (_relentless_), not a different technique.