The Quality Gap Is Structural, Not Topical
The strongest skills in the corpus do not merely expose a tool. They remove runtime ambiguity. They turn decisions the model would otherwise infer on the fly into explicit operating surfaces: activation boundaries, routing rules, error recovery, negative constraints, output formats, and state lifecycles.
The weakest skills fail in the opposite direction. They look usable because the topic is present, but the agent is left to improvise everything that matters after the happy path. That gap compounds with longer context windows, shifting model behavior, and pressure-filled situations where the base model is most tempted to guess.
Reliable skills narrow the action space. They tell the agent when to activate, how to branch, where to write, what to return, and what failures should trigger escalation instead of another blind attempt.
Weak skills read like wrappers or man pages. They describe capabilities, but the model still has to infer the operational contract in real time, which is exactly where drift and unsafe improvisation show up.
"The best skill files are not richer descriptions. They are executable decision systems written in natural language."
Ten Dimensions Separate Prompt Injections From Operating Manuals
The ten dimensions below are the recurring fault lines in the corpus. Together they describe whether a skill behaves like a precise runtime contract or a loose essay about a tool.
Activation Boundaries
Clear "use this / do not use this" logic prevents accidental invocation in the wrong context.
Selective Loading
Good skills route to sub-files or references instead of forcing every deep detail into every invocation.
Intent and Complexity Routing
The skill should branch explicitly between direct execution, adaptive loops, and longer multi-step or research work.
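As a minimal sketch of what explicit branching can look like, here is a hypothetical router; the signal names (`uncertainty`, `steps`, `needs_research`) and thresholds are illustrative, not drawn from any specific skill:

```python
# Hypothetical complexity router: maps task signals to an execution mode
# so the model does not improvise the branch at runtime.

def route(uncertainty: float, steps: int, needs_research: bool) -> str:
    """Pick an execution mode explicitly instead of inferring it on the fly."""
    if needs_research:
        return "research-workflow"   # long-horizon, multi-source work
    if uncertainty > 0.5 or steps > 3:
        return "adaptive-loop"       # execute, observe, re-plan
    return "direct"                  # single-shot execution
```

The point is not these particular thresholds; it is that the branch condition is written down rather than left to the model's judgment under pressure.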
Error Handling
Reliable skills specify recovery paths, stop conditions, and when to escalate instead of retrying blindly.
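A recovery policy of this shape can be sketched in a few lines; the names (`run_with_recovery`, `MAX_ATTEMPTS`, the `escalate` callback) are hypothetical:

```python
# Hypothetical recovery policy: a bounded retry budget, then escalation.
MAX_ATTEMPTS = 2

def run_with_recovery(action, escalate):
    last_error = None
    for _attempt in range(MAX_ATTEMPTS):
        try:
            return action()
        except RuntimeError as err:   # only retry failures we understand
            last_error = err
    # Stop condition reached: hand off instead of another blind attempt.
    return escalate(last_error)
```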
Negative Constraints
High-risk actions need explicit "never" and "do not" rules, not just optimistic examples.
Security Disclosure
The skill should make the data boundary legible: what leaves the machine, what stays local, and what is retained elsewhere.
State Architecture
Any persistence mechanism needs location, size, promotion, and demotion rules or it decays into unbounded context sludge.
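A lifecycle like that can be made concrete with a small sketch; the class and its cap are illustrative assumptions, not a prescribed design:

```python
# Hypothetical scratch store with explicit lifecycle rules: a hard size
# cap, promotion of recently touched keys, demotion of the stalest one.
from collections import OrderedDict

class ScratchState:
    def __init__(self, max_entries: int = 4):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def write(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)          # promote on write
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)   # demote the stalest entry

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # promote on read
            return self.entries[key]
        return None
```

Without the cap and the demotion rule, the store is exactly the "unbounded context sludge" described above.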
Cross-Skill Contracts
Related skills should declare typed reads, writes, and preconditions instead of leaving coordination implicit.
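One way to make such a contract explicit is sketched below; the field names and the precondition check are hypothetical:

```python
# Hypothetical cross-skill contract: each skill declares what it reads
# and writes, and a precondition checks the handoff before it runs.
from dataclasses import dataclass, field

@dataclass
class SkillContract:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

    def can_follow(self, upstream: "SkillContract") -> bool:
        """Precondition: everything this skill reads was written upstream."""
        return self.reads <= upstream.writes
```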
Output Formats
Different modes should return different structures. A browse action and a deep-dive report should not share one fuzzy template.
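A per-mode schema split can be sketched as follows; the mode names and fields are illustrative assumptions:

```python
# Hypothetical per-mode output schemas: a browse action returns a flat
# summary, while a deep dive returns structured findings.

def format_output(mode: str, payload: dict) -> dict:
    if mode == "browse":
        return {"mode": "browse", "summary": payload.get("summary", "")}
    if mode == "deep-dive":
        return {
            "mode": "deep-dive",
            "findings": payload.get("findings", []),
            "open_questions": payload.get("open_questions", []),
        }
    raise ValueError(f"unknown mode: {mode}")
```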
Frontmatter Metadata
Bins, environment variables, OS constraints, and install steps should be machine-readable rather than buried in prose.
The quality gap in the corpus is not evenly distributed across all ten dimensions. It clusters around the ones that make the model's decision process observable and stable under pressure.
Where the Gaps Actually Show Up
Selective loading beats inline encyclopedias
The corpus repeatedly shows that long, always-loaded skill bodies are not a sign of rigor. They are usually a sign that the author has not separated routing from reference material. The best files stay lightweight at invocation time and defer depth to targeted sub-files or reference sections.
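The routing-versus-reference split can be sketched as a tiny loader; the topics and file paths here are hypothetical:

```python
# Hypothetical selective loader: the always-loaded body stays small and
# a reference file is pulled in only when the task actually needs it.
REFERENCES = {
    "rebase": "references/history-rewriting.md",
    "bisect": "references/debugging.md",
}

def files_to_load(task: str) -> list:
    loaded = ["SKILL.md"]                 # lightweight core, always paid
    for topic, path in REFERENCES.items():
        if topic in task:                 # depth is conditional
            loaded.append(path)
    return loaded
```

The inline-encyclopedia failure mode is the degenerate case where every reference is concatenated into `SKILL.md` and paid on every invocation.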
| Example | Always-loaded footprint | Approach | Net effect |
|---|---|---|---|
| Git by Ivan G Davila | 141 lines | Quick-reference rules route the agent to deeper files only when a task actually needs them. | Depth is conditional, not paid up front. |
| Anthropic production skills | Concise shell + references | Core behavior stays short, while richer examples sit behind secondary references. | The model can stay oriented without carrying dead weight. |
| API Gateway by byungkyu | 664 lines | Broad routing coverage exists, but it is paid on every invocation regardless of relevance. | The model loads a service directory before it knows whether the service matters. |
Routing and failure handling remove improvisation
The most technically sophisticated skills in the corpus succeed because they make branching explicit without pretending the boundary is perfectly crisp. In practice, the useful distinction is not "simple versus deep" by call count. It is whether uncertainty, branching, and execution shape imply a direct path, an adaptive loop, or a longer research workflow.