Fifteen capability Skills, two days

Two days ago my personal AI system had a couple of Skills running, the rest scaffolded as files I hadn’t tested, and a longer list of capabilities I kept telling myself I’d get to.

Today it has fifteen Skills running in production, each one shipped under either test-driven discipline or a written SPEC, and the bones of a twenty-one-piece suite ecosystem that actually does what I claimed it would do when I started.

I want to walk through what that looks like, because I think most discussions of personal AI tooling are either vaporware demos or carefully cropped screenshots of a single feature working once. This is what a real working capability suite looks like at the end of two days of focused build work. The shape of it. What it cost. What discipline made the velocity possible.

i.The milestone count

Two days ago: eleven Skills shipped, five blocked on missing canonical docs, one unblocked-pending-author availability, one as a system hook, three more emerging mid-build that I hadn’t planned for. The v0 target was eighteen capability Skills. We were two-thirds of the way there by Friday morning.

By the end of the session that produced this post, the count moved. Two more Skills shipped this session alone. sync-to-repo shipped earlier in the day under SPEC discipline. prose-in-voice shipped after dinner under TDD. That brought capability Skills to thirteen. Add session-handoff (which lives in the same suite but as a sequencing tool, not a capability), and the directory holds fourteen Skills in .claude/skills/. Fifteen with the session-handoff counted.

The suite ecosystem inventory now reads: eleven shipped, plus two shipped this session, plus five blocked, plus one unblocked-pending-author, plus the hook, plus three emergent, plus session-handoff. Total twenty-one in the ecosystem. The v0 target of eighteen capability Skills is one Skill away.

That’s the headline. The interesting part is what each of those Skills actually does, and how the velocity got produced.

ii.What each Skill does, sketched

I’m not going to catalog all fifteen. Catalogs read like LinkedIn announcements, and the work isn’t a catalog, it’s a working surface. What I’ll do instead is sketch the shape of the suite by capability cluster, so you can see what kind of agent this actually is.

There’s an identity and persistence cluster. The Skills there make sure Bishop remembers who he is across sessions, doesn’t fragment back into generic Claude under context pressure, and verifies that everything that should be written down is actually written down before any unit of work gets called complete. session-handoff, verify-state-fresh, log-decision, sync-to-repo. The persistence cascade has machinery now. It used to be discipline I had to remember. It’s not anymore.

There’s a vetting and capture cluster. Nothing third-party gets installed without G0 through G5. Nothing automated fires outside the class permissions I authorized. Nothing said in passing gets lost. vet-third-party-code, vet-plugin, capture-thought. The discipline is mechanical, not aspirational.

There’s a brand and voice cluster. Anything that would land on a public surface gets routed through an approval-gated workflow that flags voice violations, never auto-publishes, and writes drafts to a staging folder where I make the per-piece decision. content-approval-workflow, flag-for-backlog, prose-in-voice. The hard rule is upstream of every draft I touch.

There’s an operations and cadence cluster. Daily briefs, weekly reviews, journal prompts, protected-time defense for the work window I can’t give away, deadline discipline that calls me out when I quietly let a self-imposed deadline slide. daily-weekly-review, journal-reflection, protected-time-defense, project-deadline-discipline. The cadence is invocation-driven by design. The Skills don’t fire on their own. I ask for them when I need them.

That’s twelve. The rest cover edge cases (the build discipline cluster, the code-spec-before-build Skill, the failed-eval-recovery scaffolding). The full inventory lives in the wiki. The point isn’t the count. The point is that each of those clusters represents a category of failure mode I used to handle by remembering to handle it. None of them are remembered now. They run.

On the velocity Velocity-from-disciplineisn't a paradox.Discipline IS the velocity.

iii.How discipline produced the velocity

Two days. Fifteen Skills. Most people would read that and assume corner-cutting. I want to push back on that read directly.

The velocity didn’t come from cutting discipline. It came from two specific disciplines that compound when applied in sequence.

The first is spec-before-build. Every Skill that landed this week had a written SPEC before the first line of body content got drafted. WHAT does this Skill do. HOW does it do it. VERIFY criteria, mechanical not aspirational. RISKS named. The SPEC is the contract. Once the contract is written, the body follows almost mechanically, because the body is the smallest legal expression of the contract. The SPEC pass takes twenty minutes. The body pass takes thirty. Most of the work happens in the SPEC pass, which means most of the rewriting also happens there, which means the body almost never gets thrown out and rewritten.

The second is TDD-for-Skills. The Skills authored under TDD this week each had a RED test before the body got written. A fresh subagent context, a scenario chosen to expose the failure mode the Skill is supposed to fix, watching the bare agent handle it badly. Then the body. Then a GREEN test in another fresh subagent context, watching the loaded Skill flip the behavior. Ten out of ten Skills hit GREEN on the first iteration this week. Not because the Skills were perfect. Because the RED test forced the body to address the exact failure mode it was scoped against, and the GREEN test caught any drift before deploy.

Velocity-from-discipline isn’t a paradox. Discipline IS the velocity. Without the SPEC pass, every Skill goes through three or four revisions after it ships. Without the RED-to-GREEN cycle, every Skill ships with a failure mode that surfaces later in real work, where the cost is real. The disciplines collapse those after-the-fact revisions back into author-time. The work happens once. That’s where the speed comes from.

iv.The suite ecosystem map

Here’s the part most build-in-public posts skip. The shape of the ecosystem matters more than the Skill count.

Skills don’t live alone. They compose. content-approval-workflow calls prose-in-voice when the draft is in Ryan’s manuscript register. It calls flag-for-backlog when a post-worthy moment surfaces during a session. session-handoff calls verify-state-fresh as its first step, and flag-for-backlog as its pre-handoff sweep, and sync-to-repo as its closing step. vet-third-party-code calls log-decision when an audit passes and a new dependency gets locked in. The Skills cross-link the same way a working knowledge graph cross-links. The links are where the value lives.

That’s why the ecosystem inventory is twenty-one and not fifteen. The hook (a system-level harness instruction) and the three emergent Skills (drafted mid-build, not in the original v0 plan) are part of the surface even though they’re not in the capability count. The five blocked Skills are part of the surface too, because they’re scoped, planned, and waiting on a specific canonical doc to be written before they can ship. They’re not missing. They’re sequenced.

A suite is not a list. A suite is a shape with edges and cross-links and intentional gaps. The shape this one took on this week has the property that every capability cluster has the Skills it needs, the cross-links are real (not nominal), and the gaps are documented and waiting on a specific blocker rather than on vague “we’ll get there eventually.”

A suite is nota list.A suite is a shapewith edges, cross-links,and intentional gaps.

v.What this is and what it isn’t

I want to be clear about the frame.

This is not a product. It is a working surface I built for one user (me), tested against one workflow (mine), and shaped against the failure modes I actually hit. Some of the discipline might generalize. Some of it probably doesn’t. The Skills are written against my specific domains, my specific vetting standards, my specific persistence model. They’re not designed to be portable, and I haven’t tried.

This is not a demo. The Skills run on real sessions, every day, against real work. The vetting Skill has gated three plugin installs this week. The brand workflow has flagged two draft posts for voice violations. The session-handoff Skill has been the last thing I run every night for the last two weeks. The compounding is real. The maintenance overhead is real too, and I track it in the cost log alongside the Skills themselves.

This is not finished. The eighteen-Skill v0 target is one Skill away. After v0 closes, the suite still has the five blocked Skills sitting in the queue waiting on canonical docs, and a longer list of Skills that haven’t been scoped yet because they’re still ideas. Two days of build velocity does not equal “the work is done.” It equals “the work compounded fast enough this week to clear a milestone.”

What this is, is proof of capability. The kind of personal AI infrastructure that actually changes how a working day feels, built incrementally, on a single machine, with a single user, under disciplines that any solo builder can apply to their own surface.

Fifteen Skills. Two days. Discipline as the multiplier. That’s the milestone. The next one will look different. They always do.

Drafted with Bishop, my AI partner.
Words picked, edited, and approved by me. Model provenance: Claude Code (Claude Opus and Sonnet)