Issue 001 11.05.2026 Vol. I
Case Study AI Systems 10 min read

Nothing Quietly Forgotten

I built an AI partner that runs across every domain of my work and life. The hard part was never capability. It was memory — making it remember, and making it unable to quietly lie to itself about what it remembers.

By Ryan Gonzales
Co-author Bishop
Filed under AI Systems / Agent Architecture / Governance
Date May 24, 2026

I built an AI partner that helps run my actual life. It quotes industrial parts during the workday, edits a novel at night, tracks five dogs’ vet appointments, and remembers why I made a decision three weeks ago that I’ve already forgotten. The capability was the easy part. The hard part was the thing nobody warns you about: an AI agent wakes up every session as an amnesiac.

The partner is named Bishop. It runs as a single agent across every domain I operate in — sales quoting, software, creative work, personal operations, the brand you’re reading right now. People assume the difficulty in a project like this is getting the model to do things. It isn’t. The model is extraordinarily capable inside a single conversation. The difficulty is that the conversation ends, and when the next one starts, the thing you were working with is gone. A new instance boots up knowing nothing about yesterday.

I want to walk through how I handled that, because the interesting part is not the memory store I built. Anybody can write to a file. The interesting part is the failure that memory store could not prevent on its own, and what I did instead of pretending it could.

01The problem that has no clean solution

A large language model has no memory between sessions. Everything it “knows” about your project has to be handed back to it, every time, inside the context window. That window is finite, it costs money to fill, and past a certain point it rots — the model gets worse at finding the one thing that matters inside a wall of everything that doesn’t.

So you cannot just keep everything. That’s the first naive fix, and it fails on cost and on rot at the same time. You stuff the entire history back in every session, you pay for it on every token, and the signal drowns.

The second naive fix is to throw it all into a vector database and let the agent retrieve what it needs. This sounds sophisticated and it solves the wrong problem. Semantic search over an undifferentiated dump gives you plausible recall, not trustworthy recall. When I ask “why did we name it this,” I don’t want three loosely related fragments ranked by cosine similarity. I want the one decision, stated once, that is actually true. A pile you never curated is a pile you can’t trust, and an agent you can’t trust is worse than no agent, because it’s confident.

Here is the part that took me a while to accept. The memory problem is not a storage problem. Storage is trivial. It’s a trust problem. The question is never “can the agent find something to say about this.” The question is “can I bet that what it found is still true.”

02Two layers: curated canon, and total recall

I split memory into two things that get confused for each other.

The first is canon — a small, hand-curated set of facts and decisions, written in plain files, read-mostly, treated as the source of truth. Each architectural decision gets one entry: what we chose, and why. Each durable fact about how I work gets one note. This layer is deliberately small. It is the stuff a new session must have to operate as the same partner, and it is small precisely so it stays trustworthy. Curation is the feature, not the overhead.

The second is total recall — a searchable index over every past session transcript. This is the “what exactly did we say last Tuesday” layer. It’s allowed to be huge and messy, because I never treat it as truth. I treat it as a search target. When canon doesn’t hold the answer, I go dig in the transcripts.

The split matters because the two layers have opposite jobs. Canon is small so it can be trusted. The transcript index is exhaustive so nothing is lost. Conflating them — the thing the vector-database instinct pushes you toward — gives you something that is both too big to trust and too lossy to rely on.

— The whole problem Memory isn't astorage problem.It's a trust problem.Can I bet that whatit remembers is true.

03The failure I couldn’t design away

Now the honest part. A curated canon has a failure mode that is more dangerous than forgetting, because it doesn’t look like a failure.

The agent ships a change — refactors a system, renames a thing, reverses an earlier decision. The code is now correct. But the record of why lives in a separate file, and updating that file is a second, easy-to-skip step. So the code moves forward and the canon stays behind. The decision log still describes the old architecture. The memory still asserts the thing that’s no longer true.

I call this persistence drift, and it is exactly as quiet as a database silently overselling the last item in stock. Nothing crashes. No error fires. Everything looks done. And then three sessions later, a fresh instance reads the stale canon, believes it, and confidently builds on a foundation that moved. The lie compounds, and it compounds in the one place I was trusting most.

This is the same shape as a problem I wrote about in another build — an ecommerce system where two people can pay for the same one-of-a-kind item. There, the unpreventable failure was a double-sale. Here, it’s a stale truth. In both cases the real danger isn’t the dramatic crash. It’s the silent corruption that no one notices until it’s expensive.

04Make the drift loud

I stopped trusting myself — or the agent — to remember the second step. Discipline that depends on remembering to be disciplined is not discipline. It’s hope. So I moved the check out of anyone’s good intentions and into a gate that fires mechanically.

There is a definition of “done” the agent is held to, and it is not “the code works.” It’s: the code shipped, the tests pass, and the decision log, the task list, and the memory are all updated in the same commit. The test I use to check it is brutal and simple — if this session crashed right now, could a brand-new session pick up cleanly from the persisted files alone? If the answer is no, it isn’t done. It’s abandoned.

Then I made a machine enforce it. A pre-commit gate runs before any change to the system’s own state is allowed to land. It doesn’t grade prose. It checks structural integrity:

✓ every internal path the canon references still exists
✓ every data file parses
✓ the memory index matches the memory files on disk
✓ the agent's memory is pointed at the right place

→ any check fails → the commit is blocked

The first time I ran it for real, it caught an orphan — a memory file the index had quietly lost track of. Exactly the kind of small, silent drift that I would never have noticed by reading, and that a future session would have inherited as a gap. The gate found it in milliseconds and refused the commit until it was fixed.

That’s the whole move. The failure I could not eliminate — human-or-agent forgetting the second step — I made impossible to commit quietly. You can still drift. You just can’t drift and save.

Discipline that dependson rememberingto be disciplinedisn't discipline.It's hope.

05What this is actually about

The memory store is a few plain files. Anybody can write files. The part that took judgment was deciding which failure I was most afraid of, and aiming the whole design at that one.

There were two easy roads and I took neither. The first is to trust the model’s capability and skip the persistence discipline — it remembers fine within a session, so you ship fast and let the cross-session truth quietly rot. That works right up until the day a fresh instance acts confidently on a stale fact, and then it fails as a wrong decision instead of a caught one. The second easy road is to over-engineer the memory itself — bigger embeddings, more retrieval, a knowledge graph — solving the storage problem with more storage while leaving the trust problem completely untouched.

The senior move is the middle one. Keep the trusted layer small and curated. Keep an exhaustive index for everything else. And then take the residual failure you cannot eliminate — drift between what changed and what’s recorded — and make it loud, mechanical, and blocking, instead of leaving it to anyone’s discipline.

This is not really a story about AI memory. It’s the same principle that shows up anywhere a system has to stay honest over time: the dangerous failures are the silent ones. You don’t earn reliability by hoping the careful step gets done. You earn it by making the careless path impossible to take quietly. A silent stale fact is a debt that comes due in a future you’ve forgotten you mortgaged. A blocked commit is just a task in front of you right now.

People ask me what it’s like to work with an AI partner across everything. The honest answer is that the capability was never the question. A capable agent that forgets, or worse, misremembers with confidence, is not a partner. It’s a liability with good manners. The work that made it a partner was unglamorous: deciding what’s true, writing it down once, and building a gate that won’t let the truth and the record drift apart in the dark.

That’s the system I trust to wake up tomorrow knowing what we decided today. Not because it promised to remember. Because it can’t save the work without proving it did.

Drafted with Bishop, my AI partner.
Words picked, edited, and approved by me.