A memory-correct reliability layer for an AI product team
Their assistant was storing users' questions as facts. A drop-in gate took memory precision from about 35% to 100% on their labeled test set.
The problem
An AI product team had a subtle, corrosive bug. Their assistant's memory was storing users' questions as if they were facts. Ask it "is the office open on Friday?" and it might later "remember" that the office is open on Friday — a fact it was never told.
Over time this poisoned answers with confident, invented information. It is one of the most damaging failure modes an AI product can have, because it erodes the one thing the product is supposed to provide: trust.
"It erodes the one thing the product is supposed to provide: trust."
Our approach
The fix was not a bigger model or a longer prompt — it was a gate at the moment of storage. Before anything is written to memory, classify what kind of statement it is. A question is not a fact. A hypothetical is not a fact. Only genuine assertions of fact should be stored.
And even then, one mention should not be treated as certainty — so we added a trust model where confidence builds with corroboration instead of being assumed on first contact.
How the gate works
What we built
A drop-in reliability layer that sits in front of memory — the same layer we ship underneath every AI worker we build.
- Classifies every incoming message before anything is stored
- Stores only genuine facts — questions and hypotheticals are filtered out
- Applies a trust model — a single mention is not treated as established truth
- Drops into an existing assistant without rebuilding it
- Validated against the team's own labeled test set
Open-source tooling
We open-source the reliability tooling behind this work — the approach is public, the engagement itself stays anonymous. The same layer ships underneath every AI worker we build.
The result
On their labeled test set, precision went from roughly 35% to 100% — the layer stopped writing non-facts into memory. The assistant's answers stopped drifting toward confidently invented information.
We open-source the reliability tooling behind this work, so the approach is public; the engagement itself stays anonymous.
How we think about it
Reliability is the whole game for an AI product. A clever assistant that quietly corrupts its own memory is worse than a simple one that does not, because users learn they cannot trust it.
We would rather an AI say "I do not know" than confidently make something up — so we build the gate that makes that the default. This is the same layer we ship underneath every AI worker we build.
Is your AI product storing the right things?
A free audit will tell you whether your assistant's memory is reliable — and what a drop-in gate would actually cost.