Reliability Open-source

A memory-correct reliability layer for an AI product team

Their assistant was storing users' questions as facts. A drop-in gate took memory precision from about 35% to 100% on their labeled test set.

The problem

An AI product team had a subtle, corrosive bug. Their assistant's memory was storing users' questions as if they were facts. Ask it "is the office open on Friday?" and it might later "remember" that the office is open on Friday — a fact it was never told.

Over time this poisoned answers with confident, invented information. It is one of the most damaging failure modes an AI product can have, because it erodes the one thing the product is supposed to provide: trust.

"It erodes the one thing the product is supposed to provide: trust."

Our approach

The fix was not a bigger model or a longer prompt — it was a gate at the moment of storage. Before anything is written to memory, classify what kind of statement it is. A question is not a fact. A hypothetical is not a fact. Only genuine assertions of fact should be stored.

And even then, one mention should not be treated as certainty — so we added a trust model where confidence builds with corroboration instead of being assumed on first contact.

How the gate works

Incoming message

→

Classify speech act

Genuine fact ✓

→ stored with trust score

Question / hypothetical

→ filtered, never stored

What we built

A drop-in reliability layer that sits in front of memory — the same layer we ship underneath every AI worker we build.

Classifies every incoming message before anything is stored
Stores only genuine facts — questions and hypotheticals are filtered out
Applies a trust model — a single mention is not treated as established truth
Drops into an existing assistant without rebuilding it
Validated against the team's own labeled test set

Open-source tooling

We open-source the reliability tooling behind this work — the approach is public, the engagement itself stays anonymous. The same layer ships underneath every AI worker we build.

The result

~35% ~35%→100%rarr; 100%

precision on labeled test set

Drop-in

no rebuild required

Public

tooling is open-source

On their labeled test set, precision went from roughly 35% to 100% — the layer stopped writing non-facts into memory. The assistant's answers stopped drifting toward confidently invented information.

We open-source the reliability tooling behind this work, so the approach is public; the engagement itself stays anonymous.

How we think about it

Reliability is the whole game for an AI product. A clever assistant that quietly corrupts its own memory is worse than a simple one that does not, because users learn they cannot trust it.

We would rather an AI say "I do not know" than confidently make something up — so we build the gate that makes that the default. This is the same layer we ship underneath every AI worker we build.

Is your AI product storing the right things?

A free audit will tell you whether your assistant's memory is reliable — and what a drop-in gate would actually cost.

Get a quote All case studies