Memoire
RESEARCH · 2026-03-26 · 8 min read

Governed Memory: How Per-Phase Injection Cuts Token Usage by 38%

AI coding agents are voracious consumers of context. The more an agent knows about your codebase, conventions, and project history, the better its output. But there is a cost: every token of context consumed is a token you pay for, and large context windows introduce noise that can degrade output quality. We built governed memory to solve this tension. This article describes the architecture, the injection policies, and the measured results from deploying governed memory in production.

The problem: context overload

Traditional memory systems take a retrieval-augmented generation approach: search for relevant context, stuff it into the prompt, and let the model figure out what matters. This works acceptably for simple queries but breaks down for multi-phase agent workflows.

Consider a typical Memi workflow. The agent receives a task like “add rate limiting to the API.” It needs to research the existing API structure, plan the implementation, write the code, and verify the result. Each phase has different context requirements. During research, the agent needs broad knowledge about the codebase architecture, past decisions and discussions about rate limiting, and relevant documentation. During coding, it needs specific patterns from the codebase, coding conventions, and the exact files it needs to modify. During the summary phase, it needs almost no memory context at all.

Injecting the full memory context at every phase wastes tokens and introduces noise. Research context that is useful for understanding the problem space becomes a distraction during the focused coding phase. Codebase conventions that guide implementation are irrelevant during summary generation. We measured that naive full-context injection was consuming 40 to 60 percent more tokens than necessary across typical workflows.

The dual-model architecture

Governed memory uses a dual-model architecture. A lightweight router model determines which phase the agent is currently in and applies the appropriate injection policy. The executor model receives only the context that the router has approved for that phase. This separation keeps the routing logic fast and cheap (using a smaller model) while giving the executor model a focused, relevant context window.

The router classifies each agent step into one of five phases: research, planning, coding, verification, and summary. Each phase has a defined access tier that governs which memory blocks are injected and at what detail level.
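The routing step can be sketched as a thin wrapper around a small, cheap model. This is an illustrative sketch, not Memoire's actual implementation: `call_router_model` is a hypothetical stand-in for whatever lightweight LLM call the deployment uses.

```python
from enum import Enum

class Phase(Enum):
    RESEARCH = "research"
    PLANNING = "planning"
    CODING = "coding"
    VERIFICATION = "verification"
    SUMMARY = "summary"

def classify_phase(step_description: str, call_router_model) -> Phase:
    """Ask a small router model to label the current agent step.

    `call_router_model` is a placeholder callable that takes a prompt
    and returns one of the five phase names as text.
    """
    prompt = (
        "Classify this agent step into exactly one phase: "
        "research, planning, coding, verification, summary.\n"
        f"Step: {step_description}\nPhase:"
    )
    label = call_router_model(prompt).strip().lower()
    return Phase(label)  # raises ValueError on an unrecognized label
```

Keeping the router's output constrained to a closed enum means a misbehaving model fails loudly rather than silently injecting the wrong context.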

Tiered access policies

Memoire organizes agent memory into core blocks: company context (tech stack, architecture, team structure), learnings (patterns discovered from past sessions), preferences (coding conventions, style rules), and codebase knowledge (file structure, key abstractions, dependency versions). Each block can be injected at full detail, summary level, or not at all.

The research phase gets broad access. All memory blocks are injected at full detail because the agent needs maximum context to understand the problem space. Company context helps the agent understand architectural constraints. Learnings from past sessions help avoid known pitfalls. Preferences ensure the research phase considers team standards from the start.

The planning phase gets focused access. Company context and preferences are injected at full detail, but learnings and codebase knowledge are injected at summary level. The agent needs to know constraints and conventions to create a good plan, but it does not need every detail from every past session.

The coding phase gets conventions-only access. Preferences (coding style, naming conventions, patterns to follow) are injected at full detail. Codebase knowledge is injected at summary level. Company context and learnings are not injected. This keeps the coding model focused on writing clean, consistent code without being distracted by high-level organizational context.

The verification phase gets minimal access. Only codebase knowledge (for understanding test patterns) is injected at summary level. The model's job is to verify that code compiles, tests pass, and linting is clean. It does not need broad context for this.

The summary phase gets no memory injection at all. The model summarizes what it did based on the conversation history. Injecting memory context at this phase adds noise and increases cost with no quality benefit.
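The five tiers above amount to a small lookup table mapping each phase to the detail level of each memory block. A minimal sketch of such a declarative policy, with block names mirroring the article (the dict shape itself is an assumption, not Memoire's actual schema):

```python
# Detail level per memory block, per phase. "full" injects the block
# verbatim, "summary" injects a condensed version; a block absent from
# a phase's mapping is not injected at all.
INJECTION_POLICY = {
    "research": {  # broad access: everything at full detail
        "company_context": "full",
        "learnings": "full",
        "preferences": "full",
        "codebase_knowledge": "full",
    },
    "planning": {  # focused: constraints and conventions in full
        "company_context": "full",
        "preferences": "full",
        "learnings": "summary",
        "codebase_knowledge": "summary",
    },
    "coding": {  # conventions-only
        "preferences": "full",
        "codebase_knowledge": "summary",
    },
    "verification": {  # minimal: test patterns only
        "codebase_knowledge": "summary",
    },
    "summary": {},  # no memory injection
}
```

Because the policy is plain data, swapping in an experimental variant for an A/B test is a configuration change rather than a code change.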

Entity isolation for multi-tenant safety

In a multi-tenant environment, governed memory must also enforce entity isolation. Memory from Organization A must never leak into the context window of an agent working on Organization B's task. This sounds obvious, but it is surprisingly easy to get wrong when memory is stored in a shared vector database and retrieved based on semantic similarity.

Memoire enforces entity isolation at the storage level (separate memory partitions per organization), the retrieval level (organization-scoped queries with mandatory filters), and the injection level (the router verifies entity boundaries before approving any memory block for injection). This three-layer approach ensures that even a compromised or buggy retrieval query cannot cross organizational boundaries.
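The retrieval and injection layers of that defense can be sketched as follows. The `store.search` call and its parameters are illustrative of a generic vector-database client, not Memoire's actual API:

```python
def retrieve_memory(store, org_id: str, query_embedding, block_type: str, k: int = 5):
    """Organization-scoped retrieval with a mandatory tenant filter.

    `store` stands in for any vector database client; the partition
    naming and filter syntax are assumptions for illustration.
    """
    results = store.search(
        partition=f"org:{org_id}",                        # storage-level isolation
        filter={"org_id": org_id, "block": block_type},   # retrieval-level scoping
        vector=query_embedding,
        top_k=k,
    )
    # Injection-level check: drop anything that slipped past the filter,
    # so a buggy query still cannot cross organizational boundaries.
    return [r for r in results if r.get("org_id") == org_id]
```

The final list comprehension is deliberately redundant with the filter above it; defense in depth means each layer assumes the others may fail.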

Measured results

We deployed governed memory in production across all Memi workflows and measured the impact across three dimensions: token usage, output quality, and latency.

Token usage decreased by an average of 38 percent across all workflows. The largest savings came from the coding and verification phases, where naive injection was stuffing thousands of tokens of irrelevant context. Research-heavy tasks saw smaller savings (15 to 20 percent) because the research phase intentionally receives broad context.

Output quality, measured by PR acceptance rate and number of review iterations, improved by 12 percent. We attribute this to reduced noise in the context window. When the coding model receives only conventions and relevant patterns, it produces more consistent code that better matches team standards. The focused context window appears to reduce the “distraction effect” where models latch onto irrelevant context and produce tangential output.

Latency decreased by 15 to 25 percent for coding and verification phases, proportional to the reduction in input tokens. Research and planning phases showed minimal latency change because they still receive broad context. The net effect across a full workflow is approximately 18 percent faster end-to-end completion.

Implementation details

The governed memory system is implemented as a middleware layer between Memoire's memory retrieval engine and the procedure engine that orchestrates agent workflows. When the procedure engine advances to a new step, it calls the governed memory middleware with the current phase identifier. The middleware consults the injection policy for that phase, retrieves the appropriate memory blocks at the specified detail level, and returns the assembled context block.
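That flow reduces to a short assembly function. This is a minimal sketch under stated assumptions: `memory_store.get_block` is a hypothetical retrieval call returning a block's text at the requested detail level, and the policy is a phase-to-blocks mapping like the one described above.

```python
def assemble_context(phase: str, memory_store, org_id: str, policy: dict) -> str:
    """Build the memory context for one workflow step.

    `memory_store.get_block(org_id, name, level)` is an assumed
    interface, not Memoire's actual API; it returns the block text
    at "full" or "summary" detail, or an empty string if unavailable.
    """
    parts = []
    for block_name, level in policy.get(phase, {}).items():
        text = memory_store.get_block(org_id, block_name, level)
        if text:
            parts.append(f"## {block_name} ({level})\n{text}")
    return "\n\n".join(parts)
```

A phase with an empty policy, like summary, naturally yields an empty context block, so the executor prompt for that phase contains only the conversation history.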

The injection policies are defined declaratively in a configuration object. This makes it straightforward to experiment with different policies, A/B test injection strategies, and adjust the system as we learn more about optimal context allocation.

For teams running Memoire self-hosted, governed memory is enabled by default but fully configurable. You can define custom phases, adjust the access tiers for each memory block, and even create organization-specific injection policies. The enterprise plan adds policy audit logging, so compliance teams can verify exactly what context was injected for every agent invocation.

What comes next

Governed memory is version one of a larger vision for intelligent context management. We are exploring adaptive injection, where the system learns optimal context allocation from feedback signals like PR acceptance rates and review iteration counts. We are also investigating per-task type policies, where a bug fix receives different context than a new feature, and a refactoring task receives different context than a documentation task.

The broader insight is that more context is not always better context. The art of memory systems for AI agents is not just in what you remember, but in knowing what to surface, when to surface it, and when to stay silent. Governed memory is our first step toward that future.