The pipeline
When a message arrives, the Context Engine executes a nine-step pipeline to build the context that gets injected into the model.
Time-ordered recent window
The engine starts by pulling the most recent messages from STM — roughly the last 30 messages, up to a 7,000-character budget. This anchors the response in the current conversation.
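The recent-window step can be sketched as a backward walk from the newest message that stops once the budget is spent. This is a minimal illustration using the documented defaults (~30 messages, 7,000 characters); the function name and message representation are assumptions, not the engine's actual API.

```python
def recent_window(messages, max_messages=30, char_budget=7000):
    """Select the most recent messages that fit the character budget.

    Walks backward from the newest message so the freshest context
    always survives trimming, then restores chronological order.
    """
    window = []
    used = 0
    for msg in reversed(messages[-max_messages:]):
        if used + len(msg) > char_budget:
            break  # budget exhausted; older messages are dropped
        window.append(msg)
        used += len(msg)
    return list(reversed(window))
```

Walking backward rather than forward guarantees that when the budget runs out, it is the oldest messages that get cut.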
Semantic search + BM25 text search
The engine runs two parallel searches across both STM and LTM: a vector similarity search for semantic relevance and a BM25 keyword search for exact text matches. Both sets of candidates are collected for the next stage.
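The shape of this step can be sketched as two independent searches whose candidate sets are merged and deduplicated. The cosine scorer and the shared-term counter below are stand-ins (the real engine uses proper embeddings and BM25), and the memory schema is assumed for illustration.

```python
import math

def vector_search(query_vec, memories, k=3):
    # rank by cosine similarity against each memory's stored embedding
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    return sorted(memories, key=lambda m: cos(query_vec, m["vec"]), reverse=True)[:k]

def keyword_search(query, memories, k=3):
    # simplified stand-in for BM25: rank by count of shared terms
    terms = set(query.lower().split())
    def score(m):
        return len(terms & set(m["text"].lower().split()))
    ranked = sorted(memories, key=score, reverse=True)
    return [m for m in ranked[:k] if score(m) > 0]

def hybrid_candidates(query, query_vec, memories, k=3):
    # union of both candidate sets, deduplicated by memory id
    seen, out = set(), []
    for m in vector_search(query_vec, memories, k) + keyword_search(query, memories, k):
        if m["id"] not in seen:
            seen.add(m["id"])
            out.append(m)
    return out
```

Running both searches and taking the union is what lets a semantically related memory surface even when it shares no exact words with the query, and vice versa.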
Temporal hybrid retrieval
Date-aware retrieval pulls in memories that are relevant to the time period implied by the query — for example, surfacing memories from “last month” when you ask about something that happened then.
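In outline, this step resolves a relative date phrase into a concrete range and keeps only memories whose timestamps fall inside it. The sketch below handles just two phrases ("yesterday" and "last month"); the real resolver covers far more, and the function and field names are assumptions.

```python
from datetime import date, timedelta

def temporal_filter(memories, today, phrase):
    """Resolve a relative date phrase to a range and return memories
    whose timestamp falls inside it. Unknown phrases leave the
    candidate set unchanged."""
    if phrase == "yesterday":
        start = end = today - timedelta(days=1)
    elif phrase == "last month":
        first_of_this = today.replace(day=1)
        end = first_of_this - timedelta(days=1)   # last day of prior month
        start = end.replace(day=1)                # first day of prior month
    else:
        return memories  # no temporal hint detected
    return [m for m in memories if start <= m["when"] <= end]
```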
Knowledge graph boost
Memories with strong graph connections to the current topic are ranked higher. If a decision memory links to a lesson that links to a rule, and your query is about that topic, all three get a relevance boost. See Knowledge Graph below.
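One way to picture the boost is a bounded traversal from the memories the query already matches: anything reachable within a couple of graph links gets its score raised. The boost amount and hop limit below are illustrative constants, not the engine's real parameters.

```python
from collections import deque

def graph_boost(scores, edges, seeds, boost=0.2, max_hops=2):
    """Raise the score of memories reachable from the seed memories
    within max_hops links. A decision -> lesson -> rule chain (two
    hops) all receives the boost, as in the example above."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    reached = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:                       # breadth-first walk
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append((nxt, depth + 1))
    return {m: s + (boost if m in reached else 0.0) for m, s in scores.items()}
```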
Multi-signal reranker
All candidates from the previous steps are scored across seven signals: vector similarity, memory strength, freshness, access frequency, salience, conversational continuity, and text match score. The reranker produces a unified ranked list.
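A weighted sum over the seven signals is the simplest form such a reranker can take. The equal weights below are purely illustrative (the engine's actual weighting is not documented here), and each signal is assumed to be normalized to [0, 1].

```python
SIGNALS = ("vector_sim", "strength", "freshness", "access_freq",
           "salience", "continuity", "text_match")

# Equal weights for illustration only.
WEIGHTS = {s: 1.0 / len(SIGNALS) for s in SIGNALS}

def rerank(candidates):
    """Score each candidate as a weighted sum of the seven signals
    and return the list in descending score order."""
    def score(c):
        return sum(WEIGHTS[s] * c.get(s, 0.0) for s in SIGNALS)
    return sorted(candidates, key=score, reverse=True)
```

Because every candidate passes through the same scoring function, results from the vector, keyword, temporal, and graph stages end up on a single comparable scale.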
Diversity filter
Near-duplicate results are deduplicated so that a cluster of very similar memories doesn’t crowd out other relevant context. The filter keeps the highest-ranked memory from each cluster.
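The filter can be sketched as a greedy top-down pass: keep a memory only if it is not too similar to anything already kept, so each cluster contributes exactly its highest-ranked member. The similarity function and threshold are pluggable assumptions here.

```python
def diversity_filter(ranked, similarity, threshold=0.9):
    """Walk the ranked list top-down and drop any memory that is too
    similar to one already kept. `similarity` is any pairwise scorer
    returning a value in [0, 1]."""
    kept = []
    for mem in ranked:
        if all(similarity(mem, k) < threshold for k in kept):
            kept.append(mem)
    return kept
```

Because the list arrives already ranked, "first seen wins" is equivalent to "highest-ranked per cluster wins".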
Context budget gating
The ranked list is trimmed to fit within the model’s context window. The engine knows the token budget for different models and fills it precisely — maximizing context without overflowing.
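Budget gating reduces to filling a token allowance in rank order. The whitespace tokenizer below is a stand-in for a model-specific one; skipping an oversized item rather than stopping outright is one plausible way to "fill the budget precisely", not a documented guarantee.

```python
def budget_gate(ranked, token_budget, count_tokens=lambda t: len(t.split())):
    """Admit memories in rank order until the token budget is full,
    skipping any single memory that would overflow it."""
    selected, used = [], 0
    for mem in ranked:
        cost = count_tokens(mem)
        if used + cost <= token_budget:
            selected.append(mem)
            used += cost
    return selected
```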
Inject memories and recent context
The top-ranked memories and the recent conversation window are assembled and injected into the model’s system prompt before the response is generated.
Adaptive recall modes
The Context Engine doesn’t use the same recall strategy for every question. It selects one of six modes based on what the current message appears to need.
General
Broad recall across both STM and LTM. Used for open-ended questions and general conversation where no specific memory type is clearly most relevant.
Decision
Prioritizes decision, lesson, and rule memories. Automatically selected before risky or consequential actions — so your AI can surface the context of past choices before making new ones.
Project
Pulls goals, workflows, dependencies, and active implementation context. Used when the conversation is centered on a specific project or codebase.
People
Focuses on person memories, ownership records, collaborator history, and relationship context. Used when the conversation is about a specific person or team.
Workflow
Surfaces procedural memories, rules, and reusable how-to knowledge. Used when the conversation involves following a process or running a repeatable task.
Conversation
Favors recent conversational continuity and session context. Used when the message is clearly a follow-up within the current session and recency matters most.
Recall mode selection is fully automatic. You don’t set or override it — the Context Engine picks the right mode based on what you’re asking.
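To make the idea concrete, automatic selection could look like a classifier that maps the message to one of the six modes, falling back to General when nothing specific matches. The keyword heuristic below is purely illustrative; the actual selector and its cues are not documented here.

```python
# Illustrative cue phrases per mode — NOT the engine's real classifier.
MODE_HINTS = {
    "decision": ("should we", "decide", "risky"),
    "project": ("repo", "codebase", "milestone"),
    "people": ("who", "team", "owner"),
    "workflow": ("how do i", "steps", "process"),
    "conversation": ("as i said", "earlier you", "follow up"),
}

def pick_mode(message):
    """Return the first mode whose cue phrases appear in the message,
    else fall back to broad General recall."""
    text = message.lower()
    for mode, hints in MODE_HINTS.items():
        if any(h in text for h in hints):
            return mode
    return "general"
```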
Knowledge graph
Memories in LTM are not isolated entries. An async background job connects related memories into a graph after each extraction pass.
- A decision memory links to the lessons that informed it.
- A person memory links to the projects they worked on.
- A rule memory links to the events that created it.
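The link types above can be represented as a minimal typed-edge store. The class and relation names below are illustrative assumptions about shape, not the engine's actual storage model.

```python
from collections import defaultdict

class MemoryGraph:
    """Typed edges between memories: decisions link to the lessons
    that informed them, people to projects, rules to events."""
    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, src, relation, dst):
        self.edges[src].add((relation, dst))

    def neighbors(self, src, relation=None):
        # all linked memories, optionally filtered by relation type
        return sorted(d for r, d in self.edges[src]
                      if relation is None or r == relation)
```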
