The Context Engine is the core of Memory Crystal. It runs automatically before every response your AI generates, searching your short-term memory (STM) and long-term memory (LTM) and injecting the most relevant context into the model, so your AI always responds with full awareness of what you’ve discussed before. You don’t configure the Context Engine. It runs silently on every turn.

The pipeline

When a message arrives, the Context Engine executes a nine-step pipeline to build the context that gets injected into the model.
1. Time-ordered recent window

The engine starts by pulling the most recent messages from STM — roughly the last 30 messages, up to a 7,000-character budget. This anchors the response in the current conversation.
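This step can be sketched as a simple walk backwards through STM that stops at either limit. The function name and message representation are illustrative, not Memory Crystal's API; the limits mirror the defaults stated above.

```python
# Illustrative sketch of the recent-window step (not the real implementation).
# Messages are plain strings here, oldest-first; limits match the documented
# defaults of ~30 messages and a 7,000-character budget.
MAX_MESSAGES = 30
CHAR_BUDGET = 7_000

def recent_window(stm_messages):
    """Take the newest messages, staying under both limits."""
    window = []
    used = 0
    for msg in reversed(stm_messages[-MAX_MESSAGES:]):
        if used + len(msg) > CHAR_BUDGET:
            break
        window.append(msg)
        used += len(msg)
    return list(reversed(window))  # restore chronological order
```

Walking newest-first means the character budget always favors the most recent messages when the two limits conflict.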
2. Semantic search + BM25 text search

The engine runs two parallel searches across both STM and LTM: a vector similarity search for semantic relevance and a BM25 keyword search for exact text matches. Both sets of candidates are collected for the next stage.
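A minimal sketch of this dual retrieval, assuming memories are dicts with an `id`, an embedding `vec`, and raw `text`. The keyword scorer below is a toy term-overlap stand-in for BM25, and all names are hypothetical:

```python
import math

def cosine(a, b):
    """Vector similarity for the semantic arm of the search."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    """Toy stand-in for BM25: fraction of query terms present in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def hybrid_candidates(query, query_vec, memories, k=5):
    """Union of the top-k semantic hits and the top-k keyword hits."""
    by_vec = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)[:k]
    by_text = sorted(memories, key=lambda m: keyword_score(query, m["text"]), reverse=True)[:k]
    seen, out = set(), []
    for m in by_vec + by_text:
        if m["id"] not in seen:
            seen.add(m["id"])
            out.append(m)
    return out
```

Unioning rather than intersecting the two result sets is what lets an exact-phrase match survive even when its embedding is far from the query.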
3. Temporal hybrid retrieval

Date-aware retrieval pulls in memories that are relevant to the time period implied by the query — for example, surfacing memories from “last month” when you ask about something that happened then.
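The idea can be illustrated with a deliberately tiny filter. Real date-aware retrieval handles far richer phrasing; the two patterns and the window boundaries below are placeholder assumptions:

```python
from datetime import datetime, timedelta

def temporal_filter(query, memories, now=None):
    """Sketch only: if the query implies a time window, keep memories whose
    timestamps fall inside it. Memories are dicts with a datetime under "ts"."""
    now = now or datetime.now()
    if "last month" in query.lower():
        start, end = now - timedelta(days=60), now - timedelta(days=30)
    elif "yesterday" in query.lower():
        start, end = now - timedelta(days=2), now - timedelta(days=1)
    else:
        return memories  # no temporal constraint detected
    return [m for m in memories if start <= m["ts"] <= end]
```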
4. Knowledge graph boost

Memories with strong graph connections to the current topic are ranked higher. If a decision memory links to a lesson that links to a rule, and your query is about that topic, all three get a relevance boost. See Knowledge Graph below.
5. Multi-signal reranker

All candidates from the previous steps are scored across seven signals: vector similarity, memory strength, freshness, access frequency, salience, conversational continuity, and text match score. The reranker produces a unified ranked list.
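A weighted-sum reranker over those seven signals might look like the sketch below. The signal names come from this page; the weights are illustrative, not the engine's real values:

```python
# Hypothetical weights for the seven documented signals (normalized to 0..1).
WEIGHTS = {
    "vector_similarity": 0.30,
    "strength": 0.15,
    "freshness": 0.15,
    "access_frequency": 0.10,
    "salience": 0.10,
    "continuity": 0.10,
    "text_match": 0.10,
}

def rerank(candidates):
    """Score each candidate as a weighted sum of its signals, missing = 0."""
    def score(c):
        return sum(WEIGHTS[name] * c["signals"].get(name, 0.0) for name in WEIGHTS)
    return sorted(candidates, key=score, reverse=True)
```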
6. Diversity filter

Near-duplicate results are deduplicated so that a cluster of very similar memories doesn’t crowd out other relevant context. The filter keeps the highest-ranked memory from each cluster.
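Because the list is already ranked, a greedy pass naturally keeps the best member of each near-duplicate cluster. A sketch, with the similarity function and threshold left as assumptions:

```python
def diversity_filter(ranked, similarity, threshold=0.9):
    """Greedy near-duplicate filter: walk the ranked list in order and drop
    any item too similar to one already kept. `similarity` is any pairwise
    function returning a value in 0..1; the threshold is illustrative."""
    kept = []
    for item in ranked:
        if all(similarity(item, k) < threshold for k in kept):
            kept.append(item)
    return kept
```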
7. Context budget gating

The ranked list is trimmed to fit within the model’s context window. The engine knows the token budget for different models and fills it precisely — maximizing context without overflowing.
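One way to fill a budget from a ranked list is a greedy gate that skips anything that would overflow but keeps scanning for smaller items that still fit. The whitespace token counter below is only a fallback stand-in for a real tokenizer:

```python
def fit_to_budget(ranked_memories, token_budget, count_tokens=None):
    """Greedy budget gate over a ranked list of memory strings (sketch)."""
    count_tokens = count_tokens or (lambda text: len(text.split()))  # crude fallback
    kept, used = [], 0
    for mem in ranked_memories:
        cost = count_tokens(mem)
        if used + cost > token_budget:
            continue  # skip; a smaller, lower-ranked memory may still fit
        kept.append(mem)
        used += cost
    return kept
```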
8. Inject memories and recent context

The top-ranked memories and the recent conversation window are assembled and injected into the model’s system prompt before the response is generated.
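Conceptually, the assembly is just concatenating labeled sections ahead of the model call. The actual prompt format is internal to Memory Crystal; this plain-text version is purely illustrative:

```python
def build_system_prompt(base_prompt, memories, recent_window):
    """Hypothetical assembly of the injected context (not the real format)."""
    sections = [base_prompt]
    if memories:
        sections.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in memories))
    if recent_window:
        sections.append("Recent conversation:\n" + "\n".join(recent_window))
    return "\n\n".join(sections)
```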
9. Reinforcement injection

For long sessions (5+ turns), key memories are periodically re-injected to prevent them from falling out of the model’s active attention as the conversation grows.
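The trigger condition could be as simple as a periodic check once the session passes the threshold. The docs only state "5+ turns", so the cadence below is an assumption:

```python
def should_reinforce(turn_number, min_turns=5, every=5):
    """Sketch: re-inject key memories every `every` turns once a session
    reaches `min_turns`. Both thresholds are illustrative."""
    return turn_number >= min_turns and turn_number % every == 0
```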

Adaptive recall modes

The Context Engine doesn’t use the same recall strategy for every question. It selects one of six modes based on what the current message appears to need.
  • Broad recall across both STM and LTM. Used for open-ended questions and general conversation where no specific memory type is clearly most relevant.
  • Prioritizes decision, lesson, and rule memories. Automatically selected before risky or consequential actions — so your AI can surface the context of past choices before making new ones.
  • Pulls goals, workflows, dependencies, and active implementation context. Used when the conversation is centered on a specific project or codebase.
  • Focuses on person memories, ownership records, collaborator history, and relationship context. Used when the conversation is about a specific person or team.
  • Surfaces procedural memories, rules, and reusable how-to knowledge. Used when the conversation involves following a process or running a repeatable task.
  • Favors recent conversational continuity and session context. Used when the message is clearly a follow-up within the current session and recency matters most.
Recall mode selection is fully automatic. You don’t set or override it — the Context Engine picks the right mode based on what you’re asking.
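To make the idea concrete, a keyword heuristic can route a message to one of six buckets. This is illustrative only: the mode labels below are placeholders (the engine's internal names aren't documented here), and the real selector is far more sophisticated than substring matching:

```python
def pick_recall_mode(message):
    """Toy mode selector; labels and trigger words are placeholder assumptions."""
    text = message.lower()
    if any(w in text for w in ("should we", "decide", "risky")):
        return "decision"
    if any(w in text for w in ("project", "codebase", "repo")):
        return "project"
    if any(w in text for w in ("who owns", "team", "person")):
        return "people"
    if any(w in text for w in ("how do i", "steps", "process")):
        return "procedure"
    if any(w in text for w in ("as i said", "continue", "follow up")):
        return "session"
    return "general"
```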

Knowledge graph

Memories in LTM are not isolated entries. An async background job connects related memories into a graph after each extraction pass.
  • A decision memory links to the lessons that informed it.
  • A person memory links to the projects they worked on.
  • A rule memory links to the events that created it.
When the Context Engine searches, it uses these graph connections as a relevance signal. A memory that is closely connected to the current topic — even if its text doesn’t directly match the query — gets ranked higher because of its graph proximity. This means your AI doesn’t just retrieve isolated facts. It understands relationships between the things it knows, and surfaces context that a keyword search alone would miss.
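The decision → lesson → rule example above suggests how graph proximity can become a numeric signal: a breadth-first walk from the memories matching the query, with a boost that decays per hop. The traversal depth and decay factor here are illustrative assumptions:

```python
from collections import deque

def graph_proximity_boost(graph, seed_ids, max_hops=2, decay=0.5):
    """Sketch of a graph-proximity signal. `graph` maps a memory id to the
    ids it links to; seeds are the memories that matched the query directly.
    Neighbors within `max_hops` get a boost that halves per hop."""
    boost = {s: 1.0 for s in seed_ids}
    frontier = deque((s, 0) for s in seed_ids)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neigh in graph.get(node, []):
            if neigh not in boost:
                boost[neigh] = decay ** (hops + 1)
                frontier.append((neigh, hops + 1))
    return boost
```

Under this sketch, a rule two hops from a matched decision still receives a boost even if its own text never matches the query, which is exactly the behavior described above.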
You can explore these relationships directly using crystal_explain_connection to understand how two concepts relate, or crystal_dependency_chain to trace chains between entities.