Memory Crystal implements tier-based rate limiting to ensure fair usage and protect infrastructure. Limits are per-user and per-minute.

Tier Limits

Rate limits are enforced per user, per minute, based on their subscription tier.
Tier        Requests/Minute   Concurrent Requests   Embedding Calls/Day
Free        100               2                     50
Starter     250               5                     200
Pro         500               10                    500
Ultra       2,000             50                    5,000
Unlimited   Unlimited         Unlimited             Unlimited

Per-Endpoint Budgets

Different endpoints have different costs based on computational complexity:

Low Cost (1 credit)

  • GET /api/mcp/stats — no search
  • GET /api/knowledge-bases — list only
  • POST /api/mcp/recent-messages — retrieval only
  • POST /api/mcp/checkpoint — write only
  • POST /api/mcp/forget — delete only
  • POST /api/mcp/trace — lookup only

Medium Cost (5 credits)

  • POST /api/mcp/search-messages — lexical search
  • POST /api/mcp/capture — write + embedding
  • POST /api/knowledge-bases — create KB
  • POST /api/knowledge-bases/:id/import — batch import (cost scales with chunk count)

High Cost (10 credits)

  • POST /api/mcp/recall — vector search (embedding query)
  • POST /api/knowledge-bases/:id/query — KB vector search
  • POST /api/mcp/edit — update + re-embedding
What does this mean? A Free tier user (100 req/min) can make:
  • 100 calls to low-cost endpoints per minute
  • 20 calls to medium-cost endpoints per minute
  • 10 calls to high-cost endpoints per minute
Costs are counted against a shared pool, so a mix of calls reduces overall throughput.
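The shared-pool arithmetic can be sketched in a few lines. This is a rough client-side model only (it assumes one unit of the tier's requests/minute limit corresponds to one credit, as the Free-tier example above implies):

```javascript
// Rough credit accounting against a shared per-minute pool
// (assumes 1 req/min unit == 1 credit)
const CREDIT_COSTS = { low: 1, medium: 5, high: 10 };

function remainingCredits(tierLimit, calls) {
  const spent = calls.reduce((sum, call) => sum + CREDIT_COSTS[call.cost], 0);
  return Math.max(tierLimit - spent, 0);
}

// Free tier: 5 high-cost recalls (50) + 4 medium-cost captures (20)
const calls = [
  ...Array(5).fill({ cost: 'high' }),
  ...Array(4).fill({ cost: 'medium' }),
];
console.log(remainingCredits(100, calls)); // → 30
```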

Rate Limit Headers

Every response includes rate limit information:
X-RateLimit-Limit: 100          # Max requests/minute for this user's tier
X-RateLimit-Remaining: 87       # Requests remaining in current window
X-RateLimit-Reset: 1681234567   # Unix timestamp when window resets
Retry-After: 45                 # Seconds to wait before retry (if rate-limited)
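In JavaScript, these headers can be read straight off a fetch response. A minimal sketch (field names match the headers above):

```javascript
// Parse rate-limit headers from a fetch Response's Headers object
function parseRateLimit(headers) {
  return {
    limit: Number(headers.get('X-RateLimit-Limit')),
    remaining: Number(headers.get('X-RateLimit-Remaining')),
    // X-RateLimit-Reset is a Unix timestamp in seconds
    resetAt: new Date(Number(headers.get('X-RateLimit-Reset')) * 1000),
  };
}

// Usage:
// const response = await fetch(url, options);
// const { remaining, resetAt } = parseRateLimit(response.headers);
```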

Handling Rate Limits

Response: 429 Too Many Requests

When you exceed the limit:
{
  "error": "Rate limit exceeded",
  "retryAfterMs": 45000
}

Retry Strategy

Implement exponential backoff:
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After');
      const delay = parseInt(retryAfter || '60') * 1000;
      
      console.log(`Rate limited. Retrying after ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
      continue;
    }
    
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`);
    }
    
    return response.json();
  }
  
  throw new Error('Max retries exceeded');
}

// Usage
const result = await fetchWithRetry(
  'https://your-deployment/api/mcp/recall',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ query: 'authentication', limit: 5 })
  }
);

Python Example

import time
import requests
from backoff import expo, on_exception

@on_exception(expo, requests.exceptions.HTTPError, max_tries=5)
def fetch_memory(query):
    response = requests.post(
        'https://your-deployment/api/mcp/recall',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={'query': query, 'limit': 5},
        timeout=10
    )
    
    if response.status_code == 429:
        retry_after = int(response.headers.get('Retry-After', 60))
        print(f'Rate limited. Waiting {retry_after}s...')
        time.sleep(retry_after)
        raise requests.exceptions.HTTPError(response=response)
    
    response.raise_for_status()
    return response.json()

result = fetch_memory('authentication strategy')

Optimization Strategies

1. Batch Requests

Instead of issuing 10 recall requests one at a time, run them in parallel. Note this reduces latency, not credit usage — each call still spends credits, so stay within your tier's concurrent-request limit:
// ❌ Bad: 10 sequential calls, each waiting on the previous one
for (const query of queries) {
  const results = await fetchRecall(query);
}

// ✅ Better: fire the calls in parallel (same credit cost, far less
// wall-clock time; respect your tier's concurrency limit)
const allResults = await Promise.all(
  queries.map(q => fetchRecall(q))
);

2. Use Low-Cost Endpoints

Prefer crystal_recent over crystal_recall when recent context is sufficient:
// ❌ High cost
const memories = await recall({ query: 'recent context', limit: 10 });

// ✅ Low cost
const recent = await recentMessages({ limit: 20 });

3. Cache Results Locally

Don’t repeat the same query within a minute:
const cache = new Map();
const TTL_MS = 60_000; // match the one-minute rate window

async function cachedRecall(query) {
  const entry = cache.get(query);
  if (entry && Date.now() - entry.cachedAt < TTL_MS) {
    console.log('Using cached result');
    return entry.result;
  }

  const result = await recall({ query, limit: 5 });
  cache.set(query, { result, cachedAt: Date.now() });
  return result;
}

4. Increase Tier

If you consistently hit limits:
Scenario                          Recommendation
3+ API calls per user session     Upgrade to Pro
10+ embedding operations daily    Upgrade to Pro/Ultra
Batch imports > 1000 chunks       Use Ultra tier
Building a production app         Use Pro at minimum

Monitoring Rate Limit Usage

Check Current Usage

curl https://your-deployment/api/mcp/stats \
  -H "Authorization: Bearer $API_KEY" | jq '.usage'

Parse Headers After Each Request

curl -i https://your-deployment/api/mcp/recall \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"query": "test", "limit": 1}' | grep X-RateLimit
Output:
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 487
X-RateLimit-Reset: 1681234567

Set up Alerts

Monitor X-RateLimit-Remaining in your logs. Alert when it falls below 20% of limit:
function checkRateLimitHealth(remaining, limit) {
  const threshold = limit * 0.2;
  if (remaining < threshold) {
    console.warn(`⚠️ Approaching rate limit: ${remaining}/${limit} remaining`);
    // Send alert to Slack, PagerDuty, etc.
  }
}

Burst Handling

Memory Crystal allows brief bursts above the per-minute average:
  • Soft limit (90% of tier): Warning only
  • Hard limit (100% of tier): Rate-limited (429)
This means you can have short spikes without penalty, but sustained high usage will hit the limit.
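The soft/hard thresholds above can be mirrored client-side to warn before a 429 arrives. A sketch only — the server enforces the real limits:

```javascript
// Classify current usage against the soft (90%) and hard (100%) limits
// described above (illustrative; actual server behavior may differ)
function classifyUsage(used, tierLimit) {
  if (used >= tierLimit) return 'rate-limited'; // hard limit: expect 429
  if (used >= tierLimit * 0.9) return 'warning'; // soft limit: warning only
  return 'ok';
}

console.log(classifyUsage(85, 100));  // → ok
console.log(classifyUsage(92, 100));  // → warning
console.log(classifyUsage(100, 100)); // → rate-limited
```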

Special Cases

Shared API Keys

If multiple users or services share one API key:
  1. All requests count against the same tier limit
  2. One heavy user can starve others
  3. Solution: Use separate API keys per user/service
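If separate keys aren't feasible, a client-side throttle can at least give each service a fair share of the key's budget. A minimal token-bucket sketch (a hypothetical helper, not part of the Memory Crystal API — it smooths your traffic but does not replace server-side limits):

```javascript
// Simple token bucket: each service gets its own bucket so one heavy
// consumer can't drain the shared key's entire per-minute budget
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  tryTake(cost = 1) {
    // Refill proportionally to elapsed time, capped at capacity
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefill = now;

    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// e.g. give a background job 30 credits/min of a Free-tier key
const bucket = new TokenBucket(30, 30 / 60);
if (bucket.tryTake(10)) {
  // safe to issue one high-cost (10-credit) call
}
```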

Batch Imports

Large imports are rate-limited per-chunk:
// Chunking strategy for large imports
// (importChunks is a placeholder for your API client's import call)
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
const chunkSize = 100;

for (let i = 0; i < allChunks.length; i += chunkSize) {
  const batch = allChunks.slice(i, i + chunkSize);
  await importChunks({ chunks: batch });

  // Wait between batches to avoid hitting the limit
  if (i + chunkSize < allChunks.length) {
    await sleep(1000);
  }
}

Contact Support

If you need:
  • Higher rate limits for legitimate use
  • Whitelist/priority handling for batch jobs
  • Custom SLA agreements
Reach out to support@memorycrystal.com with:
  • Your tier
  • Current usage pattern
  • Intended use case
  • Requested limits