Tier Limits
Rate limits are enforced per user, per minute, based on their subscription tier.

| Tier | Requests/Minute | Concurrent Requests | Embedding Calls/Day |
|---|---|---|---|
| Free | 100 | 2 | 50 |
| Starter | 250 | 5 | 200 |
| Pro | 500 | 10 | 500 |
| Ultra | 2,000 | 50 | 5,000 |
| Unlimited | Unlimited | Unlimited | Unlimited |
Per-Endpoint Budgets
Different endpoints have different costs based on computational complexity:
Low Cost (1 credit)
- GET /api/mcp/stats: no search
- GET /api/knowledge-bases: list only
- POST /api/mcp/recent-messages: retrieval only
- POST /api/mcp/checkpoint: write only
- POST /api/mcp/forget: delete only
- POST /api/mcp/trace: lookup only
Medium Cost (5 credits)
- POST /api/mcp/search-messages: lexical search
- POST /api/mcp/capture: write + embedding
- POST /api/knowledge-bases: create KB
- POST /api/knowledge-bases/:id/import: batch import (cost scales with chunk count)
High Cost (10 credits)
- POST /api/mcp/recall: vector search (embedding query)
- POST /api/knowledge-bases/:id/query: KB vector search
- POST /api/mcp/edit: update + re-embedding
At these costs, a budget of 100 credits per minute covers any one of the following:
- 100 calls to low-cost endpoints per minute
- 20 calls to medium-cost endpoints per minute
- 10 calls to high-cost endpoints per minute
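The credit arithmetic behind those three figures can be sketched in Python. The per-call costs are the ones listed in this section; the helper names and the 100-credit budget are illustrative assumptions, not part of the API.

```python
# Credit cost per call for each endpoint class, as listed in this section.
COST_PER_CALL = {"low": 1, "medium": 5, "high": 10}


def calls_within_budget(budget_credits: int, cost_class: str) -> int:
    """Return how many calls of a single cost class fit in a credit budget."""
    return budget_credits // COST_PER_CALL[cost_class]


def total_cost(calls: dict[str, int]) -> int:
    """Return the credits consumed by a mix of calls, keyed by cost class."""
    return sum(COST_PER_CALL[cost_class] * count for cost_class, count in calls.items())


# Hypothetical budget of 100 credits per minute (an assumption for illustration).
for cost_class in COST_PER_CALL:
    print(cost_class, calls_within_budget(100, cost_class))
```

A mixed workload draws on the same budget: `total_cost({"low": 10, "medium": 4, "high": 2})` comes to 50 credits under these costs.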
Rate Limit Headers
Every response includes rate limit information in its headers, including X-RateLimit-Remaining.
Handling Rate Limits
Response: 429 Too Many Requests
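When you exceed the limit, the API rejects the request with this status.
Retry Strategy
Implement exponential backoff: wait after each 429 response and double the wait before every further retry.
Python Example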
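The following is a minimal sketch of exponential backoff, not an official client. It assumes only that a rate-limited response carries status code 429, as stated above. The `call_with_backoff` helper, its parameters, and the `send` callable are hypothetical names introduced for illustration.

```python
import random
import time
from typing import Any, Callable


def call_with_backoff(
    send: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() and retry with exponential backoff while it returns 429.

    send must return an object with a status_code attribute. The last
    response is returned even if it is still a 429 after all retries.
    """
    response = send()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Double the delay on every retry, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 10% random jitter so clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
        response = send()
    return response
```

With the `requests` library, `send` could be a closure such as `lambda: requests.post(url, json=payload, headers=headers)`, where `url`, `payload`, and `headers` are your own values. The `sleep` parameter exists so the delay can be replaced in tests.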
Optimization Strategies
1. Batch Requests
Instead of sending 10 individual recall requests, batch them into one.
2. Use Low-Cost Endpoints
Prefer crystal_recent over crystal_recall when recent context is sufficient.
3. Cache Results Locally
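One way to avoid re-sending an identical query inside the same minute is a small in-memory cache with a time-to-live. The sketch below is an assumption about how a client might do this. `TTLCache`, `get_or_fetch`, and the 60-second default are hypothetical and not part of any Memory Crystal SDK.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Keep results for a fixed number of seconds, keyed by query."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only if it expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

Here `fetch` would wrap the actual API call, so a repeated query within the TTL costs no credits. Expired entries are overwritten on the next fetch but never evicted, which is acceptable for a short-lived client and worth revisiting for a long-running service.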
Don’t repeat the same query within a minute; reuse the earlier result instead.
4. Increase Tier
If you consistently hit limits:

| Scenario | Recommendation |
|---|---|
| 3+ API calls per user session | Upgrade to Pro |
| 10+ embedding operations daily | Upgrade to Pro/Ultra |
| Batch imports > 1000 chunks | Use Ultra tier |
| Building a production app | Use Pro at minimum |
Monitoring Rate Limit Usage
Check Current Usage
Parse Headers After Each Request
Set Up Alerts
Monitor X-RateLimit-Remaining in your logs. Alert when it falls below 20% of your limit.
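A sketch of the 20% check, assuming the response also exposes the total allowance in a header. X-RateLimit-Remaining is named in this document; X-RateLimit-Limit is an assumed header name based on common convention and should be verified against real responses. The function names are hypothetical.

```python
from typing import Mapping, Optional


def remaining_fraction(
    headers: Mapping[str, str],
    limit_header: str = "X-RateLimit-Limit",  # assumed name, not confirmed
    remaining_header: str = "X-RateLimit-Remaining",
) -> Optional[float]:
    """Return remaining/limit, or None if the headers are missing or invalid."""
    try:
        limit = int(headers[limit_header])
        remaining = int(headers[remaining_header])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    return remaining / limit


def should_alert(headers: Mapping[str, str], threshold: float = 0.20) -> bool:
    """Return True when the remaining allowance is below the threshold."""
    fraction = remaining_fraction(headers)
    return fraction is not None and fraction < threshold
```

Lookups on a plain `dict` are case-sensitive. The `headers` object returned by the `requests` library is case-insensitive, so it can be passed in directly.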
Burst Handling
Memory Crystal allows brief bursts above the per-minute average:
- Soft limit (90% of tier): Warning only
- Hard limit (100% of tier): Rate-limited (429)
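The two thresholds can be expressed as a small client-side check. This is a sketch under assumptions: the function name and return labels are hypothetical, and treating usage exactly at a threshold as having crossed it is a guess about boundary behavior that the server may handle differently.

```python
def classify_usage(used: int, tier_limit: int) -> str:
    """Classify per-minute usage against the soft (90%) and hard (100%) limits."""
    if used >= tier_limit:
        return "rate_limited"  # hard limit: expect 429 responses
    # Integer arithmetic avoids floating-point rounding at the 90% boundary.
    if used * 10 >= tier_limit * 9:
        return "warning"  # soft limit: warning only
    return "ok"
```

For a tier limit of 500 requests per minute, this reports `"warning"` from 450 requests onward and `"rate_limited"` from 500.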
Special Cases
Shared API Keys
If multiple users or services share one API key:
- All requests count against the same tier limit
- One heavy user can starve others
- Solution: Use separate API keys per user/service
Batch Imports
Large imports are rate-limited per chunk.
Contact Support
If you need any of the following:
- Higher rate limits for legitimate use
- Whitelist/priority handling for batch jobs
- Custom SLA agreements

contact support and include:
- Your tier
- Current usage pattern
- Intended use case
- Requested limits
