The economics
Most of RAG's bill is frontier tokens — your context pushed through a premium model on every question. Engram Smart CAG moves that into a one-time training step, then serves from a compact cartridge on open-model GPUs you run.
Savings calculator
Drag to your monthly volume. Serving cost only — one-time training repays in the first few thousand queries. Modeled vs a frontier-model RAG stack.
~67× less than frontier-model RAG, and flat as your corpus grows.
How it stacks up
RAG, long-context, and fine-tuning each trade away cost, freshness, grounding, or scale.
| Engram Smart CAG | RAG | Long-context | Fine-tuning | |
|---|---|---|---|---|
| Per-query cost | Very low, flat | High (per-token) | Very high | Low |
| Cost as corpus grows | Flat | Grows | Grows fast / caps out | Flat |
| Latency (time-to-first-token) | Flat (read-once) | Grows with context | Grows / caps out | Flat |
| Grounded & sourced | Yes | Yes | Yes | No |
| Fresh on updates | Retrain affected shards | Instant re-index | Instant | Full retrain |
| Scales to huge corpora | Yes (sharded) | Yes | No (window limit) | Limited |
| Private on your cloud | Yes | Depends | Depends | Yes |
RAG stays a strong baseline — Engram Smart CAG's edge is flat cost and latency at scale, with grounding measured at RAG parity. Figures are modeled, self-hosted open 30B vs a frontier-model RAG stack.