The economics

Every query is the cheap part.

Most of RAG's bill is frontier tokens — your context pushed through a premium model on every question. Engram Smart CAG moves that into a one-time training step, then serves from a compact cartridge on open-model GPUs you run.

  • ~$0.0003 per query vs ~$0.02 for frontier-model RAG
  • Flat cost and latency as your corpus grows
  • One-time training repays in the first few thousand queries
Estimate your savings ↓

Per-query cost

@ 100k queries / month
Engram Smart CAG$0.0003
Frontier model + prompt-caching$0.0060
RAG → frontier model$0.0200
~67×
vs frontier RAG
~20×
vs prompt-caching
Flat
cost & latency at scale

Savings calculator

What you'd pay per query.

Drag to your monthly volume. Serving cost only — one-time training repays in the first few thousand queries. Modeled vs a frontier-model RAG stack.

100,000
1K10K100K1M10M
Engram Smart CAG / mo
$30
Frontier RAG / mo
$2,000
You save / yr
$23,640

~67× less than frontier-model RAG, and flat as your corpus grows.

How it stacks up

Engram Smart CAG vs. the alternatives.

RAG, long-context, and fine-tuning each trade away cost, freshness, grounding, or scale.

Engram Smart CAG RAG Long-context Fine-tuning
Per-query costVery low, flatHigh (per-token)Very highLow
Cost as corpus growsFlatGrowsGrows fast / caps outFlat
Latency (time-to-first-token)Flat (read-once)Grows with contextGrows / caps outFlat
Grounded & sourcedYesYesYesNo
Fresh on updatesRetrain affected shardsInstant re-indexInstantFull retrain
Scales to huge corporaYes (sharded)YesNo (window limit)Limited
Private on your cloudYesDependsDependsYes

RAG stays a strong baseline — Engram Smart CAG's edge is flat cost and latency at scale, with grounding measured at RAG parity. Figures are modeled, self-hosted open 30B vs a frontier-model RAG stack.