The economics

Every query is the cheap part.

Most of RAG's bill is frontier tokens — your context pushed through a premium model on every question. Engram Smart CAG moves that into a one-time training step, then serves from a compact cartridge on open-model GPUs you run.

✓~$0.0003 per query vs ~$0.02 for frontier-model RAG
✓Flat cost and latency as your corpus grows
✓One-time training repays in the first few thousand queries

Estimate your savings ↓

Per-query cost

@ 100k queries / month

Engram Smart CAG$0.0003

Frontier model + prompt-caching$0.0060

RAG → frontier model$0.0200

~67×

vs frontier RAG

~20×

vs prompt-caching

Flat

cost & latency at scale

Savings calculator

What you'd pay per query.

Drag to your monthly volume. Serving cost only — one-time training repays in the first few thousand queries. Modeled vs a frontier-model RAG stack.

Queries per month 100,000

1K10K100K1M10M

Engram Smart CAG / mo

$30

Frontier RAG / mo

$2,000

You save / yr

$23,640

~67× less than frontier-model RAG, and flat as your corpus grows.

How it stacks up

Engram Smart CAG vs. the alternatives.

RAG, long-context, and fine-tuning each trade away cost, freshness, grounding, or scale.

	Engram Smart CAG	RAG	Long-context	Fine-tuning
Per-query cost	Very low, flat	High (per-token)	Very high	Low
Cost as corpus grows	Flat	Grows	Grows fast / caps out	Flat
Latency (time-to-first-token)	Flat (read-once)	Grows with context	Grows / caps out	Flat
Grounded & sourced	Yes	Yes	Yes	No
Fresh on updates	Retrain affected shards	Instant re-index	Instant	Full retrain
Scales to huge corpora	Yes (sharded)	Yes	No (window limit)	Limited
Private on your cloud	Yes	Depends	Depends	Yes

RAG stays a strong baseline — Engram Smart CAG's edge is flat cost and latency at scale, with grounding measured at RAG parity. Figures are modeled, self-hosted open 30B vs a frontier-model RAG stack.

Book a demo