How it works

From your sources to grounded answers in four steps.

Connect a source, train it once into cartridges, then answer every question cheaply — grounded in your own documents.

The problem

RAG re-reads your documents on every query.

It stuffs thousands of context tokens into a frontier model for every question — so you pay full price, again and again, for knowledge you already own.

01

Pay-per-read forever

Every query re-sends ~6k tokens to a premium model. Costs track traffic and never amortize.

02

Context windows cap out

Big corpora overflow the window. You chunk, lose context, and answers get shallow.

03

Your data, someone else's model

Sensitive documents stream to third-party APIs on every call. Hard to govern and audit.

The approach

Train the knowledge once. Serve it for pennies.

Engram Smart CAG compresses your documents into a trainable KV-cache cartridge — the knowledge, baked into an open model's attention. Read once, infer many.

$

Read-once economics

A one-time training cost, then a flat per-query price far below frontier RAG — and a flat time-to-first-token, both independent of corpus size.

Grounded, not generic

Answers come from your documents with source attribution — measured at RAG parity on single-document QA.

Scales past the window

We shard large corpora into many cartridges and retrieve the right ones — 50,000 docs answer like 50.

The pipeline

Four steps, one command.

1

Connect

Point Engram Smart CAG at SharePoint, Confluence, Drive, S3, or an upload. Incremental sync keeps it fresh.

2

Train

We generate self-study Q&A and distil your documents into cartridges. A one-time cost.

3

Retrieve

Semantic search picks the right cartridges per question — by meaning, not keywords.

4

Answer

The open model answers from the cartridge — grounded, sourced, cheap to serve at scale.

Capabilities

Built for governed enterprise knowledge.

Private deployment

Runs in your cloud with zero public ingress. Documents never leave your account.

Semantic retrieval

Dense embeddings find the right cartridges by meaning — robust to paraphrase.

Incremental updates

When documents change, only affected shards retrain — minutes, not a full rebuild.

Cost dashboard

Measured training cost, per-query cost, and break-even vs RAG for every corpus.

Tenant isolation

Every query is scoped to your own cartridges. Nothing crosses tenants.

Open models, standard stack

Open-weight LLMs you run, served on the standard open-model serving stack (vLLM) — no per-token lock-in to a frontier vendor.

🔒

Your knowledge stays yours.

Engram Smart CAG deploys inside your own cloud account, with no public endpoint. Your documents never train a shared model and never leave your perimeter.

Book a demo