Performance8 min read

Cutting Token Costs by 90% with Smart Recall

Marco BelliniPerformance LeadApril 29, 2026

We ran a benchmark comparing two strategies for giving an agent long-term context: full-history injection versus selective memory recall. The difference in cost and quality was dramatic.

In the full-history approach, every request includes the entire prior conversation. By turn 50, the average prompt was over 40,000 tokens. With selective recall, Libra retrieves only the top relevant memories, keeping the injected context under 2,000 tokens regardless of conversation length.

Across a 100-conversation test set, selective recall reduced total token consumption by 91% while improving answer relevance, because the model was not distracted by irrelevant earlier turns.

The key insight is that relevance, not recency, should drive what enters the context window. Libra ranks memories by semantic similarity to the current query, so a fact from three weeks ago can outrank yesterday's small talk if it is more pertinent.

If you are paying per token and your agents hold long conversations, selective recall is one of the highest-leverage optimizations available.

Give your AI a memory and personality.

Instant memory for LLMs — better, cheaper, personal.

Get Started Pricing