Chapter 11

Advanced Techniques

Production-grade optimizations for context management—KV-cache, dynamic tools, file-backed memory, recitation, and variation.

Beyond the WSCI framework, production systems employ specialized techniques that dramatically improve performance, reduce costs, and enhance reliability. These methods are proven in large-scale deployments.

💾
KV-Cache Optimization (~10x Cost Reduction)

Transformer LLMs cache the attention keys and values computed for each prompt token. When consecutive calls share a prompt prefix, the provider can reuse this KV-cache instead of recomputing it, making inference faster and cheaper. Keeping the prompt prefix stable and the context append-only maximizes cache hits.

"Cache hits are approximately 10x cheaper than recomputing tokens—a low-level but high-impact optimization."
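A minimal sketch of why prompt layout matters for cache hits. The token lists and the timestamp marker below are illustrative, not a real tokenizer's output: any change near the start of the prompt, such as an ever-changing timestamp, invalidates the cached prefix from that point onward, while a static system prompt with append-only history lets the entire previous turn be reused.

```python
def common_prefix_tokens(a, b):
    """Length of the shared prefix between two token lists (the cacheable portion)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Bad: a timestamp at the top of the prompt changes every call,
# so the cached prefix is invalidated starting at token 0.
bad_turn1 = ["[ts=1001]", "SYSTEM", "TOOLS", "user: hi"]
bad_turn2 = ["[ts=1002]", "SYSTEM", "TOOLS", "user: hi", "assistant: hello"]

# Good: static system prompt first, append-only history; the whole
# previous prompt is a cache hit on the next turn.
good_turn1 = ["SYSTEM", "TOOLS", "user: hi"]
good_turn2 = ["SYSTEM", "TOOLS", "user: hi", "assistant: hello"]

print(common_prefix_tokens(bad_turn1, bad_turn2))    # → 0 cacheable tokens
print(common_prefix_tokens(good_turn1, good_turn2))  # → 3 cacheable tokens
```

The same logic explains why deterministic serialization matters: even a reordered JSON key in the system prompt breaks every downstream cached token.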
🔧
Dynamic Tool Management

Large tool lists waste tokens and confuse models. Dynamic management controls which tools are callable at each step by masking availability rather than editing the tool definitions themselves, so the prompt prefix stays byte-identical and cache-friendly.

"Dynamic tool management makes agents smarter and lighter, reducing confusion while preserving performance."
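A minimal sketch of availability masking. The tool names, states, and naming convention below are hypothetical: the full definition list stays fixed in the prompt (preserving the KV-cache), and a per-state filter decides which tools the agent may actually invoke.

```python
# Tool definitions stay in the prompt in a fixed order (cache-friendly);
# per-step availability is a mask computed from agent state, not an edit
# to the definition list. Names and states here are illustrative.
ALL_TOOLS = ["browser_open", "browser_click", "shell_exec", "file_write"]

def allowed_tools(state):
    """Return the subset of tools callable in the current agent state."""
    if state == "browsing":
        # Consistent name prefixes make whole tool families easy to mask.
        return [t for t in ALL_TOOLS if t.startswith("browser_")]
    if state == "editing":
        return [t for t in ALL_TOOLS if t in ("shell_exec", "file_write")]
    return ALL_TOOLS  # default: everything available

print(allowed_tools("browsing"))  # → ['browser_open', 'browser_click']
```

In production this mask is typically enforced at decode time (constraining which tool names the model can emit), but the state-machine idea is the same.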
📁
File System as Extended Memory

Agents offload large or persistent data to external files instead of overloading the context window. The prompt carries only references.

"By using files as external memory, agents can handle limitless data while staying within context limits."
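A minimal sketch of file-backed memory, with hypothetical helper names. Large content is written to disk and the context keeps only a short, restorable reference; because the file path survives, the offload is lossless compression of the prompt.

```python
import os
import tempfile

def offload(context, key, large_text, directory):
    """Write large content to a file; keep only a short pointer in context."""
    path = os.path.join(directory, f"{key}.txt")
    with open(path, "w") as f:
        f.write(large_text)
    context.append(f"[{key} saved to {path}]")  # reference, not content
    return path

def restore(path):
    """Re-read offloaded content on demand (e.g. when a later step needs it)."""
    with open(path) as f:
        return f.read()

with tempfile.TemporaryDirectory() as d:
    ctx = []
    page = "x" * 100_000  # e.g. a scraped web page too big for the prompt
    path = offload(ctx, "page_1", page, d)
    assert restore(path) == page      # content is fully recoverable
    print(len(ctx[0]) < 200)          # → True: prompt carries only a pointer
```

The key property is restorability: the agent can always re-open the file, so trimming it from the context loses nothing.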
🎯
Maintaining Focus via Recitation

Agents often lose track of goals in long tasks ("lost in the middle" problem). Recitation re-injects key objectives at each step.

"Recitation helps agents stay on track in complex, multi-step tasks, preventing them from losing sight of objectives."
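A minimal sketch of the recitation pattern (the "todo list" trick), with illustrative function names. Re-appending the plan at each step puts the objectives in the most recent tokens, where attention is strongest, counteracting the lost-in-the-middle effect.

```python
def recite(history, objectives, done):
    """Re-append the task plan at the end of the context so the model's
    most recent tokens always restate the goals and their status."""
    lines = ["# Plan"]
    for i, obj in enumerate(objectives):
        mark = "x" if i in done else " "
        lines.append(f"- [{mark}] {obj}")
    return history + ["\n".join(lines)]

history = ["user: deploy the service", "assistant: built the image"]
ctx = recite(history, ["build image", "push image", "deploy"], done={0})
print(ctx[-1])
# → # Plan
#   - [x] build image
#   - [ ] push image
#   - [ ] deploy
```

Because the recitation is appended (not inserted mid-context), it also leaves the cached prompt prefix untouched.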
🔀
Injecting Variation

Repeating identical context structures can lead models to mimic the pattern rather than the task, drifting into repetitive, degraded outputs. Small, meaning-preserving variations prevent this stagnation.

"Controlled variation improves robustness and stability across repeated tasks, reducing bias from repetitive patterns."
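A minimal sketch of controlled variation, with hypothetical templates. Each tool observation is serialized through one of several equivalent phrasings, so repeated steps never produce token-identical runs; applying this only to freshly appended observations, never to the shared prefix, keeps it compatible with KV-cache optimization.

```python
import random

# Equivalent serializations of the same observation (illustrative).
TEMPLATES = [
    "Result of {tool}: {out}",
    "{tool} returned: {out}",
    "[{tool}] -> {out}",
]

def render_observation(tool, out, rng):
    """Serialize a tool result with slight, meaning-preserving variation
    so repeated steps don't create a rigid token pattern to imitate."""
    return rng.choice(TEMPLATES).format(tool=tool, out=out)

rng = random.Random(0)  # seeded for reproducibility
observations = [render_observation("shell_exec", "ok", rng) for _ in range(3)]
print(observations)  # three phrasings of the same fact
```

The variation is in form only; every template carries the identical tool name and output, so no information is distorted.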

These advanced techniques are proven in production systems to optimize cost, reliability, and performance at enterprise scale.