32.2 Summarize and compress context safely

Recursive Summarization

The standard way to handle long conversations is to summarize the history once it exceeds a certain length. Instead of sending messages 1–50, you send:

  1. A summary of messages 1–40.
  2. The raw text of messages 41–50.

This keeps the "immediate context" fresh while preserving the "long-term memory" in a cheaper, compressed form. The process is recursive because, once the summary plus the recent messages grows too long again, you fold the older material into a new summary of the summary. You can ask a cheap model (like Gemini Flash) to generate these summaries in the background.
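Here is a minimal sketch of that loop, assuming a hypothetical `call_cheap_model` helper that stands in for whatever inexpensive model you use; `compress_history` and the threshold constants are illustrative names, not part of any library.

```python
MAX_RAW_MESSAGES = 10   # how many recent messages to keep verbatim
SUMMARY_TRIGGER = 50    # summarize once the history exceeds this length


def call_cheap_model(prompt: str) -> str:
    """Placeholder: send `prompt` to an inexpensive model and return its reply."""
    raise NotImplementedError


def compress_history(summary: str, messages: list[str]) -> tuple[str, list[str]]:
    """Fold older messages into the running summary, keep the newest ones raw."""
    if len(messages) <= SUMMARY_TRIGGER:
        return summary, messages

    older, recent = messages[:-MAX_RAW_MESSAGES], messages[-MAX_RAW_MESSAGES:]
    prompt = (
        "Update this conversation summary with the new messages.\n\n"
        f"Current summary:\n{summary}\n\n"
        "New messages:\n" + "\n".join(older)
    )
    # The previous summary is folded into the new one, which is what makes the
    # process recursive: summaries of summaries, plus fresh raw messages.
    new_summary = call_cheap_model(prompt)
    return new_summary, recent
```

On each turn you then send `summary` plus the raw `recent` messages to your main model instead of the full history.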

Filtering and Selecting

Don't send the whole file if you only need a function.

  • Codebases: Use tools like `ctags` or tree-sitter to extract only the signatures of functions, not the implementation, when asking high-level architectural questions.
  • Data: If you have a CSV, send the header row and 5 sample rows, not the whole dataset.
  • Logs: Send the last 50 lines and any lines containing "ERROR", rather than the full 10MB log file (see the sketch after this list).
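
To make the log example concrete, here is a minimal sketch; `trim_log` and its parameters are illustrative names, not part of any standard library.

```python
from pathlib import Path


def trim_log(path: str, tail_lines: int = 50, keyword: str = "ERROR") -> str:
    """Return the last `tail_lines` lines plus any earlier lines mentioning `keyword`."""
    lines = Path(path).read_text(errors="replace").splitlines()
    tail = lines[-tail_lines:]
    earlier_errors = [ln for ln in lines[:-tail_lines] if keyword in ln]
    # Error lines come first (they are chronologically earlier), then the recent
    # tail; the two slices never overlap, so no deduplication is needed.
    return "\n".join(earlier_errors + tail)
```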
The "Need to Know" Principle

Treat the model like a busy executive. Don't dump the filing cabinet on their desk. Hand them the one-page summary and the specific document they asked for.

Format Compression

The format of your data changes its token count.

  • JSON vs. YAML: JSON carries a lot of braces `{}`, quotes `""`, and commas. YAML often uses 20-30% fewer tokens for the same data because it relies on whitespace instead.
  • CSV: For tabular data, CSV is extremely token-efficient compared to a list of JSON objects (which repeats keys for every row).
  • Minification: Removing whitespace from code (minification) saves tokens but makes it harder for the model to understand structure. Usually, it's better to keep indentation but remove comments if you are desperate for space.
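
A quick way to check this on your own data is to tokenize each serialization and compare the counts. This sketch assumes the `tiktoken` and `pyyaml` packages are installed; the sample rows and the `cl100k_base` encoding are just illustrative choices.

```python
import csv
import io
import json

import tiktoken  # pip install tiktoken
import yaml      # pip install pyyaml

rows = [
    {"id": 1, "name": "Ada", "role": "engineer"},
    {"id": 2, "name": "Grace", "role": "admiral"},
    {"id": 3, "name": "Linus", "role": "maintainer"},
]

enc = tiktoken.get_encoding("cl100k_base")


def count_tokens(text: str) -> int:
    return len(enc.encode(text))


as_json = json.dumps(rows, indent=2)
as_yaml = yaml.safe_dump(rows, sort_keys=False)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

# CSV states each key once in the header; JSON repeats every key on every row.
print(f"JSON: {count_tokens(as_json)} tokens")
print(f"YAML: {count_tokens(as_yaml)} tokens")
print(f"CSV:  {count_tokens(as_csv)} tokens")
```

The CSV and YAML counts will typically come in well below the JSON count, though the exact savings depend on your keys and values.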
