32.3 Choosing small vs large models strategically

The Model Tier System

Think of models as employees with different hourly rates.

  • The Intern (Gemini Flash / Small Models): Fast, cheap, eager. Great for summarizing, formatting, simple classification, and extracting data. Bad at complex reasoning or subtle nuance.
  • The Senior Engineer (Gemini Pro / Large Models): Expensive, thoughtful, thorough. Necessary for architecture, debugging tricky errors, writing creative content, and handling complex instructions.

Model Routing Pattern

You don't have to pick one model for your whole app. You can use a router.

  1. User asks a question.
  2. A tiny, cheap classifier (or even a regex keyword match) decides the difficulty.
  3. Simple? Route to Flash.
  4. Hard? Route to Pro.

Example: a customer support bot.

  • "Reset my password" → handled by Flash (or a deterministic script).
  • "My data is corrupted and I'm angry" → handled by Pro (needs empathy and complex troubleshooting).
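The routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the keyword list, the 60-word length threshold, and the `small` / `large` callables are all assumptions standing in for whatever classifier and model clients your app actually uses.

```python
import re
from typing import Callable

# Hypothetical escalation keywords; tune these against your own traffic.
HARD_PATTERNS = re.compile(
    r"\b(corrupt\w*|angry|refund|crash\w*|lawsuit)\b", re.IGNORECASE
)

# Hypothetical cutoff: very long messages are treated as hard.
MAX_SIMPLE_WORDS = 60


def classify_difficulty(message: str) -> str:
    """Return "hard" on an escalation keyword or a long message, else "simple"."""
    if HARD_PATTERNS.search(message) or len(message.split()) > MAX_SIMPLE_WORDS:
        return "hard"
    return "simple"


def route(
    message: str,
    small: Callable[[str], str],
    large: Callable[[str], str],
) -> str:
    """Send the message to the small or large model based on difficulty.

    `small` and `large` are placeholders for your own model-calling
    functions (prompt in, text out); no real API is assumed here.
    """
    handler = large if classify_difficulty(message) == "hard" else small
    return handler(message)
```

With this sketch, "Reset my password" goes to the small model and "My data is corrupted and I'm angry" goes to the large one. A keyword match is crude, so in practice you would likely log misroutes and refine the rules, or swap in a small classifier model.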

Hybrid Workflows

Use the big model to generate the plan, and the small model to execute it.

The "Architect-Builder" Pattern:

1. (Pro) Read the spec and generate a list of 5 files to create.
2. (Flash) Write file 1.
3. (Flash) Write file 2.
...
7. (Pro) Review all files and spot integration bugs.

This gives you high-quality direction with low-cost implementation.
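As a rough sketch of the Architect-Builder pattern, the function below takes two model-calling functions and wires them together in the plan, build, review order described above. The `Model` type, the prompt wording, and the "one path per line" plan format are assumptions made for illustration; a real pipeline would need sturdier parsing of the plan and error handling around each call.

```python
from typing import Callable, Dict, Tuple

# Placeholder type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def architect_builder(
    spec: str, large: Model, small: Model
) -> Tuple[Dict[str, str], str]:
    """Plan with the large model, write files with the small one, then review."""
    # Step 1 (large): ask for a plan, assumed to come back as one path per line.
    plan = large(f"List the files needed for this spec, one path per line:\n{spec}")
    paths = [line.strip() for line in plan.splitlines() if line.strip()]

    # Steps 2..n (small): write each file independently.
    files = {
        path: small(f"Spec:\n{spec}\n\nWrite the contents of {path}.")
        for path in paths
    }

    # Final step (large): review everything together for integration bugs.
    combined = "\n\n".join(f"# {path}\n{body}" for path, body in files.items())
    review = large(f"Review these files for integration bugs:\n{combined}")
    return files, review
```

The large model is called exactly twice regardless of how many files the plan contains, which is where the cost saving comes from: the per-file work scales with the cheap model only.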

Distillation

You can also use the Pro model to generate synthetic training data (examples) to fine-tune a Flash model. This lets the small model "punch above its weight" for specific tasks.
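The data-generation half of distillation can be sketched as follows: run the large "teacher" model over a set of task inputs and save the input/output pairs as JSONL. The `teacher` callable and the `"input"` / `"output"` field names are hypothetical; the exact record format depends on the fine-tuning service you use, so check its documentation before uploading anything.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_set(
    inputs: Iterable[str], teacher: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Label each input with the teacher model's answer.

    `teacher` is a placeholder for a function that calls your large model.
    """
    return [{"input": text, "output": teacher(text)} for text in inputs]


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)
```

Because the student only sees what the teacher produced, it is worth spot-checking a sample of the generated pairs by hand before training on them.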

Where to go next