32.3 Choosing small vs large models strategically

When to Use Each

Task	Model	Reason
Classification	Flash	Simple task, speed matters
Extraction	Flash	Structured output, not creative
Summarization	Flash	Compression, not generation
Code generation	Pro	Quality matters more than speed
Complex reasoning	Pro	Chain of thought needed
Multi-step planning	Pro	Needs working memory

Model Cascade

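// Try the cheap model first and escalate only when needed.
// Assumption: `flash` and `pro` are thin wrappers you supply around your model SDK.
// Model APIs generally do not return a ready-made `confidence` value, so the
// wrapper has to derive one (for example from log probabilities or a self-check).
interface ModelResponse { text: string; confidence: number }  // confidence in [0, 1]
interface ModelClient { generate(prompt: string): Promise<ModelResponse> }

declare const flash: ModelClient;  // cheap, fast model
declare const pro: ModelClient;    // expensive, stronger model

async function cascadeGenerate(prompt: string, threshold = 0.9): Promise<ModelResponse> {
  // Step 1: Try Flash
  const flashResponse = await flash.generate(prompt);

  // Step 2: Keep the Flash answer if it clears the confidence threshold
  if (flashResponse.confidence > threshold) {
    return flashResponse;  // Flash is good enough
  }

  // Step 3: Escalate to Pro
  console.log('Escalating to Pro due to low confidence');
  return pro.generate(prompt);
}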
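The cascade depends on a confidence score that the model wrapper has to produce itself. One possible way to get it, sketched below, is to sample the cheap model several times and use the share of samples that agree with the most common answer. The helper name `agreementConfidence` is hypothetical, and the approach is an assumption rather than something the cascade above prescribes; it suits short, closed-form outputs such as classification labels.

```typescript
// Hypothetical helper: reduce several sampled answers to one answer plus a confidence score.
// Confidence is the fraction of samples that match the most common (normalized) answer.
function agreementConfidence(answers: string[]): { text: string; confidence: number } {
  if (answers.length === 0) {
    throw new Error('at least one answer is required');
  }

  // Count occurrences of each normalized answer.
  const counts = new Map<string, number>();
  for (const raw of answers) {
    const key = raw.trim().toLowerCase(); // ignore case and surrounding whitespace
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Pick the answer with the highest count (first one wins on ties).
  let best = '';
  let bestCount = 0;
  counts.forEach((count, key) => {
    if (count > bestCount) {
      best = key;
      bestCount = count;
    }
  });

  return { text: best, confidence: bestCount / answers.length };
}

// Five samples for the same prompt, four of which agree.
const vote = agreementConfidence(['Spam', 'spam', 'not spam', 'spam ', 'spam']);
console.log(vote.text, vote.confidence); // spam 0.8
```

Each extra sample is another cheap-model call, so this trades some of the cascade's savings for a usable confidence signal.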

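// Illustrative outcome for the cascade: if about 80% of requests clear the
// threshold, Flash (cheap) handles them alone and only the remaining 20%
// also pay for a Pro call (expensive but necessary). Measure the split on your own traffic.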
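To see what the cascade saves, it helps to work out the blended cost per request. The sketch below assumes every request pays for one cheap call and that escalated requests additionally pay for one expensive call. The function name and the cost units are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helper: blended cost per request for a two-tier cascade.
// Every request pays for the cheap call; escalated requests also pay for the expensive one.
function expectedCascadeCost(
  cheapCost: number,      // cost of one cheap-model call
  expensiveCost: number,  // cost of one expensive-model call
  escalationRate: number, // fraction of requests that escalate, in [0, 1]
): number {
  if (escalationRate < 0 || escalationRate > 1) {
    throw new RangeError('escalationRate must be between 0 and 1');
  }
  return cheapCost + escalationRate * expensiveCost;
}

// Illustrative units: a cheap call costs 1, an expensive call costs 10, 20% escalate.
const blended = expectedCascadeCost(1, 10, 0.2);
console.log(blended); // 3
```

With these illustrative numbers the cascade costs 3 units per request against 10 for always calling the expensive model. In general the cascade is cheaper only while the escalation rate stays below 1 - cheapCost / expensiveCost.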

Smart Routing

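// Route based on task complexity.
// Assumption: `Task` is a minimal shape for this example; adapt it to your own task type.
type Complexity = 'simple' | 'medium' | 'complex';

interface Task {
  type: string;                 // e.g. 'classification', 'extraction'
  requiresReasoning?: boolean;  // true for multi-step or chain-of-thought work
  outputLength?: number;        // expected output length (tokens assumed)
}

function selectModel(task: Task): string {
  const complexity = estimateComplexity(task);

  if (complexity === 'simple') {
    return 'gemini-1.5-flash';  // $0.075 per 1M input tokens (verify current pricing)
  } else if (complexity === 'medium') {
    return 'gemini-1.5-flash';  // Still use Flash
  } else {
    return 'gemini-1.5-pro';    // $1.25 per 1M input tokens (verify current pricing)
  }
}

function estimateComplexity(task: Task): Complexity {
  // Known simple task types are checked first, so they win even if other flags are set
  if (task.type === 'classification') return 'simple';
  if (task.type === 'extraction') return 'simple';
  if (task.requiresReasoning) return 'complex';
  if ((task.outputLength ?? 0) > 1000) return 'complex';
  return 'medium';
}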
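The price comments in the routing code can be turned into a rough cost estimate per workload. The sketch below copies the two figures quoted there and assumes they are USD per 1M input tokens; the table name and the helper `estimateInputCost` are hypothetical, and output-token charges are left out.

```typescript
// Prices in USD per 1M input tokens, copied from the comments in selectModel.
// Assumption: snapshot values for illustration; check the current price list before relying on them.
const PRICE_PER_MILLION_INPUT_TOKENS: Record<string, number> = {
  'gemini-1.5-flash': 0.075,
  'gemini-1.5-pro': 1.25,
};

// Hypothetical helper: input-side cost of sending `inputTokens` tokens to one model.
function estimateInputCost(model: string, inputTokens: number): number {
  const price = PRICE_PER_MILLION_INPUT_TOKENS[model];
  if (price === undefined) {
    throw new Error(`No price configured for model: ${model}`);
  }
  return (inputTokens / 1_000_000) * price;
}

// Example: the same 10M input tokens routed to each model.
const flashCost = estimateInputCost('gemini-1.5-flash', 10_000_000);
const proCost = estimateInputCost('gemini-1.5-pro', 10_000_000);
console.log(flashCost.toFixed(2), proCost.toFixed(2)); // 0.75 12.50
```

At these figures the larger model costs roughly 17 times as much per input token, which is why the router sends both simple and medium tasks to the smaller model.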

Where to go next

32.4 Batch processing vs interactive