34.2 Repo indexing strategy
Level 1: Brute Force (Small Repos)
If your project has fewer than 50 files, don't overthink it. Traverse the directory, concatenate all non-ignored files into a single string (each file wrapped in an XML-style tag), and put the whole thing into the context window.
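The brute-force approach can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `pack_repo`, the `<repo>`/`<file>` tag names, and the hard-coded ignore list are all assumptions made for the example, and a real tool would more likely honor `.gitignore`.

```python
import os

# Assumption: a small hard-coded ignore list stands in for real .gitignore handling.
IGNORED_DIRS = {".git", "node_modules", "__pycache__", "dist", "build"}


def pack_repo(root: str) -> str:
    """Concatenate every readable text file under root into one XML-wrapped string."""
    parts = ["<repo>"]
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    body = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # The wrapper is a hint for the model, not strict XML, so the body is not escaped.
            parts.append(f'<file path="{rel}">\n{body}\n</file>')
    parts.append("</repo>")
    return "\n".join(parts)
```

Calling `pack_repo(".")` from the project root returns the string to place in the prompt ahead of the user's question.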
Gemini 1.5 Pro has a context window of up to 2M tokens, which is large enough to hold most side projects in their entirety.
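Before sending the packed string, it is worth checking that it actually fits. The sketch below uses a rough rule of thumb of about 4 characters per token, which is an assumption: real tokenizers vary with language and content, so treat the result as an estimate. The names `estimate_tokens` and `fits_in_context` and the reserve size are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # the 2M-token window quoted for Gemini 1.5 Pro
CHARS_PER_TOKEN = 4                # assumption: rough average, not a real tokenizer
RESERVED_TOKENS = 200_000          # assumption: headroom for the question and the answer


def estimate_tokens(text: str) -> int:
    """Cheap upper-ish estimate of the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text: str, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the estimated size leaves the reserved headroom free."""
    return estimate_tokens(text) <= window - RESERVED_TOKENS
```

If `fits_in_context` returns `False`, the repo has outgrown Level 1 and a more selective strategy is needed.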
Level 2: The File Map (Medium Repos)
A 100 MB monorepo will not fit in the context window, so you can't send everything.
Strategy:
1. Generate a "File Map": a tree structure of all file paths.
2. Send the File Map to the model first.
3. The user asks: "Update the login page."
4. The model looks at the map and says: "I need to read `src/pages/login.tsx` and `src/components/auth-form.tsx`."
5. The tool fetches those 2 files.
6. The model generates the answer.
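The two-pass flow above can be sketched as follows. This is an illustration under stated assumptions: `ask_model` is a hypothetical callable that takes a prompt and returns the model's text reply (swap in whatever client you use), the prompt wording is invented, and a flat sorted list of paths stands in for a rendered tree.

```python
import os
from typing import Callable

# Assumption: a small hard-coded ignore list stands in for real .gitignore handling.
IGNORED_DIRS = {".git", "node_modules", "__pycache__"}


def build_file_map(root: str) -> list[str]:
    """Return the sorted repo-relative paths of all files under root."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.append(rel.replace(os.sep, "/"))
    return sorted(paths)


def answer_with_file_map(root: str, question: str,
                         ask_model: Callable[[str], str]) -> str:
    file_map = build_file_map(root)

    # Pass 1: show only the map and ask which files the model wants to read.
    selection = ask_model(
        "Files in this repo:\n" + "\n".join(file_map)
        + f"\n\nTask: {question}\n"
        + "Reply with the paths you need to read, one per line."
    )

    # Keep only paths that are in the map, so an invented or
    # out-of-repo path in the reply is never opened.
    known = set(file_map)
    wanted = [p.strip() for p in selection.splitlines() if p.strip() in known]

    # Pass 2: fetch the requested files and ask for the actual answer.
    sources = []
    for rel in wanted:
        with open(os.path.join(root, rel), encoding="utf-8") as f:
            sources.append(f'<file path="{rel}">\n{f.read()}\n</file>')
    return ask_model("\n".join(sources) + f"\n\nTask: {question}")
```

Filtering the reply against the map is a deliberate choice: the model's output is treated as a request, and the tool decides what is actually readable.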
Level 3: Embeddings (Large Repos)
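For massive codebases (Google scale), you split the code into chunks, typically one per function, embed each chunk, and index the vectors in a vector DB so that relevant chunks can be retrieved by similarity search. This is complex to maintain: every code change makes some embeddings stale, and keeping the index in sync is a cache invalidation problem, which is hard. We will stick to Level 1 or 2 for this project.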
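For reference only, since this project does not build a Level 3 index, the sketch below shows where the maintenance cost comes from. It covers just the chunking and staleness check, with no embedding model or vector DB involved. It assumes Python source files and uses the standard `ast` module; the chunk id format and the names `chunk_functions` and `stale_ids` are hypothetical.

```python
import ast
import hashlib


def chunk_functions(source: str, path: str) -> list[dict]:
    """Split one Python file into per-function chunks, each with a content hash."""
    chunks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            text = ast.get_source_segment(source, node)
            chunks.append({
                # Assumption: path plus function name is unique enough for a sketch;
                # same-named methods in different classes would collide.
                "id": f"{path}::{node.name}",
                "text": text,
                "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            })
    return chunks


def stale_ids(old: list[dict], new: list[dict]) -> set[str]:
    """Ids whose text changed, appeared, or disappeared since the last index run."""
    old_hashes = {c["id"]: c["sha256"] for c in old}
    new_hashes = {c["id"]: c["sha256"] for c in new}
    all_ids = old_hashes.keys() | new_hashes.keys()
    return {i for i in all_ids if old_hashes.get(i) != new_hashes.get(i)}
```

Every id returned by `stale_ids` would need to be re-embedded or deleted in the vector DB on each commit, which is the bookkeeping that Levels 1 and 2 avoid.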