36.1 Upload/ingest flow for CSV-like data

Overview and links for this section of the guide.

You cannot send a 1 GB CSV to the model: it would far exceed the context window. But you don't need to.

To understand the data, the model only needs:

  1. The column names.
  2. The data types (int, float, string).
  3. A few sample rows (e.g., the first 5).

This "metadata fingerprint" is usually under 500 tokens, even for a massive dataset, because its size grows with the number of columns, not the number of rows.
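As a rough illustration of the three items above, the sketch below builds a fingerprint using only Python's standard library. The names `infer_type` and `build_fingerprint` are hypothetical, and the type inference is an assumption of this sketch: it guesses int, float, or string from the sampled rows only, so a column that looks numeric in the first five rows may still contain text further down.

```python
import csv
import io
import json
from itertools import islice


def infer_type(values):
    """Guess int, float, or string from a column's sample values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "string"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def build_fingerprint(lines, sample_rows=5):
    """Read only the header and the first few rows from an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    columns = []
    for i, name in enumerate(header):
        sample = [row[i] for row in rows if i < len(row)]
        columns.append({"name": name, "type": infer_type(sample)})
    return {"columns": columns, "sample_rows": rows}


# Tiny in-memory stand-in for a large file; with a real file, pass
# open(path, newline="") instead, and only the first rows are read.
demo = io.StringIO("id,price,city\n1,9.5,Oslo\n2,12,Lima\n3,7.25,Pune\n")
fingerprint = build_fingerprint(demo)
print(json.dumps(fingerprint, indent=2))
```

Because the reader stops after the sample rows, the cost of building the fingerprint does not depend on the file size. The resulting dictionary can be serialized and placed in the prompt in place of the raw data.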

Column Cleaning

Before sending the schema, clean the column names.

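  • `Sales (2023)` -> `sales_2023`
  • `Unnamed: 0` -> drop it (this is the placeholder name pandas gives a column with no header, typically an index that was saved along with the data).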

The cleaner the input schema, the better the SQL/Python generation.
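The two cleaning rules above can be sketched as follows. The function names `clean_column_name` and `clean_columns` are hypothetical, and the exact normalization (lower-casing, replacing every run of characters outside a-z and 0-9 with a single underscore) is one reasonable convention assumed here, not a rule stated by the guide.

```python
import re


def clean_column_name(name):
    """Lower-case a name and collapse non-alphanumeric runs into underscores."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return cleaned.strip("_")


def clean_columns(names):
    """Return cleaned names, skipping pandas-style 'Unnamed: N' placeholders."""
    kept = []
    for name in names:
        if re.fullmatch(r"Unnamed: \d+", name.strip()):
            continue  # stray index column, carries no meaning
        kept.append(clean_column_name(name))
    return kept


print(clean_columns(["Unnamed: 0", "Sales (2023)", "Customer Name"]))
# prints ['sales_2023', 'customer_name']
```

This sketch only handles the names. In a real pipeline the dropped column's values would have to be removed from the sample rows as well, and two different headers that clean to the same name would need to be disambiguated.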

Where to go next