23.2 Generating transcripts and cleaning them
Goal: clean transcripts without changing meaning
Transcripts are the foundation for action items and decision logs. If the transcript is noisy, everything downstream becomes unreliable.
Transcript cleaning is a constrained task: you are allowed to improve readability, but you are not allowed to change meaning.
Never “rewrite” what people said
Summaries can rewrite. Transcripts should be faithful. If you can’t hear a word, mark it as unclear instead of guessing.
A practical transcription pipeline
- Transcribe: produce a raw transcript with timestamps (and speakers if available).
- Normalize formatting: consistent timestamps, speaker tags, paragraph breaks.
- Correct obvious errors: fix misheard names and acronyms against a glossary.
- Mark uncertainty: use a standard token like [unclear].
- Optional: produce a “readable” version and keep the raw version for audit.
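The normalization step can be scripted before any model is involved. The following is a minimal Python sketch, not a reference implementation. It assumes raw lines look like `1:05 speaker 1: text`; that line shape, the `[HH:MM:SS]` target format, and the function names are assumptions made for illustration.

```python
import re

# Assumed raw line shape: "<timestamp> <speaker>: <text>", e.g. "1:05 speaker 1: hi".
LINE_RE = re.compile(
    r"^\[?(?P<ts>\d{1,2}(?::\d{2}){1,2})\]?\s+(?P<speaker>[^:]+):\s*(?P<text>.*)$"
)

def normalize_timestamp(ts: str) -> str:
    """Pad M:SS or H:MM:SS to a fixed HH:MM:SS form."""
    parts = [int(p) for p in ts.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def normalize_line(line: str) -> str:
    """Return "[HH:MM:SS] Speaker: text"; leave unrecognized lines as they are."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return line.strip()
    # Collapse stray whitespace in the label, but do not re-case real names.
    speaker = " ".join(match["speaker"].split())
    generic = re.fullmatch(r"(?i)speaker\s*(\w+)", speaker)
    if generic:
        speaker = f"Speaker {generic[1].upper()}"
    return f"[{normalize_timestamp(match['ts'])}] {speaker}: {match['text'].strip()}"
```

Only the timestamp and the speaker tag are touched; the spoken text passes through unchanged, which keeps this step safe to run on the raw transcript.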
Cleaning rules (what you may and may not change)
Allowed changes:
- fix punctuation and capitalization,
- add paragraph breaks,
- expand obvious acronyms (if known),
- correct obvious misspellings,
- lightly trim filler words such as “um” and “uh” (optional).
Disallowed changes:
- changing meaning or intent,
- adding content not said,
- removing uncertainty,
- summarizing instead of transcribing.
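One way to audit these rules is to compare the word sequence of the raw and cleaned transcripts while ignoring case and punctuation. The sketch below uses Python's standard `difflib`; the helper names are hypothetical. Allowed edits such as a corrected misspelling or a word replaced by [unclear] will also show up as changes, so treat the output as a review list rather than a pass/fail verdict.

```python
import difflib
import re

def words(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_changes(raw: str, cleaned: str) -> list[tuple[str, list[str], list[str]]]:
    """List every place where the cleaned word sequence differs from the raw one."""
    a, b = words(raw), words(cleaned)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is "replace", "delete", or "insert"
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes
```

A cleaning pass that only fixes punctuation and capitalization yields an empty list. A long list suggests the model paraphrased or summarized instead of transcribing.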
Speaker labels, names, and acronyms
Two tactics improve quality quickly:
- Speaker map: map “Speaker 1” to a name if you know it.
- Glossary: list project names, acronyms, product terms, and people names.
Provide these as inputs to the cleaning prompt. This reduces the model’s temptation to guess.
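A speaker map can also be applied deterministically, so the model never has to infer who is speaking. The sketch below assumes each turn starts with an optional bracketed timestamp followed by `Label:`; that layout and the function name are assumptions for illustration.

```python
import re

# Assumed turn shape: optional "[timestamp] " prefix, then "Label:" at the start of a line.
LABEL_RE = re.compile(
    r"^(?P<prefix>\[[^\]]*\][ \t]*)?(?P<label>[^:\n]+):", re.MULTILINE
)

def apply_speaker_map(transcript: str, speaker_map: dict[str, str]) -> str:
    """Replace known generic labels with names; unknown labels are left untouched."""
    def swap(match: re.Match) -> str:
        label = match["label"].strip()
        if label not in speaker_map:
            return match[0]  # no guess: keep the original label
        return f"{match['prefix'] or ''}{speaker_map[label]}:"
    return LABEL_RE.sub(swap, transcript)
```

For example, `apply_speaker_map(text, {"Speaker 1": "Dana"})` renames only the turns labeled "Speaker 1" and keeps every other label as it was.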
Quality checks
Quick checks that catch common issues:
- Timestamp continuity: timestamps never go backwards and stay within the recording’s duration.
- Unclear markers: any unintelligible parts are marked [unclear].
- Speaker consistency: speaker labels don’t drift.
- Names/acronyms: match your glossary.
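Most of these checks can be automated. The sketch below assumes lines shaped like `[HH:MM:SS] Speaker: text`; the line format, the list of variant markers, and the function name are illustrative assumptions. It cannot tell whether unintelligible audio was actually marked, only whether a marker other than [unclear] was used.

```python
import re

LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*([^:]+):")
# Variant markers that should have been written as the standard [unclear] token.
NONSTANDARD_RE = re.compile(r"\[(?:inaudible|unintelligible)\]|\(unclear\)", re.IGNORECASE)

def check_transcript(lines, known_speakers, glossary):
    """Return a list of readable problems; an empty list means nothing was flagged."""
    problems = []
    last_seconds = -1
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.match(line)
        if match is None:
            problems.append(f"line {number}: missing timestamp or speaker label")
            continue
        hours, minutes, seconds = (int(g) for g in match.groups()[:3])
        total = hours * 3600 + minutes * 60 + seconds
        if minutes > 59 or seconds > 59:
            problems.append(f"line {number}: implausible timestamp")
        if total < last_seconds:
            problems.append(f"line {number}: timestamp goes backwards")
        last_seconds = max(last_seconds, total)
        speaker = match.group(4).strip()
        if speaker not in known_speakers:
            problems.append(f"line {number}: unexpected speaker label {speaker!r}")
        if NONSTANDARD_RE.search(line):
            problems.append(f"line {number}: nonstandard uncertainty marker")
        for term in glossary:
            # Flag glossary terms that appear with different capitalization.
            pattern = rf"\b{re.escape(term)}\b"
            for found in re.findall(pattern, line, flags=re.IGNORECASE):
                if found != term:
                    problems.append(f"line {number}: {found!r} should be {term!r}")
    return problems
```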
Copy-paste prompts
Prompt: clean transcript with strict rules
Clean this transcript for readability WITHOUT changing meaning.
Rules:
- Do not add content that isn't present.
- Do not paraphrase. Keep the original words as much as possible.
- Fix punctuation and formatting only.
- If a word/phrase is unclear, replace it with [unclear] (do not guess).
- Keep timestamps and speaker labels.
Glossary (names/acronyms you must preserve):
- ...
Transcript:
```text
...
```
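If you run this prompt repeatedly, it can help to assemble it in code so the glossary and transcript are always inserted the same way. The following is a minimal sketch; the function name is hypothetical, and how the resulting string is sent to a model is left out.

```python
FENCE = "`" * 3  # built at runtime so the fence does not clash with this example's own fence

RULES = [
    "Do not add content that isn't present.",
    "Do not paraphrase. Keep the original words as much as possible.",
    "Fix punctuation and formatting only.",
    "If a word/phrase is unclear, replace it with [unclear] (do not guess).",
    "Keep timestamps and speaker labels.",
]

def build_cleaning_prompt(transcript: str, glossary: list[str]) -> str:
    """Fill the strict cleaning prompt with a glossary and a transcript."""
    glossary_lines = "\n".join(f"- {term}" for term in glossary) or "- (none provided)"
    rules = "\n".join(f"- {rule}" for rule in RULES)
    return (
        "Clean this transcript for readability WITHOUT changing meaning.\n"
        f"Rules:\n{rules}\n"
        "Glossary (names/acronyms you must preserve):\n"
        f"{glossary_lines}\n"
        "Transcript:\n"
        f"{FENCE}text\n{transcript.strip()}\n{FENCE}"
    )
```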
Prompt: speaker diarization assistance (if labels are missing)
The transcript has no speaker labels. Propose speaker turns as "Speaker A", "Speaker B", etc.
Rules:
- Do not guess real names.
- Keep timestamps.
- If uncertain where a speaker changes, mark a note: [speaker uncertain].
Return the transcript with speaker tags added.
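After the model returns the labeled transcript, a quick check can confirm that the "Keep timestamps" rule held. This sketch assumes timestamps are written as `[HH:MM:SS]`; the pattern and the function name are assumptions.

```python
import re

TS_RE = re.compile(r"\[\d{2}:\d{2}:\d{2}\]")

def timestamps_preserved(original: str, labeled: str) -> bool:
    """True if both texts contain the same timestamps in the same order."""
    return TS_RE.findall(original) == TS_RE.findall(labeled)
```

A `False` result means a timestamp was dropped, added, or reordered, so the output should be reviewed against the original before it is used.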