You Can't Benchmark AI With Real Meetings
I wanted to find the best AI notetaker. I compared Granola, Fathom, and Otter.
I started by recording a real meeting. I ran the recording through all three tools. Then I realized my experiment was useless.
To score a transcript, you need a correct version to compare it against. In a real meeting, the only record of what happened is the transcript itself. I was grading the exam using the students' own answers. I had no answer key.
If you lack ground truth, manufacture it.
So I wrote a script for a two-person meeting first, then used ElevenLabs to turn that text into audio. Every word in the recording is a word I typed. I have a perfect answer key.
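With an answer key in hand, a common way to score a transcript is word error rate (WER): the number of word-level edits needed to turn the transcript into the script, divided by the script's length. Here is a minimal sketch, not necessarily the exact scoring behind the results below. The tokenizer rules and the sample sentences are my own assumptions for illustration.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase and split into tokens, keeping forms like 5.2%, $16, and q3 whole."""
    return re.findall(r"\$?\d+(?:\.\d+)?%?|[a-z0-9']+", text.lower())

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions) / reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference script is empty")
    # prev[j] holds the edit distance between the reference so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Made-up sample line: one wrong token out of six.
script_line = "Churn rose to 5.2% in Q3."
tool_output = "Churn rose to 5.2% in Q."
print(word_error_rate(script_line, tool_output))
```

Accuracy is then roughly 1 minus WER. Real tools also differ in how they format numbers ("five percent" vs "5%"), so a stricter or looser normalizer may be needed before comparing.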
I stuffed the script with difficult terms:
- Quarter labels (Q3, Q2)
- Percentages (5.2%, 6.8%)
- Dollar figures ($16 to $19)
- Jargon (churn, cohort, SSO, p95)
- Names and deadlines
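Because I planted these terms myself, I can check each one directly. The sketch below is one way to do that, assuming a hand-made list of must-survive tokens. The list and the sample transcript are illustrative, not my actual script.

```python
import re

# Hypothetical must-survive tokens, one or two per category in the list above.
CRITICAL_TOKENS = ["Q3", "Q2", "5.2%", "6.8%", "$16", "$19",
                   "churn", "cohort", "SSO", "p95"]

def missing_tokens(transcript: str, critical=CRITICAL_TOKENS) -> list[str]:
    """Return the critical tokens that never appear as standalone tokens."""
    missing = []
    for token in critical:
        # Lookarounds instead of \b, so "$16" and "5.2%" match cleanly
        # and "$160" or "15.2%" do not count as hits.
        pattern = r"(?<![\w$])" + re.escape(token) + r"(?![\w%])"
        if not re.search(pattern, transcript, flags=re.IGNORECASE):
            missing.append(token)
    return missing

# Made-up transcript in which "Q3" came out as "Q".
sample = ("In Q churn rose from 5.2% to 6.8% for the Q2 cohort. "
          "SSO pricing moves from $16 to $19, and p95 latency is flat.")
print(missing_tokens(sample))  # ['Q3']
```

A tool that writes "16 dollars" instead of "$16" would be flagged here even though the meaning survived, so treat the output as a list of lines to review by hand, not a final score.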
Here is what I learned from the results:
All three tools were excellent at raw accuracy. Otter hit 99% accuracy. Fathom was the most precise on numbers and jargon. Granola kept the meaning but garbled a few lines.
Raw accuracy alone is the wrong metric. It is only the baseline. The real differences appear in two areas:
- Meaningful tokens: Otter had high accuracy but turned "Q3" into "Q". That error barely dents the accuracy score, but in a business meeting it ruins the data.
- Speaker attribution: Otter was the only tool that correctly identified who spoke when. Granola gave me one long stream of text without names.
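Speaker attribution can be scored against the script too, since the script says who speaks each turn. A minimal sketch, assuming the tool's turns have already been aligned one-to-one with the script's turns (that alignment is the hard part and is not shown). The speaker names are hypothetical.

```python
from itertools import permutations

def attribution_accuracy(ref_speakers, hyp_labels):
    """Fraction of turns labeled correctly under the best label-to-name mapping.

    Both lists must be aligned turn by turn. None means the tool gave no label.
    """
    if len(ref_speakers) != len(hyp_labels):
        raise ValueError("turn lists must be aligned and equal in length")
    if not ref_speakers:
        raise ValueError("no turns to score")
    names = sorted(set(ref_speakers))
    labels = sorted({label for label in hyp_labels if label is not None})
    # Pad with None so extra tool labels can map to "nobody".
    candidates = names + [None] * max(0, len(labels) - len(names))
    best = 0
    # Tools often emit generic labels ("Speaker 1"), so try every assignment.
    for assignment in permutations(candidates, len(labels)):
        mapping = dict(zip(labels, assignment))
        correct = sum(1 for ref, hyp in zip(ref_speakers, hyp_labels)
                      if hyp is not None and mapping[hyp] == ref)
        best = max(best, correct)
    return best / len(ref_speakers)

script_turns = ["Ana", "Ben", "Ana", "Ben"]  # hypothetical speakers
tool_turns = ["Speaker 1", "Speaker 2", "Speaker 1", "Speaker 1"]
print(attribution_accuracy(script_turns, tool_turns))  # 0.75
print(attribution_accuracy(script_turns, [None] * 4))  # 0.0
```

Trying every mapping is fine for a two-person script. A tool that returns one unlabeled stream of text scores zero here no matter how good its words are, which is why this needs to be reported separately from word accuracy.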
The "best" tool depends on your goal:
- Use Otter if you need to know who said what.
- Use Fathom if you need perfect numbers and jargon.
- Use Granola if you want a bot-free experience for solo notes.
You can use this method for any speech-to-text testing. Script your audio to get a repeatable test. Add difficult words to see where models fail. Use the same clip to see if a vendor actually improves their model over time.
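Tracking a vendor over time can be as simple as logging the error rate each time you rerun the same clip. A small sketch, assuming you store one score per run date. The dates and numbers below are placeholders, not measurements.

```python
def score_changes(history: dict[str, float]) -> list[tuple[str, float]]:
    """Given {ISO date: error rate}, return (date, change vs previous run) pairs."""
    dates = sorted(history)  # ISO dates sort chronologically as plain strings
    return [(curr, history[curr] - history[prev])
            for prev, curr in zip(dates, dates[1:])]

def regressions(history: dict[str, float], tolerance: float = 0.0) -> list[str]:
    """Dates on which the error rate rose by more than `tolerance`."""
    return [date for date, change in score_changes(history) if change > tolerance]

# Placeholder error rates for one tool on the same scripted clip.
runs = {"2025-01-01": 0.04, "2025-02-01": 0.03, "2025-03-01": 0.05}
print(regressions(runs))  # ['2025-03-01']
```

Because the audio never changes, any movement in the score comes from the tool, not from the meeting.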
Synthetic audio has limits. It is clean and easy to produce, so it is not a perfect simulation of a messy four-person meeting. But it gives you a controlled baseline for comparing tools against each other.
Optional learning community: https://t.me/GyaanSetuAi