𝗜 𝗕𝘂𝗶𝗹𝘁 𝗟𝗶𝘃𝗲 𝗖𝗮𝗽𝘁𝗶𝗼𝗻𝘀 𝗶𝗻 𝘁𝗵𝗲 𝗕𝗿𝗼𝘄𝘀𝗲𝗿

You do not need Whisper. You do not need an API key. You do not need your own server.

Chrome and Edge ship with a built-in speech-to-text engine. I built live captions on top of it in 30 lines of code.

Try it here: https://dev48v.infy.uk/solve/day8-live-captions.html

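The code uses the SpeechRecognition interface from the Web Speech API. Chrome and Edge expose it under the prefixed name webkitSpeechRecognition.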
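A minimal sketch of the first step, assuming you want to support both names: the spec calls the constructor SpeechRecognition, while Chrome and Edge expose it as webkitSpeechRecognition. The helper name getRecognitionCtor is my own illustration, not the author's code.

```typescript
// Minimal constructor shape; the real SpeechRecognition object has many more members.
type RecognizerCtor = new () => unknown;

// Hypothetical helper: prefer the standard name, fall back to the prefixed one
// that Chrome and Edge use. `scope` is normally `window`; passing it in keeps
// the lookup testable outside a browser.
function getRecognitionCtor(scope: Record<string, unknown>): RecognizerCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognizerCtor) : null;
}

// In a page:
// const Ctor = getRecognitionCtor(window as unknown as Record<string, unknown>);
// if (!Ctor) { /* tell the user this browser has no built-in speech engine */ }
```

Returning null instead of throwing lets the page show a friendly "not supported" message in browsers without the engine.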

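Two settings make the difference between a simple dictation box and real live captions: continuous and interimResults, both set to true.

Without them, the engine shows text only after you pause, and then it stops listening. With them, it keeps listening and shows its guesses in real time, revising them as you speak. This creates that flickering caption effect.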
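Here is how those settings might look in code. This assumes the two settings are continuous and interimResults, the Web Speech API properties that match the behavior described. The splitCaption helper is hypothetical: it shows one way to separate the settled text from the interim guess that flickers.

```typescript
// Structural types for the parts of the Web Speech API result list this sketch reads.
interface CaptionAlternative { transcript: string; }
interface CaptionResult { isFinal: boolean; readonly [index: number]: CaptionAlternative; }
interface CaptionResultList { readonly length: number; readonly [index: number]: CaptionResult; }

// Split the session's results into settled text and the engine's live guess.
// event.results covers the whole session, so rebuilding from 0 gives the full caption.
function splitCaption(results: CaptionResultList): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const best = result?.[0]; // [0] is the top-ranked alternative
    if (!result || !best) continue;
    if (result.isFinal) finalText += best.transcript;
    else interimText += best.transcript;
  }
  return { finalText, interimText };
}

// Browser wiring (not executed here; Ctor is the SpeechRecognition constructor,
// captionBox is your caption element):
// const rec = new Ctor();
// rec.continuous = true;      // keep listening after the first phrase
// rec.interimResults = true;  // deliver partial guesses while you speak
// rec.onresult = (event) => {
//   const { finalText, interimText } = splitCaption(event.results);
//   captionBox.textContent = finalText + interimText;
// };
```

Styling interimText differently from finalText (for example in a lighter color) makes it obvious which words may still change.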

The engine stops on its own after a long stretch of silence, even with these settings. The fix is to call start() again in the onend handler. This loop keeps the captions running through pauses and quiet moments.
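A sketch of that restart loop, assuming a recognizer that exposes start(), stop() and onend like the browser object does. The keepListening name and the stop flag are my additions: without the flag, the loop would also restart after you deliberately call stop().

```typescript
// Minimal recognizer shape; the real object also has onresult, onerror, lang, etc.
interface RestartableRecognizer {
  start(): void;
  stop(): void;
  onend: (() => void) | null;
}

// Hypothetical helper: start a fresh session every time the engine ends on its own,
// until the caller asks to stop. Returns a function that stops for good.
function keepListening(rec: RestartableRecognizer): () => void {
  let wanted = true;
  rec.onend = () => {
    // Ended after silence or a session limit: restart if captions are still wanted.
    if (wanted) rec.start();
  };
  rec.start();
  return () => {
    wanted = false; // clear first so the onend fired by stop() does not restart
    rec.stop();
  };
}
```

Each restart begins a new session, so the result list starts empty again; keep earlier final text yourself if you want a running transcript.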

Chrome sends the audio to Google's servers for recognition, so you need an internet connection.
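Because recognition depends on the network, it helps to handle the onerror event, which reports codes such as "network", "no-speech" and "not-allowed". The policy below is an assumption of mine, not part of the API: keep restarting after harmless silence, but stop and tell the user when the connection or the microphone permission is the problem. You might prefer a delayed retry for "network".

```typescript
// What to do after a SpeechRecognitionErrorEvent; `code` is event.error.
interface CaptionDecision { restart: boolean; message: string | null; }

// Hypothetical policy: which errors are worth retrying and what to show the user.
function decideOnCaptionError(code: string): CaptionDecision {
  switch (code) {
    case "no-speech": // nothing was heard; harmless, keep going
      return { restart: true, message: null };
    case "network": // the recognition service could not be reached
      return { restart: false, message: "Captions need an internet connection." };
    case "not-allowed": // microphone permission was refused
      return { restart: false, message: "Microphone access was denied." };
    default:
      return { restart: false, message: `Speech recognition error: ${code}` };
  }
}

// Browser wiring (not executed here):
// rec.onerror = (event) => {
//   const decision = decideOnCaptionError(event.error);
//   // if !decision.restart, switch off your restart loop and show decision.message
// };
```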

The microphone is not the only source. You can also caption audio from a video call or a YouTube tab: use getDisplayMedia to capture that audio and feed it to the transcriber.
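A sketch of the capture half, assuming navigator.mediaDevices is passed in as the devices argument. The Screen Capture spec does not allow an audio-only request, so the code asks for video too and then stops that track. How the resulting audio track reaches the recognizer depends on what your browser supports, so that step is left out here. captureTabAudio is a hypothetical name.

```typescript
// Minimal shapes of the Screen Capture API objects this sketch touches.
interface TrackLike { kind: string; stop(): void; }
interface StreamLike { getAudioTracks(): TrackLike[]; getVideoTracks(): TrackLike[]; }
interface DisplayMediaSource {
  getDisplayMedia(options: { video: boolean; audio: boolean }): Promise<StreamLike>;
}

// Ask the user to pick a tab or window and return its audio track.
// `devices` is normally navigator.mediaDevices; it is a parameter for testability.
async function captureTabAudio(devices: DisplayMediaSource): Promise<TrackLike> {
  // Video must be requested; audio-only display capture is not allowed.
  const stream = await devices.getDisplayMedia({ video: true, audio: true });
  // Captions do not need the picture, so release the video track right away.
  stream.getVideoTracks().forEach((t) => t.stop());
  const [audio] = stream.getAudioTracks();
  if (!audio) {
    // The user did not choose to share audio in the picker.
    throw new Error("No audio was shared. Pick a tab and enable audio sharing.");
  }
  return audio;
}
```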

The browser has many underused features. Live captions and voice commands are one built-in API away. You do not need a backend.

Source: https://dev.to/dev48v/i-built-live-captions-in-the-browser-no-api-key-no-server-4i7n