What it does

Speech to Text transcribes audio from your microphone, an audio file, or a video file into plain text using an AI speech recognition model.

How to use

Choose source: switch between Microphone and Audio file.
Select language: pick the spoken language, or leave Auto-detect on to let the model identify it.
Microphone mode: hold the button while speaking, then release it to transcribe.
With punctuation: enable this option to add commas and periods automatically based on pause length. Punctuation follows pauses, not grammar rules, so the results are approximate.
File mode: with Audio file selected, drop an audio or video file (MP3, WAV, M4A, OGG, MP4, WebM…) or click to choose one, then press Transcribe.
Copy the result: use the Copy button or select the text manually.

Each new recording or file replaces the previous transcript. Use Clear to empty it manually.

First-run download

On the very first use, the AI model is downloaded and stored in your browser cache. Later sessions load it from the cache instead of downloading it again, so startup is much faster.

Supported formats

Any format your browser can decode. This typically includes MP3, WAV, FLAC, OGG, M4A, WebM, MP4, and more, though exact support varies by browser.

Accuracy

The model handles everyday speech well in most major languages. Accuracy may drop for strongly accented speech or technical vocabulary. For best results, use clean recordings without heavy background noise.

Languages

Supports 90+ languages. When Auto-detect is selected, the model identifies the language from the first 30 seconds of audio.

Privacy

Audio never leaves your device. No account required.