SpeechRecognizer

The SpeechRecognizer is one of the types of objects in Praat. It performs automatic speech recognition (speech-to-text) on a Sound object. The recognition itself is performed by the whisper.cpp engine; the SpeechRecognizer is Praat's interface to that engine.

Commands

Creation:

 Create SpeechRecognizer...

Recognition:

 Transcribe

Installing Whisper models

Before you can use the SpeechRecognizer, you need to install one or more Whisper model files (in GGML format, with the extension .bin). Put them into the subfolder whispercpp of the folder models, which is located in the Praat preferences folder.
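
Installing a model amounts to copying the .bin file into the right folder. The following is a minimal Python sketch of that step, not something Praat provides: the function name install_whisper_model is hypothetical, and the location of the Praat preferences folder (which differs per platform) is assumed to be supplied by you.

```python
import shutil
from pathlib import Path


def install_whisper_model(model_file, praat_prefs_folder):
    """Copy a GGML model file (.bin) into <prefs>/models/whispercpp.

    Hypothetical helper: praat_prefs_folder must be the Praat
    preferences folder on your system; it is not detected here.
    """
    source = Path(model_file)
    if source.suffix != ".bin":
        raise ValueError(f"expected a .bin model file, got {source.name}")
    target_folder = Path(praat_prefs_folder) / "models" / "whispercpp"
    # Create models/whispercpp if it does not exist yet.
    target_folder.mkdir(parents=True, exist_ok=True)
    # Copy the file and return the path of the installed copy.
    return Path(shutil.copy2(source, target_folder / source.name))
```

The function returns the path of the installed copy, so a calling script can check that the file ended up where Praat expects it.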

Whisper models come in several sizes, each offering a different trade-off between speed and accuracy. Models whose names contain .en (for example, base.en) are English-only; all other models are multilingual. The available model sizes are: tiny, base, small, medium, large-v1, large-v2, large-v3, and large-v3-turbo (also known as turbo). Larger models are more accurate but require more memory and processing time.
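
The naming rule above can be applied mechanically. This Python sketch sorts a list of model names into English-only and multilingual groups, using only the ".en" rule stated above; the function names are hypothetical, and the example file names in the usage note are assumed, not taken from this page.

```python
def is_english_only(model_name):
    """Return True if the model name contains ".en" (English-only model)."""
    return ".en" in model_name


def group_by_language_support(model_names):
    """Split model names into English-only and multilingual groups."""
    groups = {"english_only": [], "multilingual": []}
    for name in model_names:
        key = "english_only" if is_english_only(name) else "multilingual"
        groups[key].append(name)
    return groups
```

For instance, calling group_by_language_support(["ggml-base.en.bin", "ggml-small.bin"]) puts the first name in the English-only group and the second in the multilingual group.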

Model files can be obtained from the Hugging Face repository at https://huggingface.co/ggerganov/whisper.cpp/tree/main.
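
If you prefer to script the download, the following Python sketch builds a direct-download address for a given model. It rests on two assumptions that are not stated on this page: that files in the repository are named ggml-<model>.bin, and that Hugging Face serves them under a resolve/main path. Check the repository listing before relying on either; the function name is hypothetical.

```python
REPO_URL = "https://huggingface.co/ggerganov/whisper.cpp"


def model_download_url(model_name):
    """Build a download URL for a model such as "base.en" or "large-v3".

    Assumes the repository's files are named ggml-<model>.bin and are
    reachable under resolve/main (both are assumptions, see above).
    """
    return f"{REPO_URL}/resolve/main/ggml-{model_name}.bin"
```

The resulting address can be pasted into a browser or passed to any download tool; the downloaded file then goes into the whispercpp folder described above.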


© Anastasia Shchupak 2026-03-15