Transcribes the audio in a specific interval of the selected TextGrid using the whisper.cpp engine, and writes the transcription result into the TextGrid.
This command extracts the sound corresponding to the selected interval, runs speech recognition on it, and splits the interval into sentence-level sub-intervals with the recognized text as labels. Optionally, a word-level tier is also created.
The original interval is split into one interval per recognized sentence. Sentence boundaries are determined by terminal punctuation (periods, exclamation marks, question marks). If Include words is selected, word-level alignment is also performed: a new word tier is created if one does not already exist, and it receives one interval per recognized word, with boundaries derived from Whisper's token-level timestamps, which are computed using Dynamic Time Warping (DTW).
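The sentence-splitting rule described above can be illustrated with a minimal Python sketch. It is not Praat's actual implementation: the names `Word`, `TERMINAL`, `group_into_sentences` and `_merge` are hypothetical, and the sketch assumes that each recognized word arrives as a (text, start, end) triple with any punctuation attached to the word.

```python
from typing import List, Tuple

# Hypothetical representation: (text, start time in s, end time in s).
Word = Tuple[str, float, float]

# Terminal punctuation that closes a sentence, as listed in the text above.
TERMINAL = (".", "!", "?")


def _merge(words: List[Word]) -> Word:
    """Join consecutive words into one interval spanning first start to last end."""
    text = " ".join(w[0] for w in words)
    return (text, words[0][1], words[-1][2])


def group_into_sentences(words: List[Word]) -> List[Word]:
    """Group timed words into sentence intervals, closing at terminal punctuation."""
    sentences: List[Word] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        if word[0].endswith(TERMINAL):
            sentences.append(_merge(current))
            current = []
    if current:
        # Trailing words without terminal punctuation form a final sentence.
        sentences.append(_merge(current))
    return sentences
```

In this sketch, each sentence interval starts at the first word's start time and ends at the last word's end time; how Praat places boundaries between neighbouring sentences may differ.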
Before you can use the SpeechRecognizer, you need to install one or more Whisper model files (in GGML format, with the extension .bin) in the whispercpp subfolder of the models folder inside the Praat preferences folder.
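To check which model files are installed, something like the following Python sketch could be used outside Praat. The function name `find_whisper_models` is hypothetical, and the location of the Praat preferences folder is platform-dependent, so it is passed in as an argument rather than guessed.

```python
from pathlib import Path
from typing import List


def find_whisper_models(prefs_dir: Path) -> List[str]:
    """Return the sorted names of .bin files in <prefs_dir>/models/whispercpp."""
    model_dir = prefs_dir / "models" / "whispercpp"
    if not model_dir.is_dir():
        # The subfolder does not exist yet, so no models are installed.
        return []
    return sorted(p.name for p in model_dir.glob("*.bin") if p.is_file())
```

An empty result means that either the whispercpp subfolder is missing or it contains no files with the extension .bin.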
Whisper models come in several sizes, each offering a different trade-off between speed and accuracy. Model names that contain .en are English-only models. All other models are multilingual. Available model sizes are: tiny, base, small, medium, large-v1, large-v2, large-v3, and large-v3-turbo (also known as turbo). Larger models are more accurate but require more memory and processing time.
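The naming convention above can be applied mechanically, as in the following Python sketch. The function `describe_model` and the tuple `SIZES` are hypothetical, and file names such as ggml-base.en.bin are used only as illustrations of the convention; the sketch simply looks for a known size name and for the .en marker in the file name.

```python
from typing import Optional, Tuple

# Size names from the text above; longer names come first so that
# "large-v3-turbo" is not mistaken for "large-v3".
SIZES = (
    "large-v3-turbo", "large-v3", "large-v2", "large-v1",
    "medium", "small", "base", "tiny",
)


def describe_model(file_name: str) -> Tuple[Optional[str], bool]:
    """Return (size, english_only) as inferred from a model file name."""
    size = next((s for s in SIZES if s in file_name), None)
    english_only = ".en" in file_name
    return size, english_only
```

For example, under these assumptions `describe_model("ggml-base.en.bin")` yields the size base and marks the model as English-only, whereas a file name without a recognized size yields `None` for the size.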
Model files can be obtained from the Hugging Face repository at https://huggingface.co/ggerganov/whisper.cpp/tree/main.
© Anastasia Shchupak 2026-03-15