SpeechRecognizer & Sound: Transcribe

Transcribes the selected Sound using the selected SpeechRecognizer object and writes the resulting transcription to the Info window.

The sound is automatically resampled to 16 kHz (the sampling frequency expected by whisper.cpp) before being processed.
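
To illustrate what resampling to 16 kHz involves, here is a minimal Python sketch. It is not the resampling method used by this command: the function name resample_linear is hypothetical, and linear interpolation is chosen only for brevity (it applies no anti-aliasing filter, so a production resampler would be more careful).

```python
def resample_linear(samples, source_rate, target_rate=16000):
    """Resample a mono signal by linear interpolation (illustrative only).

    samples     -- list of float amplitudes
    source_rate -- original sampling frequency in Hz
    target_rate -- desired sampling frequency in Hz (16 kHz by default)
    """
    if not samples:
        return []
    # Number of output samples that cover the same duration.
    n_out = max(1, round(len(samples) * target_rate / source_rate))
    # Distance between output samples, measured in input samples.
    step = source_rate / target_rate
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interpolable point: hold the final sample.
            out.append(samples[-1])
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out
```

With this sketch, one second of sound sampled at 44100 Hz (44100 samples) becomes 16000 samples, so the duration is preserved while the sampling frequency changes.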

The transcription uses voice activity detection (Silero VAD, which is built into whisper.cpp) to skip non-speech parts of the sound, which improves both speed and accuracy.

The result is a flat text string containing the full transcription.


© Anastasia Shchupak 2026-03-15