pyannote.audio

pyannote.audio is an automatic speaker diarization toolkit developed by Hervé Bredin and collaborators (see Plaquet & Bredin (2023) and Bredin (2023)).

Praat contains a C++/ggml adaptation of pyannote.audio's pyannote/speaker-diarization-3.1 pipeline. The pipeline uses two neural models: pyannote/segmentation-3.0 (from pyannote.audio) for segmentation, and wespeaker-voxceleb-resnet34-LM (from WeSpeaker) for speaker embedding. The pyannote/segmentation-3.0 weights have been converted to ggml format and embedded into Praat (see Acknowledgments).
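
To make the division of labour between the two models more concrete, here is a toy Python sketch of what happens after them: the segmentation model yields speech segments, the embedding model yields one vector per segment, and segments with similar vectors are given the same speaker label. This is not Praat's C++ code and not pyannote.audio's actual clustering step. The function names (cosine, assign_speakers), the threshold of 0.7, the segment times and the two-dimensional "embeddings" are all hypothetical and chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_speakers(embeddings, threshold=0.7):
    """Greedy grouping (an illustrative stand-in for real clustering):
    each embedding joins the most similar existing speaker if the
    similarity reaches the threshold, otherwise it starts a new speaker."""
    centroids = []  # one running sum of embeddings per speaker
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labels.append(best)
    return labels

# Made-up output of the segmentation stage: (start, end) times in seconds.
segments = [(0.0, 1.5), (1.5, 3.0), (3.0, 4.2), (4.2, 6.0)]
# Made-up output of the embedding stage: one vector per segment.
embeddings = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]

labels = assign_speakers(embeddings)
for (start, end), speaker in zip(segments, labels):
    print(f"{start:.1f}-{end:.1f} s: speaker {speaker}")
```

In this toy input the first and third segments end up with one speaker and the second and fourth with another. Real speaker embeddings have many more dimensions, and a real pipeline also has to deal with overlapping speech, which this sketch ignores.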

For how speaker diarization is used in Praat, see the Speech recognition tutorial. For the diarization settings, see speaker diarization with adapted pyannote.audio.