WeSpeaker

WeSpeaker is a speaker embedding learning toolkit developed by the WeNet community (see Wang et al. (2023) and Wang et al. (2024)). Praat uses one of its pretrained models, wespeaker-voxceleb-resnet34-LM, as the speaker embedding model in pyannote.audio’s pyannote/speaker-diarization-3.1 pipeline.

The wespeaker-voxceleb-resnet34-LM model was trained on the VoxCeleb2 dataset (see Chung, Nagrani & Zisserman (2018) and Nagrani, Chung & Zisserman (2017)). Its weights have been converted to ggml format and embedded into Praat (see Acknowledgments). For inference with this model, Praat contains a C++/ggml port of WeSpeaker’s ResNet34 architecture with TSTP (temporal statistics pooling).
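
To illustrate the pooling step, here is a minimal Python sketch of temporal statistics pooling, which collapses a variable-length sequence of frame-level features into one fixed-length vector by concatenating the per-dimension mean and standard deviation over time. This is not Praat’s C++/ggml code: the function name tstp_pool, the list-of-lists input layout, the unbiased (n - 1) variance estimate and the small eps constant are all assumptions made for this illustration.

```python
import math


def tstp_pool(frames, eps=1e-7):
    """Pool T frames of D features each into a vector of length 2 * D.

    The result holds the D per-dimension means followed by the D
    per-dimension standard deviations, both computed over time.
    The (n - 1) variance estimate and eps are assumptions of this sketch.
    """
    num_frames = len(frames)
    if num_frames < 2:
        raise ValueError("need at least two frames to estimate a standard deviation")
    dim = len(frames[0])

    # Mean of every feature dimension across all frames.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]

    # Standard deviation of every feature dimension across all frames;
    # eps keeps the square root away from exactly zero.
    stds = []
    for d in range(dim):
        variance = sum((frame[d] - means[d]) ** 2 for frame in frames) / (num_frames - 1)
        stds.append(math.sqrt(variance + eps))

    return means + stds
```

For example, pooling 200 frames of 2560 features each would give a single vector of 5120 values, whatever the duration of the audio. In the network, a final layer then maps this pooled vector to the speaker embedding.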

Links to this page

speaker diarization with adapted pyannote.audio