Metadata-Version: 2.1
Name: pyannote.audio
Version: 3.3.2
Summary: Neural building blocks for speaker diarization
Home-page: https://github.com/pyannote/pyannote-audio
Author: Hervé Bredin
Author-email: herve.bredin@irit.fr
License: MIT
Platform: Linux
Platform: Mac
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: asteroid-filterbanks>=0.4
Requires-Dist: einops>=0.6.0
Requires-Dist: huggingface-hub>=0.13.0
Requires-Dist: lightning>=2.0.1
Requires-Dist: omegaconf<3.0,>=2.1
Requires-Dist: pyannote.core>=5.0.0
Requires-Dist: pyannote.database>=5.0.1
Requires-Dist: pyannote.metrics>=3.2
Requires-Dist: pyannote.pipeline>=3.0.1
Requires-Dist: pytorch-metric-learning>=2.1.0
Requires-Dist: rich>=12.0.0
Requires-Dist: semver>=3.0.0
Requires-Dist: soundfile>=0.12.1
Requires-Dist: speechbrain>=1.0.0
Requires-Dist: tensorboardX>=2.6
Requires-Dist: torch>=2.0.0
Requires-Dist: torch-audiomentations>=0.11.0
Requires-Dist: torchaudio>=2.2.0
Requires-Dist: torchmetrics>=0.11.0
Provides-Extra: cli
Requires-Dist: hydra-core<1.2,>=1.1; extra == "cli"
Requires-Dist: typer<0.5.0,>=0.4.0; extra == "cli"
Provides-Extra: dev
Requires-Dist: pre-commit>=2.7; extra == "dev"
Requires-Dist: recommonmark>=0.6; extra == "dev"
Requires-Dist: black>=22.3.0; extra == "dev"
Provides-Extra: separation
Requires-Dist: transformers>=4.39.1; extra == "separation"
Requires-Dist: asteroid>=0.7.0; extra == "separation"
Provides-Extra: testing
Requires-Dist: pytest>=6.0; extra == "testing"
Requires-Dist: pytest-cov>=2.10; extra == "testing"
Requires-Dist: jupyter; extra == "testing"
Requires-Dist: papermill; extra == "testing"
Using the `pyannote.audio` open-source toolkit in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.
# `pyannote.audio` speaker diarization toolkit
`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the [PyTorch](https://pytorch.org) machine learning framework, it comes with state-of-the-art [pretrained models and pipelines](https://hf.co/pyannote) that can be further fine-tuned to your own data for even better performance.
<p align="center">
<a href="https://www.youtube.com/watch?v=37R_R82lfwA"><img src="https://img.youtube.com/vi/37R_R82lfwA/0.jpg" alt="pyannote.audio introduction video"></a>
</p>
## TL;DR
1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
```python
import torch

from pyannote.audio import Pipeline

# load the pretrained pipeline
# (replace the placeholder below with your Hugging Face access token)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```
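The pipeline also accepts in-memory audio and optional speaker-count hints, and the resulting annotation can be saved in RTTM format. A minimal sketch based on the options documented on the pipeline's model card (file names are placeholders):

```python
import torchaudio

from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# pre-load audio in memory (avoids repeated disk reads)
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

# when the number of speakers is known (or bounded), pass it as a hint
diarization = pipeline("audio.wav", num_speakers=2)
diarization = pipeline("audio.wav", min_speakers=1, max_speakers=5)

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```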
## Highlights
- :hugs: pretrained [pipelines](https://hf.co/models?other=pyannote-audio-pipeline) (and [models](https://hf.co/models?other=pyannote-audio-model)) on [:hugs: model hub](https://huggingface.co/pyannote)
- :exploding_head: state-of-the-art performance (see [Benchmark](#benchmark))
- :snake: Python-first API
- :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/) (see the training sketch below)
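As a rough illustration of the Lightning-based training workflow (see the [training tutorial](tutorials/training_a_model.ipynb) for the maintained recipe), the sketch below fits a segmentation model on a `pyannote.database` protocol; the protocol name and device count are placeholders:

```python
from lightning.pytorch import Trainer

from pyannote.audio.models.segmentation import PyanNet
from pyannote.audio.tasks import VoiceActivityDetection
from pyannote.database import registry

# the protocol must be registered in your database.yml (name is a placeholder)
protocol = registry.get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")

# wrap the protocol in a task and attach it to a segmentation model
task = VoiceActivityDetection(protocol)
model = PyanNet(task=task)

# Lightning takes care of (multi-)GPU training
trainer = Trainer(accelerator="gpu", devices=2)
trainer.fit(model)
```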
## Documentation
- [Changelog](CHANGELOG.md)
- [Frequently asked questions](FAQ.md)
- Models
- Available tasks explained
- [Applying a pretrained model](tutorials/applying_a_model.ipynb)
- [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)
- Pipelines
- Available pipelines explained
- [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)
- [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)
- [Training a pipeline](tutorials/voice_activity_detection.ipynb)
- Contributing
- [Adding a new model](tutorials/add_your_own_model.ipynb)
- [Adding a new task](tutorials/add_your_own_task.ipynb)
- Adding a new pipeline
- Sharing pretrained models and pipelines
- Blog
- 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
- 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
- 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
- Videos
- [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
- [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
- [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
- Community contributions (not maintained by the core team)
- 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
## Benchmark
Out of the box, v3.1 of the `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) is both more accurate and faster than v2.x.
The numbers below are [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (DER, in %):
| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------ |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 22.5 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 16.6 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 20.9 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.8 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 22.2 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.2 |
| [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 43.8 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
| [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 18.4 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.4 |
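To run the same evaluation on your own data, `pyannote.metrics` (already a dependency) provides the metric directly. A minimal sketch, assuming reference and hypothesis RTTM files that each contain a single annotated file (names are placeholders):

```python
from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load_rttm returns a {uri: Annotation} dictionary
reference = next(iter(load_rttm("reference.rttm").values()))
hypothesis = next(iter(load_rttm("hypothesis.rttm").values()))

metric = DiarizationErrorRate()
der = metric(reference, hypothesis)
print(f"DER = {100 * der:.1f}%")
```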
## Citations
If you use `pyannote.audio`, please cite the following papers:
```bibtex
@inproceedings{Plaquet23,
author={Alexis Plaquet and Hervé Bredin},
title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
```
```bibtex
@inproceedings{Bredin23,
author={Hervé Bredin},
title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
```
## Development
The commands below set up the pre-commit hooks and install the packages needed to develop the `pyannote.audio` library:
```bash
pip install -e ".[dev,testing]"
pre-commit install
```
## Test
```bash
pytest
```