Article on joint neural separation and diarization published in IEEE Signal Processing Letters in collaboration with Johns Hopkins University

We are very pleased to announce that a journal article on joint separation and diarization by Magdalena Rybicka, written in collaboration with partners from Johns Hopkins University, has just been published on IEEE Xplore.

Title: Joint Diarization and Separation Using SepFormer With Non-Autoregressive Attractors

Authors: M. Rybicka, K. Kowalczyk, T. Thebaud, N. Dehak and J. Villalba

Abstract: Speaker diarization and speech separation both aim to track speaker activity in multi-speaker recordings, but they differ in their granularity. Diarization provides a binary indication of whether a speaker is active within a given time frame, whereas speech separation produces individual audio signals, each containing the isolated speech of a specific speaker. Recently, there has been growing interest in approaches that unify diarization and speech separation, particularly those leveraging neural models trained jointly to enhance performance in both tasks. In this letter, we propose a single neural model for joint speaker diarization and speech separation. Our model estimates speaker representations using a non-autoregressive attractor generation mechanism integrated into a modified SepFormer model. We present two variants of the model, designed for scenarios with sparse or highly overlapping speech, which achieve relative improvements of 51% for both separation and diarization over state-of-the-art methods, as evaluated on the LibriMix, LibriheavyMix and CALLHOME datasets.