Machine Learning for Spatial Audio Processing (MLSAP)

Project title: Machine Learning for Spatial Audio Processing (MLSAP)
Project title in Polish: Analiza zastosowania uczenia maszynowego w
przestrzennym przetwarzaniu sygnałów dźwiękowych

Project type: Research project
Funding Institution: National Science Centre (NCN)
Program: OPUS
Project No: 2017/25/B/ST7/01792
Value: 998 400 PLN (232 186 EUR)
Duration: 2019 – 2023

Project team members:
dr hab. inż. Konrad Kowalczyk, prof. AGH – Principal Investigator (PI)
mgr inż. Daniel Krause – Student Member
mgr inż. Mateusz Guzik – PhD Student
dr inż. Stanisław Kacprzak – Postdoctoral Researcher
dr inż. Marcin Witkowski – Postdoctoral Researcher
mgr inż. Szymon Woźniak – PhD Student
mgr inż. Magdalena Rybicka – PhD Student
mgr inż. Julitta Bartolewska – PhD Student
mgr inż. Mieszko Fraś – PhD Student

Project goal
The aim of this project is to study how classical signal processing methods and machine learning techniques can be applied jointly to audio and speech processing, and to propose novel techniques that merge the benefits of both approaches to improve performance in sound event detection and localization, signal extraction, and the classification of speech and audio signals. The proposed research should increase the understanding of what is achievable when these methods are combined, as they are typically used in isolation or, at best, as two consecutive blocks in a processing chain. The project will draw on knowledge from the fields of audio signal processing and speech technology, in particular machine learning as used in speaker recognition and source classification tasks.
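As a minimal, self-contained sketch of this classical/learned combination (not the project's actual method), the snippet below applies a time-frequency mask within a classical STFT-domain filtering chain. An oracle Wiener gain stands in for the mask that a trained neural network would estimate; the framing, tone frequency, and noise level are illustrative choices.

```python
import numpy as np

# Sketch: a per-bin Wiener gain (standing in for a DNN-estimated ratio
# mask) combined with classical frame-wise spectral filtering.
rng = np.random.default_rng(0)
fs, frame = 16000, 512
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)            # target: a 440 Hz tone
noisy = clean + 0.5 * rng.standard_normal(fs)  # additive white noise

# Frame-wise FFT (rectangular window, hop = frame length, for brevity)
n = fs // frame * frame
X = np.fft.rfft(noisy[:n].reshape(-1, frame), axis=1)
S = np.fft.rfft(clean[:n].reshape(-1, frame), axis=1)
N = X - S                                      # noise spectra, by linearity

# Oracle Wiener gain per time-frequency bin; in a learned system a
# network would estimate this mask from the noisy input alone.
mask = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)
enhanced = np.fft.irfft(mask * X, n=frame, axis=1).ravel()

def snr_db(x, ref):
    """Signal-to-noise ratio of x against the clean reference, in dB."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((x - ref) ** 2))

print(f"input SNR: {snr_db(noisy[:n], clean[:n]):.1f} dB, "
      f"masked SNR: {snr_db(enhanced, clean[:n]):.1f} dB")
```

In a deployed system the oracle gain is unavailable; replacing it with a network-estimated mask while keeping the classical filtering stage is one instance of the hybrid approach studied here.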

Research topics of the project:
– Acoustic scene analysis using a microphone array
– Sound event detection and localization
– Speaker recognition with additional acoustic features
– Acoustic scene classification / acoustic source classification
– Spatial audio acquisition and reproduction over headphones and loudspeakers
– Incorporating machine learning into spatial audio processing

International project partner
University of Tampere, Finland

Publications
[J3] M. Guzik and K. Kowalczyk, “On Ambisonic Source Separation With Spatially Informed Non-Negative Tensor Factorization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3238-3255, 2024, doi: 10.1109/TASLP.2024.3399618.
[J2] M. Cobos, J. Ahrens, K. Kowalczyk, and A. Politis, “An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction,” EURASIP Journal on Audio, Speech, and Music Processing, no. 10, pp. 1-21, 2022, doi: 10.1186/s13636-022-00242-x.
[J1] M. Cobos, J. Ahrens, K. Kowalczyk, and A. Politis, “Data-based spatial audio processing,” EURASIP Journal on Audio, Speech, and Music Processing, no. 13, 2022, doi: 10.1186/s13636-022-00248-5.


[C13] M. Guzik and K. Kowalczyk, “Convolutive NTF for Ambisonic Source Separation under Reverberant Conditions,” 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094601.
[C12] J. Bartolewska, S. Kacprzak, and K. Kowalczyk, “Refining DNN-based Mask Estimation using CGMM-based EM Algorithm for Multi-channel Noise Reduction,” Annual Conf. Int. Speech Communication Association (Interspeech), Incheon, Korea, 2022, pp. 2923-2927, doi: 10.21437/Interspeech.2022-10632.
[C11] M. Fraś, M. Witkowski, and K. Kowalczyk, “Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios,” Annual Conf. Int. Speech Communication Association (Interspeech), Incheon, Korea, 2022, pp. 2943-2947, doi: 10.21437/Interspeech.2022-10780.
[C10] M. Rybicka, J. Villalba, N. Dehak, and K. Kowalczyk, “End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors,” Annual Conf. Int. Speech Communication Association (Interspeech), Incheon, Korea, 2022, pp. 5090-5094, doi: 10.21437/Interspeech.2022-10169.
[C9] M. Guzik and K. Kowalczyk, “NTF of Spectral and Spatial Features for Tracking and Separation of Moving Sound Sources in Spherical Harmonic Domain,” Annual Conf. Int. Speech Communication Association (Interspeech), Incheon, Korea, 2022, pp. 261-265, doi: 10.21437/Interspeech.2022-10526.
[C8] M. Guzik and K. Kowalczyk, “Wishart Localization Prior on Spatial Covariance Matrix in Ambisonic Source Separation using Non-negative Tensor Factorization,” 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 446-450, doi: 10.1109/ICASSP43922.2022.9746222.
[C7] M. Fraś, M. Witkowski and K. Kowalczyk, “Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios,” 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 286-290, doi: 10.1109/ICASSP43922.2022.9746581.
[C6] M. Witkowski, M. Rybicka and K. Kowalczyk, “Sparse Linear Prediction-based Dereverberation for Signal Enhancement in Distant Speaker Verification,” European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 461-465, doi: 10.23919/EUSIPCO54536.2021.9616126.
[C5] M. Guzik, M. Fraś and K. Kowalczyk, “Incorporation of Localization Information for Sound Source Separation in Spherical Harmonic Domain,” IEEE Int. Work. on Multimedia Signal Processing (IEEE MMSP), Tampere, Finland, 2021, pp. 1-6, doi: 10.1109/MMSP53017.2021.9733508.
[C4] S. Kacprzak and K. Kowalczyk, “Adversarial Domain Adaptation with Paired Examples for Acoustic Scene Classification on Different Recording Devices,” European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 1030-1034, doi: 10.23919/EUSIPCO54536.2021.9616321.
[C3] D. Krause, A. Politis and K. Kowalczyk, “Data Diversity for Improving DNN-based Localization of Concurrent Sound Events,” European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 236-240, doi: 10.23919/EUSIPCO54536.2021.9616284.
[C2] D. Krause, A. Politis and K. Kowalczyk, “Comparison of Convolution Types in CNN-based Feature Extraction for Sound Source Localization,” European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020, pp. 820-824, doi: 10.23919/Eusipco47968.2020.9287344.
[C1] D. Krause, A. Politis and K. Kowalczyk, “Feature Overview for Joint Modeling of Sound Event Detection and Localization Using a Microphone Array,” European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020, pp. 31-35, doi: 10.23919/Eusipco47968.2020.9287374.