Conference papers

Gerald Schuller: "Ultra Low Delay Audio Source Separation Using Zeroth-Order Optimization"
2023 IEEE Statistical Signal Processing Workshop (SSP) (02-05 July 2023), Hanoi, Vietnam.

Schuller, Gerald
Ultra low delay audio source separation using zeroth-order optimization. - In: 22nd IEEE Statistical Signal Processing Workshop (SSP 2023), (2023), pp. 497-501

In this paper, we introduce the "Random Directions" probabilistic optimization method and demonstrate its efficacy in real-time, low-latency signal processing applications. Applied to an ultra-low-delay, time-domain, multichannel source separation system, "Random Directions" is compared with the gradient-based method "Trinicon" and with frequency-domain methods such as AuxIVA and FastMNMF. Results indicate that our approach often outperforms Trinicon in terms of the Signal to Interference Ratio (SIR) and produces the least non-linear distortion among all methods, as measured by the Signal to Artifacts Ratio (SAR). This study suggests that probabilistic optimization methods, traditionally perceived as slow, can indeed be effective for real-time applications.
Schuller, Gerald
Low latency time domain multichannel speech and music source separation. - In: Conference record of the Fifty-Fifth Asilomar Conference on Signals, Systems & Computers, (2021), pp. 549-553

The goal is to obtain a simple multichannel source separation with very low latency. Applications include teleconferencing, hearing aids, augmented reality, and selective active noise cancellation. These real-time applications need very low latency, usually less than about 6 ms, and low complexity, because they usually run on small portable devices. For this we do not need the best separation, but "useful" separation, and not just on speech, but also on music and noise. The usual frequency-domain approaches have higher latency and complexity. Hence we introduce a novel probabilistic optimization method, which we call "Random Directions", that can overcome local minima, is applied to a simple time-domain unmixing structure, and is scalable for low complexity. It is then compared to frequency-domain approaches on separating speech and music sources, and with 3D microphone setups.
Profeta, Renato; Schuller, Gerald
End-to-end learning for musical instruments classification. - In: Conference record of the Fifty-Fifth Asilomar Conference on Signals, Systems & Computers, (2021), pp. 1607-1611

Musical instrument classification is a widely studied topic in Music Information Retrieval (MIR) and signal processing. Its applications range from audio database indexing, automatic transcription, and recommender systems to music search by timbre, music annotation, and others. Over the years, many different techniques have been used, based on deep neural networks with hand-engineered or learned features [1] [2] [3]. The purpose of this paper is to present Convolutional Neural Network (CNN) based filter banks that not only generate features optimized for classification in the encoded domain but also achieve near-perfect reconstruction at the decoder output, with quality similar to that of standard lossy audio codecs. The filter banks are then compared with other commonly used invertible transformations employed as features in classification problems, such as Short-time Fourier Transform (STFT) spectrograms and Mel spectrograms, using the same simple classifier with a small number of parameters. The idea is that the heavy lifting is done by the learned features rather than the classifier, while still achieving near-perfect reconstruction.
Golokolenko, Oleg; Schuller, Gerald
The method of random directions optimization for stereo audio source separation. - In: Cognitive intelligence for speech processing, (2020), pp. 3316-3320

In this paper, a novel fast time-domain audio source separation technique based on fractional delay filters, with low computational complexity and small algorithmic delay, is presented and evaluated in experiments. Our goal is a Blind Source Separation (BSS) technique that is applicable to low-cost, low-power devices where processing is done in real time, e.g. hearing aids or teleconferencing setups. The proposed approach optimizes fractional delays, implemented as IIR filters, and attenuation factors between microphone signals to minimize crosstalk, following the principle of a fractional delay-and-sum beamformer. The experiments were carried out for offline separation with stationary sound sources and for real-time separation with randomly moving sound sources. Experimental results show that the separation performance of the proposed time-domain BSS technique is competitive with State-of-the-Art (SoA) approaches, but with lower computational complexity and without the system delay of frequency-domain BSS.
Schuller, Gerald; Golokolenko, Oleg
Probabilistic optimization for source separation. - In: Conference record of the Fifty-Fourth Asilomar Conference on Signals, Systems & Computers, (2020), pp. 534-538

We present a novel probabilistic zeroth-order optimization method that can handle higher dimensions and can also be used for fast online optimization, for instance for multichannel source separation. We compared it to the Gradientless Descent (GLD) algorithm on a multichannel source separation task and found that our method yields faster and better separation in the 2-channel case. For the multichannel case, only our method resulted in useful separation. We also applied it to separating sources from 3-dimensional microphone arrays, with comparable results.
Mimilakis, Stylianos Ioannis; Drossos, Konstantinos; Schuller, Gerald
Unsupervised interpretable representation learning for singing voice separation. - In: 28th European Signal Processing Conference (EUSIPCO 2020), (2020), pp. 1412-1416

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on a denoising auto-encoder model that uses a simple sinusoidal model as its decoding functions to reconstruct the singing voice. To demonstrate the benefits of our method, we apply the obtained representations to the task of informed singing voice separation via binary masking, and measure the obtained separation quality by means of the scale-invariant signal-to-distortion ratio. Our findings suggest that our method is capable of learning meaningful representations for singing voice separation, while preserving conveniences of the short-time Fourier transform, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking, that are desired in audio and music source separation.
de Castro Rabelo Profeta, Renato; Schuller, Gerald
Feature-based classification of electric guitar types. - In: Machine learning and knowledge discovery in databases, (2020), pp. 478-484

Golokolenko, Oleg; Schuller, Gerald
Fast time domain stereo audio source separation using fractional delay filters. - In: 147th Audio Engineering Society Convention 2019, (2020), pp. 179-183

Profeta, Renato; Schuller, Gerald
Comparison of human and machine recognition of electric guitar types. - In: 147th Audio Engineering Society Convention 2019, (2020), pp. 1058-1064

Golokolenko, Oleg; Schuller, Gerald
A fast stereo audio source separation for moving sources. - In: Conference record of the Fifty-Third Asilomar Conference on Signals, Systems & Computers, (2019), pp. 1931-1935