Conference papers

Gerald Schuller: "Ultra Low Delay Audio Source Separation Using Zeroth-Order Optimization"
2023 IEEE Statistical Signal Processing Workshop (SSP) (02-05 July 2023), Hanoi, Vietnam.



Schuller, Gerald
Ultra low delay audio source separation using zeroth-order optimization. - In: 22nd IEEE Statistical Signal Processing Workshop (SSP 2023), (2023), pp. 497-501

In this paper, we introduce the "Random Directions" probabilistic optimization method and demonstrate its efficacy in real-time, low-latency signal processing applications. Applied to an ultra-low delay, time-domain, multichannel source separation system, our "Random Directions" method is compared with the gradient-based method "Trinicon" and with frequency-domain methods such as AuxIVA and FastMNMF. Results indicate that our approach often outperforms Trinicon in terms of the Signal to Interference Ratio (SIR) and introduces the fewest non-linear distortions of all methods, as measured by the Signal to Artifacts Ratio (SAR). This study suggests that probabilistic optimization methods, traditionally perceived as slow, can be effective for real-time applications.



https://doi.org/10.1109/SSP53291.2023.10208066
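
The entries above describe Random Directions only at the level of an abstract. As a loose illustration of the zeroth-order idea (optimizing with objective evaluations alone, no gradients), here is a minimal Python sketch of a greedy random search: draw a random perturbation and keep it only if the objective improves. The function name random_directions, the Gaussian perturbations, the fixed step size and the iteration count are our assumptions, not details taken from the papers.

```python
import random


def random_directions(objective, x0, step=0.1, iterations=2000, seed=0):
    """Minimize `objective` using function evaluations only (zeroth order).

    Each iteration perturbs the current point along a random Gaussian
    direction scaled by `step` and keeps the candidate only if it lowers
    the objective. Returns the best point found and its objective value.
    """
    rng = random.Random(seed)   # seeded so runs are reproducible
    x = list(x0)
    best = objective(x)
    for _ in range(iterations):
        candidate = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        value = objective(candidate)
        if value < best:        # greedy acceptance, no gradient needed
            x, best = candidate, value
    return x, best


# Usage: minimize a quadratic bowl centred at (3, 3).
point, cost = random_directions(lambda v: sum((vi - 3.0) ** 2 for vi in v), [0.0, 0.0])
```

In a separation setting, x would hold the unmixing parameters and objective would be some separation cost; both are placeholders here.
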
Schuller, Gerald
Low latency time domain multichannel speech and music source separation. - In: Conference record of the Fifty-Fifth Asilomar Conference on Signals, Systems & Computers, (2021), pp. 549-553

The goal is to obtain a simple multichannel source separation with very low latency. Applications include teleconferencing, hearing aids, augmented reality, and selective active noise cancellation. These real-time applications need very low latency, usually less than about 6 ms, and low complexity, because they typically run on small portable devices. For them we do not need the best possible separation, but a "useful" one, and not only for speech but also for music and noise. Common frequency-domain approaches have higher latency and complexity. Hence we introduce a novel probabilistic optimization method, which we call "Random Directions", which can overcome local minima, is applied to a simple time-domain unmixing structure, and is scalable for low complexity. It is then compared to frequency-domain approaches on separating speech and music sources, using 3D microphone setups.



https://doi.org/10.1109/IEEECONF53345.2021.9723106
Profeta, Renato; Schuller, Gerald
End-to-end learning for musical instruments classification. - In: Conference record of the Fifty-Fifth Asilomar Conference on Signals, Systems & Computers, (2021), pp. 1607-1611

Musical instrument classification is a widely studied topic in Music Information Retrieval (MIR) and signal processing. Its applications range from indexing audio databases, automatic transcription, and recommender systems to music search by timbre, music annotation, and others. Over the years, many different techniques have been applied, using deep neural networks with either hand-engineered or learned features [1] [2] [3]. The purpose of this paper is to present Convolutional Neural Network (CNN) based filter banks that not only generate features optimized for classification in the encoded domain but also achieve near-perfect reconstruction at the decoder output, with quality similar to that of standard lossy audio codecs. The filter banks are then compared with other commonly used invertible transformations employed as features in classification problems, such as Short-time Fourier Transform (STFT) spectrograms and Mel spectrograms, using the same simple classifier with a small number of parameters. The idea is that the heavy lifting is done by the learned features rather than by the classifier, while near-perfect reconstruction is still achieved.



https://doi.org/10.1109/IEEECONF53345.2021.9723181
Golokolenko, Oleg; Schuller, Gerald
The method of random directions optimization for stereo audio source separation. - In: Cognitive intelligence for speech processing, (2020), pp. 3316-3320

In this paper, a novel fast time-domain audio source separation technique based on fractional delay filters, with low computational complexity and small algorithmic delay, is presented and evaluated in experiments. Our goal is a Blind Source Separation (BSS) technique that is applicable to low-cost, low-power devices where processing is done in real time, e.g. hearing aids or teleconferencing setups. The proposed approach optimizes fractional delays, implemented as IIR filters, and attenuation factors between microphone signals to minimize crosstalk, following the principle of a fractional delay-and-sum beamformer. Experiments were carried out for offline separation with stationary sound sources and for real-time separation with randomly moving sound sources. Experimental results show that the separation performance of the proposed time-domain BSS technique is competitive with State-of-the-Art (SoA) approaches, while it has lower computational complexity and none of the system delay of frequency-domain BSS.



https://doi.org/10.21437/Interspeech.2020-1409
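
The unmixing principle in the entry above (fractional delays realized as IIR filters, plus attenuation factors, used to cancel crosstalk) can be sketched for two microphones as follows. The first-order Thiran allpass used as the fractional delay and the simple delay-and-subtract structure are our assumptions for illustration; the paper's exact filter design and optimization are not reproduced.

```python
def allpass_fractional_delay(x, d):
    """Delay the sequence x by d samples with a first-order Thiran allpass,
    H(z) = (a + z^-1) / (1 + a z^-1) with a = (1 - d) / (1 + d).
    The approximation is best at low frequencies and for d near 1."""
    a = (1.0 - d) / (1.0 + d)
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev   # difference equation of H(z)
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def unmix_two_channels(x1, x2, g1, d1, g2, d2):
    """Cancel crosstalk by subtracting a delayed, attenuated copy of the
    other microphone signal: y1 = x1 - g1 * D_d1{x2}, y2 = x2 - g2 * D_d2{x1}."""
    x2_delayed = allpass_fractional_delay(x2, d1)
    x1_delayed = allpass_fractional_delay(x1, d2)
    y1 = [a - g1 * b for a, b in zip(x1, x2_delayed)]
    y2 = [a - g2 * b for a, b in zip(x2, x1_delayed)]
    return y1, y2
```

Here g1, d1, g2 and d2 are the parameters that an optimizer would tune to minimize crosstalk between the outputs.
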
Schuller, Gerald; Golokolenko, Oleg
Probabilistic optimization for source separation. - In: Conference record of the Fifty-Fourth Asilomar Conference on Signals, Systems & Computers, (2020), pp. 534-538

We present a novel probabilistic zeroth-order optimization method that can handle higher dimensions and can also be used for fast online optimization, for instance for multichannel source separation. We compared it to the Gradientless Descent (GLD) algorithm on a multichannel source separation task and found that our method yields faster and better separation in the 2-channel case. In the multichannel case, only our method resulted in useful separation. We also applied it to separating sources recorded with 3-dimensional microphone arrays, with comparable results.



https://doi.org/10.1109/IEEECONF51394.2020.9443564
Mimilakis, Stylianos Ioannis; Drossos, Konstantinos; Schuller, Gerald
Unsupervised interpretable representation learning for singing voice separation. - In: 28th European Signal Processing Conference (EUSIPCO 2020), (2020), pp. 1412-1416

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on a denoising auto-encoder model that uses a simple sinusoidal model for its decoding functions to reconstruct the singing voice. To demonstrate the benefits of our method, we apply the obtained representations to the task of informed singing voice separation via binary masking, and measure the resulting separation quality by means of the scale-invariant signal-to-distortion ratio. Our findings suggest that our method is capable of learning meaningful representations for singing voice separation, while preserving conveniences of the short-time Fourier transform, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking, that are desired in audio and music source separation.



https://doi.org/10.23919/Eusipco47968.2020.9287352
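
Binary masking and the scale-invariant signal-to-distortion ratio (SI-SDR) mentioned in the entry above can be illustrated in a few lines. This sketch works on plain Python lists of generic coefficients: the mask compares hypothetical voice and accompaniment magnitudes per coefficient, and SI-SDR uses the usual projection definition. The helper names are illustrative and are not taken from the paper's code.

```python
import math


def binary_mask(voice_mag, accomp_mag):
    """1.0 where the voice magnitude dominates the accompaniment, else 0.0."""
    return [1.0 if v > a else 0.0 for v, a in zip(voice_mag, accomp_mag)]


def apply_mask(mask, mixture_coeffs):
    """Keep only the mixture coefficients selected by the mask."""
    return [m * c for m, c in zip(mask, mixture_coeffs)]


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    alpha = dot / sum(r * r for r in reference)  # best scaling of the reference
    target = [alpha * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0.0:
        return math.inf                          # perfect (scaled) estimate
    return 10.0 * math.log10(sum(t * t for t in target) / noise_energy)
```

Because the reference is rescaled by alpha before the error is measured, multiplying the estimate by a constant leaves the score unchanged.
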
de Castro Rabelo Profeta, Renato; Schuller, Gerald
Feature-based classification of electric guitar types. - In: Machine learning and knowledge discovery in databases, (2020), pp. 478-484

Golokolenko, Oleg; Schuller, Gerald
Fast time domain stereo audio source separation using fractional delay filters. - In: 147th Audio Engineering Society Convention 2019, (2020), pp. 179-183

Profeta, Renato; Schuller, Gerald
Comparison of human and machine recognition of electric guitar types. - In: 147th Audio Engineering Society Convention 2019, (2020), pp. 1058-1064

Golokolenko, Oleg; Schuller, Gerald
A fast stereo audio source separation for moving sources. - In: Conference record of the Fifty-Third Asilomar Conference on Signals, Systems & Computers, (2019), pp. 1931-1935

https://doi.org/10.1109/IEEECONF44664.2019.9048652
Mimilakis, Stylianos Ioannis; Cano, Estefanía; FitzGerald, Derry; Drossos, Konstantinos; Schuller, Gerald
Examining the perceptual effect of alternative objective functions for deep learning based music source separation. - In: Conference record of the Fifty-Second Asilomar Conference on Signals, Systems & Computers, (2018), pp. 679-683

https://doi.org/10.1109/ACSSC.2018.8645257
Drossos, Konstantinos; Mimilakis, Stylianos Ioannis; Serdyuk, Dmitriy; Schuller, Gerald; Virtanen, Tuomas; Bengio, Yoshua
MaD TwinNet: Masker-Denoiser architecture with Twin Networks for monaural sound source separation. - In: 2018 International Joint Conference on Neural Networks (IJCNN), ISBN 978-1-5090-6014-6, (2018), 8 pp.

https://doi.org/10.1109/IJCNN.2018.8489565
Mimilakis, Stylianos Ioannis; Drossos, Konstantinos; Santos, João F.; Schuller, Gerald; Virtanen, Tuomas
Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask. - In: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ISBN 978-1-5386-4658-8, (2018), pp. 721-725

https://doi.org/10.1109/ICASSP.2018.8461822
Mimilakis, Stylianos Ioannis; Drossos, Konstantinos; Virtanen, Tuomas; Schuller, Gerald
A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation. - In: Proceedings of MLSP2017, ISBN 978-1-5090-6341-3, (2017), 6 pp.

https://doi.org/10.1109/MLSP.2017.8168117
Golokolenko, Oleg; Schuller, Gerald
Investigation of electric network frequency for synchronization of low cost and wireless sound cards. - In: EUSIPCO 2017, ISBN 978-0-9928626-7-1, (2017), pp. 693-697

https://doi.org/10.23919/EUSIPCO.2017.8081296
Drossos, Konstantinos; Mimilakis, Stylianos Ioannis; Floros, Andreas; Virtanen, Tuomas; Schuller, Gerald
Close miking empirical practice verification: a source separation approach. - In: 142nd Audio Engineering Society International Convention 2017, (2017), pp. 629-637

Mimilakis, Stylianos Ioannis; Drossos, Konstantinos; Virtanen, Tuomas; Schuller, Gerald
Deep neural networks for dynamic range compression in mastering applications. - In: 140th Audio Engineering Society International Convention 2016, ISBN 978-1-5108-2570-3, (2016), pp. 289-296

Schuller, Gerald; Abeßer, Jakob; Kehling, Christian
Parameter extraction for bass guitar sound models including playing styles. - In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ISBN 978-1-4673-6998-5, (2015), pp. 404-408

https://doi.org/10.1109/ICASSP.2015.7178000
Abeßer, Jakob; Schuller, Gerald
Instrument-centered music transcription of bass guitar tracks. - In: Semantic audio, (2014), pp. 166-175

Gärtner, Daniel; Dittmar, Christian; Aichroth, Patrick; Cuccovillo, Luca; Mann, Sebastian; Schuller, Gerald
Efficient cross-codec framing grid analysis for audio tampering detection. - In: 136th Audio Engineering Society convention 2014, ISBN 978-1-63266-506-5, (2014), pp. 306-316

Neukam, Christian; Nagel, Frederik; Schuller, Gerald; Schnabel, Michael
A MDCT based harmonic spectral bandwidth extension method. - In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, ISBN 978-1-4799-0357-3, (2013), pp. 566-570

http://dx.doi.org/10.1109/ICASSP.2013.6637711
Bießmann, Paul; Gärtner, Daniel; Dittmar, Christian; Aichroth, Patrick; Schuller, Gerald; Schnabel, Michael; Geiger, Ralf
Estimating MP3PRO encoder parameters from decoded audio. - In: Informatik 2013 - Informatik angepasst an Mensch, Organisation und Umwelt, (2013), pp. 2841-2852

Niehaus, Marco; Esch, Lorenz; Schuller, Gerald
Parametric mesh reconstruction pipeline from 3D point clouds. - In: ISWCS 2013, ISBN 978-3-8007-3529-7, (2013), pp. 512-516

Schöberl, Michael; Keinert, Joachim; Ziegler, Matthias; Seiler, Jürgen; Niehaus, Marco; Schuller, Gerald; Kaup, André; Foessel, Siegfried
Evaluation of a high dynamic range video camera with non-regular sensor. - In: Digital photography IX, ISBN 978-0-8194-9433-7, (2013), pp. 86600M-1-86600M-12

Schnabel, Michael; Schubert, Benjamin; Schuller, Gerald
Parametric coding of piano signals. - In: 133rd Audio Engineering Society convention 2012, (2013), pp. 863-871

Carôt, Alexander; Schuller, Gerald
Applying video to low delayed audio streams in bandwidth limited networks. - In: Audio networking, ISBN 978-1-62276-006-0, (2012), pp. 104-109

Carôt, Alexander; Schuller, Gerald
Towards a telematic visual-conducting system. - In: Audio networking, ISBN 978-1-62276-006-0, (2012), pp. 99-103

Cano, Estefanía; Dittmar, Christian; Schuller, Gerald
Efficient implementation of a system for solo and accompaniment separation in polyphonic music. - In: Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012, ISBN 978-1-4673-1068-0, (2012), pp. 285-289

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6334302
Kramer, Patrick; Abeßer, Jakob; Dittmar, Christian; Schuller, Gerald
A digital waveguide model of the electric bass guitar including different playing techniques. - In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, ISBN 978-1-4673-0045-2, (2012), pp. 353-356

http://dx.doi.org/10.1109/ICASSP.2012.6287889
Schnabel, Michael; Werner, Michael; Schuller, Gerald
Improved error robustness for predictive ultra low delay audio coding. - In: 131st Audio Engineering Society convention 2011, (2012), pp. 544-549

Abeßer, Jakob; Lartillot, Olivier; Dittmar, Christian; Eerola, Tuomas; Schuller, Gerald
Modeling musical attributes to characterize ensemble recordings using rhythmic audio features. - In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing, ISBN 978-1-4577-0539-7, (2011), pp. 189-192

http://dx.doi.org/10.1109/ICASSP.2011.5946372
Cano, Estefanía; Dittmar, Christian; Schuller, Gerald
Influence of phase, magnitude and location of harmonic components in the perceived quality of extracted solo signals. - In: Semantic audio, ISBN 978-0-937803-81-3, (2011), pp. 247-252

Abeßer, Jakob; Dittmar, Christian; Schuller, Gerald
Automatic recognition and parametrization of frequency modulation techniques in bass guitar recordings. - In: Semantic audio, ISBN 978-0-937803-81-3, (2011), pp. 121-128

Abeßer, Jakob; Bräuer, Paul; Lukashevich, Hanna; Schuller, Gerald
Bass playing style detection based on high-level features and pattern similarity. - In: ISMIR 2010, (2010), pp. 93-98

Cano, Estefanía; Schuller, Gerald
Exploring phase information in sound source separation applications. - In: DAFx-10, ISBN 978-3-200-01940-9, (2010), pp. 259-272

Abeßer, Jakob; Lukashevich, Hanna; Schuller, Gerald
Feature-based extraction of plucking and expression styles of the electric bass guitar. - In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9, (2010), pp. 2290-2293

http://dx.doi.org/10.1109/ICASSP.2010.5495945
Stein, Michael; Abeßer, Jakob; Dittmar, Christian; Schuller, Gerald
Automatic detection of audio effects in guitar and bass recordings. - In: 128th Audio Engineering Society convention 2010, (2010), pp. 522-533

Werner, Michael; Schuller, Gerald
An SBR tool for very low delay applications with flexible crossover frequency. - In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9, (2010), pp. 353-356

http://dx.doi.org/10.1109/ICASSP.2010.5495854
Werner, Michael; Schuller, Gerald
An enhanced SBR tool for low-delay applications. - In: 127th Audio Engineering Society convention 2009, (2010), pp. 874-879

Abeßer, Jakob; Lukashevich, Hanna; Dittmar, Christian; Schuller, Gerald
Genre classification using bass-related high-level features and playing styles. - In: ISMIR 2009, (2009), pp. 453-458

Neuendorf, Max; Gournay, Philippe; Multrus, Markus; Lecomte, Jérémie; Bessette, Bruno; Geiger, Ralf; Bayer, Stefan; Fuchs, Guillaume; Hilpert, Johannes; Rettelbach, Nikolaus
A novel scheme for low bitrate unified speech and audio coding - MPEG RM0. - In: 126th Audio Engineering Society convention 2009, (2009), pp. 1142-1154

Neuendorf, Max; Gournay, Philippe; Multrus, Markus; Lecomte, Jérémie; Bessette, Bruno; Geiger, Ralf; Bayer, Stefan; Fuchs, Guillaume; Hilpert, Johannes; Rettelbach, Nikolaus; Salami, Redwan; Schuller, Gerald; Lefebvre, Roch; Grill, Bernhard
Unified speech and audio coding scheme for high quality at low bitrates. - In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, ISBN 978-1-4244-2353-8, (2009), pp. 1-4

http://dx.doi.org/10.1109/ICASSP.2009.4959505
Wabnik, Stefan; Schuller, Gerald; Krämer, Ferenc
An error robust ultra low delay audio coder using an MA prediction model. - In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, ISBN 978-1-4244-2353-8, (2009), pp. 5-8

http://dx.doi.org/10.1109/ICASSP.2009.4959506
Arnold, Mirko; Schuller, Gerald
A parametric instrument codec for very low bitrates. - In: 125th Audio Engineering Society convention 2008, (2008), pp. 427-433

Krämer, Ferenc; Schuller, Gerald
Graceful degradation for digital radio mondiale (DRM). - In: 125th Audio Engineering Society convention 2008, (2008), pp. 589-595

Friedrich, Tobias; Gruhne, Matthias; Schuller, Gerald
A fast feature extraction system on compressed audio data. - In: 124th Audio Engineering Society convention 2008, (2008), pp. 1383-1390

Friedrich, Tobias; Gruhne, Matthias; Schuller, Gerald
Subband conversion for feature extraction from compressed audio. - In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, ISBN 978-1-4244-1483-3, (2008), pp. 217-220

http://dx.doi.org/10.1109/ICASSP.2008.4517585
Gruhne, Matthias; Dittmar, Christian; Gärtner, Daniel; Schuller, Gerald
An evaluation of pre-processing algorithms for rhythmic pattern analysis. - In: 125th Audio Engineering Society convention 2008, (2008), pp. 581-588

Sreenivas, T.V.; Wabnik, Stefan; Schuller, Gerald
Reduced rate ultra low delay audio coder using multistage vector quantization. - In: Conference record of the Forty-First Asilomar Conference on Signals, Systems and Computers, 2007, ISBN 978-1-4244-2109-1, (2007), pp. 2080-2084

Communication applications are usually delay-restricted, especially in the case of musicians playing together over the Internet. This requires a one-way delay of at most 25 ms, and high audio quality is also desired at feasible bit rates. The ultra low delay (ULD) audio coding structure is well suited to this application, and we further investigate the use of multistage vector quantization (MSVQ) to reach bit rates below 64 kb/s in a scalable manner. Results at 32 kb/s and 64 kb/s show that the trained-codebook MSVQ performs best, better than KLT normalization followed by a simulated Gaussian MSVQ, or a simulated Gaussian MSVQ alone. The results also show that there is only a weak dependence on the training data, and that we indeed converge to the perceptual quality of our previous ULD coder at 64 kb/s.



https://doi.org/10.1109/ACSSC.2007.4487604
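
Multistage vector quantization, as used in the entry above, quantizes a vector in several passes, each pass coding the residual left by the previous one; decoding only the first stages gives a coarser reconstruction, which is what makes the bit rate scalable. Below is a minimal sketch with toy, hand-picked codebooks (our assumption); codebook training and the KLT normalization studied in the paper are not shown.

```python
def nearest(codebook, vector):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))


def msvq_encode(vector, stages):
    """Return one codeword index per stage; each stage codes the residual
    left by the stages before it."""
    residual = list(vector)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, stages):
    """Sum the selected codewords; passing fewer stages gives a coarser result."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out
```

For example, with stages = [[[0.0, 0.0], [4.0, 4.0]], [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]], the vector [5.0, 5.0] is coded as one index per stage; a decoder that receives only the first index reconstructs the coarser [4.0, 4.0].
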
Schnell, Markus; Geiger, Ralf; Schmidt, Markus; Multrus, Markus; Mellar, Michael; Herre, Jürgen; Schuller, Gerald
Low delay filterbanks for enhanced low delay audio coding. - In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2007, (2007), pp. 235-238

Low delay perceptual audio coding has recently gained wide acceptance for high quality communication. While common schemes are based on the well-known Modified Discrete Cosine Transform (MDCT) filterbank, this paper describes novel coding algorithms that, for the first time, make use of dedicated low delay filterbanks, thus achieving improved coding efficiency while maintaining or even reducing the low codec delay. The MPEG-4 Enhanced Low Delay AAC (AAC-ELD) coder currently under development within ISO/MPEG combines a traditional perceptual audio coding scheme with spectral band replication (SBR), both running in a delay-optimized fashion by using low delay filterbanks.



https://doi.org/10.1109/ASPAA.2007.4392985
Albert, Tobias; Schuller, Gerald; Wabnik, Stefan; Krämer, Ulrich; Hirschfeld, Jens
Comparison of stereo redundancy reduction schemes for an ultra low delay audio coder. - In: 122nd Audio Engineering Society convention 2007, (2007), pp. 1268-1275

Carôt, Alexander; Hirschfeld, Jens; Krämer, Ulrich; Schuller, Gerald; Wabnik, Stefan; Werner, Christian
Network music performance with ultra-low-delay audio coding under unreliable network conditions. - In: 123rd Audio Engineering Society convention 2007, (2007), pp. 338-348

Schnell, Markus; Geiger, Ralf; Schmidt, Markus; Jander, Manuel; Multrus, Markus; Schuller, Gerald; Herre, Jürgen
Enhanced MPEG-4 low delay AAC - low bitrate high quality communication. - In: 122nd Audio Engineering Society convention 2007, (2007), pp. 1211-1223

Friedrich, Tobias; Schuller, Gerald
Spectral band replication tool for very low delay audio coding applications. - In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2007, (2007), pp. 199-202

https://doi.org/10.1109/ASPAA.2007.4393014
Carôt, Alexander; Krämer, Ulrich; Schuller, Gerald
Network music performance (NMP) in narrow band networks. - In: 120th convention spring papers 2006, (2006), pp. 1959-1967

Hirschfeld, Jens; Krämer, Ulrich; Schuller, Gerald; Wabnik, Stefan
Reduced bit rate ultra low delay audio coding. - In: 120th convention spring papers 2006, (2006), pp. 1101-1107

Geiger, Ralf; Yokotani, Yoshikazu; Schuller, Gerald
Audio data hiding with high data rates based on IntMDCT. - In: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, ICASSP 2006, (2006), pp. V-205-V-208

https://doi.org/10.1109/ICASSP.2006.1661248
Wabnik, Stefan; Schuller, Gerald; Hirschfeld, Jens; Krämer, Ulrich
Different quantisation noise shaping methods for predictive audio coding. - In: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, ICASSP 2006, (2006), pp. V-185-V-188

https://doi.org/10.1109/ICASSP.2006.1661243
Wabnik, Stefan; Schuller, Gerald; Krämer, Ulrich; Hirschfeld, Jens
Frequency warping in low delay audio coding. - In: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2005), pp. III-181-III-184

https://doi.org/10.1109/ICASSP.2005.1415676
Wabnik, Stefan; Schuller, Gerald; Hirschfeld, Jens; Krämer, Ulrich
Packet loss concealment in predictive audio coding. - In: 2005 Workshop on Applications of Signal Processing to Audio and Acoustics proceedings (WASPAA), (2005), pp. 227-230

Klier, Juliane; Schuller, Gerald; Haardt, Martin; Hennhöfer, Marko
A new approach for channel equalization without guard interval using polyphase matrices. - In: PIMRC 2005, ISBN 978-3-8007-2909-8, (2005), 5 pp.

Yokotani, Yoshikazu; Oraintara, Soontorn; Geiger, Ralf; Schuller, Gerald; Rao, K. Ramamohan
A comparison of integer fast Fourier transforms for lossless coding. - In: IEEE International Symposium on Communications and Information Technology, 2004, (2004), pp. 1069-1073

The lifting scheme-based integer fast Fourier transform (IntFFT), an integer approximation of the FFT, is reversible. When it is used for lossless coding applications, the computational complexity and approximation error increase because trivial butterflies are realized by three lifting steps. Since the error appears as a "noise floor" that limits the lossless coding efficiency, it is desirable to reduce not only the computational complexity but also the noise floor level as much as possible. This survey presents two schemes to realize an improved IntFFT in terms of the number of arithmetic operations and the level of the noise floor. The first scheme is based on the use of two/three lifting step schemes with combined rounding operations, and the second is the multidimensional lifting (MDL) scheme. The improvement is shown by comparing the numbers of arithmetic and rounding operations needed to compute the IntFFT, and by comparing noise floor levels. In addition, an improvement in lossless coding efficiency due to the reduced noise floor can be predicted from the reduced estimated entropy of the IntFFT coefficients.



https://doi.org/10.1109/ISCIT.2004.1413884
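
The lifting idea behind the IntFFT (and the IntMDCT in the following entries) can be shown on a single Givens rotation, factored into three lifting steps with one rounding operation each: [[c, -s], [s, c]] = [[1, p], [0, 1]] [[1, 0], [s, 1]] [[1, p], [0, 1]] with p = (c - 1)/s. Because every step adds a rounded function of the other variable, it can be undone exactly, which is what makes the transform reversible. This sketch shows only the basic three-step scheme, not the improved schemes with combined rounding or multidimensional lifting described in the papers, and it assumes sin(theta) != 0.

```python
import math


def lifting_coefficients(theta):
    """p and u of the factorization
    [[c, -s], [s, c]] = [[1, p], [0, 1]] @ [[1, 0], [u, 1]] @ [[1, p], [0, 1]]
    with c = cos(theta), s = sin(theta), p = (c - 1) / s, u = s."""
    s, c = math.sin(theta), math.cos(theta)
    return (c - 1.0) / s, s


def int_rotate(x, y, theta):
    """Integer-to-integer approximation of the rotation
    (x, y) -> (c*x - s*y, s*x + c*y), one rounding per lifting step."""
    p, u = lifting_coefficients(theta)
    x = x + round(p * y)
    y = y + round(u * x)
    x = x + round(p * y)
    return x, y


def int_rotate_inverse(x, y, theta):
    """Undo int_rotate exactly by running the lifting steps in reverse."""
    p, u = lifting_coefficients(theta)
    x = x - round(p * y)
    y = y - round(u * x)
    x = x - round(p * y)
    return x, y


# Usage: rotate by 45 degrees and recover the input bit-exactly.
xr, yr = int_rotate(100, -37, math.pi / 4)
assert int_rotate_inverse(xr, yr, math.pi / 4) == (100, -37)
```

The three roundings are also the source of the "noise floor" discussed above: each adds up to half a unit of error relative to the exact floating-point rotation.
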
Yokotani, Yoshikazu; Oraintara, Soontorn; Geiger, Ralf; Schuller, Gerald; Rao, K. Ramamohan
Approximation error analysis for transform-based lossless audio coding. - In: IEEE Global Telecommunications Conference workshops, 2004, GlobeCom workshops 2004, (2004), pp. 595-599

The integer modified discrete cosine transform (IntMDCT), an integer approximation of the MDCT, is a reversible transform realized by the lifting scheme and thus is a useful transform for lossless audio coding. Because of the integer approximation, however, the approximation error appears as a "noise floor" in the transform domain and limits the lossless coding efficiency. In this paper, a theoretical analysis of the approximation error of the IntMDCT is discussed. The result is then used to design a simple test filter applied to each rounding operation of the IntMDCT in such a way that the error spectrum is shaped towards the low frequencies. As a result, especially when the spectral energy of an input signal is concentrated in the low frequency domain, the lossless coding efficiency is improved.



https://doi.org/10.1109/GLOCOM.2004.1378032
Geiger, Ralf; Yokotani, Yoshikazu; Schuller, Gerald; Herre, Jürgen
Improved integer transforms using multi-dimensional lifting. - In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2004), pp. II-1005-II-1008

Recently, lifting-based integer transforms have received much attention, especially in the area of lossless audio and image coding. The usual approach is to apply the lifting scheme to each Givens rotation. Especially for the long transform sizes used in audio coding applications, this leads to a considerable approximation error in the frequency domain. This paper presents a multidimensional lifting approach for reducing this approximation error. In this approach, large parts of the transform are calculated without rounding operations; only the output is rounded and added. The new approach is applied to and evaluated for both the integer modified discrete cosine transform (IntMDCT) and the integer fast Fourier transform (IntFFT).



https://doi.org/10.1109/ICASSP.2004.1326430
Yokotani, Yoshikazu; Geiger, Ralf; Schuller, Gerald; Oraintara, Soontorn; Rao, K. Ramamohan
Improved lossless audio coding using the noise-shaped IntMDCT. - In: 2004 IEEE 11th Digital Signal Processing Workshop, 2004 and the 3rd IEEE Signal Processing Education Workshop, (2004), pp. 356-360

This paper discusses approximation noise shaping to improve the efficiency of the integer modified discrete cosine transform (IntMDCT)-based lossless audio codec. The scheme is applied to the rounding operations associated with lifting steps to shape the noise spectrum towards the low frequency bands. Constraints on the noise shaping filter and a design procedure that satisfies these constraints are discussed. Several noise shaping filters are designed, and experimental results showing the improvement are presented.



https://doi.org/10.1109/DSPWS.2004.1437975
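
The noise shaping described in the two entries above can be illustrated with first-order error feedback on a plain sequence: each rounding error is fed into the next sample, so that the total error becomes e[n] - h*e[n-1]. With h = -1, the noise transfer function 1 + z^-1 has a zero at the Nyquist frequency, which pushes the rounding noise towards low frequencies. The filter order and coefficient are our assumptions; the papers apply shaping to the roundings inside the IntMDCT lifting steps and design the filter under constraints that are not reproduced here.

```python
def round_with_noise_shaping(values, h=-1.0):
    """Round each value to an integer with first-order error feedback.

    The total error is y[n] - x[n] = e[n] - h * e[n-1], where e[n] is the
    error of the n-th rounding. With h = -1 this is e[n] + e[n-1], whose
    spectrum vanishes at the Nyquist frequency (noise shaped to low frequencies).
    """
    out, prev_err = [], 0.0
    for x in values:
        v = x - h * prev_err      # pre-compensate with the previous rounding error
        y = round(v)
        prev_err = y - v          # error introduced by this rounding
        out.append(y)
    return out
```

Each output is still within one unit of its input, but the error is no longer white: its alternating-sign sum (the Nyquist component) telescopes to a single rounding error.
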
Gayer, Marc; Lutzky, Manfred; Schuller, Gerald; Krämer, Ulrich; Wabnik, Stefan
A guideline to audio codec delay. - In: Full set of convention papers presented at the 116th AES convention, (2004), Paper 6062

Hirschfeld, Jens; Klier, Juliane; Krämer, Ulrich; Schuller, Gerald; Wabnik, Stefan
Ultra low delay audio coding with constant bit rate. - In: Convention papers, 117th convention, (2004), Paper 6197

Geiger, Ralf; Yokotani, Yoshikazu; Schuller, Gerald
Improved integer transforms for lossless audio coding. - In: Conference record of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, (2003), pp. 2119-2123

Lifting-scheme-based integer transforms are very powerful tools for constructing lossless coding schemes. These transforms, such as the integer fast Fourier transform (IntFFT) and the integer modified discrete cosine transform (IntMDCT), are integer approximations of the original floating-point transforms, and hence there is an approximation error in the transform domain. This paper proposes structures for improved integer transforms in terms of approximation accuracy and computational efficiency. Experimental results show that clear improvements in both respects are achieved in lossless audio coding.



https://doi.org/10.1109/ACSSC.2003.1292354
Geiger, Ralf; Schuller, Gerald; Sporer, Thomas; Herre, Jürgen
Fine grain scalable perceptual and lossless audio coding based on IntMDCT. - In: 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics proceedings, (2003), p. 50, D-08

The paper presents an embedded fine grain audio coding scheme. The enabling technology for this combined perceptual and lossless audio coding approach is the integer modified discrete cosine transform (IntMDCT), which is an integer approximation of the MDCT based on the lifting scheme. It maintains the perfect reconstruction property and therefore enables efficient lossless coding in the frequency domain. The close approximation of the MDCT also allows a perceptual coding scheme to be built based on the IntMDCT. A bitsliced arithmetic coding technique is applied to the IntMDCT values. Together with the encoded shape of the masking threshold, a perceptually hierarchical bitstream is obtained, containing several stages of perceptual quality and extending to lossless operation when transmitted completely. A concept of encoding subslices is presented in order to obtain a fine adaptation to the masking threshold, especially in the range of perceptually transparent quality.



https://doi.org/10.1109/ASPAA.2003.1285813
Geiger, Ralf; Herre, Jürgen; Schuller, Gerald; Sporer, Thomas
Fine grain scalable perceptual and lossless audio coding based on IntMDCT. - In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2003), pp. V-445-V-448

This paper presents an embedded fine grain scalable perceptual and lossless audio coding scheme. The enabling technology for this combined perceptual and lossless audio coding approach is the integer modified discrete cosine transform (IntMDCT), which is an integer approximation of the MDCT based on the lifting scheme. It maintains the perfect reconstruction property and therefore enables efficient lossless coding in the frequency domain. The close approximation of the MDCT also allows us to build a perceptual coding scheme based on the IntMDCT. In this paper, a bitsliced arithmetic coding technique is applied to the IntMDCT values. Together with the encoded shape of the masking threshold, a perceptually hierarchical bitstream is obtained, containing several stages of perceptual quality and extending to lossless operation when transmitted completely. A concept of encoding subslices is presented in order to obtain a fine adaptation to the masking threshold, especially in the range of perceptually transparent quality.



https://doi.org/10.1109/ICASSP.2003.1200002
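
The bit-slicing step in the two entries above can be sketched as splitting integer magnitudes into bit planes, most significant first; each additional plane refines the values, and the full set reconstructs them losslessly. This sketch handles non-negative magnitudes only (signs would have to be coded separately) and leaves out the arithmetic coding and the masking-threshold-driven ordering described in the papers.

```python
def bit_planes(values, num_bits):
    """Split non-negative integer magnitudes into bit planes, MSB first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]


def from_bit_planes(planes, num_bits):
    """Rebuild magnitudes from the first len(planes) planes; missing lower
    planes are treated as zero, giving a coarser (lossy) approximation."""
    values = [0] * len(planes[0])
    for k, plane in enumerate(planes):
        shift = num_bits - 1 - k
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values
```

For example, with 4-bit magnitudes [5, 12, 3], keeping only the two most significant planes yields [4, 12, 0], while all four planes restore the original values exactly.
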
Geiger, Ralf; Schuller, Gerald; Herre, Jürgen; Sperschneider, Ralph; Sporer, Thomas
Scalable perceptual and lossless audio coding based on MPEG-4 AAC. - In: Convention paper presented at the 115th convention, (2003), Paper 5868

Schuller, Gerald; Härmä, Aki
Low delay audio compression using predictive coding. - In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2002), pp. II-1853-II-1856

A low delay audio coding scheme for communications applications is proposed. Its compression ratio is comparable to that of current state-of-the-art audio coding schemes, but with a much lower delay. The sources of delay in conventional audio coding are the filters used for subband coding and the block switching of the filter bank. Block switching leads to high peaks in bit rate, which necessitate a large bit-rate buffer to smooth the bit rate for a transmission channel. To avoid or reduce these delays, we replace subband coding by predictive coding, and the hard switching of the filter bank by soft switching of the predictors. The overall delay becomes 6 ms at a 32 kHz sampling rate. A subjective listening test at bit rates around 64 kb/s for mono signals shows that the new scheme has quality comparable to a conventional state-of-the-art coder (PAC).



https://doi.org/10.1109/ICASSP.2002.5744987
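
The following Python sketch illustrates, under stated assumptions, why predictive coding avoids block delay: a backward-adaptive NLMS predictor (an assumed choice, not necessarily the predictor used in the paper) is driven only by already reconstructed samples, so encoder and decoder stay in lockstep sample by sample without side information or lookahead. The psychoacoustic control and the soft switching of predictors are not modeled; the class name and the parameters order, mu and step are hypothetical.

```python
import numpy as np

class NLMSPredictor:
    """Backward-adaptive NLMS predictor driven only by reconstructed samples."""
    def __init__(self, order=8, mu=0.5, eps=1e-8):
        self.w = np.zeros(order)     # predictor coefficients
        self.hist = np.zeros(order)  # past reconstructed samples, newest first
        self.mu, self.eps = mu, eps

    def predict(self):
        return float(self.w @ self.hist)

    def update(self, e_hat, recon):
        # NLMS coefficient update using the quantized residual e_hat
        self.w += self.mu * e_hat * self.hist / (self.hist @ self.hist + self.eps)
        self.hist = np.roll(self.hist, 1)
        self.hist[0] = recon

def encode(x, step=0.01, **kw):
    p, indices = NLMSPredictor(**kw), []
    for sample in x:
        pred = p.predict()
        q = int(round((sample - pred) / step))  # uniform quantizer for the residual
        indices.append(q)
        p.update(q * step, pred + q * step)     # adapt exactly as the decoder will
    return indices

def decode(indices, step=0.01, **kw):
    p, out = NLMSPredictor(**kw), []
    for q in indices:
        pred = p.predict()
        out.append(pred + q * step)
        p.update(q * step, pred + q * step)
    return np.array(out)
```

Because the residual is quantized one sample at a time, the reconstruction error stays within half a quantizer step, and the prediction loop itself adds no block delay.
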
Geiger, Ralf; Schuller, Gerald
Integer low delay and MDCT filter banks. - In: Conference record of the Thirty-Sixth Asilomar Conference on Signals, Systems & Computers, (2002), S. 811-815

Recently, lifting-based integer approximations of filter banks have received much attention, especially in the field of image coding. This paper focuses on applying these techniques to cosine-modulated filter banks for audio coding, including not only the modified discrete cosine transform (MDCT) but also low delay filter banks. Applications of the integer filter banks include lossless audio coding and backward-compatible lossless enhancement of MDCT-based perceptual audio coding schemes, such as MPEG-2/4 AAC.



https://doi.org/10.1109/ACSSC.2002.1197291
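
Lifting-based integer transforms of this kind are commonly built from Givens rotations, each split into three lifting steps with rounding. The Python sketch below shows one such integer rotation and its exact inverse; it is an illustrative building block only, not the full IntMDCT or low delay filter bank from the paper, and the function names are hypothetical.

```python
import math

def int_rotate(x, y, angle):
    """Integer-to-integer approximation of a Givens rotation via three lifting steps.

    [[c, -s], [s, c]] = [[1, p], [0, 1]] @ [[1, 0], [s, 1]] @ [[1, p], [0, 1]]
    with c = cos(angle), s = sin(angle), p = (c - 1) / s (requires s != 0).
    Rounding inside each step keeps all values integer.
    """
    c, s = math.cos(angle), math.sin(angle)
    p = (c - 1.0) / s
    x = x + round(p * y)
    y = y + round(s * x)
    x = x + round(p * y)
    return x, y

def int_rotate_inverse(x, y, angle):
    """Undo int_rotate exactly by running the lifting steps backwards with subtraction."""
    c, s = math.cos(angle), math.sin(angle)
    p = (c - 1.0) / s
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y
```

Rounding makes the output only approximate the exact rotation, while the inverse recovers the input integers exactly, which is the property lossless coding relies on.
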
Weerackody, Vijitha; Schuller, Gerald; Lou, H.-L.
Streaming of multimedia with reduced start-up delay. - In: Where minds meet, (2001), S. 1038-1041

High-quality multimedia streaming applications over the Internet require very low packet loss rates. The Internet is characterized by long bursts of packet losses and delays. A large receive buffer can mitigate the effects of packet losses and delays; however, it also introduces a large delay before a packet is played back. This delay can be annoying at the start of a program or when switching to another channel in a multi-channel broadcast. We introduce a separate low-delay tuning stream to address this start-up problem. In the steady state, this tuning stream is appropriately synchronized with the high-delay steady-state stream to give an enhanced composite signal.



Dorward, Sean; Huang, Dawei; Savari, Serap A.; Schuller, Gerald; Yu, Bin
Low delay perceptually lossless coding of audio signals. - In: DCC 2001, (2001), S. 312-320

A novel predictive lossless coding scheme is proposed. The prediction is based on a new weighted cascaded least mean squared (WCLMS) method. To obtain both a high compression ratio and a very low encoding and decoding delay, the residuals from the prediction are encoded using either a variant of adaptive Huffman coding or a version of adaptive arithmetic coding. WCLMS is especially designed for music/speech signals. It can be used either in combination with psychoacoustically pre-filtered signals to obtain perceptually lossless coding, or as a stand-alone lossless coder. Experiments on a database of moderate size and a variety of pre-filtered mono signals show that the proposed lossless coder (which needs about 2 bit/sample for pre-filtered signals) outperforms competing lossless coders, such as ppmz, bzip2, Shorten, and LPAC, in terms of compression ratio. The combination of WCLMS with either of the adaptive coding schemes is also shown to achieve better compression ratios and lower delay than an earlier scheme combining WCLMS with Huffman coding over blocks of 4096 samples.



https://doi.org/10.1109/DCC.2001.917162
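
The cascade idea can be sketched in Python as follows. This is a simplified, assumed variant: two NLMS stages in series, each predicting the residual of the previous one, with integer-rounded predictions so that decoding is exactly lossless. The weighting that gives WCLMS its name and the entropy coder are omitted, and the stage orders, step sizes and helper names are hypothetical.

```python
import numpy as np

class IntNLMSStage:
    """One NLMS prediction stage that outputs integer predictions (hypothetical helper)."""
    def __init__(self, order, mu=0.2, eps=1.0):
        self.w = np.zeros(order)     # predictor coefficients
        self.hist = np.zeros(order)  # past inputs of this stage, newest first
        self.mu, self.eps = mu, eps

    def predict(self):
        # Rounding the prediction keeps every residual an integer, so coding stays lossless
        return int(round(float(self.w @ self.hist)))

    def update(self, value, pred):
        err = value - pred
        self.w += self.mu * err * self.hist / (self.hist @ self.hist + self.eps)
        self.hist = np.roll(self.hist, 1)
        self.hist[0] = value

def make_cascade():
    # Hypothetical configuration: a long first stage followed by a short second stage
    return [IntNLMSStage(16), IntNLMSStage(2)]

def encode(samples):
    stages, residuals = make_cascade(), []
    for v in samples:
        v = int(v)
        for st in stages:            # each stage predicts the previous stage's residual
            pred = st.predict()
            st.update(v, pred)
            v -= pred
        residuals.append(v)
    return residuals

def decode(residuals):
    stages, out = make_cascade(), []
    for v in residuals:
        preds = [st.predict() for st in stages]  # depend only on past values
        inputs = []
        for pred in reversed(preds):             # undo the stages, last one first
            v += pred
            inputs.append(v)
        inputs.reverse()                         # inputs[i] is what stage i saw
        for st, value, pred in zip(stages, inputs, preds):
            st.update(value, pred)
        out.append(v)
    return out
```

The decoder can undo the cascade because every prediction depends only on past values that both sides already know; the residual sequence returned by encode is what an adaptive Huffman or arithmetic coder would then compress.
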
Kokes, Mark G.; Gibson, Jerry D.; Schuller, Gerald
A wideband speech codec based on nonlinear approximation. - In: Conference record of the Thirty-Fifth Asilomar Conference on Signals, Systems & Computers, (2001), S. 1573-1577

In transform-based wideband speech coding, the modulated lapped biorthogonal transform (MLBT) can be used to improve the frequency selectivity of the synthesis basis functions. In this work, Campbell's coefficient rate and spectral entropy are used as a guide to develop band-combining strategies that produce an adaptive nonuniform modulated lapped biorthogonal transform (ANMLBT). Due to the nonuniform nature of the transform, psychoacoustic quantization noise shaping is accomplished by employing time-varying pre/post filters. The new codec uses a nonlinear approximation method to determine the best N basis functions that represent the current speech or audio frame. Preliminary coding results at an average rate of 16 kbit/s are demonstrated.



https://doi.org/10.1109/ACSSC.2001.987751
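
A nonlinear (best N-term) approximation keeps the N transform coefficients of largest magnitude instead of a fixed set of N. The Python sketch below illustrates this with an orthonormal DCT-II as a stand-in for the ANMLBT, which it is not; band combining, pre/post filtering and quantization are omitted, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are the basis functions."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)  # scale the DC row so every row has unit norm
    return c

def best_n_term(frame, n_keep):
    """Keep the n_keep largest-magnitude coefficients, zero the rest, and resynthesize."""
    C = dct_matrix(len(frame))
    coeffs = C @ frame
    keep = np.argsort(np.abs(coeffs))[::-1][:n_keep]
    sparse = np.zeros_like(coeffs)
    sparse[keep] = coeffs[keep]
    return C.T @ sparse, coeffs, sparse
```

Because the basis is orthonormal, the squared approximation error equals the energy of the discarded coefficients, so keeping the largest ones minimizes the error for a given N.
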
Karp, Tanja; Schuller, Gerald
Joint transmitter/receiver design for multicarrier data transmission with low latency time. - In: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2001), S. 2401-2404

A design method for low latency multicarrier transmission is presented. It can be considered a generalization of the trailing-zeros transmitter approach (Scaglione et al., 1999). The generalization mainly consists of using FIR redundant filter banks for the transmitter and receiver instead of pure block transforms, and of choosing the guard interval independently of the channel impulse response length. Thanks to the latter, we can design a multicarrier transmission system with low latency, a critical parameter for online applications, even when the channel has a long impulse response, e.g., a twisted-pair copper wire line several miles long. The design of the transmitter and receiver is based on a Smith decomposition of the channel. The advantages as well as the limitations of the new algorithm are discussed.



https://doi.org/10.1109/ICASSP.2001.940484
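
For context, the trailing-zeros baseline that the paper generalizes can be sketched in Python: each block of m symbols is followed by a guard of len(h) - 1 zeros, so consecutive blocks do not interfere after the FIR channel h, and each block is recovered by least squares from its own received samples. This is only the baseline, with a guard tied to the channel length, not the paper's filter-bank design; the channel taps, block size and function names are illustrative assumptions.

```python
import numpy as np

def convolution_matrix(h, m):
    """Tall (m + len(h) - 1) x m Toeplitz matrix with H @ s == np.convolve(h, s)."""
    H = np.zeros((m + len(h) - 1, m))
    for j in range(m):
        H[j:j + len(h), j] = h
    return H

def zp_transmit(blocks, h):
    """Append len(h) - 1 trailing zeros to each block, then pass the stream through channel h."""
    guard = len(h) - 1
    stream = np.concatenate([np.concatenate([b, np.zeros(guard)]) for b in blocks])
    return np.convolve(stream, h)[:len(stream)]

def zp_receive(rx, h, m):
    """Recover each block by least squares from its m + guard received samples."""
    guard = len(h) - 1
    H = convolution_matrix(h, m)
    out = []
    for start in range(0, len(rx), m + guard):
        y = rx[start:start + m + guard]
        s_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
        out.append(s_hat)
    return out
```

Here the guard, and with it the overhead and the number of samples the receiver must wait for per block, grows with the channel impulse response; choosing the guard independently of the channel length is one of the generalizations the abstract describes.
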
Schuller, Gerald; Yu, Bin; Huang, Dawei
Lossless coding of audio signals using cascaded prediction. - In: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2001), S. 3273-3276

A novel predictive lossless coding scheme is proposed. The prediction is based on a new weighted cascaded least mean squared (WCLMS) method. WCLMS is especially designed for music/speech signals. It can be used either in combination with psychoacoustically pre-filtered signals to obtain perceptually lossless coding, or as a stand-alone lossless coder. Experiments on a database of moderate size and a variety of pre-filtered mono signals show that the proposed lossless coder (which needs about 2 bit/sample for pre-filtered signals) outperforms competing lossless coders, such as WaveZip, Shorten, LTAC, and LPAC, in terms of compression ratio.



https://doi.org/10.1109/ICASSP.2001.940357
Edler, Bernd; Schuller, Gerald
Audio coding using a psychoacoustic pre- and post-filter. - In: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2000), S. 881-884

A novel concept for perceptual audio coding is presented, based on the combination of a pre- and post-filter, controlled by a psychoacoustic model, with a transform coding scheme. This paradigm allows the temporal and spectral shape of the masked threshold to be modeled with a resolution independent of the transform used. By using frequency warping techniques, the maximum possible detail for a given filter order can be made frequency-dependent and thus better adapted to the human auditory system. The filter coefficients are represented efficiently by LSF parameters, which can be adaptively interpolated over time. First experiments with a system obtained by extending an existing transform codec showed that this approach can significantly improve the performance for speech signals, while the performance for other signals remained the same.



https://doi.org/10.1109/ICASSP.2000.859101
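
Frequency warping is often realized by replacing each unit delay of a filter with a first-order allpass section; whether the paper uses exactly this form is an assumption here. The Python sketch below computes that allpass response and the resulting frequency mapping; the warping coefficient values are arbitrary examples, and the function names are hypothetical.

```python
import numpy as np

def allpass_response(omega, lam):
    """Frequency response of the first-order allpass (z^-1 - lam) / (1 - lam z^-1)."""
    z_inv = np.exp(-1j * np.asarray(omega, dtype=float))
    return (z_inv - lam) / (1.0 - lam * z_inv)

def warped_frequency(omega, lam):
    """Frequency omega_w with allpass_response(omega, lam) == exp(-1j * omega_w).

    If every unit delay of a filter is replaced by the allpass, the warped filter
    at frequency omega behaves like the original filter at omega_w (|lam| < 1).
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(lam * np.sin(omega), 1.0 - lam * np.cos(omega))
```

For example, with lam = 0.6 the mapping returns values above omega everywhere in (0, pi), so features of the original filter are pulled toward lower frequencies; this is how a fixed filter order can devote more detail to the low-frequency range.
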
Edler, Bernd; Faller, Christof; Schuller, Gerald
Perceptual audio coding using a time-varying linear pre- and post-filter. - In: 109th convention, (2000), Paper 5274

Doser, Adele B.; Schuller, Gerald
Time/frequency techniques for signal feature detection. - In: Conference record of the Thirty-Third Asilomar Conference on Signals, Systems & Computers, (1999), S. 452-456

https://doi.org/10.1109/ACSSC.1999.832370
Schuller, Gerald; Sweldens, Wim
Filter bank design using nilpotent matrices. - In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1999, (1999), S. 51-54

https://doi.org/10.1109/ASPAA.1999.810847
Schuller, Gerald; Sweldens, Wim
Modulated filter bank design with nilpotent matrices. - In: Wavelet applications in signal and image processing VII, (1999), S. 284-294

Schuller, Gerald; Karp, Tanja
Causal FIR filter banks with arbitrary system delay. - In: 1998 IEEE DSP Workshop, Bryce Canyon, Utah, August 9 - 12, 1998, (1998), insges. 4 S.

A design method for causal biorthogonal perfect reconstruction (PR) FIR M-band filter banks is described, which allows explicit control over the system delay, independent of the filter length, with the lowest possible delay equal to the blocking delay of M-1 samples. The design method is very general: it can be applied to non-uniform filter banks and also treats uniform modulated filter banks as a special case.



Karp, Tanja; Mertins, Alfred; Schuller, Gerald
Recent trends in the design of biorthogonal modulated filter banks. - In: Transforms and filter banks, (1998), S. 315-335

Biorthogonal modulated filter banks, compared to paraunitary ones, offer the advantage that the overall system delay can be chosen independently of the filter length, resulting in low delay filter banks. They have recently been studied by several authors. In this paper, we connect different design methods (quadratically constrained least-squares optimization, cascade of sparse self-inverse matrices) and describe the advantages of a factorization into Zero-Delay and Maximum-Delay matrices (structure-inherent perfect reconstruction, no DC leakage of the filter bank, low implementation cost).
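
The zero-delay idea can be illustrated with a two-band polyphase sketch in Python, assuming a factor of the form I + z^-1 N with a nilpotent matrix N (N @ N == 0): its inverse is the causal FIR factor I - z^-1 N, so perfect reconstruction follows from the structure rather than from optimization. This is an illustrative assumption in the spirit of the nilpotent-matrix designs listed in this bibliography, not the exact Zero-Delay or Maximum-Delay matrices of the paper; T, N and the function names are hypothetical.

```python
import numpy as np

def analysis(x, T, N):
    """Two-band polyphase analysis: y[n] = T (v[n] + N v[n-1]), v[n] = [x[2n], x[2n+1]]."""
    v = x.reshape(-1, 2)                           # polyphase blocks as rows (len(x) must be even)
    v_prev = np.vstack([np.zeros((1, 2)), v[:-1]])  # v[n-1], zero before the first block
    return (v + v_prev @ N.T) @ T.T

def synthesis(y, T, N):
    """Inverse: apply T^-1, then the causal FIR factor (I - z^-1 N); exact because N @ N == 0."""
    u = y @ np.linalg.inv(T).T
    u_prev = np.vstack([np.zeros((1, 2)), u[:-1]])
    return (u - u_prev @ N.T).reshape(-1)

N = np.array([[0.0, 0.7], [0.0, 0.0]])                  # strictly upper triangular, so N @ N == 0
T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # any invertible constant matrix works
```

The analysis filters here span two blocks (four taps), yet reconstruction needs no delay beyond the blocking delay, which illustrates how system delay and filter length can be decoupled.
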



Schuller, Gerald;
Time-varying filter banks with low delay for audio coding. - In: 105th convention, (1998), Paper 4809

Schuller, Gerald;
Time-varying filter banks with variable system delay. - In: 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, (1997), S. 2469-2472

A new filter structure and design method for time-varying cosine modulated FIR filter banks with critical sampling, perfect reconstruction, and an efficient implementation is presented. The proposed filter banks have an arbitrary system delay, which can be chosen in the design process independently of the (also arbitrary) filter length, making a low system delay possible. The time variation includes changing the number of bands and/or filters during signal processing while maintaining critical sampling and perfect reconstruction. The transition windows can overlap, which improves the frequency responses. The structure is based on a factorization of the polyphase matrices into a cascade of two types of simple matrices.



https://doi.org/10.1109/ICASSP.1997.599577
Schuller, Gerald;
A new factorization and structure for cosine modulated filter banks with variable system delay. - In: Conference record of the Thirtieth Asilomar Conference on Signals, Systems & Computers, (1997), S. 1310-1314

A new design method for biorthogonal modulated filter banks is presented. It is based on a cascade of simple matrices and has some properties that have not been reported before. It can represent filter banks with arbitrary overall system delay and filter length; moreover, it is shown that almost all cosine modulated filter banks can be described by this structure and that it leads to a more efficient implementation than previous structures. Imposing certain symmetries on the matrices makes it possible to design low delay filter banks with identical (except for the sign) baseband impulse responses for the analysis and synthesis filter banks.



https://doi.org/10.1109/ACSSC.1996.599159
Schuller, Gerald; Smith, Mark J. T.
A new algorithm for efficient low delay filter bank design. - In: ICASSP '95, (1995), S. 1472-1475

Historically, exact reconstruction FIR filter banks have had system delays of L-1, where L is the length of the analysis and synthesis filters. Recently it was shown that the system delay could be made less than L-1, which is attractive in applications like speech coding where excessive delays are annoying. In this paper, a formulation and new design algorithm are introduced for two-band low-delay filter banks. The formulation is related to that of two-band lattice filter banks and provides a broad range of design flexibility within a compact framework. Both exact reconstruction and specified system delay are guaranteed by the structure of the framework.



https://doi.org/10.1109/ICASSP.1995.480562
Schuller, Gerald;
A low-delay filter bank for audio coding with reduced pre-echoes. - In: 99th convention, (1995), Paper 4088

Schuller, Gerald; Smith, Mark J. T.
Efficient low delay filter banks. - In: 1994 Sixth IEEE Digital Signal Processing Workshop, (1994), S. 231-234

The paper treats the problem of designing efficient FIR analysis-synthesis filter banks with system delays that can be pre-specified. A framework is introduced that consists of a cascade of several distinctive matrices with invertibility properties. The explicit form of the matrices guarantees computational efficiency and exact reconstruction, and allows control over the system delay.



https://doi.org/10.1109/DSP.1994.379834