Conference papers

Gerald Schuller: "Ultra Low Delay Audio Source Separation Using Zeroth-Order Optimization"
2023 IEEE Statistical Signal Processing Workshop (SSP) (02-05 July 2023), Hanoi, Vietnam.



Geiger, Ralf; Schuller, Gerald; Herre, Jürgen; Sperschneider, Ralph; Sporer, Thomas
Scalable perceptual and lossless audio coding based on MPEG-4 AAC. - In: Convention paper presented at the 115th convention, (2003), Paper 5868

Schuller, Gerald; Härmä, Aki
Low delay audio compression using predictive coding. - In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2002), pp. II-1853-II-1856

A low delay audio coding scheme for communications applications is proposed. Its compression ratio is comparable to current state-of-the-art audio coding schemes, but with a much lower delay. The sources of delay in conventional audio coding are the filters for the subband coding and the block switching of the filter bank. The block switching leads to high peaks in bit rate, which necessitate a large bit-rate buffer to smooth the bit rate for a transmission channel. To avoid or reduce these delays, we replace the subband coding by predictive coding, and the hard switching of the filter bank by soft switching of the predictors. The overall delay becomes 6 ms at 32 kHz sampling rate. A subjective listening test with bit rates around 64 kb/s for mono signals shows that the new scheme has a comparable quality to a conventional state-of-the-art coder (PAC).



https://doi.org/10.1109/ICASSP.2002.5744987

Geiger, Ralf; Schuller, Gerald
Integer low delay and MDCT filter banks. - In: Conference record of the Thirty-Sixth Asilomar Conference on Signals, Systems & Computers, (2002), pp. 811-815

Recently, lifting-based integer approximations of filter banks have received much attention, especially in the field of image coding. This paper focuses on the application of these techniques to cosine modulated filter banks for audio coding, including not only the modified discrete cosine transform (MDCT) but also low delay filter banks. Applications of the integer filter banks include lossless audio coding and backward compatible lossless enhancement of MDCT-based perceptual audio coding schemes, such as MPEG-2/4 AAC.
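
As a rough illustration of the lifting idea mentioned in this abstract, the sketch below approximates a plane (Givens) rotation, a common building block of cosine modulated filter banks, by three lifting steps with rounding. It is a minimal sketch of the general technique, not the filter bank structure of the paper; the function names and the choice of rotation angle are assumptions made for illustration.

```python
import math

def lift_rotate(x, y, angle):
    """Integer-to-integer approximation of a rotation by `angle`.

    Uses the factorization of a rotation matrix into three lifting
    (shear) steps; each step adds a rounded value, so it can be undone
    exactly. `angle` must not be a multiple of pi (sin(angle) != 0).
    """
    p = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x += round(p * y)  # first lifting step
    y += round(s * x)  # second lifting step
    x += round(p * y)  # third lifting step
    return x, y

def lift_rotate_inverse(x, y, angle):
    """Exact inverse: undo the lifting steps in reverse order."""
    p = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y
```

Because every step only adds a rounded function of the other channel, the mapping is invertible on integers regardless of the rounding error, which is the property that makes lossless coding possible.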



https://doi.org/10.1109/ACSSC.2002.1197291

Weerackody, Vijitha; Schuller, Gerald; Lou, H.-L.
Streaming of multimedia with reduced start-up delay. - In: Where minds meet, (2001), pp. 1038-1041

High quality multimedia streaming applications over the Internet require very low packet loss rates. The Internet is characterized by long bursts of packet losses and delays. A large receive buffer can be used to mitigate the effects of packet losses and delays. However, a large receive buffer introduces a large delay in the playback of a packet. This large delay could be annoying at the start of a program or during switch over to another channel in a multi-channel broadcast. We introduce a separate low-delay tuning stream to address this start-up problem. In the steady state, this tuning stream is synchronized appropriately with the high-delay steady state stream to give an enhanced composite signal.



Dorward, Sean; Huang, Dawei; Savari, Serap A.; Schuller, Gerald; Yu, Bin
Low delay perceptually lossless coding of audio signals. - In: DCC 2001, (2001), pp. 312-320

A novel predictive lossless coding scheme is proposed. The prediction is based on a new weighted cascaded least mean squared (WCLMS) method. To obtain both a high compression ratio and a very low encoding and decoding delay, the residuals from the prediction are encoded using either a variant of adaptive Huffman coding or a version of adaptive arithmetic coding. WCLMS is especially designed for music/speech signals. It can be used either in combination with psycho-acoustically pre-filtered signals to obtain perceptually lossless coding, or as a stand-alone lossless coder. Experiments on a database of moderate size and a variety of pre-filtered mono-signals show that the proposed lossless coder (which needs about 2 bit/sample for pre-filtered signals) outperforms competing lossless coders, such as ppmz, bzip2, Shorten, and LPAC, in terms of compression ratios. The combination of WCLMS with either of the adaptive coding schemes is also shown to achieve better compression ratios and lower delay than an earlier scheme combining WCLMS with Huffman coding over blocks of 4096 samples.



https://doi.org/10.1109/DCC.2001.917162

Kokes, Mark G.; Gibson, Jerry D.; Schuller, Gerald
A wideband speech codec based on nonlinear approximation. - In: Conference record of the Thirty-Fifth Asilomar Conference on Signals, Systems & Computers, (2001), pp. 1573-1577

In transform-based wideband speech coding, the modulated lapped biorthogonal transform (MLBT) can be used to improve the frequency selectivity of the synthesis basis functions. In this work, Campbell's coefficient rate and spectral entropy are used as a guide to develop band combining strategies to produce an adaptive nonuniform modulated lapped biorthogonal transform (ANMLBT). Due to the nonuniform nature of the transform, psychoacoustic quantization noise shaping is accomplished by employing time-varying pre/post filters. This new codec consists of a nonlinear approximation method to determine the best N basis functions that represent the current speech or audio frame. Preliminary coding results at an average rate of 16 kbit/s are demonstrated.



https://doi.org/10.1109/ACSSC.2001.987751

Karp, Tanja; Schuller, Gerald
Joint transmitter/receiver design for multicarrier data transmission with low latency time. - In: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2001), pp. 2401-2404

A design method for low latency multicarrier transmission is presented. It can be considered as a generalization of the trailing-zeros transmitter approach (Scaglione et al., 1999). The generalization mainly consists of using FIR redundant filter banks for the transmitter and receiver instead of pure block transforms and choosing the guard interval independently of the channel impulse response length. Thanks to the latter, we can design a multicarrier transmission system with a low latency time, which is a critical parameter for online applications, even when the channel has a long impulse response, as is the case, e.g., for a twisted-pair copper wire line several miles in length. The design of the transmitter and receiver is based on a Smith decomposition of the channel. The advantages as well as limitations of the new algorithm are discussed.



https://doi.org/10.1109/ICASSP.2001.940484

Schuller, Gerald; Yu, Bin; Huang, Dawei
Lossless coding of audio signals using cascaded prediction. - In: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2001), pp. 3273-3276

A novel predictive lossless coding scheme is proposed. The prediction is based on a new weighted cascaded least mean squared (WCLMS) method. WCLMS is especially designed for music/speech signals. It can be used either in combination with psycho-acoustically pre-filtered signals to obtain perceptually lossless coding, or as a stand-alone lossless coder. Experiments on a database of moderate size and a variety of pre-filtered mono-signals show that the proposed lossless coder (which needs about 2 bit/sample for pre-filtered signals) outperforms competing lossless coders, WaveZip, Shorten, LTAC and LPAC, in terms of compression ratios.
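
To make the notion of predictive lossless coding concrete, the sketch below uses a single normalized LMS (NLMS) predictor whose rounded prediction is subtracted from each integer sample. This is a simplified stand-in, not the weighted cascaded scheme (WCLMS) of the paper: the class and function names, the predictor order, and the step size are all assumptions chosen for illustration.

```python
class NlmsPredictor:
    """Backward-adaptive NLMS predictor (illustrative, single stage)."""

    def __init__(self, order=4, mu=0.5):
        self.w = [0.0] * order   # filter weights
        self.hist = [0] * order  # most recent past samples, newest first
        self.mu = mu

    def predict(self):
        return sum(wi * hi for wi, hi in zip(self.w, self.hist))

    def update(self, x, pred):
        # Normalized LMS weight update using only past samples and x,
        # so encoder and decoder stay in lockstep.
        norm = 1.0 + sum(h * h for h in self.hist)
        err = x - pred
        self.w = [wi + self.mu * err * hi / norm
                  for wi, hi in zip(self.w, self.hist)]
        self.hist = [x] + self.hist[:-1]

def encode(samples):
    """Map integer samples to integer prediction residuals."""
    p, residuals = NlmsPredictor(), []
    for x in samples:
        pred = p.predict()
        residuals.append(x - round(pred))
        p.update(x, pred)
    return residuals

def decode(residuals):
    """Reconstruct the original integer samples exactly."""
    p, samples = NlmsPredictor(), []
    for e in residuals:
        pred = p.predict()
        x = e + round(pred)
        samples.append(x)
        p.update(x, pred)
    return samples
```

The predictor adapts only from already decoded samples, so no side information is transmitted and no block buffering is needed. In a complete coder the residuals would then be passed to an entropy coder.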



https://doi.org/10.1109/ICASSP.2001.940357

Edler, Bernd; Schuller, Gerald
Audio coding using a psychoacoustic pre- and post-filter. - In: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, (2000), pp. 881-884

A novel concept for perceptual audio coding is presented which is based on the combination of a pre- and post-filter, controlled by a psychoacoustic model, with a transform coding scheme. This paradigm allows modeling of the temporal and spectral shape of the masked threshold with a resolution independent of the transform used. By using frequency warping techniques, the maximum possible detail for a given filter order can be made frequency-dependent and thus better adapted to the human auditory system. The filter coefficients are represented efficiently by LSF parameters which can be adaptively interpolated over time. First experiments with a system obtained by extending an existing transform codec showed that this approach can significantly improve the performance for speech signals, while the performance for other signals remained the same.



https://doi.org/10.1109/ICASSP.2000.859101

Edler, Bernd; Faller, Christof; Schuller, Gerald
Perceptual audio coding using a time-varying linear pre- and post-filter. - In: 109th convention, (2000), Paper 5274