Implementation and Assessment of a Machine Learning based Psychoacoustic Model for Perceptual Audio Codecs

Topic

Although traditional auditory masking models have proven very successful for controlling waveform-preserving audio encoders, they are known to be unsuitable for steering coding tools that do not preserve the original waveform, e.g. parametric or semi-parametric coding techniques such as bandwidth extension. Therefore, an improved excitation-based psychoacoustic model may be used to control the parametrization of non-waveform-preserving coding techniques. From such a model, so-called Internal Difference Representations (IDR) are obtained for each available encoding option. The IDRs have been shown to provide a metric that estimates the level of perceptual distortion created by applying the corresponding parametric encoding option. [see: Disch, S. et al., "Improved Psychoacoustic Model for Efficient Perceptual Audio Codecs", 145th Audio Engineering Society Convention, 2018]

Currently, to control the final encoding process, the parametric encoding option that leads to the minimal absolute IDR has to be determined by evaluating the model for every option. This 'brute force approach' is computationally too demanding for practical applications. This master thesis is directed towards the development of a realistic encoder application in which a neural network (DNN, CNN) learns the model output (the coding parameter) and substitutes it in practice at a fraction of the current computational cost.

The work includes the choice and implementation of a suitable machine learning topology to substitute the analytic psychoacoustic model (and the decision making) in a codec. It also includes the automatic annotation of audio material by means of the psychoacoustic model, for proper training and evaluation of the DNN. Finally, the perceptual quality achieved by the encoder should be assessed through systematic listening tests.

Successful completion of the task requires programming skills in Python, Matlab and C. A decent knowledge of digital signal processing techniques is mandatory.
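The brute-force selection and the automatic annotation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `select_option`, `annotate` and `toy_idr` are hypothetical, and `toy_idr` is a placeholder that merely stands in for the psychoacoustic model, whose actual IDR computation is not specified here.

```python
import numpy as np


def select_option(idr_per_option):
    """Return the index of the encoding option with the minimal absolute IDR."""
    idr = np.asarray(idr_per_option, dtype=float)
    return int(np.argmin(np.abs(idr)))


def annotate(frames, options, idr_fn):
    """Label every frame with its brute-force choice.

    The resulting labels could serve as training targets for a network
    that is meant to predict the coding parameter directly.
    """
    labels = []
    for frame in frames:
        # One model evaluation per option: this is the costly part.
        idrs = [idr_fn(frame, option) for option in options]
        labels.append(select_option(idrs))
    return np.array(labels, dtype=int)


def toy_idr(frame, option):
    # Hypothetical placeholder, NOT the psychoacoustic model: it only
    # produces a signed number so that the selection logic can be run.
    return float(np.mean(frame)) - option


if __name__ == "__main__":
    options = [0.0, 1.0, 2.0, 3.0]                 # assumed parameter candidates
    frames = [np.full(4, 0.9), np.full(4, 2.2)]    # stand-in "audio frames"
    print(annotate(frames, options, toy_idr))
```

The cost of `annotate` grows with the number of frames times the number of options, each requiring a full model evaluation, which is the overhead a trained network would be expected to avoid at encoding time.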

Supervision:

Prof. Gerald Schuller (TU Ilmenau), Dr.-Ing. Sascha Disch, Dipl.-Math. Andreas Niedermeier (Fraunhofer IIS)
