Publications of the Research Group

Below is an automatically generated compilation of the research group's publications. Publications by individual staff members are listed on their personal pages.

Publication list

Number of results: 286
Generated: Thu, 16 May 2024 23:03:40 +0200 in 0.0903 sec


Garí, Sebastià V. Amengual; Hassager, Henrik G.; Klein, Florian; Arend, Johannes M.; Robinson, Philip W.
Towards determining thresholds for room divergence: a pilot study on perceived externalization. - In: 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA), (2021), 7 pp. in total

https://doi.org/10.1109/I3DA48870.2021.9610835
Klein, Florian; Garí, Sebastià V. Amengual; Arend, Johannes M.; Robinson, Philip W.
Towards determining thresholds for room divergence: a pilot study on detection thresholds. - In: 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA), (2021), 7 pp. in total

In binaural rendering, the room divergence effect refers to the decrease in perceived externalization due to a mismatch between the room acoustics of the virtual sounds and those of the listening space. However, it is currently unknown which specific acoustic differences cause this effect. In this work, we present a pilot study to determine detection thresholds between sound sources recorded under different acoustic conditions in a variable acoustics room. These results are intended to predict situations where divergence effects can be expected. The participants had to perform a triangle test where they could listen to three sound sources placed at different positions in the room. The test design was motivated by the fact that sound sources are not placed at the same position in real acoustic scenes. One sound source was recorded under different acoustic conditions than the other two, and the task for the participant was to detect the differing source. The test was conducted in the measured room using 3 DoF binaural reproduction and using a virtual reality (VR) headset to display a visual 360 capture of the room enabling the subjects to see the positions of the sources in the room. Detection rates are signal-dependent and increase with differences in reverberation time (RT). For the most critical signal in the test (castanets), an RT difference of 8% was detectable, while the difference was 15% across all conditions. Furthermore, we discuss the influence of sound source distance and absorption configuration (symmetric or asymmetric) on detection thresholds.
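
As a rough illustration of how a detection result from such a triangle test might be checked against guessing, the sketch below computes a one-sided binomial tail probability. It assumes a chance rate of 1/3 (one differing source among three) and a 5% significance level; the function names and the choice of test are illustrative assumptions, not the analysis used in the paper.

```python
from math import comb


def binomial_tail(n_trials: int, n_correct: int, p_chance: float = 1 / 3) -> float:
    """Probability of at least n_correct hits in n_trials under pure guessing."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def min_correct_for_significance(n_trials: int, alpha: float = 0.05,
                                 p_chance: float = 1 / 3):
    """Smallest number of correct answers whose tail probability is <= alpha.

    Returns None if even a perfect score cannot reach the chosen level.
    """
    for k in range(n_trials + 1):
        if binomial_tail(n_trials, k, p_chance) <= alpha:
            return k
    return None
```

With only three trials, for instance, all three answers would have to be correct before guessing becomes an unlikely explanation at the 5% level.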



https://doi.org/10.1109/I3DA48870.2021.9610876
Klein, Florian
Auditive Adaptationsprozesse im Kontext räumlicher Audiowiedergabesysteme. - Ilmenau : Universitätsbibliothek, 2021. - 1 online resource (ii, 145 pages)
Technische Universität Ilmenau, doctoral dissertation, 2021

The goal of technical advances in consumer electronics is to optimize the user experience through continuous improvement of audiovisual reproduction. Advances in virtual and augmented reality have brought the goal of lifelike reproduction ever closer. When sensory stimuli are imitated so perfectly that the user can no longer distinguish artificially generated sound sources from real ones, this is referred to as an auditory illusion. The associated challenges are primarily technical in nature. However, an exact reproduction of the ear signals does not necessarily lead to the same perception as in the corresponding real situation. Besides cross-modal interactions, this is because our perception depends strongly on our expectations and experiences. These expectations can change depending on prior sound exposure. With regard to spatial hearing, this means that people can probably learn how to interpret spatial signals and their features. Such mechanisms and their effects on the perceived quality of spatial audio reproduction systems are the subject of this thesis. Perceptual studies investigated the learning of localization cues and examined adaptation processes in room-acoustic perception more closely. The thesis considers which quality deficits are to be expected when the ear signals are not reproduced correctly and how quality ratings change depending on training. The results indicate that learning and adaptation processes are a decisive factor in the emergence of an auditory illusion. The thesis discusses both the practical relevance of these effects and the underlying learning and adaptation processes.



https://doi.org/10.22032/dbt.50107
Grollmisch, Sascha; Cano, Estefanía
Improving semi-supervised learning for audio classification with FixMatch. - In: Electronics, ISSN 2079-9292, vol. 10 (2021), no. 15, art. 1807, 20 pp. in total

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This is largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.
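
The core of how FixMatch uses unlabeled data can be sketched as follows: a prediction on a weakly augmented example becomes a pseudo-label only if its confidence reaches a threshold, and the prediction on a strongly augmented version of the same example is then penalized with cross-entropy against that pseudo-label. This is a minimal plain-Python sketch of that idea, not the implementation evaluated in the article; the function name, the list-based inputs, and the 0.95 threshold are assumptions made for illustration.

```python
from math import log


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Confidence-masked cross-entropy over a batch of unlabeled examples.

    weak_probs / strong_probs: per-example class probability lists predicted
    for the weakly and strongly augmented view of the same example.
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for p_weak, p_strong in zip(weak_probs, strong_probs):
        confidence = max(p_weak)
        if confidence >= threshold:
            # Hard pseudo-label taken from the weakly augmented view.
            pseudo_label = p_weak.index(confidence)
            # Clamp to avoid log(0) for degenerate predictions.
            total += -log(max(p_strong[pseudo_label], 1e-12))
    # Low-confidence examples contribute zero but still count in the mean.
    return total / len(weak_probs)
```

Examples whose weak-view confidence stays below the threshold are simply ignored for that training step, which is why the choice of augmentations matters so much for this family of methods.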



https://doi.org/10.3390/electronics10151807
Arend, Johannes M.; Garí, Sebastià V. Amengual; Schissler, Carl; Klein, Florian; Robinson, Philip W.
Six-degrees-of-freedom parametric spatial audio based on one monaural room impulse response. - In: Journal of the Audio Engineering Society, ISSN 0004-7554, vol. 69 (2021), no. 7/8, pp. 557-575

Parametric spatial audio rendering is a popular approach for low computing capacity applications, such as augmented reality systems. However, most methods rely on spatial room impulse responses (SRIR) for sound field rendering with 3 degrees of freedom (DoF), i.e., for arbitrary head orientations of the listener, and often require multiple SRIRs for 6-DoF rendering, i.e., when additionally considering listener translations. This paper presents a method for parametric spatial audio rendering with 6 DoF based on one monaural room impulse response (RIR). The scalable and perceptually motivated encoding results in a parametric description of the spatial sound field for any listener's head orientation or position in space. These parameters form the basis for the binaural room impulse responses (BRIR) synthesis algorithm presented in this paper. The physical evaluation revealed good performance, with differences to reference measurements at most tested positions in a room below the just-noticeable differences of various acoustic parameters. The paper further describes the implementation of a 6-DoF real-time virtual acoustic environment (VAE) using the synthesized BRIRs. A pilot study assessing the plausibility of the 6-DoF VAE showed that the system can provide a plausible binaural reproduction, but it also revealed challenges of 6-DoF rendering requiring further research.



https://doi.org/10.17743/jaes.2021.0009
Grollmisch, Sascha; Cano, Estefanía; Mora Ángel, Fernando; López Gil, Gustavo
Ensemble size classification in Colombian Andean string music recordings. - In: Perception, representations, image, sound, music, (2021), pp. 60-74

Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. With the advancement of machine learning techniques, new possibilities for information retrieval in scenarios where ground-truth data is scarce are now available. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, as well as pre-processing steps and data augmentation techniques were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.



Werner, Stephan; Klein, Florian; Neidhardt, Annika; Sloma, Ulrike; Schneiderwind, Christian; Brandenburg, Karlheinz
Creation of auditory augmented reality using a position-dynamic binaural synthesis system - technical components, psychoacoustic needs, and perceptual evaluation. - In: Applied Sciences, ISSN 2076-3417, vol. 11 (2021), no. 3, art. 1150, pp. 1-20

For a spatial audio reproduction in the context of augmented reality, a position-dynamic binaural synthesis system can be used to synthesize the ear signals for a moving listener. The goal is the fusion of the auditory perception of the virtual audio objects with the real listening environment. Such a system has several components, each of which helps to enable a plausible auditory simulation. For each possible position of the listener in the room, a set of binaural room impulse responses (BRIRs) congruent with the expected auditory environment is required to avoid room divergence effects. Adequate and efficient approaches are methods to synthesize new BRIRs using very few measurements of the listening room. The required spatial resolution of the BRIR positions can be estimated by spatial auditory perception thresholds. Retrieving and processing the tracking data of the listener’s head-pose and position as well as convolving BRIRs with an audio signal needs to be done in real-time. This contribution presents work done by the authors including several technical components of such a system in detail. It shows how the single components are affected by psychoacoustics. Furthermore, the paper also discusses the perceptive effect by means of listening tests demonstrating the appropriateness of the approaches.
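
The convolution step mentioned in this abstract can be illustrated with a small offline sketch: a mono signal is convolved with the left-ear and right-ear impulse responses of one BRIR to obtain the two ear signals. The function names and the direct time-domain convolution are assumptions chosen for clarity; a real-time system such as the one described would typically use block-wise, FFT-based convolution and switch BRIRs according to the tracking data.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, brir_left, brir_right):
    """Return (left, right) ear signals for one fixed listener pose."""
    return convolve(mono, brir_left), convolve(mono, brir_right)
```

Each output channel has `len(mono) + len(brir) - 1` samples, so the reverberant tail of the impulse response extends beyond the end of the dry signal.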



https://doi.org/10.3390/app11031150
Lenzen, Lucien
Konzept zur Einführung von HDR im Broadcast mithilfe präferenzbasierter Kontrastkompression. - Ilmenau : Universitätsbibliothek, 2020. - 1 online resource (xv, 167 leaves)
Technische Universität Ilmenau, doctoral dissertation, 2021

HDR (High Dynamic Range) makes it possible to capture a far greater contrast range of a scene than is the case in HD broadcast. As a result, details can be preserved in both the highlights and the shadows. However, display capabilities are very heterogeneous and usually considerably more limited. To let all viewers benefit from the increased capture quality nonetheless, an adaptation, also called contrast compression, becomes necessary. Manual techniques for contrast compression are known from film post-production, while automatic methods are used in computer graphics. Because of the special requirements of broadcast, however, these cannot simply be transferred. A fundamental challenge lies in viewer preference. The goal of this thesis is therefore to quantify viewer preference with regard to brightness and color perception and, building on these results, to offer an algorithmic solution for adapting contrast compression for use in broadcast. Objective and subjective studies are intended to show how image quality can be significantly increased in this way. Finally, exemplary workflows and field trials are used to show a path toward the widespread introduction of HDR.



https://nbn-resolving.org/urn:nbn:de:gbv:ilm1-2021000124
Neidhardt, Annika; Reif, Boris
Minimum BRIR grid resolution for interactive position changes in dynamic binaural synthesis. - In: 148th Audio Engineering Society International Convention 2020, (2020), pp. 660-669

Grollmisch, Sascha; Cano, Estefanía; Kehling, Christian; Taenzer, Michael
Analyzing the potential of pre-trained embeddings for audio classification tasks. - In: 28th European Signal Processing Conference (EUSIPCO 2020), (2020), pp. 790-794

In the context of deep learning, the availability of large amounts of training data can play a critical role in a model's performance. Recently, several models for audio classification have been pre-trained in a supervised or self-supervised fashion on large datasets to learn complex feature representations, so-called embeddings. These embeddings can then be extracted from smaller datasets and used to train subsequent classifiers. In the field of audio event detection (AED) for example, classifiers using these features have achieved high accuracy without the need for additional domain knowledge. This paper evaluates three state-of-the-art embeddings on six audio classification tasks from the fields of music information retrieval and industrial sound analysis. The embeddings are systematically evaluated by analyzing the influence on classification accuracy of classifier architecture, fusion methods for file-wise predictions, amount of training data, and initial training domain of the embeddings. To better understand the impact of the pre-training step, results are also compared with those acquired with models trained from scratch. On average, the OpenL3 embeddings performed best with a linear SVM classifier. For a reduced amount of training examples, OpenL3 outperforms the initial baseline.



https://doi.org/10.23919/Eusipco47968.2020.9287743