Publications of the Research Group

Below is an automatically generated compilation of the research group's publications. Publications of individual staff members are listed on their personal pages.

Publication list

Number of results: 286
Generated: Thu, 16 May 2024 23:03:40 +0200 in 0.0853 sec


Immohr, Felix; Rendle, Gareth; Neidhardt, Annika; Lammert, Anton; Brandenburg, Karlheinz; Fröhlich, Bernd; Raake, Alexander
APlausE-MR: investigating multi-party communication in audiovisual mixed-reality environments. - In: Proceedings of the 1st AUDICTIVE Conference, (2023), pp. 100-103

https://doi.org/10.18154/RWTH-2023-08834
Treybig, Lukas; Höbel-Müller, Juliane; Werner, Stephan; Nürnberger, Andreas
Acoustic inter- and intra-room similarity based on room acoustic parameters. - In: Engineering for a changing world, (2023), 5.2.136, pp. 1-15

This paper shows various approaches for determining acoustic (dis-)similarity based on room acoustic parameter values derived from real measurements. The similarity is calculated across different room configurations and/or between different microphone-loudspeaker positions within the same room configuration. We compare supervised (LDA, Random Forest) and unsupervised techniques (PCA, SPPA) and pre-selected visualizations in terms of their ability to exhibit inter- and intra-room (dis-)similarities. The data set generated comprises spatially high-resolution room impulse responses obtained from multiple source-receiver positions within a room configuration. The room acoustics are varied by introducing active walls and geometries accounting for specific room configurations. The results show that the separation of room configurations primarily relies on specific acoustic parameters, with the reverberation time playing an important role. Within a given room configuration, the acoustic parameters excluding the reverberation time mainly capture the orientation and distance between the source and receiver.



https://doi.org/10.22032/dbt.58929
Neidhardt, Annika
On the plausibility of simplified acoustic room representations for listener translation in dynamic binaural auralizations. - Ilmenau : Universitätsbibliothek, 2023. - 1 online resource (167 pages)
Technische Universität Ilmenau, dissertation, 2023

This dissertation investigates the perception of simplified acoustic room representations in position-dynamic binaural reproduction for listener translation. Dynamic binaural synthesis is an audio reproduction method for creating spatial auditory illusions over headphones for virtual, augmented, and mixed reality (VR/AR/MR). Exploring immersive content in six degrees of freedom (6DOF) has become a typical requirement. Realizing dynamic binaural sound field imitations with high physical accuracy usually involves very high computational effort. However, previous psychoacoustic studies indicate that humans have limited sensitivity to the details of the sound field, especially in the late reverberation. This offers potential for physical simplifications in the position-dynamic auralization of rooms. For example, concepts have been proposed that are based on the perceptual mixing time or the audibility threshold of early reflections, but a thorough psychoacoustic evaluation of these is still pending. First, a setup for position-dynamic room auralization was implemented and evaluated. Using this setup, the thesis examines essential system parameters such as the required spatial resolution of a position grid for dynamic adaptation. Because generally established test methods for the perceptual evaluation of spatial auditory illusions under interactive listener translation were lacking, the thesis investigates various approaches to measuring plausibility. On this basis, physical simplifications in the progression of the sound field in position-dynamic binaural auralizations of room acoustics are examined. For the main experiments, binaural room impulse responses (BRIRs) were measured along a line for listener translation in a fairly dry listening laboratory and in a reverberant seminar room of similar size.
The resulting datasets contain scenarios of listener movement towards a virtual sound source, past it, away from it, or behind it. In addition, the investigations consider two extreme cases of source orientation to account for the effects of varying the sound source directivity. The BRIR sets are systematically processed and simplified to assess the effects on perception. In particular, the concept of the perceptual mixing time and manipulated spatiotemporal patterns of early reflections served as test cases in the psychoacoustic studies. The results show a high potential for simplification but also underline the relevance of accurately imitating prominent early reflections. The results also confirm the concept of the perceptual mixing time for the considered cases of position-dynamic binaural reproduction. The observations show that common test scenarios for auralization, interpolation, and extrapolation are not critical enough to draw general conclusions about the suitability of the tested rendering approaches. The thesis points out possible solutions.



https://doi.org/10.22032/dbt.57596
Immohr, Felix; Rendle, Gareth; Neidhardt, Annika; Göring, Steve; Ramachandra Rao, Rakesh Rao; Arévalo Arboleda, Stephanie; Froehlich, Bernd; Raake, Alexander
Proof-of-concept study to evaluate the impact of spatial audio on social presence and user behavior in multi-modal VR communication. - In: IMX 2023, (2023), pp. 209-215

This paper presents a proof-of-concept study conducted to analyze the effect of simple diotic vs. spatial, position-dynamic binaural synthesis on social presence in VR, in comparison with face-to-face communication in the real world, for a sample two-party scenario. A conversational task with shared visual reference was realized. The collected data includes questionnaires for direct assessment, tracking data, and audio and video recordings of the individual participants’ sessions for indirect evaluation. While tendencies for improvements with binaural over diotic presentation can be observed, no significant difference in social presence was found for the considered scenario. The gestural analysis revealed that participants used the same amount and type of gestures in face-to-face as in VR, highlighting the importance of non-verbal behavior in communication. As part of the research, an end-to-end framework for conducting communication studies and analysis has been developed.



https://doi.org/10.1145/3573381.3596458
Kehling, Christian; Cano, Estefanía
Knowledge transfer from neural networks for speech music classification. - In: Music in the AI era, (2023), pp. 202-213

A frequent problem when dealing with audio classification tasks is the scarcity of suitable training data. This work investigates ways of mitigating this problem by applying transfer learning techniques to neural network architectures for several classification tasks from the field of Music Information Retrieval (MIR). First, three state-of-the-art architectures are trained and evaluated with several datasets for the task of speech/music classification. Second, feature representations or embeddings are extracted from the trained networks to classify new tasks with unseen data. The effect of pre-training with respect to the similarity of the source and target tasks is investigated in the context of transfer learning, as are different fine-tuning strategies.



Klein, Florian; Surdu, Tatiana; Treybig, Lukas; Werner, Stephan
The ability to memorize acoustic features in a discrimination task. - In: Journal of the Audio Engineering Society, ISSN 0004-7554, vol. 71 (2023), no. 5, pp. 254-266

How humans perceive, recognize, and remember room acoustics is of particular interest in the domain of spatial audio. For the creation of virtual or augmented acoustic environments, the room acoustic impression should match the expectations of certain room classes or a specific room. These expectations are based on the auditory memory of the acoustic room impression. In this paper, the authors present an exploratory study to evaluate the ability of listeners to recognize room acoustic features. The task of the listeners was to detect the reference room in a modified ABX double-blind stimulus test that featured a pre-defined playback order and a fixed time schedule. Furthermore, the authors explored distraction effects by employing additional nonacoustic interferences. The results show a significant decrease of the auditory memory capacity within 10 s, which is more pronounced when the listeners were distracted. However, the results suggest that auditory memory depends on what auditory cues are available.



https://doi.org/10.17743/jaes.2022.0073
Klein, Florian; Amengual Garí, Sebastià V.
The R3VIVAL dataset: repository of room responses and 360 videos of a variable acoustics lab. - In: IEEE ICASSP 2023 conference proceedings, (2023), 5 pp. in total

This paper presents a dataset of spatial room impulse responses (SRIRs) and 360° stereoscopic video captures of a variable acoustics laboratory. A total of 34 source positions are measured with 8 different acoustic panel configurations, resulting in a total of 272 SRIRs. The source positions are arranged in 30° increments at concentric circles of radius 1.5, 2, and 3 m measured with a directional studio monitor, as well as 4 extra positions at the room corners measured with an omnidirectional source. The receiver is a 7-channel open microphone array optimized for use with the Spatial Decomposition Method (SDM). The 8 acoustic configurations are achieved by setting a subset of the panels to their absorptive configuration in 5 steps (0%, 25%, 50%, 75%, 100% of the panels), as well as 3 configurations in which entire walls are set to their absorptive configuration (right, right/back, right/back/left). Video captures of the laboratory and a second room are obtained using a 360° stereoscopic camera with a resolution of 4096 × 2160 pixels, covering the same source/receiver combinations. Furthermore, we present an acoustic analysis of both time-energy and spatio-temporal parameters showcasing the differences in the measured configurations. The dataset, together with spatial analysis and rendering scripts, is publicly released in a GitHub repository.



https://doi.org/10.1109/ICASSP49357.2023.10097257
Baum, Malte; Cuccovillo, Luca; Yaroshchuk, Artem; Aichroth, Patrick
Environment classification via blind roomprints estimation. - In: 2022 IEEE International Workshop on Information Forensics and Security (WIFS), (2022), 6 pp. in total

In this paper we present a novel approach for environment classification for speech recordings, which does not require the selection of decaying reverberation tails. It is based on a multi-band RT60 analysis of blind channel estimates and achieves an accuracy of up to 93.8% on test recordings derived from the ACE corpus.



https://doi.org/10.1109/WIFS55849.2022.9975411
Neidhardt, Annika; Kamandi, Samaneh
Plausibility of an approaching motion towards a virtual sound source II: in a reverberant seminar room. - In: AES Europe Spring 2022, (2022), pp. 559-571

This study investigates the plausibility of dynamic binaural audio scenarios wherein the listener interactively walks towards a virtual sound source. An originally measured BRIR set was manipulated and simplified systematically to challenge plausibility, explore its limits, and examine the relevance of selected acoustic properties. After the first investigation in a quite dry listening laboratory, this second exploratory study repeats and extends the experiment in a considerably more reverberant room. The participants had to rate externalization, continuity, stability of the apparent sound source, impression of walking towards the sound source and the plausibility of the virtual acoustic scene. The results confirm the observations of the first study in the different acoustic environment. Both studies indicate much room for simplifications, but certain modifications seriously affect plausibility. Even inexperienced listeners notice if the progress of the auditory distance change does not match their own walking motion. In addition, the meaning of context and expectation for the perception of binaural audio is highlighted.



Schneiderwind, Christian; Neidhardt, Annika
Discriminability of concurrent virtual and real sound sources in an augmented audio scenario. - In: AES Europe Spring 2022, (2022), pp. 521-529

This exploratory study investigates people's ability to discriminate between real and virtual sound sources in a position-dynamic, headphone-based augmented audio scene. For this purpose, an acoustic scene was created consisting of two loudspeakers at different positions in a small seminar room. Considering the presence of headphones, non-individualized BRIRs measured along a line with a dummy head wearing AKG K1000 headphones were used to allow for head rotation and translation. In a psychoacoustic experiment, participants had to explore the acoustic scene and indicate which sound source they believed to be real and which virtual. The test cases included a dialog scenario, stereo pop music, and one person speaking while the other speaker played mono music simultaneously. Results show that the participants tended to be able to identify individual virtual sources. However, for the cases where both sound sources reproduced sound simultaneously, lower distinguishability rates were observed.