Proof-of-concept study to evaluate the impact of spatial audio on social presence and user behavior in multi-modal VR communication. - In: IMX 2023, (2023), pp. 209-215
This paper presents a proof-of-concept study conducted to analyze the effect of simple diotic vs. spatial, position-dynamic binaural synthesis on social presence in VR, in comparison with face-to-face communication in the real world, for a sample two-party scenario. A conversational task with a shared visual reference was realized. The collected data includes questionnaires for direct assessment, tracking data, and audio and video recordings of the individual participants' sessions for indirect evaluation. While tendencies for improvements with binaural over diotic presentation can be observed, no significant difference in social presence was found for the considered scenario. The gestural analysis revealed that participants used the same number and types of gestures in face-to-face communication as in VR, highlighting the importance of non-verbal behavior in communication. As part of the research, an end-to-end framework for conducting communication studies and analyses has been developed.
Knowledge transfer from neural networks for speech music classification. - In: Music in the AI Era, (2023), pp. 202-213
A frequent problem when dealing with audio classification tasks is the scarcity of suitable training data. This work investigates ways of mitigating this problem by applying transfer learning techniques to neural network architectures for several classification tasks from the field of Music Information Retrieval (MIR). First, three state-of-the-art architectures are trained and evaluated with several datasets for the task of speech/music classification. Second, feature representations, or embeddings, are extracted from the trained networks to classify new tasks with unseen data. The effect of pre-training is investigated with respect to the similarity of the source and target tasks, along with different fine-tuning strategies.
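The embedding-transfer strategy described in the abstract can be sketched schematically. The frozen "feature extractor" below is a stand-in for a pretrained speech/music network, and the data, dimensions, and learning rate are illustrative assumptions, not the architectures or datasets used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor. In the paper's setting
# this would be a network trained for speech/music classification; here it is
# a fixed random projection with a ReLU, purely for illustration.
W_frozen = rng.standard_normal((8, 32))

def extract_embedding(x):
    """Map input features (e.g. spectrogram statistics) to an embedding."""
    return np.maximum(x @ W_frozen, 0.0)

# Toy target task with a simple linear decision boundary.
X = rng.standard_normal((300, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Embeddings from the frozen extractor feed a small trainable classifier
# (logistic regression fitted by gradient descent). Only w and b are updated,
# mirroring the "extract embeddings, train a shallow model" transfer strategy.
E = extract_embedding(X)
w, b = np.zeros(E.shape[1]), 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(E @ w + b)))  # predicted class probabilities
    w -= 0.05 * E.T @ (p - y) / len(y)      # cross-entropy gradient step
    b -= 0.05 * np.mean(p - y)

accuracy = np.mean((p > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Fine-tuning, by contrast, would also update the extractor's weights, trading more target-task fit against a higher risk of overfitting on scarce data.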
The ability to memorize acoustic features in a discrimination task. - In: Journal of the Audio Engineering Society, ISSN 0004-7554, vol. 71 (2023), no. 5, pp. 254-266
How humans perceive, recognize, and remember room acoustics is of particular interest in the domain of spatial audio. For the creation of virtual or augmented acoustic environments, a room acoustic impression needs to be created that matches the expectations of certain room classes or a specific room. These expectations are based on the auditory memory of the acoustic room impression. In this paper, the authors present an exploratory study to evaluate the ability of listeners to recognize room acoustic features. The task of the listeners was to detect the reference room in a modified ABX double-blind stimulus test that featured a pre-defined playback order and a fixed time schedule. Furthermore, the authors explored distraction effects by employing additional non-acoustic interferences. The results show a significant decrease in auditory memory capacity within 10 s, which is more pronounced when the listeners were distracted. However, the results suggest that auditory memory depends on which auditory cues are available.
The R3VIVAL dataset: repository of room responses and 360 videos of a variable acoustics lab. - In: IEEE ICASSP 2023 conference proceedings, (2023), 5 pages
This paper presents a dataset of spatial room impulse responses (SRIRs) and 360° stereoscopic video captures of a variable acoustics laboratory. A total of 34 source positions are measured with 8 different acoustic panel configurations, resulting in a total of 272 SRIRs. The source positions are arranged in 30° increments at concentric circles of radius 1.5, 2, and 3 m, measured with a directional studio monitor, as well as 4 extra positions at the room corners, measured with an omnidirectional source. The receiver is a 7-channel open microphone array optimized for use with the Spatial Decomposition Method (SDM). The 8 acoustic configurations are achieved by setting a subset of the panels to their absorptive configuration in 5 steps (0%, 25%, 50%, 75%, 100% of the panels), as well as 3 configurations in which entire walls are set to their absorptive configuration (right, right/back, right/back/left). Video captures of the laboratory and a second room are obtained using a 360° stereoscopic camera with a resolution of 4096 × 2160 pixels, covering the same source/receiver combinations. Furthermore, we present an acoustic analysis of both time-energy and spatio-temporal parameters showcasing the differences in the measured configurations. The dataset, together with spatial analysis and rendering scripts, is publicly released in a GitHub repository.
Plausibility of an approaching motion towards a virtual sound source II: in a reverberant seminar room. - In: AES Europe Spring 2022, (2022), pp. 559-571
This study investigates the plausibility of dynamic binaural audio scenarios in which the listener interactively walks towards a virtual sound source. An originally measured BRIR set was manipulated and simplified systematically to challenge plausibility, explore its limits, and examine the relevance of selected acoustic properties. After a first investigation in a rather dry listening laboratory, this second exploratory study repeats and extends the experiment in a considerably more reverberant room. The participants had to rate the externalization, continuity, and stability of the apparent sound source, the impression of walking towards the sound source, and the plausibility of the virtual acoustic scene. The results confirm the observations of the first study in the different acoustic environment. Both studies indicate considerable room for simplifications, but certain modifications seriously affect plausibility. Even inexperienced listeners notice if the progress of the auditory distance change does not match their own walking motion. In addition, the importance of context and expectation for the perception of binaural audio is highlighted.
Discriminability of concurrent virtual and real sound sources in an augmented audio scenario. - In: AES Europe Spring 2022, (2022), pp. 521-529
This exploratory study investigates people's ability to discriminate between real and virtual sound sources in a position-dynamic, headphone-based augmented audio scene. For this purpose, an acoustic scene was created consisting of two loudspeakers at different positions in a small seminar room. To account for the acoustic influence of the worn headphones, non-individualized BRIRs measured along a line with a dummy head wearing AKG K1000 headphones were used, allowing for head rotation and translation. In a psychoacoustic experiment, participants had to explore the acoustic scene and indicate which sound source they believed to be real or virtual. The test cases included a dialog scenario, stereo pop music, and one person speaking while the other speaker played mono music simultaneously. The results show that the participants tended to be able to identify individual virtual sources. However, for the cases in which both sound sources reproduced sound simultaneously, lower distinguishability rates were observed.
A dataset of measured spatial room impulse responses in different rooms including visualization. - In: AES Europe Spring 2022, (2022), pp. 621-625
In this contribution, an open-source dataset of captured spatial room impulse responses (SRIRs) is presented. The data was collected in different enclosed spaces at the Technische Universität Ilmenau using an open, self-built microphone array design following the spatial decomposition method (SDM) guidelines. The included rooms were selected based on their distinctive acoustical properties resulting from their general build and furnishing as required by their utility. Three different classes of spaces can be distinguished: seminar rooms, offices, and classrooms. For each considered space, different source-receiver positions were recorded, including 360° images for each condition. The dataset can be utilized for various augmented or virtual reality applications, using either loudspeaker- or headphone-based reproduction alongside the appropriate head-related transfer function sets. The complete database, including the measured impulse responses as well as the corresponding images, is publicly available.
Room acoustic analysis and BRIR matching based on room acoustic measurements. - In: AES International Conference on Audio for Virtual and Augmented Reality (AVAR 2022), (2022), pp. 48-57
To achieve a perceptual fusion between the auralization of virtual audio objects and the room acoustics of a real listening room, an adequate adaptation of the virtual acoustics to the real room acoustics is necessary. The challenges are to describe the acoustics of different rooms by suitable parameters, to classify different rooms, and to evoke a similar auditory perception between acoustically similar rooms. An approach is presented to classify rooms based on measured BRIRs using statistical methods and to select best-match BRIRs from the dataset to auralize audio objects in a new room. The results show that it is possible to separate rooms based on their room acoustic properties, that the separation also corresponds to a large extent to the perceptual distance between rooms, and that a selection of best-match BRIRs is possible.
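The parameter-based matching idea can be illustrated with a minimal sketch. The synthetic impulse responses, the single T60 descriptor, and the room names below are hypothetical stand-ins for the measured BRIRs and the statistical methods of the paper; the Schroeder backward integration used here is a standard way to estimate reverberation time:

```python
import numpy as np

fs = 8000  # sample rate in Hz (assumed for this sketch)

def synth_rir(t60, length=1.0, seed=0):
    """Synthetic RIR: exponentially decaying noise with a given T60 (illustrative)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(length * fs)) / fs
    decay = 10 ** (-3.0 * t / t60)  # amplitude reaches -60 dB at t = t60
    return rng.standard_normal(t.size) * decay

def schroeder_t60(rir):
    """Estimate T60 from the Schroeder backward-integrated energy decay curve."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]          # remaining energy after time n
    edc_db = 10 * np.log10(edc / edc[0])
    # Fit a line between -5 dB and -25 dB and extrapolate to -60 dB (T20 method).
    idx = np.where((edc_db <= -5) & (edc_db >= -25))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)
    return -60.0 / slope

# A toy "dataset" of rooms, each reduced here to a single descriptor (T60).
dataset = {name: schroeder_t60(synth_rir(t60, seed=i))
           for i, (name, t60) in enumerate(
               [("seminar room", 0.8), ("office", 0.4), ("classroom", 1.2)])}

# Best-match selection: pick the dataset room whose parameters are closest
# to those measured in the new listening room.
new_room_t60 = schroeder_t60(synth_rir(0.45, seed=99))
best = min(dataset, key=lambda k: abs(dataset[k] - new_room_t60))
print(best)  # the room with the closest reverberation time
```

A real system would use a vector of such parameters (e.g. clarity, early decay time, interaural cues) per frequency band and a distance measure validated against perceptual similarity, as the paper's statistical classification does.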
Auditory room identification in a memory task. - In: AES International Conference on Audio for Virtual and Augmented Reality (AVAR 2022), (2022), pp. 132-141
How we perceive and remember room acoustics is of particular interest in the domain of spatial audio. For the creation of virtual or augmented acoustic environments, a room acoustic impression needs to be created which matches the expectations of certain room classes or a specific room. These expectations are based on the auditory memory of the acoustic room impression. In this paper, we present an exploratory study to evaluate the ability of listeners to remember specific rooms. The task of the listeners was to detect the reference room in a modified ABX double-blind stimulus test which featured a pre-defined playback order and a fixed time schedule. Furthermore, we explored distraction effects by employing additional non-acoustic interferences. The results show a significant decrease in auditory memory capacity within ten seconds, which is more pronounced when the listeners were distracted. However, the results suggest that auditory memory depends on which auditory cues are available.
Digital media in intergenerational communication: status quo and future scenarios for the grandparent-grandchild relationship. - In: Universal access in the information society, ISSN 1615-5297, vol. 0 (2022), no. 0, 16 pages
Communication technologies play an important role in maintaining the grandparent-grandchild (GP-GC) relationship. Based on Media Richness Theory, this study investigates the frequency of use (RQ1) and perceived quality (RQ2) of established media as well as the potential use of selected innovative media (RQ3) in GP-GC relationships, with a particular focus on digital media. A cross-sectional online survey and vignette experiment were conducted in February 2021 among N = 286 university students in Germany (mean age 23 years, 57% female) who reported on the direct and mediated communication with their grandparents. In addition to face-to-face interactions, non-digital and digital established media (such as telephone, texting, video conferencing) and innovative digital media, namely augmented reality (AR)-based and social robot-based communication technologies, were covered. Face-to-face and phone communication occurred most frequently in GP-GC relationships: 85% of participants reported that these took place at least a few times per year (RQ1). Non-digital established media were associated with higher perceived communication quality than digital established media (RQ2). Innovative digital media received less favorable quality evaluations than established media. Participants expressed doubts regarding the technology competence of their grandparents, but still met innovative media with high expectations regarding improved communication quality (RQ3). Richer media, such as video conferencing or AR, do not automatically lead to better perceived communication quality, while leaner media, such as letters or text messages, can provide rich communication experiences. More research is needed to fully understand and systematically improve the utility, usability, and joy of use of different digital communication technologies employed in GP-GC relationships.