A key challenge in auditory mixed reality is the plausible embedding of virtual audio objects in a real environment. When virtual audio objects are synthesized and reproduced, the room acoustics of the real room must be captured and reproduced for every possible source-receiver relation in that room.

The aim is the perceptual fusion of the virtual objects with the real acoustic environment [1, 2]. If the virtual acoustics deviate sufficiently from the real acoustics, the virtual audio objects are no longer perceived as located in the room but inside the head. This cognitive effect is known as the Room Divergence Effect [3, 4].

Common approaches for determining the room acoustics are simplified statistical and/or geometry-based room acoustic simulations [5, 6], wave-based simulations [7], or parametric descriptions of acoustic room transfer functions, including machine learning methods [8, 9]. Other approaches use optical data such as photos, videos, or 3D scans to estimate acoustic transfer functions or to create spatial audio directly [10].
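To illustrate the geometry-based simulation idea in miniature, the following sketch builds a room impulse response for a shoebox room from the direct path and the six first-order image sources. It is a toy model with frequency-independent absorption; the room geometry, absorption value, and function names are purely illustrative and not taken from any of the cited tools.

```python
import numpy as np

def image_source_rir(room, src, mic, fs=48000, alpha=0.3, c=343.0, length=0.1):
    """Toy RIR: direct sound plus the six first-order wall reflections
    of a shoebox room, with frequency-independent absorption alpha."""
    # First-order image sources: mirror the source at each of the six walls.
    images = [np.array(src, dtype=float)]
    for axis, size in enumerate(room):
        for wall in (0.0, size):
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]
            images.append(img)
    rir = np.zeros(int(length * fs))
    refl = np.sqrt(1.0 - alpha)  # pressure reflection factor of each wall
    for k, img in enumerate(images):
        d = np.linalg.norm(img - np.array(mic))
        n = int(round(d / c * fs))  # arrival time in samples
        if n < len(rir):
            gain = (1.0 if k == 0 else refl) / max(d, 1e-9)  # 1/r spreading
            rir[n] += gain
    return rir

rir = image_source_rir((5.0, 4.0, 3.0), src=(1.0, 1.0, 1.5), mic=(3.5, 2.5, 1.5))
```

Real simulators extend this recursion to high reflection orders and frequency-dependent wall materials; the sketch only shows why the direct sound always arrives first and why later reflections are weaker.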

All these approaches yield a more or less accurate approximation of the real room acoustics. An analytical and perceptual comparison between simulation and reality is therefore highly desirable. However, high-precision measurements at a large number of source-receiver positions in a room, which could serve as reference data sets, are complex to create, involve large amounts of data, and are prone to errors. Carrying out such measurements manually is practically impossible.

Autonomous Room Acoustic Measurement

The Electronic Media Technology group at Technische Universität Ilmenau, with the support of the Ilmenau company MetraLabs, has developed a measuring system that carries out autonomous room acoustic measurements (see Figure 1). For this purpose, a mobile robot platform was equipped with a microphone array to record so-called spatial room impulse responses (SRIRs) and make them available for spatial audio reproduction over headphones using position-dynamic binaural synthesis [11].

Figure 1: Picture of the robotic platform with the mounted microphone array.

Robotic Platform

As a robotic platform, a TORY v2 by MetraLabs is used. The robot's two-wheel drive allows free positioning on a plane, while a 2D LIDAR scanner and odometry data determine its location. The LIDAR data is also used to detect obstacles in the robot's path and avoid collisions. In addition, a tactile safety edge detects contact with objects and triggers a safety stop to prevent damage or injury. The maps for navigation have to be pre-recorded by a human operator.

In a previous study [12], the robot's localization accuracy was determined to be 2.7 cm in translation and 0.7° in rotation on average. In addition, it can approach a target position with an accuracy of 3 cm in translation and 0.6° in orientation. Battery life averages around 8 hours per charge but depends strongly on the distance traveled. Recharging is done automatically by docking at a charging station; charging the battery from 20% to 100% takes approximately 2 hours.

Microphone Array

The robot, as shown in Figure 1, is equipped with a microphone array consisting of six satellite microphones mounted around an Earthworks M30 measurement microphone. The satellite microphones are distributed on a sphere of 10 cm diameter with the measurement microphone at its center. The whole array is placed at the center of the robotic platform, and its height can be manually adjusted between 112 cm and 175 cm above the ground. Based on the Spatial Decomposition Method (SDM) [13], the array allows the estimation of SRIRs from exponential sine sweep recordings. In addition, the RIRs recorded by the measurement microphone can be used for classical one-dimensional room acoustic analysis. The test signals are played back over loudspeakers placed at fixed positions in the room, which are connected to an external computer. Currently there is no synchronization between loudspeaker playback and recording, but this is not necessary for the estimation of many acoustic parameters.
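The sweep-based measurement principle can be sketched as follows: the recorded exponential sine sweep is deconvolved with a Farina-style inverse filter (the time-reversed sweep with an exponential amplitude envelope) to obtain the impulse response. This is a minimal sketch with assumed parameters (sweep range, duration, sampling rate), not the system's actual measurement code.

```python
import numpy as np

def ess(f1, f2, T, fs):
    """Exponential sine sweep from f1 to f2 Hz over T seconds (Farina)."""
    t = np.arange(int(T * fs)) / fs
    L = T / np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

def inverse_filter(sweep, f1, f2, fs):
    """Time-reversed sweep with an exponentially decaying amplitude
    envelope that compensates the sweep's pink energy distribution."""
    T = len(sweep) / fs
    t = np.arange(len(sweep)) / fs
    env = np.exp(-t * np.log(f2 / f1) / T)
    return sweep[::-1] * env

def estimate_rir(recording, sweep, f1, f2, fs):
    """Linear deconvolution: FFT convolution with the inverse sweep."""
    inv = inverse_filter(sweep, f1, f2, fs)
    n = len(recording) + len(inv) - 1
    rir = np.fft.irfft(np.fft.rfft(recording, n) * np.fft.rfft(inv, n), n)
    return rir / np.max(np.abs(rir))

fs = 48000
sweep = ess(50.0, 16000.0, 2.0, fs)
# With an ideal channel (recording == sweep), the result is an
# impulse-like peak at the end of the sweep's duration.
rir = estimate_rir(sweep, sweep, 50.0, 16000.0, fs)
```

In a real measurement, `recording` would be the microphone signal captured while the sweep plays over one of the loudspeakers; the impulse-like peak then marks the direct sound, followed by the room's reflections.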


Software Architecture

The software components are built around a central database containing the TaskChain, as shown in Figure 2. MongoDB is used as the underlying database.

Figure 2: Block diagram of the system components and the workflow to control the robot system.

The TaskChain is a linked list of Tasks: database entries containing the parameters for the different Services needed to fulfill the measurement. These parameters are, for example, the position and orientation for the robot to approach, or the sweeps to play over the loudspeakers. In addition, the collected data, such as the sweep recordings or the execution time, is stored back into the TaskChain. The TaskChain can be modified at runtime, allowing, for example, the insertion of a charging task if the battery level is critical, or the re-queueing of a task if the target is temporarily unreachable for the robot.
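The runtime modifications described above can be sketched with plain dictionaries standing in for the MongoDB documents; all field names and values here are illustrative, not the system's actual schema.

```python
from collections import deque

# Stand-in for the TaskChain: each Task is a dict, as a MongoDB document
# would be. Field names ("service", "params", ...) are hypothetical.
chain = deque([
    {"service": "robot", "params": {"x": 1.0, "y": 2.5, "theta": 90.0}},
    {"service": "soundserver", "params": {"signal": "sweep_48k.wav"}},
])

def next_task(battery_level, target_reachable):
    """Pop the next Task, modifying the chain as conditions demand."""
    if battery_level < 0.2:
        # Critical charge: insert a charging task ahead of everything else.
        chain.appendleft({"service": "robot",
                          "params": {"action": "dock_and_charge"}})
    task = chain.popleft()
    if task["service"] == "robot" and not target_reachable:
        chain.append(task)  # target blocked: re-queue the Task for later
        return None
    return task

# Low battery: the charging task is served before the measurement tasks.
t = next_task(battery_level=0.15, target_reachable=True)
```

The linked-list structure means such insertions and deferrals never invalidate the rest of the chain, which is what makes the measurement plan editable while the robot is running.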

The TaskManager distributes the Tasks to the corresponding Services. Currently, the Services are the Soundservers for playing back and recording audio, and a Service that controls the robotic platform itself by communicating with the robot's middleware MIRA [14]. The modular design of the TaskChain makes it easy to add new Services for future use cases. Communication between the TaskManager and the Services is done via the ActiveMQ middleware, allowing communication over an IP network. Aside from the middleware mentioned, all implementations were done in Python 3.
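The dispatch pattern can be sketched as follows, with in-process `queue.Queue` objects standing in for the ActiveMQ topics; the service names and message fields are illustrative, not the system's real interface.

```python
import queue
import threading

# One inbound queue per Service (stand-in for ActiveMQ destinations),
# plus a shared queue for results reported back to the TaskManager.
to_service = {name: queue.Queue() for name in ("robot", "soundserver")}
results = queue.Ueue() if False else queue.Queue()

def service_worker(name):
    """A Service: consume Tasks addressed to it, report a result back."""
    while True:
        task = to_service[name].get()
        if task is None:  # shutdown sentinel
            break
        results.put({"service": name, "done": task["task_id"]})

def dispatch(tasks):
    """TaskManager: route each Task to the Service named in its entry."""
    for task in tasks:
        to_service[task["service"]].put(task)

workers = [threading.Thread(target=service_worker, args=(n,))
           for n in to_service]
for w in workers:
    w.start()
dispatch([{"task_id": 1, "service": "robot"},
          {"task_id": 2, "service": "soundserver"}])
for q in to_service.values():
    q.put(None)
for w in workers:
    w.join()
```

Routing purely by a name field is what makes the design modular: a new Service only needs its own destination, with no change to the dispatch logic.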

Exemplary Reference Dataset

In the field of mixed reality applications, one strives for seamless integration of virtual sources into real environments. To support this goal, a dataset of Spatial Room Impulse Response (SRIR) measurements with high spatial resolution is presented [15]. Figure 3 shows the early decay time values calculated from an example measurement using the robotic platform.

Figure 3: Calculated Early Decay Time values of the example measurement using the robotic platform. The red encircled cross marks the loudspeaker used for the example calculations. The red crosses without circles mark the other loudspeakers used in the measurement. The white floor plan is a visualization of the robot's LIDAR map.

An acoustically dry room was deliberately modified by adding highly reflective walls. A total of nine different wall configurations were measured using the robotic platform, which sampled the room uniformly with a resolution of 0.25 m. With five loudspeaker positions in each wall configuration, this led to an overall number of 18,535 measurements. Using open-source libraries, the SRIRs from this dataset can be used to calculate binaural room impulse responses for any head orientation at all measured positions. The dataset can serve as ground truth for various use cases, such as room acoustic simulations, (S)RIR inter- and extrapolation algorithms, and parametric six-degrees-of-freedom binaural rendering, and as a training set for machine-learning applications. Furthermore, the measurements allow an analysis of the change in directional energy distribution caused by the different wall configurations. In this way, the perceptual effects of early reflections can be assessed in a controlled manner. Overall, the contribution provides a comprehensive basis for the parametric analysis and perception of room acoustic variations in mixed reality scenarios.
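The early decay time shown in Figure 3 can be computed from an RIR via backward Schroeder integration: the 0 to -10 dB range of the energy decay curve is fitted and extrapolated to -60 dB. The following is a minimal sketch of that textbook procedure, not the group's actual analysis code; it is verified here against a synthetic exponential decay with a known decay time.

```python
import numpy as np

def edt(rir, fs):
    """Early Decay Time: Schroeder backward integration of the squared
    RIR, then the time to fall 10 dB, extrapolated to a 60 dB decay."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]          # Schroeder curve
    decay_db = 10.0 * np.log10(energy / energy[0])
    i10 = np.argmax(decay_db <= -10.0)                # first sample <= -10 dB
    return 6.0 * (i10 / fs)                           # 10 dB slope -> 60 dB

# Synthetic ideal decay: 60 dB amplitude drop (factor 1000) over rt seconds,
# so the estimated EDT should come out close to rt.
fs, rt = 48000, 0.5
t = np.arange(int(fs * rt * 2)) / fs
rir = np.exp(-np.log(1000.0) * t / rt)
```

For real, noisy RIRs the integration is typically truncated at the noise floor first; the sketch omits this step.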

Processing, Research, and Related Projects

The work described here is part of the research activities of the Electronic Media Technology group: primarily the ongoing or starting doctoral theses of Lukas Treybig and Georg Stolz, the project work of Dr.-Ing. Florian Klein, and the research of all colleagues in the group as well as of students with their final theses.

The work is directly linked to the ISOPERARE (research cooperation funded by Meta) and MULTIPARTIES (joint project with TU Ilmenau groups AVT and VWDG and companies funded by BMBF) projects, where it serves as a basis for perceptual auditory modeling and for acoustic modeling of the spatial audio reproduction.


[1]           Neidhardt, A., Schneiderwind, C., and Klein, F., “Perceptual Matching of Room Acoustics for Auditory Augmented Reality in Small Rooms: Literature Review and Theoretical Framework,” Trends in Hearing, 26, 2022, doi:10.1177/23312165221092919.

[2]           Werner, S., “Über den Einfluss kontextabhängiger Qualitätsparameter auf die Wahrnehmung von Externalität und Hörereignisort”, Dissertation Thesis (PhD Thesis), Technische Universität Ilmenau, Fakultät für Elektrotechnik und Informationstechnik, Ilmenau, Germany, urn:nbn:de:gbv:ilm1-2018000672, 2018.

[3]           Werner, S., Klein, F., Mayenfels, T., and Brandenburg, K., “A summary on acoustic room divergence and its effect on externalization of auditory events,” in 2016 8th Int. Conf. on Quality of Multimedia Experience (QoMEX), pp. 1–6, 2016, doi:10.1109/QoMEX.2016.7498973.

[4]           Gil-Carvajal, J., Cubick J., Santurette, S. and Dau T., "Spatial Hearing with Incongruent Visual or Auditory Room Cues" , Nature Scientific Reports, 6:37342, 2016, doi: 10.1038/srep37342.

[5]           Tang, Z., Aralikatti, R., Ratnarajah, A. J., and Manocha, D., “GWA: A large high-quality acoustic dataset for audio processing,” in ACM SIGGRAPH 2022 Conference Proceedings, ser. SIGGRAPH ’22, New York, NY, USA: Association for Computing Machinery, 2022.

[6]           Diaz-Guerra, D., Miguel, A., and Beltran, J., “gpuRIR: A Python library for room impulse response simulation with GPU acceleration,” in Multimed Tools Appl 80, 5653–567, 2021.

[7]           Pind, F., Engsig-Karup, A. P., Jeong, C.-H., Hesthaven, J. S., Mejling, M. S., and Strømann-Andersen, J., “Time domain room acoustic simulations using the spectral element method,” The Journal of the Acoustical Society of America, vol. 145, no. 6, pp. 3299–3310, 2019.

[8]           Götz, G., Falcón Pérez, F., Schlecht, S. J. and Pulkki, V., “Neural network for multi-exponential sound energy decay analysis,” The Journal of the Acoustical Society of America, vol. 152, no. 2, pp. 942–953, 2022.

[9]           Ratnarajah, A., Ananthabhotla, I., Ithapu, V. K., Hoffmann, P. F., Manocha, D., and Calamia, P. T., “Towards improved room impulse response estimation for speech recognition,” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5, 2023.

[10]         Ratnarajah, A., Tang, Z., Aralikatti, R. and Manocha, D., “Mesh2ir: Neural acoustic impulse response generator for complex 3d scenes,” in Proceedings of the 30th ACM International Conference on Multimedia, ser. MM ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 924–933.

[11]         Werner, S., Klein, F., Neidhardt, A., Sloma, U., Schneiderwind, C. and Brandenburg, K., “Creation of Auditory Augmented Reality Using a Position-Dynamic Binaural Synthesis System—Technical Components, Psychoacoustic Needs, and Perceptual Evaluation,” Applied Sciences, vol. 11, no. 3, 2021.

[12]         Stolz, G., „Entwicklung eines Systems für raumakustische Messungen unter Anwendung einer Robotikplattform“, Master Thesis, Technische Universität Ilmenau, Oct. 2022.

[13]         Tervo, S., Pätynen, J., Kuusinen, A., and Lokki, T., “Spatial Decomposition Method for Room Impulse Responses,” J. Audio Eng. Soc., vol. 61, no. 1, p. 13, 2013.

[14]         Einhorn, E., Langner, T., Stricker, R., Martin, C. and Gross, H.-M., “MIRA - Middleware for Robotic Applications,” in IEEE International Conference on Intelligent Robots and Systems 2012, Oct. 2012.

[15]         Treybig, L., Klein, F., Stolz, G., and Werner, S., “A high spatial resolution dataset of spatial room impulse responses for different acoustic room configurations [Data set]”, Jahrestagung für Akustik, DAGA, Hannover, 2024.


Technische Universität Ilmenau

Electronic Media Technology Group

Stephan Werner

Room H3520