Audiovisual Technology Group

The Audiovisual Technology Group (AVT) deals with the function, application and perception of audio and video equipment. An essential focus of the research is on the relationship between the technical characteristics of audio, video and audiovisual systems and human perception and experience (“Quality of Experience”, QoE).

Further information on the group


You can also find news about the lab on our Twitter channel.


DFG-Project ECoClass-VR started

Recently, the Deutsche Forschungsgemeinschaft (DFG) accepted a submitted project proposal within the DFG priority programme "Auditory Cognition in Interactive Virtual Environments" (AUDICTIVE). The project is being carried out in cooperation with the Chair for Hearing Technology and Acoustics (Prof. Janina Fels, RWTH Aachen) and the Department of Cognitive and Developmental Psychology (Prof. Maria Klatte, TU Kaiserslautern).

Project description

In the project “Evaluating cognitive performance in classroom scenarios using audiovisual virtual reality” (ECoClass-VR) we are investigating the suitability of audio-visual Immersive Virtual Environments (IVEs) for a "real-world" assessment of the influence of the visuospatial and acoustic environment on cognitive performance of adults and children in classroom-type environments. Existing knowledge on the influence of environmental variables on cognitive performance in classrooms comes predominantly from auditory experimental paradigms with typically simple acoustic replications. So far, only limited attention has been paid to visual processing, without considering relevant audiovisual aspects.

Read more on the project page!

Between the Frames - Evaluation of Various Motion Interpolation Algorithms to Improve 360° Video Quality

22nd IEEE International Symposium on Multimedia. Online. December 2020

Stephan Fremerey, Frank Hofmeyer, Steve Göring, Dominik Keller, Alexander Raake

With the increasing availability of 360° video content, it becomes important to provide smoothly playing videos of high quality for end users. For this reason, we compare the influence of different Motion Interpolation (MI) algorithms on 360° video quality. After conducting a pre-test with 12 video experts, we found that MI is a useful tool to increase the QoE (Quality of Experience) of omnidirectional videos. As a result of the pre-test, we selected three suitable MI algorithms, namely ffmpeg Motion Compensated Interpolation (MCI), Butterflow and Super-SloMo. Subsequently, we interpolated 15 entertaining and real-world omnidirectional videos with a duration of 20 seconds from 30 fps (original framerate) to 90 fps, which is the native refresh rate of the HMD used, the HTC Vive Pro. To assess QoE, we conducted two subjective tests with 24 and 27 participants. In the first test we used a Modified Paired Comparison (M-PC) method, and in the second test the Absolute Category Rating (ACR) approach. In the M-PC test, 45 stimuli were used and in the ACR test 60. Results show that for most of the 360° videos, the interpolated versions obtained significantly higher quality scores than the lower-framerate source videos, validating our hypothesis that motion interpolation can improve the overall video quality for 360° video. As expected, it was observed that the relative comparisons in the M-PC test result in larger differences in terms of quality. Generally, the ACR method led to similar results, while reflecting a more realistic viewing situation. In addition, we compared the different MI algorithms and can conclude that with sufficient available computing power Super-SloMo should be preferred for interpolation of omnidirectional videos, while MCI also shows good performance.
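For readers who want to experiment with the ffmpeg-based approach named above: the MCI variant is exposed through ffmpeg's minterpolate filter. The sketch below only assembles an illustrative command line; the file names are placeholders, the 90 fps target follows the setup described in the abstract, and all other filter parameters are left at their defaults (the study's exact settings are not given here).

```python
# Sketch: assembling an ffmpeg call for motion-compensated interpolation
# (MCI) from the source framerate up to 90 fps, the native refresh rate
# of the HTC Vive Pro. File names are hypothetical placeholders.

def mci_command(src: str, dst: str, target_fps: int = 90) -> list[str]:
    """Build an ffmpeg command that interpolates `src` to `target_fps`
    using the motion-compensated mode of the `minterpolate` filter."""
    vf = f"minterpolate=fps={target_fps}:mi_mode=mci"
    return ["ffmpeg", "-i", src, "-vf", vf, dst]

print(" ".join(mci_command("input_360.mp4", "output_90fps.mp4")))
```

The resulting command can be passed to subprocess.run() or executed in a shell; encoding settings (codec, bitrate) would be appended as needed.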

AVT lab nominated for the Thuringian Innovation Award

With a new method for quality prediction of video streams, the AVT department was among the three nominated candidates for the Thuringian Innovation Award 2020 in the category "Digital & Media".

Presentation video for the nomination of the department: YouTube, Download

The awards were presented on November 25, 2020 in Weimar. The prize in the category "Digital & Media" was ultimately awarded to rooom AG Jena for "EXPO-X, the platform for virtual and hybrid events".

Website of the Innovation Award (with all nominees and award winners)

CO-HUMANICS project approved

University press release: Mit Augmented Reality und Robotik soziale Kontakte älterer Menschen stärken (English translation below)

The Carl-Zeiss-Stiftung will support the CO-HUMANICS project ("Co-Presence of Humans and Interactive Companions for Seniors") of the TU Ilmenau within the call "Durchbrüche" over the next five years.

The following groups are involved in this project:

  • Audiovisuelle Technik (AVT), Prof. Raake, Fak. EI (Speaker)
  • Elektronische Medientechnik (EMT), Prof. Brandenburg/Dr. Werner and successor, Fak. EI
  • Virtuelle Welten und Digitale Spiele (VWDS), Prof. Broll, Fak. WM
  • Medienpsychologie und Medienkonzeption (MPMK), Prof. Döring, Fak. WM
  • Neuroinformatik und Kognitive Robotik (NEUROB), Prof. Groß, Fak. IA

In addition, the University Computer Centre (UniRZ), the Virtual Reality Competence Centre (KVR) and the Fraunhofer IDMT will contribute to the realisation.

Contribution by Mitteldeutscher Rundfunk as part of the "mdr Wissen" series: Mit Augmented Reality gegen Einsamkeit im Alter (Using Augmented Reality against Loneliness in Old Age)


University press release

TU Ilmenau: Using Augmented Reality and Robotics to Strengthen Social Contacts of Elderly People

In a recently approved research project, the Technical University of Ilmenau will develop state-of-the-art methods to enable elderly people to have contact with familiar people from far away in their home environment. These people - relatives or friends, but also medical or nursing staff - will be "projected" into the environment using novel technical methods as if they were there themselves. The project is being funded by the Carl-Zeiss-Stiftung with up to 4.5 million euros over the next five years.

Senior citizens in particular often find it difficult to maintain regular social contacts. After retirement, contacts with colleagues break off, children and relatives live far away, and for health reasons it becomes more difficult to visit others in person. In addition, the corona pandemic is currently restricting the opportunities for older people to stay in touch with relatives and friends. In the research project CO-HUMANICS ("Co-Presence of Humans and Interactive Companions for Seniors"), an interdisciplinary research team from Technische Universität Ilmenau and the Thüringer Zentrum für Lernende Systeme und Robotik (Thuringian Center for Learning Systems and Robotics) will develop technical solutions from 2021 on to integrate remote people into the home environment as if they were there in person. Alexander Raake, Professor for Audiovisual Technology at TU Ilmenau, is the spokesperson of the research team.

Based on augmented reality technologies, the team is developing innovative communication channels through which the "connected" persons appear far more present to the elderly, as realistic spatial representations, than in conventional telephone or video calls. For example, users can reach out toward their conversation partners when these are presented in an appropriate spatial manner. This so-called co-presence is not intended to replace interpersonal contact, but to foster and improve existing ties to familiar people across distances.

In addition to the new augmented reality technologies, the CO-HUMANICS project also aims to develop robot-based assistance systems that help elderly people communicate with relatives or medical care personnel at a distance and receive assistance. Such robots could provide concrete help, for instance in operating technical devices, or position a remote conversation partner optimally for talking with the senior citizen. With the help of both systems - augmented reality and robotics - elderly people will be able to communicate with people close to them in their home environment and receive support in everyday activities.

The CO-HUMANICS project is supported by the Carl-Zeiss-Stiftung within the framework of the "Durchbrüche 2020" ("Breakthroughs 2020") funding program with up to 4.5 million euros over the next five years. The Carl-Zeiss-Stiftung has set itself the goal of creating scope for scientific breakthroughs. As a partner of excellent science, it supports both basic research and application-oriented research and teaching in the STEM disciplines (Science, Technology, Engineering, Mathematics). Founded in 1889 by the physicist and mathematician Ernst Abbe, the Carl-Zeiss-Stiftung is one of the oldest and largest private science-promoting foundations in Germany. It is the sole owner of Carl Zeiss AG and SCHOTT AG. Its projects are financed from the dividend distributions of the two foundation companies.

Contact us:
Professor Alexander Raake
Head of the Audiovisual Technology Group
Phone: +49 3677 69-1468

DFG-Project SoPhoAppeal started

Recently, the German Research Foundation (DFG) gave a positive assessment of a submitted project proposal on photo appeal and aesthetics.

Project description

The project SoPhoAppeal covers topics regarding image appeal and liking.
Starting with the development of an image dataset including likes, views and other social signals, crowd- and lab tests will be performed to analyze the connection of liking and aesthetic rating.
Furthermore, models to predict such aesthetic ratings will be developed.
Read more in the project overview.

Realization of the large-scale device ILMETA started

The implementation of the large-scale device ILMETA has started in the group, and the first equipment orders have been placed.

At the beginning of 2020, the German Research Foundation (DFG) gave a positive assessment of the department's application for the co-financing of the large-scale ILMETA facility. As a result, the DFG and the Free State of Thuringia are providing a total of €570,000 for the realisation of this infrastructure project.

Project description

ILMETA (Interconnected Lab for MEdia Technology Analytics) is a networked large-scale device for the investigation of audiovisual media technology systems. It is intended to enable research into the measurement and evaluation of systems for recording, signal processing, network technology (production, streaming) and reproduction, by means of data analysis (signals, meta/measurement data) and tests on human perception and quality of experience (QoE). Since all systems along the end-to-end chain aim at the best possible processing of media content (signals and metadata), such content represents the essential measurement information. For various reasons (including copyright), high-quality content that is representative for the application is only available to a limited extent for research, especially for new immersive media formats such as 360° or high-resolution video. In order to be able to achieve the research objectives, the large-scale equipment consists of two closely interlinked components: (1) production infrastructure for the creation of high-quality research content, including current IP-based studio and measurement technology, (2) computing and storage infrastructure for the analysis of the resulting heterogeneous research data (signals, measurement/metadata) in order to gain insights for the improvement of media systems for different users.

AVT members win DASH Industry Forum Excellence Award in collaboration with TU Berlin, NTNU and TU Munich

Award certificate

This year's DASH Industry Forum Excellence in DASH Awards were presented at ACM MMSys 2020. The prizes were awarded for "practical enhancements and developments which can sustain future commercial usefulness of DASH". The paper "Comparing Fixed and Variable Segment Durations for Adaptive Video Streaming – A Holistic Analysis" was written by Susanna Schwarzmann (TU Berlin), Nick Hainke (TU Berlin), Thomas Zinner (NTNU Norway), and Christian Sieber (TU Munich) together with Werner Robitza and Alexander Raake from the AVT group. The paper won first prize at the ceremony.

More information about the awards, as well as the paper itself, is available online.

Bitstream-based Model Standard for 4K/UHD: ITU-T P.1204.3 -- Model Details, Evaluation, Analysis and Open Source Implementation

Twelfth International Conference on Quality of Multimedia Experience (QoMEX). Athlone, Ireland. May 2020

Rakesh Rao Ramachandra Rao, Steve Göring, Werner Robitza, Alexander Raake, Bernhard Feiten, Peter List, and Ulf Wüstenhagen

With the increasing requirement of users to view high-quality videos with a constrained bandwidth, typically realized using HTTP-based adaptive streaming, it becomes more and more important to determine the quality of the encoded videos accurately, to assess and possibly optimize the overall streaming quality.
In this paper, we describe a bitstream-based no-reference video quality model developed as part of the latest model-development competition conducted by ITU-T Study Group 12 and the Video Quality Experts Group (VQEG), "P.NATS Phase 2". It is now part of the new P.1204 series of Recommendations as P.1204.3.

It can be applied to bitstreams encoded with H.264/AVC, HEVC and VP9, using various encoding options, including resolution, bitrate, framerate and typical encoder settings such as number of passes, rate control variants and speeds.

The proposed model follows an ensemble-modelling-inspired approach with weighted parametric and machine-learning parts to efficiently leverage the performance of both approaches. The paper provides details about the general approach to modelling, the features used and the final feature aggregation.

The model creates per-segment and per-second video quality scores on the 5-point Absolute Category Rating scale, and is applicable to segments of 5-10 seconds duration.

It covers both PC/TV and mobile/tablet viewing scenarios. We outline the databases on which the model was trained and validated as part of the competition, and perform an additional evaluation using a total of four independently created databases, where resolutions varied from 360p to 2160p, and frame rates from 15-60 fps, using realistic coding and bitrate settings.

We found that the model performs well on the independent dataset, with a Pearson correlation of 0.942 and an RMSE of 0.42. We also provide an open-source reference implementation of the described P.1204.3 model, as well as the multi-codec bitstream parser required to extract the input data, which is not part of the standard.
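The two evaluation metrics quoted above, Pearson correlation and RMSE between predicted and subjective MOS values, can be reproduced with a few lines of standard-library Python. The score vectors below are invented for illustration; they are not data from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root-mean-square error between subjective and predicted MOS."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical per-sequence MOS values on the 5-point ACR scale.
subjective = [4.2, 3.1, 2.5, 4.8, 1.9]
predicted  = [4.0, 3.3, 2.7, 4.6, 2.1]
print(round(pearson(subjective, predicted), 3), round(rmse(subjective, predicted), 3))
```

In practice, these metrics would be computed over all sequences of a validation database, as in the evaluation described above.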

Are you still watching? Streaming Video Quality and Engagement Assessment in the Crowd

Twelfth International Conference on Quality of Multimedia Experience (QoMEX), May 26 - 28, 2020

Werner Robitza, Alexander M. Dethof, Steve Göring, Alexander Raake, André Beyer, Tim Polzehl

We present first results from a large-scale crowdsourcing study in which three major video streaming OTTs were compared across five major national ISPs in Germany. We not only look at streaming performance in terms of loading times and stalling, but also customer behavior (e.g., user engagement) and Quality of Experience based on the ITU-T P.1203 QoE model. We used a browser extension to evaluate the streaming quality and to passively collect anonymous OTT usage information based on explicit user consent. Our data comprises over 400,000 video playbacks from more than 2,000 users, collected throughout the entire year of 2019.

The results show differences in how customers use the video services, how the content is watched, how the network influences video streaming QoE, and how user engagement varies by service. Hence, the crowdsourcing paradigm is a viable approach for third parties to obtain streaming QoE insights from OTTs.

The paper was written together with the TU Ilmenau spin-off AVEQ GmbH and the Berlin-based company Crowdee GmbH, and it is available for download.

Prenc - Predict Number Of Video Encoding Passes With Machine Learning

Twelfth International Conference on Quality of Multimedia Experience (QoMEX). Athlone, Ireland. May 2020

Steve Göring, Rakesh Rao Ramachandra Rao and Alexander Raake

Video streaming providers spend huge amounts of processing time on obtaining a quality-optimized encoding.
While the quality-related impact may be known to the service provider, the impact on video quality is hard to assess when no reference is available.

Here, bitstream-based video quality models may be applicable, delivering estimates that include encoding-specific settings. Such models typically use several input parameters, e.g. bitrate, framerate, resolution, video codec, QP values and more.

However, for a given bitstream, to determine which encoding parameters were selected, e.g., the number of encoding passes, is not a trivial task.

This leads to our following research question: Given an unknown video bitstream, which encoding settings have been used? To tackle this reverse engineering problem, we introduce a system called prenc.
Besides the use in video-quality estimation, such algorithms may also be used in other applications such as video forensics. We prove our concept by applying prenc to distinguish between one- and two-pass encoding.

Starting from modeling the problem as a classification task, estimating bitstream-based features, we further describe a machine learning approach with feature selection to automatically predict the number of encoding passes for a given video bitstream.

Our large-scale evaluation consists of 16 short-movie-type 4K videos that were segmented and encoded with different settings (resolutions, codecs, bitrates), so that in total we analyzed 131,976 DASH video segments.

We further show that our system is robust, based on a 50% train and 50% validation approach without source video overlap, where we achieve a classification performance of 65% F1 score.
Moreover, we also describe the used bitstream-based features in detail, the feature pooling strategy and include other machine learning algorithms in our evaluation.
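As a side note, the F1 score used above to report the one-pass vs. two-pass classification performance is the harmonic mean of precision and recall. A minimal stand-alone computation, with made-up labels rather than prenc's actual bitstream features, could look like this:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score for a binary classification (e.g. one- vs. two-pass
    encoding): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = two-pass, 0 = one-pass encoding.
truth = [1, 1, 0, 0, 1, 0, 1, 0]
pred  = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(truth, pred))
```

The actual prenc system derives its predictions from bitstream-based features via machine learning; the labels here merely illustrate how the reported metric is computed.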

Development and evaluation of a test setup to investigate distance differences in immersive virtual environments

2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), May 26 - 28, 2020

Stephan Fremerey, Muhammad Sami Suleman, Abdul Haq Azeem Paracha and Alexander Raake

With recent advances in virtual reality technology, real objects can easily be integrated into virtual environments by creating an exact virtual replica and enabling interaction with them through mapping the tracking data obtained from the real objects onto their virtual counterparts.

The primary goal of our study is to develop a system to investigate distance differences for near-field interaction in immersive virtual environments. In this context, the term distance difference refers to the shift between a real object and its same-size replication in the virtual environment. Such a shift could occur for a number of reasons, e.g., due to errors in motion tracking or mistakes in designing the virtual environment. Our virtual environment is developed using the Unity3D game engine, while the immersive contents were displayed on an HTC Vive Pro head-mounted display. The virtual room shown to the user includes a replication of the real testing lab environment, while one of the two real objects is tracked and mirrored to the virtual world using an HTC Vive Tracker.

Both objects are present in the real as well as in the virtual world. To find perceivable distance differences in the near-field, the actual task in the subjective test was to pick up one object and place it into another object.

The position of the static object in the virtual world is shifted by values between 0 and 4 cm, while the position of the real object is kept constant. The system is evaluated by conducting a subjective proof-of-concept test with 18 test subjects.

The subjects evaluate the distance difference by rating perceived confusion on a modified 5-point absolute category rating scale. The study provides quantitative insights into allowable real-world vs. virtual-world mismatch boundaries for near-field interactions, with a threshold value of around 1 cm.
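The rating procedure described above reduces to averaging the 5-point ACR confusion ratings into a mean opinion score (MOS) per displacement. A minimal sketch, with invented ratings rather than the study's data:

```python
from statistics import mean

# Hypothetical per-subject confusion ratings (1 = very confusing,
# 5 = not confusing at all) for several real-vs-virtual offsets in cm.
ratings = {
    0.0: [5, 5, 4, 5, 5],
    1.0: [4, 4, 5, 3, 4],
    2.0: [3, 2, 3, 2, 3],
    4.0: [1, 2, 1, 1, 2],
}

# Mean opinion score per offset condition.
mos = {offset: mean(r) for offset, r in ratings.items()}
for offset, score in sorted(mos.items()):
    print(f"{offset:.1f} cm -> MOS {score:.2f}")
```

A threshold such as the reported ~1 cm would then be read off from where the MOS starts to drop noticeably across conditions.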

Let the Music Play: An Automated Test Setup for Blind Subjective Evaluation of Music Playback on Mobile Devices

Twelfth International Conference on Quality of Multimedia Experience (QoMEX), May 2020

Keller, D.; Raake, A.; Vaalgamaa, M.; Paajanen, E.

Several methods for the subjective evaluation of audio and speech have been standardized in recent years. However, with the advancement of mobile devices such as smartphones and Bluetooth speakers, people listen to music even outside their home environment, when traveling and in social situations. Conventional comparative methodologies are difficult to use for sound-quality evaluation of such devices, since subjects' judgments are likely to be influenced by other factors such as brand or design. Hence, we propose an automated test setup to evaluate music and audio playback of portable devices with subjects without revealing the devices or interfering with the tests. Furthermore, an identical placement of the devices in front of the listener is crucial to accommodate the individual acoustic directivity of each device. For this purpose, we use a large motorized turntable on which the devices are mounted, so that the playback device is automatically moved to the defined position in advance. An enhanced version of the rating software avrateNG enables the automatic playout of musical pieces and appropriate turning of the devices to face the listeners. Devices that can automatically be tested using our setup include Android and iOS smartphones, as well as Bluetooth and wired portable speakers. Preliminary user tests were conducted to verify the practical applicability and stability of the proposed setup.

SiSiMo: Towards Simulator Sickness Modeling for 360° Videos Viewed with an HMD

27th IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), March 2020, Atlanta, USA

A. Raake, A. Singla, R. R. R. Rao, W. Robitza and F. Hofmeyer

Users may experience symptoms of simulator sickness while watching 360°/VR videos with Head-Mounted Displays (HMDs). At present, practically no solution exists that can efficiently eliminate the symptoms of simulator sickness in virtual environments. Therefore, in the absence of such a solution, it is necessary to at least quantify the amount of sickness. In this paper, we present initial work on our Simulator Sickness Model, SiSiMo, including a first component to predict simulator sickness scores over time. Using linear regression of short-term scores already shows promising performance for predicting the scores collected from a number of user tests.



The project is a scientific cooperation between the labs of Audiovisual Technology, Virtual Worlds and Digital Games (Prof. Wolfgang Broll, Faculty of Economics and Media) and Electronic Media Technology (Prof. Karlheinz Brandenburg, Faculty of Electrical Engineering and Information Technology).

The project, funded by the Free State of Thuringia, was co-financed by the European Union within the European Regional Development Fund (ERDF).

Project Website

ITU-T standard P.1204 for predicting video quality developed

ITU-T recently consented the P.1204 series of Recommendations, titled "Video quality assessment of streaming services over reliable transport for resolutions up to 4K". This work was jointly conducted by Question 14 of Study Group 12 (SG12/Q14) of the ITU-T and the Video Quality Experts Group (VQEG). Overall, nine companies and universities were part of this competition-based development, with the best-performing set of models recommended as standards.

The official ITU-T SG12 communication reads:

"The P.1204 Recommendation series describes a set of objective video quality models. These can be used standalone for assessing video quality for 5-10 sec long video sequences, providing a 5-point ACR-type Mean Opinion Score (MOS) output. In addition, they deliver per-1-second MOS-scores that together with audio information and stalling / initial loading data can be used to form a complete model to predict the impact of audio and video media encodings and observed IP network impairments on quality experienced by the end-user in multimedia streaming applications. The addressed streaming techniques comprise progressive download as well as adaptive streaming, for both mobile and fixed network streaming applications."

To date, the P.1204 series of Recommendations comprises four sub-recommendations, namely P.1204 (an introductory document for the whole P.1204 series), P.1204.3 (bitstream-based model with full access to the bitstream), P.1204.4 (reference-/pixel-based model) and P.1204.5 (hybrid bitstream- and pixel-based no-reference model), with two more sub-recommendations, P.1204.1 (metadata-based) and P.1204.2 (metadata- and video-frame-information-based), planned to be consented by April 2020.

The AVT group of TU Ilmenau, in collaboration with Deutsche Telekom, was the sole winner in the category which resulted in Recommendation P.1204.3, and is a co-winner in the category which is planned to result in Recommendations P.1204.1 and P.1204.2 by April 2020.

The official ITU-T SG12 communication further states:

"The consent of the P.1204 model standards marks the first time that video-quality models of all relevant types have been developed and validated within the same standardization campaign. The respective “P.NATS Phase 2” model competition used a total of 13 video-quality test databases for training, and another 13 video-quality test databases for validation. With this comparatively high number of data (more than 5000 video sequences), the resulting standards deliver class-leading video-quality prediction performance."

The published ITU standards:

The building blocks of the consented Recommendation

Older News

Older news from the AVT lab can be found on this website.

Offers for theses in the AVT Lab

You can now find information about the range of topics for bachelor's and master's theses as well as media projects directly on our website.

Take a look at the Theses section!