Audiovisual Technology Group

The Audiovisual Technology Group (AVT) deals with the function, application and perception of audio and video equipment. An essential focus of the research is on the relationship between the technical characteristics of audio, video and audiovisual systems and human perception and experience (“Quality of Experience”, QoE).



Recent publications from the group

7th European Workshop on Visual Information Processing (EUVIP), Tampere (Finland), 26 - 28 November 2018

Steve Göring, Alexander Raake

deimeq – A Deep Neural Network Based Hybrid No-reference Image Quality Model

Current no-reference image quality assessment models are mostly based on hand-crafted features (signal, computer vision, ...) or deep neural networks (DNNs). Using DNNs for image quality prediction leads to several problems: for example, the input size is restricted, and higher resolutions increase processing time and memory consumption. Large inputs are handled by splitting the image into patches and aggregating a quality score. In a pure patching approach, the connections between the sub-images are lost.

Also, a huge dataset is required to train a DNN from scratch, though only small annotated datasets are available. We provide a hybrid solution (deimeq) that predicts image quality using DNN feature extraction combined with random forest models. Firstly, deimeq uses a pre-trained DNN for feature extraction in a hierarchical sub-image approach, which avoids the need for a huge training dataset. Further, our proposed sub-image approach circumvents pure patching by keeping hierarchical connections between the sub-images. Secondly, deimeq can be extended using signal-based features from state-of-the-art models. To evaluate our approach, we chose a strict cross-dataset evaluation with the Live-2 and TID2013 datasets and several pre-trained DNNs. Finally, we show that deimeq and its variants perform better than or similar to other methods.
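The abstract does not spell out how the hierarchical sub-images are built, so the following is only an illustrative sketch under one assumption: a quadtree-style split in which level k divides the image into 2^k x 2^k tiles, and every tile keeps a link to the tile that encloses it one level up. The names `hierarchical_boxes` and `parent_index` are hypothetical and not taken from the paper. In a full pipeline, each box would be cropped, passed through a pre-trained DNN for feature extraction, and the features handed to a random forest; those steps are left out here.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def hierarchical_boxes(width: int, height: int, levels: int) -> List[List[Box]]:
    """Return one list of boxes per level (assumed quadtree-style split).

    Level k splits the image into 2**k x 2**k tiles in row-major order,
    so level 0 is the whole image.
    """
    pyramid: List[List[Box]] = []
    for level in range(levels):
        n = 2 ** level
        boxes: List[Box] = []
        for row in range(n):
            for col in range(n):
                left = col * width // n
                right = (col + 1) * width // n
                top = row * height // n
                bottom = (row + 1) * height // n
                boxes.append((left, top, right, bottom))
        pyramid.append(boxes)
    return pyramid


def parent_index(level: int, index: int) -> int:
    """Row-major index of the enclosing tile one level up (level must be >= 1)."""
    n = 2 ** level
    row, col = divmod(index, n)
    return (row // 2) * (n // 2) + (col // 2)


if __name__ == "__main__":
    pyramid = hierarchical_boxes(3840, 2160, levels=3)
    for level, boxes in enumerate(pyramid):
        print(level, len(boxes), boxes[0])
```

Unlike a flat list of patches, the parent links keep track of which larger region each sub-image belongs to, which is the kind of connection a pure patching approach loses.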

Picture: General approach for a classification in HD and UHD


Human Vision and Electronic Imaging 2019, Burlingame (California, USA), 13 - 17 January 2019

Steve Göring, Julian Zebelein, Simon Wedel, Dominik Keller, Alexander Raake

Analyze And Predict the Perceptibility of UHD Video Contents

720p, Full-HD, 4K, 8K, ...: display resolutions have increased rapidly in recent years. However, many video streaming providers currently stream videos with a maximum resolution of 4K/UHD-1. Considering that typical viewers watch videos in living rooms, where viewing distances are quite large, the question arises whether higher resolution is even perceptible. In this paper we analyze the problem of UHD perceptibility in comparison with lower resolutions. As a first step, we conducted a subjective video test that focuses on short uncompressed video sequences and compares two different testing methods for pairwise discrimination of two representations of the same source video in different resolutions.

We selected an extended stripe method and a temporal switching method. We found that temporal switching is more suitable for recognizing UHD video content. Furthermore, we developed features that can be used in a machine learning system to predict whether there is a benefit in showing a given video in UHD.

Evaluating different models based on these features for predicting perceivable differences shows good performance on the available test data. Our implemented system can be used to verify UHD source video material or to optimize streaming applications.

Offers for theses in the AVT Lab

You can now find the range of topics for bachelor and master theses, as well as for media projects, directly on our website.

Take a look under the menu item "Theses"!

Award for P. Lebreton, S. Fremerey and A. Raake

As part of the "Grand Challenge on Salient 360!" at this year's IEEE International Conference on Multimedia and Expo (ICME) in San Diego, Dr. Pierre Lebreton (formerly TU Ilmenau, now Zhejiang University, China), Stephan Fremerey and Prof. Dr. Alexander Raake received the award for second place in the category "Prediction of Head Saliency for Images" and for fourth place in the category "Prediction of Head Saliency for Videos".

Example for a heatmap showing the head rotation data captured for all participants for one sequence
Image Appeal Prediction Pipeline
YouTube quality according to ITU-T P.1203 dependent on download speed and other factors.

New work on VR, image appeal, and video streaming presented at IEEE QoMEX, ACM MMSys

At this year’s ACM MMSys conference in Amsterdam, Stephan Fremerey published a study together with a related Open Source dataset and software. The study was carried out in collaboration with ARTE G.E.I.E. Its research goal was to gain insights into the exploration behavior of 48 participants watching 20 different 30-second 360° videos in a task-free scenario. The dataset, which contains the Simulator Sickness Questionnaire scores and the head rotation data, and the software to record and evaluate the respective data are published as Open Source.
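A heatmap of head rotation data, like the one mentioned in the caption above, can be approximated by counting how often viewers looked in each direction. The sketch below is not the published software and does not use its data format. It assumes, purely for illustration, that each sample is a (yaw, pitch) pair in degrees with yaw in [-180, 180) and pitch in [-90, 90]; the function name `head_rotation_heatmap` and the bin counts are hypothetical.

```python
from typing import Iterable, List, Tuple


def head_rotation_heatmap(
    samples: Iterable[Tuple[float, float]],
    yaw_bins: int = 36,
    pitch_bins: int = 18,
) -> List[List[int]]:
    """Count (yaw, pitch) samples in a pitch_bins x yaw_bins grid.

    Assumed convention: degrees, yaw wraps around, pitch is clamped.
    Row 0 holds the lowest pitch values, column 0 the lowest yaw values.
    """
    grid = [[0] * yaw_bins for _ in range(pitch_bins)]
    for yaw, pitch in samples:
        yaw_shifted = (yaw + 180.0) % 360.0                     # wrap to [0, 360)
        pitch_shifted = min(max(pitch, -90.0), 90.0) + 90.0     # clamp to [0, 180]
        col = min(int(yaw_shifted / 360.0 * yaw_bins), yaw_bins - 1)
        row = min(int(pitch_shifted / 180.0 * pitch_bins), pitch_bins - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Made-up samples pooled over all participants for one sequence.
    demo = [(0.0, 0.0), (5.0, -2.0), (-90.0, 10.0), (170.0, 45.0)]
    for row in head_rotation_heatmap(demo, yaw_bins=8, pitch_bins=4):
        print(row)
```

Pooling the samples of all participants for one sequence before binning gives a single grid per video, which can then be rendered as an image or normalized to relative viewing frequencies.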

Steve Göring’s work on image appeal was presented at IEEE QoMEX. The core idea of the paper is to train a model for image liking prediction based on a crawled dataset of 80k images from a photo sharing platform. The features used are based on the image itself, social network analysis, comment analysis and other metadata that such platforms provide for images.

On the topic of video streaming, Werner Robitza has published a study at QoMEX. The research was conducted in collaboration with Deutsche Telekom, where the quality of YouTube streams was measured under different bandwidth conditions. The paper shows the influence of different measurement scenarios on the measured key performance indicators and quality according to ITU-T P.1203, a standard for HTTP Adaptive Streaming quality estimation.

Together with academic and industry partners from Ericsson, Deutsche Telekom, TU Berlin, NTT Corporation, and NETSCOUT Systems, an open dataset and software for the ITU-T P.1203 standard were published at the ACM MMSys conference. The software can be used freely for non-commercial research purposes. You can find out more about the group’s work on P.1203 on this page.

Prof. Dr. Alexander Raake in conversation with the Thuringian Prime Minister Bodo Ramelow (Photo: M. Döring)

IMT at the summer party of the Thuringian State Representation in Berlin

As in the previous year, the department was represented with demonstrations of its current research topics at the summer party of the Thuringian State Representation in Berlin. At the event in the afternoon and evening of June 26, 2018, interested visitors were able to find out about techniques of virtual reality and the mechanisms used for quality estimation at the booth of the TU Ilmenau. Also, after watching 360° videos, they could see which parts of the video they had actually explored.

A prominent and interested guest was Thuringian Minister-President Bodo Ramelow, to whom Prof. Raake explained the research goals in the field of VR technology.

Prof. Raake was supported during the event by Stephan Fremerey, Matthias Döring and Paul Krabbes from the Institute for Media Technology.

Prof. Schade (center) with the managing director Jürgen Burghardt and the chairman Siegfried Fößel of the FKTG

FKTG Honorary Membership for Prof. Schade

Prof. Dr.-Ing. Hans-Peter Schade, who was the head of the AVT laboratory from 2002 to 2015, was appointed an honorary member of the Fernseh- und Kinotechnische Gesellschaft (FKTG, the Television and Cinematographic Society) at the 28th FKTG Conference on June 4-6, 2018 in Nuremberg.

STEEM project completed

In May 2018, the STEEM project (Speech Transmission End-to-End Monitoring) was successfully completed. Based on a large number of conversational tests, the lab has developed an improved model for predicting the perceived quality of IP telephony. Going beyond existing models, it includes the influence of background noise and terminal device characteristics. The model was handed over to the project partner HEAD acoustics. The solution will find its way into the company's products, enabling network operators and component manufacturers to better predict the voice quality perceived by the customer and to optimize their products and services with regard to customer requirements.

Partial results of the conversational tests carried out in Ilmenau were presented at DAGA 2018. The following figure is taken from that publication.

Conversation test in the AVT test lab with different terminals and a system for defined background noise.

Older News

Older news from the AVT lab can be found on this website.