
Audiovisual Technology Group

The Audiovisual Technology Group is concerned with the function, application, and perception of audio and video technology. A key focus is research into the relationship between technical system properties and human perception as well as the user experience ("Quality of Experience").

Further information about the group

Job opening: Research Associate / Doctoral Candidate

The Audiovisual Technology (AVT) Group has an opening for a research associate / doctoral candidate on "Audiovisual Communication in Mixed-Reality (AR, VR) Environments", starting January 2021.

Job advertisement (PDF)

Applications accepted until 15 October 2020

Twitter

You can also follow news from the group via our Twitter channel.

https://twitter.com/avt_imt

News

DFG project SoPhoAppeal started

The German Research Foundation (DFG) has recently accepted a project proposal on photo appeal and aesthetics submitted by the group.

Project description

The SoPhoAppeal project addresses topics around image appeal and liking.
Starting from the creation of an image dataset with likes, views, and other social signals, crowdsourcing and lab tests are conducted to analyze the relationship between liking and aesthetic ratings.
In addition, models for predicting such aesthetic ratings are developed.
Read more in the project overview.
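As a minimal illustration of the planned analysis, the following Python sketch relates per-image like counts to aesthetic mean opinion scores; the file name, column names, and like-rate normalization are assumptions for illustration, not project artifacts.

    import pandas as pd
    from scipy.stats import spearmanr

    # Hypothetical per-image table; file and column names are placeholders.
    df = pd.read_csv("images_with_ratings.csv")  # columns: image_id, likes, views, aesthetic_mos

    # Normalize likes by views to reduce pure popularity effects before correlating.
    df["like_rate"] = df["likes"] / df["views"].clip(lower=1)

    rho_likes, p_likes = spearmanr(df["likes"], df["aesthetic_mos"])
    rho_rate, p_rate = spearmanr(df["like_rate"], df["aesthetic_mos"])
    print(f"likes     vs. aesthetic MOS: rho={rho_likes:.2f} (p={p_likes:.3f})")
    print(f"like rate vs. aesthetic MOS: rho={rho_rate:.2f} (p={p_rate:.3f})")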

Workshop Digital Broadcasting 2020

Program now online

The program for the 16th Workshop Digital Broadcasting (WSDB) on 23 and 24 September 2020 in Erfurt is now online. Participants can expect a varied program of talks and experience reports on digital broadcasting and AI in the media sector. The two-day workshop opens with a keynote by the pioneer of digital television, Prof. Ulrich Reimers. Should an on-site event in Erfurt not be possible due to the current circumstances, WSDB 2020 will be held as an online workshop.

Further information on the program and registration is available at www.idmt.fraunhofer.de/wsdb2020

Implementation of the large-scale research facility ILMETA started

The group has begun implementing the large-scale research facility ILMETA, with the first procurement orders now being placed.

At the beginning of 2020, the German Research Foundation (DFG) had positively reviewed the group's proposal for co-funding the large-scale facility ILMETA. The DFG and the Free State of Thuringia are providing a total of €570,000 for the realization of this infrastructure project.

Project description

ILMETA (Interconnected Lab for MEdia Technology Analytics) is a networked large-scale facility for investigating audiovisual media technology systems. It is intended to enable research on the measurement and evaluation of systems for capture, signal processing, networking (production, streaming), and playback, by means of data analysis (signals, meta-/measurement data) as well as tests of human perception and Quality of Experience (QoE). Since all systems along the end-to-end chain aim at processing the media content (signals and metadata) as optimally as possible, content constitutes the essential measurement information. For various reasons (including copyright), high-quality content that is representative of the application is only available to research to a limited extent, especially for new immersive media formats such as 360° or high-resolution video. To achieve the targeted research goals, the facility consists of two closely interlinked components: (1) a production infrastructure for creating high-quality research content, including state-of-the-art IP-based studio and measurement technology, and (2) a computing and storage infrastructure for analyzing the resulting heterogeneous research data (signals, measurement/metadata) in order to derive insights for improving media systems for different users.

Structure of the facility

AVT researchers, in collaboration with colleagues from TU Berlin, NTNU, and TU Munich, win the "DASH Industry Forum Excellence Award"

Award certificate

This year's "DASH Industry Forum Excellence in DASH Awards" were presented at ACM MMSys 2020. The awards were given for "practical enhancements and developments that can support the future commercial use of DASH". The paper "Comparing Fixed and Variable Segment Durations for Adaptive Video Streaming – A Holistic Analysis" was written by Susanna Schwarzmann (TU Berlin), Nick Hainke (TU Berlin), Thomas Zinner (NTNU Norway), and Christian Sieber (TU Munich) together with Werner Robitza and Alexander Raake of the AVT group. The paper won first prize.

Further information about the awards can be found here (https://multimediacommunication.blogspot.com/2020/06/dash-if-awarded-excellence-in-dash.html). The article is available here (https://dl.acm.org/doi/abs/10.1145/3339825.3391858).

New article: Bitstream-based Model Standard for 4K/UHD: ITU-T P.1204.3 -- Model Details, Evaluation, Analysis and Open Source Implementation

Twelfth International Conference on Quality of Multimedia Experience (QoMEX). Athlone, Ireland. May 2020

Rakesh Rao Ramachandra Rao, Steve Göring, Werner Robitza, Alexander Raake, Bernhard Feiten, Peter List, and Ulf Wüstenhagen

With users increasingly demanding high-quality videos under constrained bandwidth, typically realized using HTTP-based adaptive streaming, it becomes more and more important to determine the quality of the encoded videos accurately, in order to assess and possibly optimize the overall streaming quality.
In this paper, we describe a bitstream-based no-reference video quality model developed as part of the latest model-development competition conducted by ITU-T Study Group 12 and the Video Quality Experts Group (VQEG), "P.NATS Phase 2". It is now part of the new P.1204 series of Recommendations as P.1204.3.

It can be applied to bitstreams encoded with H.264/AVC, HEVC and VP9, using various encoding options, including resolution, bitrate, framerate and typical encoder settings such as number of passes, rate control variants and speeds.

The proposed model follows an ensemble-modelling-inspired approach with weighted parametric and machine-learning parts to efficiently leverage the performance of both approaches. The paper provides details about the general approach to modelling, the features used, and the final feature aggregation.

The model creates per-segment and per-second video quality scores on the 5-point Absolute Category Rating scale, and is applicable to segments of 5 to 10 seconds duration.

It covers both PC/TV and mobile/tablet viewing scenarios. We outline the databases on which the model was trained and validated as part of the competition, and perform an additional evaluation using a total of four independently created databases, where resolutions varied from 360p to 2160p and frame rates from 15 to 60 fps, using realistic coding and bitrate settings.

We found that the model performs well on the independent dataset, with a Pearson correlation of 0.942 and an RMSE of 0.42. We also provide an open-source reference implementation of the described P.1204.3 model, as well as the multi-codec bitstream parser required to extract the input data, which is not part of the standard.
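To illustrate the general idea of a weighted parametric plus machine-learning ensemble (not the actual P.1204.3 coefficients or features, which are specified in the Recommendation and its reference implementation), a minimal Python sketch with placeholder features and weights could look as follows:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def parametric_score(bitrate_kbps):
        # Toy parametric core: quality grows with log-bitrate, clipped to the 5-point scale.
        return float(np.clip(1.0 + 1.2 * np.log10(bitrate_kbps / 100.0 + 1.0), 1.0, 5.0))

    # Hypothetical per-segment bitstream features (e.g. average QP, motion, frame sizes)
    # with MOS-like training targets; in reality these come from quality test databases.
    rng = np.random.default_rng(0)
    X_train = rng.random((200, 3))
    y_train = 1.0 + 4.0 * rng.random(200)
    ml_part = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

    def fused_score(features, bitrate_kbps, w=0.5):
        # Illustrative weighted fusion of the parametric and machine-learning estimates.
        q_param = parametric_score(bitrate_kbps)
        q_ml = float(ml_part.predict(np.asarray(features).reshape(1, -1))[0])
        return w * q_param + (1.0 - w) * q_ml

    print(fused_score([0.4, 0.2, 0.7], bitrate_kbps=3000))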

New article: Are you still watching? Streaming Video Quality and Engagement Assessment in the Crowd

Twelfth International Conference on Quality of Multimedia Experience (QoMEX), May 26 - 28, 2020

Werner Robitza, Alexander M. Dethof, Steve Göring, Alexander Raake, André Beyer, Tim Polzehl

We present first results from a large-scale crowdsourcing study in which three major video streaming OTTs were compared across five major national ISPs in Germany. We look not only at streaming performance in terms of loading times and stalling, but also at customer behavior (e.g., user engagement) and Quality of Experience based on the ITU-T P.1203 QoE model. We used a browser extension to evaluate the streaming quality and to passively collect anonymous OTT usage information based on explicit user consent. Our data comprises over 400,000 video playbacks from more than 2,000 users, collected throughout the entire year of 2019.

The results show differences in how customers use the video services, how the content is watched, how the network influences video streaming QoE, and how user engagement varies by service. Hence, the crowdsourcing paradigm is a viable approach for third parties to obtain streaming QoE insights from OTTs.

The paper was written together with the TU Ilmenau spin-off AVEQ GmbH and the Berlin-based company Crowdee GmbH; it can be downloaded here (https://aveq.info/resources/).
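For illustration, a per-ISP aggregation of such playback logs could be sketched in Python as below; the log file, column names, and KPIs are assumptions, not the actual study data or tooling.

    import pandas as pd

    # Hypothetical playback log; one row per playback with ISP, loading time,
    # stalling events, watch time and a P.1203-style MOS already attached.
    playbacks = pd.read_csv("playback_logs.csv")

    per_isp = playbacks.groupby("isp").agg(
        median_loading_s=("loading_time_s", "median"),
        stalling_ratio=("stalling_events", lambda s: (s > 0).mean()),
        mean_mos=("p1203_mos", "mean"),
        mean_watch_time_s=("watch_time_s", "mean"),
    )
    print(per_isp.sort_values("mean_mos", ascending=False))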

New article: Prenc - Predict Number Of Video Encoding Passes With Machine Learning

Twelfth International Conference on Quality of Multimedia Experience (QoMEX). Athlone, Ireland. May 2020

Steve Göring, Rakesh Rao Ramachandra Rao and Alexander Raake

Video streaming providers spend huge amounts of processing time to obtain quality-optimized encodings.
While the quality-related impact may be known to the service provider, the impact on video quality is hard to assess when no reference is available.

Here, bitstream-based video quality models may be applicable, delivering estimates that include encoding-specific settings. Such models typically use several input parameters, e.g. bitrate, framerate, resolution, video codec, QP values and more.

However, for a given bitstream, determining which encoding parameters were selected, e.g., the number of encoding passes, is not a trivial task.

This leads to the following research question: given an unknown video bitstream, which encoding settings have been used? To tackle this reverse-engineering problem, we introduce a system called prenc.
Besides its use in video-quality estimation, such algorithms may also be used in other applications such as video forensics. As a proof of concept, we apply prenc to distinguish between one- and two-pass encoding.

Starting from modeling the problem as a classification task and computing bitstream-based features, we describe a machine learning approach with feature selection to automatically predict the number of encoding passes for a given video bitstream.

Our large-scale evaluation consists of 16 short 4K movie-type videos that were segmented and encoded with different settings (resolutions, codecs, bitrates), so that we analyzed a total of 131,976 DASH video segments.

We further show that our system is robust, based on a 50% training / 50% validation split without source-video overlap, where we achieve a classification performance of 65% F1 score.
Moreover, we describe the bitstream-based features used and the feature pooling strategy in detail, and include other machine learning algorithms in our evaluation.
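A minimal Python sketch of the classification setup described above, assuming a hypothetical per-segment feature table; feature names, feature selection, and the classifier choice are illustrative and not the prenc internals.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.metrics import f1_score
    from sklearn.model_selection import GroupShuffleSplit
    from sklearn.pipeline import make_pipeline

    df = pd.read_csv("segment_features.csv")        # hypothetical per-segment feature table
    X = df.drop(columns=["passes", "source_video"]).to_numpy()
    y = df["passes"].to_numpy()                     # 1 or 2 encoding passes
    groups = df["source_video"].to_numpy()

    # 50/50 split without source-video overlap, as in the evaluation above.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups))

    clf = make_pipeline(SelectKBest(f_classif, k=10),
                        RandomForestClassifier(n_estimators=100, random_state=0))
    clf.fit(X[train_idx], y[train_idx])
    print("F1:", f1_score(y[test_idx], clf.predict(X[test_idx]), average="macro"))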

New article: Development and evaluation of a test setup to investigate distance differences in immersive virtual environments

2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), May 26 - 28, 2020

Stephan Fremerey, Muhammad Sami Suleman, Abdul Haq Azeem Paracha and Alexander Raake

Nowadays, with recent advances in virtual reality technology, it is easily possible to integrate real objects into virtual environments by creating an exact virtual replica and enabling interaction with it by mapping the tracking data obtained from the real object onto its virtual counterpart.

The primary goal of our study is to develop a system to investigate distance differences for near-field interaction in immersive virtual environments. In this context, the term distance difference refers to the shift between a real object and its same-sized replica in the virtual environment. Such a shift can occur for a number of reasons, e.g., due to errors in motion tracking or mistakes in designing the virtual environment. Our virtual environment is developed using the Unity3D game engine, while the immersive contents were displayed on an HTC Vive Pro head-mounted display. The virtual room shown to the user is a replica of the real testing lab environment, and one of the two real objects is tracked and mirrored to the virtual world using an HTC Vive Tracker.

Both objects are present in the real as well as in the virtual world. To find perceivable distance differences in the near-field, the actual task in the subjective test was to pick up one object and place it into another object.

The position of the static object in the virtual world is shifted by values between 0 and 4 cm, while the position of the real object is kept constant. The system is evaluated by conducting a subjective proof-of-concept test with 18 test subjects.

The distance difference is evaluated by the subjects through estimating perceived confusion on a modified 5-point absolute category rating scale. The study provides quantitative insights into allowable real-world vs. virtual-world mismatch boundaries for near-field interactions, with a threshold value of around 1 cm.
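A simple Python sketch of how such a threshold could be read off the aggregated ratings, assuming a hypothetical ratings table and an illustrative criterion (neither is part of the published test setup):

    import pandas as pd

    # Hypothetical per-trial ratings: one row per subject and tested offset,
    # with the confusion rating on the modified 5-point ACR scale.
    ratings = pd.read_csv("confusion_ratings.csv")  # columns: subject, offset_cm, rating

    per_offset = ratings.groupby("offset_cm")["rating"].agg(["mean", "std", "count"])
    print(per_offset)

    # Illustrative criterion: the smallest offset whose mean rating indicates
    # noticeable confusion is taken as the detection threshold.
    criterion = 2.0
    noticeable = per_offset[per_offset["mean"] > criterion]
    if not noticeable.empty:
        print("Estimated threshold: about", noticeable.index.min(), "cm")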

Link to the repository: https://github.com/Telecommunication-Telemedia-Assessment/distance_differences_ives

New article: Let the Music Play: An Automated Test Setup for Blind Subjective Evaluation of Music Playback on Mobile Devices

Twelfth International Conference on Quality of Multimedia Experience (QoMEX), May 2020

Keller, D.; Raake, A.; Vaalgamaa, M.; Paajanen, E.

In recent years, several methods for the subjective evaluation of audio and speech have been standardized. However, with the evolution of mobile devices such as smartphones and Bluetooth speakers, people also listen to music outside their home environment, while traveling, and in social situations. Conventional comparison methods are difficult to apply to the assessment of the sound quality of such devices, since subjects are likely to factor in other aspects such as brand or design. We therefore propose an automated test setup to evaluate the music and audio playback of portable devices with test participants, without revealing the devices or disturbing the tests. Furthermore, identical placement of the devices in front of the listener is crucial to account for the individual acoustic directivity of each device. For this purpose, we use a large motorized turntable on which the devices are mounted, so that the playback device is automatically moved into the defined position in advance. An extended version of the rating software avrateNG enables automatic playback of music tracks and the corresponding rotation of the devices towards the listener. The devices that can be tested automatically with our setup include Android and iOS smartphones as well as Bluetooth and wired portable speakers. Preliminary user tests were conducted to verify the practical applicability and stability of the proposed setup.
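As an illustration of the blind, randomized playout idea, the following Python sketch generates a trial schedule mapping music excerpts to turntable positions; device names, mounting angles, and the schedule format are assumptions and not part of avrateNG.

    import itertools
    import random

    # Hypothetical devices and their mounting angles on the turntable (degrees),
    # plus the music excerpts to be played.
    devices = {"device_A": 0, "device_B": 90, "device_C": 180, "device_D": 270}
    tracks = ["track_01.wav", "track_02.wav", "track_03.wav"]

    trials = list(itertools.product(devices.keys(), tracks))
    random.shuffle(trials)  # blind order: listeners never learn which device is playing

    for i, (device, track) in enumerate(trials, start=1):
        print(f"trial {i:02d}: rotate turntable to {devices[device]:3d} deg, play {track} on {device}")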

SiSiMo: Towards Simulator Sickness Modeling for 360° Videos Viewed with an HMD

27th IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), March 2020, Atlanta, USA

A. Raake, A. Singla, R. R. R. Rao, W. Robitza and F. Hofmeyer

Users may experience symptoms of simulator sickness while watching 360°/VR videos with Head-Mounted Displays (HMDs). At present, practically no solution exists that can efficiently eliminate the symptoms of simulator sickness in virtual environments. Therefore, in the absence of a solution, it is necessary to at least quantify the amount of sickness. In this paper, we present initial work on our Simulator Sickness Model SiSiMo, including a first component to predict simulator sickness scores over time. Linear regression of short-term scores already shows promising performance for predicting the scores collected from a number of user tests.
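A minimal Python sketch of the linear-regression idea on synthetic short-term scores; the data and time scale are placeholders, not the SiSiMo model itself.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic short-term sickness scores collected once per minute (placeholders).
    t_min = np.arange(0, 10, dtype=float)
    short_term_scores = 1.0 + 0.3 * t_min + np.random.normal(0.0, 0.2, t_min.size)

    # Fit a linear trend over time and extrapolate to later points in the session.
    model = LinearRegression().fit(t_min.reshape(-1, 1), short_term_scores)
    print("Predicted scores at 12 and 15 min:", model.predict(np.array([[12.0], [15.0]])))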

Project CYTEMEX

The project is a scientific cooperation between the groups Audiovisual Technology, Virtual Worlds and Digital Games (Prof. Wolfgang Broll, Faculty of Economic Sciences and Media), and Electronic Media Technology (Prof. Karlheinz Brandenburg, Faculty of Electrical Engineering and Information Technology).

The project, funded by the Free State of Thuringia, was co-financed by the European Union through the European Regional Development Fund (ERDF).

Project website

ITU-T standard P.1204 for video quality prediction

ITU-T standard for video quality prediction developed with major contributions from the AVT group

ITU-T recently consented the P.1204 series of Recommendations, titled "Video quality assessment of streaming services over reliable transport for resolutions up to 4K". This work was jointly conducted by Question 14 of Study Group 12 (SG12/Q14) of the ITU-T and the Video Quality Experts Group (VQEG). Overall, nine companies and universities were part of this competition-based development, with the best-performing set of models recommended as standards.

The official ITU-T SG12 communication reads:

"The P.1204 Recommendation series describes a set of objective video quality models. These can be used standalone for assessing video quality for 5-10 sec long video sequences, providing a 5-point ACR-type Mean Opinion Score (MOS) output. In addition, they deliver per-1-second MOS-scores that together with audio information and stalling / initial loading data can be used to form a complete model to predict the impact of audio and video media encodings and observed IP network impairments on quality experienced by the end-user in multimedia streaming applications. The addressed streaming techniques comprise progressive download as well as adaptive streaming, for both mobile and fixed network streaming applications."

To date, the P.1204 series of Recommendations comprises four sub-recommendations, namely P.1204 (an introductory document for the whole P.1204 series), P.1204.3 (bitstream-based model with full access to the bitstream), P.1204.4 (reference-/pixel-based model), and P.1204.5 (hybrid bitstream- and pixel-based no-reference model), with two more sub-recommendations, P.1204.1 (meta-data-based) and P.1204.2 (meta-data- and video-frame-information-based), planned to be consented by April 2020.

The AVT group of TU Ilmenau, in collaboration with Deutsche Telekom, was the sole winner in the category that resulted in Recommendation P.1204.3, and is a co-winner in the category that is planned to result in Recommendations P.1204.1 and P.1204.2 by April 2020.

The official ITU-T SG12 communication further states:

The consent of the P.1204 model standards marks the first time that video-quality models of all relevant types have been developed and validated within the same standardization campaign. The respective “P.NATS Phase 2” model competition used a total of 13 video-quality test databases for training, and another 13 video-quality test databases for validation. With this comparatively high number of data (more than 5000 video sequences), the resulting standards deliver class-leading video-quality prediction performance.

The published ITU standards:

P.1204: https://www.itu.int/rec/T-REC-P.1204-202001-P/en

P.1204.3: https://www.itu.int/rec/T-REC-P.1204.3-202001-P/en


The building blocks of the consented Recommendation

cencro – Speedup of Video Quality Calculation using Center Cropping

21st IEEE International Symposium on Multimedia (2019 IEEE ISM), Dec 9 - 11, 2019, San Diego, USA

Steve Göring, Christopher Krämmer, Alexander Raake


Today's video streaming providers, e.g. YouTube, Netflix or Amazon Prime, are able to deliver high-resolution and high-quality content to end users. To optimize video quality and to reduce transmission bandwidth, new encoders and smarter encoding schemes are required. Encoding optimization forms an important part of this effort to reduce bandwidth and results in saving a considerable amount of bitrate. For such optimization, accurate and computationally fast video quality models are required, e.g. Netflix's VMAF. However, VMAF is a full-reference (FR) metric, and the calculation of such metrics tends to be slower in comparison to other metrics, due to the amount of data that needs to be processed, especially for high resolutions of 4K and beyond.

We introduce an approach to speed up video quality metric calculations in general. We use VMAF as an example, with a video database containing videos of up to 4K resolution, to show that our approach works well.
Our main idea is to reduce each frame of the reference and distorted video to a center crop of the frame, assuming that the most important visual information is presented in the middle of most typical videos. In total, we analyze 18 different crop settings and compare our results with uncropped VMAF values and subjective scores. We show that this approach, named cencro, is able to save up to 95% computation time, at an overall error of just 4% for a 360p center crop.

Furthermore, we checked other full-reference metrics and show that cencro performs similarly well. As a last evaluation, we apply our approach to full-HD gaming videos; in this scenario, too, cencro can be applied successfully.

The idea behind cencro is not restricted to full-reference models and can also be applied to other types of video quality models or datasets, or even to higher-resolution videos such as 8K.

Link to the source code: https://git.io/JeR5q
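For illustration only, here is a minimal Python sketch of the center-cropping idea, distinct from the cencro code linked above: both videos are center-cropped with ffmpeg before computing VMAF. File names, the 360p crop size, and the lossless intermediate are assumptions, and an ffmpeg build with libvmaf is required.

    import subprocess

    def center_crop(src, dst, width=640, height=360):
        # Crop the central width x height window with ffmpeg and store it losslessly,
        # so the crop itself does not add further distortion.
        subprocess.run([
            "ffmpeg", "-y", "-i", src,
            "-vf", f"crop={width}:{height}:(in_w-{width})/2:(in_h-{height})/2",
            "-c:v", "libx264", "-crf", "0", dst,
        ], check=True)

    center_crop("reference.mp4", "reference_crop.mp4")
    center_crop("distorted.mp4", "distorted_crop.mp4")

    # VMAF on the cropped pair; the first input is treated as the distorted video,
    # the second as the reference.
    subprocess.run([
        "ffmpeg", "-i", "distorted_crop.mp4", "-i", "reference_crop.mp4",
        "-lavfi", "libvmaf", "-f", "null", "-",
    ], check=True)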

AVT-VQDB-UHD-1: A Large Scale Video Quality Database for UHD-1

21st IEEE International Symposium on Multimedia (2019 IEEE ISM), Dec 9 - 11, 2019, San Diego, USA

Rakesh Rao Ramachandra Rao, Steve Göring, Werner Robitza, Bernhard Feiten, Alexander Raake


Television screens with 4K or even higher resolutions are currently available on the market. Moreover, video streaming providers are able to stream videos in 4K resolution and beyond. Therefore, it becomes increasingly important to have a proper understanding of video quality, especially in the case of 4K videos. To this end, in this paper we present a study of subjective and objective quality assessment of 4K ultra-high-definition videos of short duration, similar to DASH segment lengths.

As a first step, we conducted four subjective quality evaluation tests for compressed versions of the 4K videos. The videos were encoded using three different video codecs, namely H.264, HEVC, and VP9. The resolutions of the compressed videos ranged from 360p to 2160p, with frame rates varying from 15 fps to 60 fps. All source 4K contents had a frame rate of 60 fps. We included low-quality conditions in terms of bitrate, resolution, and frame rate to ensure that the tests cover a wide range of conditions, and that, e.g., possible models trained on this data are more general and applicable to a wider range of real-world applications. The results of the subjective quality evaluation are analyzed to assess the impact of different factors such as bitrate, resolution, frame rate, and content.

In the second step, different state-of-the-art objective quality models, e.g. Netflix's VMAF, were applied to all videos and their performance was analyzed in comparison with the subjective ratings. The videos, the subjective scores (both MOS and confidence intervals per sequence), and the objective scores are made publicly available for use by the community for further research.
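To illustrate how such an encoding ladder can be produced, a Python sketch using ffmpeg is given below; the codec, resolution, and bitrate values as well as the file names are examples, not the exact conditions of the database.

    import itertools
    import subprocess

    # Hypothetical encoding ladder roughly in the spirit of the conditions above.
    codecs = {"h264": "libx264", "hevc": "libx265", "vp9": "libvpx-vp9"}
    resolutions = ["640x360", "1280x720", "1920x1080", "3840x2160"]
    bitrates_kbps = [750, 2000, 7500, 15000]

    for (name, encoder), res, br in itertools.product(codecs.items(), resolutions, bitrates_kbps):
        out = f"src01_{name}_{res}_{br}k.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", "src01_2160p60.mp4",
            "-vf", f"scale={res.replace('x', ':')}",   # downscale to the target resolution
            "-c:v", encoder, "-b:v", f"{br}k",
            out,
        ], check=True)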

Link to the videos:

Thesis topics offered by the AVT group

On our website you can now find the current range of topics for Bachelor's and Master's theses as well as media projects.

Have a look at the Abschlussarbeiten (theses) section!