Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from 2016 onward. The publications up to 2015 can be found on a separate page.


Results: 122
Created on: Tue, 29 Nov 2022 23:02:18 +0100

Robotham, Thomas; Singla, Ashutosh; Rummukainen, Olli S.; Raake, Alexander; Habets, Emanuel A.P.
Audiovisual database with 360° video and higher-order Ambisonics audio for perception, cognition, behavior, and QoE evaluation research. - In: IEEE Xplore digital library, ISSN 2473-2001, (2022), 6 pp. in total

Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate real-life-like scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680×3840 at 60 frames per second and with 4th-order Ambisonics audio. These 360° video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditory and visual sensory information. The database will be continuously updated with additional scenes and further metadata such as human ratings and saliency information.
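For a sense of scale, the short calculation below works out the raw pixel rate implied by the stated video format and the channel count of 4th-order Ambisonics, using the standard relation that full-sphere Ambisonics of order N uses (N + 1)² channels. It is an illustrative sketch only: it ignores chroma subsampling, bit depth, and compression, so it says nothing about the database's actual file sizes.

```python
# Video format stated for the database: 7680 x 3840 pixels at 60 frames per second.
width, height, fps = 7680, 3840, 60
pixels_per_frame = width * height
pixels_per_second = pixels_per_frame * fps


def ambisonics_channels(order: int) -> int:
    """Channel count of full-sphere Ambisonics of the given order: (N + 1)^2."""
    return (order + 1) ** 2


print(f"{pixels_per_frame:,} pixels per frame")    # 29,491,200
print(f"{pixels_per_second:,} pixels per second")  # 1,769,472,000
print(ambisonics_channels(4), "audio channels for 4th order")  # 25
```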
Herglotz, Christian; Robitza, Werner; Kränzler, Matthias; Kaup, André; Raake, Alexander
Modeling of energy consumption and streaming video QoE using a crowdsourcing dataset. - In: IEEE Xplore digital library, ISSN 2473-2001, (2022)

In the past decade, we have witnessed enormous growth in the demand for online video services. Recent studies estimate that more than 1% of global greenhouse gas emissions can nowadays be attributed to the production and use of devices performing online video tasks. Research on the true power consumption of devices and their energy efficiency during video streaming is therefore highly important for a sustainable use of this technology. At the same time, over-the-top providers strive to offer high-quality streaming experiences to satisfy user expectations. Energy consumption and QoE partly depend on the same system parameters, so a joint view is needed for their evaluation. In this paper, we perform a first analysis of both end-user power efficiency and Quality of Experience of a video streaming service. We take a crowdsourced dataset comprising 447,000 streaming events from YouTube and estimate both the power consumption and the perceived quality. The power consumption is modeled based on previous work, which we extended to predict the power usage of different devices and codecs. The user-perceived QoE is estimated using a standardized model. Our results indicate that an intelligent choice of streaming parameters can optimize both the QoE and the power efficiency of the end-user device. Further, the paper discusses limitations of the approach and identifies directions for future research.
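One generic way to take the "joint view" of QoE and power described above is to keep only Pareto-optimal streaming options, i.e. those for which no other option is at least as good on both axes and strictly better on one. The sketch below is our own illustration, not the paper's method; the `StreamingOption` type and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class StreamingOption(NamedTuple):
    label: str
    qoe: float      # predicted MOS on a 1-5 scale, higher is better
    power_w: float  # estimated device power in watts, lower is better


def dominates(a: StreamingOption, b: StreamingOption) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.qoe >= b.qoe and a.power_w <= b.power_w
    strictly_better = a.qoe > b.qoe or a.power_w < b.power_w
    return no_worse and strictly_better


def pareto_front(options: List[StreamingOption]) -> List[StreamingOption]:
    """Keep only the options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Hypothetical values for illustration only; they are not results from the paper.
options = [
    StreamingOption("A", qoe=4.2, power_w=3.0),
    StreamingOption("B", qoe=4.2, power_w=3.4),
    StreamingOption("C", qoe=3.9, power_w=2.6),
    StreamingOption("D", qoe=4.4, power_w=5.0),
]
print([o.label for o in pareto_front(options)])  # ['A', 'C', 'D']
```

Option B drops out because A reaches the same predicted QoE at lower power; choosing among A, C, and D is then a genuine trade-off that depends on how QoE and energy are weighted.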
Döring, Nicola; Conde, Melisa; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Can communication technologies reduce loneliness and social isolation in older people? : a scoping review of reviews. - In: International journal of environmental research and public health, ISSN 1660-4601, Vol. 19 (2022), no. 18, 11310, pp. 1-20

Background: Loneliness and social isolation in older age are considered major public health concerns and research on technology-based solutions is growing rapidly. This scoping review of reviews aims to summarize the communication technologies (CTs) (review question RQ1), theoretical frameworks (RQ2), study designs (RQ3), and positive effects of technology use (RQ4) present in the research field. Methods: A comprehensive multi-disciplinary, multi-database literature search was conducted. Identified reviews were analyzed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. A total of N = 28 research reviews that cover 248 primary studies spanning 50 years were included. Results: The majority of the included reviews addressed general internet and computer use (82% each) (RQ1). Of the 28 reviews, only one (4%) worked with a theoretical framework (RQ2) and 26 (93%) covered primary studies with quantitative-experimental designs (RQ3). The positive effects of technology use were shown in 55% of the outcome measures for loneliness and 44% of the outcome measures for social isolation (RQ4). Conclusion: While research reviews show that CTs can reduce loneliness and social isolation in older people, causal evidence is limited and insights on innovative technologies such as augmented reality systems are scarce.
Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
AVQBits-adaptive video quality model based on bitstream information for various video applications. - In: IEEE access, ISSN 2169-3536, Vol. 10 (2022), pp. 80321-80351

The paper presents AVQBits, a versatile, bitstream-based video quality model. It can be applied in several contexts, such as video service monitoring and the evaluation of video encoding quality, gaming video QoE, and even omnidirectional video quality. The paper shows that AVQBits predictions closely match video quality ratings obtained in various subjective tests with human viewers, for videos up to 4K-UHD resolution (Ultra-High Definition, 3840 x 2160 pixels) and framerates up to 120 fps. With the different variants of AVQBits presented in the paper, video quality can be monitored at the client side, in the network, or directly after encoding. The no-reference AVQBits model was developed for different video services and types of input data, reflecting the increasing popularity of Video-on-Demand services and the widespread use of HTTP-based adaptive streaming. At its core, AVQBits encompasses the standardized ITU-T P.1204.3 model, with further model instances that have either restricted or extended input information, depending on the application context. Four different instances of AVQBits are presented: a Mode 3 model with full access to the bitstream; a Mode 0 variant using only metadata such as codec type, framerate, resolution, and bitrate as input; a Mode 1 model using Mode 0 information plus frame-type and frame-size information; and a Hybrid Mode 0 model that is based on Mode 0 metadata and the decoded video pixel information. The models are trained on the authors’ own AVT-PNATS-UHD-1 dataset described in the paper. All models show highly competitive performance when validated on the AVT-VQDB-UHD-1 dataset, with a Pearson correlation of 0.890 for the Mode 0 variant, 0.901 for the Mode 1 model, 0.928 for the hybrid no-reference Mode 0 model, and 0.942 for the model with full bitstream access.
In addition, all four AVQBits variants are evaluated when applied out-of-the-box to different media formats such as 360° video, high framerate (HFR) content, or gaming videos. The analysis shows that, for the considered use cases, the ITU-T P.1204.3 and Hybrid Mode 0 instances of AVQBits perform on par with or better than even state-of-the-art full-reference, pixel-based models. Furthermore, it is shown that the proposed Mode 0 and Mode 1 variants outperform commonly used no-reference models for the different application scopes. Also, a long-term integration model based on the standardized ITU-T P.1203.3 is presented to estimate ratings of overall audiovisual streaming Quality of Experience (QoE) for sessions of 30 s up to 5 min duration. In the paper, the AVQBits instances, with their per-second score output, are evaluated as the video quality component of the proposed long-term integration model. All AVQBits variants as well as the long-term integration module are made publicly available to the community for further research.
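The abstract reports model performance as the Pearson correlation between predicted and subjective ratings. As a reminder of what that figure measures, here is a minimal, dependency-free sketch of the Pearson correlation coefficient. The scores are hypothetical, and whether the paper applies any mapping to the predictions before correlating is not stated here.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences.

    Undefined (raises ZeroDivisionError) if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-video scores, for illustration only.
subjective_mos = [1.8, 2.6, 3.1, 3.9, 4.5]
predicted_mos = [2.0, 2.4, 3.3, 3.7, 4.6]
print(round(pearson(subjective_mos, predicted_mos), 3))
```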
Bajpai, Vaibhav; Hohlfeld, Oliver; Crowcroft, Jon; Keshav, Srinivasan; Schulzrinne, Henning; Ott, Jörg; Ferlin, Simone; Carle, Georg; Hines, Andrew; Raake, Alexander
Recommendations for designing hybrid conferences. - In: ACM SIGCOMM computer communication review, ISSN 0146-4833, Vol. 52 (2022), no. 2, pp. 63-69

During the COVID-19 pandemic, many smaller conferences have moved entirely online, and larger ones are being held as hybrid events. Even beyond the pandemic, hybrid events reduce the carbon footprint of conference travel and make events more accessible to parts of the research community that have difficulty traveling long distances, while preserving most advantages of in-person gatherings. While we have developed a solid understanding of how to design virtual events over the last two years, we are still learning how to properly run hybrid events. We present guidelines and considerations for organizing successful hybrid conferences, spanning technology, organization, and social factors. This paper summarizes and extends the discussions held at the Dagstuhl seminar on "Climate Friendly Internet Research" held in July 2021.
Gutiérrez, Jesús; Pérez, Pablo; Orduna, Marta; Singla, Ashutosh; Cortés, Carlos; Mazumdar, Pramit; Viola, Irene; Brunnström, Kjell; Battisti, Federica; Cieplińska, Natalia; Juszka, Dawid; Janowski, Lucjan; Leszczuk, Mikołaj; Adeyemi-Ejeye, Anthony; Hu, Yaosi; Chen, Zhenzhong; Wallendael, Glenn Van; Lambert, Peter; Díaz, César; Hedlund, John; Hamsis, Omar; Fremerey, Stephan; Hofmeyer, Frank; Raake, Alexander; César, Pablo; Carli, Marco; García, Narciso
Subjective evaluation of visual quality and simulator sickness of short 360° videos: ITU-T Rec. P.919. - In: IEEE transactions on multimedia, Vol. 24 (2022), pp. 3087-3100

Immersive technologies such as Augmented Reality (AR), Virtual Reality (VR), and 360° video have recently seen impressive development. However, methods for quality assessment have not kept up. This paper studies the quality assessment of 360° video based on cross-lab tests (involving ten laboratories and more than 300 participants) carried out by the Immersive Media Group (IMG) of the Video Quality Experts Group (VQEG). The tests aimed to assess and validate subjective evaluation methodologies for 360° video. Audiovisual quality, simulator sickness symptoms, and exploration behavior were evaluated with short 360° sequences (10 to 30 seconds). The influence of the following factors was also analyzed: assessment methodology, sequence duration, Head-Mounted Display (HMD) device, uniform and non-uniform coding degradations, and simulator sickness assessment methods. The results demonstrate the validity of Absolute Category Rating (ACR) and Degradation Category Rating (DCR) for subjective tests with 360° videos, the possibility of using 10-second videos (with or without audio) when addressing quality evaluation of coding artifacts, and the possibility of using any commercial HMD that satisfies minimum requirements. In addition, methods more efficient than the long Simulator Sickness Questionnaire (SSQ) are proposed to evaluate related symptoms with 360° videos. These results were instrumental in the development of ITU-T Recommendation P.919. Finally, the annotated dataset from the tests is made publicly available to the research community.
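ACR tests such as those described above are typically summarized per sequence as a mean opinion score (MOS) with a confidence interval. The sketch below shows that computation for ratings on the 5-point ACR scale; it uses a normal approximation (z = 1.96) as a simplifying assumption, whereas with small panels a Student-t quantile would give a wider and more appropriate interval. The ratings are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Mean opinion score and half-width of an approximate 95% confidence interval.

    Normal approximation: half-width = z * sample standard deviation / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ACR ratings are expected on the 5-point scale 1..5")
    mos = float(statistics.mean(ratings))
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ACR ratings of one 360-degree sequence by ten participants.
ratings = [4, 5, 3, 4, 4, 5, 3, 4, 4, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```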
Göring, Steve
Data-driven visual quality estimation using machine learning. - Ilmenau : Universitätsbibliothek, 2022. - 1 online resource (vi, 190 pages)
Technische Universität Ilmenau, Dissertation 2022

Today, a large amount of visual content is created and made accessible, owing to technological improvements such as smartphones and the Internet. It is therefore necessary to assess the quality perceived by users in order to further improve their experience. However, few of the current quality models are designed specifically for higher resolutions, predict more than just the mean opinion score, or use machine learning. One goal of this thesis is to train and evaluate such machine learning models for higher resolutions using different datasets. First, an objective analysis of image quality at higher resolutions is carried out. The images were compressed with video encoders, with AV1 showing the best quality and compression. Subsequently, the results of a crowdsourcing test are compared with those of a laboratory test with respect to image quality. Furthermore, deep-learning-based models for the prediction of image and video quality are described. Because of the resources it requires, the deep-learning-based model for video quality prediction is not applicable in practice. For this reason, pixel-based video quality models are proposed and evaluated that use meaningful features covering image and motion aspects. These models can be used to predict mean opinion scores for videos, or even other values related to video quality, such as a rating distribution. The presented model architecture can be applied to other video problems, such as video classification, gaming video quality prediction, gaming genre classification, or the classification of encoding parameters. Another important aspect is the processing time of such models.
Therefore, a general approach for accelerating state-of-the-art video quality models is presented, showing that a considerable part of the processing time can be saved while maintaining similar prediction accuracy. The models are published as open source, so that the developed frameworks can be used for further research. In addition, the presented approaches can be used as building blocks for newer media formats.
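The abstract mentions a general approach for reducing the processing time of video quality models. The dissertation's actual method is not described in this listing, so the sketch below illustrates one generic, commonly used idea instead: compute a per-frame feature on only every k-th frame and pool the results, which cuts the number of feature evaluations by roughly a factor of k. `pooled_feature`, `mean_luma`, and the synthetic frames are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical stand-in for a decoded frame (flattened luma values)


def pooled_feature(frames: Sequence[Frame], feature: Callable[[Frame], float], step: int = 1) -> float:
    """Average a per-frame feature over every `step`-th frame.

    step=1 evaluates all frames; step=k evaluates roughly 1/k of them.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if not frames:
        raise ValueError("need at least one frame")
    sampled = frames[::step]
    return sum(feature(f) for f in sampled) / len(sampled)


def mean_luma(frame: Frame) -> float:
    """Toy per-frame feature: average luma value."""
    return sum(frame) / len(frame)


# Synthetic clip: 60 frames of 16 pixels whose brightness drifts slowly.
frames = [[100.0 + 0.5 * i] * 16 for i in range(60)]
full = pooled_feature(frames, mean_luma, step=1)  # 60 feature evaluations
fast = pooled_feature(frames, mean_luma, step=4)  # 15 feature evaluations
print(full, fast)
```

Whether such subsampling keeps prediction accuracy "similar" depends on how much the content changes between sampled frames; that has to be checked against subjective ratings for each model.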
Skowronek, Janto; Raake, Alexander; Berndtsson, Gunilla H.; Rummukainen, Olli S.; Usai, Paolino; Gunkel, Simon N. B.; Johanson, Mathias; Habets, Emanuel A.P.; Malfait, Ludovic; Lindero, David; Toet, Alexander
Quality of experience in telemeetings and videoconferencing: a comprehensive survey. - In: IEEE access, ISSN 2169-3536, Vol. 10 (2022), pp. 63885-63931

Telemeetings such as audiovisual conferences or virtual meetings play an increasingly important role in our professional and private lives. For that reason, system developers and service providers will strive for an optimal experience for the user, while at the same time optimizing technical and financial resources. This leads to the discipline of Quality of Experience (QoE), an active field originating from the telecommunication and multimedia engineering domains, that strives for understanding, measuring, and designing the quality experience with multimedia technology. This paper provides the reader with an entry point to the large and still growing field of QoE of telemeetings, by taking a holistic perspective, considering both technical and non-technical aspects, and by focusing on current and near-future services. Addressing both researchers and practitioners, the paper first provides a comprehensive survey of factors and processes that contribute to the QoE of telemeetings, followed by an overview of relevant state-of-the-art methods for QoE assessment. To embed this knowledge into recent technology developments, the paper continues with an overview of current trends, focusing on the field of eXtended Reality (XR) applications for communication purposes. Given the complexity of telemeeting QoE and the current trends, new challenges for QoE assessment of telemeetings are identified. To overcome these challenges, the paper presents a novel Profile Template for characterizing telemeetings from the holistic perspective endorsed in this paper.
Robitza, Werner; Ramachandra Rao, Rakesh Rao; Göring, Steve; Dethof, Alexander; Raake, Alexander
Deploying the ITU-T P.1203 QoE model in the wild and retraining for new codecs. - In: MHV '22, (2022), pp. 121-122

This paper presents two challenges associated with using the ITU-T P.1203 standard for video quality monitoring in practice. We discuss the issue of unavailable data on certain browsers/platforms and the lack of information within newly developed data formats like Common Media Client Data. We also retrained the coefficients of the P.1203.1 video model for newer codecs, and published a completely new model derived from the P.1204.3 bitstream model.
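To illustrate what "lack of information" in Common Media Client Data (CMCD, specified in CTA-5004) can mean in practice, the sketch below parses a CMCD payload and checks which per-segment inputs of a P.1203 Mode 0 setup it covers. The key names `br` (encoded bitrate, kbps) and `d` (object duration, ms) follow CMCD version 1 as we understand it; the set of required inputs and the key mapping are simplified assumptions of ours, not taken from the paper.

```python
from urllib.parse import unquote

# Per-segment inputs we assume a P.1203 Mode 0 setup needs (simplified, hypothetical list).
REQUIRED_MODE0_INPUTS = {"bitrate_kbps", "codec", "resolution", "framerate", "duration_ms"}

# Assumed mapping from CMCD v1 keys to those inputs, where a matching key exists.
CMCD_TO_MODE0 = {"br": "bitrate_kbps", "d": "duration_ms"}


def parse_cmcd(payload: str) -> dict:
    """Parse a (possibly URL-encoded) CMCD payload such as 'br=3200,d=4004,ot=v'.

    Simplification: assumes no commas inside quoted string values.
    """
    fields = {}
    for item in unquote(payload).split(","):
        item = item.strip()
        if not item:
            continue
        if "=" not in item:
            fields[item] = True  # CMCD boolean keys are sent without a value
            continue
        key, value = item.split("=", 1)
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            fields[key] = value[1:-1]  # quoted string, e.g. a session ID
        else:
            try:
                fields[key] = int(value)
            except ValueError:
                fields[key] = value  # token values such as ot=v
    return fields


def missing_mode0_inputs(cmcd: dict) -> set:
    """Return the assumed Mode 0 inputs that the CMCD fields do not provide."""
    available = {CMCD_TO_MODE0[k] for k in cmcd if k in CMCD_TO_MODE0}
    return REQUIRED_MODE0_INPUTS - available


payload = "br%3D3200%2Cbl%3D21300%2Cd%3D4004%2Cot%3Dv%2Csid%3D%22abc%22"
cmcd = parse_cmcd(payload)
print(cmcd)
print(sorted(missing_mode0_inputs(cmcd)))  # ['codec', 'framerate', 'resolution']
```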
Döring, Nicola; De Moor, Katrien; Fiedler, Markus; Schoenenberg, Katrin; Raake, Alexander
Videoconference fatigue: a conceptual analysis. - In: International journal of environmental research and public health, ISSN 1660-4601, Vol. 19 (2022), no. 4, 2061, pp. 1-20

Videoconferencing (VC) is a type of online meeting that allows two or more participants from different locations to engage in live multi-directional audio-visual communication and collaboration (e.g., via screen sharing). The COVID-19 pandemic has induced a boom in both private and professional videoconferencing in the early 2020s that elicited controversial public and academic debates about its pros and cons. One main concern has been the phenomenon of videoconference fatigue. The aim of this conceptual review article is to contribute to the conceptual clarification of VC fatigue. We use the popular and succinct label "Zoom fatigue" interchangeably with the more generic label "videoconference fatigue" and define it as the experience of fatigue during and/or after a videoconference, regardless of the specific VC system used. We followed a structured eight-phase process of conceptual analysis that led to a conceptual model of VC fatigue with four key causal dimensions: (1) personal factors, (2) organizational factors, (3) technological factors, and (4) environmental factors. We present this 4D model describing the respective dimensions with their sub-dimensions based on theories, available evidence, and media coverage. The 4D model is meant to help researchers advance empirical research on videoconference fatigue.