Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from 2016 onward. Publications up to 2015 can be found on a separate page.

Note: To search through all the publications, select "Show All" and then use the browser's search function (Ctrl+F).

Results: 162
Created on: Thu, 25 Apr 2024 23:03:09 +0200 in 0.0853 sec


Diao, Chenyao; Sinani, Luljeta; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Revisiting videoconferencing QoE: impact of network delay and resolution as factors for social cue perceptibility. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 240-243

Previous research from well before the Covid-19 pandemic had indicated little effect of delay on integral quality but a measurable one on user behavior, and a significant effect of resolution on quality but not on behavior in a two-party communication scenario. In this paper, we re-investigate the topic in light of the frequent and widespread videoconferencing usage during and since the Covid-19 pandemic. To this aim, we conducted a subjective test involving 23 pairs of participants, employing the Celebrity Name Guessing task. The focus was on impairments that may affect social cues (resolution) and communication cues (delay). Subjective data in the form of overall conversational quality and task performance satisfaction, as well as objective data in the form of task correctness, user motion, and facial expressions, were collected in the test. The analysis of the subjective data indicates that perceived conversational quality and performance satisfaction were mainly affected by video resolution, while delay (up to 1000 ms) had no significant impact. Furthermore, the analysis of the objective data shows that there is no impact of resolution and delay on user performance and behavior, in contrast to earlier findings.



https://doi.org/10.1109/QoMEX58391.2023.10178483
Singla, Ashutosh; Robotham, Thomas; Bhattacharya, Abhinav; Menz, William; Habets, Emanuel A.P.; Raake, Alexander
Saliency of omnidirectional videos with different audio presentations: analyses and dataset. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 264-269

There is an increased interest in understanding users' behavior when exploring omnidirectional (360˚) videos, especially in the presence of spatial audio. Several studies demonstrate the effect of no, mono, or spatial audio on visual saliency. However, no studies investigate the influence of higher-order (i.e., 4th-order) Ambisonics on subjective exploration in virtual reality settings. In this work, a between-subjects test design is employed to collect users' exploration data of 360˚ videos in a free-form viewing scenario using the Varjo XR-3 Head Mounted Display, in the presence of no, mono, and 4th-order Ambisonics audio. Saliency information was captured as head-saliency in terms of the center of a viewport at 50 Hz. For each item, subjects were asked to describe the scene with a short free-verbalization task. Moreover, cybersickness was assessed using the simulator sickness questionnaire at the beginning and at the end of the test. The head-saliency results over time show that in the presence of higher-order Ambisonics audio, subjects concentrate more on the directions the sound is coming from. No influence of the audio scenario on cybersickness scores was observed. From the analysis of the verbal scene descriptions, it was found that users were attentive to the omnidirectional video but, in the 'no audio' scenario, provided only minute and insignificant details of the scene objects. The audiovisual saliency dataset is made available following the open science approach already used for the audiovisual scene recordings we previously published. The data is intended to enable training of visual and audiovisual saliency prediction models for interactive experiences.



https://doi.org/10.1109/QoMEX58391.2023.10178588
Ramachandra Rao, Rakesh Rao; Borer, Silvio; Lindero, David; Göring, Steve; Raake, Alexander
PNATS-UHD-1-Long: an open video quality dataset for long sequences for HTTP-based Adaptive Streaming QoE assessment. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 252-257

The P.NATS Phase 2 competition in ITU-T Study Group 12 resulted in both the ITU-T Rec. P.1204 series of recommendations and a large dataset for HTTP-based adaptive streaming QoE assessment that is now made openly available as part of this paper. The presented dataset consists of 3 subjective databases targeting overall quality assessment of typical HTTP-based Adaptive Streaming sessions with degradations such as quality switching, initial loading delay, and stalling events, using audiovisual contents between 2 and 5 minutes in duration. In addition to this, subject bias and consistency in quality assessment of such longer-duration audiovisual contents with multiple degradations are investigated using a subject behaviour model. As part of this paper, the overall test design, subjective test results, sources, encoded audiovisual contents, and a set of analysis plots are made publicly available for further research.



https://doi.org/10.1109/QoMEX58391.2023.10178493
Braun, Florian; Ramachandra Rao, Rakesh Rao; Robitza, Werner; Raake, Alexander
Automatic audiovisual asynchrony measurement for quality assessment of videoconferencing. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 248-251

Audiovisual asynchrony is a significant factor impacting the Quality of Experience (QoE), especially for interactive communication like video conferencing. In this paper, we propose a client-side approach to predict the delay between an audio and a video signal, using only the media signals from both streams. Features are extracted from the video and audio stream, respectively, and analyzed using a cross-correlation approach to determine the actual delay. Our approach predicts the delay with an accuracy of over 80% in a time frame of ±1 s. We further highlight the potential drawbacks of using a cross-correlation-based analysis and propose different solutions for practical implementations of a delay-based QoE metric.
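The cross-correlation step mentioned in the abstract can be sketched generically. Everything below (the feature vectors, function name, and 25 Hz feature rate) is an illustrative assumption, not the paper's actual feature extraction or implementation:

```python
import numpy as np

def estimate_delay(audio_feat, video_feat, rate_hz=25.0):
    """Estimate the lag of the video features relative to the audio
    features (in seconds) via cross-correlation. A positive value means
    the video stream is delayed with respect to the audio stream."""
    # Normalize both feature sequences (zero mean, unit variance).
    a = (audio_feat - audio_feat.mean()) / (audio_feat.std() + 1e-9)
    v = (video_feat - video_feat.mean()) / (video_feat.std() + 1e-9)
    corr = np.correlate(v, a, mode="full")        # scores for all lags
    lag_samples = np.argmax(corr) - (len(a) - 1)  # offset of the peak
    return lag_samples / rate_hz

# Toy check: shift a random activity signal by 10 samples (0.4 s at 25 Hz).
rng = np.random.default_rng(0)
audio = rng.standard_normal(200)
video = np.roll(audio, 10)  # video lags audio by 10 feature samples
print(estimate_delay(audio, video))  # → 0.4
```

A known caveat of this approach, also raised in the paper, is that cross-correlation can lock onto periodic content; practical implementations need additional safeguards.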



https://doi.org/10.1109/QoMEX58391.2023.10178438
Keller, Dominik; Hagen, Felix; Prenzel, Julius; Strama, Kay; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Influence of viewing distances on 8K HDR video quality perception. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 209-212

The benefits of high resolutions in displays, such as 8K (UHD-2), have been the subject of ongoing research in the field of display technology and human perception in recent years. Out of several factors influencing users' perception of video quality, viewing distance is one of the key aspects. Hence, this study uses a subjective test to investigate the perceptual advantages of 8K over 4K (UHD-1) resolution for HDR videos at 7 different viewing distances, ranging from 0.5 H to 2 H. The results indicate that, on average, for HDR content the 8K resolution can improve the video quality at all tested distances. Our study shows that although the 8K resolution is slightly better than 4K at close distances, the extent of these benefits is highly dependent on factors such as the pixel-related complexity of the content and the visual acuity of the viewers.
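The interplay of resolution and viewing distance described in the abstract can be illustrated with a back-of-the-envelope calculation, assuming the commonly cited limit of roughly 60 pixels per degree resolvable by 20/20 vision; the function below is illustrative and not taken from the paper:

```python
import math

def pixels_per_degree(lines, distance_in_h):
    """Vertical pixel density (pixels per degree of visual angle) for a
    display with `lines` rows viewed at `distance_in_h` times the
    picture height H."""
    # Angle subtended vertically by the screen at this distance.
    subtended_deg = 2 * math.degrees(math.atan(0.5 / distance_in_h))
    return lines / subtended_deg

for d in (0.5, 1.0, 1.5, 2.0):
    ppd_4k = pixels_per_degree(2160, d)
    ppd_8k = pixels_per_degree(4320, d)
    print(f"{d:>3} H: 4K {ppd_4k:5.1f} ppd, 8K {ppd_8k:5.1f} ppd")
```

At 0.5 H a 4K display offers only about 24 pixels per degree, well below the acuity limit, so additional 8K detail is resolvable; at 2 H, 4K already exceeds 60 pixels per degree, which is consistent with the diminishing 8K benefit at larger distances reported above.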



https://doi.org/10.1109/QoMEX58391.2023.10178602
Herglotz, Christian; Robitza, Werner; Raake, Alexander; Hoßfeld, Tobias; Kaup, André
Power reduction opportunities on end-user devices in quality-steady video streaming. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 79-82

This paper uses a crowdsourced dataset of online video streaming sessions to investigate opportunities to reduce power consumption while considering QoE. For this, we base our work on prior studies which model both the end-user's QoE and the end-user device's power consumption with the help of high-level video features such as the bitrate, the frame rate, and the resolution. Going beyond existing research, which focused on reducing power consumption at the same QoE by optimizing video parameters, we investigate potential power savings by other means such as using a different playback device, a different codec, or a predefined maximum quality level. We find that, based on the power consumption of the streaming sessions from the crowdsourcing dataset, devices could save more than 55% of power if all participants adhered to low-power settings.
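The kind of high-level feature model referred to above can be sketched as a toy linear function. The function name and coefficients here are invented placeholders for illustration only; the actual models in the cited studies differ:

```python
def device_power_watts(bitrate_mbps, fps, height,
                       p_base=2.0, k_br=0.05, k_fps=0.01, k_res=1e-4):
    """Toy linear power model over high-level video features
    (bitrate, frame rate, vertical resolution in lines).
    Coefficients are placeholders chosen only for illustration."""
    return p_base + k_br * bitrate_mbps + k_fps * fps + k_res * height

# Comparing an unconstrained setting with a capped "low-power" one:
p_high = device_power_watts(bitrate_mbps=16, fps=60, height=2160)
p_low = device_power_watts(bitrate_mbps=4, fps=30, height=1080)
print(f"high: {p_high:.2f} W, low: {p_low:.2f} W, "
      f"saving: {1 - p_low / p_high:.0%}")
```

With such a model, candidate low-power settings (lower resolution, capped quality level, different device) can be compared directly in terms of predicted watts before weighing them against their QoE cost.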



https://doi.org/10.1109/QoMEX58391.2023.10178450
Göring, Steve; Ramachandra Rao, Rakesh Rao; Merten, Rasmus; Raake, Alexander
Appeal and quality assessment for AI-generated images. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 115-118

Recently, AI-generated images have gained in popularity. A critical aspect of AI-generated images using, e.g., DALL-E-2 or Midjourney, is that they may look artificial, be of low quality, or have a low appeal in contrast to real images, depending on the text prompt and AI generator. For this reason, we evaluate the quality and appeal of AI-generated images using a crowdsourcing test as an extension of our recently published AVT-AI-Image-Dataset. This dataset consists of a total of 135 images generated with five different AI text-to-image generators. Based on the subjective ratings collected in the crowdsourcing test, we evaluate the different AI generators in terms of image quality and appeal of the AI-generated images. We also link image quality and image appeal with state-of-the-art (SoA) objective models. The extension will be made publicly available for reproducibility.



https://doi.org/10.1109/QoMEX58391.2023.10178486
Göring, Steve; Merten, Rasmus; Raake, Alexander
DNN-based photography rule prediction using photo tags. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 83-86

Instagram and Flickr are just two examples of photo-sharing platforms which are currently used to upload thousands of images on a daily basis. One important aspect in such social media contexts is to know whether an image is of high appeal or not. In particular, to understand the composition of a photo and to improve reading flow, several photo rules have been established. In this paper, we focus on eight selected photo rules. To automatically predict whether an image follows one of these rules or not, we train 13 deep neural networks in a transfer-learning setup and compare their prediction performance. As a dataset, we use photos downloaded from Flickr with specifically selected image tags, which reflect the eight photo rules. Therefore, our dataset does not need additional human annotations. ResNet50 achieves the best prediction performance; however, some images follow several rules, which must be addressed in follow-up work. The code and the data (image URLs) are made publicly available for reproducibility.



https://doi.org/10.1109/QoMEX58391.2023.10178505
Mossakowski, Till; Hedblom, Maria M.; Neuhaus, Fabian; Arévalo Arboleda, Stephanie; Raake, Alexander
Using the diagrammatic image schema language for joint human-machine cognition. - In: Engineering for a changing world, (2023), 5.1.133, S. 1-5

https://doi.org/10.22032/dbt.58917
Robotham, Thomas; Singla, Ashutosh; Raake, Alexander; Rummukainen, Olli S.; Habets, Emanuel A.P.
Influence of multi-modal interactive formats on subjective audio quality and exploration behavior. - In: IMX 2023, (2023), S. 115-128

This study uses a mixed between- and within-subjects test design to evaluate the influence of interactive formats on the quality of binaurally rendered 360° spatial audio content. Focusing on ecological validity using real-world recordings of 60 s duration, three independent groups of subjects were exposed to three formats: audio only (A), audio with 2D visuals (A2DV), and audio with head-mounted display (AHMD) visuals. Within each interactive format, two sessions were conducted to evaluate degraded audio conditions: bit-rate and Ambisonics order. Our results show a statistically significant effect (p < .05) of format only on spatial audio quality ratings for Ambisonics order. Exploration data analysis shows that format A yields little variability in exploration, while formats A2DV and AHMD yield a broader viewing distribution of the 360° content. The results imply that audio quality factors can be optimized depending on the interactive format.



https://doi.org/10.1145/3573381.3596155