Publications of the Audiovisual Technology Group

The following list (automatically generated by the University Library) contains the publications from 2016 onwards. Publications up to 2015 can be found on a separate page.

Note: If you want to search all publications, select "Alle anzeigen" (show all) and then use the browser search with Ctrl+F.

Number of hits: 162
Created: Thu, 18 Apr 2024 23:03:14 +0200 in 2.6929 sec


Arévalo Arboleda, Stephanie; Conde, Melisa; Döring, Nicola; Raake, Alexander
Introducing personas and scenarios to highlight older adults' perspectives on robot-mediated communication. - In: HRI '24 companion, (2024), S. 209-213

Little is known about the expectations of older adults (60+ years old) in robot-mediated communication when leaving aside care-related activities. To bridge this gap, we carried out 30 semi-structured interviews with older adults to explore their experiences and expectations related to technology-mediated communication. We present the results of the collected data through personas that portray three archetype users: Conny Connected, Stephan Skeptical, and Thomas TechFan. These personas are presented in a specific communication scenario with individual goals that go beyond mere communication, such as the desire for closeness (Conny Connected), a problem-free experience (Stephan Skeptical), and exploring affordances of telepresence robots (Thomas TechFan). Also, we provide two considerations when aiming for positive experiences for older adults with robots: balancing generalizable aspects and individual needs, and identifying and challenging preconceptions of telepresence robots.



https://doi.org/10.1145/3610978.3640659
Döring, Nicola; Mikhailova, Veronika; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Digital media in intergenerational communication: status quo and future scenarios for the grandparent-grandchild relationship. - In: Universal access in the information society, ISSN 1615-5297, Bd. 23 (2024), 1, S. 379-394

Communication technologies play an important role in maintaining the grandparent-grandchild (GP-GC) relationship. Based on Media Richness Theory, this study investigates the frequency of use (RQ1) and perceived quality (RQ2) of established media as well as the potential use of selected innovative media (RQ3) in GP-GC relationships with a particular focus on digital media. A cross-sectional online survey and vignette experiment were conducted in February 2021 among N = 286 university students in Germany (mean age 23 years, 57% female) who reported on the direct and mediated communication with their grandparents. In addition to face-to-face interactions, non-digital and digital established media (such as telephone, texting, video conferencing) and innovative digital media, namely augmented reality (AR)-based and social robot-based communication technologies, were covered. Face-to-face and phone communication occurred most frequently in GP-GC relationships: 85% of participants reported them taking place at least a few times per year (RQ1). Non-digital established media were associated with higher perceived communication quality than digital established media (RQ2). Innovative digital media received less favorable quality evaluations than established media. Participants expressed doubts regarding the technology competence of their grandparents, but still met innovative media with high expectations regarding improved communication quality (RQ3). Richer media, such as video conferencing or AR, do not automatically lead to better perceived communication quality, while leaner media, such as letters or text messages, can provide rich communication experiences. More research is needed to fully understand and systematically improve the utility, usability, and joy of use of different digital communication technologies employed in GP-GC relationships.



https://doi.org/10.1007/s10209-022-00957-w
Singla, Ashutosh; Wang, Shuang; Göring, Steve; Ramachandra Rao, Rakesh Rao; Viola, Irene; Cesar, Pablo; Raake, Alexander
Subjective quality evaluation of point clouds using remote testing. - In: IXR '23, (2023), S. 21-28

Subjective quality assessment serves as a method to evaluate the perceptual quality of 3D point clouds. These evaluations can be conducted using lab-based, remote, or crowdsourcing tests. Lab-based tests are time-consuming and less cost-effective. As an alternative, remote or crowd tests can be used, offering a time- and cost-friendly approach. Remote testing enables larger and more diverse participant pools. However, the variability in participants' display devices and viewing environments raises the question of its applicability to the evaluation of point clouds. In this paper, the focus is on investigating the applicability of remote testing by using the Absolute Category Rating (ACR) test method for assessing the subjective quality of point clouds in different tests. We compare the results of lab and remote tests by replicating lab-based tests. In the first test, we assess the subjective quality of a static point cloud geometry for two different types of geometrical degradations, namely Gaussian noise and octree pruning. In the second test, we compare the performance of two different compression methods (G-PCC and V-PCC) to assess the subjective quality of coloured point cloud videos. Based on the results obtained using correlation and Standard deviation of Opinion Scores (SOS) analysis, the remote testing paradigm can be used for evaluating point clouds.
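
The lab-vs-remote comparison via SOS analysis mentioned above can be illustrated with the SOS hypothesis of Hoßfeld et al. (QoMEX 2011), which relates per-stimulus rating variance to the MOS through a single test-specific parameter a. The sketch below is illustrative only: the MOS/SOS values are invented placeholders, and the paper's actual analysis may differ in detail.

```python
import numpy as np
from scipy.optimize import curve_fit

def sos_hypothesis(mos, a):
    # SOS hypothesis for a 5-point ACR scale (Hossfeld et al., QoMEX 2011):
    # SOS^2 = a * (-MOS^2 + 6*MOS - 5); a characterizes rating diversity.
    return a * (-mos**2 + 6.0 * mos - 5.0)

# Hypothetical per-stimulus results from a lab and a remote ACR test.
mos_lab = np.array([1.4, 2.1, 3.0, 3.8, 4.5])
sos2_lab = np.array([0.5, 1.1, 1.6, 1.3, 0.6])
mos_rem = np.array([1.6, 2.2, 2.9, 3.9, 4.3])
sos2_rem = np.array([0.7, 1.4, 1.9, 1.5, 0.9])

# Fit the single SOS parameter 'a' for each test paradigm and compare.
a_lab, _ = curve_fit(sos_hypothesis, mos_lab, sos2_lab, p0=[0.2], bounds=(0, 1))
a_rem, _ = curve_fit(sos_hypothesis, mos_rem, sos2_rem, p0=[0.2], bounds=(0, 1))
print(f"SOS parameter a: lab={a_lab[0]:.3f}, remote={a_rem[0]:.3f}")

# Similar 'a' values and a high correlation between the MOS vectors
# indicate that the remote test reproduces the lab test.
print(f"Pearson r of MOS: {np.corrcoef(mos_lab, mos_rem)[0, 1]:.3f}")
```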



https://doi.org/10.1145/3607546.3616803
Breuer, Carolin; Leist, Larissa; Fremerey, Stephan; Raake, Alexander; Klatte, Maria; Fels, Janina
Towards investigating listening comprehension in virtual reality. - Aachen : Universitätsbibliothek der RWTH Aachen. - 1 Online-Ressource (7 Seiten)

The investigation of listening comprehension in auditorily and visually complex classroom settings is a promising method to evaluate children's cognitive performance in a realistic setting. Many studies were able to show that children are more susceptible to noise than adults. However, it has recently been suggested that established monaural listening situations could overestimate the influence of noise on children's task performance. Therefore, new, close-to-real-life scenarios need to be introduced to investigate cognitive performance in everyday situations rather than artificial laboratory settings. This study aimed to extend a validated paper-and-pencil test towards a virtual reality setting. To gain first insights into different interaction methods, a pilot study with adult participants was conducted. In contrast to other recent studies, the virtual environment had little influence on this listening comprehension paradigm, since comparable results were obtained in the paper-and-pencil test and in the virtual reality variants for all user interfaces. Thus, the presented paradigm has proven to be robust and can be used to further investigate the usage of virtual reality to evaluate children's cognitive performance.



https://doi.org/10.18154/RWTH-2023-11913
Ramachandra Rao, Rakesh Rao; Göring, Steve; Elmeligy, Bassem; Raake, Alexander
AVT-VQDB-UHD-1-Appeal: a UHD-1/4K open dataset for video quality and appeal assessment using modern video codecs. - In: IEEE Xplore digital library, ISSN 2473-2001, (2023), insges. 6 S.

A number of factors play an important role in the perception of video quality for streaming and other services, key among them being encoding-related degradations. Hence, newer codecs are developed with the goal of optimizing video quality for a given encoding setting. Here, subjective studies are an efficient method to evaluate the performance of such newer codecs. Furthermore, contextual factors impact the perception of video quality, e.g., the appeal of the content itself. To this end, this paper presents a subjective study targeting both quality and appeal assessment of videos. For this purpose, a subjective study consisting of three different parts is conducted. Firstly, participants were asked to rate the appeal of the uncompressed UHD-1/4K source contents with a duration of 8-10 s each. Following this, the video quality of these source videos individually encoded with either the HEVC/H.265, AV1, or VVC/H.266 video codec was rated. A wide range of encoding conditions in terms of resolution (360p to 2160p) and bitrate (100 kbps to 15 Mbps) is used to encode the videos, so as to enable the applicability of the data to real-world settings. In the last part, subjects are again asked to rate the appeal of the uncompressed source content. The results are analyzed to assess the impact of different encoding conditions on perceived video quality. In addition, the impact of appeal on video quality and vice versa is also investigated. Furthermore, an objective quality assessment with different state-of-the-art full-reference, bitstream-based, and hybrid models including the newer codecs AV1 and VVC is presented. The subjective dataset including test design, subjective results, sources, and encoded audiovisual contents is made publicly available following an open science approach.
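
A resolution-bitrate encoding grid of the kind described above is commonly produced with ffmpeg; the sketch below shows a minimal, hypothetical version of such a batch, not the paper's actual pipeline. File names and the parameter subset are assumptions, and stock ffmpeg typically covers HEVC via libx265 and AV1 via libaom-av1, while VVC/H.266 usually requires a separate encoder such as vvenc.

```python
import itertools
import subprocess

# Illustrative subset of an encoding ladder; the paper's conditions span
# 360p-2160p and 100 kbps-15 Mbps with HEVC/H.265, AV1, and VVC/H.266.
RESOLUTIONS = [360, 1080, 2160]         # vertical resolution in pixels
BITRATES_KBPS = [100, 2000, 15000]      # target bitrate in kbps
CODECS = {"hevc": "libx265", "av1": "libaom-av1"}  # VVC needs e.g. vvenc

def encode(src: str, codec_key: str, height: int, kbps: int) -> str:
    """Encode one test condition with ffmpeg (must be on PATH)."""
    out = f"{codec_key}_{height}p_{kbps}kbps.mp4"
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",    # keep aspect ratio, set height
        "-c:v", CODECS[codec_key],
        "-b:v", f"{kbps}k",
        out,
    ]
    subprocess.run(cmd, check=True)
    return out

if __name__ == "__main__":
    for codec, h, br in itertools.product(CODECS, RESOLUTIONS, BITRATES_KBPS):
        encode("source_2160p.mp4", codec, h, br)  # hypothetical source file
```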



https://doi.org/10.1109/MMSP59012.2023.10337713
Viola, Irene; Amirpour, Hadi; Arévalo Arboleda, Stephanie; Torres Vega, Maria
IXR '23: 2nd International Workshop on Interactive eXtended Reality. - In: MM '23, (2023), S. 9728-9730

Despite remarkable advances, current Extended Reality (XR) applications are, in their majority, local and individual experiences. A plethora of interactive applications, such as teleconferencing, telesurgery, interconnection in the project chain of new buildings, cultural heritage, and museum content communication, are well on their way to integrating immersive technologies. However, interconnected and interactive XR, where participants can virtually interact across vast distances, remains a distant dream. In fact, three great barriers stand between current technology and remote immersive interactive life-like experiences, namely (i) content realism, (ii) motion-to-photon latency, and (iii) accurate human-centric quality assessment and control. Overcoming these barriers will require novel solutions along all elements of the end-to-end transmission chain. This workshop focuses on the challenges, applications, and major advancements in multimedia, networks, and end-user infrastructures to enable the next generation of interactive XR applications and services.



https://doi.org/10.1145/3581783.3610945
Fischedick, Söhnke B.; Richter, Kay; Wengefeld, Tim; Seichter, Daniel; Scheidig, Andrea; Döring, Nicola; Broll, Wolfgang; Werner, Stephan; Raake, Alexander; Groß, Horst-Michael
Bridging distance with a collaborative telepresence robot for older adults - report on progress in the CO-HUMANICS project. - In: ISR Europe 2023: 56th International Symposium on Robotics, (2023), S. 346-353

In an aging society, the social needs of older adults, such as regular interactions and independent living, are crucial for their quality of life. However, spatial separation from family and friends makes it difficult to maintain these social relationships. Our multidisciplinary project, CO-HUMANICS, aims to meet these needs, even over long distances, through the utilization of innovative technologies, including a robot-based system. This paper presents the first prototype of our system, designed to connect an older adult with family members or friends who are virtually present through a mobile robot. The system incorporates bi-directional video telephony, remote control capabilities, and enhanced visualization methods. A comparison is made with other state-of-the-art robotic approaches, focusing on remote control capabilities. We provide details about the hardware and software components, e.g., a projector-based pointing unit for collaborative telepresence to assist in everyday tasks. Our comprehensive scene representation is discussed, which utilizes 3D NDT maps, enabling advanced remote navigation features, such as autonomously driving to a specific object. Finally, insights from past evaluations and concepts for future ones are provided to assess the developed system.



https://ieeexplore.ieee.org/document/10363093
Saboor, Qasim; Mehfooz-Khan, Hamd; Raake, Alexander; Arévalo Arboleda, Stephanie
A virtual gardening experience: evaluating the effect of haptic feedback on spatial presence, perceptual realism, mental immersion, and user experience. - In: MUM 2023, (2023), S. 520-522

Virtual nature settings have been shown to provide benefits to mental well-being. However, most studies have focused on providing only audiovisual stimuli. We aim to evaluate the use of haptic feedback to simulate touching elements in nature-inspired settings. In this paper, we designed a VR gardening environment to investigate the impact of haptic feedback on spatial presence, perceptual realism, mental immersion, user experience, and task performance while interacting with gardening objects in a study (N=18, 9 female and 9 male). Our results suggest that haptic feedback can increase spatial presence, and they point to gender differences in the chosen VR experience, i.e., female participants reported higher scores in spatial presence and perceptual realism. Although our main goal was to evaluate the role of haptics in a virtual garden, our findings highlight the importance of investigating and identifying factors that could lead to gender differences in VR experiences.



https://doi.org/10.1145/3626705.3631794
Hartbrich, Jakob; Weidner, Florian; Kunert, Christian; Arévalo Arboleda, Stephanie; Raake, Alexander; Broll, Wolfgang
Eye and face tracking in VR: avatar embodiment and enfacement with realistic and cartoon avatars. - In: MUM 2023, (2023), S. 270-278

Previous studies have explored the perception of various types of embodied avatars in immersive environments. However, the impact of eye and face tracking with personalized avatars is yet to be explored. In this paper, we investigate the impact of eye and face tracking on embodiment, enfacement, and the uncanny valley with four types of avatars using a VR-based mirroring task. We conducted a study (N=12) and created self-avatars with two rendering styles: a cartoon avatar (created in an avatar generator using a picture of the user's face) and a photorealistic scanned avatar (created using a 3D scanner), each with and without eye and face tracking and respective adaptation of the mirror image. Our results indicate that adding eye and face tracking can be beneficial for certain enfacement scales (belonging), and we confirm that compared to a cartoon avatar, a scanned realistic avatar results in higher body ownership and increased enfacement (own face, belonging, mirror) - regardless of eye and face tracking. We critically discuss our experiences and outline the limitations of the applied hardware and software with respect to the provided level of control and the applicability for complex tasks such as displaying emotions. We synthesize these findings into a discussion about potential improvements for facial animation in VR and highlight the need for a better level of control, the integration of additional sensing and processing technologies, and an objective metric for comparing facial animation systems.



https://doi.org/10.1145/3626705.3627793
Friese, Ingo; Galkow-Schneider, Mandy; Bassbouss, Louay; Zoubarev, Alexander; Neparidze, Andy; Melnyk, Sergiy; Zhou, Qiuheng; Schotten, Hans D.; Pfandzelter, Tobias; Bermbach, David; Kritzner, Arndt; Zschau, Enrico; Dhara, Prasenjit; Göring, Steve; Menz, William; Raake, Alexander; Rüther-Kindel, Wolfgang; Quaeck, Fabian; Stuckert, Nick; Vilter, Robert
True 3D holography: a communication service of tomorrow and its requirements for a new converged cloud and network architecture on the path to 6G. - In: International Conference on 6G Networking, October 18 - 20, 2023, (2023), insges. 8 S.

The research project 6G NeXt considers true 3D holography as a use case that sets requirements on the communication as well as the computing infrastructure. In a future holographic communication service, clients are widely spread in the network and cooperatively interact with each other. Holographic communication in particular also requires high processing power. This makes a high-speed distributed backbone computing infrastructure, which realizes the concept of split computing, indispensable. Furthermore, tight integration between processing facilities and wireless networks is required in order to provide an immersive user experience. This paper illustrates true 3D holographic communication and its requirements. Afterward, an appropriate solution approach is elaborated, and novel technological approaches are discussed based on a proposed overall communication and computing architecture.



https://doi.org/10.1109/6GNet58894.2023.10317647
Diao, Chenyao; Sinani, Luljeta; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Revisiting videoconferencing QoE: impact of network delay and resolution as factors for social cue perceptibility. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 240-243

Previous research from well before the Covid-19 pandemic had indicated little effect of delay on integral quality but a measurable one on user behavior, and a significant effect of resolution on quality but not on behavior in a two-party communication scenario. In this paper, we re-investigate the topic after the times of the Covid-19 pandemic, with its frequent and widespread videoconferencing usage. To this aim, we conducted a subjective test involving 23 pairs of participants, employing the Celebrity Name Guessing task. The focus was on impairments that may affect social cues (resolution) and communication cues (delay). Subjective data in the form of overall conversational quality and task performance satisfaction as well as objective data in the form of task correctness, user motion, and facial expressions were collected in the test. The analysis of the subjective data indicates that perceived conversational quality and performance satisfaction were mainly affected by video resolution, while delay (up to 1000 ms) had no significant impact. Furthermore, the analysis of the objective data shows that there is no impact of resolution and delay on user performance and behavior, in contrast to earlier findings.



https://doi.org/10.1109/QoMEX58391.2023.10178483
Singla, Ashutosh; Robotham, Thomas; Bhattacharya, Abhinav; Menz, William; Habets, Emanuel A.P.; Raake, Alexander
Saliency of omnidirectional videos with different audio presentations: analyses and dataset. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 264-269

There is an increased interest in understanding users' behavior when exploring omnidirectional (360°) videos, especially in the presence of spatial audio. Several studies demonstrate the effect of no, mono, or spatial audio on visual saliency. However, no studies investigate the influence of higher-order (i.e., 4th-order) Ambisonics on subjective exploration in virtual reality settings. In this work, a between-subjects test design is employed to collect users' exploration data of 360° videos in a free-form viewing scenario using the Varjo XR-3 Head Mounted Display, in the presence of no, mono, and 4th-order Ambisonics audio. Saliency information was captured as head-saliency in terms of the center of a viewport at 50 Hz. For each item, subjects were asked to describe the scene in a short free-verbalization task. Moreover, cybersickness was assessed using the simulator sickness questionnaire at the beginning and at the end of the test. The head-saliency results over time show that in the presence of higher-order Ambisonics audio, subjects concentrate more on the directions the sound is coming from. No influence of the audio scenario on cybersickness scores was observed. From the analysis of the verbal scene descriptions, it was found that users were attentive to the omnidirectional video, but for the 'no audio' scenario provided only minute and insignificant details of the scene objects. The audiovisual saliency dataset is made available following the open science approach already used for the audiovisual scene recordings we previously published. The data is intended to enable the training of visual and audiovisual saliency prediction models for interactive experiences.
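
Head-saliency capture of the kind described, i.e., logging the viewport center at 50 Hz, is typically aggregated into an equirectangular saliency map. The following is a minimal sketch under assumptions: it bins only the viewport centers and approximates the viewport extent with a Gaussian blur rather than computing exact viewport overlaps; all names and the synthetic tracking data are illustrative.

```python
import numpy as np

def head_saliency_map(yaw_deg, pitch_deg, width=512, height=256, sigma=9.0):
    """Accumulate viewport-center samples (e.g., 50 Hz head tracking)
    into an equirectangular head-saliency map.

    yaw_deg:   yaw angles in [-180, 180), 0 = front
    pitch_deg: pitch angles in [-90, 90], 0 = horizon
    """
    sal = np.zeros((height, width), dtype=np.float64)
    # Map spherical angles to equirectangular pixel coordinates.
    u = ((np.asarray(yaw_deg) / 360.0 + 0.5) * width).astype(int) % width
    v = np.clip(((0.5 - np.asarray(pitch_deg) / 180.0) * height).astype(int),
                0, height - 1)
    np.add.at(sal, (v, u), 1.0)  # histogram of fixated viewport centers

    # Smooth with a Gaussian to roughly approximate the viewport extent.
    try:
        from scipy.ndimage import gaussian_filter
        sal = gaussian_filter(sal, sigma=sigma)
    except ImportError:
        pass
    return sal / max(sal.max(), 1e-12)  # normalize to [0, 1]

# Hypothetical 10 s of 50 Hz head-tracking data:
rng = np.random.default_rng(0)
yaw = rng.normal(30.0, 20.0, 500)    # user mostly looks 30 deg to the right
pitch = rng.normal(0.0, 10.0, 500)
smap = head_saliency_map(yaw, pitch)
```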



https://doi.org/10.1109/QoMEX58391.2023.10178588
Ramachandra Rao, Rakesh Rao; Borer, Silvio; Lindero, David; Göring, Steve; Raake, Alexander
PNATS-UHD-1-Long: an open video quality dataset for long sequences for HTTP-based Adaptive Streaming QoE assessment. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 252-257

The P.NATS Phase 2 competition in ITU-T Study Group 12 resulted in both the ITU-T Rec. P.1204 series of recommendations and a large dataset for HTTP-based adaptive streaming QoE assessment, which is now made openly available as part of this paper. The presented dataset consists of 3 subjective databases targeting overall quality assessment of a typical HTTP-based Adaptive Streaming session consisting of degradations such as quality switching, initial loading delay, and stalling events, using audiovisual contents ranging between 2 and 5 minutes in duration. In addition to this, subject bias and consistency in quality assessment of such longer-duration audiovisual contents with multiple degradations are investigated using a subject behaviour model. As part of this paper, the overall test design, subjective test results, sources, encoded audiovisual contents, and a set of analysis plots are made publicly available for further research.



https://doi.org/10.1109/QoMEX58391.2023.10178493
Braun, Florian; Ramachandra Rao, Rakesh Rao; Robitza, Werner; Raake, Alexander
Automatic audiovisual asynchrony measurement for quality assessment of videoconferencing. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 248-251

Audiovisual asynchrony is a significant factor impacting the Quality of Experience (QoE), especially for interactive communication like video conferencing. In this paper, we propose a client-side approach to predict the delay between an audio and a video signal, using only the media signals from both streams. Features are extracted from the video and audio stream, respectively, and analyzed using a cross-correlation approach to determine the actual delay. Our approach predicts the delay with an accuracy of over 80% within a time frame of ±1 s. We further highlight the potential drawbacks of using a cross-correlation-based analysis and propose different solutions for practical implementations of a delay-based QoE metric.
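
The core idea, estimating the lag that maximizes the cross-correlation between an audio-derived and a video-derived activity signal, can be sketched as follows. This is not the paper's implementation: the choice of features, the frame rate, and the ±1 s search window are assumptions for illustration.

```python
import numpy as np

def estimate_av_delay(audio_feat, video_feat, fps=25.0, max_lag_s=1.0):
    """Estimate audiovisual delay by cross-correlating two 1-D activity
    features sampled at the video frame rate (e.g., audio energy vs.
    lip-region motion). A positive result means the audio lags the video.
    """
    a = (audio_feat - audio_feat.mean()) / (audio_feat.std() + 1e-12)
    v = (video_feat - video_feat.mean()) / (video_feat.std() + 1e-12)
    corr = np.correlate(a, v, mode="full")      # all integer lags
    lags = np.arange(-len(v) + 1, len(a))
    max_lag = int(max_lag_s * fps)              # restrict to +/- 1 s
    window = (lags >= -max_lag) & (lags <= max_lag)
    best = lags[window][np.argmax(corr[window])]
    return best / fps                           # delay in seconds

# Synthetic demo: the "audio" feature is the "video" feature shifted by
# 8 frames (320 ms at 25 fps), plus noise.
rng = np.random.default_rng(1)
video = rng.random(500)
audio = np.roll(video, 8) + 0.1 * rng.random(500)
print(f"estimated delay: {estimate_av_delay(audio, video):.2f} s")  # ~0.32
```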



https://doi.org/10.1109/QoMEX58391.2023.10178438
Keller, Dominik; Hagen, Felix; Prenzel, Julius; Strama, Kay; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Influence of viewing distances on 8K HDR video quality perception. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 209-212

The benefits of high resolutions in displays, such as 8K (UHD-2), have been the subject of ongoing research in the field of display technology and human perception in recent years. Out of several factors influencing users' perception of video quality, viewing distance is one of the key aspects. Hence, this study uses a subjective test to investigate the perceptual advantages of 8K over 4K (UHD-1) resolution for HDR videos at 7 different viewing distances, ranging from 0.5 H to 2 H (where H denotes picture height). The results indicate that, on average, for HDR content the 8K resolution can improve the video quality at all tested distances. Our study shows that although the 8K resolution is slightly better than 4K at close distances, the extent of these benefits is highly dependent on factors such as the pixel-related complexity of the content and the visual acuity of the viewers.



https://doi.org/10.1109/QoMEX58391.2023.10178602
Herglotz, Christian; Robitza, Werner; Raake, Alexander; Hoßfeld, Tobias; Kaup, André
Power reduction opportunities on end-user devices in quality-steady video streaming. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 79-82

This paper uses a crowdsourced dataset of online video streaming sessions to investigate opportunities to reduce power consumption while considering QoE. For this, we base our work on prior studies which model both the end-user's QoE and the end-user device's power consumption with the help of high-level video features such as the bitrate, the frame rate, and the resolution. On top of existing research, which focused on reducing the power consumption at the same QoE by optimizing video parameters, we investigate potential power savings by other means, such as using a different playback device, a different codec, or a predefined maximum quality level. We find that, based on the power consumption of the streaming sessions from the crowdsourcing dataset, devices could save more than 55% of power if all participants adhere to low-power settings.
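
The feature-based power modeling referenced above can be illustrated with a deliberately simplified sketch: playback power is approximated as an affine function of bitrate, pixel rate, and frame rate, and two playback settings are compared. All coefficients below are invented placeholders; the actual models in the literature are device-specific and considerably more detailed.

```python
# Minimal sketch of comparing playback settings via a feature-based power
# model. The coefficients are invented placeholders, not measured values.

def playback_power_watts(bitrate_mbps, fps, height, base_w=2.0,
                         k_bitrate=0.15, k_pixels=2e-8, k_fps=0.01):
    """Affine power model: decoding cost grows with bitrate, display and
    rendering cost with pixel rate (width * height * fps, 16:9 assumed)."""
    width = height * 16 / 9
    pixel_rate = width * height * fps
    return (base_w + k_bitrate * bitrate_mbps
            + k_pixels * pixel_rate + k_fps * fps)

hi = playback_power_watts(bitrate_mbps=15, fps=60, height=2160)
lo = playback_power_watts(bitrate_mbps=3, fps=30, height=1080)
print(f"high: {hi:.2f} W, low-power setting: {lo:.2f} W "
      f"({100 * (1 - lo / hi):.0f}% saving)")
```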



https://doi.org/10.1109/QoMEX58391.2023.10178450
Göring, Steve; Ramachandra Rao, Rakesh Rao; Merten, Rasmus; Raake, Alexander
Appeal and quality assessment for AI-generated images. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 115-118

Recently, AI-generated images have gained in popularity. A critical aspect of AI-generated images using, e.g., DALL-E-2 or Midjourney, is that they may look artificial, be of low quality, or have a low appeal in contrast to real images, depending on the text prompt and AI generator. For this reason, we evaluate the quality and appeal of AI-generated images using a crowdsourcing test as an extension of our recently published AVT-AI-Image-Dataset. This dataset consists of a total of 135 images generated with five different AI text-to-image generators. Based on the subjective ratings collected in the crowdsourcing test, we evaluate the different AI generators in terms of image quality and appeal of the AI-generated images. We also link image quality and image appeal with state-of-the-art objective models. The extension will be made publicly available for reproducibility.



https://doi.org/10.1109/QoMEX58391.2023.10178486
Göring, Steve; Merten, Rasmus; Raake, Alexander
DNN-based photography rule prediction using photo tags. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 83-86

Instagram and Flickr are just two examples of photo-sharing platforms which are currently used to upload thousands of images on a daily basis. One important aspect in such social media contexts is to know whether an image is of high appeal or not. In particular, to understand the composition of a photo and to improve reading flow, several photo rules have been established. In this paper, we focus on eight selected photo rules. To automatically predict whether an image follows one of these rules or not, we train 13 deep neural networks in a transfer-learning setup and compare their prediction performance. As a dataset, we use photos downloaded from Flickr with specifically selected image tags, which reflect the eight photo rules. Therefore, our dataset does not need additional human annotations. ResNet50 has the best prediction performance; however, there are images that follow several rules, which must be addressed in follow-up work. The code and the data (image URLs) are made publicly available for reproducibility.
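
A transfer-learning setup of the kind described, shown here with the best-performing ResNet50 backbone, typically freezes the pretrained weights and trains a new classification head. The sketch below is one plausible PyTorch variant, not the paper's code; the multi-label head (one sigmoid per rule, motivated by images that follow several rules) and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Eight photo-rule classes, matching the paper's rule count; the concrete
# rule set (e.g., rule of thirds, leading lines, ...) is defined there.
NUM_RULES = 8

# Transfer learning: pretrained ImageNet backbone, new classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False                  # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, NUM_RULES)  # trainable head

criterion = nn.BCEWithLogitsLoss()           # one sigmoid per rule
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# One illustrative training step on a dummy batch (3x224x224 RGB):
images = torch.randn(4, 3, 224, 224)
labels = torch.zeros(4, NUM_RULES)
labels[:, 2] = 1.0                           # pretend all follow rule #3

optimizer.zero_grad()
logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```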



https://doi.org/10.1109/QoMEX58391.2023.10178505
Mossakowski, Till; Hedblom, Maria M.; Neuhaus, Fabian; Arévalo Arboleda, Stephanie; Raake, Alexander
Using the diagrammatic image schema language for joint human-machine cognition. - In: Engineering for a changing world, (2023), 5.1.133, S. 1-5

https://doi.org/10.22032/dbt.58917
Robotham, Thomas; Singla, Ashutosh; Raake, Alexander; Rummukainen, Olli S.; Habets, Emanuel A.P.
Influence of multi-modal interactive formats on subjective audio quality and exploration behavior. - In: IMX 2023, (2023), S. 115-128

This study uses a mixed between- and within-subjects test design to evaluate the influence of interactive formats on the quality of binaurally rendered 360° spatial audio content. Focusing on ecological validity using real-world recordings of 60 s duration, three independent groups of subjects were exposed to three formats: audio only (A), audio with 2D visuals (A2DV), and audio with head-mounted display (AHMD) visuals. Within each interactive format, two sessions were conducted to evaluate degraded audio conditions: bit-rate and Ambisonics order. Our results show a statistically significant effect (p < .05) of format only on spatial audio quality ratings for Ambisonics order. Exploration data analysis shows that format A yields little variability in exploration, while formats A2DV and AHMD yield broader viewing distribution of 360° content. The results imply audio quality factors can be optimized depending on the interactive format.



https://doi.org/10.1145/3573381.3596155
Raake, Alexander; Broll, Wolfgang; Chuang, Lewis L.; Domahidi, Emese; Wendemuth, Andreas
Cross-timescale experience evaluation framework for productive teaming. - In: Engineering for a changing world, (2023), 5.4.129, S. 1-6

This paper presents the initial concept for an evaluation framework to systematically evaluate productive teaming (PT). We consider PT as adaptive human-machine interactions between human users and augmented technical production systems. Also, human-to-human communication as part of a hybrid team with multiple human actors is considered, as well as human-human and human-machine communication for remote and mixed remote- and co-located teams. The evaluation comprises objective, performance-related success indicators, behavioral metadata, and measures of human experience. In particular, it considers affective, attentional, and intentional states of human team members and their influence on interaction dynamics in the team, and it investigates appropriate strategies to satisfactorily adjust dysfunctional dynamics, using concepts of companion technology. The timescales under consideration span from seconds to several minutes, with selected studies targeting hour-long interactions and longer-term effects such as effort and fatigue. Two example PT scenarios will be discussed in more detail. To enable generalization and a systematic evaluation, the scenarios' use cases will be decomposed into more general modules of interaction.



https://doi.org/10.22032/dbt.58930
Melnyk, Sergiy; Zhou, Qiuheng; Schotten, Hans D.; Rüther-Kindel, Wolfgang; Quaeck, Fabian; Stuckert, Nick; Vilter, Robert; Gebauer, Lisa; Galkow-Schneider, Mandy; Friese, Ingo; Drüsedow, Steffen; Pfandzelter, Tobias; Malekabbasi, Mohammadreza; Bermbach, David; Bassbouss, Louay; Zoubarev, Alexander; Neparidze, Andy; Kritzner, Arndt; Hartbrich, Jakob; Raake, Alexander; Zschau, Enrico; Schwahn, Klaus-Jürgen
6G NeXt - joint communication and compute mobile network: use cases and architecture. - In: Kommunikation in der Automation, (2023), 6, insges. 10 S.

Research on the new generation of mobile networks is currently in the phase of defining the key technologies to make 6G successful. Here, the research project 6G NeXt aims to provide tight integration between the communication network, consisting of the radio access as well as the backbone network, and processing facilities. Through the concept of split computing, the processing facilities are distributed over the entire backbone network, from the centralised cloud to the edge cloud at a base station. Based on two demanding use cases, Smart Drones and Holographic Communication, we investigate a joint communication and compute architecture that will make the applications of tomorrow become reality.



https://opendata.uni-halle.de//handle/1981185920/113595
Chao, Fang-Yi; Battisti, Federica; Lebreton, Pierre; Raake, Alexander
Omnidirectional video saliency. - In: Immersive video technologies, (2023), S. 123-158

When exploring the visual world, humans (and other species) are faced with more information than they are able to process. To overcome this, selective visual attention allows them to gaze rapidly towards objects of interest in the visual environment. This function of the visual system is of high importance and has received a lot of attention from the scientific community. Being able to understand how selective attention works finds applications in various fields, such as design, advertising, perceptual coding and streaming, etc., which makes it a highly regarded research topic. In recent years, with the advent of omnidirectional images and videos (ODI, ODV), new challenges have been raised compared to traditional 2D visual content. Indeed, with this type of OD content, users are not restricted to exploring what is shown to them in a static viewport, but are now able to freely explore the entire visual world around them by moving their head and, in some cases, even by moving their entire body. Therefore, in this chapter, the work on the analysis of user behavior, as well as the efforts towards modeling selective visual attention and visual exploration in OD images and videos, will be introduced. This chapter will provide information on the different approaches that have been taken and the key challenges that have been raised compared to traditional 2D contents, allowing the reader to grasp the key work that has been done on the understanding and modeling of the exploration of OD contents.



https://doi.org/10.1016/B978-0-32-391755-1.00011-0
Croci, Simone; Singla, Ashutosh; Fremerey, Stephan; Raake, Alexander; Smolić, Aljoscha
Subjective and objective quality assessment for omnidirectional video. - In: Immersive video technologies, (2023), S. 85-122

Video quality assessment is generally important for assessing any kind of immersive media technology, for example, to evaluate captured content, algorithms for encoding and projection, and complete systems, as well as for technology optimization. This chapter provides an overview of the two types of video quality assessment: subjective testing with human viewers, and quality prediction or estimation using video quality metrics or models. First, viewing tests with humans as the gold standard for video quality are reviewed in light of their instantiation for omnidirectional video (ODV). In the second part of the chapter, the less time-consuming, better scalable second type of assessment with objective video quality metrics and models is discussed, considering the specific requirements of ODV. Such metrics and models often incorporate computational models of human perception and content properties. Compared to standard 2D video, ODV introduces the challenge of interactivity and typically of spherical projection distortions, due to its omnidirectional, "point-of-view" (in terms of camera-shot type) nature. Accordingly, subjective tests for ODV include specific considerations of the omnidirectional nature of the presented content and dedicated head-rotation or even additional eye-tracking data capture. In the last part of the chapter, it is shown how objective video quality prediction can be improved by taking into account user behavior and projection distortions.



https://doi.org/10.1016/B978-0-32-391755-1.00010-9
Stoll, Eckhard; Breide, Stephan; Göring, Steve; Raake, Alexander
Automatic camera selection, shot size, and video editing in theater multi-camera recordings. - In: IEEE access, ISSN 2169-3536, Bd. 11 (2023), S. 96673-96692

In a non-professional environment, multi-camera recordings of theater performances or other stage shows are difficult to realize, because amateurs are usually untrained in camera work and in using a vision mixing desk that mixes multiple cameras. This can be remedied by a production process with high-resolution cameras, where recordings of image sections from long shots or medium-long shots are manually or automatically cropped in post-production. For this purpose, Gandhi et al. presented a single-camera system (referred to as the Gandhi Recording System in this paper) that obtains close-ups from a high-resolution recording from the central perspective. The system proposed in this paper, referred to as the "Proposed Recording System", extends the method to four perspectives, based on a Reference Recording System derived from professional TV theater recordings of the Ohnsorg Theater. Rules for camera selection, image cropping, and montage are derived from the Reference Recording System. For this purpose, body and pose recognition software is used, and the stage action is reconstructed from the recordings into the stage set. Speakers are recognized by detecting lip movements, and speaker changes are identified using audio diarization software. The Proposed Recording System is practically instantiated on a school theater recording made by laymen using four 4K cameras. An automatic editing script is generated that outputs a montage of a scene. The principles can also be adapted for other recording situations with an audience, such as lectures, interviews, discussions, talk shows, gala events, award ceremonies, and the like. In an online study, more than 70% of test persons confirm the added value of the perspective diversity of the four cameras of the Proposed Recording System over the single-camera method of Gandhi et al.



https://doi.org/10.1109/ACCESS.2023.3311256
Immohr, Felix; Rendle, Gareth; Neidhardt, Annika; Göring, Steve; Ramachandra Rao, Rakesh Rao; Arévalo Arboleda, Stephanie; Froehlich, Bernd; Raake, Alexander
Proof-of-concept study to evaluate the impact of spatial audio on social presence and user behavior in multi-modal VR communication. - In: IMX 2023, (2023), S. 209-215

This paper presents a proof-of-concept study conducted to analyze the effect of simple diotic vs. spatial, position-dynamic binaural synthesis on social presence in VR, in comparison with face-to-face communication in the real world, for a sample two-party scenario. A conversational task with shared visual reference was realized. The collected data includes questionnaires for direct assessment, tracking data, and audio and video recordings of the individual participants’ sessions for indirect evaluation. While tendencies for improvements with binaural over diotic presentation can be observed, no significant difference in social presence was found for the considered scenario. The gestural analysis revealed that participants used the same amount and type of gestures in face-to-face as in VR, highlighting the importance of non-verbal behavior in communication. As part of the research, an end-to-end framework for conducting communication studies and analysis has been developed.



https://doi.org/10.1145/3573381.3596458
Ramachandra Rao, Rakesh Rao;
Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming. - Ilmenau : Universitätsbibliothek, 2023. - 1 Online-Ressource (viii, 252 Seiten)
Technische Universität Ilmenau, Dissertation 2023

The proliferation of affordable video capture technology and improved Internet bandwidths enable the streaming of high-quality videos (resolutions > 1080p, frame rates ≥ 60fps) online. HTTP-based adaptive streaming is the preferred method for streaming videos, in which video parameters are adapted to the available bandwidth, which in turn affects video quality. Adaptive streaming reduces video playback interruptions due to low network bandwidth, but affects the perceived quality, which makes a systematic assessment of this quality necessary. This assessment is usually carried out for short segments of a few seconds and for whole sessions (up to several minutes). This thesis investigates both aspects using perceptual and instrumental methods. The perceptual assessment of short-term video quality comprises a series of laboratory tests that were published as freely available datasets. The quality of longer sessions was assessed in laboratory tests with human viewers that simulate real-world viewing scenarios. The methodology was additionally investigated outside the laboratory for the assessment of short-term video quality and overall quality, in order to explore alternative approaches to perceptual quality assessment. The instrumental quality evaluation was carried out using bitstream-based and hybrid pixel-based video quality models developed in the course of this work. To this end, the AVQBits model series was developed, based on the laboratory test results. Four different model variants of AVQBits with different input information were created: Mode 3, Mode 1, Mode 0, and Hybrid Mode 0. The model variants were examined and perform better than or on par with other state-of-the-art models. These models were also applied to 360° and gaming videos, HFR content, and images. Furthermore, a long-term integration model (1-5 minutes) based on the ITU-T P.1203.3 model is presented, which uses the different AVQBits variants with per-second quality scores as the video quality component of the proposed long-term integration model. All AVQBits variants, the long-term integration module, and the perceptual test data have been made freely available to enable further research.



https://doi.org/10.22032/dbt.57583
Göring, Steve; Raake, Alexander
Image appeal revisited: analysis, new dataset, and prediction models. - In: IEEE access, ISSN 2169-3536, Bd. 11 (2023), S. 69563-69585

There are more and more photographic images uploaded to social media platforms such as Instagram, Flickr, or Facebook on a daily basis. At the same time, attention to and consumption of such images is high, with image views and likes among the success factors for users and driving forces for social media algorithms. Here, "liking" can be assumed to be driven by image appeal and further factors, such as who is posting the images and what they may show and reveal about the posting person. It is therefore of high research interest to evaluate the appeal of such images in the context of social media platforms. Such an appeal evaluation may help to improve image quality or could be used as an additional filter criterion to select good images. To analyze image appeal, various datasets have been established over the past years. However, not all datasets contain high-resolution images, are up to date, or include additional data, such as meta-data or social-media-type data such as likes and views. We created our own dataset, the "AVT-ImageAppeal-Dataset", which includes images from different photo-sharing platforms. The dataset also includes a subset of other state-of-the-art datasets and is extended by social-media-type data, meta-data, and additional images. In this paper, we describe the dataset and a series of laboratory and crowd tests we conducted to evaluate image appeal. These tests indicate that there is only a small influence when likes and views are included in the presentation of the images in comparison to when these are not shown, and that the appeal ratings are only weakly correlated with likes and views. Furthermore, it is shown that lab and crowd tests yield highly similar appeal ratings. In addition to the dataset, we also describe various machine learning models for the prediction of image appeal, using only the photo itself as input. The models have a similar or slightly better performance than state-of-the-art models. The evaluation indicates that there is still room for improvement in image appeal prediction; furthermore, other aspects, such as the presentation context, could be evaluated in future work.



https://doi.org/10.1109/ACCESS.2023.3292588
Melnyk, Sergiy; Zhou, Qiuheng; Schotten, Hans D.; Galkow-Schneider, Mandy; Friese, Ingo; Pfandzelter, Tobias; Bermbach, David; Bassbouss, Louay; Zoubarev, Alexander; Neparidze, Andy; Kritzner, Arndt; Zschau, Enrico; Dhara, Prasenjit; Göring, Steve; Menz, William; Raake, Alexander; Rüther-Kindel, Wolfgang; Quaeck, Fabian; Stuckert, Nick; Vilter, Robert
6G NeXt - toward 6G split computing network applications: use cases and architecture. - In: Mobilkommunikation, (2023), S. 126-131

Göring, Steve; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Quality assessment of higher resolution images and videos with remote testing. - In: Quality and user experience, ISSN 2366-0147, Bd. 8 (2023), 1, 2, S. 1-26

In many research fields, human-annotated data plays an important role, as it is used to accomplish a multitude of tasks. One such example is the field of multimedia quality assessment, where subjective annotations can be used to train or evaluate quality prediction models. Lab-based tests are one approach to obtain such quality annotations. They are usually performed in well-defined and controlled environments to ensure high reliability. However, this high reliability comes at the cost of higher time consumption and expense. To mitigate this, crowd or online tests can be used. Usually, online tests cover a wider range of end devices, environmental conditions, or participants, which may have an impact on the ratings. To verify whether such online tests can be used for visual quality assessment, we designed three online tests. These online tests are based on previously conducted lab tests, as this enables comparison of the results of both test paradigms. Our focus is on the quality assessment of high-resolution images and videos. The online tests use AVrate Voyager, which is a publicly accessible framework for online tests. To transform the lab tests into online tests, dedicated adaptations in the test methodologies are required. The considered modifications are, for example, patch-based or centre cropping of the images and videos, or random sub-sampling of the to-be-rated stimuli. Based on the analysis of the test results in terms of correlation and SOS analysis, it is shown that online tests can be used as a reliable replacement for lab tests, albeit with some limitations. These limitations relate to, e.g., the lack of appropriate display devices and the restrictions of web technologies and modern browsers regarding support for different video codecs and formats.
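
Centre cropping as an online-test adaptation, i.e., showing a 1:1-pixel patch of a high-resolution stimulus instead of rescaling it to the participant's display, can be sketched as follows. The function and file names are illustrative assumptions, not part of AVrate Voyager.

```python
from PIL import Image

def centre_crop(path: str, target_w: int = 1920, target_h: int = 1080):
    """Centre-crop a higher-resolution stimulus so it maps 1:1 onto a
    smaller display, avoiding browser-side rescaling artifacts.
    Assumes the source image is at least target_w x target_h."""
    img = Image.open(path)
    left = (img.width - target_w) // 2
    top = (img.height - target_h) // 2
    return img.crop((left, top, left + target_w, top + target_h))

# e.g., show the central 1920x1080 patch of a hypothetical 4K source:
# patch = centre_crop("stimulus_3840x2160.png")
# patch.save("stimulus_centre_1080p.png")
```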



https://doi.org/10.1007/s41233-023-00055-6
Göring, Steve; Ramachandra Rao, Rakesh Rao; Merten, Rasmus; Raake, Alexander
Analysis of appeal for realistic AI-generated photos. - In: IEEE access, ISSN 2169-3536, Bd. 11 (2023), S. 38999-39012

AI-generated images have gained in popularity in recent years due to improvements and developments in the field of artificial intelligence. This has led to several new AI generators, which may produce realistic, funny, and impressive images using a simple text prompt. DALL-E-2, Midjourney, and Craiyon are a few examples of such approaches. In general, it can be seen that the quality, realism, and appeal of the images vary depending on the approach used. Therefore, in this paper, we analyze to what extent such AI-generated images are realistic or of high appeal from a more photographic point of view, and how users perceive them. To evaluate the appeal of several state-of-the-art AI generators, we develop a dataset consisting of 27 different text prompts, some of them based on the DrawBench prompts. Using these prompts, we generated a total of 135 images with five different AI text-to-image generators. These images, in combination with real photos, form the basis of our evaluation. The evaluation is based on an online subjective study, and the results are compared with state-of-the-art image quality models and features. The results indicate that some of the included generators are able to produce realistic and highly appealing images. However, this depends on the approach and text prompt to a large extent. The dataset and evaluation of this paper are made publicly available for reproducibility, following an Open Science approach.



https://doi.org/10.1109/ACCESS.2023.3267968
Weidner, Florian; Böttcher, Gerd; Arévalo Arboleda, Stephanie; Diao, Chenyao; Sinani, Luljeta; Kunert, Christian; Gerhardt, Christoph; Broll, Wolfgang; Raake, Alexander
A systematic review on the visualization of avatars and agents in AR & VR displayed using head-mounted displays. - In: IEEE transactions on visualization and computer graphics, ISSN 1941-0506, Bd. 29 (2023), 5, S. 2596-2606

Augmented Reality (AR) and Virtual Reality (VR) are pushing from the labs towards consumers, especially with social applications. These applications require visual representations of humans and intelligent entities. However, displaying and animating photo-realistic models comes with a high technical cost while low-fidelity representations may evoke eeriness and overall could degrade an experience. Thus, it is important to carefully select what kind of avatar to display. This article investigates the effects of rendering style and visible body parts in AR and VR by adopting a systematic literature review. We analyzed 72 papers that compare various avatar representations. Our analysis includes an outline of the research published between 2015 and 2022 on the topic of avatars and agents in AR and VR displayed using head-mounted displays, covering aspects like visible body parts (e.g., hands only, hands and head, full-body) and rendering style (e.g., abstract, cartoon, realistic); an overview of collected objective and subjective measures (e.g., task performance, presence, user experience, body ownership); and a classification of tasks where avatars and agents were used into task domains (physical activity, hand interaction, communication, game-like scenarios, and education/training). We discuss and synthesize our results within the context of today's AR and VR ecosystem, provide guidelines for practitioners, and finally identify and present promising research opportunities to encourage future research of avatars and agents in AR/VR environments.



https://doi.org/10.1109/TVCG.2023.3247072
Stoll, Eckhard; Breide, Stephan; Göring, Steve; Raake, Alexander
Modeling of an automatic vision mixer with human characteristics for multi-camera theater recordings. - In: IEEE access, ISSN 2169-3536, Bd. 11 (2023), S. 18714-18726

A production process using high-resolution cameras can be used for multi-camera recordings of theater performances or other stage performances. One approach to automating the generation of suitable image cuts could be to focus on speaker changes, so that the person who is speaking is shown in the generated cut. However, these image cuts can appear static and robotic if they are set too precisely. Therefore, the characteristics and habits of professional vision mixers (persons who operate the vision mixing desk) during the editing process are investigated in more detail, in order to incorporate them into an automation process. The characteristic features of five different vision mixers are examined, who worked under almost identical recording conditions on theater cuts in TV productions. The cuts are examined with regard to their temporal position relative to the pauses in speech that occur during speaker changes on stage. It is shown that different professional vision mixers individually set the cuts before, within, or after the pauses in speech; differences of up to 0.3 seconds on average are measured. From the analysis of the image cuts, an approach for a model is developed in which the individual characteristics of a vision mixer can be set. With the help of this novel model, automated image cuts that would otherwise appear exact and robotic can be given a more human character.



https://doi.org/10.1109/ACCESS.2023.3245804
Leist, Larissa; Reimers, Carolin; Fremerey, Stephan; Fels, Janina; Raake, Alexander; Klatte, Maria
Effects of binaural classroom noise scenarios on primary school children's speech perception and listening comprehension. - In: 51st International Congress and Exposition on Noise Control Engineering (INTER-NOISE 2022), (2023), S. 3214-3220

Singla, Ashutosh;
Assessment of visual quality and simulator sickness for omnidirectional videos. - Ilmenau, 2023. - viii, 186 Seiten
Technische Universität Ilmenau, Dissertation 2022

One use case for current VR technology with head-mounted displays (HMDs) is 360° video. Valid assessment of the Quality of Experience (QoE) for 360° videos requires subjective tests. Such assessment tests are time-consuming and require a well-designed protocol. International recommendations exist, such as ITU-T Rec. P.910 and ITU-R Rec. BT.500-13, which provide guidelines for assessing the video quality of 2D videos on 2D displays with human test participants. Until this work, however, no such standard recommendation existed for 360° videos. It was therefore necessary to develop a set of guidelines for investigating visual quality and QoE assessment for 360° videos. This thesis presents extensive research on the quality and QoE of 360° videos as perceived by users with HMDs, along with a set of test protocols for systematic assessment. First, conventional subjective test methods such as Absolute Category Rating (ACR) and the Double Stimulus Impairment Scale (DSIS) were used to assess video quality, alongside the Modified ACR (M-ACR) method newly proposed in this work. Building on the reliability and general applicability of the procedure across different tests, this thesis presents a methodological framework for assessing the quality of 360° videos. Second, the increased degree of immersion in 360° videos brings with it the problem of simulator sickness as a further QoE constituent. Therefore, this thesis analyzes simulator sickness to investigate the effects of different influencing factors. The insights gained on simulator sickness in the context of 360° videos contribute to a better understanding of this particular VR use case. In addition, a simplified Simulator Sickness Questionnaire (SSQ) for the self-assessment of symptoms relevant to 360° videos is proposed, by comparing different questionnaire versions with the state-of-the-art variants Cybersickness Questionnaire and Virtual Reality Symptom Questionnaire, as well as the existing SSQ scales. The results show that the simplified version of the SSQ focuses on the symptoms relevant to studies with 360° videos. It is shown that it can be used effectively, with the reduced set of scales enabling more efficient and thus more extensive testing.



De Moor, Katrien; Fiedler, Markus; Raake, Alexander; Jhunjhunwala, Ashok; Gnanasekaran, Vahiny; Subramanian, Sruti; Zinner, Thomas
Towards the design and evaluation of more sustainable multimedia experiences: which role can QoE research play?. - In: ACM SIGMultimedia records, ISSN 1947-4598, Bd. 14 (2022), 3, 4, S. 1

In this column, we reflect on the environmental impact and broader sustainability implications of resource-demanding digital applications and services such as video streaming, VR/AR/XR, and videoconferencing. We put emphasis not only on the experiences and use cases they enable but also on the "cost" of always striving for high Quality of Experience (QoE) and better user experiences. Starting by sketching the broader context, our aim is to raise awareness about the role that QoE research can play in the context of various United Nations Sustainable Development Goals (SDGs), either directly (e.g., SDG 13 "climate action") or more indirectly (e.g., SDG 3 "good health and well-being" and SDG 12 "responsible consumption and production").



https://doi.org/10.1145/3630658.3630662
Reimers, Carolin; Loh, Karin; Leist, Larissa; Fremerey, Stephan; Raake, Alexander; Klatte, Maria; Fels, Janina
Investigating different cueing methods for auditory selective attention in virtual reality. - Berlin : Deutsche Gesellschaft für Akustik e.V. - 1 Online-Ressource (4 Seiten). - Online-Ausgabe: DAGA 2022 : 48. Jahrestagung für Akustik, 21.-24. März 2022, Stuttgart und Online, Seiten/Artikel-Nr: 1173-1176

An audio-only paradigm for investigating auditory selective attention (ASA) has previously been transferred into a classroom-type audio-visual virtual reality (VR) environment. Due to the paradigm structure, the participants focused only on a specific area of the VR environment during the entire experiment. In a more realistic scenario, participants are expected to interact with the scene. Therefore, this study investigates new cueing methods that may reduce the focus on one point in the virtual world and allow for further development of a close-to-real-life scenario.



https://doi.org/10.18154/RWTH-2022-04388
Breuer, Carolin; Loh, Karin; Leist, Larissa; Fremerey, Stephan; Raake, Alexander; Klatte, Maria; Fels, Janina
Examining the auditory selective attention switch in a child-suited virtual reality classroom environment. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 24, 16569, S. 1-20

The ability to focus one's attention in different acoustical environments has been thoroughly investigated in the past. However, recent technological advancements have made it possible to perform laboratory experiments in a more realistic manner. In order to investigate close-to-real-life scenarios, a classroom was modeled in virtual reality (VR) and an established paradigm to investigate the auditory selective attention (ASA) switch was translated from an audio-only version into an audiovisual VR setting. The new paradigm was validated with adult participants in a listening experiment, and the results were compared to the previous version. Apart from expected effects such as switching costs and auditory congruency effects, which reflect the robustness of the overall paradigm, a difference in error rates between the audio-only and the VR group was found, suggesting enhanced attention in the new VR setting, which is consistent with recent studies. Overall, the results suggest that the presented VR paradigm can be used and further developed to investigate the voluntary auditory selective attention switch in a close-to-real-life classroom scenario.



https://doi.org/10.3390/ijerph192416569
Leist, Larissa; Breuer, Carolin; Yadav, Manuj; Fremerey, Stephan; Fels, Janina; Raake, Alexander; Lachmann, Thomas; Schlittmeier, Sabine; Klatte, Maria
Differential effects of task-irrelevant monaural and binaural classroom scenarios on children's and adults' speech perception, listening comprehension, and visual-verbal short-term memory. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 23, 15998, S. 1-17

Most studies investigating the effects of environmental noise on children’s cognitive performance examine the impact of monaural noise (i.e., same signal to both ears), oversimplifying multiple aspects of binaural hearing (i.e., adequately reproducing interaural differences and spatial information). In the current study, the effects of a realistic classroom-noise scenario presented either monaurally or binaurally on tasks requiring processing of auditory and visually presented information were analyzed in children and adults. In Experiment 1, across age groups, word identification was more impaired by monaural than by binaural classroom noise, whereas listening comprehension (acting out oral instructions) was equally impaired in both noise conditions. In both tasks, children were more affected than adults. Disturbance ratings were unrelated to the actual performance decrements. Experiment 2 revealed detrimental effects of classroom noise on short-term memory (serial recall of words presented pictorially), which did not differ with age or presentation mode (monaural vs. binaural). The present results add to the evidence for detrimental effects of noise on speech perception and cognitive performance, and their interactions with age, using a realistic classroom-noise scenario. Binaural simulations of real-world auditory environments can improve the external validity of studies on the impact of noise on children’s and adults’ learning.



https://doi.org/10.3390/ijerph192315998
Robotham, Thomas; Singla, Ashutosh; Rummukainen, Olli S.; Raake, Alexander; Habets, Emanuel A.P.
Audiovisual database with 360° video and higher-order Ambisonics audio for perception, cognition, behavior, and QoE evaluation research. - In: 2022 14th International Conference on Quality of Multimedia Experience (QoMEX), (2022), insges. 6 S.

Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate real-life-like scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680×3840 at 60 frames-per-second and with 4th-order Ambisonics audio. These 360° video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditory and visual sensory information. The database will be continuously updated with additional scenes and further metadata such as human ratings and saliency information.
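
As a point of reference for the channel count involved: an Ambisonics representation of order N consists of (N + 1)^2 signals, so the 4th-order audio in this database comprises (4 + 1)^2 = 25 audio channels per scene, alongside the 7680×3840 video.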



https://doi.org/10.1109/QoMEX55416.2022.9900893
Herglotz, Christian; Robitza, Werner; Kränzler, Matthias; Kaup, André; Raake, Alexander
Modeling of energy consumption and streaming video QoE using a crowdsourcing dataset. - In: 2022 14th International Conference on Quality of Multimedia Experience (QoMEX), (2022), insges. 6 S.

In the past decade, we have witnessed an enormous growth in the demand for online video services. Recent studies estimate that nowadays, more than 1% of the global greenhouse gas emissions can be attributed to the production and use of devices performing online video tasks. As such, research on the true power consumption of devices and their energy efficiency during video streaming is highly important for a sustainable use of this technology. At the same time, over-the-top providers strive to offer high-quality streaming experiences to satisfy user expectations. Here, energy consumption and QoE partly depend on the same system parameters. Hence, a joint view is needed for their evaluation. In this paper, we perform a first analysis of both end-user power efficiency and Quality of Experience of a video streaming service. We take a crowdsourced dataset comprising 447,000 streaming events from YouTube and estimate both the power consumption and perceived quality. The power consumption is modeled based on previous work which we extended towards predicting the power usage of different devices and codecs. The user-perceived QoE is estimated using a standardized model. Our results indicate that an intelligent choice of streaming parameters can optimize both the QoE and the power efficiency of the end user device. Further, the paper discusses limitations of the approach and identifies directions for future research.
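
To illustrate how energy and QoE can hinge on the same streaming parameters, consider a toy power model in which device power grows with the decoded pixel rate, scaled by a codec complexity factor. All coefficients below are invented for illustration and are not the fitted values from the paper:

    def playback_power_watts(width, height, fps, codec):
        """Toy linear power model: base device power plus a term that grows
        with the decoded pixel rate, scaled by a codec complexity factor.
        All coefficients are made up for illustration."""
        codec_factor = {"h264": 1.0, "vp9": 1.3, "hevc": 1.4}[codec]
        pixel_rate = width * height * fps  # pixels per second
        return 2.0 + 1.5e-8 * pixel_rate * codec_factor

    # Energy for a 10-minute 1080p30 H.264 session (illustrative):
    energy_wh = playback_power_watts(1920, 1080, 30, "h264") * (10 * 60) / 3600

The same resolution, framerate and codec parameters also drive perceived quality, which is why the paper argues for evaluating both jointly.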



https://doi.org/10.1109/QoMEX55416.2022.9900886
Döring, Nicola; Conde, Melisa; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Can communication technologies reduce loneliness and social isolation in older people? : a scoping review of reviews. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 18, 11310, S. 1-20

Background: Loneliness and social isolation in older age are considered major public health concerns and research on technology-based solutions is growing rapidly. This scoping review of reviews aims to summarize the communication technologies (CTs) (review question RQ1), theoretical frameworks (RQ2), study designs (RQ3), and positive effects of technology use (RQ4) present in the research field. Methods: A comprehensive multi-disciplinary, multi-database literature search was conducted. Identified reviews were analyzed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. A total of N = 28 research reviews that cover 248 primary studies spanning 50 years were included. Results: The majority of the included reviews addressed general internet and computer use (82% each) (RQ1). Of the 28 reviews, only one (4%) worked with a theoretical framework (RQ2) and 26 (93%) covered primary studies with quantitative-experimental designs (RQ3). The positive effects of technology use were shown in 55% of the outcome measures for loneliness and 44% of the outcome measures for social isolation (RQ4). Conclusion: While research reviews show that CTs can reduce loneliness and social isolation in older people, causal evidence is limited and insights on innovative technologies such as augmented reality systems are scarce.



https://doi.org/10.3390/ijerph191811310
Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
AVQBits - adaptive video quality model based on bitstream information for various video applications. - In: IEEE access, ISSN 2169-3536, Bd. 10 (2022), S. 80321-80351

The paper presents AVQBits, a versatile, bitstream-based video quality model. It can be applied in several contexts such as video service monitoring, evaluation of video encoding quality, of gaming video QoE, and even of omnidirectional video quality. In the paper, it is shown that AVQBits predictions closely match video quality ratings obtained in various subjective tests with human viewers, for videos up to 4K-UHD resolution (Ultra-High Definition, 3840 x 2160 pixels) and framerates up to 120 fps. With the different variants of AVQBits presented in the paper, video quality can be monitored either at the client side, in the network, or directly after encoding. The no-reference AVQBits model was developed for different video services and types of input data, reflecting the increasing popularity of video-on-demand services and the widespread use of HTTP-based adaptive streaming. At its core, AVQBits encompasses the standardized ITU-T P.1204.3 model, with further model instances that can either have restricted or extended input information, depending on the application context. Four different instances of AVQBits are presented, that is, a Mode 3 model with full access to the bitstream, a Mode 0 variant using only metadata such as codec type, framerate, resolution and bitrate as input, a Mode 1 model using Mode 0 information plus frame-type and -size information, and a Hybrid Mode 0 model that is based on Mode 0 metadata and the decoded video pixel information. The models are trained on the authors' own AVT-PNATS-UHD-1 dataset described in the paper. All models show a highly competitive performance when using AVT-VQDB-UHD-1 as validation dataset, e.g., with the Mode 0 variant yielding a Pearson correlation of 0.890, the Mode 1 model of 0.901, the Hybrid Mode 0 model of 0.928, and the model with full bitstream access of 0.942. In addition, all four AVQBits variants are evaluated when applying them out-of-the-box to different media formats such as 360° video, high-framerate (HFR) content, or gaming videos. The analysis shows that the ITU-T P.1204.3 and Hybrid Mode 0 instances of AVQBits for the considered use cases either perform on par with or better than even state-of-the-art full-reference, pixel-based models. Furthermore, it is shown that the proposed Mode 0 and Mode 1 variants outperform commonly used no-reference models for the different application scopes. Also, a long-term integration model based on the standardized ITU-T P.1203.3 is presented to estimate ratings of overall audiovisual streaming Quality of Experience (QoE) for sessions of 30 s up to 5 min duration. In the paper, the AVQBits instances with their per-1-sec score output are evaluated as the video quality component of the proposed long-term integration model. All AVQBits variants as well as the long-term integration module are made publicly available for the community for further research.
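
As an aid for keeping the four instances apart, the following sketch enumerates the inputs each variant consumes. It paraphrases the paper's description and is not code from the published implementation:

    from dataclasses import dataclass

    @dataclass
    class ModeInputs:
        metadata: bool        # codec, resolution, framerate, bitrate
        frame_info: bool      # frame types and sizes
        bitstream: bool       # fully parsed bitstream (ITU-T P.1204.3 core)
        decoded_pixels: bool  # decoded video frames

    AVQBITS_MODES = {
        "Mode 0":        ModeInputs(True, False, False, False),
        "Mode 1":        ModeInputs(True, True,  False, False),
        "Mode 3":        ModeInputs(True, True,  True,  False),
        "Hybrid Mode 0": ModeInputs(True, False, False, True),
    }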



https://doi.org/10.1109/ACCESS.2022.3195527
Bajpai, Vaibhav; Hohlfeld, Oliver; Crowcroft, Jon; Keshav, Srinivasan; Schulzrinne, Henning; Ott, Jörg; Ferlin, Simone; Carle, Georg; Hines, Andrew; Raake, Alexander
Recommendations for designing hybrid conferences. - In: ACM SIGCOMM computer communication review, ISSN 0146-4833, Bd. 52 (2022), 2, S. 63-69

During the COVID-19 pandemic, many smaller conferences have moved entirely online and larger ones are being held as hybrid events. Even beyond the pandemic, hybrid events reduce the carbon footprint of conference travel and make events more accessible to parts of the research community that have difficulty traveling long distances, while preserving most advantages of in-person gatherings. While we have developed a solid understanding of how to design virtual events over the last two years, we are still learning how to properly run hybrid events. We present guidelines and considerations, spanning technology, organization and social factors, for organizing successful hybrid conferences. This paper summarizes and extends the discussions held at the Dagstuhl seminar on "Climate Friendly Internet Research" in July 2021.



https://doi.org/10.1145/3544912.3544920
Gutiérrez, Jesús; Pérez, Pablo; Orduna, Marta; Singla, Ashutosh; Cortés, Carlos; Mazumdar, Pramit; Viola, Irene; Brunnström, Kjell; Battisti, Federica; Cieplińska, Natalia; Juszka, Dawid; Janowski, Lucjan; Leszczuk, Mikołaj; Adeyemi-Ejeye, Anthony; Hu, Yaosi; Chen, Zhenzhong; Wallendael, Glenn Van; Lambert, Peter; Díaz, César; Hedlund, John; Hamsis, Omar; Fremerey, Stephan; Hofmeyer, Frank; Raake, Alexander; César, Pablo; Carli, Marco; García, Narciso
Subjective evaluation of visual quality and simulator sickness of short 360° videos: ITU-T Rec. P.919. - In: IEEE transactions on multimedia, Bd. 24 (2022), S. 3087-3100

Recently, an impressive development in immersive technologies, such as Augmented Reality (AR), Virtual Reality (VR) and 360° video, has been witnessed. However, methods for quality assessment have not been keeping up. This paper studies quality assessment of 360° video based on cross-lab tests (involving ten laboratories and more than 300 participants) carried out by the Immersive Media Group (IMG) of the Video Quality Experts Group (VQEG). These tests were designed to assess and validate subjective evaluation methodologies for 360° video. Audiovisual quality, simulator sickness symptoms, and exploration behavior were evaluated with short (from 10 seconds to 30 seconds) 360° sequences. The influence of the following factors was also analyzed: assessment methodology, sequence duration, Head-Mounted Display (HMD) device, uniform and non-uniform coding degradations, and simulator sickness assessment methods. The obtained results demonstrate the validity of Absolute Category Rating (ACR) and Degradation Category Rating (DCR) for subjective tests with 360° videos, the possibility of using 10-second videos (with or without audio) when addressing quality evaluation of coding artifacts, as well as any commercial HMD (satisfying minimum requirements). Also, more efficient methods than the long Simulator Sickness Questionnaire (SSQ) have been proposed to evaluate related symptoms with 360° videos. These results have been instrumental for the development of the ITU-T Recommendation P.919. Finally, the annotated dataset from the tests is made publicly available for the research community.



https://doi.org/10.1109/TMM.2021.3093717
Göring, Steve
Data-driven visual quality estimation using machine learning. - Ilmenau : Universitätsbibliothek, 2022. - 1 Online-Ressource (vi, 190 Seiten)
Technische Universität Ilmenau, Dissertation 2022

Nowadays, large amounts of visual content are created and accessible, driven by improvements in technology such as smartphones and the internet. It is therefore necessary to assess the quality perceived by users in order to further improve the experience. However, few current quality models are designed specifically for higher resolutions, predict more than the mean opinion score, or use machine learning. One goal of this thesis is to train and evaluate such machine-learning models for higher resolutions on various datasets. First, an objective analysis of image quality at higher resolutions is performed. The images were compressed with video encoders, with AV1 showing the best quality and compression. Subsequently, the results of a crowdsourcing test are compared with a lab test regarding image quality. Furthermore, deep-learning-based models for the prediction of image and video quality are described. The deep-learning-based model is not applicable in practice for video quality prediction because of the resources required. For this reason, pixel-based video quality models are proposed and evaluated that use meaningful features covering image and motion aspects. These models can be used to predict mean opinion scores for videos, or even other values related to video quality, such as a rating distribution. The presented model architecture can be applied to other video problems, such as video classification, quality prediction for gaming videos, game genre classification, or the classification of encoding parameters. An important aspect is also the processing time of such models. Therefore, a general approach for speeding up state-of-the-art video quality models is presented, showing that a considerable share of the processing time can be saved while retaining similar prediction accuracy. The models are published as open source, so that the developed frameworks can be used for further research. Moreover, the presented approaches can be used as building blocks for newer media formats.



https://doi.org/10.22032/dbt.52210
Skowronek, Janto; Raake, Alexander; Berndtsson, Gunilla H.; Rummukainen, Olli S.; Usai, Paolino; Gunkel, Simon N. B.; Johanson, Mathias; Habets, Emanuel A.P.; Malfait, Ludovic; Lindero, David; Toet, Alexander
Quality of experience in telemeetings and videoconferencing: a comprehensive survey. - In: IEEE access, ISSN 2169-3536, Bd. 10 (2022), S. 63885-63931

Telemeetings such as audiovisual conferences or virtual meetings play an increasingly important role in our professional and private lives. For that reason, system developers and service providers will strive for an optimal experience for the user, while at the same time optimizing technical and financial resources. This leads to the discipline of Quality of Experience (QoE), an active field originating from the telecommunication and multimedia engineering domains, that strives for understanding, measuring, and designing the quality experience with multimedia technology. This paper provides the reader with an entry point to the large and still growing field of QoE of telemeetings, by taking a holistic perspective, considering both technical and non-technical aspects, and by focusing on current and near-future services. Addressing both researchers and practitioners, the paper first provides a comprehensive survey of factors and processes that contribute to the QoE of telemeetings, followed by an overview of relevant state-of-the-art methods for QoE assessment. To embed this knowledge into recent technology developments, the paper continues with an overview of current trends, focusing on the field of eXtended Reality (XR) applications for communication purposes. Given the complexity of telemeeting QoE and the current trends, new challenges for a QoE assessment of telemeetings are identified. To overcome these challenges, the paper presents a novel Profile Template for characterizing telemeetings from the holistic perspective endorsed in this paper.



https://doi.org/10.1109/ACCESS.2022.3176369
Robitza, Werner; Ramachandra Rao, Rakesh Rao; Göring, Steve; Dethof, Alexander; Raake, Alexander
Deploying the ITU-T P.1203 QoE model in the wild and retraining for new codecs. - In: MHV '22, (2022), S. 121-122

This paper presents two challenges associated with using the ITU-T P.1203 standard for video quality monitoring in practice. We discuss the issue of unavailable data on certain browsers/platforms and the lack of information within newly developed data formats like Common Media Client Data. We also re-trained the coefficients of the P.1203.1 video model for newer codecs, and published a completely new model derived from the P.1204.3 bitstream model.
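
Re-training coefficients in this sense means keeping the model's functional form and re-fitting its free parameters on ratings obtained for the new codecs. A generic SciPy sketch, where the logistic form and its parameters are illustrative rather than the actual P.1203.1 equations:

    import numpy as np
    from scipy.optimize import curve_fit

    def quality(log_bitrate, a, b, c, d):
        # Illustrative logistic mapping from log bitrate to a 1..5 MOS scale.
        return a + b / (1.0 + np.exp(-c * (log_bitrate - d)))

    def refit(log_bitrates, mos):
        """Fit the free coefficients to subjective ratings collected for a
        new codec; the initial guess keeps the fit inside the MOS range."""
        popt, _ = curve_fit(quality, log_bitrates, mos,
                            p0=[1.0, 4.0, 1.0, np.median(log_bitrates)])
        return popt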



https://doi.org/10.1145/3510450.3517310
Döring, Nicola; De Moor, Katrien; Fiedler, Markus; Schoenenberg, Katrin; Raake, Alexander
Videoconference fatigue: a conceptual analysis. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 4, 2061, S. 1-20

Videoconferencing (VC) is a type of online meeting that allows two or more participants from different locations to engage in live multi-directional audio-visual communication and collaboration (e.g., via screen sharing). The COVID-19 pandemic has induced a boom in both private and professional videoconferencing in the early 2020s that elicited controversial public and academic debates about its pros and cons. One main concern has been the phenomenon of videoconference fatigue. The aim of this conceptual review article is to contribute to the conceptual clarification of VC fatigue. We use the popular and succinct label "Zoom fatigue" interchangeably with the more generic label "videoconference fatigue" and define it as the experience of fatigue during and/or after a videoconference, regardless of the specific VC system used. We followed a structured eight-phase process of conceptual analysis that led to a conceptual model of VC fatigue with four key causal dimensions: (1) personal factors, (2) organizational factors, (3) technological factors, and (4) environmental factors. We present this 4D model describing the respective dimensions with their sub-dimensions based on theories, available evidence, and media coverage. The 4D-model is meant to help researchers advance empirical research on videoconference fatigue.



https://doi.org/10.3390/ijerph19042061
Katsavounidis, Ioannis; Robitza, Werner; Puri, Rohit; Satti, Shahid
VQEG column: new topics. - In: ACM SIGMultimedia records, ISSN 1947-4598, Bd. 13 (2021), 1, 5, S. 1

Welcome to the fourth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG). During the last VQEG plenary meeting (14-18 Dec. 2020) various interesting discussions arose regarding new topics not addressed up to then by VQEG groups, which led to launching three new sub-projects and a new project related to: 1) clarifying the computation of spatial and temporal information (SI and TI), 2) including video quality metrics as metadata in compressed bitstreams, 3) Quality of Experience (QoE) metrics for live video streaming applications, and 4) providing guidelines on implementing objective video quality metrics to the video compression community. The following sections provide more details about these new activities and try to encourage interested readers to follow and get involved in any of them by subscribing to the corresponding reflectors.
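
For readers unfamiliar with the SI/TI computation under discussion: the quantities are defined in ITU-T Rec. P.910, roughly as sketched below in Python (assuming NumPy, OpenCV, and frames given as 2D luma arrays; the ambiguities that the sub-project aims to clarify, e.g. filtering details and value ranges, are glossed over here):

    import cv2
    import numpy as np

    def si_ti(frames):
        """SI = max over time of the spatial std. dev. of the Sobel-filtered
        frame; TI = max over time of the std. dev. of successive frame
        differences (following the idea of ITU-T Rec. P.910)."""
        si_values, ti_values = [], []
        prev = None
        for frame in frames:  # each frame: 2D luma array
            f = frame.astype(np.float64)
            sobel_h = cv2.Sobel(f, cv2.CV_64F, 1, 0)
            sobel_v = cv2.Sobel(f, cv2.CV_64F, 0, 1)
            si_values.append(np.hypot(sobel_h, sobel_v).std())
            if prev is not None:
                ti_values.append((f - prev).std())
            prev = f
        return max(si_values), max(ti_values)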



https://doi.org/10.1145/3577934.3577939
Göring, Steve; Raake, Alexander
Rule of thirds and simplicity for image aesthetics using deep neural networks. - In: IEEE 23rd International Workshop on Multimedia Signal Processing, (2021), insges. 6 S.

Considering the increasing number of photos being uploaded to sharing platforms, a proper evaluation of photo appeal or aesthetics is required. For appealing images, several "rules of thumb" have been established, e.g., the rule of thirds and simplicity. We handle rule of thirds and simplicity as binary classification problems with a deep-learning-based image processing pipeline. Our pipeline uses a pre-processing step, a pre-trained baseline deep neural network (DNN) and post-processing. For each of the rules, we re-train 17 pre-trained DNN models using transfer learning. Our results for publicly available datasets show that the ResNet152 DNN is best for rule-of-thirds prediction and DenseNet121 is best for simplicity, with accuracies of around 0.84 and 0.94, respectively. In addition to the datasets for both classifications, five experts annotated another dataset with ≈ 1100 images, on which we evaluate the best-performing models. Results show that the best-performing models have an accuracy of 0.67 for rule of thirds and 0.79 for image simplicity. Both accuracy results are within the range of the pairwise accuracy of the expert annotators. However, this further indicates that there is a high subjective influence for both of the considered rules.
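
A typical transfer-learning re-training of a pre-trained baseline DNN for such a binary decision can be sketched as follows in PyTorch (assuming torchvision 0.13 or later; dataset loading and the training loop are omitted, and the hyper-parameters are illustrative):

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load an ImageNet-pre-trained ResNet152 as the baseline DNN.
    model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)

    # Freeze the pre-trained feature extractor ...
    for param in model.parameters():
        param.requires_grad = False

    # ... and replace the classifier head with a binary output
    # (rule fulfilled vs. not fulfilled).
    model.fc = nn.Linear(model.fc.in_features, 2)

    # Only the new head is optimized during re-training.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()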



https://doi.org/10.1109/MMSP53017.2021.9733554
Göring, Steve; Ramachandra Rao, Rakesh Rao; Fremerey, Stephan; Raake, Alexander
AVrate Voyager: an open source online testing platform. - In: IEEE 23rd International Workshop on Multimedia Signal Processing, (2021), insges. 6 S.

Subjective testing is an integral part of many research fields concerned with, e.g., human perception. For this purpose, lab tests are a popular approach to gather ratings for subjective evaluations. However, controlled lab tests cannot be performed in all cases, for example when no labs exist or are accessible, or when their use is not permitted. For this reason, online tests, e.g., using crowdsourcing, are an alternative to traditional lab tests. In the following paper we describe a framework for implementing such online tests for audio-, video-, and image-related evaluations or questionnaires. Our framework AVrate Voyager builds upon previously developed frameworks for lab tests and the experience gained with them. AVrate Voyager uses scalable web technologies to implement the test framework, ensuring that it runs reliably. In addition, we added pre-caching strategies to avoid additional influences on play-out, e.g., in the case of video testing. We analyze several tests conducted using the new framework and describe in detail the steps required to adapt the provided tool.
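
At its core, such an online test framework serves stimuli and persists ratings over HTTP. A minimal Flask sketch of a rating endpoint, purely to illustrate the principle; the route and field names are hypothetical and not AVrate Voyager's actual API:

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    RATINGS = []  # in a real deployment this would be a database

    @app.route("/rate", methods=["POST"])
    def rate():
        # Expected JSON: {"user": ..., "stimulus": ..., "rating": 1..5}
        data = request.get_json()
        if not 1 <= int(data["rating"]) <= 5:
            return jsonify(error="ACR uses a 5-point scale"), 400
        RATINGS.append(data)
        return jsonify(status="ok", count=len(RATINGS))

    if __name__ == "__main__":
        app.run()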



https://doi.org/10.1109/MMSP53017.2021.9733561
Döring, Nicola; Mikhailova, Veronika; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Saying "Hi" to grandma in nine different ways : established and innovative communication media in the grandparent-grandchild relationship. - In: Technology, Mind, and Behavior, ISSN 2689-0208, (2021), insges. 1 S.

https://doi.org/10.1037/tms0000107
Fremerey, Stephan; Reimers, Carolin; Leist, Larissa; Spilski, Jan; Klatte, Maria; Fels, Janina; Raake, Alexander
Generation of audiovisual immersive virtual environments to evaluate cognitive performance in classroom type scenarios. - In: Tagungsband, DAGA 2021 - 47. Jahrestagung für Akustik, (2021), S. 1336-1339

https://doi.org/10.22032/dbt.50292
Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Enhancement of pixel-based video quality models using meta-data. - In: Electronic imaging, ISSN 2470-1173, Bd. 33 (2021), 9, art00022, S. 264-1-264-6

Current state-of-the-art pixel-based video quality models for 4K resolution do not have access to explicit meta information such as resolution and framerate and may not include implicit or explicit features that model the related effects on perceived video quality. In this paper, we propose a meta concept to extend state-of-the-art pixel-based models and develop hybrid models incorporating meta-data such as framerate and resolution. Our general approach uses machine learning to incorporate the meta-data into the overall video quality prediction. To this aim, in our study, we evaluate various machine learning approaches such as SVR, random forest, and extreme gradient boosting trees in terms of their suitability for hybrid model development. We use VMAF to demonstrate the validity of the meta-information concept. Our approach was tested on the publicly available AVT-VQDB-UHD-1 dataset. We are able to show an increase in prediction accuracy for the hybrid models in comparison with the prediction accuracy of the underlying pixel-based model. While the proof-of-concept is applied to VMAF, it can also be used with other pixel-based models.
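
In essence, the hybrid extension feeds the pixel-based score together with the explicit meta-data into a learned regressor. A compact scikit-learn sketch with placeholder data; the feature choice follows the paper's description, but the numbers are invented:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Per-sequence features: [vmaf_score, framerate, width * height]
    X = np.array([[83.1, 60, 3840 * 2160],
                  [55.4, 30, 1280 * 720]])   # placeholder rows
    y = np.array([4.1, 2.9])                  # subjective MOS (placeholder)

    # The hybrid model learns to correct the pixel-based prediction
    # using the explicit meta-data.
    hybrid = RandomForestRegressor(n_estimators=100, random_state=0)
    hybrid.fit(X, y)
    mos_hat = hybrid.predict(X)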



https://doi.org/10.2352/ISSN.2470-1173.2021.9.IQSP-264
Ho, Man M.; Zhang, Lu; Raake, Alexander; Zhou, Jinjia
Semantic-driven colorization. - In: Proceedings CVMP 2021, (2021), 1, S. 1-10

Recent colorization works implicitly predict the semantic information while learning to colorize black-and-white images. Consequently, the generated colors tend to overflow object boundaries, and semantic faults remain invisible. According to human experience in colorization, our brains first detect and recognize the objects in the photo, then imagine their plausible colors based on many similar objects we have seen in real life, and finally colorize them, as described in Figure 1. In this study, we simulate that human-like action to let our network first learn to understand the photo, then colorize it. Thus, our work can provide plausible colors at a semantic level. Plus, the semantic information predicted from a well-trained model becomes understandable and able to be modified. Additionally, we also prove that Instance Normalization is a missing ingredient for image colorization, and re-design the inference flow of U-Net to have two streams of data, providing an appropriate way of normalizing the features extracted from the black-and-white image. As a result, our network can provide plausible colors competitive with typical colorization works for specific objects. Our interactive application is available at https://github.com/minhmanho/semantic-driven_colorization.



https://doi.org/10.1145/3485441.3485645
Keller, Dominik; Seybold, Tamara; Skowronek, Janto; Raake, Alexander
Sensorische Evaluierung in der Kinotechnik : wie Videoqualität mit Methoden aus der Lebensmittelforschung bewertet werden kann. - In: FKT, ISSN 1430-9947, Bd. 75 (2021), 4, S. 33-37

Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Towards high resolution video quality assessment in the crowd. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), S. 1-6

Assessing high-resolution video quality is usually performed using controlled, defined, and standardized lab tests. This method of acquiring human ratings in a lab environment is time-consuming and may also not reflect typical viewing conditions. To overcome these disadvantages, crowd-testing paradigms have been used for assessing video quality in general. Crowdsourcing-based tests enable a more diverse set of participants and also use a realistic hardware setup and viewing environment of typical users. However, obtaining valid ratings for high-resolution video quality poses several problems. Example issues are that streaming of such high-bandwidth content may not be feasible for some users, or that crowd participants lack an appropriate high-resolution display device. In this paper, we propose a method to overcome such problems and conduct a crowd test for higher-resolution content by using a 540p cutout from the center of the original 2160p video. To this aim, we use the videos from Test#1 of the publicly available dataset AVT-VQDB-UHD-1, which contains videos up to a resolution of UHD-1. The quality labels available from that lab test allow us to compare the results with the crowd test presented in this paper. It is shown that there is a Pearson correlation of 0.96 between the lab and crowd tests, and hence such crowd tests can reliably be used for video assessment of higher-resolution content. The overall implementation of the crowd test framework and the results are made publicly available for further research and reproducibility.
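
For reference, the centered cutout geometry follows directly from the frame sizes: a 540p cutout (960x540 pixels) from a 2160p frame (3840x2160 pixels) starts at x = (3840 - 960) / 2 = 1440 and y = (2160 - 540) / 2 = 810, so crowd participants see the native-resolution center of the UHD-1 frame without needing a 4K display or the full streaming bandwidth.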



https://doi.org/10.1109/QoMEX51781.2021.9465425
Keller, Dominik; Vaalgamaa, Markus; Paajanen, Erkki; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Groovability: using groove as a novel measure for audio QoE with the example of smartphones. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), S. 13-18

Groove in music is a fundamental part of why humans entrain to it and enjoy it. Smartphones have become an important medium to listen to music. Especially when one is with others, loudspeaker playback may be the method of choice. However, due to the physical limits of acoustics, smartphones are equipped with sub-optimal audio capabilities for loudspeaker playback. Therefore, it is desirable to measure the Quality of Experience (QoE) of music played on smartphones. While audio playback is often assessed in terms of sound quality, the aim of this work is to address QoE in terms of the meaning or effect that the audio has on the listener. A key component for the meaning of popular music is groove. Hence, in this paper, we study groovability, that is, the ability of a piece of audio technology to convey groove. To instantiate our novel audio QoE assessment method, we apply it to music played by 8 different smartphones. For this purpose, looped 4-bar loudness-aligned recordings from 24 music pieces of different intrinsic groove were played back on the different smartphones. Our test method uses a multi-stimulus comparison with synchronized playback capability. A total of 62 subjects evaluated groovability using two stimulus subsets. It was found that the proposed methodology is highly effective in distinguishing between the groovability provided by the considered phones. In addition, a reduced-reference model is proposed to predict groovability, using a set of both acoustics- and music-groove-related features. In our formal validation on unknown data, the model is shown to provide good prediction performance, with a Pearson correlation of greater than 0.90.



https://doi.org/10.1109/QoMEX51781.2021.9465440
Robitza, Werner; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Impact of spatial and temporal information on video quality and compressibility. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), S. 65-68

Spatial Information (SI) and Temporal Information (TI) are frequently used metrics to classify the spatiotemporal complexity of video content. However, they are mostly used on original video sources, and their impact on actual encoding efficiency is not known. In this paper, we propose a method to determine the compressibility of video sources, that is, how good video quality can be under a given bitrate constraint. We show how various aggregations of SI and TI correlate with compressibility scores obtained from a public dataset of H.264/HEVC/VP9 content. We observe that the minimum TI value as well as an existing criticality metric from the literature are good indicators for compressibility, as judged by subjective ratings as well as VMAF and P.1204.3 objective scores.



https://doi.org/10.1109/QoMEX51781.2021.9465452
Ávila Soto, Mauro; Barzegar, Najmeh
I know you are looking to me: enabling eye-gaze communication between small children and parents with visual impairments. - In: AH 2021, (2021), 9, insges. 4 S.

Eye-gaze interaction is a relevant means of communication from early infancy onward. The bonding between infants and their caretakers is strengthened through eye contact. Parents with visual impairments are excluded from this type of interaction with their children. At the same time, today's computer vision technologies allow eye gaze to be tracked for different purposes, even enabling users with visual impairments to recognize faces. This work starts from the following research question: can currently available eye-tracking solutions aid parents with visual impairments in having eye-gaze interaction with their young infant children? We devised a software prototype based on currently available eye-tracking technologies, which was tested with three sets of visually impaired parents and their young infant children to explore the possibility of assisting those parents in having eye-gaze interaction with their children. The experience was documented in semi-structured interviews, which were processed with a content analysis technique. The approach got positive feedback regarding the functionality and emotional interaction aspects.



https://doi.org/10.1145/3460881.3460883
Singla, Ashutosh; Göring, Steve; Keller, Dominik; Ramachandra Rao, Rakesh Rao; Fremerey, Stephan; Raake, Alexander
Assessment of the simulator sickness questionnaire for omnidirectional videos. - In: 2021 IEEE Conference on Virtual Reality and 3D User Interfaces, (2021), S. 198-206

Virtual Reality/360° videos provide an immersive experience to users. Besides this, 360° videos may lead to an undesirable effect when consumed with Head-Mounted Displays (HMDs), referred to as simulator sickness/cybersickness. The Simulator Sickness Questionnaire (SSQ) is the most widely used questionnaire for the assessment of simulator sickness. Since the SSQ with its 16 questions was not designed for 360° video related studies, our research hypothesis in this paper was that it may be simplified to enable more efficient testing for 360° video. Hence, we evaluate the SSQ to reduce the number of questions asked from subjects, based on six different previously conducted studies. We derive the reduced set of questions from the SSQ using Principal Component Analysis (PCA) for each test. Pearson Correlation is analysed to compare the relation of all obtained reduced questionnaires as well as two further variants of SSQ reported in the literature, namely Virtual Reality Sickness Questionnaire (VRSQ) and Cybersickness Questionnaire (CSQ). Our analysis suggests that a reduced questionnaire with 9 out of 16 questions yields the best agreement with the initial SSQ, with less than 44% of the initial questions. Exploratory Factor Analysis (EFA) shows that the nine symptom-related attributes determined as relevant by PCA also appear to be sufficient to represent the three dimensions resulting from EFA, namely, Uneasiness, Visual Discomfort and Loss of Balance. The simplified version of the SSQ has the potential to be more efficiently used than the initial SSQ for 360° video by focusing on the questions that are most relevant for individuals, shortening the required testing time.
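
A minimal sketch of such a PCA-based item reduction, assuming a subjects x 16 matrix of SSQ symptom ratings and scikit-learn; the component count and selection rule are illustrative, not necessarily the paper's exact procedure:

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_items(ratings, n_components=3, n_keep=9):
        """ratings: (n_subjects, 16) array of SSQ symptom scores.
        Returns indices of the items loading strongest on the first
        principal components."""
        pca = PCA(n_components=n_components)
        pca.fit(ratings)
        # Aggregate each item's absolute loadings over the kept components,
        # weighted by the explained variance of each component.
        loadings = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
        return np.argsort(loadings)[::-1][:n_keep]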



https://doi.org/10.1109/VR50410.2021.00041
Göring, Steve; Ramachandra Rao, Rakesh Rao; Feiten, Bernhard; Raake, Alexander
Modular framework and instances of pixel-based video quality models for UHD-1/4K. - In: IEEE access, ISSN 2169-3536, Bd. 9 (2021), S. 31842-31864

https://doi.org/10.1109/ACCESS.2021.3059932
Göring, Steve; Steger, Robert; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Automated genre classification for gaming videos. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), insges. 6 S.

Besides classical videos, videos of gaming matches, entire tournaments or individual sessions are streamed and viewed all over the world. The increased popularity of Twitch or YouTube Gaming shows the importance of additional research on gaming videos. One important pre-condition for live or offline encoding of gaming videos is the knowledge of game-specific properties. Knowing or automatically predicting the genre of a gaming video enables a more advanced and optimized encoding pipeline for streaming providers, especially because gaming videos of different genres vary a lot from classical 2D video, e.g., considering the CGI content, textures or camera motion. We describe several computer-vision-based features that are optimized for speed and motivated by characteristics of popular games, to automatically predict the genre of a gaming video. Our prediction system uses random forest and gradient boosting trees as underlying machine-learning techniques, combined with feature selection. For the evaluation of our approach we use a dataset that was built as part of this work and consists of recorded gaming sessions for 6 genres from Twitch. In total, 351 different videos are considered. We show that our prediction approach achieves good performance in terms of F1-score. Besides the evaluation of different machine-learning approaches, we additionally investigate the influence of the hyper-parameters of the algorithms.



https://doi.org/10.1109/MMSP48831.2020.9287122
Singla, Ashutosh; Fremerey, Stephan; Hofmeyer, Frank; Robitza, Werner; Raake, Alexander
Quality assessment protocols for omnidirectional video quality evaluation. - In: Electronic imaging, ISSN 2470-1173, Bd. 32 (2020), 11, art00003, S. 069-1-069-6

In recent years, with the introduction of powerful HMDs such as Oculus Rift and HTC Vive Pro, the QoE that can be achieved with VR/360° videos has increased substantially. Unfortunately, no standardized guidelines, methodologies and protocols exist for conducting and evaluating the quality of 360° videos in tests with human test subjects. In this paper, we present a set of test protocols for the evaluation of the quality of 360° videos using HMDs. To this aim, we review the state-of-the-art with respect to the assessment of 360° videos and summarize its results. Also, we summarize the methodological approaches and results of different subjective experiments at our lab under different contextual conditions. In the first two experiments, 1a and 1b, the performance of two different subjective test methods, Double-Stimulus Impairment Scale (DSIS) and Modified Absolute Category Rating (M-ACR), was compared under different contextual conditions. In experiment 2, the performance of three different subjective test methods, DSIS, M-ACR and Absolute Category Rating (ACR), was compared, this time without varying the contextual conditions. Building on the reliability and general applicability of the procedure across different tests, a methodological framework for 360° video quality assessment is presented in this paper. Besides video or media quality judgments, the procedure comprises the assessment of presence and simulator sickness, for which different methods were compared. Further, the accompanying head-rotation data can be used to analyze both content- and quality-related behavioural viewing aspects. Based on the results, the implications of different contextual settings are discussed.



https://doi.org/10.2352/ISSN.2470-1173.2020.11.HVEI-069
Zadtootaghaj, Saman; Barman, Nabajeet; Ramachandra Rao, Rakesh Rao; Göring, Steve; Martini, Maria G.; Raake, Alexander; Möller, Sebastian
DEMI: deep video quality estimation model using perceptual video quality dimensions. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), insges. 6 S.

Existing works in the field of quality assessment focus separately on gaming and non-gaming content. Along with the traditional modeling approaches, deep learning based approaches have been used to develop quality models, due to their high prediction accuracy. In this paper, we present a deep learning based quality estimation model considering both gaming and non-gaming videos. The model is developed in three phases. First, a convolutional neural network (CNN) is trained based on an objective metric which allows the CNN to learn video artifacts such as blurriness and blockiness. Next, the model is fine-tuned based on a small image quality dataset using blockiness and blurriness ratings. Finally, a Random Forest is used to pool frame-level predictions and temporal information of videos in order to predict the overall video quality. The light-weight, low complexity nature of the model makes it suitable for real-time applications considering both gaming and non-gaming content while achieving similar performance to existing state-of-the-art model NDNetGaming. The model implementation for testing is available on GitHub.



https://doi.org/10.1109/MMSP48831.2020.9287080
Fremerey, Stephan; Göring, Steve; Ramachandra Rao, Rakesh Rao; Huang, Rachel; Raake, Alexander
Subjective test dataset and meta-data-based models for 360° streaming video quality. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), insges. 6 S.

During the last years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish results of three subjective 360° video quality tests, with conditions used to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device we used the HTC Vive for the first and HTC Vive Pro for the remaining two tests. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links of the used source videos, the raw subjective scores, video-related meta-data, head rotation data and Simulator Sickness Questionnaire results per stimulus and per subject to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Out of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF ("VMAF-cc") that showed to provide a similar performance as the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models, a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide similar performance as the full-reference models while enabling fast calculations.



https://doi.org/10.1109/MMSP48831.2020.9287065
Ramachandra Rao, Rakesh Rao; Göring, Steve; Steger, Robert; Zadtootaghaj, Saman; Barman, Nabajeet; Fremerey, Stephan; Möller, Sebastian; Raake, Alexander
A large-scale evaluation of the bitstream-based video-quality model ITU-T P.1204.3 on gaming content. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), insges. 6 S.

The streaming of gaming content, both passive and interactive, has increased manifold in recent years. Gaming contents bring with them some peculiarities which are normally not seen in traditional 2D videos, such as the artificial and synthetic nature of the content or the repetition of objects in a game. In addition, the perception of gaming content by the user differs from that of traditional 2D videos due to these peculiarities and also the fact that users may not often watch such content. Hence, it becomes imperative to evaluate whether the existing video quality models, usually designed for traditional 2D videos, are applicable to gaming content. In this paper, we evaluate the applicability of the recently standardized bitstream-based video-quality model ITU-T P.1204.3 on gaming content. To analyze the performance of this model, we used 4 different gaming datasets (3 publicly available + 1 internal) not previously used for model training, and compared it with the existing state-of-the-art models. We found that the ITU-T P.1204.3 model out of the box performs well on these unseen datasets, with an RMSE ranging between 0.38 - 0.45 on the 5-point absolute category rating scale and a Pearson correlation between 0.85 - 0.93 across all 4 databases. We further propose a full-HD variant of the P.1204.3 model, since the original model was trained and validated targeting a resolution of 4K/UHD-1. A 50:50 split across all databases is used to train and validate this variant, so as to make sure that the proposed model is applicable to various conditions.



https://doi.org/10.1109/MMSP48831.2020.9287055
Fremerey, Stephan; Hofmeyer, Frank; Göring, Steve; Keller, Dominik; Raake, Alexander
Between the frames - evaluation of various motion interpolation algorithms to improve 360° video quality. - In: 2020 IEEE International Symposium on Multimedia, (2020), S. 65-72

With the increasing availability of 360° video content, it becomes important to provide smoothly playing videos of high quality for end users. For this reason, we compare the influence of different Motion Interpolation (MI) algorithms on 360° video quality. After conducting a pre-test with 12 video experts in [3], we found that MI is a useful tool to increase the QoE (Quality of Experience) of omnidirectional videos. As a result of the pre-test, we selected three suitable MI algorithms, namely ffmpeg Motion Compensated Interpolation (MCI), Butterflow and Super-SloMo. Subsequently, we interpolated 15 entertaining and real-world omnidirectional videos with a duration of 20 seconds from 30 fps (original framerate) to 90 fps, which is the native refresh rate of the HMD used, the HTC Vive Pro. To assess QoE, we conducted two subjective tests with 24 and 27 participants. In the first test we used a Modified Paired Comparison (M-PC) method, and in the second test the Absolute Category Rating (ACR) approach. In the M-PC test, 45 stimuli were used and in the ACR test 60. Results show that for most of the 360° videos, the interpolated versions obtained significantly higher quality scores than the lower-framerate source videos, validating our hypothesis that motion interpolation can improve the overall video quality for 360° video. As expected, it was observed that the relative comparisons in the M-PC test result in larger differences in terms of quality. Generally, the ACR method led to similar results, while reflecting a more realistic viewing situation. In addition, we compared the different MI algorithms and can conclude that, with sufficient computing power available, Super-SloMo should be preferred for the interpolation of omnidirectional videos, while MCI also shows a good performance.
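
The MCI condition can be reproduced with stock ffmpeg, which exposes motion-compensated interpolation as the minterpolate filter. A sketch via Python, with the filter options kept to the basics rather than the exact settings used in the paper:

    import subprocess

    def interpolate_to_90fps(src, dst):
        """Interpolate a 30 fps source to 90 fps using ffmpeg's
        motion-compensated interpolation (MCI) mode."""
        subprocess.run([
            "ffmpeg", "-i", src,
            "-vf", "minterpolate=fps=90:mi_mode=mci",
            dst,
        ], check=True)

    interpolate_to_90fps("src_30fps.mp4", "out_90fps.mp4")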



https://doi.org/10.1109/ISM.2020.00017
Raake, Alexander; Wierstorf, Hagen
Binaural evaluation of sound quality and quality of experience. - In: The technology of binaural understanding, (2020), S. 393-434

The chapter outlines the concepts of Sound Quality and Quality of Experience (QoE). Building on these, it describes a conceptual model of sound quality perception and experience during active listening in a spatial-audio context. The presented model of sound quality perception considers both bottom-up (signal-driven) as well as top-down (hypothesis-driven) perceptual functional processes. Different studies by the authors and from the literature are discussed in light of their suitability to help develop implementations of the conceptual model. As a key prerequisite, the underlying perceptual ground-truth data required for model training and validation are discussed, as well as means for deriving these from respective listening tests. Both feature-based and more holistic modeling approaches are analyzed. Overall, open research questions are summarized, deriving trajectories for future work on spatial-audio Sound Quality and Quality of Experience modeling.



Raake, Alexander; Borer, Silvio; Satti, Shahid M.; Gustafsson, Jörgen; Ramachandra Rao, Rakesh Rao; Medagli, Stefano; List, Peter; Göring, Steve; Lindero, David; Robitza, Werner; Heikkilä, Gunnar; Broom, Simon; Schmidmer, Christian; Feiten, Bernhard; Wüstenhagen, Ulf; Wittmann, Thomas; Obermann, Matthias; Bitto, Roland
Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204. - In: IEEE access, ISSN 2169-3536, Bd. 8 (2020), S. 193020-193049

https://doi.org/10.1109/ACCESS.2020.3032080
Stoll, Eckhard; Breide, Stephan; Raake, Alexander
Towards analysing the interaction between quality and storytelling for event video recording. - In: 2020 Twelfth International Conference on Quality of Multimedia Experience (Qomex), (2020), insges. 4 S.

https://doi.org/10.1109/QoMEX48832.2020.9123113
Robitza, Werner; Dethof, Alexander M.; Göring, Steve; Raake, Alexander; Beyer, André; Polzehl, Tim
Are you still watching? Streaming video quality and engagement assessment in the crowd. - In: 2020 Twelfth International Conference on Quality of Multimedia Experience (Qomex), (2020), insges. 6 S.

https://doi.org/10.1109/QoMEX48832.2020.9123148
Keller, Dominik; Raake, Alexander; Vaalgamaa, Markus; Paajanen, Erkki
Let the music play: an automated test setup for blind subjective evaluation of music playback on mobile devices. - In: 2020 Twelfth International Conference on Quality of Multimedia Experience (Qomex), (2020), insges. 4 S.

https://doi.org/10.1109/QoMEX48832.2020.9123092
Ramachandra Rao, Rakesh Rao; Göring, Steve; List, Peter; Robitza, Werner; Feiten, Bernhard; Wüstenhagen, Ulf; Raake, Alexander
Bitstream-based model standard for 4K/UHD: ITU-T P.1204.3 - model details, evaluation, analysis and open source implementation. - In: 2020 Twelfth International Conference on Quality of Multimedia Experience (Qomex), (2020), insges. 6 S.

https://doi.org/10.1109/QoMEX48832.2020.9123110
Göring, Steve; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Prenc - predict number of video encoding passes with machine learning. - In: 2020 Twelfth International Conference on Quality of Multimedia Experience (Qomex), (2020), insges. 6 S.

https://doi.org/10.1109/QoMEX48832.2020.9123108
Fremerey, Stephan; Suleman, Muhammad Sami; Paracha, Abdul Haq Azeem; Raake, Alexander
Development and evaluation of a test setup to investigate distance differences in immersive virtual environments. - In: 2020 Twelfth International Conference on Quality of Multimedia Experience (Qomex), (2020), insges. 4 S.

https://doi.org/10.1109/QoMEX48832.2020.9123077
Schwarzmann, Susanna; Hainke, Nick; Zinner, Thomas; Sieber, Christian; Robitza, Werner; Raake, Alexander
Comparing fixed and variable segment durations for adaptive video streaming: a holistic analysis. - In: MMSys '20, (2020), S. 38-53

https://doi.org/10.1145/3339825.3391858
Raake, Alexander; Singla, Ashutosh; Ramachandra Rao, Rakesh Rao; Robitza, Werner; Hofmeyer, Frank
SiSiMo: towards simulator sickness modeling for 360° videos viewed with an HMD. - In: 2020 IEEE Conference on Virtual Reality and 3D User Interfaces workshops, (2020), S. 582-583

https://doi.org/10.1109/VRW50115.2020.00142
Fliegel, Karel; Krasula, Lukáš; Robitza, Werner
Qualinet databases: central resource for QoE research - history, current status, and plans. - In: ACM SIGMultimedia records, ISSN 1947-4598, Bd. 11 (2019), 3, 5, S. 1

https://doi.org/10.1145/3524460.3524465
Hofmeyer, Frank; Fremerey, Stephan; Cohrs, Thaden; Raake, Alexander
Impacts of internal HMD playback processing on subjective quality perception. - In: Electronic imaging, ISSN 2470-1173, Bd. 31 (2019), 12, art00013, S. 219-1-219-6

In this paper, we conducted two different studies. Our first study deals with measuring flickering in HMDs using a self-developed measurement tool. For this, we investigated several combinations of software 360° video players and framerates. We found that only 90 fps content leads to an ideal, smooth playout without stuttering or black-frame insertion. Moreover, playing out 360° content at lower framerates, especially 25 and 50 fps, should be avoided. In our second study we investigated the influence of higher framerates of various 360° videos on the perceived quality. To do so, we conducted a subjective test with 12 expert viewers. The participants watched native 30 fps as well as interpolated 90 fps 360° content; two of the contents used are published along with the paper. We found that 90 fps significantly improves the perceived quality. Additionally, we compared the performance of three motion interpolation algorithms. The results show that motion interpolation can be used in post-production to improve the perceived quality.



https://doi.org/10.2352/ISSN.2470-1173.2019.12.HVEI-219
Jaiswal, Sunil Prasad; Jakhetiya, Vinit; Gu, Ke; Guntuku, Sharath C.; Singla, Ashutosh
Frequency-domain analysis based exploitation of color channels for color image demosaicking. - In: 2019 IEEE International Conference on Visual Communications and Image Processing (VCIP), (2019), insges. 4 S.

https://doi.org/10.1109/VCIP47243.2019.8966070
Ramachandra Rao, Rakesh Rao; Göring, Steve; Robitza, Werner; Feiten, Bernhard; Raake, Alexander
AVT-VQDB-UHD-1: a large scale video quality database for UHD-1. - In: 2019 IEEE International Symposium on Multimedia, (2019), S. 17-24

4K television screens, or even screens with higher resolutions, are currently available on the market. Moreover, video streaming providers are able to stream videos in 4K resolution and beyond. Therefore, it becomes increasingly important to have a proper understanding of video quality, especially in the case of 4K videos. To this effect, in this paper, we present a study of subjective and objective quality assessment of 4K ultra-high-definition videos of short duration, similar to DASH segment lengths. As a first step, we conducted four subjective quality evaluation tests for compressed versions of the 4K videos. The videos were encoded using three different video codecs, namely H.264, HEVC, and VP9. The resolutions of the compressed videos ranged from 360p to 2160p, with framerates varying from 15 fps to 60 fps. All source 4K contents used were at 60 fps. We included low-quality conditions in terms of bitrate, resolution and framerate to ensure that the tests cover a wide range of conditions, and that e.g. possible models trained on this data are more general and applicable to a wider range of real-world applications. The results of the subjective quality evaluation are analyzed to assess the impact of different factors such as bitrate, resolution, framerate, and content. In the second step, different state-of-the-art objective quality models, e.g. Netflix's VMAF, were applied to all videos, and their performance was analyzed in comparison with the subjective ratings. The videos, the subjective scores (both MOS and confidence interval per sequence), and the objective scores are made public for use by the community for further research.
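As an illustration of how such objective scores can be produced, the sketch below runs VMAF on one distorted/reference pair via an ffmpeg build that includes libvmaf; the toolchain and file names are assumptions, not the exact pipeline used for the database.

```python
# Hedged sketch: computing VMAF for one distorted/reference pair with an
# ffmpeg build that includes libvmaf. File names are placeholders, and both
# inputs are assumed to be pre-scaled to the same resolution and framerate.
import subprocess

def run_vmaf(distorted: str, reference: str, log_path: str) -> None:
    """Write per-frame and pooled VMAF scores to a JSON log file."""
    subprocess.run(
        [
            "ffmpeg", "-i", distorted, "-i", reference,
            # first input is the distorted ("main") video, second the reference
            "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
            "-f", "null", "-",
        ],
        check=True,
    )

run_vmaf("encoded_2160p.mp4", "source_2160p.mp4", "vmaf.json")
```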



https://doi.org/10.1109/ISM46123.2019.00012
Göring, Steve; Krämmer, Christopher; Raake, Alexander
cencro - speedup of video quality calculation using center cropping. - In: 2019 IEEE International Symposium on Multimedia, (2019), S. 1-8

Today's video streaming providers, e.g. YouTube, Netflix or Amazon Prime, are able to deliver high-resolution and high-quality content to end users. To optimize video quality and to reduce transmission bandwidth, new encoders and smarter encoding schemes are required. Encoding optimization forms an important part of this effort and results in saving a considerable amount of bitrate. For such optimization, accurate and computationally fast video quality models are required, e.g. Netflix's VMAF. However, VMAF is a full-reference (FR) metric, and the calculation of such metrics tends to be slower in comparison to other metrics, due to the amount of data that needs to be processed, especially for high resolutions of 4K and beyond. We introduce an approach to speed up video quality metric calculations in general. We use VMAF as an example, with a video database of up to 4K resolution, to show that our approach works well. Our main idea is to reduce each frame of the reference and distorted video to a center crop of the frame, assuming that the most important visual information is presented in the middle of most typical videos. In total, we analyze 18 different crop settings and compare our results with uncropped VMAF values and subjective scores. We show that this approach - named cencro - is able to save up to 95% computation time, with an overall error of just 4% for a 360p center crop. Furthermore, we checked other full-reference metrics and show that cencro performs similarly well. As a last evaluation, we apply our approach to full-HD gaming videos; also in this scenario, cencro can be applied successfully. The idea behind cencro is not restricted to full-reference models and can also be applied to other types of video quality models or datasets, or even to higher-resolution videos such as 8K.
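The core idea can be summarized in a few lines: evaluate the full-reference metric on a center crop instead of the full frame. The following minimal sketch shows such a crop; the crop size and frame handling are illustrative, not the paper's implementation.

```python
# Minimal sketch of the cencro idea: evaluate a full-reference metric on a
# center crop of each frame instead of the full frame. The 640x360 ("360p")
# crop size here is an illustrative assumption.
import numpy as np

def center_crop(frame: np.ndarray, crop_w: int = 640, crop_h: int = 360) -> np.ndarray:
    """Cut a crop_w x crop_h window out of the middle of an H x W x C frame."""
    h, w = frame.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return frame[top:top + crop_h, left:left + crop_w]

# The cropped reference/distorted frames would then be fed to the
# full-reference model (VMAF in the paper) instead of the full 4K frames,
# which is what yields the reported speedup.
ref_frame = np.zeros((2160, 3840, 3), dtype=np.uint8)  # dummy 4K frame
dis_frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
ref_crop, dis_crop = center_crop(ref_frame), center_crop(dis_frame)
print(ref_crop.shape)  # (360, 640, 3)
```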



https://doi.org/10.1109/ISM46123.2019.00010
Göring, Steve; Raake, Alexander
Evaluation of intra-coding based image compression. - In: EUVIP 2019, (2019), S. 169-174

Considering modern cameras, increasing image resolutions and the thousands of images uploaded to sharing platforms, there is still reason to take a deeper look into image compression. Lossy image compression in particular is always a trade-off between file size and image quality, where high quality is usually preferred for storage. Besides classical image compression, e.g. JPEG, there is also ongoing development towards using video codecs to compress images. We analyze four different video codecs, namely AV1, H.264, H.265 and VP9, in comparison with JPEG. Our evaluation considers classical image quality metrics, e.g. PSNR, and also a modern perceptually motivated quality metric, i.e. Netflix's VMAF. We are able to show that modern video codecs can outperform classical JPEG compression both in terms of quality and file size. For this, we used 1133 uncompressed images, applied different encoding settings, and estimated image quality.
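For reference, a minimal implementation of PSNR, one of the classical metrics mentioned above, could look as follows; this is the generic textbook formulation for 8-bit images, not code from the paper.

```python
# Hedged sketch: classical peak signal-to-noise ratio (PSNR) in dB between
# two same-sized 8-bit images.
import numpy as np

def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR = 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
dist = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, dist):.2f} dB")
```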



https://doi.org/10.1109/EUVIP47703.2019.8946162
Lestari, Puji; Schade, Hans-Peter
Efficient human detection algorithm using color & depth information with accurate outer boundary matching. - In: Emerging trends in Big Data and Artificial Intelligence, (2019), S. 64-69

https://doi.org/10.1109/IC3INA48034.2019.8949572
Lestari, Puji; Schade, Hans-Peter
Human detection from RGB depth image using active contour and grow-cut segmentation. - In: Emerging trends in Big Data and Artificial Intelligence, (2019), S. 70-75

https://doi.org/10.1109/IC3INA48034.2019.8949571
Singla, Ashutosh; Robitza, Werner; Raake, Alexander
Comparison of subjective quality test methods for omnidirectional video quality evaluation. - In: IEEE 21st International Workshop on Multimedia Signal Processing, (2019), insges. 6 S.

https://doi.org/10.1109/MMSP.2019.8901719
Kara, Peter A.; Robitza, Werner; Pinter, Nikolett; Martini, Maria G.; Raake, Alexander; Simon, Aniko
Comparison of HD and UHD video quality with and without the influence of the labeling effect. - In: Quality and user experience, ISSN 2366-0147, Volume 4 (2019), issue 1, article 4, Seite 1-29

https://doi.org/10.1007/s41233-019-0027-3
Wedel, Simon; Koppetz, Michael; Skowronek, Janto; Raake, Alexander
ViProVoQ: towards a vocabulary for video quality assessment in the context of creative video production. - In: MM'19, (2019), S. 2387-2395

This paper presents a method for developing a consensus vocabulary to describe and evaluate the visual experience of videos. As a first result, a vocabulary characterizing the specific look of cinema-type video is presented. Such a vocabulary can be used to relate perceptual features of professional high-end image and video quality of experience (QoE) with the underlying technical characteristics and settings of the video systems involved in the creative content production process. For the vocabulary elicitation, a combination of different survey techniques was applied in this work. As the first step, individual interviews were conducted with experts of the motion picture industry on image quality in the context of cinematography. The data obtained from the interviews was used for the subsequent Real-time Delphi survey, where an extended group of experts worked out a consensus on key aspects of the vocabulary specification. Here, 33 experts were supplied with the anonymized results of the other panelists, which they could use to revise their own assessment. Based on this expert panel, the attributes collected in the interviews were verified and further refined, resulting in the final vocabulary proposed in this paper. Besides an attribute-based sensory evaluation of high-quality image, video and film material, applications of the vocabulary are the development of dimension-based image and video quality models, and the analysis of the multivariate relationship between quality-relevant perceptual attributes and technical system parameters.



https://doi.org/10.1145/3343031.3351171
Lestari, Puji; Schade, Hans-Peter
Boundary matched human area segmentation for chroma keying using hybrid depth-color analysis. - In: 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP 2019), (2019), S. 761-767

https://doi.org/10.1109/SIPROCESS.2019.8868469
Zhou, Jun; Qi, Lianyong; Raake, Alexander; Xu, Tao; Piekarska, Marta; Zhang, Xuyun
User attitudes and behaviors toward personalized control of privacy settings on smartphones. - In: Concurrency and computation, ISSN 1532-0634, Volume 31 (2019), issue 22, e4884, Seite 1-14

https://doi.org/10.1002/cpe.4884
Singla, Ashutosh; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Assessing media QoE, simulator sickness and presence for omnidirectional videos with different test protocols. - In: 26th IEEE Conference on Virtual Reality and 3D User Interfaces, (2019), S. 1163-1164

https://doi.org/10.1109/VR.2019.8798291
Raake, Alexander; Skowronek, Janto; Soloducha, Michal
Telecommunications applications. - In: Sensory evaluation of sound, (2019), S. 227-267

Singla, Ashutosh; Göring, Steve; Raake, Alexander; Meixner, Britta; Koenen, Rob; Buchholz, Thomas
Subjective quality evaluation of tile-based streaming for omnidirectional videos. - In: Proceedings of the 10th ACM Multimedia Systems Conference (MMSys'19), (2019), S. 232-242

https://doi.org/10.1145/3304109.3306218
Fremerey, Stephan; Huang, Rachel; Göring, Steve; Raake, Alexander
Are people pixel-peeping 360° videos?. - In: Electronic imaging, ISSN 2470-1173, Bd. 31 (2019), 10, art00002, S. 220-1-220-6

In this paper, we compare the influence of a higher-resolution Head-Mounted Display (HMD) like the HTC Vive Pro on 360° video QoE to that obtained with a lower-resolution HMD like the HTC Vive. Furthermore, we evaluate the difference in perceived quality for entertainment-type 360° content in 4K/6K/8K resolutions at typical high-quality bitrates. In addition, we evaluate which video parts people focus on while watching omnidirectional videos. To this aim, we conducted three subjective tests. We used the HTC Vive in the first and the HTC Vive Pro in the other two tests. The results from our tests show that the higher resolution of the Vive Pro seems to enable people to judge the quality more easily, as indicated by a smaller deviation between the resulting quality ratings. Furthermore, we found no significant difference between the quality scores for the highest bitrate at 6K and 8K resolution. We also compared the viewing behavior for the same content viewed for the first time with the behavior when the same content is viewed again multiple times. The different representations of the contents were explored similarly, probably because participants find and compare specific parts of the 360° video suitable for rating the quality.



https://doi.org/10.2352/ISSN.2470-1173.2019.10.IQSP-220
Ramachandra Rao, Rakesh Rao; Göring, Steve; Vogel, Patrick; Pachatz, Nicolas; Villamar Villarreal, Juan Jose; Robitza, Werner; List, Peter; Feiten, Bernhard; Raake, Alexander
Adaptive video streaming with current codecs and formats: extensions to parametric video quality model ITU-T P.1203. - In: Electronic imaging, ISSN 2470-1173, Bd. 31 (2019), 10, art00015, S. 314-1-314-6

Adaptive streaming is fast becoming the most widely used method for video delivery to end users over the internet. The ITU-T P.1203 standard is the first standardized quality-of-experience model for audiovisual HTTP-based adaptive streaming. This recommendation has been trained and validated for H.264 and resolutions up to and including full-HD. This paper provides an extension of the existing standardized short-term video quality model mode 0 to new codecs, i.e. H.265, VP9 and AV1, and to resolutions larger than full-HD (e.g. UHD-1). The extension is based on two subjective video quality tests. In the tests, a total of 13 different source contents of 10 seconds each were used. These sources were encoded at resolutions ranging from 360p to 2160p and various quality levels using the H.265, VP9 and AV1 codecs. The subjective results from the two tests were then used to derive a mapping/correction function for P.1203.1 to handle the new codecs and resolutions. It should be noted that the standardized model was not re-trained with the new subjective data; instead, only a mapping/correction function was derived from the two subjective test results, so as to extend the existing standard to the new codecs and resolutions.
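Conceptually, such a correction leaves the standardized model untouched and only post-processes its output. The sketch below illustrates the idea with invented placeholder coefficients; the published mapping differs.

```python
# Illustrative sketch only: a codec-specific mapping/correction applied on
# top of the unchanged P.1203.1 output score. The linear form and the
# coefficients below are invented placeholders, not the published mapping.
HYPOTHETICAL_COEFFS = {
    "h265": (0.15, 0.95),  # (intercept, slope) -- placeholders
    "vp9":  (0.10, 0.97),
    "av1":  (0.20, 0.93),
}

def corrected_score(p1203_score: float, codec: str) -> float:
    """Map a P.1203.1 score (1..5) for a new codec onto the MOS scale."""
    a, b = HYPOTHETICAL_COEFFS[codec]
    return min(5.0, max(1.0, a + b * p1203_score))

print(corrected_score(3.8, "av1"))
```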



https://doi.org/10.2352/ISSN.2470-1173.2019.10.IQSP-314
Göring, Steve; Zebelein, Julian; Wedel, Simon; Keller, Dominik; Raake, Alexander
Analyze and predict the perceptibility of UHD video contents. - In: Electronic imaging, ISSN 2470-1173, Bd. 31 (2019), 12, art00009, S. 215-1-215-6

720p, Full-HD, 4K, 8K, ...: display resolutions have been increasing rapidly in recent years. However, many video streaming providers currently stream videos at a maximum of 4K/UHD-1 resolution. Considering that typical viewers enjoy their videos in typical living rooms, where viewing distances are quite large, the question arises whether more resolution is even recognizable. In this paper, we analyze the problem of UHD perceptibility in comparison with lower resolutions. As a first step, we conducted a subjective video test that focuses on short uncompressed video sequences and compares two different testing methods for pairwise discrimination of two representations of the same source video in different resolutions. We selected an extended stripe method and a temporal switching method. We found that temporal switching is more suitable for recognizing UHD video content. Furthermore, we developed features that can be used in a machine learning system to predict whether there is a benefit in showing a given video in UHD or not. Evaluating different models based on these features for predicting perceivable differences shows good performance on the available test data. Our implemented system can be used to verify UHD source video material or to optimize streaming applications.
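As an illustration of the prediction step, a classifier over per-video features might look like the sketch below; the feature names, training data and model settings are made up and only stand in for the features described in the paper.

```python
# Hedged sketch: a classifier decides whether showing a given video in UHD
# yields a perceivable benefit. Features and labels are invented placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-video features, e.g. spatial detail and motion measures.
X = np.array([
    [0.82, 0.10],  # high spatial detail, low motion
    [0.15, 0.70],  # low detail, high motion
    [0.90, 0.20],
    [0.20, 0.65],
])
y = np.array([1, 0, 1, 0])  # 1 = UHD perceivably better than lower resolution

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[0.75, 0.15]]))  # -> likely [1]
```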



https://doi.org/10.2352/ISSN.2470-1173.2019.12.HVEI-215
Keller, Dominik; Seybold, Tamara; Skowronek, Janto; Raake, Alexander
Assessing texture dimensions and video quality in motion pictures using sensory evaluation techniques. - In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), (2019), insges. 6 S.

The quality of images and videos is usually examined with well-established subjective tests or instrumental models. These often target content transmitted over the internet, such as streaming or videoconferences, and address human preferential experience. In the area of high-quality motion pictures, however, other factors are relevant. These are mostly not error-related but concern creative image design, which has gained comparatively little attention in image and video quality research. To determine the perceptual dimensions underlying movie-type video quality, we combine sensory evaluation techniques extensively used in food assessment - the Degree of Difference test and Free Choice Profiling - with more classical video quality tests. The main goal of this research is to analyze the suitability of sensory evaluation methods for high-quality video assessment. To understand which features in motion pictures are recognizable and critical to quality, we address the example of image texture properties, measuring human perception and preferences with a panel of image-quality experts. To this aim, different capture settings were simulated by applying sharpening filters as well as digital and analog noise to exemplary source sequences. The evaluation, involving Multidimensional Scaling, Generalized Procrustes Analysis as well as Internal and External Preference Mapping, identified two separate perceptual dimensions. We conclude that Free Choice Profiling combined with a quality test offers the highest level of insight relative to the effort required. The combination enables a quantitative quality measurement including an analysis of the underlying perceptual reasons.
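One of the analysis steps, Multidimensional Scaling, can be illustrated as follows; the dissimilarity values are invented placeholders standing in for ratings from a Degree of Difference test.

```python
# Hedged sketch: Multidimensional Scaling on a (hypothetical) pairwise
# dissimilarity matrix between four processed video versions.
import numpy as np
from sklearn.manifold import MDS

# Symmetric dissimilarities between stimuli (made-up values).
D = np.array([
    [0.0, 2.1, 3.4, 1.2],
    [2.1, 0.0, 1.5, 2.8],
    [3.4, 1.5, 0.0, 3.0],
    [1.2, 2.8, 3.0, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)  # 2-D perceptual map of the stimuli
print(coords)
```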



https://doi.org/10.1109/QoMEX.2019.8743189
Lebreton, Pierre; Hupont, Isabelle; Hirth, Matthias; Mäki, Toni; Skodras, Evangelos; Schubert, Anton; Raake, Alexander
CrowdWatcher: an open-source platform to catch the eye of the crowd. - In: Quality and user experience, ISSN 2366-0147, Volume 4 (2019), issue 1, article 1, Seite 1-17

https://doi.org/10.1007/s41233-019-0024-6
Fremerey, Stephan; Hofmeyer, Frank; Göring, Steve; Raake, Alexander
Impact of various motion interpolation algorithms on 360° video QoE. - In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), (2019), insges. 3 S.

https://doi.org/10.1109/QoMEX.2019.8743307
Göring, Steve; Ramachandra Rao, Rakesh Rao; Raake, Alexander
nofu - a lightweight no-reference pixel based video quality model for gaming content. - In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), (2019), insges. 6 S.

https://doi.org/10.1109/QoMEX.2019.8743262
Göring, Steve; Raake, Alexander
deimeq - a deep neural network based hybrid no-reference image quality model. - In: Proceedings of the 2018 7th European Workshop on Visual Information Processing (EUVIP), (2018), insges. 6 S.

https://doi.org/10.1109/EUVIP.2018.8611703
Lebreton, Pierre; Fremerey, Stephan; Raake, Alexander
V-BMS360: a video extension to the BMS360 image saliency model. - In: 2018 IEEE International Conference on Multimedia and Expo workshops (ICMEW), ISBN 978-1-5386-4195-8, (2018), insges. 4 S.

https://doi.org/10.1109/ICMEW.2018.8551523
Göring, Steve; Skowronek, Janto; Raake, Alexander
DeViQ - a deep no reference video quality model. - In: Electronic imaging, ISSN 2470-1173, Bd. 30 (2018), 14, art00017, S. 518-1-518-6

When enjoying video streaming services, users expect high video quality in various situations, including mobile phone connections with low bandwidths. Furthermore, users' interest in consuming new large-size data content, such as high-resolution/high-framerate material or 360-degree videos, is growing as well. To deal with such challenges, modern encoders adaptively reduce the size of the transmitted data. This in turn requires automated video quality monitoring solutions to ensure a sufficient quality of the delivered material. We present a no-reference video quality model, i.e. a model that does not require the original reference material, which is convenient for application in the field. Our approach uses a pretrained classification DNN in combination with hierarchical sub-image creation, some state-of-the-art features and a random forest model. Furthermore, the model can process UHD content and is trained on a large ground-truth data set, which is generated using a state-of-the-art full-reference model. The proposed model achieves high quality-prediction accuracy, comparable to a number of full-reference metrics. Thus, our model is a proof of concept for successful no-reference video quality estimation.
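A loose sketch of this model family, a pretrained classification DNN as feature extractor feeding a random forest, is shown below; the backbone, pooling and data are placeholder assumptions, not the published DeViQ configuration.

```python
# Hedged sketch: features from a pretrained classification DNN regressed
# with a random forest. Network choice and data are illustrative only.
import numpy as np
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestRegressor

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # 2048-d features instead of class logits
backbone.eval()

def frame_features(frame: torch.Tensor) -> np.ndarray:
    """frame: 3x224x224, normalized; returns a 2048-d feature vector."""
    with torch.no_grad():
        return backbone(frame.unsqueeze(0)).squeeze(0).numpy()

# Hypothetical training: one feature vector per video, ground truth produced
# by a full-reference model, as described in the abstract.
X = np.stack([frame_features(torch.rand(3, 224, 224)) for _ in range(8)])
y = np.random.uniform(1, 5, size=8)  # placeholder quality scores
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(rf.predict(X[:2]))
```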



https://doi.org/10.2352/ISSN.2470-1173.2018.14.HVEI-518
Singla, Ashutosh; Robitza, Werner; Raake, Alexander
Comparison of subjective quality evaluation methods for omnidirectional videos with DSIS and modified ACR. - In: Electronic imaging, ISSN 2470-1173, Bd. 30 (2018), 14, art00025, S. 525-1-525-6

In this paper, we compare the Double-Stimulus Impairment Scale (DSIS) and a Modified Absolute Category Rating (M-ACR) method for the subjective quality evaluation of HEVC/H.265-encoded omnidirectional videos. These two methods differ in the type of rating scale and the presentation of stimuli. The results of our test provide insight into the similarities and differences between these two subjective test methods. Also, we investigate whether the results obtained with these methods are content-dependent. We evaluated subjective quality on an Oculus Rift for two different resolutions (4K and FHD) and at five different bit-rates. Experimental results show that for 4K resolution, at the lower bit-rates of 1 and 2 MBit/s, M-ACR provides slightly higher MOS compared to DSIS, while for 4, 8 and 15 MBit/s, DSIS provides slightly higher MOS. While the correlation coefficient between these two methods is very high, M-ACR offers higher statistical reliability than DSIS. We also compared simulator sickness scores and viewing behavior. Experimental results show that subjects are more prone to simulator sickness while evaluating 360° videos with the DSIS method.
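A simple way to quantify the agreement between the two methods, as reported above, is a Pearson correlation over per-condition MOS; the values below are invented for illustration.

```python
# Hedged illustration: comparing per-condition MOS from the two test
# methods with a Pearson correlation (values are placeholders).
from scipy.stats import pearsonr

mos_dsis  = [1.8, 2.6, 3.5, 4.1, 4.4]  # 1, 2, 4, 8, 15 MBit/s (made up)
mos_m_acr = [2.0, 2.9, 3.4, 4.0, 4.3]

r, p = pearsonr(mos_dsis, mos_m_acr)
print(f"r = {r:.3f}, p = {p:.3f}")
```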



https://doi.org/10.2352/ISSN.2470-1173.2018.14.HVEI-525
Berndtsson, Gunilla; Schmitt, Marwin; Hughes, Peter; Skowronek, Janto; Schoenenberg, Katrin; Raake, Alexander
Methods for human-centered evaluation of MediaSync in real-time communication. - In: MediaSync, (2018), S. 229-270

In an ideal world, people interacting via real-time multimedia links would experience perfectly synchronized media with no transmission latency: the interlocutors would hear and see each other without delay. Methods to approach the former are discussed in other chapters of this book, but for a variety of practical and physical reasons, delay-free communication will never be possible. In some cases, the delay will be very obvious, since it will be possible to observe the reaction time of the listeners modified by the delay, or there may be some acoustic echo from the listeners' audio equipment. However, in the absence of echo, the users themselves do not always explicitly notice the presence of delay, even for quite large values. Typically, they notice something is wrong (for example, "we kept interrupting each other!") but are unable to define what it is. Some useful insights into the impact of delay on a conversation can be obtained from the linguistic discipline of Conversation Analysis, and especially the analysis of "turn-taking" in a conversation. This chapter gives an overview of the challenges in evaluating media synchronicity in real-time communications, outlining appropriate tasks and methods for subjective testing and how in-depth analysis of such tests can be performed to gain a deep understanding of the effects of delay. The insights are based on recent studies of audio and audiovisual communication, but also show examples from other media synchronization applications such as networked music interaction.



https://doi.org/10.1007/978-3-319-65840-7_9
Asan, Avsar; Robitza, Werner; Mkwawa, Is-haka; Sun, Lingfen; Begen, Ali C.
Optimum encoding approaches on video resolution changes: a comparative study. - In: 2018 IEEE International Conference on Image Processing, ISBN 978-1-4799-7061-2, (2018), S. 1003-1007

https://doi.org/10.1109/ICIP.2018.8451635
Robitza, Werner; Kittur, Dhananjaya G.; Dethof, Alexander M.; Göring, Steve; Feiten, Bernhard; Raake, Alexander
Measuring YouTube QoE with ITU-T P.1203 under constrained bandwidth conditions. - In: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-2605-4, (2018), insges. 6 S.

https://doi.org/10.1109/QoMEX.2018.8463363
Göring, Steve; Brand, Konstantin; Raake, Alexander
Extended features using machine learning techniques for photo liking prediction. - In: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-2605-4, (2018), insges. 6 S.

https://doi.org/10.1109/QoMEX.2018.8463396
Skowronek, Janto; Raake, Alexander
On the quality perception of multiparty conferencing calls. - In: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-2605-4, (2018), insges. 6 S.

https://doi.org/10.1109/QoMEX.2018.8463378
Xu, Tao; Zhou, Yun; Raake, Alexander; Zhang, Xuyun
Analyzing impact factors for smartphone sharing decisions using decision tree. - In: Human-Computer Interaction. Interaction in Context, (2018), S. 628-637

https://doi.org/10.1007/978-3-319-91244-8_48
Fremerey, Stephan; Singla, Ashutosh; Meseberg, Kay; Raake, Alexander
AVTrack360: an open dataset and software recording people's head rotations watching 360° videos on an HMD. - In: Proceedings of the 9th ACM Multimedia Systems Conference (MMSys'18), (2018), S. 403-408

https://doi.org/10.1145/3204949.3208134
Robitza, Werner; Göring, Steve; Raake, Alexander; Lindegren, David; Heikkilä, Gunnar; Gustafsson, Jörgen; List, Peter; Feiten, Bernhard; Wüstenhagen, Ulf; Garcia, Marie-Neige; Yamagishi, Kazuhisa; Broom, Simon
HTTP adaptive streaming QoE estimation with ITU-T Rec. P.1203 - open databases and software. - In: Proceedings of the 9th ACM Multimedia Systems Conference (MMSys'18), (2018), S. 466-471

https://doi.org/10.1145/3204949.3208124
Winter, Fiete; Wierstorf, Hagen; Hold, Christoph; Krüger, Frank; Raake, Alexander; Spors, Sascha
Colouration in local wave field synthesis. - In: IEEE/ACM transactions on audio, speech, and language processing, ISSN 2329-9304, Bd. 26 (2018), 10, S. 1913-1924

https://doi.org/10.1109/TASLP.2018.2842435
Wierstorf, Hagen; Hold, Christoph; Raake, Alexander
Listener preference for wave field synthesis, stereophony, and different mixes in popular music. - In: Journal of the Audio Engineering Society, ISSN 0004-7554, Bd. 66 (2018), 5, S. 385-396

https://doi.org/10.17743/jaes.2018.0019
Lestari, Puji; Schade, Hans-Peter
RGB-depth image based human detection using Viola-Jones and Chan-Vese active contour segmentation. - In: Advances in Signal Processing and Intelligent Recognition Systems, (2018), S. 285-296

https://doi.org/10.1007/978-3-319-67934-1_25
Hold, Christoph; Nagel, Lukas; Wierstorf, Hagen; Raake, Alexander
Positioning of musical foreground parts in surrounding sound stages. - In: Audio for virtual and augmented reality, ISBN 978-1-5108-4346-2, (2017), S. 135-141

Schade, Hans-Peter; Daschner, Anna-Maria; Cohrs, Thaden; Görner, Johannes
Empfang von Video-over-IP auf virtualisierten IT-Servern. - In: FKT, ISSN 1430-9947, Bd. 71 (2017), 12, S. 557-562

Blauert, Jens; Braasch, Jonas; Raake, Alexander
Tools for the assessment of sound quality and quality of experience in original and simulated spaces for acoustic performances. - In: The journal of the Acoustical Society of America, ISSN 1520-8524, Bd. 141 (2017), 5, S. 3932

https://doi.org/10.1121/1.4988896
Raake, Alexander; Skowronek, Janto; Wierstorf, Hagen; Hold, Christoph
Update on sound quality assessment with TWO!EARS. - In: The journal of the Acoustical Society of America, ISSN 1520-8524, Bd. 141 (2017), 5, S. 3974

https://doi.org/10.1121/1.4989064
Zhou, Yun; Raake, Alexander; Xu, Tao; Zhang, Xuyun
Users' perceived control, trust and expectation on privacy settings of smartphone. - In: Cyberspace Safety and Security, (2017), S. 427-441

https://doi.org/10.1007/978-3-319-69471-9_31
Winter, Fiete; Hold, Christoph; Wierstorf, Hagen; Raake, Alexander; Spors, Sascha
Colouration in 2.5D local wave field synthesis using spatial bandwidth-limitation. - In: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, ISBN 978-1-5386-1632-1, (2017), S. 160-164

https://doi.org/10.1109/WASPAA.2017.8170015
Singla, Ashutosh; Fremerey, Stephan; Robitza, Werner; Lebreton, Pierre; Raake, Alexander
Comparison of subjective quality evaluation for HEVC encoded omnidirectional videos at different bit-rates for UHD and FHD resolution. - In: Thematics Workshops'17, ISBN 978-1-4503-5416-5, (2017), S. 511-519

In this paper, we perform subjective quality evaluation studies for HEVC/H.265-encoded omnidirectional videos at different bit-rates for two different resolutions (FHD and UHD) on an Oculus Rift. The results of these tests provide insight into appropriate coding and resolution settings for given bitrate constraints, for example in an HTTP-based adaptive streaming (HAS) context. Subjective quality judgements were collected on a 5-point Absolute Category Rating (ACR) scale. Further, we collected head motion data during viewing and rating. Working towards the technical goal of subjective evaluation for different resolutions and bit-rates, we address aspects of how to conduct respective viewing tests, involving information from head-rotation tracking (yaw and pitch) and motion-sickness questionnaires. Quality adaptation (in terms of resolution and bit-rate) of omnidirectional videos is an important feature of media streaming, but its effect on subjective quality evaluations of 360° video has not been investigated so far. To utilize network and processing resources efficiently, limitations in the resolution of current Head-Mounted Displays (HMDs), with typically 2160 x 1200 pixels per view, may be exploited. The subjective test results provide indications for boundaries between resolution and quantization scaling. To discuss the merits of the applied subjective test method, we compare simulator sickness scores along with behavioral data.



https://doi.org/10.1145/3126686.3126768
Robitza, Werner; Ahmad, Arslan; Kara, Peter A.; Atzori, Luigi; Martini, Maria G.; Raake, Alexander; Sun, Lingfen
Challenges of future multimedia QoE monitoring for internet service providers. - In: Multimedia tools and applications, ISSN 1573-7721, Bd. 76 (2017), 21, S. 22243-22266

https://doi.org/10.1007/s11042-017-4870-z
Biswas, Shantonu; Reiprich, Johannes; Cohrs, Thaden; Arboleda, David T.; Schöberl, Andreas; Kaltwasser, Mahsa; Schlag, Leslie; Stauden, Thomas; Pezoldt, Jörg; Jacobs, Heiko O.
3D metamorphic stretchable microphone arrays. - In: Advanced Materials Technologies, ISSN 2365-709X, Bd. 2 (2017), 10, 1700131, insges. 11 S.

https://doi.org/10.1002/admt.201700131
Zhou, Yun; Piekarska, Marta; Raake, Alexander; Xu, Tao; Wu, Xiaojun; Dong, Bei
Control yourself: on user control of privacy settings using personalization and privacy panel on smartphones. - In: 8th International Conference on Ambient Systems, Networks and Technologies, ANT-2017 and the 7th International Conference on Sustainable Energy Information Technology, SEIT 2017, 16-19 May 2017, Madeira, Portugal, (2017), S. 100-107

https://doi.org/10.1016/j.procs.2017.05.300
Soloducha, Michal; Raake, Alexander; Bleiholder, Stefan; Kettler, Frank
Conversational speech quality in noisy environments. - In: 142nd Audio Engineering Society International Convention 2017, (2017), S. 144-151

Asan, Avşar; Robitza, Werner; Mkwawa, Is-haka; Sun, Lingfen; Ifeachor, Emmanuel; Raake, Alexander
Impact of video resolution changes on QoE for adaptive video streaming. - In: 2017 IEEE International Conference on Multimedia and Expo (ICME), ISBN 978-1-5090-6067-2, (2017), S. 499-504

https://doi.org/10.1109/ICME.2017.8019297
Biswas, Shantonu; Reiprich, Johannes; Cohrs, Thaden; Stauden, Thomas; Pezoldt, Jörg; Jacobs, Heiko O.
Metamorphic hemispherical microphone array for three-dimensional acoustics. - In: Applied physics letters, ISSN 1077-3118, Bd. 111 (2017), 4, 043109, insges. 5 S.

http://dx.doi.org/10.1063/1.4985710
Robitza, Werner; Garcia, Marie-Neige; Raake, Alexander
A modular HTTP adaptive streaming QoE model - candidate for ITU-T P.1203 ("P.NATS"). - In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-4024-1, (2017), insges. 6 S.

https://doi.org/10.1109/QoMEX.2017.7965689
Kara, Peter A.; Robitza, Werner; Raake, Alexander; Martini, Maria G.
The label knows better: the impact of labeling effects on perceived quality of HD and UHD video streaming. - In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-4024-1, (2017), insges. 6 S.

https://doi.org/10.1109/QoMEX.2017.7965674
Singla, Ashutosh; Fremerey, Stephan; Robitza, Werner; Raake, Alexander
Measuring and comparing QoE and simulator sickness of omnidirectional videos in different head mounted displays. - In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-4024-1, (2017), insges. 6 S.

In this paper, we evaluated and compared the integral quality of different omnidirectional contents for two head-mounted displays (HMDs), namely the HTC Vive and the Oculus Rift. We also investigated motion sickness and head movements. To this aim, we grouped the omnidirectional contents into three categories based on the degree of motion: high, medium and low motion. For assessing simulator sickness, we used the Simulator Sickness Questionnaire for each of the contents on both HMDs. The viewing directions of the subjects while watching the contents were recorded in terms of the three coordinates yaw, roll and pitch. Experimental results show that the HTC Vive offers better integral quality compared to the Oculus Rift. We also compared simulator sickness scores along with the behavioral data for different contents and HMDs and discuss the results in the paper.
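For context, the Simulator Sickness Questionnaire is conventionally scored with the weights from Kennedy et al. (1993); the sketch below shows this standard scoring as an illustration, not code from the study.

```python
# Sketch of standard SSQ scoring (Kennedy et al., 1993): inputs are the raw
# symptom-group sums (each symptom rated 0..3 by the participant).
def ssq_scores(nausea_raw: int, oculomotor_raw: int, disorientation_raw: int):
    """Return (nausea, oculomotor, disorientation, total) SSQ scores."""
    n = nausea_raw * 9.54
    o = oculomotor_raw * 7.58
    d = disorientation_raw * 13.92
    total = (nausea_raw + oculomotor_raw + disorientation_raw) * 3.74
    return n, o, d, total

print(ssq_scores(3, 4, 2))
```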



https://doi.org/10.1109/QoMEX.2017.7965658
Göring, Steve; Raake, Alexander; Feiten, Bernhard
A framework for QoE analysis of encrypted video streams. - In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-4024-1, (2017), insges. 3 S.

Today, most internet traffic is generated by video streaming. YouTube and other video streaming platforms use encrypted streams (HTTPS) for the transport of video content. Encryption leads to additional requirements for network and content providers; e.g., caching mechanisms will not work directly. Estimating video quality to measure users' satisfaction is also harder, because there is no direct access to the video bitstream. We are building a framework for analyzing video quality that allows us to store client information, decrypted network traffic and encrypted messages. Our approach is based on a man-in-the-middle proxy for storing the decrypted video bitstream, active probing and traffic shaping. Using these data, we are able to calculate video QoE values, for example using a model such as ITU-T Rec. P.1203. Our framework will be used for generating datasets for encrypted video stream analysis, analyzing the internal behavior of video streaming platforms, and more. For experimental evaluation, in this paper we analyze the influence of our man-in-the-middle proxy on key performance indicators (KPIs) for video streaming quality.
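As a hypothetical illustration of the man-in-the-middle component, a small mitmproxy addon could log the timing and size of media segment responses; the filtering heuristic and field choices are assumptions, not the framework's actual code.

```python
# Hypothetical mitmproxy addon in the spirit of the described framework:
# log timing and size of media segment responses passing through the proxy.
# Run with: mitmproxy -s this_file.py
import time

from mitmproxy import http

class SegmentLogger:
    def response(self, flow: http.HTTPFlow) -> None:
        ctype = flow.response.headers.get("content-type", "")
        # Crude heuristic for video segments (an assumption, not the paper's):
        if "video" in ctype or flow.request.path.endswith((".mp4", ".m4s")):
            print(f"{time.time():.3f} {flow.request.pretty_url} "
                  f"{len(flow.response.content)} bytes")

addons = [SegmentLogger()]
```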



https://doi.org/10.1109/QoMEX.2017.7965640
Soloducha, Michał; Raake, Alexander; Kettler, Frank; Bleiholder, Stefan
Testing conversational quality of VoIP with different terminals and degradations. - In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-4024-1, (2017), insges. 3 S.

https://doi.org/10.1109/QoMEX.2017.7965639
Raake, Alexander; Garcia, Marie-Neige; Robitza, Werner; List, Peter; Göring, Steve; Feiten, Bernhard
A bitstream-based, scalable video-quality model for HTTP adaptive streaming: ITU-T P.1203.1. - In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), ISBN 978-1-5386-4024-1, (2017), insges. 6 S.

https://doi.org/10.1109/QoMEX.2017.7965631
Sloma, Ulrike; Klein, Florian; Helbig, Thomas; Skowronek, Janto; Gadyuchko, Maria; Werner, Stephan; Breitbarth, Andreas; Neidhardt, Annika; Chillian, Antje; Brandenburg, Karlheinz; Raake, Alexander; Notni, Gunther; Sattel, Thomas; Witte, Hartmut; Husar, Peter
GO-LEM - Charakterisierung der auditorischen und auditorisch-visuellen Wahrnehmung des Menschen in Alltagsszenen. - In: Prävention von arbeitsbedingten Gesundheitsgefahren und Erkrankungen, (2017), S. 349-356

Wierstorf, Hagen; Raake, Alexander; Spors, Sascha
Assessing localization accuracy in sound field synthesis. - In: The journal of the Acoustical Society of America, ISSN 1520-8524, Bd. 141 (2017), 2, S. 1111-1119

http://dx.doi.org/10.1121/1.4976061
Winkler, Stefan; Chen, Chang Wen; Raake, Alexander; Schelkens, Peter; Skorin-Kapov, Lea
Introduction to the issue on measuring quality of experience for advanced media technologies and services. - In: IEEE journal of selected topics in signal processing, ISSN 1941-0484, Bd. 11 (2017), 1, S. 3-5

https://doi.org/10.1109/JSTSP.2016.2639779
Treybig, Lukas; Cohrs, Thaden; Schade, Hans-Peter
Arraymikrofone mit netzwerkbasierter Übertragung im Fernsehstudio :
Network-based array microphones in TV studio. - In: Expertise in audio media, ISBN 978-3-9812830-7-5, (2017), S. 245-248

This contribution gives a field report on the use of network-based array microphones in a television news broadcast studio. Presenters are captured with multiple beam patterns; a network-based signal processing unit is used to generate these multiple directivities. The test setup has been integrated into the production environment with the IEEE AVB/TSN network standard suite, using the existing network infrastructure. The main part of this contribution addresses the question to what extent the deployment of array microphones with digital signal processing is suitable for television studios. For this purpose, the sound quality of the microphone array beams is compared with signals captured with conventional clip-on microphones. Moreover, resulting acoustic problems and solutions are discussed.
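The basic principle behind deriving multiple beams from one array can be sketched with a delay-and-sum beamformer; the geometry and parameters below are illustrative assumptions, unrelated to the actual studio processing unit.

```python
# Hedged sketch: delay-and-sum beamforming for a uniform linear array,
# steering one beam per target direction ("multiple beam patterns").
import numpy as np

def delay_and_sum(signals: np.ndarray, fs: int, spacing: float,
                  angle_deg: float, c: float = 343.0) -> np.ndarray:
    """signals: (n_mics, n_samples). Steer one beam towards angle_deg."""
    n_mics, n_samples = signals.shape
    angle = np.radians(angle_deg)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        delay = m * spacing * np.sin(angle) / c  # per-mic delay in seconds
        shift = int(round(delay * fs))           # delay in samples
        # np.roll wraps around at the edges; acceptable for a sketch
        out += np.roll(signals[m], -shift)
    return out / n_mics

# Two beams from the same hypothetical 8-mic array (4 cm spacing, 48 kHz):
x = np.random.randn(8, 48000)
beam_left = delay_and_sum(x, 48000, 0.04, -30)
beam_right = delay_and_sum(x, 48000, 0.04, +30)
```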



Sackl, Andreas; Schatz, Raimund; Raake, Alexander
More than I ever wanted or just good enough? : user expectations and subjective quality perception in the context of networked multimedia services. - In: Quality and user experience, ISSN 2366-0147, Volume 2 (2017), issue 1, article 3, Seite 1-27

https://doi.org/10.1007/s41233-016-0004-z
Lebreton, Pierre; Raake, Alexander; Barkowsky, Marcus
Evaluation of aesthetic appeal with regard of user's knowledge. - In: Electronic imaging, ISSN 2470-1173, Bd. 28 (2016), 16, art00015, S. HVEI-119.1-HVEI-119.6

https://doi.org/10.2352/ISSN.2470-1173.2016.16.HVEI-119
Quintero, Miguel Rios; Raake, Alexander
Are lab-based audiovisual quality tests reflecting what users experience at home?. - In: Electronic imaging, ISSN 2470-1173, Bd. 28 (2016), 16, art00018, S. HVEI-123.1-HVEI-123.6

https://doi.org/10.2352/ISSN.2470-1173.2016.16.HVEI-123
Robitza, Werner; Schönfellner, Sabine; Raake, Alexander
A theoretical approach to the formation of quality of experience and user behavior in multimedia services. - In: 5th ISCA/DEGA Workshop on Perceptual Quality of Systems (PQS 2016), (2016), S. 39-43

https://doi.org/10.21437/PQS.2016-9
Soloducha, Michal; Raake, Alexander; Kettler, Frank; Rohrer, Nils; Parotat, Eva; Wältermann, Marcel; Trevisany, Sven; Voigt, Peter
Towards VoIP quality testing with real-life devices and degradations. - In: Speech communication, (2016), S. 205-209

http://ieeexplore.ieee.org/document/7776176/
Raake, Alexander; Wierstorf, Hagen
Assessment of audio quality and experience using binaural-hearing models. - In: 22nd International Congress on Acoustics (ICA 2016), (2016), S. 3131-3140

Raake, Alexander
Views on sound quality. - In: 22nd International Congress on Acoustics (ICA 2016), (2016), S. 1633-1642

Robitza, Werner; Kara, Péter A.; Martini, Maria G.; Raake, Alexander
On the experimental biases in user behavior and QoE assessment in the lab. - In: 2016 IEEE Globecom Workshops (GC Wkshps), ISBN 978-1-5090-2482-7, (2016), insges. 6 S.

https://doi.org/10.1109/GLOCOMW.2016.7848978
Cohrs, Thaden; Treybig, Lukas
Array microphones and signal processing within an ethernet-based AVB network. - In: Proceedings 2016 IEEE 6th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), ISBN 978-1-5090-2096-6, (2016), S. 145-148

http://dx.doi.org/10.1109/ICCE-Berlin.2016.7684741
Zhou, Yun; Xu, Tao; Raake, Alexander; Cai, Yanping
Access control is not enough: how owner and guest set limits to protect privacy when sharing smartphone. - In: HCI International 2016 Posters' Extended Abstracts, (2016), S. 494-499

http://dx.doi.org/10.1007/978-3-319-40548-3_82
Hold, Christoph; Wierstorf, Hagen; Raake, Alexander
The difference between stereophony and wave field synthesis in the context of popular music. - In: 140th Audio Engineering Society International Convention 2016, ISBN 978-1-5108-2570-3, (2016), S. 667-674

Lebreton, Pierre; Raake, Alexander; Barkowsky, Marcus
Studying user agreement on aesthetic appeal ratings and its relation with technical knowledge. - In: QoMEX 2016, ISBN 978-1-5090-0354-9, (2016), insges. 6 S.

http://dx.doi.org/10.1109/QoMEX.2016.7498934
Soloducha, Michal; Raake, Alexander; Kettler, Frank; Voigt, Peter
Lombard speech database for German language. - In: Fortschritte der Akustik, ISBN 978-3-939296-10-2, (2016), S. 992-995

Hold, Christoph; Wierstorf, Hagen; Raake, Alexander
Tonmischung für Stereophonie und Wellenfeldsynthese im Vergleich. - In: Fortschritte der Akustik, ISBN 978-3-939296-10-2, (2016), S. 1023-1026

Spur, Maxim; Guse, Dennis; Skowronek, Janto
Influence of packet loss and double-talk on the perceived quality of multi-party telephone conferencing with binaurally presented spatial audio reproduction. - In: Fortschritte der Akustik, ISBN 978-3-939296-10-2, (2016), S. 968-971

Skowronek, Janto; Raake, Alexander
Akustische Herausforderungen für interaktive Gruppenkommunikation. - In: Fortschritte der Akustik, ISBN 978-3-939296-10-2, (2016), S. 952-955

Walther, Thomas; Blauert, Jens; Raake, Alexander
System zur Simulation von kognitivem Feedback im Kontext auditiver Szenenanalyse und auditiver Qualitätsbeurteilung. - In: Fortschritte der Akustik, ISBN 978-3-939296-10-2, (2016), S. 22-25

Liu, Mohan; Müller, Karsten; Raake, Alexander
Efficient no-reference metric for sharpness mismatch artifact between stereoscopic views. - In: Journal of visual communication and image representation, ISSN 1047-3203, Bd. 39 (2016), S. 132-141

http://dx.doi.org/10.1016/j.jvcir.2016.05.010