Influence of multi-modal interactive formats on subjective audio quality and exploration behavior. - In: IMX 2023, (2023), pp. 115-128
This study uses a mixed between- and within-subjects test design to evaluate the influence of interactive formats on the quality of binaurally rendered 360° spatial audio content. With a focus on ecological validity, real-world recordings of 60 s duration were used, and three independent groups of subjects were exposed to three formats: audio only (A), audio with 2D visuals (A2DV), and audio with head-mounted display (AHMD) visuals. Within each interactive format, two sessions were conducted to evaluate degraded audio conditions: bit-rate and Ambisonics order. Our results show a statistically significant effect (p < .05) of format on spatial audio quality ratings only for the Ambisonics-order conditions. Exploration data analysis shows that format A yields little variability in exploration, while formats A2DV and AHMD yield a broader viewing distribution across the 360° content. The results imply that audio quality factors can be optimized depending on the interactive format.
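The abstract reports that format A yields little variability in exploration but does not name the measure used. As a minimal sketch, assuming head-yaw samples in degrees per participant, circular variance is one standard way to quantify how widely a viewer explores a 360° scene; unlike a plain standard deviation, it treats 350° and 10° as neighbors. The function name `circular_variance` and the example traces are hypothetical.

```python
import math
from typing import Sequence


def circular_variance(yaw_deg: Sequence[float]) -> float:
    """Circular variance (1 - R) of head-yaw samples given in degrees.

    R is the mean resultant length of the unit vectors for each yaw angle.
    0.0 means every sample points the same way; values near 1.0 mean the
    viewing directions are spread around the whole 360-degree circle.
    Assumption: an illustrative measure, not necessarily the study's metric.
    """
    if not yaw_deg:
        raise ValueError("need at least one yaw sample")
    n = len(yaw_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in yaw_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in yaw_deg) / n
    return 1.0 - math.hypot(mean_cos, mean_sin)


# Illustrative (made-up) yaw traces, not data from the study:
narrow = [0, 5, -5, 2, -3]           # viewer keeps looking ahead
wide = [0, 60, 120, 180, 240, 300]   # viewer looks all around
print(circular_variance(narrow), circular_variance(wide))
```

Working on the unit circle avoids the wrap-around problem at ±180°, which would otherwise inflate the apparent spread of viewers who look straight behind them.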
Cross-timescale experience evaluation framework for productive teaming. - In: Engineering for a Changing World, (2023), 5.4.129, 6 pp.
Colloquium: 60th ISC, Ilmenau Scientific Colloquium, Ilmenau, 4-8 September 2023
This paper presents the initial concept for an evaluation framework to systematically evaluate productive teaming (PT). We consider PT as adaptive human-machine interaction between human users and augmented technical production systems. Human-to-human communication within a hybrid team of multiple human actors is also considered, as are human-human and human-machine communication in remote teams and in mixed remote and co-located teams. The evaluation comprises objective, performance-related success indicators, behavioral metadata, and measures of human experience. In particular, it considers the affective, attentional, and intentional states of human team members and their influence on interaction dynamics in the team, and it investigates appropriate strategies, based on concepts of companion technology, for satisfactorily adjusting dysfunctional dynamics. The timescales under consideration span from seconds to several minutes, with selected studies targeting hour-long interactions and longer-term effects such as effort and fatigue. Two example PT scenarios are discussed in more detail. To enable generalization and systematic evaluation, the use cases of the scenarios are decomposed into more general interaction modules.
Omnidirectional video saliency. - In: Immersive video technologies, (2023), pp. 123-158
When exploring the visual world, humans (and other species) are faced with more information than they are able to process. To overcome this, selective visual attention allows them to rapidly direct their gaze towards objects of interest in the visual environment. This function of the visual system is of high importance and has received a lot of attention from the scientific community. Understanding how selective attention works has applications in various fields, such as design, advertising, and perceptual coding and streaming, which makes it a highly regarded research topic. In recent years, the advent of omnidirectional images and videos (ODI, ODV) has raised new challenges compared to traditional 2D visual content. With OD content, users are no longer restricted to what is shown to them in a static viewport but can freely explore the entire visual world around them by moving their head and, in some cases, their entire body. This chapter therefore introduces work on the analysis of user behavior, as well as efforts towards modeling selective visual attention and visual exploration in OD images and videos. It describes the different approaches that have been taken and the key challenges that arise compared to traditional 2D content, allowing the reader to grasp the key work done on understanding and modeling the exploration of OD content.
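A common first step in analyzing user behavior for OD content is to turn recorded viewing directions into a fixation (or head-orientation) map over the equirectangular frame. The sketch below assumes yaw/pitch samples in degrees and a hypothetical mapping convention (yaw left to right, pitch bottom to top); it is not a method taken from the chapter, and the helper `fixation_map` is illustrative only.

```python
import numpy as np


def fixation_map(yaw_deg, pitch_deg, width=64, height=32):
    """Accumulate viewing directions into a normalized equirectangular histogram.

    Assumed (hypothetical) convention: yaw in [-180, 180) maps left to right,
    pitch in [-90, 90] maps bottom to top (pitch +90 is the top row).
    """
    yaw = np.asarray(yaw_deg, dtype=float)
    pitch = np.asarray(pitch_deg, dtype=float)
    # Wrap yaw into [-180, 180) so that e.g. 190 and -170 land in the same column.
    yaw = (yaw + 180.0) % 360.0 - 180.0
    cols = np.floor((yaw + 180.0) / 360.0 * width).astype(int) % width
    rows = np.floor((90.0 - pitch) / 180.0 * height).astype(int)
    rows = np.clip(rows, 0, height - 1)  # pitch = -90 would otherwise overflow
    counts = np.zeros((height, width), dtype=float)
    np.add.at(counts, (rows, cols), 1.0)  # unbuffered add handles repeated cells
    total = counts.sum()
    return counts / total if total > 0 else counts


m = fixation_map([0, 0, 190, -170], [0, 0, 10, 10])
print(m.sum())
```

In practice such a raw histogram is usually smoothed, ideally with a kernel defined on the sphere rather than in the equirectangular plane, before it is used as saliency ground truth.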
Subjective and objective quality assessment for omnidirectional video. - In: Immersive video technologies, (2023), pp. 85-122
Video quality assessment is important for evaluating any kind of immersive media technology, for example, captured content, algorithms for encoding and projection, and complete systems, as well as for technology optimization. This chapter provides an overview of the two types of video quality assessment: subjective testing with human viewers, and quality prediction or estimation using video quality metrics or models. First, viewing tests with humans, the gold standard for video quality, are reviewed in light of their instantiation for omnidirectional video (ODV). The second part of the chapter discusses the less time-consuming and more scalable second type of assessment, objective video quality metrics and models, considering the specific requirements of ODV. Such metrics and models often incorporate computational models of human perception and of content properties. Compared to standard 2D video, ODV introduces the challenge of interactivity and, typically, spherical projection distortions due to its omnidirectional, “point-of-view” (in terms of camera-shot type) nature. Accordingly, subjective tests for ODV include specific considerations of the omnidirectional nature of the presented content and dedicated capture of head-rotation or even additional eye-tracking data. The last part of the chapter shows how objective video quality prediction can be improved by taking user behavior and projection distortions into account.
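One widely used way to account for projection distortions in objective ODV metrics is WS-PSNR (weighted-to-spherically-uniform PSNR), which weights each row of an equirectangular (ERP) frame by the cosine of its latitude so that the oversampled polar regions count less. The chapter abstract does not say which metrics it covers; this is a minimal single-channel (luma) sketch under that assumption, not the chapter's own model.

```python
import numpy as np


def ws_psnr(ref, dist, max_val=255.0):
    """WS-PSNR between two equally sized 2-D ERP frames (one channel)."""
    ref = np.asarray(ref, dtype=float)
    dist = np.asarray(dist, dtype=float)
    if ref.shape != dist.shape or ref.ndim != 2:
        raise ValueError("expected two equally sized 2-D ERP frames")
    height, width = ref.shape
    rows = np.arange(height)
    # Row weight = cosine of the latitude at the row center; it shrinks toward the poles.
    w_row = np.cos((rows + 0.5 - height / 2.0) * np.pi / height)
    weights = np.repeat(w_row[:, None], width, axis=1)
    ws_mse = np.sum(weights * (ref - dist) ** 2) / np.sum(weights)
    if ws_mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_val ** 2 / ws_mse))
```

The same error magnitude therefore lowers WS-PSNR less when it occurs near a pole than at the equator, which mirrors how little of the sphere the polar rows of an ERP frame actually cover.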
Automatic camera selection, shot size, and video editing in theater multi-camera recordings. - In: IEEE access, ISSN 2169-3536, Vol. 11 (2023), pp. 96673-96692
In a non-professional environment, multi-camera recordings of theater performances or other stage shows are difficult to realize, because amateurs are usually untrained in camera work and in operating a vision mixer that switches between multiple cameras. This can be remedied by a production process with high-resolution cameras in which image sections of long shots or medium-long shots are cropped manually or automatically in post-production. For this purpose, Gandhi et al. presented a single-camera system (referred to in the paper as the Gandhi Recording System) that obtains close-ups from a high-resolution recording taken from the central perspective. The system proposed in this paper, referred to as the “Proposed Recording System”, extends the method to four perspectives, based on a Reference Recording System of professional TV theater recordings from the Ohnsorg Theater. Rules for camera selection, image cropping, and montage are derived from the Reference Recording System. To this end, body and pose recognition software is used, and the stage action is reconstructed from the recordings into the stage set. Speakers are recognized by detecting lip movements, and speaker changes are identified using audio diarization software. The Proposed Recording System is instantiated in practice on a school theater recording made by laypeople with four 4K cameras. An automatic editing script is generated that outputs a montage of a scene. The principles can also be adapted to other recording situations with an audience, such as lectures, interviews, discussions, talk shows, gala events, and award ceremonies. In an online study, more than 70 % of participants confirmed the added value of the perspective diversity of the four-camera Proposed Recording System compared with the single-camera method of Gandhi et al.
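The abstract states that cropping rules are derived from the Reference Recording System but does not spell them out. As a purely hypothetical illustration, a single framing rule might cut a fixed-aspect close-up out of a 4K long shot, given a person bounding box from the body and pose recognition step. The `Box` type, the `crop_window` helper, the `fill` parameter, and its default of 0.8 are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (x, y = top-left corner)."""
    x: float
    y: float
    w: float
    h: float


def crop_window(person: Box, frame_w: int = 3840, frame_h: int = 2160,
                fill: float = 0.8, aspect: float = 16 / 9) -> Box:
    """Hypothetical framing rule: a crop of the given aspect ratio, centered on
    the person, in which the person box fills `fill` of the crop height.
    The crop is clamped so that it never leaves the 4K frame."""
    if not 0 < fill <= 1:
        raise ValueError("fill must be in (0, 1]")
    # Largest crop height that still fits the frame at this aspect ratio.
    max_h = min(frame_h, frame_w / aspect)
    crop_h = min(person.h / fill, max_h)
    crop_w = crop_h * aspect
    cx = person.x + person.w / 2
    cy = person.y + person.h / 2
    # Center on the person, then shift back inside the frame if needed.
    left = min(max(cx - crop_w / 2, 0.0), frame_w - crop_w)
    top = min(max(cy - crop_h / 2, 0.0), frame_h - crop_h)
    return Box(left, top, crop_w, crop_h)


close_up = crop_window(Box(1800, 900, 200, 400))
print(close_up)
```

Clamping rather than letter-boxing keeps every output frame a true sub-image of the recording, at the cost of the person drifting off-center near the frame edges.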
Proof-of-concept study to evaluate the impact of spatial audio on social presence and user behavior in multi-modal VR communication. - In: IMX 2023, (2023), pp. 209-215
This paper presents a proof-of-concept study conducted to analyze the effect of simple diotic vs. spatial, position-dynamic binaural synthesis on social presence in VR, in comparison with face-to-face communication in the real world, for a sample two-party scenario. A conversational task with a shared visual reference was realized. The collected data include questionnaires for direct assessment, tracking data, and audio and video recordings of the individual participants’ sessions for indirect evaluation. While tendencies towards improvement with binaural over diotic presentation can be observed, no significant difference in social presence was found for the considered scenario. The gesture analysis revealed that participants used the same number and types of gestures face-to-face as in VR, highlighting the importance of non-verbal behavior in communication. As part of the research, an end-to-end framework for conducting and analyzing communication studies has been developed.
Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming. - Ilmenau: Universitätsbibliothek, 2023. - 1 online resource (viii, 252 pages)
Technische Universität Ilmenau, doctoral dissertation, 2023
The spread of affordable video capture technology and improved internet bandwidths enable online streaming of high-quality videos (resolutions > 1080p, frame rates ≥ 60 fps). HTTP-based adaptive streaming is the preferred method for streaming videos; it adapts video parameters to the available bandwidth, which affects video quality. Adaptive streaming reduces playback interruptions caused by low network bandwidth but affects perceived quality, which is why a systematic assessment of that quality is necessary. This assessment is usually performed for short segments of a few seconds and over an entire session (up to several minutes). This thesis investigates both aspects using perceptual and instrumental methods. The perceptual assessment of short-term video quality comprises a series of lab tests whose results have been published as freely available datasets. The quality of longer sessions was assessed in lab tests with human viewers that simulate real-world viewing scenarios. In addition, the methodology was investigated outside the lab for assessing short-term video quality and overall quality, in order to explore alternative approaches to perceptual quality assessment. The instrumental quality evaluation was carried out with bitstream-based and hybrid pixel-based video quality models developed in the course of this thesis. To this end, the AVQBits model series was developed based on the lab test results. Four variants of AVQBits with different input information were created: Mode 3, Mode 1, Mode 0, and Hybrid Mode 0. The model variants were evaluated and perform better than or on par with other state-of-the-art models. These models were also applied to 360° and gaming videos, HFR content, and images.
Furthermore, a long-term integration model (1-5 minutes) based on the ITU-T P.1203.3 model is presented, which uses the different AVQBits variants, producing per-second quality scores, as the video quality component of the proposed long-term integration model. All AVQBits variants, the long-term integration module, and the perceptual test data have been made freely available to enable further research.
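The abstract does not give the integration formula, and ITU-T P.1203.3 is considerably more involved than anything shown here. The following is only a hypothetical sketch of the general idea of combining per-second quality scores into one session score, using an exponential recency weight; `pool_session` and its `recency` parameter are invented for illustration and are not part of P.1203.3 or AVQBits.

```python
import math
from typing import Sequence


def pool_session(per_second_mos: Sequence[float], recency: float = 0.02) -> float:
    """Hypothetical recency-weighted mean of per-second quality scores.

    NOT the ITU-T P.1203.3 integration: second t simply gets the weight
    exp(recency * t), so later seconds count slightly more.
    recency = 0 reduces this to the plain arithmetic mean.
    """
    if not per_second_mos:
        raise ValueError("need at least one per-second score")
    weights = [math.exp(recency * t) for t in range(len(per_second_mos))]
    weighted = sum(w * q for w, q in zip(weights, per_second_mos))
    return weighted / sum(weights)


# Made-up 60 s session: quality improves halfway through.
print(pool_session([2.0] * 30 + [5.0] * 30))
```

A recency weight is one simple way to express that later parts of a session may influence the overall judgment more; a standardized model additionally has to account for effects such as stalling, which this sketch ignores.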
Image appeal revisited: analysis, new dataset, and prediction models. - In: IEEE access, ISSN 2169-3536, Vol. 11 (2023), pp. 69563-69585
More and more photographic images are uploaded to social media platforms such as Instagram, Flickr, or Facebook every day. At the same time, attention to and consumption of such images are high, with image views and likes being among the success factors for users and driving forces of social media algorithms. Here, “liking” can be assumed to be driven by image appeal and by further factors, such as who posts the images and what they may show and reveal about the posting person. It is therefore of high research interest to evaluate the appeal of such images in the context of social media platforms. Such an appeal evaluation may help to improve image quality or could be used as an additional filter criterion to select good images. To analyze image appeal, various datasets have been established over the past years. However, not all datasets contain high-resolution images, are up to date, or include additional data, such as metadata or social-media-type data like likes and views. We therefore created our own dataset, “AVT-ImageAppeal-Dataset”, which includes images from different photo-sharing platforms. The dataset also includes a subset of other state-of-the-art datasets and is extended with social-media-type data, metadata, and additional images. In this paper, we describe the dataset and a series of laboratory and crowd tests we conducted to evaluate image appeal. These tests indicate that showing likes and views alongside the images has only a small influence compared to not showing them, and that appeal ratings are only weakly correlated with likes and views. Furthermore, lab and crowd tests are shown to yield highly similar appeal ratings. In addition to the dataset, we describe various machine learning models for predicting image appeal using only the photo itself as input. The models perform similarly to or slightly better than state-of-the-art models.
The evaluation indicates that there is still room for improvement in image appeal prediction and that other aspects, such as the presentation context, could be evaluated.
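Statements such as “appeal ratings are only weakly correlated with likes and views” and “lab and crowd tests are highly similar” typically rest on correlation coefficients. The paper's exact statistics are not given in the abstract; as a minimal, dependency-free sketch, assuming per-image mean ratings and like counts as plain lists, Pearson's r measures linear agreement, while Spearman's rank correlation is often preferred for heavily skewed counts such as likes. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values only: a monotonic but strongly skewed relation.
print(pearson([1, 2, 3, 4], [1, 10, 100, 1000]), spearman([1, 2, 3, 4], [1, 10, 100, 1000]))
```

For the toy pair above, Spearman reports a perfect monotonic relation while Pearson is noticeably lower, which is why rank-based correlation is a common companion when likes or views are involved.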
6G NeXt - toward 6G split computing network applications: use cases and architecture. - In: Mobilkommunikation, (2023), pp. 126-131
Quality assessment of higher resolution images and videos with remote testing. - In: Quality and user experience, ISSN 2366-0147, Vol. 8 (2023), 1, 2, pp. 1-26
In many research fields, human-annotated data plays an important role, as it is used to accomplish a multitude of tasks. One example is multimedia quality assessment, where subjective annotations can be used to train or evaluate quality prediction models. Lab-based tests are one approach to obtaining such quality annotations. They are usually performed in well-defined and controlled environments to ensure high reliability. However, this high reliability comes at the price of more time and higher costs. To mitigate this, crowd or online tests can be used. Online tests usually cover a wider range of end devices, environmental conditions, and participants, which may have an impact on the ratings. To verify whether such online tests can be used for visual quality assessment, we designed three online tests. These online tests are based on previously conducted lab tests, which enables a comparison of the results of both test paradigms. Our focus is on the quality assessment of high-resolution images and videos. The online tests use AVrate Voyager, a publicly accessible framework for online tests. Transforming the lab tests into online tests requires dedicated adaptations of the test methodologies. The considered modifications include, for example, patch-based or center cropping of the images and videos, and random sub-sampling of the stimuli to be rated. Based on an analysis of the test results in terms of correlation and SOS (standard deviation of opinion scores), it is shown that online tests can be used as a reliable replacement for lab tests, albeit with some limitations. These limitations relate to, e.g., the lack of appropriate display devices, limitations of web technologies, and the support of modern browsers for different video codecs and formats.
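The abstract mentions an SOS analysis without giving its form. A common formulation is the SOS hypothesis of Hoßfeld et al., which links the standard deviation of opinion scores to the MOS x through a single parameter a: SOS(x)^2 = a * (-x^2 + (V1 + Vk) * x - V1 * Vk), where V1 and Vk are the lowest and highest scale values. Comparing a between a lab test and its online counterpart is one way to quantify differences in rating diversity. The sketch below assumes a 5-point scale and fits a by closed-form least squares on squared SOS values; `sos_parameter` is a hypothetical helper, not the paper's analysis code.

```python
from typing import Sequence


def sos_parameter(mos: Sequence[float], sos: Sequence[float],
                  low: float = 1.0, high: float = 5.0) -> float:
    """Least-squares estimate of the SOS parameter `a`.

    Model (SOS hypothesis): SOS(x)^2 = a * (-x^2 + (low + high) * x - low * high).
    Fitting on squared SOS values makes the model linear in `a`,
    so the estimate is sum(g * sos^2) / sum(g^2).
    """
    if len(mos) != len(sos) or not mos:
        raise ValueError("mos and sos must be equally long and non-empty")
    g = [-x * x + (low + high) * x - low * high for x in mos]
    num = sum(gi * s * s for gi, s in zip(g, sos))
    den = sum(gi * gi for gi in g)
    if den == 0:
        raise ValueError("all MOS values lie on the scale end points")
    return num / den


# Synthetic check: SOS values generated exactly from a = 0.25.
print(sos_parameter([2, 3, 4], [0.75 ** 0.5, 1.0, 0.75 ** 0.5]))
```

A larger fitted a for the online test than for the lab test would indicate more diverse ratings per stimulus, even if the MOS values of both paradigms correlate highly.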