Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from 2016 onwards. Publications up to 2015 can be found on a separate page.

Note: To search through all publications, select "Show All"; you can then use the browser's search function (Ctrl+F).

Results: 165
Created on: Thu, 02 May 2024 23:03:32 +0200 in 0.0843 sec


Göring, Steve; Merten, Rasmus; Raake, Alexander
DNN-based photography rule prediction using photo tags. - In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), (2023), S. 83-86

Instagram and Flickr are just two examples of photo-sharing platforms to which thousands of images are currently uploaded on a daily basis. One important aspect in such social media contexts is to know whether an image is of high appeal or not. In particular, to understand the composition of a photo and to improve reading flow, several photo rules have been established. In this paper, we focus on eight selected photo rules. To automatically predict whether an image follows one of these rules or not, we train 13 deep neural networks in a transfer-learning setup and compare their prediction performance. As a dataset, we use photos downloaded from Flickr with specifically selected image tags that reflect the eight photo rules; therefore, our dataset does not need additional human annotations. ResNet50 has the best prediction performance; however, there are images that follow several rules, which must be addressed in follow-up work. The code and the data (image URLs) are made publicly available for reproducibility.
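The tag-based weak-labeling idea described in the abstract can be illustrated with a short sketch. The rule names and tag sets below are illustrative assumptions, not the authors' actual mapping:

```python
# Hypothetical mapping from Flickr-style image tags to photography rules.
# Rule names and tags are illustrative assumptions, not the paper's list.
RULE_TAGS = {
    "rule_of_thirds": {"ruleofthirds", "thirds"},
    "symmetry": {"symmetry", "symmetric", "mirror"},
    "leading_lines": {"leadinglines", "vanishingpoint"},
}

def label_photo(tags):
    """Derive weak rule labels from an image's tag set (no human annotation)."""
    normalized = {t.lower().replace(" ", "") for t in tags}
    return sorted(rule for rule, keys in RULE_TAGS.items() if normalized & keys)

print(label_photo(["Symmetry", "beach"]))       # ['symmetry']
print(label_photo(["ruleofthirds", "mirror"]))  # ['rule_of_thirds', 'symmetry']
```

The second call also illustrates the issue raised in the abstract: a single image can match several rules at once.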



https://doi.org/10.1109/QoMEX58391.2023.10178505
Mossakowski, Till; Hedblom, Maria M.; Neuhaus, Fabian; Arévalo Arboleda, Stephanie; Raake, Alexander
Using the diagrammatic image schema language for joint human-machine cognition. - In: Engineering for a changing world, (2023), 5.1.133, S. 1-5

https://doi.org/10.22032/dbt.58917
Robotham, Thomas; Singla, Ashutosh; Raake, Alexander; Rummukainen, Olli S.; Habets, Emanuel A.P.
Influence of multi-modal interactive formats on subjective audio quality and exploration behavior. - In: IMX 2023, (2023), S. 115-128

This study uses a mixed between- and within-subjects test design to evaluate the influence of interactive formats on the quality of binaurally rendered 360° spatial audio content. Focusing on ecological validity using real-world recordings of 60 s duration, three independent groups of subjects were exposed to three formats: audio only (A), audio with 2D visuals (A2DV), and audio with head-mounted display (AHMD) visuals. Within each interactive format, two sessions were conducted to evaluate degraded audio conditions: bit-rate and Ambisonics order. Our results show a statistically significant effect (p < .05) of format on spatial audio quality ratings only for Ambisonics order. Exploration data analysis shows that format A yields little variability in exploration, while formats A2DV and AHMD yield a broader viewing distribution of the 360° content. The results imply that audio quality factors can be optimized depending on the interactive format.
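The "variability in exploration" mentioned in the abstract can be quantified, for instance, via circular statistics on head-yaw samples. The function below is a minimal illustration of this idea, not the study's actual analysis pipeline:

```python
import math

def circular_spread(yaw_degrees):
    """Circular variance of head-yaw samples.

    Returns 0 when all samples point at one heading and approaches 1 when
    headings are spread over the full circle. Illustrative sketch for
    quantifying how broadly a viewer explored 360° content.
    """
    n = len(yaw_degrees)
    c = sum(math.cos(math.radians(y)) for y in yaw_degrees) / n
    s = sum(math.sin(math.radians(y)) for y in yaw_degrees) / n
    return 1.0 - math.hypot(c, s)

focused = [0, 5, -3, 2]      # viewer mostly looking straight ahead
roaming = [0, 90, 180, 270]  # viewer sweeping the full circle
print(circular_spread(focused) < circular_spread(roaming))  # True
```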



https://doi.org/10.1145/3573381.3596155
Raake, Alexander; Broll, Wolfgang; Chuang, Lewis L.; Domahidi, Emese; Wendemuth, Andreas
Cross-timescale experience evaluation framework for productive teaming. - In: Engineering for a changing world, (2023), 5.4.129, S. 1-6

This paper presents the initial concept for an evaluation framework to systematically evaluate productive teaming (PT). We consider PT as adaptive human-machine interactions between human users and augmented technical production systems. Human-to-human communication as part of a hybrid team with multiple human actors is also considered, as well as human-human and human-machine communication for remote and mixed remote and co-located teams. The evaluation comprises objective, performance-related success indicators, behavioral metadata, and measures of human experience. In particular, it considers affective, attentional, and intentional states of human team members and their influence on interaction dynamics in the team, and investigates appropriate strategies to satisfactorily adjust dysfunctional dynamics, using concepts of companion technology. The timescales under consideration span from seconds to several minutes, with selected studies targeting hour-long interactions and longer-term effects such as effort and fatigue. Two example PT scenarios will be discussed in more detail. To enable generalization and a systematic evaluation, the scenarios' use cases will be decomposed into more general modules of interaction.



https://doi.org/10.22032/dbt.58930
Melnyk, Sergiy; Zhou, Qiuheng; Schotten, Hans D.; Rüther-Kindel, Wolfgang; Quaeck, Fabian; Stuckert, Nick; Vilter, Robert; Gebauer, Lisa; Galkow-Schneider, Mandy; Friese, Ingo; Drüsedow, Steffen; Pfandzelter, Tobias; Malekabbasi, Mohammadreza; Bermbach, David; Bassbouss, Louay; Zoubarev, Alexander; Neparidze, Andy; Kritzner, Arndt; Hartbrich, Jakob; Raake, Alexander; Zschau, Enrico; Schwahn, Klaus-Jürgen
6G NeXt - joint communication and compute mobile network: use cases and architecture. - In: Kommunikation in der Automation, (2023), 6, insges. 10 S.

Research on new-generation mobile networks is currently in the phase of defining the key technologies that will make 6G successful. The research project 6G NeXt aims to provide a tight integration between the communication network, consisting of the radio access network as well as the backbone network, and processing facilities. Through the concept of split computing, the processing facilities are distributed over the entire backbone network, from the centralised cloud to the edge cloud at a base station. Based on two demanding use cases, Smart Drones and Holographic Communication, we investigate a joint communication and compute architecture that will make the applications of tomorrow become reality.



https://opendata.uni-halle.de//handle/1981185920/113595
Chao, Fang-Yi; Battisti, Federica; Lebreton, Pierre; Raake, Alexander
Omnidirectional video saliency. - In: Immersive video technologies, (2023), S. 123-158

When exploring the visual world, humans (and other species) are faced with more information than they are able to process. To overcome this, selective visual attention allows them to gaze rapidly towards objects of interest in the visual environment. This function of the visual system is of high importance and has received a lot of attention from the scientific community. Understanding how selective attention works finds applications in various fields, such as design, advertising, and perceptual coding and streaming, which makes it a highly regarded research topic. In recent years, with the advent of omnidirectional images and videos (ODI, ODV), new challenges have been raised compared to traditional 2D visual content. Indeed, with this type of OD content, users are not restricted to exploring what is shown to them in a static viewport, but are able to freely explore the entire visual world around them by moving their head and, in some cases, even their entire body. Therefore, this chapter introduces the work on the analysis of user behavior, as well as the effort made towards modeling selective visual attention and visual exploration in OD images and videos. It provides information on the different approaches that have been taken and the key challenges that have been raised compared to traditional 2D content, allowing the reader to grasp the key work that has been done on understanding and modeling the exploration of OD content.



https://doi.org/10.1016/B978-0-32-391755-1.00011-0
Croci, Simone; Singla, Ashutosh; Fremerey, Stephan; Raake, Alexander; Smolić, Aljoscha
Subjective and objective quality assessment for omnidirectional video. - In: Immersive video technologies, (2023), S. 85-122

Video quality assessment is generally important for assessing any kind of immersive media technology, for example, to evaluate captured content, algorithms for encoding and projection, and systems, as well as for technology optimization. This chapter provides an overview of the two types of video quality assessment: subjective testing with human viewers, and quality prediction or estimation using video quality metrics or models. First, viewing tests with humans as the gold standard for video quality are reviewed in light of their instantiation for omnidirectional video (ODV). In the second part of the chapter, the less time-consuming and better scalable second type of assessment with objective video quality metrics and models is discussed, considering the specific requirements of ODV; such metrics and models often incorporate computational models of human perception and content properties. Compared to standard 2D video, ODV introduces the challenge of interactivity and typically suffers from spherical projection distortions due to its omnidirectional, "point-of-view" (in terms of camera-shot type) nature. Accordingly, subjective tests for ODV include specific considerations of the omnidirectional nature of the presented content and dedicated head-rotation or even additional eye-tracking data capture. In the last part of the chapter, it is shown how objective video quality prediction can be improved by taking into account user behavior and projection distortions.
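One established way for objective metrics to account for spherical projection distortions is latitude weighting, as used in WS-PSNR for equirectangular frames: each pixel row is weighted by the cosine of its latitude, so over-represented polar pixels count less. The minimal sketch below assumes 8-bit grayscale frames stored as nested lists; production implementations operate on arrays:

```python
import math

def ws_psnr(ref, dist, max_val=255.0):
    """WS-PSNR for an equirectangular grayscale frame (nested row lists).

    Weighted MSE with per-row weights cos((j + 0.5 - H/2) * pi / H),
    then the usual 10*log10(MAX^2 / WMSE). Minimal illustrative sketch.
    """
    h = len(ref)
    num = den = 0.0
    for j, (r_row, d_row) in enumerate(zip(ref, dist)):
        w = math.cos((j + 0.5 - h / 2) * math.pi / h)
        for r, d in zip(r_row, d_row):
            num += w * (r - d) ** 2
            den += w
    wmse = num / den
    if wmse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / wmse)
```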



https://doi.org/10.1016/B978-0-32-391755-1.00010-9
Stoll, Eckhard; Breide, Stephan; Göring, Steve; Raake, Alexander
Automatic camera selection, shot size, and video editing in theater multi-camera recordings. - In: IEEE access, ISSN 2169-3536, Bd. 11 (2023), S. 96673-96692

In a non-professional environment, multi-camera recordings of theater performances or other stage shows are difficult to realize, because amateurs are usually untrained in camera work and in using a vision mixing desk that mixes multiple cameras. This can be remedied by a production process with high-resolution cameras in which image sections from long shots or medium-long shots are manually or automatically cropped in post-production. For this purpose, Gandhi et al. presented a single-camera system (referred to as the Gandhi Recording System in this paper) that obtains close-ups from a high-resolution recording taken from the central perspective. The system proposed in this paper, referred to as the "Proposed Recording System", extends the method to four perspectives, based on a Reference Recording System derived from professional TV theater recordings from the Ohnsorg Theater. Rules for camera selection, image cropping, and montage are derived from the Reference Recording System. For this purpose, body- and pose-recognition software is used, and the stage action is reconstructed from the recordings into the stage set. Speakers are recognized by detecting lip movements, and speaker changes are identified using audio diarization software. The Proposed Recording System is practically instantiated on a school theater recording made by laymen using four 4K cameras. An automatic editing script is generated that outputs a montage of a scene. The principles can also be adapted to other recording situations with an audience, such as lectures, interviews, discussions, talk shows, gala events, award ceremonies, and the like. In an online study, more than 70% of test persons confirmed the added value of the perspective diversity of the four cameras of the Proposed Recording System compared to the single-camera method of Gandhi et al.



https://doi.org/10.1109/ACCESS.2023.3311256
Immohr, Felix; Rendle, Gareth; Neidhardt, Annika; Göring, Steve; Ramachandra Rao, Rakesh Rao; Arévalo Arboleda, Stephanie; Froehlich, Bernd; Raake, Alexander
Proof-of-concept study to evaluate the impact of spatial audio on social presence and user behavior in multi-modal VR communication. - In: IMX 2023, (2023), S. 209-215

This paper presents a proof-of-concept study conducted to analyze the effect of simple diotic vs. spatial, position-dynamic binaural synthesis on social presence in VR, in comparison with face-to-face communication in the real world, for a sample two-party scenario. A conversational task with a shared visual reference was realized. The collected data include questionnaires for direct assessment, tracking data, and audio and video recordings of the individual participants' sessions for indirect evaluation. While tendencies for improvements with binaural over diotic presentation can be observed, no significant difference in social presence was found for the considered scenario. The gestural analysis revealed that participants used the same number and types of gestures face-to-face as in VR, highlighting the importance of non-verbal behavior in communication. As part of the research, an end-to-end framework for conducting communication studies and analyses has been developed.



https://doi.org/10.1145/3573381.3596458
Ramachandra Rao, Rakesh Rao
Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming. - Ilmenau : Universitätsbibliothek, 2023. - 1 Online-Ressource (viii, 252 Seiten)
Technische Universität Ilmenau, Dissertation 2023

The spread of affordable video capture technology and improved internet bandwidths enable the streaming of high-quality videos (resolutions > 1080p, frame rates ≥ 60 fps) online. HTTP-based adaptive streaming is the preferred method for streaming videos, in which video parameters are adapted to the available bandwidth, which in turn affects video quality. Adaptive streaming reduces video playback interruptions caused by low network bandwidth, but it affects the perceived quality, which is why a systematic assessment of that quality is necessary. This assessment is usually performed for short segments of a few seconds and for whole sessions (up to several minutes). This thesis investigates both aspects using perceptual and instrumental methods. The perceptual assessment of short-term video quality comprises a series of laboratory tests that have been published as freely available datasets. The quality of longer sessions was evaluated in laboratory tests with human viewers that simulate real-world viewing scenarios. The methodology was additionally investigated outside the laboratory for the assessment of short-term video quality and overall quality, in order to explore alternative approaches to perceptual quality assessment. The instrumental quality evaluation was conducted using bitstream-based and hybrid pixel-based video quality models developed in the course of this work. To this end, the AVQBits model series was developed, based on the laboratory test results. Four different AVQBits model variants with different input information were created: Mode 3, Mode 1, Mode 0, and Hybrid Mode 0. The model variants were investigated and perform better than or on par with other state-of-the-art models. These models were also applied to 360° and gaming videos, HFR content, and images.
Furthermore, a long-term integration model (1-5 minutes) based on the ITU-T P.1203.3 model is presented, which uses the different AVQBits variants with per-second quality scores as the video quality component of the proposed long-term integration model. All AVQBits variants, the long-term integration module, and the perceptual test data have been made freely available to enable further research.
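The long-term integration idea in the dissertation abstract above, pooling per-second quality scores into one session-level score, can be sketched with a simple recency-weighted average. This is purely illustrative and is NOT the ITU-T P.1203.3 algorithm, which additionally accounts for effects such as stalling and quality oscillations:

```python
def pool_session_quality(per_second_scores, recency_half_life=60.0):
    """Pool per-second quality scores (e.g. on a 1-5 scale) into one score.

    Exponential recency weighting: later seconds count more, mimicking the
    memory effect in long sessions. Illustrative only; NOT the ITU-T
    P.1203.3 integration algorithm.
    """
    n = len(per_second_scores)
    weights = [0.5 ** ((n - 1 - t) / recency_half_life) for t in range(n)]
    return sum(w * q for w, q in zip(weights, per_second_scores)) / sum(weights)

steady = [4.0] * 120                     # constant quality for 2 minutes
late_drop = [4.0] * 110 + [2.0] * 10     # quality drop in the last seconds
print(pool_session_quality(late_drop) < pool_session_quality(steady))  # True
```

With the recency weighting, the drop at the end of the session pulls the pooled score down more than an equally long drop at the start would.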



https://doi.org/10.22032/dbt.57583