Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from 2016 onward. Publications up to 2015 can be found on a separate page.

Mikhailova, Veronika; Kunert, Christian; Hartbrich, Jakob; Schwandt, Tobias; Gerhardt, Christoph; Raake, Alexander; Broll, Wolfgang; Döring, Nicola
Work-in-progress: older adults' experiences with an augmented reality communication system. - In: IMX 2024, (2024), S. 282-387

Given the profound impact of staying socially connected on the well-being of older adults, this study explores the potential of augmented reality (AR) systems to enrich their social lives. A wearable AR communication system prototype was developed and tested in a user study involving N = 16 older adults from Germany. Participants wore an AR headset and engaged in a conversation task with a remote person represented by an avatar. Older adults’ experiences were assessed using think-aloud protocols, qualitative observations, posttest questionnaires, and semi-structured oral interviews. Preliminary findings indicate overall participant satisfaction, with minimal observed difficulties in headset usage and avatar-mediated interpersonal communication. The positive engagement during AR conversations highlights the system’s potential to provide positive communication experiences among older individuals. This work-in-progress paper introduces the developed system prototype and outlines the conducted user study. Further data analyses will provide deeper insights into older adults’ experiences with the system. The results will contribute to refining the prototype and offer valuable insights for the development of AR communication systems tailored to the needs and preferences of older adults.
Stoll, Eckhard
Systematische Analyse von genrespezifischen Videoaufnahmen am Beispiel von Theateraufzeichnungen : Automatisierung von Mehrkameraaufnahmen. - Ilmenau : Universitätsbibliothek, 2024. - 1 online resource (227 pages)
Technische Universität Ilmenau, Dissertation 2024

This thesis presents how multi-camera recordings of theater performances in a non-professional environment can be simplified and their end result improved. A production method using four high-resolution cameras is proposed. The long shots can be recorded with fixed cameras and the medium long shots with only minimal camera tracking. From these recordings, image sections down to close-ups are selected ("cropped") automatically in post-production. To determine the parameters that characterize a good theater recording, a professional series of TV theater recordings is analyzed. A reconstruction of the TV shots within the overall stage setting is developed, so that the movements of persons and the distances between them can be captured. Because film language assigns shot sizes only to discrete categories, such as long shot or close-up, a quantitative scheme is developed for analysis and automation purposes; it divides the human body into 8 zones and can capture shot sizes numerically on a continuous scale. Scenes and shots can thus be analyzed in their entirety. Flowcharts for scenes with different numbers of people on stage are developed that can be used for later automation. The TV recordings are analyzed to determine when cuts occur at speaker changes. The individual cutting behavior of five different professional vision mixers is examined in more detail. It turns out that the cuts are placed before, during, or after the pauses in speech, differing from one vision mixer to another. A model and an algorithm are developed in which the individual characteristics of vision mixers can be set. The developed models and principles are tested in a multi-camera recording of a school theater performance with four 4K cameras. Amateurs operate the cameras that each record a medium long shot. The long shot is recorded by a fixed camera. The recordings are analyzed with the developed methods, and a script is generated from them automatically. It specifies the timecodes of the selected camera and the cropped image section, e.g. a close-up (target format 720p). Finally, the results of two subjective online studies that serve to validate the developed methodology are presented. The results indicate that test participants perceive the larger number of perspectives in the resulting overall edit of the production as added value.
Arévalo Arboleda, Stephanie; Kunert, Christian; Hartbrich, Jakob; Schneiderwind, Christian; Diao, Chenyao; Gerhardt, Christoph; Surdu, Tatiana; Weidner, Florian; Broll, Wolfgang; Werner, Stephan; Raake, Alexander
Beyond looks: a study on agent movement and audiovisual spatial coherence in augmented reality. - In: IEEE Xplore digital library, ISSN 2473-2001, (2024), S. 502-512

The appearance of virtual humans (avatars and agents) has been widely explored in immersive environments. However, virtual humans’ movements and associated sounds in real-world interactions, particularly in Augmented Reality (AR), are yet to be explored. In this paper, we investigate the influence of three distinct movement patterns (circle, side-to-side, and standing), two rendering styles (realistic and cartoon), and two types of audio (spatial audio and non-spatial audio) on emotional responses, social presence, appearance and behavior plausibility, audiovisual coherence, and auditory plausibility. To enable that, we conducted a study (N=36) where participants observed an agent reciting a short fictional story. Our results indicate an effect of the rendering style and the type of movement on the subjective perception of the agents behaving in an AR environment. Participants reported higher levels of excitement when they observed the realistic agent moving in a circle compared to the cartoon agent or the other two movement patterns. Moreover, we found an influence of agent’s movement pattern on social presence and higher appearance and behavior plausibility for the realistic rendering style. Regarding audiovisual spatial coherence, we found an influence of rendering style and type of audio only for the cartoon agent. Additionally, the spatial audio was perceived as more plausible than non-spatial audio. Our findings suggest that aligning realistic rendering styles with realistic auditory experiences may not be necessary for 1-1 listening experiences with moving sources. However, movement patterns of agents influence excitement and social presence in passive unidirectional communication scenarios.
Immohr, Felix; Rendle, Gareth; Lammert, Anton; Neidhardt, Annika; Meyer Zur Heyde, Victoria; Fröhlich, Bernd; Raake, Alexander
Evaluating the effect of binaural auralization on audiovisual plausibility and communication behavior in virtual reality. - In: IEEE Xplore digital library, ISSN 2473-2001, (2024), S. 849-858

Spatial audio representations have been shown to positively impact user experience in traditional, non-immersive communication media. While spatial audio also contributes to presence in single-user immersive VR, its impact in virtual communication scenarios has not yet been fully understood. This work aims to further investigate which communication scenarios benefit from spatial audio representations. We present a study in which pairs of interlocutors undertake a collaborative task in an audiovisual Virtual Environment (VE) under different auralization and scene arrangement conditions. The novel task is designed to encourage simultaneous conversation and movement, with the aim of increasing the relevance of spatial hearing. Results are obtained through questionnaires measuring social presence and plausibility, as well as through conversational and behavioral analysis. Although participants are shown to favor binaural auralization over diotic audio in a direct active-listening comparison, no significant differences in social presence, plausibility, or communication behavior could be found. Our results suggest that spatial audio may not affect user experience in dyadic communication scenarios where spatial auditory information is not directly relevant to the considered task.
Arévalo Arboleda, Stephanie; Conde, Melisa; Döring, Nicola; Raake, Alexander
Introducing personas and scenarios to highlight older adults' perspectives on robot-mediated communication. - In: HRI '24 companion, (2024), S. 209-213

Little is known about the expectations of older adults (60+ years old) in robot-mediated communication when leaving aside care-related activities. To bridge this gap, we carried out 30 semi-structured interviews with older adults to explore their experiences and expectations related to technology-mediated communication. We present the results of the collected data through personas that portray three archetype users: Conny Connected, Stephan Skeptical, and Thomas TechFan. These personas are presented in a specific communication scenario with individual goals that go beyond mere communication, such as the desire for closeness (Conny Connected), a problem-free experience (Stephan Skeptical), and exploring affordances of telepresence robots (Thomas TechFan). We also provide two considerations when aiming at positive experiences for older adults with robots: (1) balance generalizable aspects and individual needs, and (2) identify and challenge preconceptions of telepresence robots.
Döring, Nicola; Mikhailova, Veronika; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Digital media in intergenerational communication: status quo and future scenarios for the grandparent-grandchild relationship. - In: Universal access in the information society, ISSN 1615-5297, Bd. 23 (2024), 1, S. 379-394

Communication technologies play an important role in maintaining the grandparent-grandchild (GP-GC) relationship. Based on Media Richness Theory, this study investigates the frequency of use (RQ1) and perceived quality (RQ2) of established media as well as the potential use of selected innovative media (RQ3) in GP-GC relationships with a particular focus on digital media. A cross-sectional online survey and vignette experiment were conducted in February 2021 among N = 286 university students in Germany (mean age 23 years, 57% female) who reported on the direct and mediated communication with their grandparents. In addition to face-to-face interactions, non-digital and digital established media (such as telephone, texting, video conferencing) and innovative digital media, namely augmented reality (AR)-based and social robot-based communication technologies, were covered. Face-to-face and phone communication occurred most frequently in GP-GC relationships: 85% of participants reported them taking place at least a few times per year (RQ1). Non-digital established media were associated with higher perceived communication quality than digital established media (RQ2). Innovative digital media received less favorable quality evaluations than established media. Participants expressed doubts regarding the technology competence of their grandparents, but still met innovative media with high expectations regarding improved communication quality (RQ3). Richer media, such as video conferencing or AR, do not automatically lead to better perceived communication quality, while leaner media, such as letters or text messages, can provide rich communication experiences. More research is needed to fully understand and systematically improve the utility, usability, and joy of use of different digital communication technologies employed in GP-GC relationships.
Keller, Dominik; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
The effect of viewing distances on 4K and 8K HDR video quality perception. - In: IEEE Xplore digital library, ISSN 2473-2001, (2023), S. 123-130

Ongoing research in the field of capture, coding and display technology and human vision has explored the advantages of high resolution up to 8K (UHD-2) considering perceived quality. One of the crucial elements impacting users’ perception of video quality is the viewing distance. As a result, the presented study employs a subjective evaluation to investigate the perceptual benefits offered by 8K or upscaled 4K in comparison to the native 4K (UHD-1) resolution in the context of HDR videos. The subjective test uses 7 distinct viewing distances, ranging from 0.5H to 3H, with H representing the display height. The findings of the study reveal a consistent trend: the increased video quality of 8K HDR against 4K HDR content decreases with distance, on average. While there are bigger improvements for close distances, beyond 2H the quality difference was very little or zero, depending on content. In general, the degree of enhancement is contingent on the spatial complexity of the content. Additionally, it is found that, on average, subjects prefer to sit at a distance of 2.07H. No significant difference in the preferred viewing distance was found when asked before and after the study.
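As a rough geometric illustration of why the benefit of 8K shrinks with distance (a sketch that is not taken from the paper): a display of height H viewed from a distance of k·H subtends a vertical angle of 2·atan(1/(2k)), so the average number of pixel rows per degree grows as the viewer moves back. The function name and the sample distances below are assumptions chosen for illustration, and the figure of roughly 60 pixels per degree is only a common rule of thumb for normal visual acuity, not a threshold reported by the study.

```python
import math


def pixels_per_degree(rows: int, distance_in_heights: float) -> float:
    """Average vertical pixels per degree of visual angle.

    rows: vertical pixel count of the display (2160 for UHD-1, 4320 for UHD-2).
    distance_in_heights: viewing distance as a multiple of the display height H.
    """
    # A display of height H seen from distance k*H subtends 2*atan(1/(2k)).
    fov_deg = math.degrees(2 * math.atan(1 / (2 * distance_in_heights)))
    return rows / fov_deg


# Illustrative distances only; the exact seven distances used in the
# study are not listed in the abstract.
for k in (0.5, 1.0, 2.0, 3.0):
    uhd1 = pixels_per_degree(2160, k)
    uhd2 = pixels_per_degree(4320, k)
    print(f"{k:.1f}H: UHD-1 {uhd1:6.1f} ppd, UHD-2 {uhd2:6.1f} ppd")
```

Under this simple model, the 4K pixel density already exceeds the rule-of-thumb acuity figure at moderate distances, which is consistent in direction with the reported trend but does not replace the subjective results.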
Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Adaptation of bitstream-based video quality models for image quality assessment. - In: IEEE Xplore digital library, ISSN 2473-2001, (2023), S. 230-231

In recent years, video-codec-based image formats such as HEIF and AVIF have been increasingly used to compress images. Hence, there is a potential to use video quality prediction models for the evaluation of image quality. Bitstream-based models show promising results for video quality prediction; therefore, we investigate the applicability of such models to image quality in this paper. For this purpose, we selected ITU-T Rec. P.1204.3 and its Mode 0 variant, also known as AVQBits|M3 and AVQBits|M0 respectively, for the evaluation, because they are computationally less complex and do not need a reference image. These models are evaluated using a publicly available dataset consisting of a total of 371 images with resolutions from 144 × 144 pixels to 2160 × 2160 pixels and subjective annotations. The results show that both the considered models perform well on the used dataset with a Pearson correlation of 0.958 and Root Mean Square Error (RMSE) of 0.319 (on a 1 to 5 Absolute Category Rating (ACR) scale) for the AVQBits|M3 model and a Pearson correlation of 0.942 and RMSE of 0.377 for the AVQBits|M0 model.
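The two performance figures quoted in this abstract, Pearson correlation and RMSE, can be computed from paired model predictions and subjective mean opinion scores as in the minimal sketch below. The function names and the example scores are hypothetical and are not data from the paper; the authors' actual evaluation pipeline is not described in this listing.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted: list[float], subjective: list[float]) -> float:
    """Root mean square error between predictions and subjective scores."""
    n = len(predicted)
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(predicted, subjective)) / n)


# Hypothetical scores on a 1-to-5 ACR scale, for illustration only.
model_scores = [1.8, 2.9, 3.4, 4.1, 4.6]
mos_scores = [1.5, 3.0, 3.6, 4.0, 4.8]
print(f"Pearson: {pearson(model_scores, mos_scores):.3f}")
print(f"RMSE:    {rmse(model_scores, mos_scores):.3f}")
```

A high Pearson value shows that predictions follow the ranking and spacing of the subjective scores, while RMSE expresses the typical error in scale units, which is why the two are usually reported together.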
Fremerey, Stephan; Zaman, Raja Faseeh Uz; Ashraf, Touseef; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Towards evaluation of immersion, visual comfort and exploration behaviour for non-stereoscopic and stereoscopic 360° videos. - In: IEEE Xplore digital library, ISSN 2473-2001, (2023), S. 131-138

Immersion, visual comfort, and exploration behaviour are important aspects that affect the overall quality of experience for 360° videos. To analyze the benefits of stereoscopic and non-stereoscopic 360° videos in terms of these factors, we created a dataset and conducted a subjective study. The dataset consists of five different high-resolution 8K omnidirectional videos as stereoscopic and non-stereoscopic variants. The videos have been recorded using a Kandao Obsidian Pro camera. For the comparison, we designed and performed a subjective test with 30 participants. Here, each subject watched both HEVC (libx265) encoded versions of the source video and rated the videos viewed regarding presence, visual comfort, and quality. The results indicate that with the test protocol followed, non-stereoscopic video viewing leads to slightly better presence, visual comfort, and quality ratings compared to the stereoscopic variants. Further, the stereoscopic 360° videos may suffer from visual artefacts potentially leading to lower video quality and further lower quality of experience results. The exploration behaviour was found to be very similar for both non-stereoscopic and stereoscopic video viewing. Overall, it can be concluded that there is a slight tendency for non-stereoscopic video viewing to be preferred over stereoscopic video viewing. The dataset is made publicly available with the paper and includes both variants of all source videos along with the subjective data, and behaviour data, following an open-science approach.
Singla, Ashutosh; Wang, Shuang; Göring, Steve; Ramachandra Rao, Rakesh Rao; Viola, Irene; Cesar, Pablo; Raake, Alexander
Subjective quality evaluation of point clouds using remote testing. - In: IXR '23, (2023), S. 21-28

Subjective quality assessment serves as a method to evaluate the perceptual quality of 3D point clouds. These evaluations can be conducted using lab-based or remote or crowdsourcing tests. The lab-based tests are time-consuming and less cost-effective. As an alternative, remote or crowd tests can be used, offering a time and cost-friendly approach. Remote testing enables larger and more diverse participant pools. However, this raises the question of its applicability due to variability in participants' display devices and environments for the evaluation of the point cloud. In this paper, the focus is on investigating the applicability of remote testing by using the Absolute Category Rating (ACR) test method for assessing the subjective quality of point clouds in different tests. We compare the results of lab and remote tests by replicating lab-based tests. In the first test, we assess the subjective quality of a static point cloud geometry for two different types of geometrical degradations, namely Gaussian noise, and octree-pruning. In the second test, we compare the performance of two different compression methods (G-PCC and V-PCC) to assess the subjective quality of coloured point cloud videos. Based on the results obtained using correlation and Standard deviation of Opinion Scores (SOS) analysis, the remote testing paradigm can be used for evaluating point clouds.
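For readers unfamiliar with SOS analysis, one common formulation (the SOS hypothesis for a 5-point scale) models the squared standard deviation of opinion scores as a·(−MOS² + 6·MOS − 5) and compares the fitted parameter a between test conditions. The sketch below is a least-squares fit under that assumption; whether the paper uses exactly this formulation is not stated in the abstract, and the function name and input values are hypothetical.

```python
def fit_sos_parameter(mos: list[float], sos: list[float]) -> float:
    """Least-squares fit of a in SOS^2 = a * (-MOS^2 + 6*MOS - 5).

    mos: mean opinion score per stimulus on a 1-to-5 scale.
    sos: standard deviation of the opinion scores per stimulus.
    Assumes at least one MOS lies strictly between 1 and 5.
    """
    numerator = 0.0
    denominator = 0.0
    for m, s in zip(mos, sos):
        f = -m * m + 6 * m - 5  # equals (m - 1) * (5 - m), zero at the scale ends
        numerator += f * s * s
        denominator += f * f
    return numerator / denominator


# Hypothetical per-stimulus values for a lab test and a remote test.
lab_a = fit_sos_parameter([1.9, 3.1, 4.2], [0.80, 0.95, 0.75])
remote_a = fit_sos_parameter([2.0, 3.0, 4.1], [0.90, 1.05, 0.85])
print(f"lab a = {lab_a:.3f}, remote a = {remote_a:.3f}")
```

Similar values of a for the lab and remote tests would suggest comparable rating diversity among participants, which is one way such an analysis can support the applicability of remote testing.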