Modeling of an automatic vision mixer with human characteristics for multi-camera theater recordings. - In: IEEE access, ISSN 2169-3536, Bd. 11 (2023), S. 18714-18726
A production process using high-resolution cameras can be used for multi-camera recordings of theater performances or other stage performances. One approach to automating the generation of suitable image cuts could be to focus on speaker changes, so that the person who is currently speaking is shown in the generated cut. However, these image cuts can appear static and robotic if they are set too precisely. Therefore, the characteristics and habits of professional vision mixers (persons who operate the vision mixing desk) during the editing process are investigated in more detail in order to incorporate them into an automation process. The characteristic features of five different vision mixers, who worked under almost identical recording conditions on theatrical cuts for TV productions, are examined. The cuts are analyzed with regard to their temporal position relative to the pauses in speech that occur during speaker changes on stage. It is shown that professional vision mixers individually place their cuts before, within, or after the pauses in speech, with measured differences of up to 0.3 seconds on average. From the analysis of the image cuts, an approach for a model is developed in which the individual characteristics of a vision mixer can be set. With the help of this novel model, automated image cuts can be given a more human appearance instead of an otherwise exact and robotic one.
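The idea of a configurable, per-mixer timing characteristic can be sketched as follows. This is an illustrative toy model only, not the paper's actual implementation; the function name, the offset/jitter parameters, and all values are assumptions.

```python
import random

def humanized_cut_times(speech_pause_times, mixer_offset, jitter_sd=0.05, seed=0):
    """Shift ideal cut points (aligned with speech pauses) by a per-mixer
    temporal offset plus small random jitter, all in seconds.

    mixer_offset < 0 places cuts before the pause, > 0 after it.
    Illustrative sketch; parameters are not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed for reproducible jitter
    return [t + mixer_offset + rng.gauss(0.0, jitter_sd)
            for t in speech_pause_times]

# Hypothetical example: a mixer who tends to cut ~0.2 s before the pause
cuts = humanized_cut_times([12.4, 31.0, 58.7], mixer_offset=-0.2)
```

Varying `mixer_offset` and `jitter_sd` would emulate the individually different cutting habits observed across the five vision mixers.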
Effects of binaural classroom noise scenarios on primary school children's speech perception and listening comprehension. - In: 51st International Congress and Exposition on Noise Control Engineering (INTER-NOISE 2022), (2023), S. 3214-3220
Investigating different cueing methods for auditory selective attention in virtual reality. - Berlin : Deutsche Gesellschaft für Akustik e.V. - 1 Online-Ressource (4 Seiten). - Online-Ausgabe: DAGA 2022 : 48. Jahrestagung für Akustik, 21.-24. März 2022, Stuttgart und Online, Seiten/Artikel-Nr.: 1173-1176
An audio-only paradigm for investigating auditory selective attention (ASA) has previously been transferred into a classroom-type audio-visual virtual reality (VR) environment. Due to the paradigm structure, participants focused on only a specific area of the VR environment throughout the entire experiment. In a more realistic scenario, participants would be expected to interact with the scene. Therefore, this study investigates new cueing methods that may reduce the focus on a single point in the virtual world and allow for further development of a close-to-real-life scenario.
Examining the auditory selective attention switch in a child-suited virtual reality classroom environment. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 24, 16569, S. 1-20
The ability to focus one's attention in different acoustical environments has been thoroughly investigated in the past. However, recent technological advancements have made it possible to perform laboratory experiments in a more realistic manner. In order to investigate close-to-real-life scenarios, a classroom was modeled in virtual reality (VR) and an established paradigm to investigate the auditory selective attention (ASA) switch was translated from an audio-only version into an audiovisual VR setting. The new paradigm was validated with adult participants in a listening experiment, and the results were compared to the previous version. Apart from expected effects such as switching costs and auditory congruency effects, which reflect the robustness of the overall paradigm, a difference in error rates between the audio-only and the VR group was found, suggesting enhanced attention in the new VR setting, which is consistent with recent studies. Overall, the results suggest that the presented VR paradigm can be used and further developed to investigate the voluntary auditory selective attention switch in a close-to-real-life classroom scenario.
Differential effects of task-irrelevant monaural and binaural classroom scenarios on children's and adults' speech perception, listening comprehension, and visual-verbal short-term memory. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 23, 15998, S. 1-17
Most studies investigating the effects of environmental noise on children’s cognitive performance examine the impact of monaural noise (i.e., same signal to both ears), oversimplifying multiple aspects of binaural hearing (i.e., adequately reproducing interaural differences and spatial information). In the current study, the effects of a realistic classroom-noise scenario presented either monaurally or binaurally on tasks requiring processing of auditory and visually presented information were analyzed in children and adults. In Experiment 1, across age groups, word identification was more impaired by monaural than by binaural classroom noise, whereas listening comprehension (acting out oral instructions) was equally impaired in both noise conditions. In both tasks, children were more affected than adults. Disturbance ratings were unrelated to the actual performance decrements. Experiment 2 revealed detrimental effects of classroom noise on short-term memory (serial recall of words presented pictorially), which did not differ with age or presentation mode (monaural vs. binaural). The present results add to the evidence for detrimental effects of noise on speech perception and cognitive performance, and their interactions with age, using a realistic classroom-noise scenario. Binaural simulations of real-world auditory environments can improve the external validity of studies on the impact of noise on children’s and adults’ learning.
Digital media in intergenerational communication: status quo and future scenarios for the grandparent-grandchild relationship. - In: Universal access in the information society, ISSN 1615-5297, Bd. 0 (2022), 0, insges. 16 S.
Communication technologies play an important role in maintaining the grandparent-grandchild (GP-GC) relationship. Based on Media Richness Theory, this study investigates the frequency of use (RQ1) and perceived quality (RQ2) of established media as well as the potential use of selected innovative media (RQ3) in GP-GC relationships with a particular focus on digital media. A cross-sectional online survey and vignette experiment were conducted in February 2021 among N = 286 university students in Germany (mean age 23 years, 57% female) who reported on the direct and mediated communication with their grandparents. In addition to face-to-face interactions, non-digital and digital established media (such as telephone, texting, video conferencing) and innovative digital media, namely augmented reality (AR)-based and social robot-based communication technologies, were covered. Face-to-face and phone communication occurred most frequently in GP-GC relationships: 85% of participants reported them taking place at least a few times per year (RQ1). Non-digital established media were associated with higher perceived communication quality than digital established media (RQ2). Innovative digital media received less favorable quality evaluations than established media. Participants expressed doubts regarding the technology competence of their grandparents, but still met innovative media with high expectations regarding improved communication quality (RQ3). Richer media, such as video conferencing or AR, do not automatically lead to better perceived communication quality, while leaner media, such as letters or text messages, can provide rich communication experiences. More research is needed to fully understand and systematically improve the utility, usability, and joy of use of different digital communication technologies employed in GP-GC relationships.
Audiovisual database with 360˚ video and higher-order Ambisonics audio for perception, cognition, behavior, and QoE evaluation research. - In: 2022 14th International Conference on Quality of Multimedia Experience (QoMEX), (2022), insges. 6 S.
Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate real-life-like scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680×3840 at 60 frames-per-second and with 4th-order Ambisonics audio. These 360˚ video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditory and visual sensory information. The database will be continuously updated with additional scenes and further metadata such as human ratings and saliency information.
Modeling of energy consumption and streaming video QoE using a crowdsourcing dataset. - In: 2022 14th International Conference on Quality of Multimedia Experience (QoMEX), (2022)
In the past decade, we have witnessed an enormous growth in the demand for online video services. Recent studies estimate that nowadays, more than 1% of the global greenhouse gas emissions can be attributed to the production and use of devices performing online video tasks. As such, research on the true power consumption of devices and their energy efficiency during video streaming is highly important for a sustainable use of this technology. At the same time, over-the-top providers strive to offer high-quality streaming experiences to satisfy user expectations. Here, energy consumption and QoE partly depend on the same system parameters. Hence, a joint view is needed for their evaluation. In this paper, we perform a first analysis of both end-user power efficiency and Quality of Experience of a video streaming service. We take a crowdsourced dataset comprising 447,000 streaming events from YouTube and estimate both the power consumption and perceived quality. The power consumption is modeled based on previous work which we extended towards predicting the power usage of different devices and codecs. The user-perceived QoE is estimated using a standardized model. Our results indicate that an intelligent choice of streaming parameters can optimize both the QoE and the power efficiency of the end user device. Further, the paper discusses limitations of the approach and identifies directions for future research.
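The joint view of QoE and power efficiency described above can be illustrated with a minimal selection sketch. All numbers and labels below are made up for illustration; they are not values from the crowdsourced dataset or the paper's models.

```python
# Illustrative only: candidate streaming configurations with hypothetical
# estimated quality (MOS, 1-5 scale) and end-user device power draw (watts).
candidates = [
    ("1080p/H.264", 4.1, 5.0),
    ("1080p/VP9",   4.1, 4.2),
    ("2160p/VP9",   4.5, 6.8),
]

def most_efficient(options, min_mos=4.0):
    """Among options meeting a quality floor, pick the lowest-power one."""
    feasible = [o for o in options if o[1] >= min_mos]
    return min(feasible, key=lambda o: o[2])

best = most_efficient(candidates)  # trades off QoE against power draw
```

This reflects the paper's point that codec and resolution choices can satisfy user expectations while reducing device power consumption.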
Can communication technologies reduce loneliness and social isolation in older people? : a scoping review of reviews. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 18, 11310, S. 1-20
Background: Loneliness and social isolation in older age are considered major public health concerns and research on technology-based solutions is growing rapidly. This scoping review of reviews aims to summarize the communication technologies (CTs) (review question RQ1), theoretical frameworks (RQ2), study designs (RQ3), and positive effects of technology use (RQ4) present in the research field. Methods: A comprehensive multi-disciplinary, multi-database literature search was conducted. Identified reviews were analyzed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. A total of N = 28 research reviews that cover 248 primary studies spanning 50 years were included. Results: The majority of the included reviews addressed general internet and computer use (82% each) (RQ1). Of the 28 reviews, only one (4%) worked with a theoretical framework (RQ2) and 26 (93%) covered primary studies with quantitative-experimental designs (RQ3). The positive effects of technology use were shown in 55% of the outcome measures for loneliness and 44% of the outcome measures for social isolation (RQ4). Conclusion: While research reviews show that CTs can reduce loneliness and social isolation in older people, causal evidence is limited and insights on innovative technologies such as augmented reality systems are scarce.
AVQBits-adaptive video quality model based on bitstream information for various video applications. - In: IEEE access, ISSN 2169-3536, Bd. 10 (2022), S. 80321-80351
The paper presents AVQBits, a versatile, bitstream-based video quality model. It can be applied in several contexts such as video service monitoring, evaluation of video encoding quality, of gaming video QoE, and even of omnidirectional video quality. In the paper, it is shown that AVQBits predictions closely match video quality ratings obtained in various subjective tests with human viewers, for videos up to 4K-UHD resolution (Ultra-High Definition, 3840 x 2160 pixels) and framerates up to 120 fps. With the different variants of AVQBits presented in the paper, video quality can be monitored either at the client side, in the network, or directly after encoding. The no-reference AVQBits model was developed for different video services and types of input data, reflecting the increasing popularity of Video-on-Demand services and the widespread use of HTTP-based adaptive streaming. At its core, AVQBits encompasses the standardized ITU-T P.1204.3 model, with further model instances that can either have restricted or extended input information, depending on the application context. Four different instances of AVQBits are presented, that is, a Mode 3 model with full access to the bitstream, a Mode 0 variant using only metadata such as codec type, framerate, resolution, and bitrate as input, a Mode 1 model using Mode 0 information plus frame-type and frame-size information, and a Hybrid Mode 0 model that is based on Mode 0 metadata and the decoded video pixel information. The models are trained on the authors' own AVT-PNATS-UHD-1 dataset described in the paper. All models show highly competitive performance using AVT-VQDB-UHD-1 as validation dataset, e.g., with the Mode 0 variant yielding a Pearson correlation of 0.890, the Mode 1 model 0.901, the hybrid no-reference Mode 0 model 0.928, and the model with full bitstream access 0.942.
In addition, all four AVQBits variants are evaluated when applying them out-of-the-box to different media formats such as 360˚ video, high framerate (HFR) content, or gaming videos. The analysis shows that the ITU-T P.1204.3 and Hybrid Mode 0 instances of AVQBits for the considered use-cases either perform on par with or better than even state-of-the-art full reference, pixel-based models. Furthermore, it is shown that the proposed Mode 0 and Mode 1 variants outperform commonly used no-reference models for the different application scopes. Also, a long-term integration model based on the standardized ITU-T P.1203.3 is presented to estimate ratings of overall audiovisual streaming Quality of Experience (QoE) for sessions of 30 s up to 5 min duration. In the paper, the AVQBits instances with their per-1-sec score output are evaluated as the video quality component of the proposed long-term integration model. All AVQBits variants as well as the long-term integration module are made publicly available for the community for further research.
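The four input modes described in the abstract can be summarized in a small lookup sketch. The feature names below are illustrative paraphrases of the abstract's descriptions, not the actual model interface.

```python
# Hypothetical summary of per-mode input information for the four
# AVQBits instances; names are illustrative, not the real API.
MODE_INPUTS = {
    "mode0":   ["codec", "resolution", "framerate", "bitrate"],
    "mode1":   ["codec", "resolution", "framerate", "bitrate",
                "frame_types", "frame_sizes"],
    "mode3":   ["full_bitstream"],
    "hybrid0": ["codec", "resolution", "framerate", "bitrate",
                "decoded_pixels"],
}

def inputs_for(mode):
    """Return the input information assumed available for a given mode."""
    return MODE_INPUTS[mode]
```

The table makes explicit how each variant trades access to the bitstream (or decoded pixels) against deployability at the client, in the network, or after encoding.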