Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from the year 2016 onwards. Publications up to the year 2015 can be found on a separate page.


Results: 165
Created on: Wed, 01 May 2024 23:03:12 +0200


Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Towards high resolution video quality assessment in the crowd. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 1-6

Assessing high-resolution video quality is usually performed using controlled, defined, and standardized lab tests. This method of acquiring human ratings in a lab environment is time-consuming and may also not reflect typical viewing conditions. To overcome these disadvantages, crowd testing paradigms have been used for assessing video quality in general. Crowdsourcing-based tests enable a more diverse set of participants and also use the realistic hardware setup and viewing environment of typical users. However, obtaining valid ratings for high-resolution video quality poses several problems. Example issues are that streaming such high-bandwidth content may not be feasible for some users, or that crowd participants lack an appropriate high-resolution display device. In this paper, we propose a method to overcome such problems and conduct a crowd test for higher-resolution content by using a 540p cutout from the center of the original 2160p video. To this aim, we use the videos from Test#1 of the publicly available dataset AVT-VQDB-UHD-1, which contains videos up to a resolution of UHD-1. The quality labels available from that lab test allow us to compare its results with the crowd test presented in this paper. It is shown that there is a Pearson correlation of 0.96 between the lab and crowd tests, and hence such crowd tests can reliably be used for video quality assessment of higher-resolution content. The overall implementation of the crowd test framework and the results are made publicly available for further research and reproducibility.
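
For illustration, a minimal Python sketch (not the authors' released code) of the two core steps described above: cropping a 540p center cutout from a 2160p frame, and correlating lab and crowd MOS values. The MOS vectors below are hypothetical placeholders, not the study's data.

```python
# Minimal sketch: 540p center cutout from a 2160p frame, plus lab/crowd PLCC.
import numpy as np
from scipy.stats import pearsonr

def center_cutout(frame, out_h=540, out_w=960):
    """Return an out_h x out_w window from the center of the frame."""
    h, w = frame.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return frame[top:top + out_h, left:left + out_w]

frame = np.zeros((2160, 3840, 3), dtype=np.uint8)  # placeholder 2160p frame
cutout = center_cutout(frame)                      # 960x540 at native pixel scale

# Hypothetical per-condition MOS vectors from the lab and the crowd test
lab_mos = np.array([4.2, 3.1, 2.4, 1.8])
crowd_mos = np.array([4.0, 3.3, 2.2, 1.9])
r, _ = pearsonr(lab_mos, crowd_mos)  # the paper reports r = 0.96 on its data
print(f"lab vs. crowd PLCC: {r:.2f}")
```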



https://doi.org/10.1109/QoMEX51781.2021.9465425

Keller, Dominik; Vaalgamaa, Markus; Paajanen, Erkki; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Groovability: using groove as a novel measure for audio QoE with the example of smartphones. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 13-18

Groove in music is a fundamental part of why humans entrain to it and enjoy it. Smartphones have become an important medium for listening to music. Especially when listening with others, loudspeaker playback may be the method of choice. However, due to the physical limits of acoustics, smartphones offer sub-optimal audio capabilities for loudspeaker playback. Therefore, it is desirable to measure the Quality of Experience (QoE) of music played on smartphones. While audio playback is often assessed in terms of sound quality, the aim of this work is to address QoE in terms of the meaning or effect that the audio has on the listener. A key component of the meaning of popular music is groove. Hence, in this paper, we study groovability, that is, the ability of a piece of audio technology to convey groove. To instantiate our novel audio QoE assessment method, we apply it to music played by 8 different smartphones. For this purpose, looped 4-bar loudness-aligned recordings from 24 music pieces of different intrinsic groove were played back on the different smartphones. Our test method uses a multi-stimulus comparison with synchronized playback capability. A total of 62 subjects evaluated groovability using two stimulus subsets. It was found that the proposed methodology is highly effective in distinguishing between the groovability provided by the considered phones. In addition, a reduced-reference model is proposed to predict groovability, using a set of both acoustics- and music-groove-related features. In our formal validation on unknown data, the model is shown to provide good prediction performance with a Pearson correlation greater than 0.90.
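
As an illustration of the stimulus-preparation step, a minimal sketch of loudness alignment using the pyloudnorm implementation of ITU-R BS.1770; the file names and the -23 LUFS target are assumptions, not taken from the paper.

```python
# Sketch: loudness-align music loops to a common LUFS target with pyloudnorm.
import soundfile as sf
import pyloudnorm as pyln

TARGET_LUFS = -23.0  # hypothetical alignment target

def align_loudness(path_in, path_out):
    data, rate = sf.read(path_in)
    meter = pyln.Meter(rate)                        # BS.1770 loudness meter
    measured = meter.integrated_loudness(data)      # integrated LUFS
    aligned = pyln.normalize.loudness(data, measured, TARGET_LUFS)
    sf.write(path_out, aligned, rate)

for name in ["loop_funk.wav", "loop_rock.wav"]:     # hypothetical 4-bar loops
    align_loudness(name, name.replace(".wav", "_aligned.wav"))
```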



https://doi.org/10.1109/QoMEX51781.2021.9465440

Robitza, Werner; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Impact of spatial and temporal information on video quality and compressibility. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 65-68

Spatial Information (SI) and Temporal Information (TI) are frequently used metrics to classify the spatiotemporal complexity of video content. However, they are mostly used on original video sources, and their impact on actual encoding efficiency is not known. In this paper, we propose a method to determine the compressibility of video sources, that is, how good video quality can be under a given bitrate constraint. We show how various aggregations of SI and TI correlate with compressibility scores obtained from a public dataset of H.264/HEVC/VP9 content. We observe that the minimum TI value as well as an existing criticality metric from the literature are good indicators for compressibility, as judged by subjective ratings as well as VMAF and P.1204.3 objective scores.
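
For reference, a minimal sketch of the standard per-frame SI/TI computation (ITU-T P.910), whose aggregations the paper studies; frames are assumed to be luma planes as 2-D float arrays.

```python
# Standard SI/TI (ITU-T P.910); the paper also studies aggregations beyond
# the usual max, e.g. the minimum TI value mentioned above.
import numpy as np
from scipy import ndimage

def si_ti(frames):
    """Return (SI, TI) as the maximum over per-frame values."""
    si_vals, ti_vals, prev = [], [], None
    for f in frames:
        gx = ndimage.sobel(f, axis=1)              # horizontal Sobel
        gy = ndimage.sobel(f, axis=0)              # vertical Sobel
        si_vals.append(np.hypot(gx, gy).std())     # spatial std of gradient magnitude
        if prev is not None:
            ti_vals.append((f - prev).std())       # std of frame difference
        prev = f
    return max(si_vals), max(ti_vals)
```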



https://doi.org/10.1109/QoMEX51781.2021.9465452

Ávila Soto, Mauro; Barzegar, Najmeh
I know you are looking to me: enabling eye-gaze communication between small children and parents with visual impairments. - In: AH 2021, (2021), article 9, 4 pp.

Eye-gaze interaction is a relevant mean of communication from the early infancy. The bonding between infants and their care-takers is Strengthened through eye contact. Parents with visual impairments are excluded of this type of interaction with their children. Thus, nowadays computer vision technologies allow to track eye-gaze with different purposes, even users with visual impairments are enable to recognize faces. This work starts from the following research question: Can current available eye tracking solutions aid parents with visual impairments to have eye-gaze interaction with their young infants children? We devised a software prototype based on currently available eye tracking technologies which was tested with three sets of visually impaired parents and their young infant children to explore the possibility to assist those parents to have eye-gaze interaction with their children. The experience was documented as semi-structured interviews which were processed with a content analysis technique. The approach got positive feedback in the functionality and Emotional interaction aspects.
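
As a rough illustration only (the paper's actual prototype is not reproduced here), a sketch of a gaze-at-camera detection loop using MediaPipe Face Mesh; the landmark indices, the centering threshold, and the cue are all assumptions.

```python
# Sketch: flag probable eye contact when the iris sits roughly centered
# between the eye corners. Indices 468 (right iris) and 33/133 (right eye
# corners) follow common MediaPipe usage but are assumptions here.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        iris, outer, inner = lm[468], lm[33], lm[133]
        ratio = (iris.x - outer.x) / max(inner.x - outer.x, 1e-6)
        if 0.4 < ratio < 0.6:           # iris roughly centered between corners
            print("\a eye contact")      # stand-in for an audio/haptic cue
cap.release()
```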



https://doi.org/10.1145/3460881.3460883

Singla, Ashutosh; Göring, Steve; Keller, Dominik; Ramachandra Rao, Rakesh Rao; Fremerey, Stephan; Raake, Alexander
Assessment of the simulator sickness questionnaire for omnidirectional videos. - In: 2021 IEEE Conference on Virtual Reality and 3D User Interfaces, (2021), pp. 198-206

Virtual Reality/360° videos provide an immersive experience to users. Besides this, 360° videos may lead to an undesirable effect when consumed with Head-Mounted Displays (HMDs), referred to as simulator sickness/cybersickness. The Simulator Sickness Questionnaire (SSQ) is the most widely used questionnaire for the assessment of simulator sickness. Since the SSQ with its 16 questions was not designed for 360° video related studies, our research hypothesis in this paper was that it may be simplified to enable more efficient testing for 360° video. Hence, we evaluate the SSQ to reduce the number of questions asked of subjects, based on six different previously conducted studies. We derive the reduced set of questions from the SSQ using Principal Component Analysis (PCA) for each test. Pearson correlation is analysed to compare the relation of all obtained reduced questionnaires as well as two further variants of the SSQ reported in the literature, namely the Virtual Reality Sickness Questionnaire (VRSQ) and the Cybersickness Questionnaire (CSQ). Our analysis suggests that a reduced questionnaire with 9 out of 16 questions yields the best agreement with the initial SSQ, while cutting the number of questions by almost 44%. Exploratory Factor Analysis (EFA) shows that the nine symptom-related attributes determined as relevant by PCA also appear to be sufficient to represent the three dimensions resulting from EFA, namely Uneasiness, Visual Discomfort and Loss of Balance. The simplified version of the SSQ has the potential to be used more efficiently than the initial SSQ for 360° video by focusing on the questions that are most relevant for individuals, shortening the required testing time.
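
As an illustration of the analysis approach, a minimal sketch of PCA-based item reduction on hypothetical SSQ ratings; the data and the simple selection rule are placeholders, not the study's exact procedure.

```python
# Sketch: keep the SSQ items with the strongest PCA loadings.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

ratings = np.random.randint(0, 4, size=(120, 16))   # subjects x 16 SSQ items
X = StandardScaler().fit_transform(ratings)

pca = PCA(n_components=3).fit(X)                    # cf. the three EFA dimensions
loadings = np.abs(pca.components_)                  # |loading| per component/item
relevance = loadings.max(axis=0)                    # strongest association per item
keep = np.sort(np.argsort(relevance)[-9:])          # retain the 9 most relevant items
print("retained item indices:", keep)
```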



https://doi.org/10.1109/VR50410.2021.00041

Göring, Steve; Ramachandra Rao, Rakesh Rao; Feiten, Bernhard; Raake, Alexander
Modular framework and instances of pixel-based video quality models for UHD-1/4K. - In: IEEE access, ISSN 2169-3536, Vol. 9 (2021), pp. 31842-31864

https://doi.org/10.1109/ACCESS.2021.3059932

Göring, Steve; Steger, Robert; Ramachandra Rao, Rakesh Rao; Raake, Alexander
Automated genre classification for gaming videos. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), 6 pp.

Besides classical videos, videos of gaming matches, entire tournaments or individual sessions are streamed and viewed all over the world. The increased popularity of Twitch or YouTube Gaming shows the importance of additional research on gaming videos. One important pre-condition for live or offline encoding of gaming videos is the knowledge of game-specific properties. Knowing or automatically predicting the genre of a gaming video enables a more advanced and optimized encoding pipeline for streaming providers, especially because gaming videos of different genres differ a lot from classical 2D video, e.g., considering the CGI content, textures or camera motion. We describe several computer-vision-based features that are optimized for speed and motivated by characteristics of popular games, to automatically predict the genre of a gaming video. Our prediction system uses random forest and gradient boosting trees as the underlying machine-learning techniques, combined with feature selection. For the evaluation of our approach, we use a dataset that was built as part of this work and consists of recorded gaming sessions for 6 genres from Twitch. In total, 351 different videos are considered. We show that our prediction approach achieves good performance in terms of F1-score. Besides the evaluation of different machine-learning approaches, we additionally investigate the influence of the algorithms' hyper-parameters.
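
For illustration, a minimal sketch of such a classification setup with scikit-learn; the feature matrix below is a random placeholder standing in for the computer-vision features described above.

```python
# Sketch: genre classification with random forest and gradient boosting.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(351, 20)               # 351 videos x 20 per-video features
y = np.random.randint(0, 6, size=351)     # 6 genre labels

for clf in (RandomForestClassifier(n_estimators=300),
            GradientBoostingClassifier()):
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
    print(type(clf).__name__, round(f1.mean(), 3))
```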



https://doi.org/10.1109/MMSP48831.2020.9287122

Singla, Ashutosh; Fremerey, Stephan; Hofmeyer, Frank; Robitza, Werner; Raake, Alexander
Quality assessment protocols for omnidirectional video quality evaluation. - In: Electronic imaging, ISSN 2470-1173, Vol. 32 (2020), 11, art00003, pp. 069-1-069-6

In recent years, with the introduction of powerful HMDs such as the Oculus Rift and HTC Vive Pro, the QoE that can be achieved with VR/360° videos has increased substantially. Unfortunately, no standardized guidelines, methodologies and protocols exist for conducting and evaluating the quality of 360° videos in tests with human test subjects. In this paper, we present a set of test protocols for the evaluation of the quality of 360° videos using HMDs. To this aim, we review the state of the art with respect to the assessment of 360° videos and summarize the results. Also, we summarize the methodological approaches and results of different subjective experiments at our lab under different contextual conditions. In the first two experiments, 1a and 1b, the performance of two different subjective test methods, Double-Stimulus Impairment Scale (DSIS) and Modified Absolute Category Rating (M-ACR), was compared under different contextual conditions. In experiment 2, the performance of three different subjective test methods, DSIS, M-ACR and Absolute Category Rating (ACR), was compared, this time without varying the contextual conditions. Building on the reliability and general applicability of the procedure across the different tests, a methodological framework for 360° video quality assessment is presented in this paper. Besides video or media quality judgments, the procedure comprises the assessment of presence and simulator sickness, for which different methods were compared. Further, the accompanying head-rotation data can be used to analyze both content- and quality-related behavioural viewing aspects. Based on the results, the implications of different contextual settings are discussed.
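
As a generic illustration (not taken from the paper) of the basic quantity that DSIS, M-ACR and ACR tests all deliver, a minimal sketch computing the per-condition mean opinion score (MOS) with a 95% confidence interval; the ratings are hypothetical.

```python
# Sketch: MOS and 95% t-based confidence interval per test condition.
import numpy as np
from scipy import stats

ratings = {"condition_A": [5, 4, 4, 5, 3], "condition_B": [2, 3, 2, 1, 2]}
for cond, r in ratings.items():
    r = np.asarray(r, dtype=float)
    ci = stats.t.ppf(0.975, len(r) - 1) * r.std(ddof=1) / np.sqrt(len(r))
    print(f"{cond}: MOS = {r.mean():.2f} +/- {ci:.2f}")
```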



https://doi.org/10.2352/ISSN.2470-1173.2020.11.HVEI-069

Zadtootaghaj, Saman; Barman, Nabajeet; Ramachandra Rao, Rakesh Rao; Göring, Steve; Martini, Maria G.; Raake, Alexander; Möller, Sebastian
DEMI: deep video quality estimation model using perceptual video quality dimensions. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), 6 pp.

Existing works in the field of quality assessment focus separately on gaming and non-gaming content. Along with the traditional modeling approaches, deep learning based approaches have been used to develop quality models, due to their high prediction accuracy. In this paper, we present a deep learning based quality estimation model considering both gaming and non-gaming videos. The model is developed in three phases. First, a convolutional neural network (CNN) is trained based on an objective metric, which allows the CNN to learn video artifacts such as blurriness and blockiness. Next, the model is fine-tuned based on a small image quality dataset using blockiness and blurriness ratings. Finally, a Random Forest is used to pool frame-level predictions and temporal information of videos in order to predict the overall video quality. The lightweight, low-complexity nature of the model makes it suitable for real-time applications considering both gaming and non-gaming content, while achieving performance similar to the existing state-of-the-art model NDNetGaming. The model implementation for testing is available on GitHub.
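
For illustration, a minimal sketch of the third phase described above: per-frame CNN quality predictions and simple temporal statistics are pooled by a random forest into one video-level score. All shapes and values are placeholder assumptions.

```python
# Sketch: pool frame-level CNN predictions into a video-level quality score.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pool_features(frame_scores):
    """Summarize one video's per-frame CNN predictions."""
    diffs = np.abs(np.diff(frame_scores))           # temporal variation
    return np.array([frame_scores.mean(), frame_scores.std(),
                     frame_scores.min(), frame_scores.max(), diffs.mean()])

videos = [np.random.uniform(1, 5, 300) for _ in range(50)]  # 50 videos x 300 frames
mos = np.random.uniform(1, 5, 50)                           # hypothetical labels
X = np.stack([pool_features(v) for v in videos])
model = RandomForestRegressor(n_estimators=200).fit(X, mos)
print(model.predict(X[:3]))                                 # video-level scores
```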



https://doi.org/10.1109/MMSP48831.2020.9287080

Fremerey, Stephan; Göring, Steve; Ramachandra Rao, Rakesh Rao; Huang, Rachel; Raake, Alexander
Subjective test dataset and meta-data-based models for 360° streaming video quality. - In: IEEE 22nd International Workshop on Multimedia Signal Processing, (2020), 6 pp.

During the last years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish the results of three subjective 360° video quality tests, with conditions used to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device we used the HTC Vive for the first and the HTC Vive Pro for the remaining two tests. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links to the used source videos, the raw subjective scores, video-related meta-data, head-rotation data and Simulator Sickness Questionnaire results per stimulus and per subject, to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Out of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF ("VMAF-cc"), which was shown to provide similar performance to the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models: a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide performance similar to the full-reference models while enabling fast calculations.
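
For illustration, a minimal sketch of how such metrics can be benchmarked against subjective scores using Pearson (PLCC) and Spearman (SROCC) correlation; all numbers below are hypothetical, not the dataset's values.

```python
# Sketch: rank objective metrics by correlation with subjective MOS.
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([4.3, 3.8, 3.1, 2.2, 1.6])      # per-stimulus subjective MOS
metrics = {                                     # per-stimulus metric scores
    "VMAF": np.array([92, 81, 66, 41, 23]),
    "PSNR": np.array([42, 39, 35, 31, 28]),
}
for name, scores in metrics.items():
    plcc, _ = pearsonr(scores, mos)
    srocc, _ = spearmanr(scores, mos)
    print(f"{name}: PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```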



https://doi.org/10.1109/MMSP48831.2020.9287065