Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from 2016 onward. The publications up to 2015 can be found on a separate page.


Results: 162
Created on: Fri, 19 Apr 2024 23:03:42 +0200


Göring, Steve; Raake, Alexander
Rule of thirds and simplicity for image aesthetics using deep neural networks. - In: IEEE 23rd International Workshop on Multimedia Signal Processing, (2021), 6 pages

Considering the increasing number of photos being uploaded to sharing platforms, a proper evaluation of photo appeal or aesthetics is required. For appealing images, several "rules of thumb" have been established, e.g., the rule of thirds and simplicity. We treat the rule of thirds and simplicity as binary classification problems and address them with a deep-learning-based image processing pipeline. The pipeline consists of a pre-processing step, a pre-trained baseline deep neural network (DNN), and a post-processing step. For each rule, we re-train 17 pre-trained DNN models using transfer learning. Our results on publicly available datasets show that ResNet152 performs best for rule-of-thirds prediction and DenseNet121 for simplicity, with accuracies of around 0.84 and 0.94, respectively. In addition, five experts annotated a further dataset of ≈ 1100 images, on which we evaluate the best-performing models. On this dataset, the best-performing models reach an accuracy of 0.67 for the rule of thirds and 0.79 for image simplicity. Both accuracies lie within the range of the pairwise accuracy between the expert annotators; however, this also indicates a strong subjective influence for both of the considered rules.



https://doi.org/10.1109/MMSP53017.2021.9733554
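The abstract compares model accuracy with the "pairwise accuracy of expert annotators". A minimal sketch of one way to compute such a range, assuming pairwise accuracy means the fraction of images on which two annotators give the same binary label; the annotator ids and labels below are hypothetical, and the paper's exact procedure may differ.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_accuracy(labels: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """Fraction of images on which each pair of annotators agrees.

    `labels` maps a hypothetical annotator id to that annotator's binary
    labels (1 = rule satisfied, 0 = not), aligned by image index.
    """
    result = {}
    for a, b in combinations(sorted(labels), 2):
        la, lb = labels[a], labels[b]
        if len(la) != len(lb):
            raise ValueError("annotators must label the same images")
        agree = sum(x == y for x, y in zip(la, lb))
        result[(a, b)] = agree / len(la)
    return result


# Toy data: three experts, four images.
acc = pairwise_accuracy({"e1": [1, 0, 1, 1], "e2": [1, 1, 1, 0], "e3": [1, 0, 0, 1]})
print(min(acc.values()), max(acc.values()))  # 0.25 0.75
```

A model's accuracy against each expert can be computed the same way and checked against the minimum and maximum of these pairwise values.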
Göring, Steve; Ramachandra Rao, Rakesh Rao; Fremerey, Stephan; Raake, Alexander
AVrate Voyager: an open source online testing platform. - In: IEEE 23rd International Workshop on Multimedia Signal Processing, (2021), 6 pages

Subjective testing is an integral part of many research fields concerned with, e.g., human perception. Lab tests are a popular approach to gathering ratings for such subjective evaluations. However, controlled lab tests cannot always be performed, e.g., when no lab exists or is accessible, or when its use is not permitted. For this reason, online tests, e.g., based on crowdsourcing, can serve as an alternative to traditional lab tests. In this paper, we describe a framework for implementing such online tests for audio-, video-, and image-related evaluations or questionnaires. Our framework, AVrate Voyager, builds upon previously developed lab-test frameworks and the experience gained with them. AVrate Voyager uses scalable web technologies to ensure that it runs reliably. In addition, we added pre-caching strategies to avoid additional influences on play-out, e.g., in video testing. We analyze several tests conducted with the new framework and describe in detail the steps required to modify the provided tool.



https://doi.org/10.1109/MMSP53017.2021.9733561
Döring, Nicola; Mikhailova, Veronika; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Saying "Hi" to grandma in nine different ways : established and innovative communication media in the grandparent-grandchild relationship. - In: Technology, Mind, and Behavior, ISSN 2689-0208, (2021), 1 page

https://doi.org/10.1037/tms0000107
Fremerey, Stephan; Reimers, Carolin; Leist, Larissa; Spilski, Jan; Klatte, Maria; Fels, Janina; Raake, Alexander
Generation of audiovisual immersive virtual environments to evaluate cognitive performance in classroom type scenarios. - In: Proceedings (Tagungsband), DAGA 2021 - 47. Jahrestagung für Akustik [47th Annual Conference on Acoustics], (2021), pp. 1336-1339

https://doi.org/10.22032/dbt.50292
Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Enhancement of pixel-based video quality models using meta-data. - In: Electronic imaging, ISSN 2470-1173, Vol. 33 (2021), 9, art00022, pp. 264-1 to 264-6

Current state-of-the-art pixel-based video quality models for 4K resolution do not have access to explicit meta-information such as resolution and framerate, and may not include implicit or explicit features that model the related effects on perceived video quality. In this paper, we propose a meta concept to extend state-of-the-art pixel-based models and develop hybrid models that incorporate meta-data such as framerate and resolution. Our general approach uses machine learning to incorporate the meta-data into the overall video quality prediction. To this end, we evaluate various machine learning approaches, such as SVR, random forest, and extreme gradient boosting trees, in terms of their suitability for hybrid model development. We use VMAF to demonstrate the validity of the meta-information concept. Our approach was tested on the publicly available AVT-VQDB-UHD-1 dataset. We show that the hybrid models achieve higher prediction accuracy than the underlying pixel-based model. Although the proof of concept is applied to VMAF, the approach can also be used with other pixel-based models.



https://doi.org/10.2352/ISSN.2470-1173.2021.9.IQSP-264
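The hybrid idea above amounts to feeding the pixel-based score together with meta-data into a learned regressor. The paper evaluates SVR, random forests, and gradient-boosted trees; to stay dependency-free, this sketch swaps in plain linear least squares, which is none of those. The feature layout (`hybrid_features`, with height scaled by 2160) is a hypothetical choice for illustration only.

```python
def hybrid_features(pixel_score, framerate, height):
    """Hypothetical feature vector: pixel-based score (e.g. VMAF) plus meta-data.

    Height is scaled relative to 2160 lines (UHD-1) so the features stay on
    comparable scales.
    """
    return [pixel_score, framerate, height / 2160]


def fit_linear(features, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * p for v, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs, feats):
    """Intercept plus the weighted sum of the features."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], feats))
```

With real subjective ratings as targets, `fit_linear` would be trained on (score, framerate, resolution) triples and `predict` applied to unseen videos; a non-linear regressor would replace it where the meta-data effects are not additive.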
Ho, Man M.; Zhang, Lu; Raake, Alexander; Zhou, Jinjia
Semantic-driven colorization. - In: Proceedings CVMP 2021, (2021), 1, pp. 1-10

Recent colorization works implicitly predict semantic information while learning to colorize black-and-white images. Consequently, the generated colors overflow object boundaries more easily, and semantic faults remain invisible. In human colorization, our brains first detect and recognize the objects in a photo, then imagine their plausible colors based on many similar objects we have seen in real life, and finally colorize them, as described in Figure 1. In this study, we simulate this human-like process by letting our network first learn to understand the photo and then colorize it. Thus, our work can provide plausible colors at a semantic level. Moreover, the semantic information predicted by a well-trained model becomes interpretable and can be modified. Additionally, we show that Instance Normalization is a missing ingredient for image colorization, and we re-design the inference flow of U-Net to have two data streams, providing an appropriate way of normalizing the features extracted from the black-and-white image. As a result, our network can provide plausible colors for specific objects that are competitive with those of typical colorization works. Our interactive application is available at https://github.com/minhmanho/semantic-driven_colorization.



https://doi.org/10.1145/3485441.3485645
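Instance Normalization, mentioned above, normalizes each channel of each image separately over its spatial positions, rather than pooling statistics over a batch as Batch Normalization does. A minimal stdlib sketch of that operation for one image stored as nested lists (channels x height x width); the optional per-channel `gamma`/`beta` stand in for learned affine parameters, and nothing here reproduces the paper's two-stream U-Net.

```python
import math


def instance_norm(features, eps=1e-5, gamma=None, beta=None):
    """Normalize each channel of ONE image to zero mean and unit variance.

    `features` is a list of channels, each an H x W list of floats.
    `gamma`/`beta` are optional per-channel scale and shift values.
    """
    out = []
    for c, channel in enumerate(features):
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        scale = 1.0 / math.sqrt(var + eps)
        g = 1.0 if gamma is None else gamma[c]
        b = 0.0 if beta is None else beta[c]
        out.append([[g * (v - mean) * scale + b for v in row] for row in channel])
    return out
```

Because the statistics are taken per image, a constant channel collapses to `beta` (zero by default), regardless of what other images in a batch contain.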
Keller, Dominik; Seybold, Tamara; Skowronek, Janto; Raake, Alexander
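Sensorische Evaluierung in der Kinotechnik : wie Videoqualität mit Methoden aus der Lebensmittelforschung bewertet werden kann [Sensory evaluation in cinema technology: how video quality can be assessed using methods from food research]. - In: FKT, ISSN 1430-9947, Vol. 75 (2021), 4, pp. 33-37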

Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Towards high resolution video quality assessment in the crowd. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 1-6

High-resolution video quality is usually assessed using controlled, well-defined, and standardized lab tests. This method of acquiring human ratings in a lab environment is time-consuming and may not reflect typical viewing conditions. To overcome these disadvantages, crowd testing paradigms have been used for assessing video quality in general. Crowdsourcing-based tests reach a more diverse set of participants and reflect the realistic hardware setups and viewing environments of typical users. However, obtaining valid ratings for high-resolution video quality poses several problems. For example, streaming such high-bandwidth content may not be feasible for some users, and crowd participants may lack an appropriate high-resolution display. In this paper, we propose a method to overcome such problems and conduct a crowd test for higher-resolution content using a 540p cutout from the center of the original 2160p video. To this end, we use the videos from Test #1 of the publicly available AVT-VQDB-UHD-1 dataset, which contains videos with resolutions up to UHD-1. The quality labels available from that lab test allow us to compare its results with those of the crowd test presented in this paper. The lab and crowd tests show a Pearson correlation of 0.96; hence, such crowd tests can reliably be used to assess the video quality of higher-resolution content. The overall implementation of the crowd test framework and the results are made publicly available for further research and reproducibility.



https://doi.org/10.1109/QoMEX51781.2021.9465425
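The center cutout described above is simple window arithmetic. A minimal sketch, assuming a 3840x2160 (UHD-1) source and a 16:9 "540p" window of 960x540 pixels; the abstract does not give the exact window width, and using ffmpeg's `crop=w:h:x:y` filter to produce the cutout is an assumption, not necessarily the authors' tooling.

```python
def center_crop_offsets(src_w, src_h, crop_w, crop_h):
    """Top-left (x, y) of a crop_w x crop_h window centred in a src_w x src_h frame."""
    if crop_w > src_w or crop_h > src_h:
        raise ValueError("crop window larger than source frame")
    return (src_w - crop_w) // 2, (src_h - crop_h) // 2


def ffmpeg_crop_filter(src_w, src_h, crop_w, crop_h):
    """Build an ffmpeg 'crop=w:h:x:y' filter string for the centred window."""
    x, y = center_crop_offsets(src_w, src_h, crop_w, crop_h)
    return f"crop={crop_w}:{crop_h}:{x}:{y}"


# Assumed: 3840x2160 source, 960x540 cutout.
print(ffmpeg_crop_filter(3840, 2160, 960, 540))  # crop=960:540:1440:810
```

Presumably the point of cropping rather than downscaling is that the cutout keeps the original 2160p pixels one-to-one, so participants with ordinary displays still see native high-resolution detail at a fraction of the bitrate.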
Keller, Dominik; Vaalgamaa, Markus; Paajanen, Erkki; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Groovability: using groove as a novel measure for audio QoE with the example of smartphones. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 13-18

Groove in music is a fundamental part of why humans entrain to it and enjoy it. Smartphones have become an important medium for listening to music. Especially in the company of others, loudspeaker playback may be the method of choice. However, due to physical acoustic limits, smartphones offer only sub-optimal audio capabilities for loudspeaker playback. Therefore, it is desirable to measure the Quality of Experience (QoE) of music played on smartphones. While audio playback is often assessed in terms of sound quality, the aim of this work is to address QoE in terms of the meaning or effect that the audio has on the listener. A key component of the meaning of popular music is groove. Hence, in this paper, we study groovability, that is, the ability of a piece of audio technology to convey groove. To instantiate our novel audio QoE assessment method, we apply it to music played by 8 different smartphones. For this purpose, looped, loudness-aligned 4-bar recordings of 24 music pieces with different intrinsic groove were played back on the different smartphones. Our test method uses a multi-stimulus comparison with synchronized playback capability. A total of 62 subjects evaluated groovability using two stimulus subsets. The proposed methodology was found to be highly effective at distinguishing between the groovability provided by the considered phones. In addition, a reduced-reference model is proposed to predict groovability, using a set of both acoustics- and music-groove-related features. In our formal validation on unknown data, the model provides good prediction performance, with a Pearson correlation greater than 0.90.



https://doi.org/10.1109/QoMEX51781.2021.9465440
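The validation result above is a Pearson correlation between predicted and subjective groovability scores. A minimal stdlib sketch of that statistic; the score lists in any usage are hypothetical, and on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


print(pearson([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Pearson only measures linear agreement; model evaluations in this field often report it alongside a rank correlation and an error measure such as RMSE.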
Robitza, Werner; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Impact of spatial and temporal information on video quality and compressibility. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 65-68

Spatial Information (SI) and Temporal Information (TI) are frequently used metrics to classify the spatiotemporal complexity of video content. However, they are mostly computed on original video sources, and their impact on actual encoding efficiency is not known. In this paper, we propose a method to determine the compressibility of video sources, that is, how high the video quality can be under a given bitrate constraint. We show how various aggregations of SI and TI correlate with compressibility scores obtained from a public dataset of H.264/HEVC/VP9 content. We observe that the minimum TI value, as well as an existing criticality metric from the literature, are good indicators of compressibility, as judged by subjective ratings as well as VMAF and P.1204.3 objective scores.



https://doi.org/10.1109/QoMEX51781.2021.9465452
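In the classic ITU-T P.910 definition, SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. A minimal stdlib sketch of the per-frame values plus interchangeable aggregations such as the minimum TI discussed above; frames as small 2D lists of luma values, skipping border pixels, and using the population standard deviation are simplifying assumptions, not the paper's implementation.

```python
import math
from statistics import pstdev

# A frame is a hypothetical 2D list of luma values (rows of pixels).


def sobel_magnitudes(frame):
    """Sobel gradient magnitude at every interior pixel (border pixels skipped)."""
    h, w = len(frame), len(frame[0])
    if h < 3 or w < 3:
        raise ValueError("frames must be at least 3x3 pixels")
    mags = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = (frame[i - 1][j + 1] + 2 * frame[i][j + 1] + frame[i + 1][j + 1]
                  - frame[i - 1][j - 1] - 2 * frame[i][j - 1] - frame[i + 1][j - 1])
            gy = (frame[i + 1][j - 1] + 2 * frame[i + 1][j] + frame[i + 1][j + 1]
                  - frame[i - 1][j - 1] - 2 * frame[i - 1][j] - frame[i - 1][j + 1])
            mags.append(math.hypot(gx, gy))
    return mags


def si_per_frame(frames):
    """Spatial information of each frame: std. dev. of its Sobel magnitudes."""
    return [pstdev(sobel_magnitudes(f)) for f in frames]


def ti_per_frame(frames):
    """Temporal information of each consecutive frame pair: std. dev. of pixel differences."""
    return [
        pstdev([c - p for row_c, row_p in zip(cur, prev) for c, p in zip(row_c, row_p)])
        for prev, cur in zip(frames, frames[1:])
    ]


# P.910 reports the maximum over time; other aggregations can be swapped in.
AGGREGATIONS = {
    "max": max,
    "min": min,
    "mean": lambda values: sum(values) / len(values),
}


def aggregate(values, how="max"):
    return AGGREGATIONS[how](values)
```

For example, `aggregate(ti_per_frame(frames), "min")` gives the minimum-TI aggregation, while the default `"max"` reproduces the conventional P.910 summary.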