Improving predictive performance and calibration by weight fusion in semantic segmentation. - San Diego, Calif. : Neural Information Processing Systems. - 1 Online-Ressource (Seite 1-20)Publikation entstand im Rahmen der Veranstaltung: Machine Learning for Autonomous Driving Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, USA
Averaging predictions of a deep ensemble of networks is a popular and effective method to improve predictive performance and calibration in various benchmarks and Kaggle competitions. However, the runtime and training cost of deep ensembles grow linearly with the size of the ensemble, making them unsuitable for many applications. Averaging ensemble weights instead of predictions circumvents this disadvantage during inference and is typically applied to intermediate checkpoints of a model to reduce training cost. Albeit effective, only few works have improved the understanding and the performance of weight averaging. Here, we revisit this approach and show that a simple weight fusion (WF) strategy can lead to a significantly improved predictive performance and calibration. We describe what prerequisites the weights must meet in terms of weight space, functional space and loss. Furthermore, we present a new test method (called oracle test) to measure the functional space between weights. We demonstrate the versatility of our WF strategy across state of the art segmentation CNNs and Transformers as well as real world datasets such as BDD100K and Cityscapes. We compare WF with similar approaches and show our superiority for in- and out-of-distribution data in terms of predictive performance and calibration.
On the importance of label encoding and uncertainty estimation for robotic grasp detection. - In: IEEE Xplore digital library, ISSN 2473-2001, (2022), S. 4781-4788
Automated grasping of arbitrary objects is an essential skill for many applications such as smart manufacturing and human robot interaction. This makes grasp detection a vital skill for automated robotic systems. Recent work in model-free grasp detection uses point cloud data as input and typically outperforms the earlier work on RGB(D)-based methods. We show that RGB(D)-based methods are being underestimated due to suboptimal label encodings used for training. Using the evaluation pipeline of the GraspNet-1Billion dataset, we investigate different encodings and propose a novel encoding that significantly improves grasp detection on depth images. Additionally, we show shortcomings of the 2D rectangle grasps supplied by the GraspNet-1Billion dataset and propose a filtering scheme by which the ground truth labels can be improved significantly. Furthermore, we apply established methods for uncertainty estimation on our trained models since knowing when we can trust the model's decisions provides an advantage for real-world application. By doing so, we are the first to directly estimate uncertainties of detected grasps. We also investigate the applicability of the estimated aleatoric and epistemic uncertainties based on their theoretical properties. Additionally, we demonstrate the correlation between estimated uncertainties and grasp quality, thus improving selection of high quality grasp detections. By all these modifications, our approach using only depth images can compete with point-cloud-based approaches for grasp detection despite the lower degree of freedom for grasp poses in 2D image space.
Digital media in intergenerational communication: status quo and future scenarios for the grandparent-grandchild relationship. - In: Universal access in the information society, ISSN 1615-5297, Bd. 0 (2022), 0, insges. 16 S.
Communication technologies play an important role in maintaining the grandparent-grandchild (GP-GC) relationship. Based on Media Richness Theory, this study investigates the frequency of use (RQ1) and perceived quality (RQ2) of established media as well as the potential use of selected innovative media (RQ3) in GP-GC relationships with a particular focus on digital media. A cross-sectional online survey and vignette experiment were conducted in February 2021 among N = 286 university students in Germany (mean age 23 years, 57% female) who reported on the direct and mediated communication with their grandparents. In addition to face-to-face interactions, non-digital and digital established media (such as telephone, texting, video conferencing) and innovative digital media, namely augmented reality (AR)-based and social robot-based communication technologies, were covered. Face-to-face and phone communication occurred most frequently in GP-GC relationships: 85% of participants reported them taking place at least a few times per year (RQ1). Non-digital established media were associated with higher perceived communication quality than digital established media (RQ2). Innovative digital media received less favorable quality evaluations than established media. Participants expressed doubts regarding the technology competence of their grandparents, but still met innovative media with high expectations regarding improved communication quality (RQ3). Richer media, such as video conferencing or AR, do not automatically lead to better perceived communication quality, while leaner media, such as letters or text messages, can provide rich communication experiences. More research is needed to fully understand and systematically improve the utility, usability, and joy of use of different digital communication technologies employed in GP-GC relationships.
The MORPHIA Project: first results of a long-term user study in an elderly care scenario from robotic point of view. - In: 54th International Symposium on Robotics, (2022), S. 66-73
In an aging society, efficiently organizing care taking tasks is of great importance including several players (here referred to as caregivers) like relatives, friends, professional caretakers, employees of retirement homes, clubs and so on. Especially for long-distance relationships, this can be burdensome and time-consuming. While supporting devices, like mobile phones or tablets, are slowly reaching the elder community, the drawbacks are obvious. These passive devices need to be handled by the elderly themselves, this includes an proper understanding of the operation, remembering to charge the devices, or even to hear incoming calls or messages. In the project MORPHIA, we target these drawbacks by combining a social communication platform on a tablet with a mobile robotic platform that can be remote-controlled by all mentioned actors of the supporting network or actively deliver messages emitted from the network. In this paper, we present the first stage of our demonstrator in terms of implemented hard- and software components. Since the price is a key factor for acceptance of such a system in the care community, we performed a technical assessment of these components based on our findings during the development process. In addition, we present the results of the first user tests with 5 participants over two weeks each between August and November 2021 (two further test iterations are planned for 2022 and 2023). This includes general usage of specific robotic services as well as technical benchmarks to assess the robustness of the developed system in domestic environments.
Efficient multi-task RGB-D scene analysis for indoor environments. - In: 2022 International Joint Conference on Neural Networks (IJCNN), (2022), insges. 10 S.
Semantic scene understanding is essential for mobile agents acting in various environments. Although semantic segmentation already provides a lot of information, details about individual objects as well as the general scene are missing but required for many real-world applications. However, solving multiple tasks separately is expensive and cannot be accomplished in real time given limited computing and battery capabilities on a mobile platform. In this paper, we propose an efficient multi-task approach for RGB-D scene analysis (EMSANet) that simultaneously performs semantic and instance segmentation (panoptic segmentation), instance orientation estimation, and scene classification. We show that all tasks can be accomplished using a single neural network in real time on a mobile platform without diminishing performance - by contrast, the individual tasks are able to benefit from each other. In order to evaluate our multi-task approach, we extend the annotations of the common RGB-D indoor datasets NYUv2 and SUNRGB-D for instance segmentation and orientation estimation. To the best of our knowledge, we are the first to provide results in such a comprehensive multi-task setting for indoor scene analysis on NYUv2 and SUNRGB-D.
Can communication technologies reduce loneliness and social isolation in older people? : a scoping review of reviews. - In: International journal of environmental research and public health, ISSN 1660-4601, Bd. 19 (2022), 18, 11310, S. 1-20
Background: Loneliness and social isolation in older age are considered major public health concerns and research on technology-based solutions is growing rapidly. This scoping review of reviews aims to summarize the communication technologies (CTs) (review question RQ1), theoretical frameworks (RQ2), study designs (RQ3), and positive effects of technology use (RQ4) present in the research field. Methods: A comprehensive multi-disciplinary, multi-database literature search was conducted. Identified reviews were analyzed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. A total of N = 28 research reviews that cover 248 primary studies spanning 50 years were included. Results: The majority of the included reviews addressed general internet and computer use (82% each) (RQ1). Of the 28 reviews, only one (4%) worked with a theoretical framework (RQ2) and 26 (93%) covered primary studies with quantitative-experimental designs (RQ3). The positive effects of technology use were shown in 55% of the outcome measures for loneliness and 44% of the outcome measures for social isolation (RQ4). Conclusion: While research reviews show that CTs can reduce loneliness and social isolation in older people, causal evidence is limited and insights on innovative technologies such as augmented reality systems are scarce.
Efficient and robust semantic mapping for indoor environments. - In: 2022 IEEE International Conference on Robotics and Automation (ICRA), (2022), S. 9221-9227
A key proficiency an autonomous mobile robot must have to perform high-level tasks is a strong understanding of its environment. This involves information about what types of objects are present, where they are, what their spatial extend is, and how they can be reached, i.e., information about free space is also crucial. Semantic maps are a powerful instrument providing such information. However, applying semantic segmentation and building 3D maps with high spatial resolution is challenging given limited resources on mobile robots. In this paper, we incorporate semantic information into efficient occupancy normal distribution transform (NDT) maps to enable real-time semantic mapping on mobile robots. On the publicly available dataset Hypersim, we show that, due to their sub-voxel accuracy, semantic NDT maps are superior to other approaches. We compare them to the recent state-of-the-art approach based on voxels and semantic Bayesian spatial kernel inference (S-BKI) and to an optimized version of it derived in this paper. The proposed semantic NDT maps can represent semantics to the same level of detail, while mapping is 2.7 to 17.5 times faster. For the same grid resolution, they perform significantly better, while mapping is up to more than 5 times faster. Finally, we prove the real-world applicability of semantic NDT maps with qualitative results in a domestic application.
Deep learning for clinical decision support in oncology. - Ilmenau : Universitätsbibliothek, 2022. - 1 Online-Ressource (168 Seiten)
Technische Universität Ilmenau, Dissertation 2022
Bibliography p. 123-139
In den letzten Jahrzehnten sind medizinische Bildgebungsverfahren wie die Computertomographie (CT) zu einem unersetzbaren Werkzeug moderner Medizin geworden, welche eine zeitnahe, nicht-invasive Begutachtung von Organen und Geweben ermöglichen. Die Menge an anfallenden Daten ist dabei rapide gestiegen, allein innerhalb der letzten Jahre um den Faktor 15, und aktuell verantwortlich für 30 % des weltweiten Datenvolumens. Die Anzahl ausgebildeter Radiologen ist weitestgehend stabil, wodurch die medizinische Bildanalyse, angesiedelt zwischen Medizin und Ingenieurwissenschaften, zu einem schnell wachsenden Feld geworden ist. Eine erfolgreiche Anwendung verspricht Zeitersparnisse, und kann zu einer höheren diagnostischen Qualität beitragen. Viele Arbeiten fokussieren sich auf "Radiomics", die Extraktion und Analyse von manuell konstruierten Features. Diese sind jedoch anfällig gegenüber externen Faktoren wie dem Bildgebungsprotokoll, woraus Implikationen für Reproduzierbarkeit und klinische Anwendbarkeit resultieren. In jüngster Zeit sind Methoden des "Deep Learning" zu einer häufig verwendeten Lösung algorithmischer Problemstellungen geworden. Durch Anwendungen in Bereichen wie Robotik, Physik, Mathematik und Wirtschaft, wurde die Forschung im Bereich maschinellen Lernens wesentlich verändert. Ein Kriterium für den Erfolg stellt die Verfügbarkeit großer Datenmengen dar. Diese sind im medizinischen Bereich rar, da die Bilddaten strengen Anforderungen bezüglich Datenschutz und Datensicherheit unterliegen, und oft heterogene Qualität, sowie ungleichmäßige oder fehlerhafte Annotationen aufweisen, wodurch ein bedeutender Teil der Methoden keine Anwendung finden kann. Angesiedelt im Bereich onkologischer Bildgebung zeigt diese Arbeit Wege zur erfolgreichen Nutzung von Deep Learning für medizinische Bilddaten auf. Mittels neuer Methoden für klinisch relevante Anwendungen wie die Schätzung von Läsionswachtum, Überleben, und Entscheidungkonfidenz, sowie Meta-Learning, Klassifikator-Ensembling, und Entscheidungsvisualisierung, werden Wege zur Verbesserungen gegenüber State-of-the-Art-Algorithmen aufgezeigt, welche ein breites Anwendungsfeld haben. Hierdurch leistet die Arbeit einen wesentlichen Beitrag in Richtung einer klinischen Anwendung von Deep Learning, zielt auf eine verbesserte Diagnose, und damit letztlich eine verbesserte Gesundheitsversorgung insgesamt.
Visual scene understanding for enabling situation-aware cobots. - Ilmenau : Universitätsbibliothek. - 1 Online-Ressource (2 Seiten)Publikation entstand im Rahmen der Veranstaltung: IEEE International Conference on Automation Science and Engineering ; 17 (Lyon, France) : 2021.08.23-27, TuBT7 Special Session: Robotic Control and Robotization of Tasks within Industry 4.0
Although in the course of Industry 4.0, a high degree of automation is the objective, not every process can be fully automated - especially in versatile manufacturing. In these applications, collaborative robots (cobots) as helpers are a promising direction. We analyze the collaborative assembly scenario and conclude that visual scene understanding is a prerequisite to enable autonomous decisions by cobots. We identify the open challenges in these visual recognition tasks and propose promising new ideas on how to overcome them.
Saying "Hi" to grandma in nine different ways : established and innovative communication media in the grandparent-grandchild relationship. - In: Technology, Mind, and Behavior, ISSN 2689-0208, (2021), insges. 1 S.