Publication List - University Bibliography (Hochschul-Bibliografie)

Number of hits: 37
Created: Thu, 28 Mar 2024 23:14:00 +0100 in 0.0611 sec


Aganian, Dustin; Köhler, Mona; Baake, Sebastian; Eisenbach, Markus; Groß, Horst-Michael
How object information improves skeleton-based human action recognition in assembly tasks. - In: IJCNN 2023 conference proceedings, (2023), 9 pages in total

As the use of collaborative robots (cobots) in industrial manufacturing continues to grow, human action recognition for effective human-robot collaboration becomes increasingly important. This ability is crucial for cobots to act autonomously and assist in assembly tasks. Recently, skeleton-based approaches have often been used, as they tend to generalize better to different people and environments. However, when processing skeletons alone, information about the objects a human interacts with is lost. Therefore, we present a novel approach for integrating object information into skeleton-based action recognition. We enhance two state-of-the-art methods by treating object centers as further skeleton joints. Our experiments on the assembly dataset IKEA ASM show that our approach improves the performance of these state-of-the-art methods to a large extent when combining skeleton joints with objects predicted by a state-of-the-art instance segmentation model. Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks. We analyze the effect of the object detector on the combination for action classification and discuss the important factors that must be taken into account.
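
A minimal sketch of the core idea, treating object centers as additional skeleton joints, assuming hypothetical array shapes and a generic skeleton-based backbone; this is an illustration, not the authors' implementation:

    import numpy as np

    def append_object_joints(skeleton, object_centers):
        """Treat detected object centers as additional skeleton joints.

        skeleton:       (T, J, 3) array, T frames with J body joints (x, y, z).
        object_centers: (T, K, 3) array with K object centers per frame, e.g.
                        taken from an instance segmentation model; missing
                        detections can be filled with zeros.
        Returns a (T, J + K, 3) array that a skeleton-based action recognition
        backbone expecting a fixed joint count can consume.
        """
        return np.concatenate([skeleton, object_centers], axis=1)

    # Hypothetical example: 120 frames, 25 body joints, 3 tracked objects
    skeleton_seq = np.random.rand(120, 25, 3)
    object_seq = np.random.rand(120, 3, 3)
    print(append_object_joints(skeleton_seq, object_seq).shape)  # (120, 28, 3)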



https://doi.org/10.1109/IJCNN54540.2023.10191686
Aganian, Dustin; Köhler, Mona; Stephan, Benedict; Eisenbach, Markus; Groß, Horst-Michael
Fusing hand and body skeletons for human action recognition in assembly. - In: Artificial Neural Networks and Machine Learning - ICANN 2023, (2023), pp. 207-219

As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker’s fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios.
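
A rough sketch of how body and hand joints can be fused as a single token sequence in a transformer encoder; all layer sizes, joint counts, and the class count are assumptions, the temporal dimension is omitted for brevity, and this is not the configuration used in the paper:

    import torch
    import torch.nn as nn

    class SkeletonFusionTransformer(nn.Module):
        # Illustrative fusion of body and hand joints via self-attention.
        def __init__(self, num_body=25, num_hand=21, d_model=64, num_classes=33):
            super().__init__()
            num_joints = num_body + 2 * num_hand          # body + left/right hand
            self.embed = nn.Linear(3, d_model)            # per-joint (x, y, z) embedding
            self.joint_pos = nn.Parameter(torch.zeros(1, num_joints, d_model))
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, num_classes)

        def forward(self, body, left_hand, right_hand):
            # Each input: (batch, joints, 3); attention mixes body and hand tokens.
            joints = torch.cat([body, left_hand, right_hand], dim=1)
            tokens = self.embed(joints) + self.joint_pos
            return self.head(self.encoder(tokens).mean(dim=1))  # pool over joints

    model = SkeletonFusionTransformer()
    logits = model(torch.rand(2, 25, 3), torch.rand(2, 21, 3), torch.rand(2, 21, 3))
    print(logits.shape)  # torch.Size([2, 33])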



https://doi.org/10.1007/978-3-031-44207-0_18
Aganian, Dustin; Stephan, Benedict; Eisenbach, Markus; Stretz, Corinna; Groß, Horst-Michael
ATTACH dataset: annotated two-handed assembly actions for human action understanding. - In: ICRA 2023, (2023), pp. 11367-11373

With the emergence of collaborative robots (cobots), human-robot collaboration in industrial manufacturing is coming into focus. For a cobot to act autonomously and as an assistant, it must understand human actions during assembly. To effectively train models for this task, a dataset containing suitable assembly actions in a realistic setting is crucial. For this purpose, we present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions monitored by three cameras, which represent potential viewpoints of a cobot. Since in an assembly context workers tend to perform different actions simultaneously with their two hands, we annotated the performed actions for each hand separately. Therefore, in the ATTACH dataset, more than 68% of annotations overlap with other annotations, which is many times more than in related datasets, typically featuring more simplistic assembly tasks. For better generalization with respect to the background of the working area, we not only recorded color and depth images but also used the Azure Kinect body tracking SDK for estimating 3D skeletons of the worker. To create a first baseline, we report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs. The dataset is available at https://www.tu-ilmenau.de/neurob/data-sets-code/attach-dataset.
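
To make the per-hand annotation idea concrete, the following sketch computes the fraction of temporally overlapping annotations from simple (start, end, hand) records; the record layout is hypothetical and may differ from the released annotation format:

    from dataclasses import dataclass

    @dataclass
    class ActionAnnotation:
        # Hypothetical record layout; the released dataset format may differ.
        start_frame: int
        end_frame: int
        hand: str        # "left" or "right"
        action: str

    def overlap_fraction(annotations):
        """Fraction of annotations that temporally overlap at least one other
        annotation, e.g. because both hands act at the same time."""
        def overlaps(a, b):
            return a.start_frame < b.end_frame and b.start_frame < a.end_frame
        overlapping = sum(
            any(overlaps(a, b) for b in annotations if b is not a)
            for a in annotations
        )
        return overlapping / len(annotations)

    demo = [
        ActionAnnotation(0, 50, "left", "pick up screw"),
        ActionAnnotation(20, 80, "right", "hold board"),
        ActionAnnotation(100, 140, "left", "tighten screw"),
    ]
    print(overlap_fraction(demo))  # 2 of 3 annotations overlap -> 0.666...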



https://doi.org/10.1109/ICRA48891.2023.10160633
Eisenbach, Markus; Lübberstedt, Jannik; Aganian, Dustin; Groß, Horst-Michael
A little bit attention is all you need for person re-identification. - In: ICRA 2023, (2023), pp. 7598-7605

Person re-identification plays a key role in applications where a mobile robot needs to track its users over a long period of time, even if they are partially unobserved for some time, in order to follow them or be available on demand. In this context, deep-learning-based real-time feature extraction on a mobile robot is often performed on special-purpose devices whose computational resources are shared for multiple tasks. Therefore, the inference speed has to be taken into account. In contrast, person re-identification is often improved by architectural changes that come at the cost of significantly slowing down inference. Attention blocks are one such example. We will show that some well-performing attention blocks used in the state of the art are subject to inference costs that are far too high to justify their use for mobile robotic applications. As a consequence, we propose an attention block that only slightly affects the inference speed while keeping up with much deeper networks or more complex attention blocks in terms of re-identification accuracy. We perform an extensive neural architecture search to derive rules for where this attention block should be integrated into the architecture in order to achieve the best trade-off between speed and accuracy. Finally, we confirm that the best-performing configuration on a re-identification benchmark also performs well on an indoor robotic dataset.
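
For illustration, a cheap squeeze-and-excitation-style channel attention gate in the spirit of inexpensive attention; the block proposed in the paper and the placement rules found by the architecture search are not reproduced here:

    import torch
    import torch.nn as nn

    class LightweightChannelAttention(nn.Module):
        # A cheap SE-style gate used as a stand-in illustration; it adds only
        # a global pooling and two 1x1 convolutions at inference time.
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                       # global context per channel
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, x):
            return x * self.gate(x)                            # re-weight channels

    feature_map = torch.rand(4, 256, 24, 8)                    # e.g. a re-id feature map
    print(LightweightChannelAttention(256)(feature_map).shape) # torch.Size([4, 256, 24, 8])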



https://doi.org/10.1109/ICRA48891.2023.10160304
Köhler, Mona; Eisenbach, Markus; Groß, Horst-Michael
Few-shot object detection: a comprehensive survey. - In: IEEE transactions on neural networks and learning systems, ISSN 2162-2388, vol. 0 (2023), 0, pp. 1-21

Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid the need to acquire and annotate these huge amounts of data, few-shot object detection (FSOD) aims to learn from few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in FSOD. We categorize approaches according to their training scheme and architectural layout. For each type of approach, we describe the general realization as well as concepts to improve the performance on novel categories. Whenever appropriate, we give short takeaways regarding these concepts in order to highlight the best ideas. Finally, we introduce commonly used datasets and their evaluation protocols and analyze the reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of FSOD.
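
As an illustration of one approach family typically covered in such surveys (two-stage fine-tuning), the following sketch widens a base-trained detector's box head to novel classes, freezes the backbone, and fine-tunes only the remaining parameters on the few-shot data; torchvision's Faster R-CNN and the class counts are stand-ins, not a specific method from the survey:

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    NUM_BASE_CLASSES, NUM_ALL_CLASSES = 61, 81   # illustrative base/novel split (incl. background)

    # Stage 1: a standard detector would be trained on abundant base-class data here.
    detector = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None,
                                       num_classes=NUM_BASE_CLASSES)

    # Stage 2: replace the box head to cover novel classes, freeze the backbone,
    # and fine-tune the remaining parameters on the few novel-class examples.
    in_features = detector.roi_heads.box_predictor.cls_score.in_features
    detector.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_ALL_CLASSES)
    for p in detector.backbone.parameters():
        p.requires_grad_(False)
    finetune_params = [p for p in detector.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(finetune_params, lr=1e-3, momentum=0.9)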



https://doi.org/10.1109/TNNLS.2023.3265051
Stephan, Benedict; Aganian, Dustin; Hinneburg, Lars; Eisenbach, Markus; Müller, Steffen; Groß, Horst-Michael
On the importance of label encoding and uncertainty estimation for robotic grasp detection. - In: IROS 2022 Kyōto - IEEE/RSJ International Conference on Intelligent Robots and Systems, (2022), pp. 4781-4788

Automated grasping of arbitrary objects is an essential skill for many applications such as smart manufacturing and human-robot interaction. This makes grasp detection a vital skill for automated robotic systems. Recent work in model-free grasp detection uses point cloud data as input and typically outperforms the earlier work on RGB(D)-based methods. We show that RGB(D)-based methods are being underestimated due to suboptimal label encodings used for training. Using the evaluation pipeline of the GraspNet-1Billion dataset, we investigate different encodings and propose a novel encoding that significantly improves grasp detection on depth images. Additionally, we show shortcomings of the 2D rectangle grasps supplied by the GraspNet-1Billion dataset and propose a filtering scheme by which the ground truth labels can be improved significantly. Furthermore, we apply established methods for uncertainty estimation on our trained models since knowing when we can trust the model's decisions provides an advantage for real-world application. By doing so, we are the first to directly estimate uncertainties of detected grasps. We also investigate the applicability of the estimated aleatoric and epistemic uncertainties based on their theoretical properties. Additionally, we demonstrate the correlation between estimated uncertainties and grasp quality, thus improving the selection of high-quality grasp detections. By all these modifications, our approach using only depth images can compete with point-cloud-based approaches for grasp detection despite the lower degree of freedom for grasp poses in 2D image space.
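
Monte Carlo dropout is one established way to obtain the epistemic part of such uncertainty estimates; the tiny fully convolutional quality head below is purely hypothetical and only illustrates the mechanism, not the authors' models or label encoding:

    import torch
    import torch.nn as nn

    def mc_dropout_uncertainty(model, depth_image, num_samples=20):
        """Estimate epistemic uncertainty of a per-pixel grasp quality map by
        keeping dropout active at test time and sampling repeatedly."""
        model.train()                              # keep dropout layers stochastic
        with torch.no_grad():
            samples = torch.stack([model(depth_image) for _ in range(num_samples)])
        return samples.mean(dim=0), samples.var(dim=0)   # mean quality, disagreement

    # Hypothetical tiny quality head for demonstration purposes only
    demo_model = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.Dropout2d(0.2),
        nn.Conv2d(8, 1, 1), nn.Sigmoid(),
    )
    quality, epistemic = mc_dropout_uncertainty(demo_model, torch.rand(1, 1, 64, 64))
    print(quality.shape, epistemic.shape)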



https://doi.org/10.1109/IROS47612.2022.9981866
Eisenbach, Markus; Aganian, Dustin; Köhler, Mona; Stephan, Benedict; Schröter, Christof; Groß, Horst-Michael
Visual scene understanding for enabling situation-aware cobots. - Ilmenau : Universitätsbibliothek. - 1 online resource (2 pages). Publication created as part of the event: IEEE International Conference on Automation Science and Engineering ; 17 (Lyon, France) : 2021.08.23-27, TuBT7 Special Session: Robotic Control and Robotization of Tasks within Industry 4.0

Although a high degree of automation is the objective in the course of Industry 4.0, not every process can be fully automated - especially in versatile manufacturing. In these applications, collaborative robots (cobots) as helpers are a promising direction. We analyze the collaborative assembly scenario and conclude that visual scene understanding is a prerequisite to enable autonomous decisions by cobots. We identify the open challenges in these visual recognition tasks and propose promising new ideas on how to overcome them.



https://doi.org/10.22032/dbt.51471
Balada, Christoph; Eisenbach, Markus; Groß, Horst-Michael
Evaluation of transfer learning for visual road condition assessment. - In: Artificial neural networks and machine learning - ICANN 2021, (2021), pp. 540-551

Through deep learning, major advances have been made in the field of visual road condition assessment in recent years. However, many approaches train from scratch and avoid transfer learning due to the different nature of road surface data and the ImageNet dataset, which is commonly used for pre-training neural networks for visual recognition. We show that, despite the huge differences in the data, transfer learning outperforms training from scratch in terms of generalization. In extensive experiments, we explore the underlying cause by examining various transfer learning effects. Our experiments incorporate seven well-known architectures, making this the first comprehensive study of transfer learning in the field of visual road condition assessment.
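
The two compared settings boil down to initializing the same network once with ImageNet weights and once randomly; a minimal sketch in which torchvision's ResNet-50 and the number of road condition classes are assumptions, not the architectures studied in the paper:

    import torch.nn as nn
    from torchvision.models import resnet50, ResNet50_Weights

    NUM_CONDITION_CLASSES = 6   # assumed number of road condition classes

    def build_model(pretrained: bool) -> nn.Module:
        # Same architecture, two initializations: ImageNet transfer vs. from scratch.
        weights = ResNet50_Weights.IMAGENET1K_V2 if pretrained else None
        model = resnet50(weights=weights)
        model.fc = nn.Linear(model.fc.in_features, NUM_CONDITION_CLASSES)
        return model

    transfer_model = build_model(pretrained=True)    # fine-tuned in the transfer setting
    scratch_model = build_model(pretrained=False)    # baseline trained from scratch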



Aganian, Dustin; Eisenbach, Markus; Wagner, Joachim; Seichter, Daniel; Groß, Horst-Michael
Revisiting loss functions for person re-identification. - In: Artificial neural networks and machine learning - ICANN 2021, (2021), pp. 30-42

Appearance-based person re-identification is very challenging, among other reasons due to changing illumination, image distortion, and differences in viewpoint. Therefore, it is crucial to learn an expressive feature embedding that compensates for changing environmental conditions. There are many loss functions available to achieve this goal. However, it is hard to judge which one is the best. In related work, experiments are carried out on the same datasets, but the use of different setups and different training techniques compromises comparability. Therefore, we compare the most widely used and most promising loss functions under identical conditions on three different setups. We provide insights into why some of the loss functions work better than others and what additional benefits they provide. We further propose sequential training as an additional training trick that improves the performance of most loss functions. In our conclusion, we provide guidance for future usage and research regarding loss functions for appearance-based person re-identification. Source code is available at https://www.tu-ilmenau.de/neurob/data-sets-code/re-id-loss/.
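
For context, the two loss families most commonly combined in appearance-based re-identification (identity cross-entropy and triplet loss) in a minimal sketch; embedding size, margin, and identity count are assumptions, and the paper's comparison protocol is not reproduced:

    import torch
    import torch.nn as nn

    class ReIDLoss(nn.Module):
        # Identity classification loss + triplet metric loss; illustrative only.
        def __init__(self, num_identities, embed_dim=128, margin=0.3):
            super().__init__()
            self.classifier = nn.Linear(embed_dim, num_identities)
            self.ce = nn.CrossEntropyLoss()
            self.triplet = nn.TripletMarginLoss(margin=margin)

        def forward(self, anchor, positive, negative, anchor_labels):
            id_loss = self.ce(self.classifier(anchor), anchor_labels)
            metric_loss = self.triplet(anchor, positive, negative)
            return id_loss + metric_loss

    loss_fn = ReIDLoss(num_identities=751)   # e.g. Market-1501 has 751 training identities
    anchor, positive, negative = torch.rand(16, 128), torch.rand(16, 128), torch.rand(16, 128)
    labels = torch.randint(0, 751, (16,))
    print(loss_fn(anchor, positive, negative, labels).item())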



Eisenbach, Markus
Personenwiedererkennung mittels maschineller Lernverfahren [Person re-identification using machine learning methods]. - In: Ausgezeichnete Informatikdissertationen, vol. 2019 (2021), pp. 59-68