Using consensual biterms from text structures of requirements and code to improve IR-based traceability recovery. - In: The ACM digital library, (2022), 114, 13 pp.
Traceability establishes trace links among software artifacts based on whether two artifacts are related by system functionalities. These traces are valuable for software development but difficult to obtain manually. To cope with costly and fallible manual recovery, automated approaches have been proposed to recover traces through textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, the low quality and quantity of artifact texts negatively impact the calculated IR values, greatly hindering the performance of IR-based approaches. In this study, we propose to extract co-occurring word pairs from the text structures of both requirements and code (i.e., consensual biterms) to improve IR-based traceability recovery. We first collect a set of biterms based on the part-of-speech of requirement texts, and then filter them through the code texts. We then use these consensual biterms both to enrich the input corpus for IR techniques and to enhance the calculation of IR values. An evaluation on nine systems shows that, in general, when solely used to enhance IR techniques, our approach outperforms pure IR-based approaches and another baseline by 21.9% and 21.8% in AP, and 9.3% and 7.2% in MAP, respectively. Moreover, when combined with another enhancing strategy from a different perspective, it outperforms this baseline by 5.9% in AP and 4.8% in MAP.
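The biterm idea can be sketched in a few lines of Python. This is a toy illustration under simplified assumptions (a hand-tagged requirement, a flat set of code identifier terms, a fixed co-occurrence window), not the authors' implementation:

```python
def extract_biterms(tagged_words, window=2):
    """Collect co-occurring word pairs (biterms) within a sliding window,
    keeping only nouns and verbs as a crude part-of-speech filter."""
    words = [w for w, tag in tagged_words if tag in ("NOUN", "VERB")]
    biterms = set()
    for i in range(len(words)):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            biterms.add(tuple(sorted((words[i], words[j]))))
    return biterms

def consensual_biterms(tagged_req, code_terms):
    """Keep only the requirement biterms whose words both occur in code."""
    return {b for b in extract_biterms(tagged_req)
            if b[0] in code_terms and b[1] in code_terms}

# Hypothetical requirement and code vocabulary, for illustration only.
req = [("user", "NOUN"), ("shall", "AUX"), ("update", "VERB"),
       ("account", "NOUN"), ("password", "NOUN")]
code_terms = {"update", "account", "password", "hash"}
pairs = consensual_biterms(req, code_terms)
```

Here `pairs` retains only pairs such as ("account", "update") that the requirement and the code agree on; these can then be appended to the IR input corpus.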
Generalizability of code clone detection on CodeBERT. - In: The ACM digital library, (2022), 143, 3 pp.
Transformer networks such as CodeBERT already achieve outstanding results for code clone detection on benchmark datasets, so one could assume that this task has already been solved. However, code clone detection is not a trivial task. Semantic code clones, in particular, are challenging to detect. We show that the generalizability of CodeBERT decreases when it is evaluated on two different subsets of Java code clones from BigCloneBench. We observe a significant drop in F1 score when evaluating on code snippets and functionality IDs different from those used for model building.
Nutritional assessment of ready-to-eat salads in German supermarkets: comparison of the nutriRECIPE-Index and the Nutri-Score. - In: Foods, ISSN 2304-8158, vol. 11 (2022), 24, 4011, pp. 1-21
Globally, an unbalanced diet causes more deaths than any other factor. Due to a lack of knowledge, it is difficult for consumers to select healthy foods at the point of sale. Although different front-of-pack labeling schemes exist, their informative value is limited due to the small sets of considered parameters and the lack of information on ingredient composition. We developed and evaluated a manufacturer-independent approach to quantify the ingredient composition of 294 ready-to-eat salads (divided into 73 subgroups) as a test set. Nutritional quality was assessed by the nutriRECIPE-Index and compared to the Nutri-Score. The nutriRECIPE-Index comprises the calculation of the energy-adjusted nutrient density of 16 desirable and three undesirable nutrients, which are weighted according to their degree of supply in the population. We show that the nutriRECIPE-Index has stronger discriminatory power than the Nutri-Score and discriminates as well or better in 63 of the 73 subgroups. This was evident in groups where seemingly similar products were compared, e.g., potato salads (Nutri-Score: C only; nutriRECIPE-Index: B, C, and D). Moreover, the nutriRECIPE-Index is adjustable to any target population's specific needs and supply situation, such as those of seniors and children. Hence, a more sophisticated distinction between single food products is possible using the nutriRECIPE-Index.
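The general shape of an energy-adjusted, weighted nutrient-density score can be sketched as below. All nutrient names, weights, and reference values are hypothetical placeholders; the actual nutriRECIPE-Index uses 16 desirable and three undesirable nutrients with population-specific weights:

```python
def nutrient_density_score(nutrients, energy_kcal, weights, reference):
    """Energy-adjusted, weighted nutrient density: each nutrient amount is
    scaled to a per-100-kcal basis, set in relation to a reference intake,
    and weighted by its degree of supply in the population. Desirable
    nutrients get positive weights, undesirable ones negative weights."""
    score = 0.0
    for name, amount in nutrients.items():
        per_100_kcal = amount * 100.0 / energy_kcal
        score += weights[name] * per_100_kcal / reference[name]
    return score

# All numbers below are hypothetical, for illustration only.
salad     = {"fiber_g": 3.0, "vitamin_c_mg": 25.0, "sat_fat_g": 4.0}
weights   = {"fiber_g": 1.5, "vitamin_c_mg": 1.0, "sat_fat_g": -1.0}
reference = {"fiber_g": 30.0, "vitamin_c_mg": 95.0, "sat_fat_g": 20.0}
score = nutrient_density_score(salad, 180, weights, reference)
```

Because the score is computed per 100 kcal, products with very different energy densities remain comparable, which is what gives such an index finer discriminatory power than a coarse letter grade.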
GANSet - generating annotated datasets using Generative Adversarial Networks. - In: IEEE Xplore digital library, ISSN 2473-2001, (2022), pp. 615-620
The prediction of soil moisture for automated irrigation applications is a major challenge, as it is affected by various environmental parameters. The application of Convolutional Neural Networks (CNNs) to this end has shown remarkable results for soil moisture prediction. These models, however, typically need large datasets, which are scarce in the agricultural domain. To this end, this paper presents a Deep Convolutional Generative Adversarial Network (DCGAN) that can learn good data representations and generate highly realistic samples. Traditionally, Generative Adversarial Networks (GANs) have been used to generate data for segmentation and classification tasks, or in conjunction with CNNs or Multi-Layer Perceptrons (MLPs) for regression tasks. In this paper, we propose a novel approach in which GANs jointly generate training images of plants and realistic regression values for their corresponding moisture levels, without the use of any additional network. The generated images and regression targets, together with the training data, are then used to train a CNN, which is evaluated on actual test data from the dataset. We observe a 33 percent improvement in error rate, which shows the validity of our approach.
Flipped classroom: effective teaching for time series forecasting. - Offprint from: Transactions on Machine Learning Research (TMLR). - New York, NY : TMLR, 10/2022. - 1 online resource (pp. 1-36)
Sequence-to-sequence models based on LSTM and GRU are among the most popular choices for forecasting time series data, reaching state-of-the-art performance. Training such models can be delicate, though. The two most common training strategies in this context are teacher forcing (TF) and free running (FR). TF can help the model converge faster but may provoke an exposure bias issue due to the discrepancy between the training and inference phases. FR avoids this but does not necessarily lead to better results, since it tends to make training slow and unstable. Scheduled sampling was the first approach to tackle these issues by picking the best from both worlds and combining them into a curriculum learning (CL) strategy. Although scheduled sampling seems to be a convincing alternative to FR and TF, we found that, even if parametrized carefully, it may lead to premature termination of training when applied to time series forecasting. To mitigate the problems of the above approaches, we formalize CL strategies along the training scale as well as the training iteration scale. We propose several new curricula and systematically evaluate their performance in two experimental sets. For our experiments, we utilize six datasets generated from prominent chaotic systems. We found that the newly proposed increasing training scale curricula combined with a probabilistic iteration scale curriculum consistently outperform previous training strategies, yielding an NRMSE improvement of up to 81% over FR or TF training. For some datasets, we additionally observe a reduced number of training iterations. We also observed that all models trained with the new curricula yield higher prediction stability, allowing for longer prediction horizons.
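The interplay of TF, FR, and a probabilistic curriculum can be sketched as follows. The schedules and the constant `k` are standard scheduled-sampling forms, not the specific curricula proposed in the paper:

```python
import math
import random

def teacher_forcing_prob(epoch, schedule="inverse_sigmoid", k=10.0):
    """Probability of feeding the ground truth (teacher forcing) rather than
    the model's own prediction (free running) at a given training epoch."""
    if schedule == "linear":             # decays to 0 at epoch 2k
        return max(0.0, 1.0 - epoch / (2.0 * k))
    if schedule == "inverse_sigmoid":    # slow start, fast middle, slow end
        return k / (k + math.exp(epoch / k))
    raise ValueError(f"unknown schedule: {schedule}")

def decoder_input(ground_truth, last_prediction, eps, rng=random):
    """Probabilistic iteration-scale curriculum: per decoder step, feed the
    ground truth with probability eps, otherwise the last prediction."""
    return ground_truth if rng.random() < eps else last_prediction
```

Pure TF corresponds to `eps = 1`, pure FR to `eps = 0`; a curriculum moves `eps` between those extremes over the course of training.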
A deep learning approach for direction of arrival estimation using automotive-grade ultrasonic sensors. - In: Journal of physics, ISSN 1742-6596, vol. 2234 (2022), 012009, 12 pp.
In this paper, a deep learning approach is presented for direction of arrival estimation using automotive-grade ultrasonic sensors, which are used for driving assistance systems such as automatic parking. A study and implementation of state-of-the-art deterministic direction of arrival estimation algorithms serves as a benchmark for the performance of the proposed approach. The performance of the proposed algorithms is analyzed against the existing algorithms on simulation data as well as on data from a measurement campaign conducted with automotive-grade ultrasonic sensors. Both sets of results clearly show the superiority of the proposed approach under realistic conditions such as environmental noise and occasional measurement errors. We also demonstrate how the proposed approach can overcome some known limitations of the existing algorithms, such as dilution of precision in triangulation and aliasing.
Towards more effective identification keys: a study of people identifying plant species characters. - In: People and nature, ISSN 2575-8314, vol. 4 (2022), 6, pp. 1603-1615
Accurate species identification is essential for ecological monitoring and biodiversity conservation. Interactive plant identification keys have been considerably improved in recent years, mainly by providing iconic symbols, illustrations, or images for the users, as these keys are also commonly used by people with relatively little plant knowledge. Only a few studies have investigated how well morphological characters can be recognized and correctly identified by people, which is ultimately the basis of an identification key's success. This study consists of a systematic evaluation of people's abilities in identifying plant-specific morphological characters. We conducted an online survey in which 484 participants were asked to identify 25 different plant character states on six images showing a plant from different perspectives. We found that survey participants correctly identified 79% of the plant characters, with botanical novices with little or no previous experience in plant identification performing slightly worse than experienced botanists. We also found that flower characters were more often correctly identified than leaf characters, and that characters with more states resulted in higher identification errors. Additionally, the longer a participant needed for answering, the higher the probability of a wrong answer. Understanding what influences users' plant character identification abilities can improve the development of interactive identification keys, for example, by designing keys that adapt to novices as well as experts. Furthermore, our study can act as a blueprint for the empirical evaluation of identification keys.
Induction of embryogenic development in haploid microspore stem cells in droplet-based microfluidics. - In: Lab on a chip, ISSN 1473-0189, vol. 22 (2022), 22, pp. 4292-4305
This work presents the application of droplet-based microfluidics for the cultivation of microspores from Brassica napus using doubled haploid technology. Under stress conditions (e.g. heat shock) or by chemical induction, a certain fraction of the microspores can be reprogrammed and androgenesis can be induced. This process is an important approach for plant breeding because desired plant properties can be anchored in the germline at a genetic level. However, the reprogramming rate of the microspores is generally very low; increasing it by specific stimulation is therefore both a necessary and a challenging task. To accelerate the optimisation and development process, droplet-based microfluidics can be a promising tool. Here, we used a tube-based microfluidic system for the generation and cultivation of microspores inside nL-droplets. Different factors such as cell density, tube material, and heat shock conditions were investigated to improve the yield of vital plant organoids. Evaluation and analysis of the stimulus response were done on an image basis, aided by an artificial-intelligence cell detection algorithm. Droplet-based microfluidics allowed us to apply large concentration programs in small test volumes and to screen for the best conditions for reprogramming cells with the histone deacetylase inhibitor trichostatin A and for enhancing the yield of vital microspores in droplets. An enhanced reprogramming rate was found under heat shock conditions at 32 °C for about 3 to 6 days. In addition, a comparative experiment with microtiter plates (MTP) showed that droplet cultivation with lower cell density (<10 cells per droplet), or adding media after 3 or 6 days, significantly improves microspore growth and the embryo rate inside 120 nL droplets. Finally, the developed embryos could be removed from the droplets and further grown into mature plants.
Overall, we demonstrated that the droplet-based tube system is suitable for implementation in an automated, miniaturized system to achieve the induction of embryogenic development in haploid microspore stem cells of Brassica napus.
Utilizing traceable software artifacts to improve bug localization. - Ilmenau : Universitätsbibliothek, 2022. - 1 online resource (viii, 142, XXX pages)
Technische Universität Ilmenau, dissertation, 2022
The development of software systems is a complex task. Quality assurance tries to prevent software defects (bugs) in systems, but defects can never be ruled out entirely. Once a bug is discovered, a bug report is typically filed. It serves as the starting point for the developer to locate the defect in the software's source code and fix it (bug fixing). Bug reports, along with other software artifacts such as requirements and the source code itself, are stored in software repositories. These allow the artifacts to be connected via trace links for traceability. Creating such trace links is often mandated by the development process, for example in the aviation and automotive industries, as well as in the development of medical devices. Locating software defects in large systems with thousands of artifacts is a demanding, time-consuming, and error-prone task that requires extensive project knowledge. Automating this process has therefore been an active research topic for years. Furthermore, manually creating and maintaining trace links is perceived as a burden and should be automated as far as possible. This thesis presents a novel bug localization algorithm that actively exploits the existing trace links. The artifacts and their relations are used to build a traceability graph, which is analyzed to locate faulty source code files for a given bug report. However, it must be assumed that not all required trace links between a project's software artifacts have been created. Therefore, a fully automated, project-independent approach is presented that creates these missing trace links (augmentation). The design of this algorithm is grounded in the typical development process of a software project.
The developed approaches were evaluated with more than 32,000 bug reports from 27 open-source projects. The results show that incorporating traceability significantly improves the localization of defects in source code. Furthermore, the developed augmentation algorithm reliably creates missing trace links.
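The core idea of combining a file's direct textual similarity to the bug report with similarity propagated over trace links can be sketched as follows. The file and requirement names, the linear combination, and the weight `alpha` are illustrative assumptions, not the dissertation's actual scoring scheme:

```python
def localize(bug_sim, trace_links, alpha=0.7):
    """Rank source files for a bug report: combine each file's direct textual
    similarity to the report with the best similarity inherited over trace
    links from related artifacts (e.g. requirements traced to that file)."""
    scores = {}
    for f, direct in bug_sim["files"].items():
        linked = [bug_sim["reqs"][r] for r in trace_links.get(f, [])]
        propagated = max(linked, default=0.0)
        scores[f] = alpha * direct + (1 - alpha) * propagated
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical similarities of artifacts to one bug report.
bug_sim = {"files": {"Auth.java": 0.2, "Cart.java": 0.5},
           "reqs": {"REQ-12": 0.9}}
trace_links = {"Auth.java": ["REQ-12"]}   # requirement-to-code trace link
ranking = localize(bug_sim, trace_links)
```

Here `Auth.java` overtakes the textually more similar `Cart.java` because a highly relevant requirement is traced to it; this is the kind of signal a pure textual approach would miss.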
SpikiLi: a spiking simulation of LiDAR based real-time object detection for autonomous driving. - In: 2022 8th International Conference on Event-Based Control, Communication, and Signal Processing (EBCCSP), (2022), 5 pp.
Spiking Neural Networks are a recent neural network design approach that promises tremendous improvements in power efficiency, computational efficiency, and processing latency. They achieve this by using asynchronous spike-based data flow and event-based signal generation and processing, and by modifying the neuron model to closely resemble biological neurons. While some initial works have shown significant evidence of applicability to common deep learning tasks, applications in complex real-world tasks have remained relatively rare. In this work, we first illustrate the applicability of spiking neural networks to a complex deep learning task, namely LiDAR-based 3D object detection for automated driving. Secondly, we demonstrate step by step how to simulate spiking behavior using a pre-trained Convolutional Neural Network. We closely model essential aspects of spiking neural networks in simulation and achieve equivalent run-time and accuracy on a GPU. We expect significant improvements in power efficiency when the model is implemented on neuromorphic hardware.
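The neuron model underlying such networks can be illustrated with a minimal leaky integrate-and-fire simulation; the decay constant and threshold below are arbitrary, and real spiking frameworks operate on event streams rather than Python lists:

```python
def lif_neuron(inputs, tau=0.9, threshold=1.0):
    """Leaky integrate-and-fire: the membrane potential decays by factor tau
    each step, integrates the input current, and emits a spike (1) with a
    reset to zero whenever it crosses the threshold."""
    v, spikes = 0.0, []
    for current in inputs:
        v = tau * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0        # reset after spiking
        else:
            spikes.append(0)
    return spikes
```

A constant sub-threshold input only produces a spike once enough charge has accumulated, which is the event-based, sparse behavior that neuromorphic hardware exploits for power efficiency.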
Object classification on video data of meteors and meteor-like phenomena: algorithm and data. - In: Monthly notices of the Royal Astronomical Society, ISSN 1365-2966, vol. 516 (2022), 1, pp. 811-823
Every moment, countless meteoroids enter our atmosphere unseen. The detection and measurement of meteors offer a unique opportunity to gain insights into the composition of our solar system's celestial bodies. Researchers therefore carry out wide-area sky monitoring to secure 360-degree video material, recording every single meteor entry. Existing machine intelligence cannot accurately recognize events of meteors intersecting the earth's atmosphere due to a lack of publicly available high-quality training data. This work presents four reusable open-source solutions trained on data we collected to fill this gap. We refer to the proposed data set as the NightSkyUCP data set, consisting of a balanced set of 10,000 meteor and 10,000 non-meteor events. Our solutions apply various machine-learning techniques, namely classification, feature learning, anomaly detection, and extrapolation. For the classification task, a mean accuracy of 99.1 per cent is achieved. The code and data are made public at figshare with DOI 10.6084/m9.figshare.16451625.
The potential of multispectral imaging flow cytometry for environmental monitoring. - In: Cytometry, ISSN 1552-4930, vol. 101 (2022), 9, pp. 782-799
Environmental monitoring involves the quantification of microscopic cells and particles such as algae, plant cells, pollen, or fungal spores. Traditional methods using conventional microscopy require expert knowledge, are time-intensive, and are not well-suited for automated high throughput. Multispectral imaging flow cytometry (MIFC) allows measurement of up to 5000 particles per second from a fluid suspension and can simultaneously capture up to 12 images of every single particle for brightfield and different spectral ranges, with up to 60x magnification. The high throughput of MIFC has great potential for increasing the amount and accuracy of environmental monitoring, such as of plant-pollinator interactions, fossil samples, or air, water, and food quality, which currently relies on manual microscopic methods. Automated recognition of particles and cells is also possible when MIFC is combined with deep-learning computational techniques. Furthermore, various fluorescence dyes can be used to stain specific parts of the cell to highlight physiological and chemical features, including the vitality of pollen or algae, the allergen content of individual pollen, the surface chemical composition (carbohydrate coating) of cells, and DNA- or enzyme-activity staining. Here, we outline the great potential of MIFC in environmental research for a variety of research fields and focal organisms. In addition, we provide best practice recommendations.
SVDistNet: self-supervised near-field distance estimation on surround view fisheye cameras. - In: IEEE transactions on intelligent transportation systems, vol. 23 (2022), 8, pp. 10252-10261
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios. Typically, it is achieved using surround-view fisheye cameras focusing on the near-field area around the vehicle. The majority of current depth estimation approaches employ just a single camera and cannot be straightforwardly generalized to multiple cameras. The depth estimation model must work on a variety of cameras fitted to millions of cars with varying camera geometries. Even within a single car, intrinsics vary due to manufacturing tolerances. Deep learning models are sensitive to these changes, and it is practically infeasible to train and test on each camera variant. As a result, we present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input, enabling the model to generalize to previously unseen fisheye cameras. Additionally, we improve the distance estimation by pairwise and patchwise vector-based self-attention encoder networks. We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches. We also show a generalization of our approach across different camera viewing angles and perform extensive experiments to support our contributions. To enable comparison with other approaches, we evaluate the front camera data on the KITTI dataset (pinhole camera images) and achieve state-of-the-art performance among self-supervised monocular methods. An overview video with qualitative results is provided at https://youtu.be/bmX0UcU9wtA. Baseline code and dataset will be made public.
LiMoSeg: real-time Bird's Eye View based LiDAR motion segmentation. - In: (2022), pp. 828-835
LiDAR = Light Detection and Ranging
Moving object detection and segmentation is an essential task in the autonomous driving pipeline. Detecting and isolating static and moving components of a vehicle's surroundings is particularly crucial in path planning and localization tasks. This paper proposes a novel real-time architecture for motion segmentation of Light Detection and Ranging (LiDAR) data. We use two successive scans of LiDAR data in 2D Bird's Eye View (BEV) representation to perform pixel-wise classification as static or moving. Furthermore, we propose a novel data augmentation technique to reduce the significant class imbalance between static and moving objects. We achieve this by artificially synthesizing moving objects through cutting and pasting static vehicles. We demonstrate a low latency of 8 ms on a commonly used automotive embedded platform, namely the Nvidia Jetson Xavier. To the best of our knowledge, this is the first work directly performing motion segmentation in LiDAR BEV space. We provide quantitative results on the challenging SemanticKITTI dataset, and qualitative results are available at https://youtu.be/2aJ-cL8b0LI.
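The cut-and-paste augmentation can be sketched on a toy BEV grid. The label encoding (1 = static vehicle, 2 = moving) and the rectangle-based copy are simplifying assumptions, not the paper's exact procedure:

```python
def paste_static_as_moving(bev, labels, src_box, dst_origin):
    """Class-imbalance augmentation sketch: copy the cells of a static
    vehicle from one Bird's Eye View (BEV) region to another position and
    label the pasted cells as moving (2) instead of static (1)."""
    r0, c0, r1, c1 = src_box          # source rectangle, half-open ranges
    dr, dc = dst_origin               # top-left corner of the paste target
    for r in range(r0, r1):
        for c in range(c0, c1):
            if labels[r][c] == 1:     # only copy static-vehicle cells
                bev[dr + r - r0][dc + c - c0] = bev[r][c]
                labels[dr + r - r0][dc + c - c0] = 2
    return bev, labels

# Tiny 4x4 grid with one static-vehicle cell at (0, 0).
bev = [[5, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
labels = [[1, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
paste_static_as_moving(bev, labels, src_box=(0, 0, 2, 2), dst_origin=(2, 2))
```

The original static cell stays in place while a synthetic "moving" copy appears elsewhere, increasing the minority class without collecting new scans.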
Automatic detection and prediction of discontinuities in laser beam butt welding utilizing deep learning. - In: Journal of advanced joining processes, ISSN 2666-3309, vol. 6 (2022), 100119, pp. 1-11
Laser beam butt welding of thin sheets of high-alloy steel can be challenging due to the formation of joint gaps, which affect weld seam quality. Industrial approaches rely on massive clamping systems to limit joint gap formation. However, those systems have to be adapted for each individual component geometry, making them very cost-intensive and limiting their flexibility. In contrast, jigless welding can be a highly flexible alternative to conventional clamping systems. Based on the collaboration of different actuators, motion systems, or robots, the approach allows almost free workpiece positioning. As a result, jigless welding makes it possible to influence the formation of the joint gap through active position control. However, realizing an active position control requires an early and reliable error prediction to counteract the formation of joint gaps during laser beam welding. This paper proposes different approaches to predict the formation of joint gaps and gap-induced weld discontinuities, in terms of lack of fusion, based on optical and tactile sensor data. Our approach achieves 97.4% accuracy for video-based weld discontinuity detection and a mean absolute error of 0.02 mm in predicting the formation of joint gaps from tactile length measurements with inductive probes.
Estimating food ingredient compositions based on mandatory product labeling. - In: Journal of food composition and analysis, ISSN 0889-1575, vol. 110 (2022), 104508, pp. 1-9
Having a specific understanding of the actual ingredient composition of products helps to calculate additional nutritional information, such as the contained fatty acids, amino acids, minerals, and vitamins, as well as to determine a product's environmental impacts. Unfortunately, producers rarely provide information on how much of each ingredient is in a product. Food manufacturers are, however, required to declare their products with a label comprising an ingredient list (in descending order) and Big7 nutrient values. In this paper, we propose an automated approach for estimating ingredient contents in food products. First, we parse product labels to extract the declared ingredients. Next, we formalize the assumption that the weighted sum of the ingredients' Big7 values, as available from food composition tables, should resemble the product's declared overall Big7 composition. We apply mathematical optimization techniques to find the best-fitting ingredient composition estimate. We apply the proposed method to a dataset of 1804 food products spanning 11 product categories. We find that 76% of these products could be analyzed by our approach, and a composition within the prescribed nutrient tolerances could be calculated, using 20% of the allowed tolerances per Big7 nutrient on average. The remaining 24% of the food products could still be estimated when relaxing one or multiple nutrient tolerances. A study with known ingredient compositions shows that estimates are within a 0.9% difference of products' actual recipes. Hence, the automated approach presented here allows for the analysis of large product quantities and provides possibilities for more intensive nutritional and ecological evaluations of food.
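For the minimal two-ingredient case, the optimization can be sketched with a brute-force least-squares search. The nutrient values below are invented, only two of the Big7 nutrients are used, and the paper's actual method handles arbitrarily many ingredients under explicit tolerance constraints:

```python
def estimate_two_ingredient_mix(product, ing1, ing2, steps=500):
    """Least-squares fit (brute force) of the fraction x of the first-listed
    ingredient so that x*ing1 + (1-x)*ing2 matches the product's declared
    nutrient values; the descending ingredient order implies x >= 0.5."""
    best_x, best_err = 0.5, float("inf")
    for i in range(steps + 1):
        x = 0.5 + 0.5 * i / steps          # search x in [0.5, 1.0]
        err = sum((x * ing1[n] + (1 - x) * ing2[n] - product[n]) ** 2
                  for n in product)
        if err < best_err:
            best_x, best_err = x, err
    return best_x

# Hypothetical per-100-g values for two ingredients and the declared label.
mayo     = {"fat_g": 10.0, "sugar_g": 2.0}
herbs    = {"fat_g": 1.0,  "sugar_g": 8.0}
declared = {"fat_g": 7.3,  "sugar_g": 3.8}   # consistent with a 70/30 mix
x = estimate_two_ingredient_mix(declared, mayo, herbs)
```

The descending-order rule from the ingredient list is what turns an underdetermined problem into a constrained one; with more ingredients, a proper solver (e.g. constrained least squares) replaces the grid search.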
SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks. - In: Data in Brief, ISSN 2352-3409, vol. 42 (2022), 108211, pp. 1-11
Deep learning in plant phenological research: a systematic literature review. - In: Frontiers in plant science, ISSN 1664-462X, vol. 13 (2022), 805738, pp. 1-18
Climate change represents one of the most critical threats to biodiversity, with far-reaching consequences for species interactions, the functioning of ecosystems, and the assembly of biotic communities. Plant phenology research has gained increasing attention as the timing of periodic events in plants is strongly affected by seasonal and interannual climate variation. Recent technological developments have allowed us to gather invaluable data at a variety of spatial and ecological scales. The feasibility of phenological monitoring today and in the future depends heavily on developing tools capable of efficiently analyzing these enormous amounts of data. Deep Neural Networks learn representations from data with impressive accuracy and have led to significant breakthroughs in, e.g., image processing. This article is the first systematic literature review aiming to thoroughly analyze all primary studies on deep learning approaches in plant phenology research. In a multi-stage process, we selected 24 peer-reviewed studies published in the last five years (2016-2021). After carefully analyzing these studies, we describe the applied methods, categorized according to the studied phenological stages, vegetation type, spatial scale, data acquisition methods, and deep learning methods. Furthermore, we identify and discuss research trends and highlight promising future directions. We present a systematic overview of previously applied methods on different tasks that can guide this emerging and complex research field.
Direct data-driven forecast of local turbulent heat flux in Rayleigh-Bénard convection. - In: Physics of fluids, ISSN 1089-7666, vol. 34 (2022), 4, 045106, 14 pp.
A combined convolutional autoencoder-recurrent neural network machine learning model is presented to directly analyze and forecast the dynamics and low-order statistics of the local convective heat flux field in a two-dimensional turbulent Rayleigh-Bénard convection flow at Prandtl number Pr=7 and Rayleigh number Ra=10^7. Two recurrent neural networks are applied for the temporal advancement of turbulent heat transfer data in the reduced latent data space: an echo state network and a gated recurrent unit. Thereby, our work exploits the modular combination of three different machine learning algorithms to build a fully data-driven and reduced model for the dynamics of the turbulent heat transfer in a complex thermally driven flow. The convolutional autoencoder with 12 hidden layers is able to reduce the dimensionality of the turbulence data to about 0.2% of its original size. Our results indicate fairly good accuracy in the first- and second-order statistics of the convective heat flux. The algorithm is also able to reproduce, with some deviations, the intermittent plume-mixing dynamics at the upper edges of the thermal boundary layers. The same holds for the probability density function of the local convective heat flux, with differences in the far tails. Furthermore, we demonstrate the noise resilience of the framework. This suggests that the present model might be applicable as a reduced dynamical model that delivers transport fluxes and their variations to coarse grids of larger-scale computational models, such as global circulation models for atmosphere and ocean.
Graph based mining of code change patterns from version control commits. - In: IEEE transactions on software engineering, ISSN 1939-3520, vol. 48 (2022), 3, pp. 848-863
Detailed knowledge of frequently recurring code changes can be beneficial for a variety of software engineering activities. For example, it is a key step in understanding the process of software evolution, but it is also necessary when developing more sophisticated code completion features that predict likely changes. Previous attempts at automatically finding such code change patterns were mainly based on frequent itemset mining, which essentially finds sets of edits occurring in close proximity. However, these approaches do not analyze the interplay among code elements, e.g., two code objects being named similarly, and thereby miss great potential for identifying meaningful patterns. We present a novel method for the automated mining of code change patterns from Git repositories that captures these context relations between individual edits. Our approach relies on a transformation of source code into a graph representation that keeps the relevant relations present. We then apply graph mining techniques to extract frequent subgraphs, which can be used for further analysis of development projects. We suggest multiple usage scenarios for the resulting pattern type. Additionally, we propose a transformation into complex event processing (CEP) rules, which allows for easier application, especially in event-based auto-completion recommenders or similar tools. For evaluation, we mined seven open-source code repositories. We present 25 frequent change patterns occurring across these projects. We found these patterns to be meaningful, easy to interpret, and mostly persistent across project borders. On average, a pattern from our set appeared in 45 percent of the analyzed code changes.
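The support-counting step behind frequent pattern mining can be illustrated on drastically simplified, single-edge change "graphs" encoded as labelled triples. Real frequent subgraph mining (e.g. gSpan-style algorithms, as graph mining tools use) handles multi-edge subgraphs and isomorphism; the edge labels here are invented:

```python
from collections import Counter

def frequent_patterns(commits, min_support=2):
    """Count edge-level change patterns across commits and keep those that
    reach the support threshold. Each commit is a list of labelled edges
    (source-node label, edge label, target-node label); node identity is
    ignored, so structurally identical edits map onto the same pattern."""
    counts = Counter()
    for edges in commits:
        for pattern in set(edges):   # count each pattern once per commit
            counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical change graphs from three commits.
commits = [
    [("MethodDecl", "renamed", "MethodDecl"), ("Call", "updated", "Call")],
    [("MethodDecl", "renamed", "MethodDecl")],
    [("Field", "added", "Field")],
]
patterns = frequent_patterns(commits)
```

Counting support per commit (rather than per occurrence) is what makes a pattern "recurring across changes" instead of merely frequent within one large commit.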
StickyLocalization: robust end-to-end relocalization on point clouds using graph neural networks. - In: 2022 IEEE Winter Conference on Applications of Computer Vision, (2022), pp. 307-316
Relocalization inside pre-built maps provides a big benefit in today's autonomous driving tasks, where the map can be considered an additional sensor for refining the estimated current pose of the vehicle. Due to potentially large drifts in the initial pose guess, as well as maps containing unfiltered dynamic and temporarily static objects (e.g. parked cars), traditional methods like ICP tend to fail and show high computation times. We propose a novel and fast relocalization method for accurate pose estimation inside a pre-built map based on 3D point clouds. The method is robust against inaccurate initialization caused by low-performance GPS systems and tolerates the presence of unfiltered objects by specifically learning to extract significant features from current scans and adjacent map sections. More specifically, we introduce a novel distance-based matching loss enabling us to simultaneously extract important information from raw point clouds and aggregate inner- and inter-cloud context by utilizing self- and cross-attention inside a Graph Neural Network. We evaluate StickyLocalization's (SL) performance through an extensive series of experiments on two benchmark datasets: relocalization on nuScenes and loop closing on KITTI's odometry dataset. We found that SL outperforms state-of-the-art point cloud registration and relocalization methods in terms of transformation errors and runtime.
PRECODE - a generic model extension to prevent deep gradient leakage. - In: 2022 IEEE Winter Conference on Applications of Computer Vision, (2022), S. 3605-3614
Collaborative training of neural networks leverages distributed data by exchanging gradient information between different clients. Although training data entirely resides with the clients, recent work shows that training data can be reconstructed from such exchanged gradient information. To enhance privacy, gradient perturbation techniques have been proposed. However, they come at the cost of reduced model performance, increased convergence time, or increased data demand. In this paper, we introduce PRECODE, a PRivacy EnhanCing mODulE that can be used as a generic extension for arbitrary model architectures. We propose a simple yet effective realization of PRECODE using variational modeling. The stochastic sampling induced by variational modeling effectively prevents privacy leakage from gradients and in turn preserves the privacy of data owners. We evaluate PRECODE using state-of-the-art gradient inversion attacks on two different model architectures trained on three datasets. In contrast to commonly used defense mechanisms, we find that our proposed modification consistently reduces the attack success rate to 0% while having almost no negative impact on model training and final performance. As a result, PRECODE reveals a promising path towards privacy enhancing model extensions.
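The stochastic sampling that PRECODE's variational modeling relies on is commonly realized via the reparameterization trick. The following is a minimal, hypothetical bottleneck module sketching that mechanism in numpy, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class VariationalBottleneck:
    """Sketch of a variational bottleneck inserted between two layers:
    the input is mapped to a mean and log-variance, and the output is a
    fresh stochastic sample (reparameterization trick), so gradients no
    longer deterministically encode the input."""
    def __init__(self, dim_in, dim_z):
        self.w_mu = rng.standard_normal((dim_in, dim_z)) * 0.1
        self.w_logvar = rng.standard_normal((dim_in, dim_z)) * 0.1

    def forward(self, x):
        mu = x @ self.w_mu
        logvar = x @ self.w_logvar
        eps = rng.standard_normal(mu.shape)      # resampled on every pass
        return mu + np.exp(0.5 * logvar) * eps
```

Because `eps` is resampled on every forward pass, two passes over the same input produce different activations, which is the property that perturbs the gradient information exchanged between clients.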
Image-based automated recognition of 31 Poaceae species: the most relevant perspectives. - In: Frontiers in plant science, ISSN 1664-462X, Bd. 12 (2022), 804140, S. 1-12
Poaceae represent one of the largest plant families in the world. Many species are of great economic importance as food and forage plants while others represent important weeds in agriculture. Although a large number of studies currently address the question of how plants can best be recognized on images, there is a lack of studies evaluating specific approaches for uniform species groups that are considered difficult to identify because they lack obvious visual characteristics. Poaceae represent an example of such a species group, especially when they are non-flowering. Here we present the results of an experiment to automatically identify Poaceae species based on images depicting six well-defined perspectives. One perspective shows the inflorescence, while the others show vegetative parts of the plant, such as the collar region with the ligule, the adaxial and abaxial sides of the leaf, and the culm nodes. For each species we collected 80 observations, each representing a series of six images taken with a smartphone camera. We extract feature representations from the images using five different convolutional neural networks (CNNs) trained on objects from different domains and classify them using four state-of-the-art classification algorithms. We combine these perspectives via score-level fusion. In order to evaluate the potential of identifying non-flowering Poaceae, we separately compared perspective combinations with and without the inflorescence. We find that for a fusion of all six perspectives, using the best combination of feature extraction CNN and classifier, an accuracy of 96.1% can be achieved. Without the inflorescence, the overall accuracy is still as high as 90.3%. In all but one case, the perspective conveying the most information about the species (excluding the inflorescence) is the ligule in frontal view.
Our results show that even species considered very difficult to identify can achieve high accuracies in automatic identification as long as images depicting suitable perspectives are available. We suggest that our approach could be transferred to other difficult-to-distinguish species groups in order to identify the most relevant perspectives.
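Score-level fusion as described above can be illustrated in a few lines of numpy: the per-perspective class-probability vectors are averaged, and the class with the highest fused score is predicted. A minimal sketch with made-up scores:

```python
import numpy as np

def fuse_perspectives(score_lists):
    """Score-level fusion: average the class-probability vectors
    produced for each perspective (e.g. ligule, leaf sides, culm node)
    and predict the class with the highest fused score."""
    fused = np.mean(np.stack(score_lists), axis=0)
    return fused, int(np.argmax(fused))
```

The averaging lets a perspective that discriminates well for a given specimen (such as the ligule in frontal view) outvote less informative ones.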
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery. - In: Empirical software engineering, ISSN 1573-7616, Bd. 27 (2022), 2, 41, insges. 53 S.
Traceability recovery captures trace links among different software artifacts (e.g., requirements and code) when two artifacts cover the same part of the system functionality. These trace links provide important support for developers in software maintenance and evolution tasks. Information Retrieval (IR) is now the mainstream technique for semi-automatic approaches that recover candidate trace links based on textual similarities among artifacts. The performance of IR-based traceability recovery is evaluated by the ranking of relevant traces in the generated lists of candidate links. Unfortunately, this performance is greatly hindered by the vocabulary mismatch problem between different software artifacts. To address this issue, a growing body of enhancing strategies based on user feedback has been proposed to adjust the calculated IR values of candidate links after the user verifies part of these links. However, the improvement brought by such strategies requires a large amount of user feedback, which can be infeasible in practice. In this paper, we propose to improve IR-based traceability recovery by propagating a small amount of user feedback through closeness analysis of call and data dependencies in the code. Specifically, our approach first iteratively asks users to verify a small set of candidate links. The collected frugal feedback is then combined with the quantified functional similarity of each code dependency (called closeness) and the generated IR values to improve the ranking of unverified links. An empirical evaluation based on nine real-world systems with three mainstream IR models shows that our approach can outperform five baseline approaches by using only a small amount of user feedback.
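The propagation idea can be sketched as follows. The update rule shown here — a verified link shifts the IR values of unverified links of the same requirement that are dependency-connected to it, scaled by closeness — is a hypothetical simplification for illustration, not the paper's exact formula:

```python
def propagate_feedback(ir, verified, deps):
    """ir: {(req, code): IR value}; verified: {(req, code): bool};
    deps: {(code_a, code_b): closeness in [0, 1]}.
    A correctly verified link boosts, an incorrect one penalizes,
    dependency-connected unverified links of the same requirement."""
    adjusted = dict(ir)
    for (req, code), correct in verified.items():
        delta = 1.0 if correct else -1.0
        for (a, b), closeness in deps.items():
            other = b if a == code else a if b == code else None
            if other is not None and (req, other) in adjusted \
                    and (req, other) not in verified:
                adjusted[(req, other)] += delta * closeness * ir[(req, code)]
    return adjusted
```

The point of the design is leverage: one verified link re-ranks all of its dependency neighbours at once, which is why frugal feedback suffices.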
Deep security analysis of program code : a systematic literature review. - In: Empirical software engineering, ISSN 1573-7616, Bd. 27 (2022), 1, 2, insges. 39 S.
Due to the continuous digitalization of our society, distributed and web-based applications have become omnipresent, and making them more secure is of paramount relevance. Deep learning (DL) and its representation learning approach are increasingly being proposed for program code analysis, potentially providing a powerful means of making software systems less vulnerable. This systematic literature review (SLR) aims at a thorough analysis and comparison of 32 primary studies on DL-based vulnerability analysis of program code. We found a rich variety of proposed analysis approaches, code embeddings, and network topologies. We discuss these techniques and alternatives in detail. By compiling commonalities and differences in the approaches, we identify the current state of research in this area and discuss future directions. We also provide an overview of publicly available datasets in order to foster stronger benchmarking of approaches. This SLR provides an overview and a starting point for researchers interested in deep vulnerability analysis of program code.
Synaptic scaling - an artificial neural network regularization inspired by nature. - In: IEEE transactions on neural networks and learning systems, ISSN 2162-237X, Bd. 33 (2022), 7, S. 3094-3108
Nature has always inspired the human spirit, and scientists have frequently developed new methods based on observations from nature. Recent advances in imaging and sensing technology allow fascinating insights into biological neural processes. With the objective of finding new strategies to enhance the learning capabilities of neural networks, we focus on a phenomenon closely related to learning tasks and neural stability in biological neural networks, called homeostatic plasticity. Among the theories that have been developed to describe homeostatic plasticity, synaptic scaling has been found to be the most mature and applicable. We systematically discuss previous studies on the synaptic scaling theory and how they could be applied to artificial neural networks. To this end, we use information theory to analytically evaluate how mutual information is affected by synaptic scaling. Based on these analytic findings, we propose two flavors in which synaptic scaling can be applied in the training process of simple and complex, feedforward and recurrent neural networks. We compare our approach with state-of-the-art regularization techniques on standard benchmarks. We found that in our experiments, across a wide range of feedforward and recurrent network topologies and data sets, the proposed method yields the lowest error in both regression and classification tasks compared to previous regularization approaches.
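The biological mechanism behind synaptic scaling is a multiplicative adjustment of a neuron's incoming synapses that drives its average activity toward a set point. A minimal sketch of that core idea, assuming a simple proportional update rule that is not taken from the paper:

```python
import numpy as np

def synaptic_scale(w, activations, target, rate=0.1):
    """Homeostatic scaling sketch: multiplicatively scale each unit's
    incoming weights (columns of w) toward a target average activity.
    Over-active units are damped, under-active units are amplified."""
    avg = activations.mean(axis=0)                       # per-unit average activity
    factor = 1.0 + rate * (target - avg) / (target + 1e-8)
    return w * factor[None, :]
```

Unlike additive weight decay, the multiplicative factor preserves the relative strengths of a unit's inputs, which is the property the biological theory emphasizes.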
Multi-task near-field perception for autonomous driving using surround-view fisheye cameras. - Ilmenau : Universitätsbibliothek, 2021. - 1 Online-Ressource (xxv, 219 Seiten)
Technische Universität Ilmenau, Dissertation 2021
Literaturverzeichnis: Seite 183-219
The formation of eyes triggered the Big Bang of evolution. The dynamics changed from a primitive organism waiting for contact with its food to an organism actively seeking out its food using visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has flaws. Over millions of years, humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships. Automating these capabilities for computers is crucial for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars covers the environment in a range of 0-10 meters with 360° coverage around the vehicle. It is a crucial decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, combined with high-quality sensors such as cameras and LiDARs, have produced mature solutions for visual perception. So far, the focus has been on far-field perception. Another important problem is the limited computing power available for the development of real-time applications. Due to this bottleneck, there is often a trade-off between performance and runtime efficiency. To address these issues, we focus on the following topics: 1) developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks, such as geometric and semantic tasks, using convolutional neural networks.
2) Using multi-task learning to overcome computational bottlenecks by sharing the initial convolutional layers between tasks, and developing optimization strategies that balance the tasks.
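The shared-encoder idea in point 2) can be sketched with dense layers standing in for the shared convolutional stage; the task names, weights, and dimensions below are purely illustrative:

```python
import numpy as np

def shared_encoder(x, w_shared):
    """Stand-in for the shared initial convolutional layers: computed
    once per input, reused by every task head."""
    return np.maximum(0.0, x @ w_shared)   # ReLU activation

def multi_task_forward(x, w_shared, heads):
    """Run the shared stage once and feed its features to each
    task-specific head (e.g. a geometric and a semantic task)."""
    h = shared_encoder(x, w_shared)
    return {name: h @ w for name, w in heads.items()}
```

The runtime saving is exactly the cost of the shared stage: it is paid once instead of once per task, which is what relieves the embedded compute bottleneck.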