Our research focuses on two goals: the scalability and performance of data management systems, and support for big data analytics.
With the ever-increasing amount of data available in enterprises and on the web, produced by devices and sensors, scalability in terms of data volume, the number of server nodes, and the number of users has become a significant challenge. In our research, we address this issue by developing new data management solutions that leverage modern hardware, such as new storage technologies (persistent memory, solid-state disks), cluster technologies, and special-purpose processors such as GPUs and FPGAs. We are also working on approaches that simplify the administration of database systems by automating system management tasks such as physical design.
A second challenge posed by the huge amount of available data is how to deal with information overload, either by finding relevant information or by combining data to extract essential information. In this context, our research focuses on methods for processing and analyzing data that are continuously produced by sensors, data with spatio-temporal properties, and textual and graph data.
Data Management for Modern Hardware
Over the past years, the social and commercial relevance of efficient data management has led to the development of database systems as the foundation of almost all complex software systems. As a consequence, there is wide acceptance of architectural patterns for database systems that are based on assumptions about classic hardware setups. However, the currently used database concepts and systems are not well prepared to support emerging application domains; they require a rethinking of architectures to utilize current and future hardware trends. In our research, we address these challenges by developing data management techniques that exploit modern memory and storage technology such as persistent memory, manycore CPUs, and special-purpose computing units such as GPUs and FPGAs. Our work is part of the DFG priority program SPP2037.
Big Spatio-Temporal Event Data Processing
Spatio-temporal data is ubiquitous: it is generated by sensors and mobile phones that record data at the current time and location, or it is inherently contained in news articles reporting on events that happen at some time and place.
Processing such spatio-temporal event data means applying filters, joining one dataset with another using spatio-temporal predicates, or running data mining algorithms to gain new insights for decision support.
Datasets containing such data can quickly grow to TB or PB scale and can no longer be handled easily by a single server. We work on methods for efficiently processing such spatio-temporal operations in the Big Data context, which include:
- partitioning for parallel processing,
- spatio-temporal indexing,
- clustering and skyline computation,
- filters and joins,
- as well as data stream processing.
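To make the first two operations concrete, the following is a minimal sketch in plain Python (not the actual Spark-based implementation; all names such as `Event`, `grid_cell`, and `st_filter` are hypothetical) of grid-based partitioning and a spatio-temporal range filter:

```python
from datetime import datetime

class Event:
    """A spatio-temporal event: a position plus a timestamp."""
    def __init__(self, x, y, t):
        self.x, self.y, self.t = x, y, t

def grid_cell(ev, cell_size=10.0):
    """Assign an event to a fixed grid cell; the cell id serves as partition key."""
    return (int(ev.x // cell_size), int(ev.y // cell_size))

def partition(events, cell_size=10.0):
    """Group events by grid cell; each cell could be processed on a different node."""
    parts = {}
    for ev in events:
        parts.setdefault(grid_cell(ev, cell_size), []).append(ev)
    return parts

def st_filter(events, x_range, y_range, t_range):
    """Spatio-temporal range filter: keep events inside a space-time window."""
    return [ev for ev in events
            if x_range[0] <= ev.x <= x_range[1]
            and y_range[0] <= ev.y <= y_range[1]
            and t_range[0] <= ev.t <= t_range[1]]
```

In a distributed setting, the same grid assignment also prunes work for joins: only events from the same (or neighboring) cells need to be compared against each other.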
In the course of the EventAE project, we are developing the STARK library for spatio-temporal data processing on Apache Spark.
Knowledge Engineering and Data Semantics
The amount of raw data – structured or unstructured – available on the web or locally has grown immensely, especially in the wake of the Internet of Things and further digitization initiatives. Using this data to obtain insights about specific domains or topics requires several data processing algorithms as well as data storage techniques.
The Semantic Web aims at fact-based storage of information as RDF(S) triples, enriched with deduction rules and ontology-based knowledge. The transformation of raw data into semantically enriched data is achieved by various processing techniques:
- Natural Language Processing
- Semantic Analysis/Annotation
- Knowledge Base Generation
- Triplification of raw data
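As an illustration of the triplification step, here is a minimal sketch in plain Python (the URIs and the `triplify` helper are hypothetical, not the API of a specific tool) that turns a flat record into RDF triples in N-Triples syntax:

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

def triplify(record, subject_uri, base="http://example.org/prop/"):
    """Turn a flat record (dict) into RDF triples in N-Triples syntax.

    Each key becomes a predicate URI under `base`; string values become
    plain literals, integer values get an xsd:integer datatype.
    """
    triples = []
    for prop, value in record.items():
        if isinstance(value, int):
            obj = f'"{value}"^^<{XSD_INT}>'
        else:
            obj = f'"{value}"'
        triples.append(f"<{subject_uri}> <{base}{prop}> {obj} .")
    return triples
```

For example, `triplify({"name": "Berlin", "population": 3600000}, "http://example.org/city/Berlin")` yields one triple per attribute, ready to be loaded into a triple store.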
Based on the enriched information, semantic search, recommendation, or question answering systems can be built.
Our research focuses on the basic techniques for enriching data, but also on user-facing applications that provide recommendations – for example, in the domain of educational data – or process user questions and provide relevant answers – for example, in the domain of geo-spatial data (cf. project EventAE) or software projects (cf. project AA).
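To show how such a question answering frontend can operate on enriched data, the following is a highly simplified sketch in plain Python (the toy knowledge base, the wildcard `query`, and `answer_where_is` are hypothetical): a question is mapped to a triple pattern, which is then matched against stored facts, much like a variable in a SPARQL query:

```python
# Facts stored as (subject, predicate, object) tuples -- a toy knowledge base.
KB = [
    ("Berlin", "locatedIn", "Germany"),
    ("Erfurt", "locatedIn", "Germany"),
    ("Germany", "locatedIn", "Europe"),
]

def query(kb, s=None, p=None, o=None):
    """Match a triple pattern against the knowledge base; None is a wildcard."""
    return [(ts, tp, to) for ts, tp, to in kb
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

def answer_where_is(kb, entity):
    """Answer a 'Where is X?' question by looking up locatedIn facts for X."""
    return [o for _, _, o in query(kb, s=entity, p="locatedIn")]
```

A real system would additionally need natural language processing to extract the entity and the intended predicate from the user's question, and deduction rules to infer, for instance, that Erfurt is also located in Europe.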