Prof. Dr. Kai-Uwe Sattler





Scalable Stream Processing

Nowadays, data is produced and collected everywhere in our daily lives, leading to a massive amount of information generated every second. However, a significant portion of this information is only useful or valid for a certain time, or is simply too large to be stored for later processing. Batch processing platforms like Hadoop MapReduce do not meet the need for incremental processing of continuous data streams. Therefore, modern big data processing engines combine the scalability of distributed architectures with the one-pass semantics of traditional stream engines.
In our work, we focus on methods and techniques for scalable stream processing

  • with low latency by exploiting modern hardware technology,
  • and for supporting the specification and implementation of complex analysis pipelines by integrating and optimizing user-defined code, customizable rewriting, and code generation for multiple platforms.
For prototyping and evaluating such techniques, we are developing a new stream processing engine called PipeFabric.
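The one-pass semantics mentioned above can be illustrated by a minimal sliding-window aggregate. This is a plain-Python sketch, not PipeFabric code: each tuple is consumed exactly once, and only the current window, never the full stream, is kept in memory.

```python
from collections import deque

def sliding_window_avg(stream, window_size):
    """One-pass sliding-window average: every input value is seen
    exactly once; memory is bounded by the window size."""
    window = deque()
    running_sum = 0.0
    for value in stream:
        window.append(value)
        running_sum += value
        if len(window) > window_size:
            # Evict the oldest value instead of re-scanning the window.
            running_sum -= window.popleft()
        yield running_sum / len(window)

# Example: averages over a window of the last 3 readings
readings = [10, 20, 30, 40, 50]
print(list(sliding_window_avg(readings, 3)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

The incremental update (add the new value, subtract the evicted one) is what distinguishes this from a batch job that would recompute each window from scratch.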

Data Management for Modern Hardware

With technological advancement, the classical view of data management systems and concepts is becoming more and more outdated. Emerging trends like Industry 4.0, the Internet of Things, and eScience impose requirements that conventional data management approaches can no longer fulfill. The challenge is therefore to meet these requirements, for instance by integrating novel data models and adaptable consistency guarantees for such domains without sacrificing performance and flexibility. On top of that, modern sensors deliver huge amounts of data in a short time, which has to be processed in real time on a massively scalable architecture. On the hardware side, emerging storage technologies like NVRAM and shifting processor paradigms like many-core CPUs provide new opportunities to handle these trends.

The research of our group addresses the opportunities offered by modern hardware. On the one hand, many-core CPUs like Intel's Xeon Phi series sacrifice clock speed for massive parallelism. The latest Xeon Phi, Knights Landing, offers up to 72 cores, each supporting four threads. With the AVX-512 instruction set, the register width is doubled compared to its predecessor, enabling much wider SIMD operations. High-bandwidth on-chip memory (MCDRAM) with transfer rates of 320 GB/s opens up entirely new possibilities for data management.
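The data-parallel style that such wide SIMD registers reward can be sketched by contrasting a tuple-at-a-time filter with a vectorized one. NumPy serves here only as a stand-in: its array kernels are compiled to use SIMD instructions internally, though not necessarily AVX-512.

```python
import numpy as np

# Scalar, tuple-at-a-time filter: one comparison per loop iteration.
def scalar_filter(values, threshold):
    return [v for v in values if v > threshold]

# Data-parallel filter: the comparison is applied to whole blocks of
# values at once, the pattern SIMD units execute with a single
# instruction (e.g., sixteen 32-bit comparisons per AVX-512 op).
def vectorized_filter(values, threshold):
    arr = np.asarray(values, dtype=np.float32)
    return arr[arr > threshold]

data = [3.0, 8.5, 1.25, 9.75, 4.5]
print(vectorized_filter(data, 4.0))
```

Both functions keep the values 8.5, 9.75, and 4.5; the point is that the vectorized form expresses the filter as one bulk operation over a column of data rather than a per-tuple branch.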

On the other hand, with the rise of novel storage technologies such as NVRAM and SSDs, the gap between RAM and disk is being dramatically reduced. Especially in the case of NVRAM, byte-addressability and direct persistence enable, e.g., faster recovery and concurrency control mechanisms that meet the requirements of a transactional stream processing system. However, one has to consider how to integrate these new technologies into the conventional memory hierarchy and how to design suitable data structures for them.

In this research field, our group aims to optimally exploit modern hardware for data processing and addresses the following challenges:

  • evaluation of modern data management architectures
  • flexible and scalable data processing
  • providing transactional guarantees

Big Spatio-Temporal Event Data Processing

Spatio-temporal data is ubiquitous: it is generated by sensors and mobile phones that record data at the current time and location, or it is inherently contained in news articles reporting on events that happened at a certain time and place.
Processing such spatio-temporal event data means applying filters, joining one dataset with another using spatio-temporal predicates, or running data mining algorithms to gain new insights for decision support.

Datasets containing such data can quickly grow into the TB or PB range and can no longer be handled easily by a single server. The group researches methods for efficiently processing such spatio-temporal operations in the Big Data context, including:

  • partitioning for parallel processing,
  • spatio-temporal indexing,
  • clustering and skyline computation,
  • filters and joins.
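The first and last of these points can be illustrated by a small sketch. The event records, grid layout, and function names below are hypothetical and purely for illustration (this is not STARK code): events are partitioned into grid cells so different workers can process cells in parallel, and a spatio-temporal predicate filters by bounding box and time interval.

```python
from collections import defaultdict

# Hypothetical event records: (id, x, y, timestamp)
events = [
    ("e1", 0.2, 0.7, 100),
    ("e2", 5.1, 3.3, 105),
    ("e3", 0.4, 0.6, 200),
    ("e4", 5.0, 3.9, 210),
]

def grid_partition(events, cell_size):
    """Assign each event to a fixed grid cell, so cells can be handled
    by different workers in parallel; spatio-temporal predicates then
    only compare events within a cell (neighbour cells omitted here)."""
    cells = defaultdict(list)
    for ev in events:
        _, x, y, _ = ev
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].append(ev)
    return cells

def st_filter(events, bbox, t_range):
    """Spatio-temporal filter: keep events inside a bounding box
    (x0, y0, x1, y1) and a time interval (t0, t1)."""
    (x0, y0, x1, y1), (t0, t1) = bbox, t_range
    return [e for e in events
            if x0 <= e[1] <= x1 and y0 <= e[2] <= y1 and t0 <= e[3] <= t1]

parts = grid_partition(events, cell_size=2.0)
print(sorted(parts))                               # cells (0, 0) and (2, 1)
print(st_filter(events, (0, 0, 1, 1), (0, 150)))   # only e1 qualifies
```

Real systems refine this idea with cost-based or adaptive partitioning and spatio-temporal indexes instead of a fixed uniform grid, but the division of work along space and time is the same.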

In the course of the EventAE project the STARK library for spatio-temporal data processing on Apache Spark is being developed.

Knowledge Engineering and Data Semantics

Especially in the wake of the Internet of Things and further digitization initiatives, the amount of raw data, structured or unstructured, available on the web or locally is increasing immensely. Using this data to obtain insights about specific domains or topics requires a variety of data processing algorithms as well as data storage techniques.
The Semantic Web aims at fact-based storage of information as RDF(S) triples, enriched with deduction rules and ontology-based knowledge. The transformation of raw data into semantically enriched data is achieved by various processing techniques:

  • Natural Language Processing
  • Semantic Analysis/Annotation
  • Knowledge Base Generation
  • Triplification of raw data
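The last of these steps, triplification, can be sketched in a few lines. The URIs and vocabulary below are placeholder assumptions, not a fixed ontology: a flat record is mapped to subject-predicate-object triples.

```python
def triplify(record, subject_prefix="http://example.org/event/"):
    """Turn a flat record (a dict with an 'id' field) into RDF-style
    (subject, predicate, object) triples. The URI prefixes are
    illustrative placeholders, not a standard vocabulary."""
    subject = subject_prefix + record["id"]
    triples = []
    for key, value in record.items():
        if key == "id":
            continue  # the id becomes part of the subject URI
        predicate = "http://example.org/vocab/" + key
        triples.append((subject, predicate, value))
    return triples

record = {"id": "e42", "label": "flood warning", "place": "Ilmenau"}
for s, p, o in triplify(record):
    print(f'<{s}> <{p}> "{o}" .')   # N-Triples-like serialization
```

In practice, an RDF library would handle datatypes, language tags, and proper serialization, and the predicates would come from an ontology rather than ad-hoc URIs.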

Based on the enriched information, semantic search, recommendation, or question answering systems can be built.
The research group focuses on the basic techniques for enriching data, but also on user-facing applications that provide recommendations, for example in the domain of educational data, or that process user questions and return relevant answers, for example for geo-spatial data (cf. project EventAE) or software projects (cf. project AA).