Databases and Information Systems Group


Contact

Prof. Dr. Kai-Uwe Sattler

Head

Phone +49 3677 69 4577

Send e-mail


Bachelor's and Master's Theses at the DBIS Group

We continuously offer topics for Bachelor's, Master's, Studienarbeit, and Diplom theses. The assignments are usually embedded in the group's research activities, but also stem from cooperations with companies such as IBM and SAP as well as Thuringian IT companies.

We expect:

  • very good prior knowledge in the database field (e.g., gained through successful participation in the group's advanced courses),
  • a good command of English (the thesis itself can of course be written in German, but the relevant literature is entirely in English),
  • the willingness to commit above-average effort to the work,
  • the ability to work in a team.

We offer:

  • interesting and challenging topics that frequently lead to scientific publications at international venues,
  • intensive and competent supervision: in addition to individual consultations, we hold a weekly research seminar (Oberseminar) that gives all students the opportunity to present and discuss their problems and results.

Procedure

Interested in a topic?

  1. Contact us for further information.
  2. You write a short exposé consisting of a brief problem description, a solution sketch, possible evaluation scenarios, and, most importantly, a work plan.
  3. Based on this exposé, we decide on the assignment and registration of the topic.
  4. Work on the topic starts with a short introductory talk in our research seminar (Oberseminar).

Currently Available Topics

The following topics are currently available and can be worked on in either German or English.

Last modified: 14.06.2017

Adapting Compression in SAP HANA for Effective Pruning

Almost every DBMS relies on index structures to speed up query processing. Typically, various flavors of indexes are available and can be used, depending on the query workload to be supported. DBMSs often rely on structures such as hash-, B-tree-, and bitmap-based indexes, to name only a few. Given its specific storage layout with columnar storage and dictionary compression, the SAP HANA database currently relies on only two main index structures: inverted lists are used to quickly identify tuples that match a particular literal, while B-trees are used for quick (ordered) access to values that are stored in unsorted dictionaries.
In previous research we have integrated so-called small materialized aggregates (SMA) into SAP HANA. It turns out that the aggressive compression used in SAP HANA is at odds with the ability to prune blocks of data that cannot satisfy a predicate. In this project, we want to

  • Investigate how the compression ratio in SAP HANA can be traded for better pruning capabilities using SMA,
  • Enhance the implementation of SAP HANA's compression scheme to also support block-level pruning,
  • Analyze real workloads with regard to the resulting compression ratio and the performance impact of pruning using SMA.

Requirements: experience with C++ and Python; fluency in English or German
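
To illustrate the core idea, here is a minimal sketch of block-level pruning with min/max synopses. It is not SAP HANA code; the Block layout and function names are invented for illustration. A block whose value range cannot contain the search key is skipped without touching (or decompressing) its values:

    // Minimal sketch of block-level pruning with small materialized
    // aggregates (SMA): per-block min/max values let the scan skip
    // blocks whose value range cannot satisfy the predicate.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Block {
        std::vector<int32_t> values;  // compressed in a real system
        int32_t min, max;             // the SMA kept per block
    };

    // Scan with predicate "value == key", pruning blocks via their SMA.
    std::size_t countMatches(const std::vector<Block>& blocks, int32_t key) {
        std::size_t hits = 0;
        for (const auto& b : blocks) {
            if (key < b.min || key > b.max)   // no match possible here
                continue;                     // -> prune the whole block
            for (int32_t v : b.values)
                if (v == key) ++hits;
        }
        return hits;
    }

    int main() {
        std::vector<Block> blocks{{{1, 3, 5}, 1, 5}, {{10, 12, 14}, 10, 14}};
        std::cout << countMatches(blocks, 5) << "\n";  // second block is pruned
    }

How much of this pruning ability survives under HANA's aggressive compression, and at what cost in compression ratio, is exactly the trade-off this topic investigates.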

GPU-based Data Stream Processing using OpenCL

The task of this project is to design and implement operators for data stream processing that exploit the power of GPUs for parallel data processing via OpenCL.

Requirements: knowledge of data structures and query processing, GPU processing, C++ programming skills 
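
For a taste of what such an operator involves, here is a minimal, hypothetical sketch (not PipeFabric code; error handling and resource cleanup omitted) that offloads the selection predicate of one stream batch to the GPU via the standard OpenCL C API:

    // Sketch: batch-at-a-time stream filter whose predicate runs on the GPU.
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    static const char* kSource = R"(
    __kernel void filter_gt(__global const int* in,
                            __global int* flags,
                            const int threshold) {
        size_t i = get_global_id(0);
        flags[i] = (in[i] > threshold) ? 1 : 0;   // selection predicate
    }
    )";

    int main() {
        std::vector<int> batch{3, 42, 7, 99, 15};   // one stream batch
        std::vector<int> flags(batch.size(), 0);

        cl_platform_id platform; clGetPlatformIDs(1, &platform, nullptr);
        cl_device_id device;
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
        cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, nullptr);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, nullptr);
        clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
        cl_kernel kernel = clCreateKernel(prog, "filter_gt", nullptr);

        size_t bytes = batch.size() * sizeof(int);
        cl_mem dIn = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                    bytes, batch.data(), nullptr);
        cl_mem dFlags = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, nullptr, nullptr);

        int threshold = 10;
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &dIn);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), &dFlags);
        clSetKernelArg(kernel, 2, sizeof(int), &threshold);

        size_t global = batch.size();
        clEnqueueNDRangeKernel(q, kernel, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
        clEnqueueReadBuffer(q, dFlags, CL_TRUE, 0, bytes, flags.data(), 0, nullptr, nullptr);

        for (size_t i = 0; i < batch.size(); ++i)
            if (flags[i]) std::printf("%d\n", batch[i]);  // forward matching tuples
        return 0;
    }

The interesting design questions (batch sizes, overlapping transfer and computation, stateful operators such as joins) start exactly where this sketch ends.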

A Test Framework for Semantics of Data Stream Processing Platforms

The goal of this project is to design and develop a test framework for a basic set of data stream operators that allows testing, comparing, and evaluating the semantics of different platforms such as Spark, Storm, and PipeFlow.

Requirements: knowledge of data stream processing, C++ and/or Scala programming skills
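
A sketch of the shape such a test case could take, assuming nothing about the platforms under test: the case pairs a timestamped input stream with the output a correct tumbling-window sum must produce. The platform-specific bindings that would run the same case on Spark, Storm, or PipeFlow are deliberately not shown:

    // Sketch: a platform-independent semantics test case for a
    // tumbling-window sum operator (reference semantics included).
    #include <cassert>
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Tuple { int64_t ts; int value; };   // event time in seconds

    // Reference semantics: sum per tumbling window of `size` seconds.
    std::map<int64_t, int> tumblingSum(const std::vector<Tuple>& in, int64_t size) {
        std::map<int64_t, int> out;            // window start -> sum
        for (const auto& t : in) out[(t.ts / size) * size] += t.value;
        return out;
    }

    int main() {
        std::vector<Tuple> input{{0, 1}, {5, 2}, {10, 3}, {19, 4}};
        std::map<int64_t, int> expected{{0, 3}, {10, 7}};
        // A real framework would feed `input` to each platform under
        // test and compare the collected results against `expected`.
        assert(tumblingSum(input, 10) == expected);
        return 0;
    }

Differences between platforms typically show up precisely in such corner cases: window boundaries, out-of-order tuples, and trigger timing.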

Graph Operations for Dataflow Programs

Dataflow languages like Pig Latin make it easy to formulate programs that process large amounts of data. Pig's relational model and operators, however, make it hard to formulate programs that have to process graph data. The goal of this topic is to integrate graph operations as operators, along with appropriate underlying data structures (e.g., matrices), into our Piglet system, based on previous work.

Requirements: Scala programming skills, understanding of Big Data frameworks (Spark, Flink), knowledge of graph operations
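
As a minimal illustration of a matrix-backed graph operation (a sketch only, not Piglet code, which is Scala-based): one breadth-first expansion step can be expressed as a Boolean matrix-vector product over the adjacency matrix.

    // Sketch: frontier expansion as a Boolean matrix-vector product,
    // i.e., next[j] = OR over i of (frontier[i] AND adj[i][j]).
    #include <cstddef>
    #include <iostream>
    #include <vector>

    using Matrix = std::vector<std::vector<bool>>;  // adjacency matrix
    using Frontier = std::vector<bool>;

    Frontier expand(const Matrix& adj, const Frontier& frontier) {
        std::size_t n = adj.size();
        Frontier next(n, false);
        for (std::size_t i = 0; i < n; ++i)
            if (frontier[i])
                for (std::size_t j = 0; j < n; ++j)
                    if (adj[i][j]) next[j] = true;
        return next;
    }

    int main() {
        Matrix adj{{0, 1, 0}, {0, 0, 1}, {0, 0, 0}};  // edges 0->1->2
        Frontier f{true, false, false};               // start at vertex 0
        f = expand(adj, f);                           // reaches {1}
        for (std::size_t v = 0; v < f.size(); ++v)
            if (f[v]) std::cout << v << "\n";         // prints 1
    }

Cast this way, graph traversal becomes a sequence of (sparse) linear-algebra operators, which is exactly the representation a dataflow engine can execute and optimize.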

Linear Algebra Operations for Data Stream Processing

The goal of this topic is to integrate matrices as first-class citizens (data types + operations) into a data stream processing framework (PipeFabric or Piglet). Working on this topic requires designing, implementing, and evaluating a basic set of data structures and processing operators.

Requirements: Basic knowledge of database / data stream systems, linear algebra, C++ or Scala skills
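
The following sketch (not PipeFabric or Piglet code; all names are invented) shows what "matrices as first-class citizens" could mean operationally: matrices flow through the query as ordinary stream elements, and a map-style operator applies a linear-algebra operation to each of them.

    // Sketch: matrices as stream elements plus a map-style operator.
    #include <array>
    #include <functional>
    #include <iostream>

    using Mat2 = std::array<std::array<double, 2>, 2>;  // tiny dense matrix

    Mat2 multiply(const Mat2& a, const Mat2& b) {
        Mat2 c{};
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int k = 0; k < 2; ++k)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // A minimal "operator": applies a function to every stream element.
    struct MapOp {
        std::function<Mat2(const Mat2&)> fn;
        std::function<void(const Mat2&)> next;          // downstream operator
        void process(const Mat2& m) { next(fn(m)); }
    };

    int main() {
        Mat2 rot{{{0, -1}, {1, 0}}};                    // 90-degree rotation
        MapOp op{[&](const Mat2& m) { return multiply(rot, m); },
                 [](const Mat2& m) { std::cout << m[0][0] << "\n"; }};
        op.process(Mat2{{{1, 0}, {0, 1}}});             // feed the identity; prints 0
    }

The real work starts with large and sparse matrices: choosing storage formats and deciding which operations can be computed incrementally as new elements arrive.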

Dataflow Algorithms for Processing High-Resolution Raster Data (Use Case)

For the investigation of wafers with integrated circuits, TU Ilmenau uses a high-resolution camera (nanopositioner) to create depth images of the surface. The data produced this way is much too large to fit into memory and thus needs to be processed on the fly and in a distributed fashion. For this purpose, modern data stream processing engines such as Spark, Flink, or our internal project PipeFabric are available. The goal of this topic is to find and evaluate various iterative approaches for processing such high-resolution raster data, based on previous work.

Requirements: Basic knowledge of database / data stream systems, C++ or Scala Skills
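
A minimal sketch of the tile-wise, incremental style of processing this implies (the Tile format and all names are illustrative assumptions, not the actual nanopositioner data format): tiles arrive as stream elements and a running aggregate is maintained using memory proportional to one tile, never the full raster.

    // Sketch: incremental, tile-wise aggregation over a raster that
    // does not fit into memory (here: tracking the global minimum depth).
    #include <algorithm>
    #include <iostream>
    #include <limits>
    #include <vector>

    struct Tile {
        int x0, y0;                 // tile origin in the full raster
        std::vector<float> depth;   // row-major depth values of this tile
    };

    struct MinDepthAggregate {
        float current = std::numeric_limits<float>::max();
        void consume(const Tile& t) {   // one tile at a time, O(tile) memory
            for (float d : t.depth) current = std::min(current, d);
        }
    };

    int main() {
        MinDepthAggregate agg;
        agg.consume({0, 0, {3.5f, 2.1f, 4.0f}});
        agg.consume({0, 64, {1.7f, 2.9f}});
        std::cout << agg.current << "\n";  // prints 1.7
    }

Iterative algorithms (e.g., surface reconstruction or smoothing) additionally need tile neighborhoods and multiple passes, which is where the dataflow engines mentioned above come in.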

Physical Algebra Operators for Processing Big Matrices

To process big data structures such as sensor matrices, it is often useful to divide the data into multiple smaller partitions beforehand. One can then scatter these partitions, run subqueries on them in parallel, and finally gather/merge the partial results in order to handle large tuples efficiently. The goal of this topic is to develop such a processing model together with suitable partitioning methods and to integrate them into our internal project PipeFabric.

Requirements: Basic knowledge of database / data stream systems, Big Data processing models, linear algebra, C++ Skills
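
A minimal sketch of the scatter/gather pattern described above, using plain C++ futures instead of PipeFabric's operator model (the row-block partitioning and the row-sum subquery are illustrative assumptions):

    // Sketch: scatter a matrix into row partitions, run the subquery
    // on each partition in parallel, then gather/merge the partials.
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <vector>

    using Matrix = std::vector<std::vector<double>>;

    double partialSum(const Matrix& m, std::size_t begin, std::size_t end) {
        double s = 0.0;
        for (std::size_t r = begin; r < end; ++r)
            s = std::accumulate(m[r].begin(), m[r].end(), s);
        return s;
    }

    int main() {
        Matrix m(1000, std::vector<double>(100, 1.0));  // stand-in for a big matrix
        std::size_t parts = 4, rows = m.size();
        std::vector<std::future<double>> futures;
        for (std::size_t p = 0; p < parts; ++p)          // scatter
            futures.push_back(std::async(std::launch::async, partialSum,
                                         std::cref(m), p * rows / parts,
                                         (p + 1) * rows / parts));
        double total = 0.0;
        for (auto& f : futures) total += f.get();        // gather/merge
        std::cout << total << "\n";                      // prints 100000
    }

The thesis questions hide in the choices this sketch hardcodes: how to partition (rows, columns, blocks), how to route partitions to operator instances, and how to merge for operations whose results are not simply additive.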

Investigation and Evaluation of Relational Operators on a Data Stream Processing Framework

Different relational operators are needed to process tuples from data streams. Join operations in particular, which combine tuples from two or more data streams, are fundamental for deriving information that cannot be obtained by regarding the streams separately. The goal of this topic is to investigate the most commonly used relational operators in stream processing (with a focus on joins) and to analyze them with respect to the opportunities offered by modern hardware. Selected operators should then be implemented and benchmarked on multicore and manycore processors within our internal stream processing framework PipeFabric to validate the results.

Requirements: Basic knowledge of database / data stream systems, C++ skills (understanding templates, pointers and runtime-aware implementations)
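
Among stream join algorithms, the symmetric hash join is the textbook starting point; the sketch below shows its core. It is not PipeFabric's implementation, and windowing/eviction, which any real operator needs to bound state, is omitted:

    // Sketch: symmetric hash join over two streams -- each arriving
    // tuple is inserted into its own hash table and immediately probed
    // against the other stream's table.
    #include <iostream>
    #include <string>
    #include <unordered_map>

    struct SymmetricHashJoin {
        std::unordered_multimap<int, std::string> left, right;

        void fromLeft(int key, const std::string& v) {
            left.emplace(key, v);                       // build own side
            auto range = right.equal_range(key);        // probe other side
            for (auto it = range.first; it != range.second; ++it)
                std::cout << v << " |x| " << it->second << "\n";
        }
        void fromRight(int key, const std::string& v) {
            right.emplace(key, v);
            auto range = left.equal_range(key);
            for (auto it = range.first; it != range.second; ++it)
                std::cout << it->second << " |x| " << v << "\n";
        }
    };

    int main() {
        SymmetricHashJoin join;
        join.fromLeft(1, "a");     // no partner yet
        join.fromRight(1, "b");    // emits "a |x| b"
        join.fromLeft(2, "c");
        join.fromRight(1, "d");    // emits "a |x| d"
    }

Modern-hardware questions then come on top: partitioning the hash tables across cores, cache-conscious layouts, and SIMD-friendly probing.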

Visualization of Big Spatio-Temporal Data

Spatial and spatio-temporal data is ubiquitous: smart phones and other GPS-enabled devices track positions and movements, sensors monitor traffic or environmental parameters, and news articles report events that happen at some location and at some point in time. For visualization, interactive map providers like OpenStreetMap or Google Maps can be used. However, such spatial or spatio-temporal datasets often contain a large number of data points, which browser-based (JavaScript) map engines can no longer handle. The goal of this task is to develop a scalable image generator that is capable of rendering the content of large spatial or spatio-temporal data sets in the context of our STARK framework. References for such images can be found in the GeoMesa and GeoWave projects.

Requirements: Scala, Java, basic understanding of Spark / Hadoop
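
The core rendering idea, shown here as a language-neutral C++ sketch independent of STARK and Spark (which are Scala-based; names and the grid format are invented): instead of shipping millions of points to the browser, aggregate them server-side into a fixed-size pixel grid and ship only the image-sized result.

    // Sketch: bin geo-points into a pixel grid (density/heat map);
    // the grid, not the raw points, is what gets rendered and shipped.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Grid {
        int width, height;
        double minLon, maxLon, minLat, maxLat;          // world bounds
        std::vector<uint32_t> counts;                   // one cell per pixel
        Grid(int w, int h, double lo1, double lo2, double la1, double la2)
            : width(w), height(h), minLon(lo1), maxLon(lo2),
              minLat(la1), maxLat(la2), counts(w * h, 0) {}

        void add(double lon, double lat) {              // streamed per point
            int px = static_cast<int>((lon - minLon) / (maxLon - minLon) * (width - 1));
            int py = static_cast<int>((lat - minLat) / (maxLat - minLat) * (height - 1));
            if (px >= 0 && px < width && py >= 0 && py < height)
                counts[py * width + px]++;
        }
    };

    int main() {
        Grid g(256, 256, -180.0, 180.0, -90.0, 90.0);
        g.add(10.9, 50.68);                             // e.g., a point near Ilmenau
        g.add(10.9, 50.68);
        for (std::size_t i = 0; i < g.counts.size(); ++i)
            if (g.counts[i]) std::cout << i << ": " << g.counts[i] << "\n";
        // the counts grid would then be color-mapped into a PNG tile
    }

In the actual task, the binning would run distributed over Spark partitions within STARK, with per-partition grids merged before the image is encoded.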

Lock-Free Data Structures for Data Stream Processing

High throughput and low latency are key requirements for data stream processing. They are mainly achieved by parallelizing operators and data within a query. When multiple threads run in parallel and exchange data, an efficient exchange mechanism is needed, or performance will decrease significantly. The goal of this topic is to investigate state-of-the-art lock-free data structures through own implementations, benchmarking throughput and latency on example queries. The results are to be compared against an existing lock-based data structure used for data exchange within our internal stream processing framework PipeFabric.

Requirements: Basic knowledge of database / data stream systems, C++ skills
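
For a flavor of what such structures look like, here is a sketch of a bounded single-producer/single-consumer queue built only from atomics, the simplest lock-free exchange structure one would benchmark against a lock-based variant (a simplified textbook design, not PipeFabric code):

    // Sketch: bounded lock-free SPSC ring buffer. The producer owns
    // tail, the consumer owns head; acquire/release ordering makes the
    // buffer writes visible to the other side without any lock.
    #include <atomic>
    #include <cstddef>
    #include <iostream>
    #include <thread>

    template <typename T, std::size_t N>   // N must be a power of two
    class SpscQueue {
        T buf[N];
        std::atomic<std::size_t> head{0}, tail{0};
    public:
        bool push(const T& v) {            // called by the producer only
            auto t = tail.load(std::memory_order_relaxed);
            if (t - head.load(std::memory_order_acquire) == N) return false; // full
            buf[t % N] = v;
            tail.store(t + 1, std::memory_order_release);
            return true;
        }
        bool pop(T& v) {                   // called by the consumer only
            auto h = head.load(std::memory_order_relaxed);
            if (h == tail.load(std::memory_order_acquire)) return false;     // empty
            v = buf[h % N];
            head.store(h + 1, std::memory_order_release);
            return true;
        }
    };

    int main() {
        SpscQueue<int, 1024> q;
        std::thread producer([&] { for (int i = 0; i < 100000; ++i) while (!q.push(i)) {} });
        long long sum = 0;
        for (int got = 0, v; got < 100000; )
            if (q.pop(v)) { sum += v; ++got; }
        producer.join();
        std::cout << sum << "\n";          // 0 + 1 + ... + 99999
    }

Multi-producer/multi-consumer variants, batching, and backpressure are where the design space, and the benchmarking work of this topic, really opens up.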

Ongoing and Completed Theses

On this page you will find a list of ongoing and completed theses from recent years.

LaTeX Templates

Here you can obtain templates for Bachelor's and Master's theses that are intended to ease getting started with document preparation in LaTeX. These are not mandatory templates; rather, they are meant as a guideline for students with little LaTeX experience.

  • Template for Bachelor's, Master's, and Diplom theses
  • Template for seminar (Hauptseminar) reports