Research
Exploiting Modern Hardware for Database Processing
Since the advent of the first relational database systems in the 1970s, the hardware landscape has changed significantly. Fast solid-state disks (SSDs) are replacing magnetic disks, leading to considerably faster I/O rates; dropping RAM prices allow entire databases to be stored in main memory; multi-core CPUs offer possibilities for small-scale parallel processing; and new programming models such as OpenCL or CUDA ease the development of massively parallel algorithms for specialized coprocessors such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs).
These changes pose new challenges to developers of database management systems who want to maximize performance. First, exploiting the processing power of all available (co)processors requires that database operations be divided into fine-grained, independent subtasks. Second, executing an algorithm may involve significant overheads for data transfers to the executing processing unit, e.g., loading data into CPU caches or explicit transfers to GPU devices. Since these overheads can quickly devour any performance gain, it becomes important to know where data actually resides, and sophisticated cost models are needed that take all hardware-dependent parameters into account.
The research of our group focuses on hybrid CPU/GPU systems that combine the strengths of each processing unit and use the entire available processing power through wise scheduling decisions.
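The trade-off between processing speed and transfer overhead described above can be illustrated with a deliberately simple cost model. This is a minimal sketch under stated assumptions, not the group's actual scheduler: the `Device` class, the `estimated_cost` and `choose_device` functions, and all throughput, bandwidth, and startup figures are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical description of a processing unit (all figures are assumptions)."""
    name: str
    throughput_bytes_per_s: float  # assumed processing rate for one operator
    transfer_bytes_per_s: float    # assumed bandwidth for moving input to the device
    startup_s: float = 0.0         # assumed fixed launch overhead


def estimated_cost(device, input_bytes, data_resident):
    """Estimate runtime in seconds: startup + optional transfer + compute."""
    transfer = 0.0 if data_resident else input_bytes / device.transfer_bytes_per_s
    compute = input_bytes / device.throughput_bytes_per_s
    return device.startup_s + transfer + compute


def choose_device(devices, input_bytes, resident_on):
    """Pick the device with the lowest estimated cost.

    resident_on is the set of device names that already hold the input data,
    so no transfer cost is charged for them.
    """
    return min(
        devices,
        key=lambda d: estimated_cost(d, input_bytes, d.name in resident_on),
    )


# Illustrative numbers only; real systems would calibrate these per operator.
cpu = Device("cpu", throughput_bytes_per_s=2e9, transfer_bytes_per_s=1e10)
gpu = Device("gpu", throughput_bytes_per_s=2e10, transfer_bytes_per_s=8e9,
             startup_s=1e-3)

small = choose_device([cpu, gpu], input_bytes=1e6, resident_on={"cpu"})
large = choose_device([cpu, gpu], input_bytes=1e9, resident_on={"cpu"})
print(small.name, large.name)
```

With these assumed figures, the small input stays on the CPU because the GPU's startup and transfer overhead outweighs its faster processing, while the large input is worth shipping to the GPU. A realistic cost model would add further hardware-dependent parameters such as cache behavior and operator selectivity.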
Data Stream Analytics & Sensor Data Processing
Aspects of data mining, such as clustering, finding frequent itemsets, and building decision trees, are classical questions of database research and are used in a wide variety of applications, such as systems for intrusion detection and traffic observation. In combination with data streams, these problems present new challenges and demand modified solutions. Data streams are characterized by their (quasi-)infiniteness, the resulting size of the data sets, and their varying arrival rates. The main problems result from the limited resources available combined with real-time requirements. When answering data mining questions on streams, we usually look at only a small portion of the whole data at any one time and have to process it in one pass, which means each element may be seen only once. First approaches are based on window techniques, which allow a selected part of the whole stream to be processed. The focus of this project lies on further developing these approaches, implementing new strategies, and evaluating their quality.
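The window technique mentioned above can be sketched as a count-based sliding window that keeps item frequencies up to date in a single pass. This is a minimal illustration, not the project's implementation: the class name `SlidingWindowCounter`, its methods, and the `min_support` parameter are hypothetical, and the sketch counts single items rather than full itemsets.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Tracks item frequencies over the most recent window_size stream elements."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window = deque()    # elements currently inside the window
        self.counts = Counter()  # frequency of each item inside the window

    def add(self, item):
        """Consume one stream element; each element is processed exactly once."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]  # keep memory bounded by the window

    def frequent(self, min_support):
        """Return items whose relative frequency in the window is >= min_support."""
        threshold = min_support * len(self.window)
        return {item for item, count in self.counts.items() if count >= threshold}


counter = SlidingWindowCounter(window_size=4)
for element in "aabacbbb":
    counter.add(element)
print(counter.frequent(min_support=0.5))
```

Memory use is bounded by the window size regardless of how long the stream runs, which is the property that makes window techniques attractive under the resource limits described above.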
Further Readings
- Data stream engine AnduIN
- Analysis for Facility Management Customer Bautronic System (CBS)
Funded by the BMBF under grant 03WKBD2B (2007-2010)
Database Services & Cloud Data Management
In recent years, big companies like Amazon, Google, and Microsoft have driven the rise of cloud computing. Today, a variety of cloud services is available on the market, ranging from infrastructure and storage to software as a service. Although the basic technologies for building cloud services are well known from research on distributed database systems and P2P systems, new challenges arise from the goal of cloud computing: giving an unlimited number of customers the impression of unlimited resources for a reasonable amount of money. One big issue is making cloud data management and cloud services scalable and highly available. However, this usually leads to weaker consistency, which requires advanced techniques for enabling tailor-made transaction processing. Giving away sensitive data also raises questions concerning data privacy. Last but not least, resource planning and service provisioning are very important, since cloud computing is not only a technology but also a business.
Research in our group mainly focuses on product configuration in cloud environments. The challenges here are developing techniques for i) guaranteeing scalable data management and query processing, ii) supporting multiple tenants, and iii) providing transactional guarantees. This work is part of the PC2 project.
Self Tuning in Database Systems
The aim of this work is to reduce the effort of administering database systems by performing typical DBA tasks automatically. As a specific task, we address the problem of automatic index selection and creation. Designed as an extension of so-called index wizards, which are available in modern DBMSs, our approach tries to identify potentially beneficial indexes by analyzing running queries and builds them on the fly. A first result is the QUIET system, which does this on top of IBM DB2.
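The idea of identifying beneficial indexes from running queries can be sketched as follows. This is not the QUIET algorithm, only a hedged illustration: the `IndexAdvisor` class, the single `build_cost` threshold, and the per-query cost estimates passed to `observe` are all assumptions; in a real system the estimates would come from the DBMS optimizer.

```python
from collections import defaultdict


class IndexAdvisor:
    """Recommends an index once its accumulated estimated savings cover its build cost."""

    def __init__(self, build_cost):
        self.build_cost = build_cost     # assumed uniform cost of creating any index
        self.saved = defaultdict(float)  # (table, column) -> accumulated savings
        self.built = set()               # indexes already recommended

    def observe(self, table, column, scan_cost, index_cost):
        """Record one query filtering on table.column.

        scan_cost and index_cost are assumed cost estimates for running the
        query without and with an index. Returns the (table, column) key when
        an index becomes worthwhile, otherwise None.
        """
        key = (table, column)
        if key in self.built:
            return None
        self.saved[key] += max(0.0, scan_cost - index_cost)
        if self.saved[key] >= self.build_cost:
            self.built.add(key)
            return key
        return None


advisor = IndexAdvisor(build_cost=100.0)
for _ in range(3):
    recommendation = advisor.observe("orders", "customer_id",
                                     scan_cost=50.0, index_cost=5.0)
    print(recommendation)
```

Here each query saves an estimated 45 cost units, so the index is recommended on the third observation, once the accumulated savings exceed the assumed build cost of 100. Deferring the decision until the savings are observed is one simple way to avoid building indexes that the workload never amortizes.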
Funded Projects
Folksonomy based text mining
(2010 - 2012, Funded by the TAB under grant 2010FE9007)
The intelligent management of documents is one of the main challenges of modern information systems.
Due to the huge number of documents and the different kinds of data, finding relevant and interesting content is often a time-consuming task. This problem is exacerbated in large environments such as universities or large enterprises. The main goal of the TOPIEK project is the development of a collaborative document management system based on user-defined folksonomies. TOPIEK is a cooperation between NT.AG (Erfurt) and the DBIS group.
Read more on TOPIEK
Product Configuration in the Cloud (PC2)
(2011 - 2013, Funded by the TAB under grant 2011 FE 9005)
Mass customization as an innovative production concept requires flexible configuration of products. The diversified distribution of necessary product data, e.g., descriptions, prices, or pictures, is a complex process. At the same time, product life cycles are getting shorter. Software solutions for product configuration usually comprise a database for product components, a system of rules, and an interactive tool for defining configurations. Several problems arise with respect to this approach. A promising solution to these problems is the implementation of a configurator as a Web-based service that can be connected to several databases. Hence, the overall goal of this project is the development of a Cloud-based software solution for collaboratively creating, managing, and processing electronic data for modeling configurable products.
Read more on PC2
Self-Organized Information Management for Mobile Communication in Disaster Scenarios
(2009-2012, funded by DFG as part of the International Graduate School on Mobile Communications)
In disaster scenarios, the quantity of data increases quickly as the recovery organization progresses, as does the number of computers that can be expected to offer computational support. A quick, self-organizing, scalable, and well-balanced solution to data management is to distribute the data among these computers. However, the wireless links, power restrictions, high failure rates, and network partitioning expected in disaster scenarios present additional challenges, and further developments are necessary to maximize robustness while conserving network resources. Current approaches to conserving network resources are limited to reducing message latency, at the cost of scalability and load balancing, and largely fail to prepare the system for possible partitioning. This project aims to overcome these weaknesses by designing network overlays for highly dynamic systems that take into account additional parameters such as location, node reliability, and power availability, and by developing corresponding replication protocols that place and maintain data at strategic positions throughout the network.
Finished Projects
Cooperative Transaction Management for XML
(2008-2010, Funded by DFG under research grant SA 782/15)
The requirements of the media production process, a current non-standard application of database technologies, can be met by classical relational database systems only to a limited extent. In addition to flexible yet transactional management of hierarchical (scene) structures, support for a cooperative working process is required across a wide range of system and distribution scenarios, from small workgroup solutions to worldwide production scenarios. Starting from a concrete media process, sound production for movies, and the current state of transaction management technologies, the goal of our work is to develop technologies for implementing cooperative transaction models. To this end, appropriate architecture variants are to be considered, and concrete synchronization and change propagation concepts that meet the special requirements have to be developed and evaluated.
Information Integration and Analysis for Facility Management
(2007-2010, funded by BMBF under grant 03WKBD2B)
Business intelligence (BI) solutions are a popular approach to analyzing business data and optimizing business processes. Building on this, we use BI techniques to detect optimization potential in facility management with respect to occupants' preferences (e.g., temperature, humidity, power consumption, costs) and building characteristics (e.g., heating type, water and power supply).
Among the DBIS group's main tasks are information integration and analysis. As part of this focus, we develop algorithms to detect correlations between multiple data sources using existing and novel data mining approaches. The knowledge management system then reoptimizes business processes based on the mining results.
Read more on CBS
Query Processing in P2P Systems
Recently, the peer-to-peer (P2P) paradigm has emerged, driven mainly by file sharing systems such as Napster and Gnutella and by work on scalable distributed data structures. The strength of these systems is their decentralization, which promises improved robustness and scalability and therefore opens new perspectives on very large distributed databases and information systems. Our research in this area is covered by two large projects. The SmurfPDMS project deals with schema-based P2P systems, which we mainly use for data integration approaches. Here we mainly address aspects of efficient distributed query processing. In order to evaluate and experiment with different approaches, we implemented an environment for simulation and application purposes and are continuously developing it further. The second project, UniStore, is based on structured P2P networks, such as CANs (Content Addressable Networks) and similar systems, which are essentially implementations of distributed hash tables. Here, again, the focus lies on processing queries efficiently and on related aspects such as distributing the underlying data and implementing the required operators.
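How a distributed hash table assigns data to peers can be sketched with a simple hash ring. Note that this is a generic, ring-based illustration swapped in for brevity; it is neither CAN, which partitions a multi-dimensional coordinate space, nor UniStore's actual data placement. The `HashRing` class, the `ring_position` function, and the peer names are hypothetical.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on the ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Assigns each key to the first peer at or after the key's ring position."""

    def __init__(self, peers):
        ring = sorted((ring_position(peer), peer) for peer in peers)
        if not ring:
            raise ValueError("at least one peer is required")
        self._positions = [position for position, _ in ring]
        self._peers = [peer for _, peer in ring]

    def lookup(self, key):
        """Return the peer responsible for key, wrapping around the ring."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._peers[index % len(self._peers)]


ring = HashRing(["peer-a", "peer-b", "peer-c"])
for key in ["title:Casablanca", "year:1942", "genre:drama"]:
    print(key, "->", ring.lookup(key))
```

Every peer can compute the responsible peer for a key locally from the hash function alone, which is what allows a query operator to route a lookup without a central directory. Real structured overlays additionally maintain routing tables so that a peer needs to know only a small subset of the other peers.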