Number of hits: 285
Created: Fri, 21 Jan 2022 23:20:42 +0100 in 0.0990 sec

Steinmetz, Nadine; Sattler, Kai-Uwe;
What is in the KGQA benchmark datasets? Survey on challenges in datasets for question answering on knowledge graphs. - In: Journal on data semantics, ISSN 1861-2040, Vol. 10 (2021), no. 3/4, pp. 241-265

Question Answering based on Knowledge Graphs (KGQA) still faces difficult challenges when transforming natural language (NL) to SPARQL queries. Simple questions referring to only one triple can be answered by most QA systems, but more complex questions requiring queries containing subqueries or several functions remain a tough challenge in this field of research. Evaluation results of QA systems therefore may also depend on the benchmark dataset the system has been tested on. To give an overview and reveal specific characteristics, we examined currently available KGQA datasets with regard to several challenging aspects. This paper presents a detailed look into the datasets and compares them in terms of the challenges a KGQA system faces.
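The gap between "simple" and "complex" questions described in this abstract can be sketched with two SPARQL query strings (an illustrative example, not taken from the paper; the DBpedia-style IRIs are placeholders):

```python
# Illustrative sketch: a one-triple question versus a question that needs
# aggregation, grouping and ordering. The IRIs below are hypothetical
# DBpedia-style placeholders.

# "Who wrote Faust?" -> a single triple pattern suffices.
simple_query = """
SELECT ?author WHERE {
  <http://dbpedia.org/resource/Faust> <http://dbpedia.org/ontology/author> ?author .
}
"""

# "Which author wrote the most books?" -> needs COUNT, GROUP BY and
# ORDER BY, features many KGQA systems cannot generate from NL.
complex_query = """
SELECT ?author (COUNT(?book) AS ?cnt) WHERE {
  ?book <http://dbpedia.org/ontology/author> ?author .
}
GROUP BY ?author
ORDER BY DESC(?cnt)
LIMIT 1
"""

def triple_patterns(query: str) -> int:
    """Rough complexity proxy: count triple patterns (lines ending in '.')."""
    return sum(1 for line in query.splitlines() if line.strip().endswith("."))
```

Counting triple patterns is only one of the complexity dimensions the survey considers; subqueries, filters, and functions add further difficulty on top of the pattern count.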
Hagedorn, Stefan; Kläbe, Steffen; Sattler, Kai-Uwe;
Conquering a Panda's weaker self - fighting laziness with laziness: demo paper. - In: Advances in Database Technology - EDBT 2021, (2021), pp. 670-673
Jibril, Muhammad Attahir; Baumstark, Alexander; Götze, Philipp; Sattler, Kai-Uwe;
JIT happens: transactional graph processing in persistent memory meets just-in-time compilation. - In: Advances in Database Technology - EDBT 2021, (2021), pp. 37-48
Kläbe, Steffen; Sattler, Kai-Uwe; Baumann, Stephan;
PatchIndex: exploiting approximate constraints in distributed databases. - In: Distributed and parallel databases, ISSN 1573-7578, Vol. 39 (2021), no. 3, pp. 833-853

Cloud data warehouse systems lower the barrier to accessing data analytics. These applications often lack a database administrator and integrate data from various sources, potentially leading to data that does not satisfy strict constraints. Automatic schema optimization in self-managing databases is difficult in these environments without prior data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Perfect constraints might not exist in these unclean datasets because a small set of values violates the constraints. Therefore, we introduce the concept of a generic PatchIndex structure, which handles exceptions to given constraints and enables database systems to define these approximate constraints. We apply the concept to the environment of distributed databases, providing parallel index creation approaches and optimization techniques for parallel queries using PatchIndexes. Furthermore, we describe heuristics for the automatic discovery of PatchIndex candidate columns and demonstrate the performance benefit of using PatchIndexes in our evaluation.
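The core PatchIndex idea — a constraint that holds for all rows except a small, materialized exception set — can be sketched as follows (a hypothetical illustration; the function and parameter names are not from the paper):

```python
# Hypothetical sketch of an approximate constraint: uniqueness that holds
# for all rows except a small "patch" of exception positions, which the
# index materializes so queries can treat the remaining rows as clean.

def patch_index_unique(column, max_exception_ratio=0.1):
    """Return positions violating uniqueness, or None if too many to patch."""
    seen = set()
    exceptions = []
    for pos, value in enumerate(column):
        if value in seen:
            exceptions.append(pos)   # this row breaks the constraint
        else:
            seen.add(value)
    if len(exceptions) > max_exception_ratio * len(column):
        return None                  # constraint too dirty to be useful
    return exceptions

# A query optimizer can then rely on uniqueness for every row whose
# position is not in the patch:
data = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
patch = patch_index_unique(data)     # only row 3 is an exception
```

In a distributed setting, such discovery would run per partition in parallel; the threshold deciding whether a constraint is worth patching is an assumption here.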
Lasch, Robert; Schulze, Robert; Legler, Thomas; Sattler, Kai-Uwe;
Workload-driven placement of column-store data structures on DRAM and NVM. - In: DAMON '21: proceedings of the 17th International Workshop on Data Management on New Hardware (DaMoN 2021), (2021), article 5, 8 pp.

Non-volatile memory (NVM) offers lower costs per capacity and higher total capacities than DRAM. However, NVM cannot simply be used as a drop-in replacement for DRAM in database management systems due to its different performance characteristics. We thus investigate the placement of column-store data structures in a hybrid hierarchy of DRAM and NVM, with the goal of placing as much data as possible in NVM without compromising performance. After analyzing how different memory access patterns affect query runtimes when columns are placed in NVM, we propose a heuristic that leverages lightweight access counters to suggest which structures should be placed in DRAM and which in NVM. Our evaluation using TPC-H shows that more than 80% of the data touched by queries can be placed in NVM with almost no slowdown, while naively placing all data in NVM would increase runtime by 53%.
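The counter-based heuristic sketched in this abstract can be illustrated in a few lines (an assumption-laden sketch, not the paper's algorithm; the column names and budget are made up):

```python
# Illustrative sketch of a placement heuristic driven by lightweight access
# counters: rank columns by how often they are touched and keep the hottest
# ones in DRAM until a DRAM budget is exhausted; the rest go to NVM.

def place_columns(access_counts, sizes, dram_budget):
    """access_counts/sizes: dicts keyed by column name. Returns a placement map."""
    placement = {}
    used = 0
    # Hot columns first: they benefit most from DRAM latency and bandwidth.
    for col in sorted(access_counts, key=access_counts.get, reverse=True):
        if used + sizes[col] <= dram_budget:
            placement[col] = "DRAM"
            used += sizes[col]
        else:
            placement[col] = "NVM"   # cold columns tolerate NVM's latency
    return placement

counts = {"l_orderkey": 900, "l_comment": 10, "l_quantity": 500}
sizes = {"l_orderkey": 4, "l_comment": 32, "l_quantity": 4}
plan = place_columns(counts, sizes, dram_budget=8)
# l_orderkey and l_quantity fit in DRAM; l_comment is placed in NVM
```

The paper's heuristic additionally distinguishes access patterns (e.g. sequential versus random), which this size-and-count sketch ignores.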
Baumstark, Alexander; Jibril, Muhammad Attahir; Götze, Philipp; Sattler, Kai-Uwe;
Instant graph query recovery on persistent memory. - In: DAMON '21: proceedings of the 17th International Workshop on Data Management on New Hardware (DaMoN 2021), (2021), article 10, 4 pp.

Persistent memory (PMem) - also known as non-volatile memory (NVM) - offers new opportunities not only for the design of data structures and system architectures but also for failure recovery in databases. However, instant recovery can mean not only bringing the system up as fast as possible but also continuing long-running queries that have been interrupted by a system failure. In this work, we discuss how PMem can be utilized to implement query recovery for analytical graph queries. Furthermore, we investigate the trade-off between the overhead of managing the query state in PMem at query runtime and the recovery and restart costs.
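The runtime-overhead versus restart-cost trade-off mentioned here can be sketched with a toy aggregation that checkpoints its state (here a plain dict stands in for a PMem region; the interval and function names are assumptions):

```python
# Hypothetical sketch: a long-running aggregation checkpoints (position,
# partial sum) to "persistent memory" every `interval` tuples. A smaller
# interval means more runtime overhead but fewer tuples to reprocess
# after a crash.

pmem = {}  # stand-in for a persistent-memory region

def run_aggregation(tuples, interval=100, resume=False):
    """Sum `tuples`, checkpointing durable query state to pmem."""
    pos, total = (pmem["pos"], pmem["sum"]) if (resume and pmem) else (0, 0)
    for i in range(pos, len(tuples)):
        total += tuples[i]
        if (i + 1) % interval == 0:
            pmem["pos"], pmem["sum"] = i + 1, total   # durable checkpoint
    return total

data = list(range(1000))

# Simulate a crash: only the first 550 tuples get processed before failure.
run_aggregation(data[:550])
# The last durable checkpoint covers tuples 0..499 (pos == 500).

# Instant restart: resume from the checkpoint instead of starting over.
total = run_aggregation(data, resume=True)
```

Real PMem state management additionally has to make each checkpoint failure-atomic (e.g. via flush and fence instructions), which a dict cannot model.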
Kläbe, Steffen; Hagedorn, Stefan;
When bears get machine support: applying machine learning models to scalable DataFrames with Grizzly. - In: Datenbanksysteme für Business, Technologie und Web (BTW 2021), (2021), pp. 195-214

The popular Python Pandas framework provides an easy-to-use DataFrame API that enables a broad range of users to analyze their data. However, Pandas faces severe scalability issues in terms of runtime and memory consumption, limiting the usability of the framework. In this paper we present Grizzly, a replacement for Python Pandas. Instead of bringing data to the operators like Pandas, Grizzly ships program complexity to database systems by transpiling the DataFrame API to SQL code. Additionally, Grizzly offers user-friendly support for combining different data sources, applying user-defined functions, and running machine learning models directly inside the database system. Our evaluation shows that Grizzly significantly outperforms Pandas as well as state-of-the-art frameworks for distributed Python processing in several use cases.
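The transpiling idea — recording DataFrame operations lazily and emitting a single SQL statement only when the result is needed — can be sketched as follows (the class and method names are made up for illustration; this is not Grizzly's actual API):

```python
# Minimal sketch of DataFrame-to-SQL transpiling: operations build up a
# lazy description of the query; to_sql() turns it into one SQL string
# that a database system can execute, instead of moving data into Python.

class LazyFrame:
    def __init__(self, table, predicate=None, columns=None):
        self.table, self.predicate, self.columns = table, predicate, columns

    def filter(self, predicate):
        # Returns a new frame; nothing is executed yet.
        return LazyFrame(self.table, predicate, self.columns)

    def project(self, columns):
        return LazyFrame(self.table, self.predicate, columns)

    def to_sql(self):
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.predicate:
            sql += f" WHERE {self.predicate}"
        return sql

df = LazyFrame("orders").filter("price > 100").project(["id", "price"])
# df.to_sql() -> "SELECT id, price FROM orders WHERE price > 100"
```

A full transpiler would also cover joins, aggregations, and user-defined functions, and would send the generated SQL to the database over a connector rather than returning a string.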
Kläbe, Steffen; Sattler, Kai-Uwe; Baumann, Stephan;
Updatable materialization of approximate constraints. - In: 2021 IEEE 37th International Conference on Data Engineering, (2021), pp. 1991-1996

Modern big data applications integrate data from various sources. As a result, these datasets may not satisfy perfect constraints, leading to sparse schema information and non-optimal query performance. The existing approach of PatchIndexes enables the definition of approximate constraints and improves query performance by exploiting the materialized constraint information. As real-world data warehouse workloads are often not limited to read-only queries, we enhance the PatchIndex structure towards an update-conscious design in this paper. To this end, we present a sharded bitmap as the underlying data structure, which offers efficient update operations, and describe approaches to maintaining approximate constraints under updates while avoiding index recomputations and full table scans. In our evaluation, we show that PatchIndexes provide more lightweight update support than traditional materialization approaches.
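The sharded bitmap mentioned here can be sketched in a few lines (a hypothetical illustration of the general idea; the shard width and class design are assumptions, not the paper's implementation):

```python
# Hypothetical sketch of a sharded bitmap: the bitmap is split into
# fixed-size integer shards, so setting or clearing an exception bit
# touches only one shard instead of rewriting the whole structure.

SHARD_BITS = 64

class ShardedBitmap:
    def __init__(self):
        self.shards = {}                      # shard index -> 64-bit word

    def set(self, pos):
        """Mark a row position as an exception to the constraint."""
        idx = pos // SHARD_BITS
        self.shards[idx] = self.shards.get(idx, 0) | (1 << (pos % SHARD_BITS))

    def clear(self, pos):
        """An update made the row satisfy the constraint again."""
        idx = pos // SHARD_BITS
        if idx in self.shards:
            self.shards[idx] &= ~(1 << (pos % SHARD_BITS))

    def test(self, pos):
        idx = pos // SHARD_BITS
        return bool((self.shards.get(idx, 0) >> (pos % SHARD_BITS)) & 1)

bm = ShardedBitmap()
bm.set(3); bm.set(130)     # two exception rows in different shards
bm.clear(3)                # an update fixes row 3: only shard 0 is touched
```

Because each update localizes its writes to one shard, maintaining the exception set under inserts and updates avoids the full recomputation that a monolithic materialized view would require.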
Keim, Daniel; Sattler, Kai-Uwe;
Von Daten zu Künstlicher Intelligenz - Datenmanagement als Basis für erfolgreiche KI-Anwendungen [From data to artificial intelligence - data management as the basis for successful AI applications]. - In: Digitale Welt, ISSN 2569-1996, Vol. 5 (2021), no. 3, pp. 75-79
Baumstark, Alexander; Jibril, Muhammad Attahir; Sattler, Kai-Uwe;
Adaptive query compilation in graph databases. - In: 2021 IEEE 37th International Conference on Data Engineering workshops, (2021), pp. 112-119

Compiling database queries into compact and efficient machine code has proven to be an effective technique for improving query performance and exploiting characteristics of modern hardware. Furthermore, compilation frameworks like LLVM provide powerful optimization techniques and support different backends. However, the time for generating machine code becomes an issue for short-running queries or queries that could produce early results quickly. In this work, we present an adaptive approach integrating graph query interpretation and compilation. While query compilation and code generation run in the background, query execution starts using the interpreter. As soon as code generation is finished, execution switches to the compiled code. Our evaluation shows that autonomously switching execution modes helps to hide compilation times.
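The mode-switching scheme described in this abstract can be sketched as follows (an illustrative simulation, not the paper's system: a sleeping function stands in for LLVM code generation, and both paths compute the same toy query):

```python
# Illustrative sketch of adaptive execution: batches are processed by an
# interpreter while "compilation" runs in a background thread; once the
# compiled function is ready, execution switches over mid-query.

from concurrent.futures import ThreadPoolExecutor
import time

def interpret(batch):            # slow fallback path
    return sum(x * 2 for x in batch)

def compile_query():             # stands in for LLVM code generation
    time.sleep(0.05)             # simulated compilation latency
    return lambda batch: sum(x * 2 for x in batch)   # "machine code"

def execute(batches):
    total, compiled = 0, None
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(compile_query)          # compile in background
        for batch in batches:
            if compiled is None and future.done():
                compiled = future.result()           # switch execution mode
            total += (compiled or interpret)(batch)  # same result either way
    return total
```

The key invariant is that both paths produce identical results per batch, so the switch point only affects performance, never correctness.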