

Prof. Heidi Krömker


Telefon +49 3677 69-2890



Topics for Theses and Project Works


Research group(s): IDMT

Topic no.: 2019/27, created on: 18 July 2019

Real-time drum transcription for mobile applications

In the Semantic Music Technologies group at Fraunhofer IDMT, one of the main research foci is innovative technologies from music information retrieval (MIR) that extract semantic features such as key, instruments, and tempo from music recordings. These technologies can also be applied in music education applications that help amateur musicians improve their musical skills through real-time performance feedback.

A special use case is the automatic transcription of drums for interactive learning applications. The focus of this thesis lies on automatically analyzing mobile-device recordings of drum sets comprising snare drum, hi-hat, and kick drum in real time. The transcription results can be used to give users low-latency feedback while they learn basic drum patterns. Current systems based on spectral decomposition either lack accuracy or are computationally too expensive for mobile devices. Furthermore, most current systems require an initial adaptation of the algorithms to the exact drum kit and microphone. The goal of this thesis is to apply state-of-the-art deep learning methods to transcribe audio recordings of different drum kits made with different mobile devices.

At the beginning of the thesis, the student shall perform a thorough state-of-the-art review and select the most promising approaches for this use case. Existing mobile drum recording datasets at Fraunhofer IDMT that contain aligned note annotations for the three drum instruments mentioned above shall be reviewed and possibly extended with other publicly available datasets for training and evaluation. Using this dataset, at least one promising approach shall be evaluated as a baseline system.

In the second part, different deep neural network architectures shall be trained and tested in an end-to-end fashion, taking short blocks of recorded audio samples as input and outputting sample-aligned drum transcriptions. This evaluation includes several layer types (convolutional vs. recurrent) as well as target output representations (instrument activity curves vs. binary classification).

Desired skills include Python programming, audio signal processing, and deep learning.
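As context for the spectral-decomposition baselines mentioned above, a minimal sketch of one such approach is non-negative matrix factorization (NMF) with fixed per-drum spectral templates: the magnitude spectrogram V is approximated as W @ H, where each column of W is one drum's template and each row of the estimated H is that drum's activity curve over time. The function name, the toy template values, and the tiny matrix sizes below are purely illustrative; a real baseline would operate on STFT magnitude spectrograms of the recordings.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10):
    """Estimate activations H such that V ≈ W @ H, keeping the drum
    templates W fixed (standard multiplicative updates for the
    KL divergence). V: (n_bins, n_frames), W: (n_bins, n_drums)."""
    n_drums = W.shape[1]
    H = np.ones((n_drums, V.shape[1]))          # flat initialization
    denom = W.sum(axis=0)[:, None] + eps        # column sums of W, (n_drums, 1)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / denom           # multiplicative KL update
    return H

# Toy example: two hypothetical templates over four frequency bins
# ("kick" with low-bin energy, "snare" with high-bin energy).
W = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.1, 0.9],
              [0.0, 1.0]])
H_true = np.zeros((2, 6))
H_true[0, 1] = 1.0    # kick hit at frame 1
H_true[1, 4] = 1.0    # snare hit at frame 4
V = W @ H_true        # synthetic spectrogram
H = nmf_activations(V, W)
```

Each row of `H` is an instrument activity curve; thresholding it yields binary per-frame decisions, which mirrors the two target output representations compared in the second part of the thesis.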

Responsible professor:

Karlheinz Brandenburg


Sascha Grollmisch
Jakob Abeßer
