Variational Autoencoder (VAE) for Musical Instruments

Project description

Deep learning has had major successes in generating artificial images, for instance of faces, using "Generative Adversarial Networks" (GANs) or "Variational Autoencoders" (VAEs); the widely known "deep fakes" are a prominent application of such generative models.
A simple example program for images of handwritten digits can be found here: https://github.com/kvfrans/variational-autoencoder
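To illustrate the core of a VAE independently of any framework, the following is a minimal sketch of the VAE objective in plain numpy: a linear encoder produces a mean and log-variance, a latent sample is drawn via the reparameterization trick, and the loss combines a reconstruction term with the closed-form KL divergence to a standard normal prior. All weights, dimensions, and the linear encoder/decoder are illustrative assumptions, not the architecture to be used in the project.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Hypothetical linear encoder: maps input frames to mean and log-variance
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps; in a real framework this keeps the sampling
    # step differentiable with respect to the encoder parameters
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    # Closed-form KL( q(z|x) || N(0, I) ) for diagonal Gaussians, per sample
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)

def vae_loss(x, x_recon, mu, logvar):
    # Negative ELBO: reconstruction error (here squared error) plus KL term
    recon = np.sum((x - x_recon) ** 2, axis=1)
    return np.mean(recon + kl_divergence(mu, logvar))

# Toy usage with random weights; dimensions are illustrative only
x = rng.standard_normal((4, 16))            # batch of 4 "sound frames"
W_mu = 0.1 * rng.standard_normal((16, 2))
W_logvar = 0.1 * rng.standard_normal((16, 2))
W_dec = 0.1 * rng.standard_normal((2, 16))  # hypothetical linear decoder

mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
x_recon = z @ W_dec
loss = vae_loss(x, x_recon, mu, logvar)
```

In training, the gradient of this loss would be taken with respect to the encoder and decoder weights; the reconstruction term is exactly the place where the psycho-acoustic similarity measure of this project would be substituted for the plain squared error.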


The goal of this project is to use a VAE to generate musical instrument sounds, similar to "NSynth": https://magenta.tensorflow.org/nsynth-instrument
For this, the VAE is trained on musical instrument sounds from the IDMT Musical Instruments Database. A suitable training set has to be chosen and tested, and a good set of hyper-parameters (e.g., the dimensionality of the network layers and of the latent space) has to be found. Then a psycho-acoustic similarity measure based on our psycho-acoustic pre- and post-filters has to be used and tested as the reconstruction ("generation") loss of the VAE, and compared to other similarity measures.
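As a baseline against which the psycho-acoustic measure could be compared, a simple spectral similarity measure can be sketched as follows: a frame-wise log-magnitude spectral distance computed with a Hann window and the FFT. This is only an assumed stand-in; the project's pre- and post-filter weighting would replace the plain log-magnitude comparison.

```python
import numpy as np

def log_spectrum_distance(x, y, frame=256, hop=128, eps=1e-8):
    # Frame-wise mean squared distance between log-magnitude spectra.
    # A crude spectral baseline, not the psycho-acoustic measure itself.
    win = np.hanning(frame)
    n = min(len(x), len(y))
    dists = []
    for start in range(0, n - frame + 1, hop):
        X = np.fft.rfft(win * x[start:start + frame])
        Y = np.fft.rfft(win * y[start:start + frame])
        lx = np.log(np.abs(X) + eps)
        ly = np.log(np.abs(Y) + eps)
        dists.append(np.mean((lx - ly) ** 2))
    return float(np.mean(dists))

# Toy usage: a tone compared to itself and to a tone one octave higher
sr = 8000
t = np.arange(sr) / sr
a = np.sin(2 * np.pi * 440 * t)
b = np.sin(2 * np.pi * 880 * t)
d_same = log_spectrum_distance(a, a)
d_diff = log_spectrum_distance(a, b)
```

A useful sanity check for any candidate similarity measure is that perceptually identical signals score lower than clearly different ones, as the two distances above illustrate.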

Supervision: Prof. Dr.-Ing. G. Schuller
