Variational Autoencoder (VAE) for Musical Instruments

Project description

Deep learning has had notable success in generating artificial images, for instance of faces (so-called "deep fakes"), using "Generative Adversarial Networks" (GANs) or "Variational Autoencoders" (VAEs). A simple example program for handwritten digit images can be found here:

The goal of this project is to use a VAE network to generate musical instrument sounds, similar to "Nsynth". For this, the VAE is trained on musical instrument sounds from the IDMT Musical Instruments Database. A suitable training set has to be chosen and tested, and a good set of hyper-parameters (the dimensionality of the network) has to be found. Then a psycho-acoustic similarity measure based on our psycho-acoustic pre- and post-filters has to be used and tested for the VAE network (as the "reconstruction loss"), and compared to other similarity measures.

Supervision: Prof. Dr.-Ing. G. Schuller
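To give an idea of the starting point, the following is a minimal sketch of a VAE for fixed-length audio frames in PyTorch. All layer sizes, the frame length, and the latent dimensionality are placeholder hyper-parameters (the kind that would have to be tuned in this project), and the mean-squared-error reconstruction term is a stand-in for the psycho-acoustically weighted similarity measure described above.

```python
import torch
import torch.nn as nn

class AudioVAE(nn.Module):
    """Minimal VAE over raw audio frames (hypothetical sizes, for illustration)."""

    def __init__(self, frame_len=1024, latent_dim=16):
        super().__init__()
        # Encoder maps an audio frame to the parameters of a
        # Gaussian distribution over the latent code.
        self.encoder = nn.Sequential(nn.Linear(frame_len, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        # Decoder maps a latent sample back to an audio frame.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_len),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term (plain MSE here; the project would replace this
    # with a psycho-acoustic similarity measure) plus the KL divergence
    # of the latent posterior from the standard-normal prior.
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Training would then iterate over frames extracted from the instrument recordings, minimizing `vae_loss`; new sounds are generated by decoding samples drawn from the latent prior.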