HOW DOES IT WORK?
In our ICLR 2020 paper, we show that using differentiable oscillators, filters, and reverberation from the DDSP library lets us train high-quality audio synthesis models with less data and fewer parameters, because the models do not need to learn to generate audio from scratch. We demonstrate abilities such as training a high-quality timbre transfer model with about 10 minutes of training data, as well as other signal processing tasks like blind dereverberation.
The key idea is to use simple, interpretable DSP elements to create complex, realistic signals by using a machine learning model to precisely control their many parameters. If you are new to this, you can think of it like an ML model that can either (A) render every pixel of a video game image directly or (B) control a renderer like Unreal Engine to create the image. DDSP takes the second approach.
On one end of the spectrum are fully “deep” neural network systems. These are often black boxes: they can adapt to many different datasets but are not interpretable. On the other end is DSP, or Digital Signal Processing (without the extra “differentiable” D). This is an area of Electrical Engineering that underpins much of modern society, including telecommunications, medical imaging, and file compression.
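To make the "model controls a DSP element" idea concrete, here is a minimal sketch in plain Python. It is not the DDSP library's API: `harmonic_oscillator`, `toy_controller`, the `brightness` parameter, and the 16 kHz sample rate are all hypothetical names and values chosen for illustration. The controller is a hand-written stand-in for a neural network, and the sketch shows only the forward pass. In DDSP the synthesizer is built from differentiable operations so that gradients can flow back into the model that produces the controls.

```python
import math

SAMPLE_RATE = 16000  # Hz; an assumed value for this sketch


def harmonic_oscillator(f0_hz, harmonic_amps, n_samples, sample_rate=SAMPLE_RATE):
    """Render a sum of sinusoids at integer multiples of f0_hz."""
    audio = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k, amp in enumerate(harmonic_amps, start=1):
            freq = k * f0_hz
            if freq < sample_rate / 2:  # skip harmonics at or above Nyquist
                sample += amp * math.sin(2.0 * math.pi * freq * t)
        audio.append(sample)
    return audio


def toy_controller(brightness, n_harmonics=8):
    """Hypothetical stand-in for a neural network.

    Maps a single number to a set of harmonic amplitudes that sum to 1.
    """
    raw = [brightness ** k for k in range(n_harmonics)]
    total = sum(raw)
    return [r / total for r in raw]


# The "model" outputs a handful of interpretable controls...
amps = toy_controller(brightness=0.5)
# ...and the DSP element turns them into audio samples.
audio = harmonic_oscillator(f0_hz=220.0, harmonic_amps=amps, n_samples=1600)
```

The controller here emits only eight numbers, while the oscillator produces 1600 samples from them. That gap is the reason a model driving DSP elements has far less to learn than one that generates every sample directly.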
DDSP’S ACADEMIC INFLUENCE:
DDSP kickstarted an entire subfield!
- A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
- Differentiable Wavetable Synthesis
- Differentiable IIR Filters
- Differentiable Artificial Reverberations
- Differentiable all-pass filters for phase response estimation and automatic signal alignment
- Lightweight and Interpretable Neural Modeling of an Audio Distortion Effect Using Hyperconditioned Differentiable Biquads
DDSP’S PRODUCT INFLUENCE:
DDSP forms the backbone of the following products:
- Mawf Plugin
- Neutone Plugin
- DDSP-VST Plugin
- Tone Transfer
- Sounds of India
- Sounds.Studio (timbre changing function)