Overview

We propose a model that improves automatic music transcription by adding a perceptual objective computed through differentiable rendering. The same model also allows a piece to be automatically transposed to musical instruments other than the original.
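As a rough illustration of what a perceptual objective through a differentiable renderer can look like, the sketch below renders a piano-roll to a spectrogram and compares it with a target spectrogram. It is not the model used for the examples on this page: the linear renderer with one fixed spectral template per pitch, the mean squared error loss, and all names (`render`, `perceptual_loss`, `perceptual_grad`, `templates`) are assumptions made for the example.

```python
import numpy as np

def render(piano_roll, templates):
    """Render a (pitches x frames) piano-roll to a (bins x frames) spectrogram.

    Assumption: each pitch has one fixed spectral template (a toy renderer).
    """
    return templates @ piano_roll

def perceptual_loss(piano_roll, templates, target_spec):
    """Mean squared error between the rendered and the target spectrogram."""
    diff = render(piano_roll, templates) - target_spec
    return float(np.mean(diff ** 2))

def perceptual_grad(piano_roll, templates, target_spec):
    """Analytic gradient of perceptual_loss with respect to the piano-roll."""
    diff = render(piano_roll, templates) - target_spec
    return 2.0 * templates.T @ diff / diff.size

rng = np.random.default_rng(0)
n_pitches, n_bins, n_frames = 4, 16, 8
templates = np.abs(rng.normal(size=(n_bins, n_pitches)))    # hypothetical timbre
true_roll = (rng.random((n_pitches, n_frames)) > 0.7).astype(float)
target_spec = render(true_roll, templates)                  # stands in for the input audio

pred_roll = np.full((n_pitches, n_frames), 0.5)             # uninformative starting point
initial_loss = perceptual_loss(pred_roll, templates, target_spec)
for _ in range(200):
    # Gradient step on the rendered-audio error, then clip activations to [0, 1].
    pred_roll -= 0.5 * perceptual_grad(pred_roll, templates, target_spec)
    pred_roll = np.clip(pred_roll, 0.0, 1.0)
final_loss = perceptual_loss(pred_roll, templates, target_spec)
```

Because the renderer is differentiable, the error measured on the rendered sound can be propagated back to the piano-roll. In a full system this term would presumably be combined with a conventional transcription loss on the piano-roll itself.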

Transcription

Bach - Prelude in C Major, BWV 846

Original data: input waveform; ground-truth piano-roll; rendered from ground-truth piano-roll
Transcription: predicted piano-roll; predicted onset matrix; rendered from predicted piano-roll

Chopin - Fantaisie-Impromptu in C# minor, Op. 66

Original data: input waveform; ground-truth piano-roll; rendered from ground-truth piano-roll
Transcription: predicted piano-roll; predicted onset matrix; rendered from predicted piano-roll

Schumann - Träumerei, “Kinderszenen” No. 7 in F major, Op. 15

Original data: input waveform; ground-truth piano-roll; rendered from ground-truth piano-roll
Transcription: predicted piano-roll; predicted onset matrix; rendered from predicted piano-roll

Arrangement

Orchestra to strings (Dvorak - Symphony No. 9, fourth movement)

The sound of strings is stationary.

Original sound Arrangement


Orchestra to organ (Holst - The Planets, Jupiter)

The sound of an organ is stationary.

Original sound Arrangement


Orchestra to piano (Haydn - Menuet)

The sound of a piano is non-stationary.

Original sound Arrangement