Skip to content

Latest commit

 

History

History
35 lines (22 loc) · 1.25 KB

blaauw2020.md

File metadata and controls

35 lines (22 loc) · 1.25 KB
date tags
2020-11-17
paper, deep-learning, seq2seq, transformer, generative, singing

Sequence to sequence singing synthesis using the feed forward transformer

Link to the paper

Merlijn Blaauw, Jordi Bonada

ICASSP 2020

Year: 2020

Samples: https://mtg.github.io/singing-synthesis-demos/transformer/

This work presents a new solution for singing synthesis using a transformer architecture in the decoder side.

The use of a feed forward decoder allows for faster synthesis and exposure bias problem elimination.

The inputs of the model are the sequence of musical notes (onset, duration, pitch and associated phonemes) and the F0 signal.

The output of the model is a sequence of vocoder features which, in this case it consists of the WORLD vocoder.

Architecture

  • Encoder: CNN + Gated Linear Units + Residual connections
  • Decoder: Self-Atention + convolutions + Gated Linear Units. Attention matrix provided with a bias term along the diagonal.
  • Duration model: average look-up table with average phoneme durations
  • F0 model: the ground truth F0 is provided, the model is asked to predict timbre, but not intonation.

Results