---
date: 2020-03-23
tags: paper, gan, deep-learning, vocoder, tts, speech, generative
---

# Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN

Link to the paper

Congyi Wang, Yu Chen, Bin Wang, Yi Shi

## Report

Year: 2021

Samples: https://anonymous1086.github.io/prlsgan-vocoder/

This paper presents a GAN-based vocoder built on MelGAN that introduces the Pointwise Relativistic Least Square GAN (PRLSGAN) to improve performance. It is based on the Relativistic GAN, and its improvements show up mainly in the artifacts typically produced by GAN-based vocoders.

The authors give a thorough overview of the different types of vocoders to motivate the use of GAN vocoders, which, among other things, can generate audio in parallel from a set of acoustic features (e.g., a mel spectrogram).

They then introduce the baseline methods:

- Parallel WaveGAN (PWGAN): consists of a generator, a discriminator, and a multi-resolution STFT auxiliary loss.
- Multi-resolution STFT loss: a reconstruction loss applied to the STFT of the signal produced by the generator, computed at several resolutions (see the sketch after this list).
- MelGAN: consists of a generator built on transposed convolutions and a multi-scale discriminator that handles audio at different temporal resolutions. As loss function, it uses a feature-matching loss between generated and real audio, as well as an adversarial loss.
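
To make the multi-resolution STFT loss concrete, here is a minimal PyTorch sketch of how such a loss is commonly implemented (e.g., in Parallel WaveGAN): a spectral-convergence term plus a log-magnitude L1 term, averaged over several STFT configurations. The function names and the resolution triplets are illustrative assumptions, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

def stft_magnitude(x, fft_size, hop_size, win_length):
    """Magnitude STFT of a batch of waveforms: (B, T) -> (B, bins, frames)."""
    window = torch.hann_window(win_length, device=x.device)
    spec = torch.stft(x, fft_size, hop_length=hop_size, win_length=win_length,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)  # avoid log(0) below

def multi_resolution_stft_loss(y_hat, y,
                               resolutions=((512, 128, 512),     # (fft, hop, window)
                                            (1024, 256, 1024),   # illustrative values
                                            (2048, 512, 2048))):
    """Spectral convergence + log-magnitude loss averaged over STFT resolutions."""
    loss = 0.0
    for fft_size, hop_size, win_length in resolutions:
        mag_hat = stft_magnitude(y_hat, fft_size, hop_size, win_length)
        mag = stft_magnitude(y, fft_size, hop_size, win_length)
        # Spectral convergence: relative Frobenius-norm error of the magnitudes.
        sc_loss = torch.norm(mag - mag_hat, p="fro") / torch.norm(mag, p="fro")
        # L1 distance between log-magnitude spectrograms.
        mag_loss = F.l1_loss(torch.log(mag_hat), torch.log(mag))
        loss = loss + sc_loss + mag_loss
    return loss / len(resolutions)
```

Evaluating the loss at several FFT sizes is what keeps the generator honest at multiple time-frequency trade-offs; a single resolution would let it overfit one particular spectrogram view of the signal.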

PRLSGAN builds upon MelGAN and uses a different loss: a combination of the MSE (least-squares) loss and a pointwise relative discrepancy loss, which appears to increase the difficulty of the minimax game and forces the generator to further refine its output.
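As a rough illustration of the "pointwise relativistic" idea, below is a minimal sketch that applies the relativistic least-squares formulation elementwise to the discriminator's output maps, rather than comparing against a batch average. This is my reading of the concept, assuming real and fake discriminator maps of matching shape; the function names are hypothetical and this is not guaranteed to match the paper's exact objective.

```python
import torch

def prls_d_loss(d_real, d_fake):
    """Discriminator objective (sketch): at every point of the output map,
    the score on real audio should exceed the score on fake audio by 1."""
    return torch.mean((d_real - d_fake - 1.0) ** 2) \
         + torch.mean((d_fake - d_real + 1.0) ** 2)

def prls_g_loss(d_real, d_fake):
    """Generator objective (sketch): the relativistic roles are swapped, so the
    generator must push its output above the real score at every point."""
    return torch.mean((d_fake - d_real - 1.0) ** 2) \
         + torch.mean((d_real - d_fake + 1.0) ** 2)
```

Compared with the standard relativistic average GAN, scoring each point of the discriminator map against its real counterpart gives the generator a denser, harder-to-satisfy signal, which is consistent with the paper's claim that the loss forces further refinement of the output.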

## The method