Loss value and reconstruction gender change #15

inconnu11 · 2021-10-13T10:09:33Z

Hi, great work! I trained and tested the model and I have 2 questions to clarify.

1. I am not sure whether the loss values behave well. The loss values at epoch 500 are shown below. Are they in the normal range?
2. When I run reconstruction, meaning the content, speaker, and pitch encoders all receive the same input wav, the reconstructed wav sounds as if the speaker's gender has changed. I tested p316_003.wav with both the pretrained checkpoint you provided and a checkpoint I trained myself, and the problem occurs with both. The reconstructed wavs are here.

Wendison · 2021-10-14T09:38:16Z

Hi, thanks for your interest. For your questions,

1. I don't remember the exact values of these losses on VCTK, as the work was done during my internship about a year ago. But based on my recent experiments on LibriTTS, I think your losses behave normally.
2. I tested the currently released model on p316_003 and obtained a result similar to yours. I think this is a bad case. When I tested the model before, I also found some bad cases during conversion, where the generated speech has a gender different from that of the reference utterance. In Table 3 of our paper, the speaker classification accuracy for the speaker representation is 99.5% (lambda_MI=1e-2), which means the speaker representation may lose some speaker information and cannot fully capture the speaking characteristics. I think this may explain why the converted or reconstructed speech sometimes doesn't sound like the reference utterance.
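
For anyone who wants to put a number on the perceived gender shift, one rough option is to compare the median F0 of the input wav with that of the reconstructed wav. The snippet below is only a sketch and is not part of this repository: the helper names (`frame_f0`, `median_f0`, `f0_shift_semitones`) are hypothetical, the autocorrelation pitch estimate is deliberately crude, and two synthetic tones stand in for the real recordings, which would need to be loaded with an audio library of your choice.

```python
import numpy as np


def frame_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude F0 estimate for one frame from its autocorrelation peak."""
    frame = frame - np.mean(frame)
    if np.max(np.abs(frame)) < 1e-4:
        return None  # near-silent frame, treated as unvoiced
    # Full autocorrelation, keeping only non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def median_f0(wav, sr, frame_len=1024, hop=256):
    """Median of the per-frame F0 estimates, or NaN if no frame is usable."""
    f0s = []
    for start in range(0, len(wav) - frame_len + 1, hop):
        f0 = frame_f0(wav[start:start + frame_len], sr)
        if f0 is not None:
            f0s.append(f0)
    return float(np.median(f0s)) if f0s else float("nan")


def f0_shift_semitones(ref_wav, rec_wav, sr):
    """Pitch shift of rec_wav relative to ref_wav, in semitones."""
    return 12.0 * np.log2(median_f0(rec_wav, sr) / median_f0(ref_wav, sr))


# Assumption: synthetic tones replace the real input and reconstructed wavs.
sr = 16000
t = np.arange(sr) / sr
ref_tone = np.sin(2 * np.pi * 120.0 * t)  # stand-in for a lower-pitched input
rec_tone = np.sin(2 * np.pi * 220.0 * t)  # stand-in for a higher-pitched output

print("shift in semitones:", f0_shift_semitones(ref_tone, rec_tone, sr))
```

A shift of several semitones between the input and its reconstruction would be consistent with the audible gender change described above, while a value near zero would suggest the difference lies elsewhere (for example in timbre). On real speech a dedicated pitch tracker would give more reliable numbers than this autocorrelation sketch.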

Wendison closed this as completed Nov 10, 2021

Wendison mentioned this issue Jul 13, 2022 in Training Loss Abnormal #35 (Open)
