Loss value and reconstruction gender change #15

Closed
inconnu11 opened this issue Oct 13, 2021 · 1 comment

Comments

@inconnu11

Hi, great work! I trained and tested the model and I have 2 questions to clarify.

  1. I am not sure whether the loss values are behaving well. The loss values at epoch 500 are shown below. Are they in the normal range?
    [image: loss values at epoch 500]

  2. When I run reconstruction, meaning the content, speaker, and pitch encoders all receive the same input wav, the reconstructed wav sounds as if the speaker's gender has changed. I tested p316_003.wav with both the pretrained ckpt you provided and a ckpt I trained myself, and the problem occurs with both. The reconstructed wavs are here.

@Wendison
Owner

Hi, thanks for your interest. For your questions,

  1. I don't remember the exact values of these losses on VCTK, as the work was done during my internship about a year ago. But judging from my recent experiments on LibriTTS, I think your losses behave normally;
  2. I tested the currently released model on p316_003 and got a result similar to yours. I think this is a bad case. When I tested the model before, I also found some bad cases during conversion, where the generated speech has a gender different from that of the reference utterance. If you look at Table 3 in our paper, the speaker classification accuracy for the speaker representation is 99.5% (lambda_MI=1e-2), which means the speaker representation may lose some speaker information and can't fully capture the speaking characteristics. I think this may explain why the converted or reconstructed speech sometimes doesn't sound like the reference utterance.
