
lf0 question about convert phase #34

Open
powei-C opened this issue Jul 12, 2022 · 3 comments
powei-C commented Jul 12, 2022

Hi,
I wonder why you normalize the f0 series before feeding it to the f0 encoder in convert.py, while this kind of normalization isn't applied during the preprocessing phase.

Wendison commented Jul 13, 2022

Hi, normalizing f0 aims to remove the speaker characteristics. During the preprocessing phase, f0 is not normalized, but during training and inference it is normalized as shown below:

lf0 = (lf0 - mean) / (std + 1e-8)

lf0[nonzeros_indices] = (lf0[nonzeros_indices] - mean) / (std + 1e-8)
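A minimal sketch of this per-utterance normalization, assuming `lf0` is a NumPy array holding the log-F0 contour with zeros marking unvoiced frames (the function name and helper structure here are illustrative, not the repo's exact code):

```python
import numpy as np

def normalize_lf0(lf0: np.ndarray) -> np.ndarray:
    """Standardize log-F0 over voiced frames only; unvoiced frames (zeros) are left untouched."""
    out = lf0.copy()
    nonzeros_indices = np.nonzero(lf0)[0]          # indices of voiced frames
    voiced = lf0[nonzeros_indices]
    mean, std = voiced.mean(), voiced.std()
    # per-utterance standardization removes speaker-dependent pitch level/range
    out[nonzeros_indices] = (voiced - mean) / (std + 1e-8)
    return out
```

Because the mean and standard deviation are computed per utterance, the normalized contour keeps the relative pitch movement but discards the speaker's absolute pitch level and range.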

powei-C commented Jul 13, 2022

Hi,
thank you for your explanation!
I have another question, about perplexity when training the model on another dataset.
I found that the perplexity didn't keep increasing (training has run for around 360 epochs in the figure); is that reasonable?
And do you have any suggestions for diagnosing this issue?
[image: training curve showing perplexity over ~360 epochs]

Wendison commented:
The perplexity should be increasing during training, as higher perplexity indicates that the vectors in the VQ codebook are distinguishable and can be used to represent different acoustic units. I also saw that your recon_loss is high. Based on my experience, recon_loss should be less than 0.5; then you would obtain good converted samples.
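For context, the codebook perplexity is typically computed from the average assignment distribution over the codes: uniform usage of K codes gives perplexity K, while codebook collapse onto one code gives perplexity 1. A minimal sketch, assuming one-hot code assignments of shape [num_frames, codebook_size] (the function name is illustrative):

```python
import numpy as np

def codebook_perplexity(encodings: np.ndarray) -> float:
    """Perplexity of VQ codebook usage from one-hot assignments
    of shape [num_frames, codebook_size]."""
    avg_probs = encodings.mean(axis=0)                       # empirical code-usage distribution
    entropy = -np.sum(avg_probs * np.log(avg_probs + 1e-10)) # small epsilon avoids log(0)
    return float(np.exp(entropy))
```

A perplexity that plateaus far below the codebook size suggests many codes are unused, which would be consistent with the flat curve and high recon_loss described above.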
