What do z_dim and c_dim stand for? #36

Open
Hu-chengyang opened this issue Sep 7, 2022 · 4 comments
Comments

@Hu-chengyang

Dear PHD:
Could you tell me what z_dim: 64 and c_dim: 256 in config/model/default stand for? And what does n_embeddings: 512 in the same file stand for? Thank you very much.

@Wendison
Owner

Hi, all three of these variables relate to the content encoder. z_dim is the dimension of the acoustic units (z) in the VQ codebook. c_dim is the dimension of the continuous vectors produced by the LSTM (g-net in the paper), which takes z as input. n_embeddings is the number of acoustic units in the VQ codebook.
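
To make the three sizes concrete, here is a minimal pure-Python sketch of a codebook lookup. It is only an illustration under stated assumptions, not the repository's implementation: the helper names `make_codebook` and `quantize` are hypothetical, the codebook is filled with random values, and nearest-neighbour search by squared Euclidean distance is assumed as the lookup rule. Only the three numbers come from config/model/default.

```python
import random

Z_DIM = 64          # z_dim: length of each acoustic-unit vector in the codebook
N_EMBEDDINGS = 512  # n_embeddings: number of acoustic units in the codebook
C_DIM = 256         # c_dim: length of each vector the LSTM (g-net) outputs; not computed here


def make_codebook(n_embeddings, z_dim, seed=0):
    """Hypothetical helper: a random n_embeddings x z_dim codebook."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(z_dim)]
            for _ in range(n_embeddings)]


def quantize(frame, codebook):
    """Hypothetical helper: return (index, entry) of the codebook entry
    closest to `frame` in squared Euclidean distance."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(frame, entry))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


codebook = make_codebook(N_EMBEDDINGS, Z_DIM)
frame = [0.0] * Z_DIM                 # stand-in for one encoder output frame
index, z = quantize(frame, codebook)
print(len(codebook), len(z))          # number of units, dimension of each unit
```

Each frame is mapped to one of the 512 units, each unit is a 64-dimensional vector, and the sequence of those vectors is what the LSTM would turn into 256-dimensional vectors.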

@Hu-chengyang
Author

Thank you!

@Hu-chengyang
Author

In model_encoder.py, in class Encoder(nn.Module), def forward(self, mels):
z = self.conv(mels.float()) # (bz, 80, 128) -> (bz, 512, 128/2)

What does 128 mean? What variable does it represent?
Thank you very much.

@Wendison
Owner

128 is the number of mel-spectrogram frames used for training; it corresponds to 1.28 s of waveform.
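
The arithmetic behind these numbers can be sketched as follows. The 10 ms frame shift is an assumption inferred from 128 frames covering 1.28 s. The kernel size, stride and padding are also assumptions, chosen only so that the time axis is halved as in the code comment (128 -> 128/2); they are not confirmed by this thread. Both helper functions are hypothetical.

```python
def frames_to_ms(n_frames, hop_ms):
    """Duration covered by n_frames mel frames at a fixed frame shift."""
    return n_frames * hop_ms


def conv1d_out_len(length, kernel_size, stride, padding):
    """Standard output-length formula for a 1-D convolution without dilation."""
    return (length + 2 * padding - kernel_size) // stride + 1


N_FRAMES = 128  # mel frames per training segment (from the thread)
HOP_MS = 10     # assumed frame shift: 1.28 s / 128 frames

print(frames_to_ms(N_FRAMES, HOP_MS) / 1000)  # 1.28 (seconds)

# Assumed layer settings that halve the time axis.
print(conv1d_out_len(N_FRAMES, kernel_size=4, stride=2, padding=1))  # 64
```

So in the comment `(bz, 80, 128) -> (bz, 512, 128/2)`, bz is the batch size, 80 is the number of mel channels, and 128 is the number of frames along the time axis.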
