AccentProbe - CSV file #1

Open
natalietoulova opened this issue May 12, 2021 · 1 comment

@natalietoulova

Hi, I am new to programming and I am a little bit confused about which CSV file is used in AccentProbe, and also about which hyperparameters are used for training. Did I miss it somewhere?

@archiki
Owner

archiki commented May 19, 2021

Hey @natalietoulova!
I don't have an example .csv file that was used for the accent probes, but if you look at the AccentProbe/data_loader.py file, the CSV file needs to contain the columns file_name, accent_label, and duration. My folder organization is such that I have stored representations (after different layers of the network) for each audio file, and these representation files are stored in a folder indicating the type of representation (e.g. probe_data/lstm_0/[file name].npy). Here, data_path = probe_data, rep_type = lstm_0, and the file name comes from the CSV file.

The other thing you will need is a meta file, which I have used in several experiments, that aligns phones in the speech to time durations in the audio files. This corresponds to end_times used in the data_loader.py file. If you are only running this experiment, the entire alignment is not needed; you can simply run voice activity detection to mark the time the speech starts and the time it ends. The code needs end_times to trim out the silence, as silence may unnecessarily increase the data size loaded into the accent classifiers. All this being said, you can always write your own custom data_loader.py that works for your setup.
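For concreteness, here is a minimal sketch of what such a setup could look like. Only the CSV columns (`file_name`, `accent_label`, `duration`), the `probe_data/lstm_0/[file name].npy` layout, and the role of `end_times` come from the explanation above; the class name, the frame rate, and the `end_times` dict format are hypothetical, and the repo's actual data_loader.py may differ:

```python
import os

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset

class AccentProbeDataset(Dataset):
    """Sketch of a custom loader for stored layer representations."""

    def __init__(self, csv_path, data_path, rep_type, end_times, frames_per_sec=100):
        # CSV with columns: file_name, accent_label, duration.
        self.df = pd.read_csv(csv_path)
        self.data_path = data_path        # e.g. "probe_data"
        self.rep_type = rep_type          # e.g. "lstm_0"
        # Assumed format: file_name -> (speech_start_s, speech_end_s), e.g. from VAD.
        self.end_times = end_times
        self.frames_per_sec = frames_per_sec  # assumed frame rate of the representations
        # Map accent strings to integer class ids for the classifier.
        self.label2id = {a: i for i, a in enumerate(sorted(self.df["accent_label"].unique()))}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        rep_path = os.path.join(self.data_path, self.rep_type, f"{row['file_name']}.npy")
        rep = np.load(rep_path)           # assumed shape: (num_frames, rep_dim)
        # Trim leading/trailing silence so it doesn't inflate the loaded data size.
        start_s, end_s = self.end_times[row["file_name"]]
        rep = rep[int(start_s * self.frames_per_sec):int(end_s * self.frames_per_sec)]
        return torch.from_numpy(rep).float(), self.label2id[row["accent_label"]]
```

And if you only need speech start/end times rather than the full phone alignment, a crude energy-threshold VAD is enough to fill the `end_times` dict (the threshold here is an arbitrary assumption; any real VAD tool would work too):

```python
import numpy as np

def energy_vad(wav, sr, frame_ms=25, hop_ms=10, threshold=0.02):
    """Return (speech_start_s, speech_end_s) from a mono waveform via RMS energy."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    rms = np.array([np.sqrt(np.mean(wav[i:i + frame] ** 2))
                    for i in range(0, max(len(wav) - frame, 1), hop)])
    voiced = np.flatnonzero(rms > threshold)
    if voiced.size == 0:                  # no frame above threshold: keep everything
        return 0.0, len(wav) / sr
    return voiced[0] * hop / sr, (voiced[-1] * hop + frame) / sr
```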

Regarding the hyperparameters, I will have to refer you to the paper for details. We did not find the accent classification trends to be very sensitive to hyperparameters, so we picked learning_rate = 1e-03 and batch_size = 16/32 depending on available GPU space. Hope that answers your questions.
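As a sketch of how those numbers plug in (reusing the hypothetical `AccentProbeDataset` above; the CSV path, the mean-pooling collate, the linear probe, and `rep_dim = 1024` are placeholder assumptions, not the actual training code):

```python
import torch
from torch.utils.data import DataLoader

LEARNING_RATE = 1e-03
BATCH_SIZE = 16          # or 32, depending on available GPU space

def mean_pool_collate(batch):
    # Representations vary in length, so average over time to get fixed-size vectors.
    reps, labels = zip(*batch)
    return torch.stack([r.mean(dim=0) for r in reps]), torch.tensor(labels)

rep_dim = 1024           # assumed dimensionality of the stored representations
end_times = {"utt_0001": (0.12, 3.30)}  # hypothetical VAD output per file
dataset = AccentProbeDataset("accent_probe.csv", "probe_data", "lstm_0", end_times)
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True,
                    collate_fn=mean_pool_collate)

probe = torch.nn.Linear(rep_dim, len(dataset.label2id))  # stand-in accent classifier
optimizer = torch.optim.Adam(probe.parameters(), lr=LEARNING_RATE)

for reps, labels in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(reps), labels)
    loss.backward()
    optimizer.step()
```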

Best,
Archiki
