problem with tutorial steps, short files from wavenet #6

atrush · 2021-12-13T00:37:15Z

Hi, I'm trying to reproduce your tutorial with the pretrained models, but there is a problem with the files output by the wavenet: after running infer.sh I get files that are 1 second long. Please tell me what I am doing wrong and how I can get fully processed files.

yumahayomaso · 2021-12-14T08:46:47Z

Same for me

RussellSB · 2021-12-14T14:42:02Z

For both of you, I think there may be something wrong with how the output of the voice_conversion inference has been prepared for inference with wavenet.

Could either of you provide a bit more context? For example, which commands are you using to infer from voice_conversion, and which to infer from wavenet?

yumahayomaso · 2021-12-14T15:07:02Z

Thank you for your quick reply

I've used pre-trained models for both voice_conversion inference and wavenet.

I extracted G1_99, G2-99, encoder_99 from VAE-GAN Flickr into the saved_models folder and ran:
python inference.py --model_name [expname] --epoch [int] --trg_id 2 --src_id 1 --wav myfile.wav
The out_infer folder was generated and at first glance looks OK. The gen folder contains audio that sounds like Griffin-Lim. The input file is 5 seconds long.
Then I repeated all the steps as mentioned here.
Then I ran:
spk="flickr_2" inferdir="initial_99_G2_S1" hparams=conf/flickr.json ./infer.sh
Output:

stage 1: Feature Generation
Sampling frequency: 16000
100%|##########| 1/1 [00:02<00:00,  2.87s/it]
Wrote 1 utterances, 507 time steps (0.00 hours)
Min frame length: 507
Max frame length: 507
*\AppData\Roaming\Python\Python39\site-packages\sklearn\base.py:310: UserWarning: Trying to unpickle estimator StandardScaler from version 0.24.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
100%|##########| 1/1 [00:00<00:00,  1.03it/s]
stage 2: Synthesis waveform from WaveNet
Load checkpoint from exp/flickr_2_train_no_dev_flickr/checkpoint_latest.pth
100%|##########| 11264/11264 [01:48<00:00, 103.45it/s]
Finished! Check out out/flickr_2_initial_99_G2_S1 for generated audio samples.

The gaussian/out folder contains 2 files, each of them about 20 KB and under 1 second long.
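A quick sanity check on the numbers in this log, as a rough sketch only. The stage 2 progress bar stops at 11264. If that count is the number of generated audio samples (an assumption, as is 16-bit mono PCM output), the resulting length and size line up with the short files reported here. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16000    # from "Sampling frequency: 16000" in the log
GENERATED_STEPS = 11264   # total shown by the stage 2 progress bar
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM


def duration_seconds(num_samples, sample_rate):
    """Length of a clip with `num_samples` samples at `sample_rate` Hz."""
    return num_samples / sample_rate


def pcm_bytes(num_samples, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw audio payload size in bytes, excluding the WAV header."""
    return num_samples * bytes_per_sample


print(duration_seconds(GENERATED_STEPS, SAMPLE_RATE_HZ))  # 0.704
print(pcm_bytes(GENERATED_STEPS))                         # 22528
```

Under those assumptions every output would be about 0.7 s and roughly 22 KB regardless of input length, which suggests the synthesis length is fixed somewhere rather than derived from the 507 input frames.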

RussellSB · 2021-12-14T16:00:51Z

Thank you @yumahayomaso. Looking into this... wavenet might be expecting more data to infer in parallel than it's reading, and then just weirdly returning some empty data.

This could make sense, as I didn't extensively test inferring from wavenet with just one audio file, but mostly with folders of multiple audio files containing at least minutes' worth of data.

Try going to egs/gaussian/conf/flickr.json (if that's the config you're using) and adjusting the batch size from 8 to 1 (line 46). I hope this does the trick. Let me know if not.
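A minimal sketch of making that change from Python instead of editing the file by hand. It assumes the preset is plain JSON and that the setting is stored under a top-level key named batch_size; both are assumptions, so check the file first. The helper name set_batch_size is hypothetical.

```python
import json
from pathlib import Path


def set_batch_size(config_path, batch_size=1, key="batch_size"):
    """Rewrite one setting in a JSON preset and return its previous value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if key not in config:
        # Fail loudly instead of silently adding a key the code never reads.
        raise KeyError(f"{key!r} not found in {path}")
    old_value = config[key]
    config[key] = batch_size
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_value
```

For example, `set_batch_size("egs/gaussian/conf/flickr.json", 1)` would return the old value (8, per the comment above) and write 1 in its place.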

yumahayomaso · 2021-12-15T09:23:52Z

Thank you @RussellSB. I've tried changing the batch size, but nothing changed when I ran inference again. I also tried increasing the audio file duration, with no noticeable change.

RussellSB · 2021-12-15T16:59:17Z

@yumahayomaso thanks for trying that. Sorry to hear it didn't work. I'm not entirely sure what could be the problem. Will continue looking into it. And just to check - do you have meanvar.joblib within the directory wavenet_vocoder/egs/gaussian/dump/[spk]/logmelspectrogram/org/ ?

yumahayomaso · 2021-12-16T09:29:05Z

@RussellSB Yes, the file is there

RussellSB · 2021-12-20T20:14:55Z

Excuse the delayed response. Could you try with multiple files and let me know if the problem still persists? Maybe try having 10 utterance files in the inferdir (whether new files or just copy pastes of the same one).

I apologise if the vocoder code is a bit buggy. That is the part of the pipeline I'm least involved with code-wise. I was having issues with this wavenet's implementation of inference before. Having set up this script for multiple files (hundreds of 5 second samples or 2 songs of 1.5 mins each) seemed to do the trick. Not that it's a long-term solution of course. Just want to home in and ensure that that is the problem.
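For the copy-paste variant of that test, a small sketch that duplicates one utterance into a folder under numbered names. The helper name duplicate_wav and the naming scheme are hypothetical; the default of 10 copies comes from the suggestion above.

```python
import shutil
from pathlib import Path


def duplicate_wav(src_wav, dest_dir, copies=10):
    """Copy one wav file `copies` times into `dest_dir` and return the new paths."""
    src = Path(src_wav)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for i in range(copies):
        # e.g. myfile_00.wav, myfile_01.wav, ...
        target = dest / f"{src.stem}_{i:02d}{src.suffix}"
        shutil.copyfile(src, target)
        created.append(target)
    return created
```

Calling `duplicate_wav("myfile.wav", "my_inferdir")` would leave ten identical utterances in the target folder, which is enough to test whether the file count matters.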

atrush · 2021-12-23T10:31:45Z

Hi, sorry for the long silence. Below is my test with multiple wav files (Windows environment):
git clone https://github.com/RussellSB/tt-vae-gan.git
cd tt-vae-gan
git submodule init
git submodule update

Install the env with conda; the resulting environment.yml is here: https://drive.google.com/file/d/1azB2ArI3tduwH_hcG3wdEz9DdCb9FW1J/view?usp=sharing

STEP_01

Copy the 'initial' model files into .\voice_conversion\saved_models\initial\ (https://drive.google.com/drive/folders/1Wui2Pt4sOBl71exRh49GX_JEBpFv_vNg)
Copy the wav files for inference into .\voice_conversion\inf_female\ (you can see those files in the output refs folder in the link below)

In voice_conversion\src\inference.py, change line 190 from "wavname = f.split('.')[0]" to "wavname = os.path.basename(wav)", because the Windows path contains "\" and the split does not work.
Change to .\voice_conversion and run:
python .\src\inference.py --model_name initial --epoch 99 --trg_id 2 --wavdir .\inf_female

OUTPUT

(C:_ml\tt-vae-gan_clear\tt-vae-gan\env) C:_ml\tt-vae-gan_clear\tt-vae-gan\voice_conversion>python .\src\inference.py --model_name initial --epoch 99 --trg_id 2 --wavdir .\inf_female
Namespace(channels=1, dim=32, epoch=99, img_height=128, img_width=128, model_name='initial', n_downsample=2, n_overlap=4, plot=1, src_id=None, trg_id='2', wav=None, wavdir='.\inf_female')
[File 1/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:01<00:00, 10.02it/s]
Reconstructing with Griffin Lim...
[File 2/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 35.09it/s]
Reconstructing with Griffin Lim...
[File 3/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 46.48it/s]
Reconstructing with Griffin Lim...
[File 4/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 56.76it/s]
Reconstructing with Griffin Lim...
[File 5/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 36.59it/s]
Reconstructing with Griffin Lim...
[File 6/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 42.02it/s]
Reconstructing with Griffin Lim...
[File 7/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 43.35it/s]
Reconstructing with Griffin Lim...
[File 8/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 47.50it/s]
Reconstructing with Griffin Lim...
[File 9/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 45.18it/s]
Reconstructing with Griffin Lim...
[File 10/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 43.37it/s]
Reconstructing with Griffin Lim...
[File 11/11]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:01<00:00, 57.28it/s]
Reconstructing with Griffin Lim...

All files were processed, and the folder .\voice_conversion\out_infer contains these files:
https://drive.google.com/drive/folders/1WMZW_vtZWo9HpbW63P8heqmNgWTpOTDf?usp=sharing
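A small sketch of why the inference.py change in STEP_01 can matter on Windows, assuming the value being split is a relative path like the one passed via --wavdir (what `f` actually holds at that line is not shown in this thread). It uses `ntpath`, which applies Windows path rules on any OS; the file name sample01.wav and the helper wav_stem are hypothetical.

```python
import ntpath  # Windows path rules, importable on any platform


def wav_stem(path):
    """Return the file name without its directory or extension."""
    return ntpath.splitext(ntpath.basename(path))[0]


windows_path = r".\inf_female\sample01.wav"  # hypothetical input file

# Splitting on "." breaks because the relative path itself starts with a dot.
print(repr(windows_path.split(".")[0]))  # ''
# basename keeps the extension, as in the replacement used in STEP_01.
print(ntpath.basename(windows_path))     # sample01.wav
# basename plus splitext gives the bare name.
print(wav_stem(windows_path))            # sample01
```

Whether the extension should be kept depends on how the rest of the script builds its output names, so treat wav_stem as one option rather than the required fix.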

STEP_02

Copy the wavenet pretrained files into .\wavenet_vocoder\tst\flickr_2_train_no_dev_flickr\ (from https://drive.google.com/drive/folders/1SliS5budtnV7P1L9ALbPgTaq53a84Eyu?usp=sharing)
Change to .\wavenet_vocoder and run:
python preprocess.py wavallin ..\voice_conversion\out_infer\initial_99_G2\gen .\tst\dump\ --hparams global_gain_scale=0.55 --preset .\egs\gaussian\conf\flickr.json

OUTPUT

(C:_ml\tt-vae-gan_clear\tt-vae-gan\env) C:_ml\tt-vae-gan_clear\tt-vae-gan\wavenet_vocoder>python preprocess.py wavallin ..\voice_conversion\out_infer\initial_99_G2\gen .\tst\dump\ --hparams global_gain_scale=0.55 --preset .\egs\gaussian\conf\flickr.json
Sampling frequency: 16000
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:04<00:00, 2.65it/s]
Wrote 11 utterances, 5665 time steps (0.00 hours)
Min frame length: 237
Max frame length: 2256

Files generated in the folder .\tst\dump\: https://drive.google.com/drive/folders/11oQX9WMztOzYNY2PWsXbIc_mlt5YwO_m?usp=sharing

STEP_03
python preprocess_normalize.py .\tst\dump\ .\tst\norm\ .\tst\flickr_2_train_no_dev_flickr\meanvar.joblib

OUTPUT

(C:_ml\tt-vae-gan_clear\tt-vae-gan\env) C:_ml\tt-vae-gan_clear\tt-vae-gan\wavenet_vocoder>python preprocess_normalize.py .\tst\dump\ .\tst\norm\ .\tst\flickr_2_train_no_dev_flickr\meanvar.joblib
C:_ml\tt-vae-gan_clear\tt-vae-gan\env\lib\site-packages\sklearn\base.py:310: UserWarning: Trying to unpickle estimator StandardScaler from version 0.24.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
warnings.warn(
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:01<00:00, 8.75it/s]

Files generated in the folder .\tst\norm\: https://drive.google.com/drive/folders/1DJuejed0AHmN4_lB0QdJOkBQIfUyVZLp?usp=sharing

STEP_04
python evaluate.py .\tst\norm\ .\tst\flickr_2_train_no_dev_flickr\checkpoint_latest.pth .\tst\out\ --hparams global_gain_scale=0.55 --preset .\egs\gaussian\conf\flickr.json

OUTPUT

(C:_ml\tt-vae-gan_clear\tt-vae-gan\env) C:_ml\tt-vae-gan_clear\tt-vae-gan\wavenet_vocoder>python evaluate.py .\tst\norm\ .\tst\flickr_2_train_no_dev_flickr\checkpoint_latest.pth .\tst\out\ --hparams global_gain_scale=0.55 --preset .\egs\gaussian\conf\flickr.json
Load checkpoint from .\tst\flickr_2_train_no_dev_flickr\checkpoint_latest.pth
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11264/11264 [03:21<00:00, 55.76it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11264/11264 [03:28<00:00, 53.98it/s]
Finished! Check out .\tst\out\ for generated audio samples.

Files generated in the folder .\tst\out\: https://drive.google.com/drive/folders/1ucfcsuH2XtF_kogAp6N5HY10apw8tnDm?usp=sharing
All of them have the same size and are very short, and I do not understand why. Please help me figure it out.

SlistInc · 2021-12-30T08:47:28Z

I am running into the same problem. I do not get the version discrepancy warning, so it's not that.

I am applying the scripts on a folder with multiple files
I am also setting everything to single-thread/worker on CPU (to avoid any multi-threading issues)
Output files are 21 KB (both *_ref.wav and *_gen.wav)
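To compare the generated files against their inputs without opening each one, a minimal sketch using only the standard library. It assumes the outputs are PCM WAV files, which is what the `wave` module can read; the helper names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(wav_path):
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def report_durations(directory):
    """Map each .wav file name in `directory` to its duration in seconds."""
    wav_paths = sorted(Path(directory).glob("*.wav"))
    return {p.name: wav_duration_seconds(p) for p in wav_paths}
```

Running `report_durations` on both the input folder and the vocoder output folder would show whether every output has the same duration regardless of how long its input was.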

fujistoo · 2022-05-18T10:13:42Z

Any updates on this? I'm running into the same issue: the generated files are 21 KB and 1 second long.
