Make your photos tell their best stories.
This project uses:
- TTS to generate speech audio from a text and a voice sample.
- Wav2Lip to make the lips move in a photo.
Download this file, rename it to s3fd.pth, and place it in assets/Wav2Lip.
Download these files and place them in assets/Wav2Lip/checkpoints/.
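Before building the image, it can help to confirm that the downloaded files are where the steps above put them. The following is a minimal sketch, not part of the project: `check_assets` is a hypothetical helper, and it assumes the checkpoint files end in `.pth` (as `wav2lip.pth` does in the commands of this README).

```shell
#!/bin/sh
# Sketch: verify the downloaded model files are in the expected locations.
# check_assets is a hypothetical helper, not shipped with the project.
# Assumption: checkpoint files use the .pth extension.
check_assets() {
    root="${1:-.}"
    missing=0
    if [ ! -f "$root/assets/Wav2Lip/s3fd.pth" ]; then
        echo "missing: $root/assets/Wav2Lip/s3fd.pth" >&2
        missing=1
    fi
    # ls exits non-zero when the glob matches nothing
    if ! ls "$root"/assets/Wav2Lip/checkpoints/*.pth >/dev/null 2>&1; then
        echo "missing: no .pth file in $root/assets/Wav2Lip/checkpoints/" >&2
        missing=1
    fi
    return "$missing"
}
```

Called from the repository root as `check_assets .`, it returns 0 only when both locations are populated, so it can gate the build step.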
Build the Docker image (~22 GB):
docker build -t text2lips .
Create and start the container:
docker run -it --rm -v .:/home text2lips
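The command above mounts the current directory with a relative path (`.`). Some older Docker releases only accept an absolute host path in `-v`; if the mount is rejected, a variant like the following sketch may help. `start_text2lips` is a hypothetical wrapper name, and the snippet assumes a POSIX shell run from the repository root.

```shell
#!/bin/sh
# Sketch: same container as above, but the host directory is passed
# as an absolute path. start_text2lips is a hypothetical helper name.
start_text2lips() {
    docker run -it --rm -v "$(pwd)":/home text2lips
}
```

Run `start_text2lips` from the directory that contains `assets/` and `src/`, so that they appear under `/home` inside the container.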
To run the whole process (inside the container):
/home/src/text2lips.sh \
-t /home/assets/sources/text.txt \
-v /home/assets/sources/voice_sample_audio_or_video.mp4 \
-p /home/assets/sources/photo_or_video.jpg \
-l en
To run only the text-to-speech step:
python3 /home/src/text_to_speech.py \
--text_file /home/assets/sources/text.txt \
--voice /home/assets/sources/voice_sample_audio_or_video.mp4 \
--language en \
--output_path /home/results/resulting_speech.wav
To run only the audio-to-lip-sync step:
cd /vendor/Wav2Lip
python3 /vendor/Wav2Lip/inference.py \
--checkpoint_path /vendor/Wav2Lip/checkpoints/wav2lip.pth \
--face /home/assets/sources/photo_or_video.jpg \
--audio /home/results/resulting_speech.wav
The result will be in /vendor/Wav2Lip/results/result_voice.mp4.
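Because the container is started with `--rm`, anything left under /vendor is discarded when the container exits, while /home is the mounted host directory. A minimal sketch for keeping the video, assuming you want it under /home/results: `save_result` and the destination file name `result.mp4` are hypothetical, not part of the project.

```shell
#!/bin/sh
# Sketch: copy the lip-synced video into the mounted /home directory so it
# is still available on the host after the container is removed.
# save_result and the default destination name are hypothetical.
save_result() {
    src="${1:-/vendor/Wav2Lip/results/result_voice.mp4}"
    dest="${2:-/home/results/result.mp4}"
    if [ ! -f "$src" ]; then
        echo "no result found at $src" >&2
        return 1
    fi
    # Create the destination directory if needed, then copy
    mkdir -p "$(dirname "$dest")" && cp "$src" "$dest"
}
```

Calling `save_result` with no arguments uses the paths from this README; pass a source and destination to override them.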