Making MuseTalk 40% faster #173

Open
mvoodarla opened this issue Aug 20, 2024 · 6 comments

Comments

@mvoodarla

I've been pretty impressed with MuseTalk despite some of its shortcomings, and have been playing around with the model. I ended up doing a ton of optimizations that made it run 40% faster. Most of these revolved around how we load, store, and save video frames in memory during pre/post-processing, which turns out to be pretty inefficient. To that end, my company Sieve is now hosting it at a rate that's cheaper than self-hosting on GCP!

We also fixed a couple quality issues around audio silences.

We wrote about the work here and would appreciate any feedback / areas of improvement the community has noticed around the model that might be worthwhile for us to check out!

You can also just run the model directly in this playground!
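On the frame-handling point above: the post does not say which changes were made, so the following is only a generic sketch of one common way to cut memory use in video pre/post-processing, not Sieve's actual optimization. It assumes frames arrive from some lazy source (a decoder, a generator) and groups them into fixed-size batches instead of collecting the whole video in a list first. The name `iter_batches` and the batch size are hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def iter_batches(frames: Iterable[Frame], batch_size: int) -> Iterator[List[Frame]]:
    """Yield frames in lists of at most `batch_size`, consuming the source lazily.

    Only one batch is held in memory at a time, so peak memory depends on
    `batch_size` rather than on the length of the video.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Frame] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so the yielded batch is not mutated
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch
```

A caller would loop over `iter_batches(frame_source, 8)`, run inference on each batch, and write the results out before pulling the next one.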

@dubeno

dubeno commented Aug 21, 2024

I saw your blog, very nice job! The preprocessing takes too long, and the low resolution of the teeth is a big problem. Can you share more detail on how you solve these issues?

@evan-zhao-thermofisher

Hi @mvoodarla, your blog reads like a guide to making the model perfect. Would you mind explaining how you tackled the hallucination problems from silent audio? Did you just change the temperature, or replace Whisper with a new model? Appreciate it!

@liuzysy

liuzysy commented Aug 23, 2024

Thanks for your work. I'm just wondering whether you trained a new model, or used the existing checkpoint and optimized the inference part? Looking forward to your reply.

@mvoodarla
Author

Hey folks! Thanks for the notes here. We're still actively working on this model and turning it into a high-quality pipeline. More specifically, we're doing things like using CodeFormer to upscale, fixing how facial alignment is done, etc.

As for how we tackled hallucination in silent audio, one of the fixes involves first detecting the silent audio and then changing the input parameters to MuseTalk in those moments so that the mouth stays shut. We hope to do a more technical post around all of these things soon!
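The silence-detection step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: it assumes mono audio as floats in [-1, 1], uses a simple RMS energy threshold (the comment does not say which detector is used), and the function name `silent_frame_mask` and the default `rms_threshold` are hypothetical. Which MuseTalk inputs are then changed for the flagged frames is not specified in the thread, so that part is left out.

```python
import math
from typing import List, Sequence


def silent_frame_mask(
    samples: Sequence[float],
    sample_rate: int,
    fps: float,
    rms_threshold: float = 0.01,  # assumed value; tune per recording
) -> List[bool]:
    """Return one flag per video frame: True if that frame's audio window is silent.

    Each video frame is matched to the slice of audio that plays while it is
    shown, and the slice counts as silent when its RMS energy is below the threshold.
    """
    samples_per_frame = sample_rate / fps
    n_frames = math.ceil(len(samples) / samples_per_frame)
    mask: List[bool] = []
    for i in range(n_frames):
        start = int(round(i * samples_per_frame))
        end = min(len(samples), int(round((i + 1) * samples_per_frame)))
        window = samples[start:end]
        if not window:
            # No audio left for this frame: treat it as silent.
            mask.append(True)
            continue
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        mask.append(rms < rms_threshold)
    return mask
```

A pipeline could then walk the mask alongside the video frames and apply its closed-mouth handling wherever the flag is `True`. An energy threshold is the simplest choice; a voice activity detector would be more robust to background noise.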

@evan-zhao-thermofisher

Looking forward to it. @mvoodarla, you guys are doing really meaningful work.

@mvoodarla
Author

Join our Discord! Happy to share more active updates there.

https://discord.com/invite/Pnh97rvRtD
