Add speech-to-text pipeline #3078

Open: wants to merge 12 commits into base branch ai-video

@eliteprox (Contributor) commented Jun 12, 2024

What does this pull request do? Explain your changes. (required)

Adds the new /speech-to-text pipeline to go-livepeer, supporting the openai/whisper-large-v3 model.

Supported file formats: mp3, m4a, mp4, webm, wav, and flac.
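The PR lists six accepted formats. As a minimal sketch, assuming a cheap extension-based pre-check on the gateway side, an upload could be screened like this before any media probing. The names supportedAudioFormats and isSupportedAudioFile are hypothetical and are not taken from go-livepeer.

```go
package main

import (
	"path/filepath"
	"strings"
)

// supportedAudioFormats holds the extensions named in this PR.
// Hypothetical: go-livepeer's real validation may not be extension-based.
var supportedAudioFormats = map[string]bool{
	"mp3":  true,
	"m4a":  true,
	"mp4":  true,
	"webm": true,
	"wav":  true,
	"flac": true,
}

// isSupportedAudioFile reports whether the file name ends in one of the
// supported extensions, ignoring case. Names without an extension fail.
func isSupportedAudioFile(name string) bool {
	ext := strings.TrimPrefix(strings.ToLower(filepath.Ext(name)), ".")
	return supportedAudioFormats[ext]
}
```

An extension check only catches obvious mismatches. Per the later comment on this PR, the deciding check is whether the audio duration can actually be calculated, which also catches corrupted files with a valid extension.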

This change requires livepeer/ai-worker#103

Specific updates (required)

  • Refactors handleAIRequest and processAIRequest to support new response types like TextResponse
  • Adds /speech-to-text endpoint to ai_mediaserver.go
  • Fixes pricing at one pixel per millisecond of input audio

How did you test each of these updates (required)

  • Tested with rich vocal audio up to 4 hours long. Regression-tested the other pipelines to ensure the refactoring did not cause any issues.
  • Tested with all supported file formats, as well as with unsupported ones.

Example curl request:

curl --request POST --url http://dev.eliteencoder.net:8937/speech-to-text --header 'Content-Type: multipart/form-data' --form '[email protected]' --form 'model_id=openai/whisper-large-v3' --form seed=123

Does this pull request close any open issues?

LIV-429
LIV-289

Checklist:

@github-actions bot added the AI label ("Issues and PR related to the AI-video branch.") on Jun 12, 2024
@eliteprox marked this pull request as ready for review on June 19, 2024, 07:57
@eliteprox requested a review from rickstaa as a code owner on June 19, 2024, 07:57
@eliteprox (Contributor, Author) commented:

Added error handlers that respond with "400 Bad Request" when the audio duration cannot be calculated because the file format is unsupported or the file is corrupted. This prevents invalid jobs from being sent to the network.
