Check if model folder exists on startup and request processing #3044

Open. eliteprox wants to merge 5 commits into base: ai-video


@eliteprox eliteprox commented May 6, 2024

What does this pull request do? Explain your changes. (required)

This PR is dependent on livepeer/ai-worker#79

This change checks whether the requested model folder exists when a warm model is loaded at startup, and gracefully handles requests from the gateway for a model whose folder is missing.

  • Improves response times on the network: when the orchestrator is missing the model, it immediately returns a 503 API error code. This is primarily useful for cold models.
  • Improves orchestrator onboarding: the exact path where the container looks for the model is logged at startup and on each request when the model is not found.
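The fail-fast behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `modelChecker`, `precheck`, `handler`, and the `model_id` query parameter are hypothetical names, and only the 503 status and the "Insufficient capacity" message are taken from the logs below.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// modelChecker reports whether a model's folder is present on disk.
// Hypothetical stand-in for the real check provided by ai-worker.
type modelChecker func(modelID string) bool

// precheck decides, before any work is started, which HTTP status and
// message to return for a request that names modelID.
func precheck(exists modelChecker, modelID string) (int, string) {
	if !exists(modelID) {
		// Same status and wording as the orchestrator log in this PR.
		return http.StatusServiceUnavailable, "Insufficient capacity for modelID=" + modelID
	}
	return http.StatusOK, ""
}

// handler wraps precheck in an http.HandlerFunc. Reading the model ID
// from a query parameter is an assumption made for this sketch.
func handler(exists modelChecker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		modelID := r.URL.Query().Get("model_id")
		if code, msg := precheck(exists, modelID); code != http.StatusOK {
			http.Error(w, msg, code)
			return
		}
		fmt.Fprintln(w, "accepted")
	}
}

func main() {
	// Simulate an orchestrator that has no model folders at all.
	missing := func(string) bool { return false }
	req := httptest.NewRequest(http.MethodPost, "/image-to-video?model_id=example/model", nil)
	rec := httptest.NewRecorder()
	handler(missing)(rec, req)
	fmt.Println(rec.Code) // prints 503
}
```

Running the check before any container is started is what lets the gateway see the failure immediately instead of waiting for a timeout.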

Gateway error log:

I0506 09:29:28.120307 1985227 discovery.go:180] Done fetching orch info numOrch=1 responses=1/1 timedOut=false
I0506 09:29:30.600500 1985227 ai_process.go:344] clientIP=127.0.0.1 request_id=14b57a61 Error submitting request cap=27 modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1 try=1 orch=https://0.0.0.0:8936 err=Insufficient capacity for modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1
E0506 09:29:30.600545 1985227 handlers.go:1479] clientIP=127.0.0.1 request_id=14b57a61 Error with API code=503 err=no orchestrators available within 2s timeout

AI Core error log on cold model request:

I0506 09:29:28.121922 1984042 ai_http.go:198] manifestID=27_stabilityai/stable-video-diffusion-img2vid-xt-1-1 orchSessionID=8983c425 clientIP=127.0.0.1 Received request id=6156387e cap=27 modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1
2024/05/06 09:29:30 ERROR model stabilityai/stable-video-diffusion-img2vid-xt-1-1 does not exist at /livepeer/ai-core/arbitrum-one-mainnet/models/models--stabilityai--stable-video-diffusion-img2vid-xt-1-1
E0506 09:29:30.600020 1984042 handlers.go:1511] HTTP Response Error 503: Insufficient capacity for modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1

AI Core error log on startup:

2024/05/06 10:04:25 ERROR model stabilityai/stable-video-diffusion-img2vid-xt-1-1 does not exist at /livepeer/ai-core/arbitrum-one-mainnet/models/models--stabilityai--stable-video-diffusion-img2vid-xt-1-1
E0506 10:04:25.144208 2005927 starter.go:549] Error AI worker warming text-to-image container: model stabilityai/stable-video-diffusion-img2vid-xt-1-1 does not exist
I0506 10:04:25.144224 2005927 db.go:368] Closing DB

Specific updates (required)

  • Checks whether the given model exists at startup and when processing requests.
  • Uses a new method ModelExists in ai-worker that returns a boolean indicating whether the specific model folder exists.
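A folder check of this kind might look like the sketch below. It is not the ai-worker implementation: `modelDirName` and `modelExists` are hypothetical helpers, and the `models--org--name` folder naming is inferred from the paths in the logs above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// modelDirName maps a model ID to the folder name seen in the logs,
// e.g. "org/name" -> "models--org--name" (naming inferred, not confirmed).
func modelDirName(modelID string) string {
	return "models--" + strings.ReplaceAll(modelID, "/", "--")
}

// modelExists reports whether the model's folder exists under modelsDir.
// A path that exists but is not a directory counts as missing.
func modelExists(modelsDir, modelID string) bool {
	info, err := os.Stat(filepath.Join(modelsDir, modelDirName(modelID)))
	return err == nil && info.IsDir()
}

func main() {
	// Build a throwaway models directory containing one model folder.
	dir, err := os.MkdirTemp("", "models")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	present := "example/present-model"
	if err := os.Mkdir(filepath.Join(dir, modelDirName(present)), 0o755); err != nil {
		panic(err)
	}

	fmt.Println(modelExists(dir, present))                 // prints true
	fmt.Println(modelExists(dir, "example/missing-model")) // prints false
}
```

Returning a plain boolean keeps the caller simple; the caller can log the full path it looked in, as the error logs in this PR do.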

How did you test each of these updates (required)

  1. Started go-livepeer with an aiModels.json config containing a model that does not exist, with warm set to true.
  2. Started go-livepeer with an aiModels.json config containing a model that does not exist, with warm set to false.
  3. Sent an AI request through the gateway to go-livepeer configured with a cold model name that doesn't exist; the orchestrator immediately returned a 503 error response.
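For readers reproducing steps 1 and 2, the sketch below shows roughly how warm and cold entries in such a config differ. The field names (`pipeline`, `model_id`, `warm`) and the model IDs are assumptions for illustration and should be checked against the actual aiModels.json format; `modelConfig`, `parseModels`, and `warmModels` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelConfig mirrors one entry of an aiModels.json-style file.
// The JSON field names are assumptions made for this sketch.
type modelConfig struct {
	Pipeline string `json:"pipeline"`
	ModelID  string `json:"model_id"`
	Warm     bool   `json:"warm"`
}

// parseModels decodes a JSON array of model entries.
func parseModels(data []byte) ([]modelConfig, error) {
	var configs []modelConfig
	if err := json.Unmarshal(data, &configs); err != nil {
		return nil, err
	}
	return configs, nil
}

// warmModels returns the IDs of entries marked warm, i.e. the ones that
// would be checked at startup rather than on the first request.
func warmModels(configs []modelConfig) []string {
	var ids []string
	for _, c := range configs {
		if c.Warm {
			ids = append(ids, c.ModelID)
		}
	}
	return ids
}

func main() {
	data := []byte(`[
		{"pipeline": "image-to-video", "model_id": "example/missing-warm-model", "warm": true},
		{"pipeline": "image-to-video", "model_id": "example/missing-cold-model", "warm": false}
	]`)
	configs, err := parseModels(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(warmModels(configs)) // prints [example/missing-warm-model]
}
```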

Does this pull request close any open issues?
Addresses LIV-117


@github-actions github-actions bot added the AI Issues and PR related to the AI-video branch. label May 6, 2024
@eliteprox eliteprox changed the title Check-model-folder Check if model folder exists on startup and request processing May 6, 2024
@eliteprox eliteprox marked this pull request as ready for review May 6, 2024 14:36
@eliteprox eliteprox requested a review from rickstaa as a code owner May 6, 2024 14:36