Add llm endpoint creation and inference sample code to self hosting doc play with it section #153
Conversation
Next, let's create an LLM endpoint using llama-7b:

```
$ curl -X POST 'http://localhost:5000/v1/llm/model-endpoints' \
```
Hmm, I think we'll want a separate guide that explains the server abstractions (maybe outside the scope of this PR), so that people understand what `name` vs. `model_name` mean.
Also, I think we want docs per field, but that's more of an API reference, which doesn't need to go here.
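To make the `name` vs. `model_name` distinction concrete, here is a minimal sketch of how a request body for the endpoint-creation route might be assembled. Only the two field names come from the discussion above; the values and everything else are illustrative assumptions, not the actual API schema.

```python
import json

# Hypothetical request body for POST /v1/llm/model-endpoints.
# `name` and `model_name` are the fields discussed in this thread;
# the values below are placeholders, not real defaults.
payload = {
    "name": "llama-7b-demo",   # the endpoint's own name (what callers address)
    "model_name": "llama-7b",  # the base model the endpoint serves
}

body = json.dumps(payload)
print(body)
```

A guide that spells out this distinction (endpoint identity vs. underlying model) would resolve the reviewer's concern without needing full per-field API docs here.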
docs/guides/self_hosting.md
Outdated
```
{"endpoint_creation_task_id":"8d323344-b1b5-497d-a851-6d6284d2f8e4"}
```

Wait a few minutes for the endpoint to be ready. Once it's ready, you can list pods and see `2/2` in the `READY` column:
Subtle nit around causality: is seeing `READY` an indicator that the endpoint is ready? In other words, I might reword this to "You can tell that it's ready by listing pods and..."
Though alternatively, we may want to `curl` a status endpoint, to keep things at the API level.
The `status` in the model endpoint record is independent from the status of the pods... Maybe at some point we might want to add an API that checks the endpoint pod status?
We do have something that returns the number of available/unavailable workers. Not sure if this is exposed on the llm endpoint get/list routes though.
Oh right nm.
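The readiness check under discussion, seeing `2/2` in the `READY` column of `kubectl get pods`, could be sketched as a small parser. This helper is purely illustrative (the guide itself just has the reader eyeball the output), and it assumes the standard `kubectl get pods` table format:

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """Return True if every pod row shows n/n in the READY column.

    Hypothetical helper sketching the check discussed in review;
    assumes standard `kubectl get pods` table output.
    """
    rows = kubectl_output.strip().splitlines()[1:]  # skip the header row
    for row in rows:
        ready = row.split()[1]            # READY column, e.g. "2/2" or "1/2"
        current, desired = ready.split("/")
        if current != desired:
            return False
    return bool(rows)                      # empty listing is not "ready"

sample = """NAME                READY   STATUS    RESTARTS   AGE
llama-7b-abc123     2/2     Running   0          5m"""
print(all_pods_ready(sample))  # True: the single pod shows 2/2
```

As the thread notes, a `curl`-able status route that folds in pod readiness would keep the whole walkthrough at the API level instead of dropping down to `kubectl`.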
docs/guides/self_hosting.md
Outdated
You should get a response similar to:

```
{"status":"SUCCESS","outputs":[{"text":"hi hi hi2 hi2 hi2 hi2","num_completion_tokens":10}],"traceback":null}
```
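A client consuming the response shape quoted above might unpack it as follows. This is a minimal sketch assuming exactly the JSON shown; since the review discussion suggests `status` and `traceback` may eventually be dropped from this payload, the sketch reads them defensively with `.get()`:

```python
import json

# The sample response from the doc under review.
raw = ('{"status":"SUCCESS","outputs":[{"text":"hi hi hi2 hi2 hi2 hi2",'
       '"num_completion_tokens":10}],"traceback":null}')
resp = json.loads(raw)

# `status`/`traceback` may be removed from this payload per the review
# thread, so don't assume they exist.
if resp.get("status", "SUCCESS") == "SUCCESS":
    completion = resp["outputs"][0]
    print(completion["text"])                   # hi hi hi2 hi2 hi2 hi2
    print(completion["num_completion_tokens"])  # 10
```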
Hmm @seanshi-scale @yunfeng-scale I think `status` and `traceback` should be gone?
Oh oops they're still there in this repo. Will create a separate issue: #159
Add LLM endpoint creation and inference sample code to the self-hosting doc's "Play with it" section.
Related Issue
#141
Test Plan