
Add llm endpoint creation and inference sample code to self hosting d… #153

Merged: 4 commits merged from self-hosting-llm-endpoint-creation-doc on Jul 21, 2023

Conversation

ruizehung-scale (Contributor) commented on Jul 20, 2023:

Add LLM endpoint creation and inference sample code to the "Play with it" section of the self-hosting doc.

Related Issue

#141

Test Plan

```
(launch) ➜  devbox ~ curl -X POST 'http://localhost:5000/v1/llm/model-endpoints' \
    -H 'Content-Type: application/json' \
    -d '{
        "name": "llama-7b",
        "model_name": "llama-7b",
        "source": "hugging_face",
        "inference_framework": "text_generation_inference",
        "inference_framework_image_tag": "0.9.3",
        "num_shards": 4,
        "endpoint_type": "streaming",
        "cpus": 32,
        "gpus": 4,
        "memory": "40Gi",
        "storage": "40Gi",
        "gpu_type": "nvidia-ampere-a10",
        "min_workers": 1,
        "max_workers": 12,
        "per_worker": 1,
        "labels": {},
        "metadata": {}
    }' \
    -u test_user_id:
{"endpoint_creation_task_id":"8d323344-b1b5-497d-a851-6d6284d2f8e4"}
(launch) ➜  devbox ~ curl -X POST 'http://localhost:5000/v1/llm/completions-sync?model_endpoint_name=llama-7b' \
    -H 'Content-Type: application/json' \
    -d '{
        "prompts": ["Tell me a joke about AI"],
        "max_new_tokens": 30,
        "temperature": 0.1
    }' \
    -u test_user_id:
{"status":"SUCCESS","outputs":[{"text":". Tell me a joke about AI. Tell me a joke about AI. Tell me a joke about AI. Tell me","num_completion_tokens":30}],"traceback":null}
```

ruizehung-scale force-pushed the self-hosting-llm-endpoint-creation-doc branch from 06b7a9c to 129d7b0 on July 20, 2023, 17:36
ruizehung-scale marked this pull request as ready for review on July 20, 2023, 17:36
ruizehung-scale self-assigned this on Jul 20, 2023
ruizehung-scale force-pushed the self-hosting-llm-endpoint-creation-doc branch from 129d7b0 to 58faee6 on July 20, 2023, 21:46
Review comments on docs/guides/self_hosting.md:

Next, let's create an LLM endpoint using llama-7b:
```
$ curl -X POST 'http://localhost:5000/v1/llm/model-endpoints' \
...
{"endpoint_creation_task_id":"8d323344-b1b5-497d-a851-6d6284d2f8e4"}
```

yixu34 (Member) commented on Jul 20, 2023:

Hmm, I think we'll want a separate guide that explains the server abstractions (maybe outside the scope of this PR), so that people understand what `name` vs. `model_name` mean.

Also, I think we want docs per field, but that's more of an API reference, which doesn't need to go here.

Wait a few minutes for the endpoint to be ready. Once it's ready, you can list pods and see `2/2` in the `READY` column:
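A minimal sketch of that pod check (the namespace and pod naming are deployment-specific assumptions, not taken from this PR):

```
# Hypothetical pod-level readiness check; the "launch" namespace is an assumption.
$ kubectl get pods -n launch | grep llama-7b
# The endpoint is serving once the READY column shows 2/2.
```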
Member commented:

Subtle nit around causality: is seeing READY an indicator that the endpoint is ready? In other words, I might reword this to "You can tell that it's ready by listing pods and..."

Though alternatively, we may want to curl a status endpoint, to keep things at the API level.

ruizehung-scale (Contributor, Author) replied:

The status in the model endpoint record is independent of the status of the pods.... Maybe at some point we might want to add an API that checks the endpoint pod status?

Member replied:

We do have something that returns the number of available/unavailable workers. Not sure if this is exposed on the LLM endpoint get/list routes, though.

Member replied:

Oh right, nm.


You should get a response similar to:
```
{"status":"SUCCESS","outputs":[{"text":"hi hi hi2 hi2 hi2 hi2","num_completion_tokens":10}],"traceback":null}
```
Member commented:

Hmm, @seanshi-scale @yunfeng-scale, I think `status` and `traceback` should be gone?

Member replied:

Oh oops, they're still there in this repo. Will create a separate issue: #159
