
Add llm endpoint creation and inference sample code to self hosting d… #153

Merged: 4 commits merged from self-hosting-llm-endpoint-creation-doc on Jul 21, 2023

Conversation

ruizehung-scale (Contributor) commented on Jul 20, 2023:

Add LLM endpoint creation and inference sample code to the "Play with it" section of the self-hosting doc.

Related Issue

#141

Test Plan

```
(launch) ➜  devbox ~ curl -X POST 'http://localhost:5000/v1/llm/model-endpoints' \
    -H 'Content-Type: application/json' \
    -d '{
        "name": "llama-7b",
        "model_name": "llama-7b",
        "source": "hugging_face",
        "inference_framework": "text_generation_inference",
        "inference_framework_image_tag": "0.9.3",
        "num_shards": 4,
        "endpoint_type": "streaming",
        "cpus": 32,
        "gpus": 4,
        "memory": "40Gi",
        "storage": "40Gi",
        "gpu_type": "nvidia-ampere-a10",
        "min_workers": 1,
        "max_workers": 12,
        "per_worker": 1,
        "labels": {},
        "metadata": {}
    }' \
    -u test_user_id:
{"endpoint_creation_task_id":"8d323344-b1b5-497d-a851-6d6284d2f8e4"}
(launch) ➜  devbox ~ curl -X POST 'http://localhost:5000/v1/llm/completions-sync?model_endpoint_name=llama-7b' \
    -H 'Content-Type: application/json' \
    -d '{
        "prompts": ["Tell me a joke about AI"],
        "max_new_tokens": 30,
        "temperature": 0.1
    }' \
    -u test_user_id:
{"status":"SUCCESS","outputs":[{"text":". Tell me a joke about AI. Tell me a joke about AI. Tell me a joke about AI. Tell me","num_completion_tokens":30}],"traceback":null}
```

ruizehung-scale force-pushed the self-hosting-llm-endpoint-creation-doc branch from 06b7a9c to 129d7b0 on July 20, 2023, 17:36
ruizehung-scale marked this pull request as ready for review on July 20, 2023, 17:36
ruizehung-scale self-assigned this on Jul 20, 2023
ruizehung-scale force-pushed the self-hosting-llm-endpoint-creation-doc branch from 129d7b0 to 58faee6 on July 20, 2023, 21:46
Review comments on docs/guides/self_hosting.md:

Next, let's create an LLM endpoint using llama-7b:
```
$ curl -X POST 'http://localhost:5000/v1/llm/model-endpoints' \
...
{"endpoint_creation_task_id":"8d323344-b1b5-497d-a851-6d6284d2f8e4"}
```

yixu34 (Member) commented on Jul 20, 2023:

Hmm, I think we'll want a separate guide that explains the server abstractions (maybe outside the scope of this PR), so that people understand what `name` vs. `model_name` mean.

Also, I think we want docs per field, but that's more of an API reference, which doesn't need to go here.

Wait a few minutes for the endpoint to be ready. Once it's ready, you can list pods and see `2/2` in the `READY` column:
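A minimal sketch of that pod check (the namespace and pod naming are deployment-specific assumptions, not taken from this PR):

```
# Hypothetical pod-level readiness check; the "launch" namespace is an assumption.
$ kubectl get pods -n launch | grep llama-7b
# The endpoint is serving once the READY column shows 2/2.
```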
Member commented:

Subtle nit around causality: is seeing READY an indicator that the endpoint is ready? In other words, I might reword this to "You can tell that it's ready by listing pods and..."

Though alternatively, we may want to curl a status endpoint, to keep things at the API level.

ruizehung-scale (Contributor, Author) replied:

The status in the model endpoint record is independent of the status of the pods.... Maybe at some point we might want to add an API that checks the endpoint pod status?

Member replied:

We do have something that returns the number of available/unavailable workers. Not sure if this is exposed on the LLM endpoint get/list routes, though.

Member replied:

Oh right, nm.


You should get a response similar to:
```
{"status":"SUCCESS","outputs":[{"text":"hi hi hi2 hi2 hi2 hi2","num_completion_tokens":10}],"traceback":null}
```
Member commented:

Hmm, @seanshi-scale @yunfeng-scale, I think `status` and `traceback` should be gone?

Member replied:

Oh oops, they're still there in this repo. Will create a separate issue: #159
