GKE Helm deployment #157
Hi @brosand, our initial self-hosting deliverable is going to be a fully-contained Helm chart that points to k8s-native dependencies (e.g. Postgres running on k8s). You can follow along at #141. Once that's in place, presumably you can run on GKE as well; there may be some kinks to iron out at first. You can replace the k8s-native bits with cloud-managed equivalents.
Awesome, thanks! While I'm feature requesting: do you have any latency benchmarks available? I'm particularly interested in how this engine compares to engines such as https://github.com/huggingface/text-generation-inference
Oh, we use text-generation-inference under the hood (see llm-engine/server/llm_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py, line 230 at commit b2588ea). The main thing we're offering here is a Helm chart with k8s resources to help with scaling, etc. In terms of performance, we generally try to patch text-generation-inference first, and update our own server code only when needed.
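Since no official latency benchmarks are linked in this thread, here is a minimal, hypothetical sketch of how one might measure latency percentiles for any inference call (e.g. an llm-engine endpoint vs. a raw text-generation-inference server). The `infer` callable is a stand-in assumption, not part of either project's API:

```python
# Hypothetical latency-benchmark helper -- not part of llm-engine or
# text-generation-inference; `infer` is any zero-arg callable that issues
# one inference request (e.g. an HTTP call to your endpoint).
import time
import statistics

def benchmark(infer, n_requests=100):
    """Call infer() n_requests times; return latency stats in milliseconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        # Nearest-rank p95 over the sorted sample:
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
        "mean_ms": statistics.fmean(latencies),
    }

# Example with a dummy workload standing in for a real request:
stats = benchmark(lambda: time.sleep(0.001), n_requests=20)
print(sorted(stats))
```

Swapping the lambda for a real client call would let you compare the two stacks on identical prompts and hardware.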
Is there any thought/desire to set up a Helm chart for GKE deployment as well?