GKE Helm deployment #157
Hi @brosand, our initial self-hosting deliverable is going to be a fully-contained Helm chart that points to k8s-native dependencies (e.g. Postgres running on k8s). You can follow along at #141. Once that's in place, presumably you can run on GKE as well; there may be some kinks to iron out at first. You can replace the k8s-native bits with cloud-managed equivalents.
Awesome, thanks! While I'm feature requesting: do you have any latency benchmarks available? I'm particularly interested in how this engine compares to engines such as https://github.com/huggingface/text-generation-inference
Oh, we use text-generation-inference under the hood (see llm-engine/server/llm_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py, line 230 at commit b2588ea). The main thing we're offering here is a Helm chart with k8s resources to help with scaling, etc. In terms of performance, we generally try to patch text-generation-inference first, and update our own server code only when needed.
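Since no official latency benchmarks are linked in this thread, here is a minimal, hypothetical sketch of how one might measure latency percentiles for any inference call (e.g. an llm-engine endpoint vs. a raw text-generation-inference server). The `infer` callable is a stand-in assumption, not part of either project's API:

```python
# Hypothetical latency-benchmark helper -- not part of llm-engine or
# text-generation-inference; `infer` is any zero-arg callable that issues
# one inference request (e.g. an HTTP call to your endpoint).
import time
import statistics

def benchmark(infer, n_requests=100):
    """Call infer() n_requests times; return latency stats in milliseconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        # Nearest-rank p95 over the sorted sample:
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
        "mean_ms": statistics.fmean(latencies),
    }

# Example with a dummy workload standing in for a real request:
stats = benchmark(lambda: time.sleep(0.001), n_requests=20)
print(sorted(stats))
```

Swapping the lambda for a real client call would let you compare the two stacks on identical prompts and hardware.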
Is there any thought/desire to set up a Helm chart for GKE deployment as well?