GKE Helm deployment #157

Closed
brosand opened this issue Jul 20, 2023 · 3 comments


brosand commented Jul 20, 2023

Is there any thought/desire to set up a Helm chart for GKE deployment as well?

Member

yixu34 commented Jul 20, 2023

Hi @brosand, our initial self-hosting deliverable is going to be a fully contained Helm chart that points to k8s-native dependencies (e.g. Postgres running on k8s). You can follow along at #141. Once that's in place, presumably you can run on GKE as well, though there may be some kinks to iron out at first. You can also replace the k8s-native bits with cloud-managed equivalents.

Author

brosand commented Jul 20, 2023

Awesome, thanks! While I'm requesting features: do you have any latency benchmarks available? I'm particularly interested in how this engine compares to engines such as https://github.com/huggingface/text-generation-inference

Member

yixu34 commented Jul 20, 2023

Oh, we use text-generation-inference under the hood, so you can expect comparable numbers. See this line in our code:

    repository="ghcr.io/huggingface/text-generation-inference", # TODO: let user choose repo

The main thing we're offering here is a helm chart with k8s resources to help with scaling, etc.

In terms of performance, we generally try to patch text-generation-inference first, and update our own server code only when needed.
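For anyone who wants to check the "comparable numbers" point on their own deployment, here is a minimal sketch of a latency harness. It is not part of this project or of text-generation-inference: `measure_latency`, `send_request`, and `fake_request` are hypothetical names, and `send_request` is assumed to be whatever callable issues one request to the server you are testing.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    send_request: Callable[[], object], runs: int = 20, warmup: int = 2
) -> Dict[str, float]:
    """Time repeated calls to `send_request` and summarize latency in seconds.

    `send_request` is a hypothetical stand-in for one request to a server.
    """
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute percentiles")

    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        send_request()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        samples.append(time.perf_counter() - start)

    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p50": cuts[49],
        "p95": cuts[94],
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real request to each server under test.
    def fake_request() -> None:
        time.sleep(0.01)

    print(measure_latency(fake_request, runs=10))
```

Running the same harness with the same prompts against both servers gives a rough side-by-side; it does not account for concurrency, batching, or token counts.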

brosand closed this as completed Jul 21, 2023