Control frequency - completion #277

Open
Stealthwriter opened this issue Sep 14, 2023 · 3 comments

Comments

@Stealthwriter

Hi,

Is there a way to change the frequency_penalty or logit_bias when sending a completion request?

@yixu34
Member

yixu34 commented Sep 18, 2023

Hi @Stealthwriter, thanks for reaching out. Yes, you can route an API change like this through to the underlying inference framework(s) we use, as long as the framework supports the fields you need. We currently support https://github.com/scaleapi/open-tgi (forked from text-generation-inference v0.9.4) and vLLM. Would you like to try making the change yourself?
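As a rough illustration of what "routing a field through" could look like, here is a minimal sketch, not LLM Engine's actual code. The helper `build_sampling_params` and the `BACKEND_SAMPLING_FIELDS` table are hypothetical names, and the per-framework field lists are assumptions made for the example; check each framework's own documentation for what it really accepts.

```python
from typing import Any, Dict, Optional, Set

# Assumption: illustrative allow-lists only, not read from either framework's source.
BACKEND_SAMPLING_FIELDS: Dict[str, Set[str]] = {
    "vllm": {"temperature", "presence_penalty", "frequency_penalty"},
    "tgi": {"temperature", "repetition_penalty"},
}


def build_sampling_params(framework: str, **requested: Optional[Any]) -> Dict[str, Any]:
    """Return only the fields the caller set, rejecting any the backend lacks."""
    if framework not in BACKEND_SAMPLING_FIELDS:
        raise ValueError(f"unknown framework: {framework!r}")
    supported = BACKEND_SAMPLING_FIELDS[framework]
    # Drop fields left as None so the backend's own defaults apply.
    set_fields = {k: v for k, v in requested.items() if v is not None}
    unsupported = sorted(set(set_fields) - supported)
    if unsupported:
        raise ValueError(f"{framework} does not accept: {', '.join(unsupported)}")
    return set_fields


if __name__ == "__main__":
    # Hypothetical request: forward a frequency penalty to a vLLM-backed endpoint.
    print(build_sampling_params("vllm", temperature=0.7, frequency_penalty=0.5))
```

Failing loudly on an unsupported field, rather than silently dropping it, keeps a request from appearing to honor a setting that the backend never saw.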

@Stealthwriter
Author

How does LLM Engine differ from TGI and vLLM?

@yixu34
Member

yixu34 commented Sep 18, 2023

You can think of LLM Engine as adding 1) a set of higher-level abstractions (e.g. APIs are expressed in terms of Completions and Fine-tunes) and 2) autoscaling via Kubernetes (k8s). TGI and vLLM are great, but you have to bring your own scaling.
