
Parallel Python execution for tool completion #470

Merged — 10 commits merged into main from yunfeng-parallel-python on Apr 5, 2024

Conversation

yunfeng-scale (Collaborator) commented Mar 13, 2024

Pull Request Summary

Benchmarked with some sample data (1000 prompts) on my devbox (96 CPU cores). Since I used a thread pool to start the subprocesses, process startup might still be slowed down by the GIL, so utilization across the 96 cores is probably not high.

With this change: 467s total, 283s tool use.
Without this change: 862s total, 687s tool use.

Test Plan and Usage Guide

Tested with sample data. Also verified that CPU count detection works both inside and outside a container (see the sketch below).
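
The PR text doesn't show the detection logic itself; as a minimal sketch of container-aware CPU counting (standard library only, Linux assumed for the affinity call; `get_cpu_count` is a hypothetical name, not necessarily the repo's):

```python
import os

def get_cpu_count() -> int:
    """Return the number of CPUs this process may actually use."""
    try:
        # Linux only: reflects cpuset restrictions (e.g. inside a container),
        # unlike os.cpu_count(), which reports all CPUs on the host.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is absent on macOS/Windows; fall back.
        return os.cpu_count() or 1

CPU_COUNT = get_cpu_count()
```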

@yunfeng-scale requested a review from a team on March 13, 2024 at 19:15
@@ -265,7 +270,9 @@ def tool_func(text: str, past_context: Optional[str]):
or gen_item.remaining_tokens <= 0
):
gen_item.completed = True
continue

pool = ThreadPool(CPU_COUNT)
yunfeng-scale (Collaborator, Author) commented on this line:
It's probably fine to use a single process to start the subprocesses when the CPU count is low.
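
For context, a hedged sketch of the pattern in question — a thread pool fanning out blocking subprocess launches. The tool-runner function and code snippets below are illustrative, not the repo's actual API:

```python
import subprocess
from multiprocessing.pool import ThreadPool

CPU_COUNT = 8  # assume detected as in the sketch above

def run_tool(code: str) -> str:
    # The GIL is released while the thread waits on the child process, so
    # threads overlap the (dominant) subprocess runtime, even though each
    # fork/exec briefly holds the GIL during startup.
    result = subprocess.run(
        ["python", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout

pool = ThreadPool(CPU_COUNT)
try:
    outputs = pool.map(run_tool, ["print(1 + 1)", "print(2 ** 10)"])
finally:
    pool.close()
    pool.join()
print(outputs)  # ['2\n', '1024\n']
```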

@Georgepu1 left a comment:

LGTM, thanks for investigating! BTW, were you able to validate that the completions were close enough between the "467s total, 283s tool use" and "862s total, 687s tool use" runs? Just want to make sure nothing weird is happening on the generation side.

@yunfeng-scale (Collaborator, Author) replied:

> LGTM, thanks for investigating! BTW, were you able to validate that the completions were close enough between the "467s total, 283s tool use" and "862s total, 687s tool use" runs? Just want to make sure nothing weird is happening on the generation side.

There are actually differences, but for a hand-picked sample I can't reproduce either result for some reason. I wonder if this is due to some slight randomness in vLLM or in the Python execution. Will investigate a bit more.

@yunfeng-scale (Collaborator, Author) replied:

Took a closer look at this, @Georgepu1: 15/1000 output texts changed. I manually checked 5 of them, and I don't think the Python code execution results differ. The differences appear to be due to some randomness in sampling; the sentences in both cases made sense to me.

@yunfeng-scale merged commit c46162a into main on Apr 5, 2024
5 checks passed
@yunfeng-scale deleted the yunfeng-parallel-python branch on April 5, 2024 at 23:27