
Parallel Python execution for tool completion #470

Merged — 10 commits merged into main from yunfeng-parallel-python on Apr 5, 2024

Conversation

yunfeng-scale (Collaborator) commented Mar 13, 2024

Pull Request Summary

Benchmarked with some sample data (1000 prompts) on my devbox (96 CPU cores). Since I used a thread pool to start the subprocesses, process startup might still be slowed down by the GIL, so utilization across the 96 cores is probably not high.

With this change: 467s total, 283s tool use.
Without this change: 862s total, 687s tool use.

Test Plan and Usage Guide

Tested with sample data. Also verified that CPU count detection works both inside and outside a container (see the sketch below).
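
The PR text doesn't show the detection logic itself; as a minimal sketch of container-aware CPU counting (standard library only, Linux assumed for the affinity call; `get_cpu_count` is a hypothetical name, not necessarily the repo's):

```python
import os

def get_cpu_count() -> int:
    """Return the number of CPUs this process may actually use."""
    try:
        # Linux only: reflects cpuset restrictions (e.g. inside a container),
        # unlike os.cpu_count(), which reports all CPUs on the host.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is absent on macOS/Windows; fall back.
        return os.cpu_count() or 1

CPU_COUNT = get_cpu_count()
```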

@yunfeng-scale requested a review from a team on March 13, 2024 at 19:15
@@ -265,7 +270,9 @@ def tool_func(text: str, past_context: Optional[str]):
or gen_item.remaining_tokens <= 0
):
gen_item.completed = True
continue

pool = ThreadPool(CPU_COUNT)
yunfeng-scale (Collaborator, Author) commented on this line:
It's probably fine to use a single process to start the subprocesses when the CPU count is low.
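
For context, a hedged sketch of the pattern in question — a thread pool fanning out blocking subprocess launches. The tool-runner function and code snippets below are illustrative, not the repo's actual API:

```python
import subprocess
from multiprocessing.pool import ThreadPool

CPU_COUNT = 8  # assume detected as in the sketch above

def run_tool(code: str) -> str:
    # The GIL is released while the thread waits on the child process, so
    # threads overlap the (dominant) subprocess runtime, even though each
    # fork/exec briefly holds the GIL during startup.
    result = subprocess.run(
        ["python", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout

pool = ThreadPool(CPU_COUNT)
try:
    outputs = pool.map(run_tool, ["print(1 + 1)", "print(2 ** 10)"])
finally:
    pool.close()
    pool.join()
print(outputs)  # ['2\n', '1024\n']
```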

@Georgepu1 left a comment:

LGTM, thanks for investigating! BTW, were you able to validate that the completions were close enough between the "467s total, 283s tool use" and "862s total, 687s tool use" runs? Just want to make sure nothing weird is happening on the generation side.

@yunfeng-scale (Collaborator, Author) replied:

> LGTM, thanks for investigating! BTW, were you able to validate that the completions were close enough between the "467s total, 283s tool use" and "862s total, 687s tool use" runs? Just want to make sure nothing weird is happening on the generation side.

There are actually differences, but for a hand-picked sample I can't reproduce either result for some reason. I wonder if this is due to some slight randomness in vLLM or in the Python execution. Will investigate a bit more.

@yunfeng-scale (Collaborator, Author) replied:

Took a closer look at this, @Georgepu1: 15/1000 output texts changed. I manually checked 5 of them, and I don't think the Python code execution results differ. The differences appear to be due to some randomness in sampling; the sentences in both cases made sense to me.

@yunfeng-scale merged commit c46162a into main on Apr 5, 2024
5 checks passed
@yunfeng-scale deleted the yunfeng-parallel-python branch on April 5, 2024 at 23:27