After multiple model workers start working concurrently for the first time, requests will only be received by one of the workers. #3484

PaulX1029 · 2024-08-19T09:21:00Z

I am using one server.controller to manage 3 model_workers, each placed on its own GPU. I then opened 3 identical server.gradio_web_server instances and entered the same question in each. The first time, all 3 gradio_web_server instances stream output at the same time. But once all the outputs have finished and I send requests to all three gradio_web_server instances simultaneously a second time, only one model_worker does any work (i.e., only one gradio_web_server produces streaming output), and GPU utilization shows that only one GPU is in use. Can anyone tell me what the reason for this is?
Has anyone else run into the same problem?


surak · 2024-08-19T20:17:04Z

I have noticed this too. There is a queue which should do a round-robin between the workers, but it’s not working. Thanks for the report.
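For readers unfamiliar with the term, here is a minimal sketch of what round-robin dispatch across three workers is expected to look like. This is not the controller's actual code: the class `RoundRobinDispatcher`, its method `next_worker`, and the worker addresses are all hypothetical names made up for illustration.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinDispatcher:
    """Hypothetical dispatcher that hands out worker addresses in rotation."""

    def __init__(self, worker_addresses: Iterable[str]) -> None:
        addresses = list(worker_addresses)
        if not addresses:
            raise ValueError("at least one worker address is required")
        # cycle() repeats the address list endlessly, in order.
        self._workers = cycle(addresses)

    def next_worker(self) -> str:
        """Return the address of the worker that should take the next request."""
        return next(self._workers)


# Hypothetical addresses, one per model worker / GPU.
workers = [
    "http://localhost:31000",
    "http://localhost:31001",
    "http://localhost:31002",
]
dispatcher = RoundRobinDispatcher(workers)

# Six consecutive requests should be spread evenly over the three workers.
assigned = [dispatcher.next_worker() for _ in range(6)]
for request_id, address in enumerate(assigned):
    print(f"request {request_id} -> {address}")
```

With a rotation like this, three simultaneous requests would land on three different workers every time, not just on the first round, which is the behavior the report says is missing.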

PaulX1029 · 2024-08-23T02:52:42Z

@surak do you have a plan to fix that? Thanks

surak · 2024-08-23T07:38:51Z

It’s being worked on in #3490.
