
GptManager pybind 2/4TP run demo #701

Closed
BasicCoder opened this issue Dec 19, 2023 · 5 comments
@BasicCoder
Contributor

I'm trying to enable the GptManager pybinding following this, but that demo only provides a simple scheduling API.

I'm trying to run a model with 2-way/4-way tensor parallelism (TP). Following the C++ gptManagerBenchmark, I hit TypeError: cannot pickle 'tensorrt_llm.bindings.InferenceRequest' object. I looked into the corresponding serialize() and deserialize() functions, but they are closed source.

Could you provide a complete 2/4-way TP execution demo, or share some ideas on serializing tensorrt_llm.bindings.InferenceRequest Python objects?

My demo:

# Collect new requests on rank 0, mirroring GetInferenceRequestsCallback
# in the C++ gptManagerBenchmark.
rval = []
...
if num_new_work_items > 0:
    count = 0
    while count < num_new_work_items:
        workItem, markedInProgress = self.mWorkItemsQueue.pop()
        if markedInProgress:
            rval.append(workItem.getInferenceRequest())
            count += 1
        else:
            warnStr = f"request Id {workItem.requestId()} has been stopped. Request is ignored."
            print(warnStr)
            self.sendResponse(workItem.requestId(), [], True, warnStr)
    if world_size > 1:
        # mpi4py pickles `rval` before broadcasting it to the other TP ranks.
        rval = MPI.COMM_WORLD.bcast(rval, root=0)

The last line raises TypeError: cannot pickle 'tensorrt_llm.bindings.InferenceRequest' object.
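
For context, mpi4py's lowercase bcast serializes its argument with pickle, so any pybind11 object that does not define __getstate__/__setstate__ fails at exactly this call. A minimal sketch of the constraint (plain Python containers broadcast fine; the pybind request object does not):

from mpi4py import MPI

comm = MPI.COMM_WORLD
# The lowercase bcast pickles arbitrary Python objects on the root rank
# before any bytes go over the wire. A pybind11 class without pickle
# support raises "TypeError: cannot pickle ..." right here.
payload = {"request_ids": [1, 2, 3]} if comm.Get_rank() == 0 else None
payload = comm.bcast(payload, root=0)  # plain containers pickle fine
print(f"rank {comm.Get_rank()} got {payload}")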

@byshiue added the triaged (Issue has been triaged by maintainers) and runtime labels on Dec 25, 2023
@MartinMarciniszyn
Collaborator

Thank you for reporting this. We shall add pickle support for InferenceRequest in the next release.

@BasicCoder
Contributor Author

BasicCoder commented Jan 5, 2024

@byshiue @MartinMarciniszyn
I have solved this problem by adding two new interface functions, serialize and deserialize:

// Pybind wrapper: flatten the request into a vector of int64, which
// maps to a plain (and hence picklable) Python list of ints.
std::vector<int64_t> InferenceRequest::serialize() const
{
    std::shared_ptr<tb::InferenceRequest> ir = toTrtLlm();
    return ir->serialize();
}

// Pybind wrapper: rebuild the request from the packed int64 vector.
std::shared_ptr<InferenceRequest> InferenceRequest::deserialize(const std::vector<int64_t>& packed)
{
    std::shared_ptr<tb::InferenceRequest> ir = tb::InferenceRequest::deserialize(packed);
    return TrtLlmTo(ir);
}

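With these exposed in the bindings, the broadcast in my demo becomes (a sketch, assuming the two functions above are bound as InferenceRequest.serialize / InferenceRequest.deserialize on the Python side; flattening to a list of ints sidesteps pybind pickle support entirely, since Python lists pickle natively):

rank = MPI.COMM_WORLD.Get_rank()
if world_size > 1:
    # Rank 0 packs each request into a flat list of ints before the bcast.
    packed = [r.serialize() for r in rval] if rank == 0 else None
    packed = MPI.COMM_WORLD.bcast(packed, root=0)
    if rank != 0:
        # The other TP ranks rebuild the pybind objects from the packed form.
        rval = [tensorrt_llm.bindings.InferenceRequest.deserialize(p)
                for p in packed]
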
But then I encountered the same problem as #782: the results are only returned correctly when exclude_input_in_output=False; otherwise the server segfaults with the backtrace below.

trtllm commit id: a75618d

[c224f3d064d0:30219:0:31671] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[c224f3d064d0:30220:0:31666] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[c224f3d064d0:30221:0:31672] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[c224f3d064d0:30222:0:31667] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:  31667) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x0000000000236d0a tensorrt_llm::batch_manager::GptManager::returnCompletedRequests()  ???:0
 2 0x000000000023c13e tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop()  ???:0
 3 0x00000000000dc253 std::error_code::default_error_condition()  ???:0
 4 0x0000000000094ac3 pthread_condattr_setpshared()  ???:0
 5 0x0000000000125bf4 clone()  ???:0
=================================
(identical backtraces for tids 31671, 31672, and 31666 omitted)

@MartinMarciniszyn
Collaborator

Pickle support for InferenceRequest will be released soon. #782 will be addressed independently.

@cody-moveworks

Hi there! Will an official release containing the commit that introduces pickle support for InferenceRequest be coming soon? I am also trying to use GptManager in a multi-GPU setup.

@MartinMarciniszyn
Collaborator

Please expect to see a release in the next couple of weeks.
