
GptManager pybind 2/4TP run demo #701

Closed
BasicCoder opened this issue Dec 19, 2023 · 5 comments
@BasicCoder
Contributor

I'm trying to enable the GptManager pybinding following this, but that demo only provides a simple scheduling API.

I'm trying to run a model with 2-way/4-way tensor parallelism (TP). Following the C++ gptManagerBenchmark, I hit TypeError: cannot pickle 'tensorrt_llm.bindings.InferenceRequest' object. I looked into the corresponding serialize() and deserialize() functions, but they are closed source.

Could you provide a complete 2/4-way TP execution demo, or share some ideas on serializing tensorrt_llm.bindings.InferenceRequest Python objects?

My demo:

# Collect new requests on rank 0, mirroring GetInferenceRequestsCallback
# in the C++ gptManagerBenchmark.
rval = []
...
if num_new_work_items > 0:
    count = 0
    while count < num_new_work_items:
        workItem, markedInProgress = self.mWorkItemsQueue.pop()
        if markedInProgress:
            rval.append(workItem.getInferenceRequest())
            count += 1
        else:
            warnStr = f"request Id {workItem.requestId()} has been stopped. Request is ignored."
            print(warnStr)
            self.sendResponse(workItem.requestId(), [], True, warnStr)
    if world_size > 1:
        # mpi4py pickles `rval` before broadcasting it to the other TP ranks.
        rval = MPI.COMM_WORLD.bcast(rval, root=0)

The last line raises TypeError: cannot pickle 'tensorrt_llm.bindings.InferenceRequest' object.
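
For context, mpi4py's lowercase bcast serializes its argument with pickle, so any pybind11 object that does not define __getstate__/__setstate__ fails at exactly this call. A minimal sketch of the constraint (plain Python containers broadcast fine; the pybind request object does not):

from mpi4py import MPI

comm = MPI.COMM_WORLD
# The lowercase bcast pickles arbitrary Python objects on the root rank
# before any bytes go over the wire. A pybind11 class without pickle
# support raises "TypeError: cannot pickle ..." right here.
payload = {"request_ids": [1, 2, 3]} if comm.Get_rank() == 0 else None
payload = comm.bcast(payload, root=0)  # plain containers pickle fine
print(f"rank {comm.Get_rank()} got {payload}")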

@byshiue added the triaged (Issue has been triaged by maintainers) and runtime labels on Dec 25, 2023
@MartinMarciniszyn
Collaborator

Thank you for reporting this. We shall add pickle support for InferenceRequest in the next release.

@BasicCoder
Contributor Author

BasicCoder commented Jan 5, 2024

@byshiue @MartinMarciniszyn
I have solved this problem by adding two new interface functions, serialize and deserialize:

// Pybind wrapper: flatten the request into a vector of int64, which
// maps to a plain (and hence picklable) Python list of ints.
std::vector<int64_t> InferenceRequest::serialize() const
{
    std::shared_ptr<tb::InferenceRequest> ir = toTrtLlm();
    return ir->serialize();
}

// Pybind wrapper: rebuild the request from the packed int64 vector.
std::shared_ptr<InferenceRequest> InferenceRequest::deserialize(const std::vector<int64_t>& packed)
{
    std::shared_ptr<tb::InferenceRequest> ir = tb::InferenceRequest::deserialize(packed);
    return TrtLlmTo(ir);
}

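With these exposed in the bindings, the broadcast in my demo becomes (a sketch, assuming the two functions above are bound as InferenceRequest.serialize / InferenceRequest.deserialize on the Python side; flattening to a list of ints sidesteps pybind pickle support entirely, since Python lists pickle natively):

rank = MPI.COMM_WORLD.Get_rank()
if world_size > 1:
    # Rank 0 packs each request into a flat list of ints before the bcast.
    packed = [r.serialize() for r in rval] if rank == 0 else None
    packed = MPI.COMM_WORLD.bcast(packed, root=0)
    if rank != 0:
        # The other TP ranks rebuild the pybind objects from the packed form.
        rval = [tensorrt_llm.bindings.InferenceRequest.deserialize(p)
                for p in packed]
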
But then I encountered the same problem as #782: the results are only returned correctly when exclude_input_in_output=False; otherwise the server segfaults with the backtrace below.

trtllm commit id: a75618d

[c224f3d064d0:30219:0:31671] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[c224f3d064d0:30220:0:31666] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[c224f3d064d0:30221:0:31672] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[c224f3d064d0:30222:0:31667] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:  31667) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x0000000000236d0a tensorrt_llm::batch_manager::GptManager::returnCompletedRequests()  ???:0
 2 0x000000000023c13e tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop()  ???:0
 3 0x00000000000dc253 std::error_code::default_error_condition()  ???:0
 4 0x0000000000094ac3 pthread_condattr_setpshared()  ???:0
 5 0x0000000000125bf4 clone()  ???:0
=================================
(identical backtraces for tids 31671, 31672, and 31666 omitted)

@MartinMarciniszyn
Collaborator

Pickle support for InferenceRequest will be released soon. #782 will be addressed independently.

@cody-moveworks

Hi there! Will an official release containing the commit that introduces pickle support for InferenceRequest be coming soon? I am also trying to use GptManager in a multi-GPU setup.

@MartinMarciniszyn
Collaborator

Please expect to see a release in the next couple of weeks.
