[TensorRT-LLM][ERROR] Assertion failed: hasValues == configValue.has_value() (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/samplingConfig.h:46 #1447

Closed
NikolaBorisov opened this issue Apr 13, 2024 · 3 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

Comments

@NikolaBorisov

System Info

H100 x8 SXM 80GB, 2 TB RAM, x86, main branch of TensorRT-LLM

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run Triton with TensorRT-LLM on an 8x H100 server with the Mistral 8x22B model.

Expected behavior

No crashes

Actual behavior

At some point the server prints an error. We are seeing a crash in samplingConfig.h:46.

[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: hasValues == configValue.has_value() (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/samplingConfig.h:46)
1       0x7f2f2005df31 tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 102
2       0x7f2db5ceffcc std::optional<std::vector<int, std::allocator<int> > > tensorrt_llm::runtime::SamplingConfig::fuseValues<int>(std::vector<tensorrt_llm::runtime::SamplingConfig, std::allocator<tensorrt_llm::runtime::SamplingConfig> > const&, std::function<std::optional<std::vector<int, std::allocator<int> > > (int)>) + 476
3       0x7f2db5cf0ae9 tensorrt_llm::runtime::SamplingConfig::SamplingConfig(std::vector<tensorrt_llm::runtime::SamplingConfig, std::allocator<tensorrt_llm::runtime::SamplingConfig> > const&) + 873
4       0x7f2db5ce4344 tensorrt_llm::runtime::GptDecoderBatch::newRequests(std::vector<int, std::allocator<int> > const&, std::vector<tensorrt_llm::runtime::decoder_batch::Request, std::allocator<tensorrt_llm::runtime::decoder_batch::Request> > const&, std::vector<tensorrt_llm::runtime::SamplingConfig, std::allocator<tensorrt_llm::runtime::SamplingConfig> > const&) + 404
5       0x7f2db5dfb5f3 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::setupDecoderStep(std::map<unsigned long, std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > >&, std::vector<unsigned long, std::allocator<unsigned long> > const&) + 851
6       0x7f2db5dfd8d7 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forward(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 5495
7       0x7f2db5dafd34 tensorrt_llm::batch_manager::GptManager::step(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&, std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >&) + 36
8       0x7f2db5db7e64 tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 404
9       0x7f31359f2253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f31359f2253]
10      0x7f3135781ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f3135781ac3]
11      0x7f3135813850 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f3135813850]
[TensorRT-LLM][WARNING] Step function failed, continuing.

After this crash the server continues to work, but the batch size is limited to 2. It prints the above error a number of times.
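
For context, frame 2 of the trace points at SamplingConfig::fuseValues, which appears to merge the per-request sampling configs of a batch into one batched config. The sketch below is a simplified, hypothetical illustration (not the actual TensorRT-LLM code) of why a batch that mixes requests with and without an optional sampling value would trip a check of the form hasValues == configValue.has_value():

```cpp
// Hypothetical sketch of a fuseValues-style merge, assuming the real code batches
// per-request optional fields and requires presence to be consistent across the batch.
#include <cassert>
#include <optional>
#include <vector>

struct RequestConfig
{
    std::optional<std::vector<int>> topK; // illustrative optional per-request field
};

std::optional<std::vector<int>> fuseTopK(std::vector<RequestConfig> const& configs)
{
    bool const hasValues = configs.front().topK.has_value();
    std::vector<int> fused;
    for (auto const& c : configs)
    {
        // Mirrors the failing check: every request must agree on whether the field is set.
        assert(hasValues == c.topK.has_value());
        if (hasValues)
        {
            fused.insert(fused.end(), c.topK->begin(), c.topK->end());
        }
    }
    if (!hasValues)
    {
        return std::nullopt;
    }
    return fused;
}

int main()
{
    // One request sets topK, the other does not -> the assertion fires,
    // matching the mixed-batch situation discussed in the comments below.
    std::vector<RequestConfig> batch{{std::vector<int>{40}}, {std::nullopt}};
    fuseTopK(batch);
    return 0;
}
```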

Additional notes

This causes the server to be unusable.

@NikolaBorisov added the bug (Something isn't working) label Apr 13, 2024
@juney-nvidia
Collaborator

@nekorobov Can you help take a look at this issue? Thanks

June

@juney-nvidia added the triaged (Issue has been triaged by maintainers) label Apr 13, 2024
@nekorobov
Collaborator

nekorobov commented Apr 18, 2024

Hi @NikolaBorisov, thank you for reporting the issue. It likely happens because one request has a value set in its samplingConfig while another request does not. Could you please confirm or deny that this is the case in your setup? If it is, the temporary fix is to enforce on the caller side that either all requests or none of them specify a given sampling config parameter.

Meanwhile, we'll work on a fix from our side. Thanks.
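
Until that fix is available, one way to apply the caller-side constraint described above is to normalize requests before they are batched, so that a given optional sampling parameter is either set for every request or omitted from all of them. A minimal hypothetical sketch (the PendingRequest type and normalizeTemperature helper are illustrative, not a real TensorRT-LLM or Triton API):

```cpp
// Hypothetical caller-side workaround: fill a default for any request that left the
// parameter unset, so the batch never mixes "set" and "unset" sampling values.
#include <optional>
#include <vector>

struct PendingRequest
{
    std::optional<float> temperature; // illustrative optional sampling parameter
};

void normalizeTemperature(std::vector<PendingRequest>& batch, float defaultValue)
{
    for (auto& req : batch)
    {
        if (!req.temperature.has_value())
        {
            req.temperature = defaultValue;
        }
    }
}

int main()
{
    std::vector<PendingRequest> batch{{0.7f}, {std::nullopt}};
    normalizeTemperature(batch, 1.0f); // now every request carries a temperature
    return 0;
}
```

Dropping the parameter from every request would satisfy the same constraint.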

@nekorobov
Collaborator

The issue should be solved in the latest main branch. Could you try it and reopen if it does not work for you? Thank you!
