Segmentation fault caused by bfloat16 decoder not being created #335

Closed
elinx opened this issue Nov 9, 2023 · 2 comments
Assignees: juney-nvidia
Labels: triaged (Issue has been triaged by maintainers)

Comments

elinx commented Nov 9, 2023

I was running the bfloat16 baichuan-7b model for benchmarking and ran into a segmentation fault:

+ ../../cpp/build/benchmarks/gptSessionBenchmark --duration 20 --num_runs 5 --model baichuan --engine_dir xxx/baichuan_v1_7b/trt_engines/f16/1-gpu/ --batch_size 1 --input_output_len 128,240
craete dype:7
[sucloud-A100-02-devel:2315620:0:2315620] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:2315620) ====
0 0x0000000000042520 __sigaction()  ???:0
1 0x0000000000caed18 tensorrt_llm::runtime::StatefulGptDecoder::newBatch()  ???:0
2 0x0000000000c786e9 tensorrt_llm::runtime::GptSession::initNewTokens()  ???:0
3 0x0000000000c7c10c tensorrt_llm::runtime::GptSession::generateSingleBatch()  ???:0
4 0x0000000000017537 main()  ???:0
5 0x0000000000029d90 __libc_init_first()  ???:0
6 0x0000000000029e40 __libc_start_main()  ???:0
7 0x0000000000018fe5 _start()  ???:0

After digging into the code, I found that it happens because the GPT decoder is not created successfully when the dtype is bfloat16:

switch (dtype)
{
case nvinfer1::DataType::kFLOAT: return std::make_unique<GptDecoder<float>>(vocabSize, vocabSizePadded, stream);
case nvinfer1::DataType::kHALF: return std::make_unique<GptDecoder<half>>(vocabSize, vocabSizePadded, stream);
default: return nullptr;
}
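
Given the backtrace, StatefulGptDecoder::newBatch() appears to call through the returned pointer without a null check, which matches the fault at address (nil). A minimal standalone sketch of that failure pattern (the names are illustrative, not the actual TensorRT-LLM code):

#include <memory>

// Stand-in for the decoder interface; names are illustrative only.
struct Decoder
{
    virtual void newBatch() {}
    virtual ~Decoder() = default;
};

// Mimics the factory above: unsupported dtypes yield a null pointer.
std::unique_ptr<Decoder> create(bool dtypeSupported)
{
    if (dtypeSupported)
        return std::make_unique<Decoder>();
    return nullptr; // the bfloat16 path ends up here
}

int main()
{
    auto decoder = create(false); // as if dtype were bfloat16
    decoder->newBatch();          // null dereference: SIGSEGV at (nil)
}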

I also noticed that several components (the decoder, dynamic decoder, top-k sampling, etc.) have no bfloat16 template instantiations. Was this left out for a reason, or did I miss something? Thanks.

namespace tensorrt_llm::runtime
{
template class GptDecoder<float>;
template class GptDecoder<half>;
} // namespace tensorrt_llm::runtime
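
For reference, a bfloat16 path would presumably need at least the explicit instantiation sketched below, plus a matching case in the factory switch. This is a hypothetical fragment, not code from the repository, and it would only link if the dynamic-decode and top-k/top-p kernels the decoder depends on were also specialized for bfloat16:

#include <cuda_bf16.h> // for __nv_bfloat16

namespace tensorrt_llm::runtime
{
// Hypothetical addition alongside the float and half instantiations above;
// assumes the decoder's underlying kernels also gain __nv_bfloat16 support.
template class GptDecoder<__nv_bfloat16>;
} // namespace tensorrt_llm::runtime

// ...and, assuming the linked TensorRT exposes nvinfer1::DataType::kBF16,
// a corresponding branch in the factory:
// case nvinfer1::DataType::kBF16:
//     return std::make_unique<GptDecoder<__nv_bfloat16>>(vocabSize, vocabSizePadded, stream);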

@juney-nvidia self-assigned this Nov 9, 2023
@juney-nvidia added the triaged label Nov 9, 2023
nekorobov (Collaborator) commented

Hi @elinx, thank you for reporting the issue. Indeed, we do not support the decoder with bfloat16. Usually, we force the lm_head and logits dtype to be either float32 or float16 and use the corresponding decoder type. However, this cast is missing in the Baichuan model definition. We'll include a proper fix in the next release/update to the repo. Meanwhile, you can work around the issue by changing this line:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/baichuan/model.py#L271
to lm_logits.mark_output('logits', 'float32'), i.e. forcing your logits and decoder to be float32. I hope it helps.

Thanks,
Nikita
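
For context, this workaround lines up with the factory switch quoted earlier: with the logits marked as float32, the kFLOAT branch is taken and a GptDecoder<float> is constructed, so the newBatch() call from the backtrace no longer goes through a null decoder.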

juney-nvidia (Collaborator) commented

The fix has already been pushed to the main branch, please give it a try. @elinx
