Segmentation fault caused by bfloat16 decoder not being created #335

Closed
elinx opened this issue Nov 9, 2023 · 2 comments
Assignees: juney-nvidia
Labels: triaged (Issue has been triaged by maintainers)

Comments

elinx commented Nov 9, 2023

I was running the bfloat16 baichuan-7b model for benchmarking and ran into a segmentation fault:

+ ../../cpp/build/benchmarks/gptSessionBenchmark --duration 20 --num_runs 5 --model baichuan --engine_dir xxx/baichuan_v1_7b/trt_engines/f16/1-gpu/ --batch_size 1 --input_output_len 128,240
craete dype:7
[sucloud-A100-02-devel:2315620:0:2315620] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:2315620) ====
0 0x0000000000042520 __sigaction()  ???:0
1 0x0000000000caed18 tensorrt_llm::runtime::StatefulGptDecoder::newBatch()  ???:0
2 0x0000000000c786e9 tensorrt_llm::runtime::GptSession::initNewTokens()  ???:0
3 0x0000000000c7c10c tensorrt_llm::runtime::GptSession::generateSingleBatch()  ???:0
4 0x0000000000017537 main()  ???:0
5 0x0000000000029d90 __libc_init_first()  ???:0
6 0x0000000000029e40 __libc_start_main()  ???:0
7 0x0000000000018fe5 _start()  ???:0

After digging into the code, I found that it happens because the GPT decoder is not created successfully when the dtype is bfloat16:

switch (dtype)
{
case nvinfer1::DataType::kFLOAT: return std::make_unique<GptDecoder<float>>(vocabSize, vocabSizePadded, stream);
case nvinfer1::DataType::kHALF: return std::make_unique<GptDecoder<half>>(vocabSize, vocabSizePadded, stream);
default: return nullptr;
}
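
Given the backtrace, StatefulGptDecoder::newBatch() appears to call through the returned pointer without a null check, which matches the fault at address (nil). A minimal standalone sketch of that failure pattern (the names are illustrative, not the actual TensorRT-LLM code):

#include <memory>

// Stand-in for the decoder interface; names are illustrative only.
struct Decoder
{
    virtual void newBatch() {}
    virtual ~Decoder() = default;
};

// Mimics the factory above: unsupported dtypes yield a null pointer.
std::unique_ptr<Decoder> create(bool dtypeSupported)
{
    if (dtypeSupported)
        return std::make_unique<Decoder>();
    return nullptr; // the bfloat16 path ends up here
}

int main()
{
    auto decoder = create(false); // as if dtype were bfloat16
    decoder->newBatch();          // null dereference: SIGSEGV at (nil)
}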

I also noticed that several components (the decoder, dynamic decoder, top-k sampling, etc.) have no bfloat16 template instantiations. Was this left out for a reason, or did I miss something? Thanks.

namespace tensorrt_llm::runtime
{
template class GptDecoder<float>;
template class GptDecoder<half>;
} // namespace tensorrt_llm::runtime
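
For reference, a bfloat16 path would presumably need at least the explicit instantiation sketched below, plus a matching case in the factory switch. This is a hypothetical fragment, not code from the repository, and it would only link if the dynamic-decode and top-k/top-p kernels the decoder depends on were also specialized for bfloat16:

#include <cuda_bf16.h> // for __nv_bfloat16

namespace tensorrt_llm::runtime
{
// Hypothetical addition alongside the float and half instantiations above;
// assumes the decoder's underlying kernels also gain __nv_bfloat16 support.
template class GptDecoder<__nv_bfloat16>;
} // namespace tensorrt_llm::runtime

// ...and, assuming the linked TensorRT exposes nvinfer1::DataType::kBF16,
// a corresponding branch in the factory:
// case nvinfer1::DataType::kBF16:
//     return std::make_unique<GptDecoder<__nv_bfloat16>>(vocabSize, vocabSizePadded, stream);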

@juney-nvidia self-assigned this Nov 9, 2023
@juney-nvidia added the triaged label Nov 9, 2023
nekorobov (Collaborator) commented

Hi @elinx, thank you for reporting the issue. Indeed, we do not support the decoder with bfloat16. Usually, we force the lm_head and logits dtype to be either float32 or float16 and use the corresponding decoder type. However, this cast is missing in the Baichuan model definition. We'll include a proper fix in the next release/update to the repo. Meanwhile, you can work around the issue by changing this line:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/baichuan/model.py#L271
to lm_logits.mark_output('logits', 'float32'), i.e. forcing your logits and decoder to be float32. I hope it helps.

Thanks,
Nikita
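
For context, this workaround lines up with the factory switch quoted earlier: with the logits marked as float32, the kFLOAT branch is taken and a GptDecoder<float> is constructed, so the newBatch() call from the backtrace no longer goes through a null decoder.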

juney-nvidia (Collaborator) commented

The fix has already been pushed to the main branch, please give it a try. @elinx
