Segmentation fault when building engine #2089
Comments
Hi @kevinsouthByteDance, could you help provide more information? Are you working on 8 GPUs? What's the model config, and what's the precision?
@Kefeng-Duan I encountered the same error. I'm using the llama2-70b model with 4 GPUs, following the command in examples/llama/README.md to build an fp16 model. Do you have any suggestions on how to resolve this? Thanks.
@zhangts20 Which type of GPU?
GPU: 4x A100
Hello @Kefeng-Duan @zhangts20, I have found that this issue is caused by
I couldn't find the variable
Yes, it is the value of
@kevinsouthByteDance Sorry for the delayed reply. How can I set this value when exporting the engine? This is a fixed value from
I changed this
Can we manually change it? I understand it is determined during training. @kevinsouthByteDance @Kefeng-Duan Do you have any other solutions in mind?
When I tried to build a llama engine using

trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8/ --output_dir ./output_engine_1 --max_num_tokens 4096 --max_input_len 131072 --max_seq_len 131082 --gemm_plugin auto

the error occurs. Can anyone solve this problem?
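As a side note, a minimal sketch (my own, not from the thread) to sanity-check the lengths passed to trtllm-build above, assuming max_seq_len must cover the input plus the generated tokens:

```python
# Hedged sketch: check the length arguments from the trtllm-build command above,
# assuming max_seq_len counts input tokens plus generated tokens.
max_input_len = 131072   # from --max_input_len
max_seq_len = 131082     # from --max_seq_len

assert max_seq_len >= max_input_len, "max_seq_len must be at least max_input_len"

# Tokens left over for generation with these settings.
max_new_tokens = max_seq_len - max_input_len
print(max_new_tokens)  # -> 10
```

With these values only 10 tokens remain for generation, which is worth double-checking if the intended output is longer.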
Environment:
CUDA 12.5
tensorrt 10.0.1
tensorrt_llm 0.11.0
tensorrt_bindings 9.0.1.post11.dev4
tensorrt-cu12 10.2.0.post1
tensorrt-cu12-bindings 10.2.0.post1
tensorrt-cu12-libs 10.2.0.post1
tensorrt-libs 9.0.1.post11.dev4
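One observation (my own, not confirmed in the thread): the environment above mixes TensorRT 9.x and 10.x packages, which can itself cause build failures. A small sketch to flag the major-version mismatch among the listed packages:

```python
# Hedged sketch: spot major-version mismatches among the TensorRT packages
# listed in the environment above.
installed = {
    "tensorrt": "10.0.1",
    "tensorrt_bindings": "9.0.1.post11.dev4",
    "tensorrt-cu12": "10.2.0.post1",
    "tensorrt-cu12-bindings": "10.2.0.post1",
    "tensorrt-libs": "9.0.1.post11.dev4",
}

# Compare each package's major version against the core tensorrt package.
majors = {name: ver.split(".")[0] for name, ver in installed.items()}
mismatched = {name for name, major in majors.items() if major != majors["tensorrt"]}
print(sorted(mismatched))  # -> ['tensorrt-libs', 'tensorrt_bindings']
```

Reinstalling so that all tensorrt* packages share one major version may be worth trying before digging further.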