Segmentation fault when building engine #2089

Open
kevinsouthByteDance opened this issue Aug 6, 2024 · 10 comments
@kevinsouthByteDance commented Aug 6, 2024

When I tried to build a LLaMA engine using `trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8/ --output_dir ./output_engine_1 --max_num_tokens 4096 --max_input_len 131072 --max_seq_len 131082 --gemm_plugin auto`, the following error occurred:

[08/06/2024-11:56:31] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[08/06/2024-11:56:35] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/06/2024-11:56:35] [TRT] [I] Detected 15 inputs and 1 output network tensors.
[n101-006-033:3841514] *** Process received signal ***
[n101-006-033:3841514] Signal: Segmentation fault (11)
[n101-006-033:3841514] Signal code:  (128)
[n101-006-033:3841514] Failing at address: (nil)
[n101-006-033:3841514] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fce4699b420]
[n101-006-033:3841514] [ 1] /root/miniconda3/envs/torch/bin/../lib/libstdc++.so.6(_ZNSs6assignERKSs+0x9d)[0x7fcda3a429cd]
[n101-006-033:3841514] [ 2] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm_nvrtc_wrapper.so(tllmXqaJitCreateAndCompileProgram+0x4a2)[0x7fcb29164ef2]
[n101-006-033:3841514] [ 3] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm7kernels3jit13CompileEngine7compileEv+0xb1)[0x7fcb2dbcf941]
[n101-006-033:3841514] [ 4] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7kernels17DecoderXQAImplJIT25prepareForActualXQAParamsERKNS0_9XQAParamsE+0x15a)[0x7fcb2dbd0daa]
[n101-006-033:3841514] [ 5] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7kernels17DecoderXQAImplJIT7prepareERKNS0_9XQAParamsE+0x60)[0x7fcb2dbd1040]
[n101-006-033:3841514] [ 6] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xd5b7f)[0x7fcae0046b7f]
[n101-006-033:3841514] [ 7] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xf5d26)[0x7fcae0066d26]
[n101-006-033:3841514] [ 8] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xe4f930)[0x7fcd8fe67930]
[n101-006-033:3841514] [ 9] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xe5026c)[0x7fcd8fe6826c]
[n101-006-033:3841514] [10] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb3183f)[0x7fcd8fb4983f]
[n101-006-033:3841514] [11] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb3dcdf)[0x7fcd8fb55cdf]
[n101-006-033:3841514] [12] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb44c5c)[0x7fcd8fb5cc5c]
[n101-006-033:3841514] [13] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb46e05)[0x7fcd8fb5ee05]
[n101-006-033:3841514] [14] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xaaa78c)[0x7fcd8fac278c]
[n101-006-033:3841514] [15] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xaafa1a)[0x7fcd8fac7a1a]
[n101-006-033:3841514] [16] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xab0465)[0x7fcd8fac8465]
[n101-006-033:3841514] [17] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0xa84d8)[0x7fcc56d594d8]
[n101-006-033:3841514] [18] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0x459d3)[0x7fcc56cf69d3]
[n101-006-033:3841514] [19] /root/miniconda3/envs/torch/bin/python3[0x4fdc87]
[n101-006-033:3841514] [20] /root/miniconda3/envs/torch/bin/python3(_PyObject_MakeTpCall+0x25b)[0x4f741b]
[n101-006-033:3841514] [21] /root/miniconda3/envs/torch/bin/python3[0x509cbf]
[n101-006-033:3841514] [22] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2c16]
[n101-006-033:3841514] [23] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] [24] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x2b79)[0x4f0c69]
[n101-006-033:3841514] [25] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] [26] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x731)[0x4ee821]
[n101-006-033:3841514] [27] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] [28] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee40f]
[n101-006-033:3841514] [29] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] *** End of error message ***
Segmentation fault (core dumped)

Can anyone help solve this problem?

Environment:
Cuda 12.5
tensorrt 10.0.1
tensorrt_llm 0.11.0
tensorrt_bindings 9.0.1.post11.dev4
tensorrt-cu12 10.2.0.post1
tensorrt-cu12-bindings 10.2.0.post1
tensorrt-cu12-libs 10.2.0.post1
tensorrt-libs 9.0.1.post11.dev4

@Kefeng-Duan (Collaborator) commented:

Hi @kevinsouthByteDance, could you provide more information? Are you working on 8 GPUs? What's the model config, and what's the precision?

@zhangts20 commented:

@Kefeng-Duan I encountered the same error. I'm using the llama2-70b model with 4 GPUs, and the build command is the fp16 one from examples/llama/README.md. Do you have any suggestions on how to resolve this? Thanks.

@Kefeng-Duan (Collaborator) commented:

@zhangts20 Which type of GPU?

@zhangts20 commented:

> @zhangts20 Which type of GPU?

GPU: 4x A100
tensorrt_llm: v0.11.0
tensorrt: 10.1.0
cuda: 12.1
driver: 525.60.13

@kevinsouthByteDance (Author) commented:

> Hi @kevinsouthByteDance, could you provide more information? Are you working on 8 GPUs? What's the model config, and what's the precision?

Hello @Kefeng-Duan @zhangts20, I have found that this issue is caused by max_position_ids in my model config. When I changed max_position_ids to a larger value, I could build successfully.

@zhangts20 commented Aug 26, 2024

> Hello @Kefeng-Duan @zhangts20, I have found that this issue is caused by max_position_ids in my model config. When I changed max_position_ids to a larger value, I could build successfully.

I couldn't find the variable max_position_ids in my config.json. Do you mean max_position_embeddings? And what would be an appropriate value to set for it? @kevinsouthByteDance

@kevinsouthByteDance (Author) commented:

> I couldn't find the variable max_position_ids in my config.json. Do you mean max_position_embeddings? And what would be an appropriate value to set for it?

Yes, it is the value of max_position_embeddings, and it should be greater than input length + output length. For example, if you want to test the model with an input length of 8192 and an output length of 50, then max_position_embeddings should be greater than 8192 + 50 = 8242.
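A minimal sketch of that adjustment, assuming a hypothetical Hugging Face model directory ./llama-hf and headroom for the 131072-token input length from the original command; the exact value is your choice, as long as it exceeds input length + output length:

```sh
# Inspect the current value in the Hugging Face config (path is hypothetical).
grep '"max_position_embeddings"' ./llama-hf/config.json

# Raise it before converting the checkpoint, e.g. to cover a 131072-token
# input plus up to 4096 generated tokens.
python3 - <<'EOF'
import json

path = "./llama-hf/config.json"  # hypothetical model directory
with open(path) as f:
    cfg = json.load(f)

# Must be greater than input length + output length.
cfg["max_position_embeddings"] = max(cfg.get("max_position_embeddings", 0), 131072 + 4096)

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
```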

@zhangts20 commented:

@kevinsouthByteDance Sorry for the delayed reply. How can I set this value when exporting the engine? It is a fixed value taken from config.json. In TensorRT-LLM, --n-positions can change the value of max_position_embeddings only when model_dir is None: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.11.0/examples/llama/convert_checkpoint.py#L294

@kevinsouthByteDance (Author) commented:

> @kevinsouthByteDance Sorry for the delayed reply. How can I set this value when exporting the engine? It is a fixed value taken from config.json. In TensorRT-LLM, --n-positions can change the value of max_position_embeddings only when model_dir is None: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.11.0/examples/llama/convert_checkpoint.py#L294

I changed max_position_embeddings in the config file of the Hugging Face model. There are actually multiple steps before you build the engine: first, convert the Hugging Face model (or your own model) to a TensorRT-LLM checkpoint by calling convert_checkpoint.py; second, call trtllm-build to build the engine. I changed the config file before calling convert_checkpoint.py, roughly as sketched below.
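The sequence might look like this, assuming the llama example's convert_checkpoint.py and hypothetical local paths; the trtllm-build flags are the ones from the original report, while the conversion flags (--dtype, --tp_size) are typical for the llama example and should be checked against your setup:

```sh
# 1. Raise max_position_embeddings in the Hugging Face config.json first
#    (see the sketch in the earlier comment).

# 2. Convert the Hugging Face model to a TensorRT-LLM checkpoint
#    (run from examples/llama in the TensorRT-LLM repo; paths are hypothetical).
python convert_checkpoint.py \
    --model_dir ./llama-hf \
    --output_dir ./tllm_checkpoint_8gpu_tp8 \
    --dtype float16 \
    --tp_size 8

# 3. Build the engine from the converted checkpoint (flags as in the original report).
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_8gpu_tp8/ \
    --output_dir ./output_engine_1 \
    --max_num_tokens 4096 \
    --max_input_len 131072 \
    --max_seq_len 131082 \
    --gemm_plugin auto
```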

@zhangts20 commented:

Can we manually change it? I understand it is determined during training. @kevinsouthByteDance @Kefeng-Duan Do you have any other solutions in mind?

@lfr-0531 added the bug (Something isn't working) and triaged (Issue has been triaged by maintainers) labels on Sep 4, 2024