Segmentation fault when building engine #2089

Open
kevinsouthByteDance opened this issue Aug 6, 2024 · 10 comments
@kevinsouthByteDance commented Aug 6, 2024

When I tried to build a LLaMA engine using `trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8/ --output_dir ./output_engine_1 --max_num_tokens 4096 --max_input_len 131072 --max_seq_len 131082 --gemm_plugin auto`, the following error occurred:

[08/06/2024-11:56:31] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[08/06/2024-11:56:35] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/06/2024-11:56:35] [TRT] [I] Detected 15 inputs and 1 output network tensors.
[n101-006-033:3841514] *** Process received signal ***
[n101-006-033:3841514] Signal: Segmentation fault (11)
[n101-006-033:3841514] Signal code:  (128)
[n101-006-033:3841514] Failing at address: (nil)
[n101-006-033:3841514] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fce4699b420]
[n101-006-033:3841514] [ 1] /root/miniconda3/envs/torch/bin/../lib/libstdc++.so.6(_ZNSs6assignERKSs+0x9d)[0x7fcda3a429cd]
[n101-006-033:3841514] [ 2] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm_nvrtc_wrapper.so(tllmXqaJitCreateAndCompileProgram+0x4a2)[0x7fcb29164ef2]
[n101-006-033:3841514] [ 3] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm7kernels3jit13CompileEngine7compileEv+0xb1)[0x7fcb2dbcf941]
[n101-006-033:3841514] [ 4] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7kernels17DecoderXQAImplJIT25prepareForActualXQAParamsERKNS0_9XQAParamsE+0x15a)[0x7fcb2dbd0daa]
[n101-006-033:3841514] [ 5] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7kernels17DecoderXQAImplJIT7prepareERKNS0_9XQAParamsE+0x60)[0x7fcb2dbd1040]
[n101-006-033:3841514] [ 6] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xd5b7f)[0x7fcae0046b7f]
[n101-006-033:3841514] [ 7] /mnt/llm/TensorRT-LLM/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xf5d26)[0x7fcae0066d26]
[n101-006-033:3841514] [ 8] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xe4f930)[0x7fcd8fe67930]
[n101-006-033:3841514] [ 9] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xe5026c)[0x7fcd8fe6826c]
[n101-006-033:3841514] [10] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb3183f)[0x7fcd8fb4983f]
[n101-006-033:3841514] [11] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb3dcdf)[0x7fcd8fb55cdf]
[n101-006-033:3841514] [12] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb44c5c)[0x7fcd8fb5cc5c]
[n101-006-033:3841514] [13] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xb46e05)[0x7fcd8fb5ee05]
[n101-006-033:3841514] [14] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xaaa78c)[0x7fcd8fac278c]
[n101-006-033:3841514] [15] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xaafa1a)[0x7fcd8fac7a1a]
[n101-006-033:3841514] [16] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0xab0465)[0x7fcd8fac8465]
[n101-006-033:3841514] [17] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0xa84d8)[0x7fcc56d594d8]
[n101-006-033:3841514] [18] /root/miniconda3/envs/torch/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0x459d3)[0x7fcc56cf69d3]
[n101-006-033:3841514] [19] /root/miniconda3/envs/torch/bin/python3[0x4fdc87]
[n101-006-033:3841514] [20] /root/miniconda3/envs/torch/bin/python3(_PyObject_MakeTpCall+0x25b)[0x4f741b]
[n101-006-033:3841514] [21] /root/miniconda3/envs/torch/bin/python3[0x509cbf]
[n101-006-033:3841514] [22] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2c16]
[n101-006-033:3841514] [23] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] [24] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x2b79)[0x4f0c69]
[n101-006-033:3841514] [25] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] [26] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x731)[0x4ee821]
[n101-006-033:3841514] [27] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] [28] /root/miniconda3/envs/torch/bin/python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee40f]
[n101-006-033:3841514] [29] /root/miniconda3/envs/torch/bin/python3(_PyFunction_Vectorcall+0x6f)[0x4fe0cf]
[n101-006-033:3841514] *** End of error message ***
Segmentation fault (core dumped)

Can anyone help solve this problem?

Environment:
Cuda 12.5
tensorrt 10.0.1
tensorrt_llm 0.11.0
tensorrt_bindings 9.0.1.post11.dev4
tensorrt-cu12 10.2.0.post1
tensorrt-cu12-bindings 10.2.0.post1
tensorrt-cu12-libs 10.2.0.post1
tensorrt-libs 9.0.1.post11.dev4

@Kefeng-Duan (Collaborator) commented:

Hi @kevinsouthByteDance, could you provide more information? Are you working on 8 GPUs? What's the model config, and what's the precision?

@zhangts20 commented:

@Kefeng-Duan I encountered the same error. I'm using the llama2-70b model with 4 GPUs, and the build command is the fp16 one from examples/llama/README.md. Do you have any suggestions on how to resolve this? Thanks.

@Kefeng-Duan (Collaborator) commented:

@zhangts20 Which type of GPU?

@zhangts20 commented:

> @zhangts20 Which type of GPU?

GPU: 4x A100
tensorrt_llm: v0.11.0
tensorrt: 10.1.0
cuda: 12.1
driver: 525.60.13

@kevinsouthByteDance (Author) commented:

> Hi @kevinsouthByteDance, could you provide more information? Are you working on 8 GPUs? What's the model config, and what's the precision?

Hello @Kefeng-Duan @zhangts20, I have found that this issue is caused by max_position_ids in my model config. When I changed max_position_ids to a larger value, I could build successfully.

@zhangts20 commented Aug 26, 2024

> Hello @Kefeng-Duan @zhangts20, I have found that this issue is caused by max_position_ids in my model config. When I changed max_position_ids to a larger value, I could build successfully.

I couldn't find the variable max_position_ids in my config.json. Do you mean max_position_embeddings? And what would be an appropriate value to set for it? @kevinsouthByteDance

@kevinsouthByteDance (Author) commented:

> I couldn't find the variable max_position_ids in my config.json. Do you mean max_position_embeddings? And what would be an appropriate value to set for it?

Yes, it is the value of max_position_embeddings, and it should be greater than input length + output length. For example, if you want to test the model with an input length of 8192 and an output length of 50, then max_position_embeddings should be greater than 8192 + 50 = 8242.
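A minimal sketch of that adjustment, assuming a hypothetical Hugging Face model directory ./llama-hf and headroom for the 131072-token input length from the original command; the exact value is your choice, as long as it exceeds input length + output length:

```sh
# Inspect the current value in the Hugging Face config (path is hypothetical).
grep '"max_position_embeddings"' ./llama-hf/config.json

# Raise it before converting the checkpoint, e.g. to cover a 131072-token
# input plus up to 4096 generated tokens.
python3 - <<'EOF'
import json

path = "./llama-hf/config.json"  # hypothetical model directory
with open(path) as f:
    cfg = json.load(f)

# Must be greater than input length + output length.
cfg["max_position_embeddings"] = max(cfg.get("max_position_embeddings", 0), 131072 + 4096)

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
```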

@zhangts20 commented:

@kevinsouthByteDance Sorry for the delayed reply. How can I set this value when exporting the engine? It is a fixed value taken from config.json. In TensorRT-LLM, --n-positions can change the value of max_position_embeddings only when model_dir is None: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.11.0/examples/llama/convert_checkpoint.py#L294

@kevinsouthByteDance (Author) commented:

> @kevinsouthByteDance Sorry for the delayed reply. How can I set this value when exporting the engine? It is a fixed value taken from config.json. In TensorRT-LLM, --n-positions can change the value of max_position_embeddings only when model_dir is None: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.11.0/examples/llama/convert_checkpoint.py#L294

I changed max_position_embeddings in the config file of the Hugging Face model. There are actually multiple steps before you build the engine: first, convert the Hugging Face model (or your own model) to a TensorRT-LLM checkpoint by calling convert_checkpoint.py; second, call trtllm-build to build the engine. I changed the config file before calling convert_checkpoint.py, roughly as sketched below.
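The sequence might look like this, assuming the llama example's convert_checkpoint.py and hypothetical local paths; the trtllm-build flags are the ones from the original report, while the conversion flags (--dtype, --tp_size) are typical for the llama example and should be checked against your setup:

```sh
# 1. Raise max_position_embeddings in the Hugging Face config.json first
#    (see the sketch in the earlier comment).

# 2. Convert the Hugging Face model to a TensorRT-LLM checkpoint
#    (run from examples/llama in the TensorRT-LLM repo; paths are hypothetical).
python convert_checkpoint.py \
    --model_dir ./llama-hf \
    --output_dir ./tllm_checkpoint_8gpu_tp8 \
    --dtype float16 \
    --tp_size 8

# 3. Build the engine from the converted checkpoint (flags as in the original report).
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_8gpu_tp8/ \
    --output_dir ./output_engine_1 \
    --max_num_tokens 4096 \
    --max_input_len 131072 \
    --max_seq_len 131082 \
    --gemm_plugin auto
```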

@zhangts20 commented:

Can we manually change it? I understand it is determined during training. @kevinsouthByteDance @Kefeng-Duan Do you have any other solutions in mind?

@lfr-0531 added the bug (Something isn't working) and triaged (Issue has been triaged by maintainers) labels on Sep 4, 2024