root@ip-10-40-6-105:/app# python TensorRT-LLM/examples/quantization/quantize.py \
    --model_dir model/ \
    --output_dir tllm_checkpoint_1gpu_awq \
    --dtype float16 \
    --qformat int4_awq \
    --awq_block_size 128
Traceback (most recent call last):
  File "/app/TensorRT-LLM/examples/quantization/quantize.py", line 5, in <module>
    from tensorrt_llm.quantization import (quantize_and_export,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
    from . import graph_rewriting as gw
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 12, in <module>
    from .network import Network
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py", line 26, in <module>
    from tensorrt_llm.module import Module
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 17, in <module>
    from ._common import default_net
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 31, in <module>
    from ._utils import str_dtype_to_trt
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 29, in <module>
    from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
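An undefined c10::* symbol like this usually points to the tensorrt_llm C++ bindings having been compiled against a different PyTorch build than the torch wheel that pip installed. As a first check (a minimal sketch, not specific to this container; it only reads pip metadata), the resolved versions can be printed for comparison:

```python
# Diagnostic sketch: print the versions pip actually resolved, so they can
# be compared against the versions the TensorRT-LLM wheel was built for.
# An undefined c10::* symbol typically indicates a torch ABI/version mismatch.
import importlib.metadata


def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if not installed."""
    try:
        return importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        return None


for pkg in ("torch", "tensorrt_llm", "tensorrt"):
    print(f"{pkg}: {installed_version(pkg)}")
```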
Dockerfile
ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver
ARG BASE_TAG=24.07-trtllm-python-py3
FROM ${BASE_IMAGE}:${BASE_TAG}
RUN git clone --recurse-submodules https://github.com/NVIDIA/TensorRT-LLM.git && cd TensorRT-LLM && git checkout tags/v0.11.0
RUN pip install torch==2.3.1 pydantic==1.10.11
RUN pip install "datasets>=2.14.4" mpmath==1.3.0 "rouge_score~=0.1.2" transformers_stream_generator==0.0.4 tiktoken
# RUN pip install -r /app/TensorRT-LLM/examples/quantization/requirements.txt
WORKDIR TensorRT-LLM/examples/llama/
COPY requirements.txt requirements-local.txt
RUN pip install -r requirements-local.txt
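One thing worth verifying after the later `pip install` layers is that the earlier pins survived, since pip can silently upgrade torch while resolving other requirements. A small sanity-check sketch (the pin values are taken from the Dockerfile above; nothing else is assumed about the environment):

```python
# Sketch: verify that later `pip install` layers did not replace the pinned
# packages (pip may upgrade torch==2.3.1 while resolving other requirements,
# which would break the ABI the tensorrt_llm bindings were built against).
import importlib.metadata

PINS = {"torch": "2.3.1", "pydantic": "1.10.11"}  # pins from the Dockerfile above


def check_pins(pins):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for pkg, expected in pins.items():
        try:
            found = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[pkg] = (expected, found)
    return mismatches


print(check_pins(PINS) or "all pins intact")
```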
Command run inside the Docker image to quantize the Llama model (shown above).
System Info
CPU architecture : x86_64
CPU/Host memory size : 187 GB
GPU properties
GPU name : A10
GPU memory size : 24 GB
Clock frequencies used (if applicable)
Libraries
TensorRT-LLM tag : v0.10.0 and v0.11.0
Versions of TensorRT, AMMO, CUDA, cuBLAS, etc. used
Model : Llama 3 8B
Container used :
for v0.10.0
nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
for v0.11.0
nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
NVIDIA driver version
Running in docker
Who can help?
@Tracin @byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
The quantization script should quantize the model successfully.
actual behavior
Fails with the following error:
from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c1021throwNullDataPtrErrorEv
additional notes
I was able to run this successfully with my local setup; the failure occurs only inside the Docker image. It looks like a library mismatch issue, possibly the tensorrt_llm bindings having been built against a different PyTorch build than the pip-installed torch==2.3.1.
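As a further check (a sketch only; the .so path is copied from the traceback above), the bindings shared object can be loaded directly with ctypes after importing torch, which surfaces the unresolved symbol without importing the whole package:

```python
# Sketch: load the bindings shared object directly with ctypes. Importing
# torch first makes libtorch's symbols available; if the bindings were built
# against an incompatible torch build, CDLL fails with an OSError naming the
# undefined symbol.
import ctypes

try:
    import torch  # noqa: F401  (loads libtorch so its symbols can resolve)
except ImportError:
    pass  # a missing torch would itself explain the import failure


def try_load(so_path: str):
    """Return None if the library loads, else the loader's error message."""
    try:
        ctypes.CDLL(so_path)
        return None
    except OSError as exc:
        return str(exc)


err = try_load(
    "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/"
    "bindings.cpython-310-x86_64-linux-gnu.so"
)
print("load error:" if err else "loaded OK", err or "")
```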