
gptManagerBenchmark std::bad_alloc error #66

Closed
clockfly opened this issue Oct 23, 2023 · 19 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@clockfly

Machine: NVIDIA RTX 4090 (24 GB)
Model: llama13B-gptq (the GPU memory should be enough)
Problem: std::bad_alloc error when starting GptManager.
Expected: runs successfully.

root@ubuntu-devel:/code/tensorrt_llm/cpp/build/benchmarks# CUDA_VISIBLE_DEVICES=0  ./gptManagerBenchmark     --model llama13b_gptq_compiled     --engine_dir /code/tensorrt_llm/models/llama13b_gptq_compiled     --type IFB     --dataset /code/tensorrt_llm/models/llama13b_gptq/preprocessed_dataset.json  --log_level verbose --kv_cache_free_gpu_mem_fraction 0.2
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc
@jdemouth-nvidia
Collaborator

Hi @clockfly ,

Can you share the command used to build the model, please? We'd like to see if we can reproduce the problem.

Thanks,
Julien

@jdemouth-nvidia added the triaged label (Issue has been triaged by maintainers) on Oct 23, 2023
@ljayx

ljayx commented Oct 23, 2023

@jdemouth-nvidia
Same issue. Please help.

build:

python build.py --model_dir /models/Llama-2-7b-chat-hf \
	--dtype float16 \
	--use_gpt_attention_plugin float16 \
	--use_gemm_plugin float16 \
	--output_dir /models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
	--use_inflight_batching \
	--paged_kv_cache \
	--remove_input_padding

run:

CUDA_VISIBLE_DEVICES=7 ${proj_dir}/cpp/bbb/benchmarks/gptManagerBenchmark \
	--model=llama \
	--engine_dir=/models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
	--dataset=${proj_dir}/benchmarks/cpp/preprocessed_dataset.json \
	--log_level=verbose

log:

[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc

Model: llama2-7b
GPU: 8 x A100 80GB, but used only the last
CPU: 96 x Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
Memory: 768GB

@juney-nvidia
Collaborator

@jdemouth-nvidia Same issue. Please help.

(… quoted build command, run command, log, and system specs — identical to the previous comment …)

Thanks for sharing the concrete steps, we will try to reproduce it and hopefully provide some feedback tomorrow.

June

@ljayx

ljayx commented Oct 24, 2023

Hi June, how's the issue going?
I'm stuck here. From the backtrace, it looks like the binary threw from the std::filesystem::path constructor. Not sure if CXX11_ABI matters, since std::filesystem::path (a C++17 API) is mangled differently under the two libstdc++ ABIs.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZNSt10filesystem7__cxx114pathC1ERKS1_+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

@kaiyux
Member

kaiyux commented Oct 24, 2023

Hi @clockfly @ljayx , I did not reproduce that issue.

Can you share the operating system that you're using? Thanks.

@ryxli

ryxli commented Oct 24, 2023

Hi @kaiyux, I was also able to reproduce this issue with both V1 and IFB:

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"

Build Engine

python build.py --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --use_gemm_plugin float16 \
                --enable_context_fmha \
                --world_size 8 \
                --tp_size 8 \
                --model_dir $HF_DIR \
                --output_dir $ENGINE_DIR \
                --use_inflight_batching \
                --paged_kv_cache

Run

mpirun -n 8 --allow-run-as-root gptManagerBenchmark \
    --model llama \
    --engine_dir $ENGINE_DIR \
    --type V1 \
    --dataset $DATASET_OUT \
    --log_level verbose

Out

[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
(… the same plugin registration log is repeated by each of the other 7 MPI ranks …)
[TensorRT-LLM][INFO] MPI size: 8, rank: 0
[TensorRT-LLM][INFO] MPI size: 8, rank: 7
[TensorRT-LLM][INFO] MPI size: 8, rank: 1
[TensorRT-LLM][INFO] MPI size: 8, rank: 2
[TensorRT-LLM][INFO] MPI size: 8, rank: 5
[TensorRT-LLM][INFO] MPI size: 8, rank: 3
[TensorRT-LLM][INFO] MPI size: 8, rank: 6
[TensorRT-LLM][INFO] MPI size: 8, rank: 4
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[7622,1],7]
  Exit code:    1

@ljayx

ljayx commented Oct 25, 2023

Hi @clockfly @ljayx , I did not reproduce that issue.

Can you share the operating system that you're using? Thanks.

It is a Docker container with base image nvcr.io/nvidia/pytorch:23.08-py3, running on the following host:

## Host:
CentOS release 7.6 (Final)
Linux sh***d 5.10.0-1.0.0.29 #1 SMP Fri Jul 28 08:23:51 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

## Container:
Ubuntu 22.04.2 LTS
Linux sh***d 5.10.0-1.0.0.29 #1 SMP Fri Jul 28 08:23:51 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@gesanqiu

Met the same issue on the base image nvcr.io/nvidia/pytorch:23.08-py3.
Build Engine:

python build.py --model_dir /workdir/hf_models/llama-2-7b-chat-hf/ --dtype float16 --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --max_batch_size 16 --max_input_len 2048 --max_output_len 2048 --use_inflight_batching --paged_kv_cache --remove_input_padding --output_dir /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/

Run:

root@dell:/workdir/TensorRT-LLM/cpp/build# CUDA_VISIBLE_DEVICES=1 ./benchmarks/gptManagerBenchmark --model llama_7b --engine_dir /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/ --type IFB --dataset ../../examples/llama/llama_preprocessed_dataset.json --log_level verbose
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc

gptSessionBenchmark also runs into an error.

root@dell:/workdir/TensorRT-LLM/cpp/build# CUDA_VISIBLE_DEVICES=1 ./benchmarks/gptSessionBenchmark --model llama_7b --engine_dir /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/ --batch_size "1" --input_output_len "60,20"
[TensorRT-LLM][ERROR] [TensorRT-LLM][ERROR] Assertion failed: Error opening engine file: /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/llama_7b_float16_tp1_rank0.engine (/workdir/TensorRT-LLM/cpp/tensorrt_llm/runtime/utils/sessionUtils.cpp:42)
1       0x5555ebef46ee tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7f522d9790ed /workdir/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(+0x3720ed) [0x7f522d9790ed]
3       0x5555ebf04dd3 ./benchmarks/gptSessionBenchmark(+0x23dd3) [0x5555ebf04dd3]
4       0x5555ebef7bef ./benchmarks/gptSessionBenchmark(+0x16bef) [0x5555ebef7bef]
5       0x7f51f0bd8d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f51f0bd8d90]
6       0x7f51f0bd8e40 __libc_start_main + 128
7       0x5555ebef9fe5 ./benchmarks/gptSessionBenchmark(+0x18fe5) [0x5555ebef9fe5]
root@dell:/workdir/TensorRT-LLM/cpp/build#

@ljayx

ljayx commented Oct 25, 2023

(… quoting my earlier comment and backtrace above …)

I resolved the issue. The root cause is CXX11_ABI related.

Root cause:
The Dockerfile skips installing PyTorch, but the default PyTorch inside nvcr.io/nvidia/pytorch:23.08-py3 is built with the cxx11 ABI.
The CMakeLists.txt enables USE_CXX11_ABI based on that PyTorch's cxx11_abi flag.
Since GptManager passes std::filesystem::path across the library boundary, and that type's layout depends on the ABI setting, the binary threw from the constructor.

Solution:
Just:

bash install_pytorch.sh src_non_cxx11_abi

This seems like an issue that could confuse users; it would be worth documenting.

@ryxli

ryxli commented Oct 25, 2023

*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZNSt10filesystem7__cxx114pathC1ERKS1_+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

@ljayx thanks for finding the issue and sharing the workaround.
Would you mind sharing how you get the backtrace to show up in the console?

@ryxli

ryxli commented Oct 25, 2023

Also sharing this; unsure if it's relevant.

The tensorrtllm_backend Dockerfile sets the PyTorch installation arg as:

# `pypi` for x86_64 arch and `src_cxx11_abi` for aarch64 arch
ARG TORCH_INSTALL_TYPE="pypi"
COPY tensorrt_llm/docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh

https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/dockerfile/Dockerfile.trt_llm_backend#L33C1-L37C1

versus skipped in this package's Dockerfile:

# Install PyTorch
ARG TORCH_INSTALL_TYPE="skip"
COPY docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh

Using same base image:

ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
ARG BASE_TAG=23.08-py3

@ryxli

ryxli commented Oct 25, 2023

Installing PyTorch with src_non_cxx11_abi seems to fix the issues encountered when using the batch manager, but maybe it breaks some of the other scripts?

python3 build.py .....

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1099, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 86, in <module>
    from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 34, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 112, in <module>
    from .launch import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 27, in <module>
    from ..utils.other import merge_dicts
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 24, in <module>
    from .transformer_engine import convert_model
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
    import transformer_engine.pytorch as te
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
    from .module import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
    from .layernorm_linear import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 15, in <module>
    from .. import cpp_extensions as tex
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
    from transformer_engine_extensions import *
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tensorrt_llm/examples/llama/build.py", line 24, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1090, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1089, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1101, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

@ljayx

ljayx commented Oct 26, 2023

*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZNSt10filesystem7__cxx114pathC1ERKS1_+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

@ljayx thanks for finding the issue and sharing the workaround. Would you mind sharing how you get the backtrace to show up in the console?

@ryxli I wrote a simple demo, FYI:

int main(void) {
    auto logger = std::make_shared<TllmLogger>();
    using severity = nvinfer1::ILogger::Severity;
    logger->setLevel(severity::kWARNING);
    initTrtLlmPlugins(logger.get());

    std::filesystem::path engine_path{"/ljay/model/llama2-7b-chat-hf/trt_engines/fp16/1-gpu/"};
    auto model_type = TrtGptModelType::InflightFusedBatching;
    int32_t max_seq_len = 512;
    int32_t max_num_req = 8;
    int32_t max_beam_width = 1;
    int32_t max_tokens_in_paged_kvcache = -1;
    float kv_cache_free_gpu_mem_fraction = -1;
    bool enable_trt_overlap = false;
    uint64_t terminate_reqId = 10000;

    batch_scheduler::SchedulerPolicy scheduler_policy{batch_scheduler::SchedulerPolicy::GUARANTEED_NO_EVICT};
    auto const worldConfig = WorldConfig::mpi(*logger);

    const TrtGptModelOptionalParams& optional_params = TrtGptModelOptionalParams(
        max_num_req, max_tokens_in_paged_kvcache, kv_cache_free_gpu_mem_fraction, enable_trt_overlap);

    // requests_callback / response_callback are user-defined functions (definitions not shown)
    auto m = std::make_shared<GptManager>(engine_path, model_type, max_beam_width, scheduler_policy,
        requests_callback, response_callback, nullptr, nullptr, optional_params, terminate_reqId);

    // busy-wait to keep the process alive while GptManager serves requests
    for (;;)
        ;
}

@ljayx

ljayx commented Oct 26, 2023

Install pytorch with src_non_cxx11_abi seems to fix the issues encountered when using the batch manager, but maybe breaks some of the other scripts ?

python3 build.py .....

Traceback (most recent call last):
...
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I hit the same issue. NVIDIA experts, please help check it @jdemouth-nvidia @kaiyux

@ljayx

ljayx commented Oct 26, 2023

I tried uninstalling transformer-engine, and both build.py and gptManagerBenchmark work now. Not sure whether it will have any bad side effects.

pip uninstall transformer-engine

@kaiyux
Member

kaiyux commented Oct 26, 2023

We are still trying to reproduce and investigate, we will get back to you when we have conclusion. Thanks.

@zhaoyang-star

Same error here. Staying tuned.

@juney-nvidia
Collaborator

Thanks for your patience. We have found the root cause and are now working on the fix. We will push the fix (with other enhancements) in the coming days, and a new "announcement" will be posted when it lands.

June

@Shixiaowei02
Collaborator

Shixiaowei02 commented Oct 27, 2023

The MR with the fix is #152; we tested it and will merge it. Thank you all for your support and help! @zhaoyang-star @ljayx @ryxli @gesanqiu @clockfly
