
[0.6.1] InternLM SmoothQuant does not work #705

Closed

lanking520 opened this issue Dec 20, 2023 · 1 comment

Labels: quantization (Issue about lower bit quantization, including int8, int4, fp8), triaged (Issue has been triaged by maintainers)

@lanking520 commented Dec 20, 2023
I am running 0.6.1 with the InternLM model, using the following conversion command:

python hf_internlm_convert.py -i internlm/internlm-7b -o ./internlm-chat-7b/smooth_internlm/sq0.5/ -sq 0.5 --tensor-parallelism 1 --storage-type fp16

The conversion finishes successfully. However, when building the engine:

 python3 build.py  --ft_model_dir=./internlm-chat-7b/smooth_internlm/sq0.5/1-gpu --use_smooth_quant --output_dir /tmp/trtllm/internlm-internlm-7b/1

this error appears:

Loading from /tmp/trtllm/internlm-internlm-7b/smoothquant/1-gpu/model.layers.0.attention.dense.scale_y_accum_quant.bin
Loading from /tmp/trtllm/internlm-internlm-7b/smoothquant/1-gpu/model.layers.0.attention.dense.scale_y_quant_orig.bin
Loading from /tmp/trtllm/internlm-internlm-7b/smoothquant/1-gpu/model.layers.0.attention.dense.smoother.0.bin
<tensorrt_llm.quantization.layers.SmoothQuantLinear object at 0x7f9b6503ef50>
0
Loading from /tmp/trtllm/internlm-internlm-7b/smoothquant/1-gpu/model.layers.0.attention.query_key_value.bias.0.bin
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/internlm/build.py", line 733, in <module>
    build(0, args)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/internlm/build.py", line 704, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/internlm/build.py", line 581, in build_rank_engine
    load_from_binary(tensorrt_llm_internlm,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/internlm/weight.py", line 770, in load_from_binary
    dst.value = np.ascontiguousarray(t)
AttributeError: 'NoneType' object has no attribute 'value'

I printed out the layer:

<tensorrt_llm.quantization.layers.SmoothQuantLinear object at 0x7f9b6503ef50>

It seems the bias property is not passed correctly when converting to SmoothQuant, which causes the bias object to be initialized as None.
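
For context, the failing assignment in weight.py matches the pattern below. This is a minimal reconstruction with stub classes, not the verbatim TensorRT-LLM source: when the quantized layer is constructed with bias=False, its bias attribute is simply None, and assigning to dst.value raises exactly this AttributeError.

import numpy as np

# Minimal reconstruction of the failure (illustrative stubs; names mirror
# the traceback, not the verbatim TensorRT-LLM source).
class FakeParameter:
    value = None  # stands in for tensorrt_llm's weight Parameter

class FakeSmoothQuantLinear:
    def __init__(self, bias: bool):
        # With bias=False, no bias parameter is created at all.
        self.bias = FakeParameter() if bias else None

layer = FakeSmoothQuantLinear(bias=False)  # bias dropped during conversion
t = np.zeros(4096, dtype=np.float16)       # stands in for the bias.0.bin blob
dst = layer.bias                           # None, because bias was hard-coded off
dst.value = np.ascontiguousarray(t)        # AttributeError: 'NoneType' object has no attribute 'value'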

Environment information

  • TRTLLM 0.6.1 with CUDA 12.2
  • Transformers 4.34.0
@byshiue added the triaged and quantization labels on Dec 25, 2023
@Tracin (Collaborator) commented Dec 25, 2023

Yeah, you are totally right. It will be fixed in the next release.
For now, you can make it work with a small code change (see the sketch below):

Attention bias: change bias=False to bias=layer.attention.qkv.bias is not None
MLP bias: change bias=False to bias=layer.mlp.fc.bias is not None
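
In case it helps, the change looks roughly like this. This is a paraphrased sketch, not the exact source: the constructor arguments are simplified (the real calls also pass quantization and tensor-parallelism options), and the surrounding names (layer, hidden_size, ffn_hidden_size) are placeholders. The point is to derive the bias flag from the original layer instead of hard-coding False, since InternLM, unlike LLaMA, carries an attention bias.

from tensorrt_llm.quantization.layers import SmoothQuantLinear

# Paraphrased sketch of the fix (arguments simplified; placeholder names).
# Mirror whether the source layer actually carries a bias instead of
# hard-coding bias=False, which silently drops InternLM's biases and
# later resolves to None in load_from_binary.
layer.attention.dense = SmoothQuantLinear(
    hidden_size, hidden_size,
    bias=layer.attention.qkv.bias is not None,  # was: bias=False
)
layer.mlp.fc = SmoothQuantLinear(
    hidden_size, ffn_hidden_size,
    bias=layer.mlp.fc.bias is not None,  # was: bias=False
)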
