Qwen-72B-chat-GPTQ TP=4 ERROR #1344

Closed
Hukongtao opened this issue Mar 24, 2024 · 17 comments
Labels: bug (Something isn't working)

@Hukongtao

System Info

xx

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

xx

Expected behavior

xx

Actual behavior

xx

Additional notes

xx

Hukongtao added the bug label on Mar 24, 2024

@Hukongtao (Author)

Do you test your code before releasing it?

@byshiue (Collaborator) commented Mar 25, 2024

Please fill the info to help reproduce the issue. We have tests before releasing.

@Hukongtao (Author)

> Please fill the info to help reproduce the issue. We have tests before releasing.

I followed the official documentation:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/qwen/README.md#int4-gptq
I encountered the first error when converting the model weights:

python3 examples/qwen/convert_checkpoint.py \
    --model_dir  ./Qwen-72B-Chat-Int4/ \
    --output_dir ./Qwen-72B-Chat-Int4-TRT/tllm_checkpoint_4gpu_tp4_gptq/ \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int4_gptq \
    --tp_size 4 \
    --pp_size 1 \
    --per_group


ValueError: You are trying to save a non contiguous tensor: `transformer.layers.0.mlp.gate.weights_scaling_factor` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.

I can fix this error with the following modification:

results = {
    f'{tllm_prex}.weight': qweight_interleaved,
    f'{tllm_prex}.weights_scaling_factor': scales_fp16,
    f'{tllm_prex}.zero': zeros_x_scales_fp16,
}
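
For context on that ValueError: it is the error safetensors raises when asked to serialize a strided view. A transposed or sliced tensor shares storage with its parent and is not contiguous, and calling `.contiguous()` before saving, as the message itself suggests, is the standard remedy. A minimal illustration with hypothetical tensors (not the converter's actual code):

import torch

scales = torch.randn(4, 8)
view = scales.t()             # transposing returns a strided view, not a copy
print(view.is_contiguous())   # False -> this is what safetensors rejects
fixed = view.contiguous()     # materialize a contiguous copy before saving
print(fixed.is_contiguous())  # True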

Then, I try to build the engine:

trtllm-build \
    --checkpoint_dir   ./Qwen-72B-Chat-Int4-TRT/tllm_checkpoint_4gpu_tp4_gptq/ \
    --output_dir       ./Qwen-72B-Chat-Int4-TRT/trt_engines/int4-gptq/4-gpu/ \
    --max_batch_size   1    \
    --max_input_len    2048 \
    --max_output_len   512  \
    --gather_all_token_logits \
    --gemm_plugin float16 \
    --tp_size 4

Then I got:

RuntimeError: Encounter error 'The value updated is not the same shape as the original. Updated: (8192, 6144), original: (8192, 1536)' for parameter 'transformer.layers.0.attention.qkv.weight'

I don't know how to fix it. (The updated width is exactly tp_size times the expected one, 6144 = 4 × 1536, which suggests the fused QKV weight was saved unsplit while the TP=4 model expects the per-rank slice.)

@Hukongtao (Author)

One more point: under commit 4bb65f2, it runs successfully.

@Hukongtao (Author)

Looking forward to your reply @byshiue

@adamydwang

I hit this problem too: qwen-72b-chat, tp=4, smoothquant.

@ZhangJinxin1

I hit this problem too: qwen-72b-chat, tp=8, smoothquant.

@HermitSun

+1
qwen-72b-chat, tp=4, gptq

@Hukongtao (Author)

> +1 qwen-72b-chat, tp=4, gptq

Is the error message you encountered the same as mine?

@HermitSun commented Apr 10, 2024

> +1 qwen-72b-chat, tp=4, gptq
>
> Is the error message you encountered the same as mine?

Yes. And I used the same commands to build the engine under trt-llm 0.9.0.dev2024040200.

@Hukongtao (Author)

@byshiue @Tracin Do you have any plan to fix this bug?

@Hukongtao (Author)

[screenshot of a code modification]
If I modify the code like this, the code runs successfully, but the inference result (run.py) is weird.

[screenshot of the garbled inference output]

@Tracin (Collaborator) commented Apr 17, 2024

@Hukongtao Sure, I am working on this. I will keep you posted.

@Tracin (Collaborator) commented Apr 17, 2024

@Hukongtao @HermitSun
I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with:

        for suf in suffixs:
            qkv_part = model_params[prefix + "attn.c_attn." + suf]
            split_dim = qkv_part.shape[1] // 3
            q_part = torch_split(qkv_part[:, :split_dim], dim=1)
            k_part = torch_split(qkv_part[:, split_dim : split_dim * 2], dim=1)
            v_part = torch_split(qkv_part[:, -split_dim:], dim=1)
            qkv_part = torch.cat([q_part, k_part, v_part], 1)
            qkv_weight_list.append(qkv_part)
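
For intuition, here is a small toy illustration (not repository code; toy sizes stand in for hidden=8192 and tp_size=4, assuming the fused [in, Q|K|V] layout the snippet implies) of why each projection must be split separately:

import torch

hidden, tp = 8, 4                                 # toy stand-ins for 8192 and tp_size=4
qkv = torch.arange(3 * hidden).repeat(hidden, 1)  # fused [in, Q|K|V] weight, columns labeled 0..23

# Naive TP split of the fused tensor: rank 0's slice comes entirely from Q.
naive_rank0 = torch.chunk(qkv, tp, dim=1)[0]
print(naive_rank0[0].tolist())  # [0, 1, 2, 3, 4, 5] -> all columns from the Q block (0..7)

# Per-projection split, as in the fix above: slice Q, K, V, split each, re-fuse.
q, k, v = torch.chunk(qkv, 3, dim=1)
rank0 = torch.cat([torch.chunk(t, tp, dim=1)[0] for t in (q, k, v)], dim=1)
print(rank0[0].tolist())        # [0, 1, 8, 9, 16, 17] -> matching slices of Q, K, and V

Both slices end up the same width, but only the second gives each rank a consistent (Q, K, V) triple; the unsplit path is presumably what surfaced as the shape mismatch reported above.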

@Tracin (Collaborator) commented Apr 17, 2024

@adamydwang @ZhangJinxin1
As for the SmoothQuant problem, do we have an issue on GitHub for it? Let's move the discussion there.

@HermitSun

> @Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with: [the snippet above]

It works in my situation. Thank you for your effort!

@Hukongtao (Author)

> @Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with: [the snippet above]

Thank you for your work!
