Qwen-72B-chat-GPTQ TP=4 ERROR #1344

Closed
Hukongtao opened this issue Mar 24, 2024 · 17 comments
Labels: bug (Something isn't working)

@Hukongtao

System Info

xx

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

xx

Expected behavior

xx

Actual behavior

xx

Additional notes

xx

Hukongtao added the bug label on Mar 24, 2024

@Hukongtao (Author)

Do you test your code before releasing it?

@byshiue (Collaborator) commented Mar 25, 2024

Please fill the info to help reproduce the issue. We have tests before releasing.

@Hukongtao (Author)

> Please fill the info to help reproduce the issue. We have tests before releasing.

I followed the official documentation:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/qwen/README.md#int4-gptq
I encountered the first error when converting the model weights:

python3 examples/qwen/convert_checkpoint.py \
    --model_dir  ./Qwen-72B-Chat-Int4/ \
    --output_dir ./Qwen-72B-Chat-Int4-TRT/tllm_checkpoint_4gpu_tp4_gptq/ \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int4_gptq \
    --tp_size 4 \
    --pp_size 1 \
    --per_group


ValueError: You are trying to save a non contiguous tensor: `transformer.layers.0.mlp.gate.weights_scaling_factor` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.

I can fix this error with the following modification:

results = {
    f'{tllm_prex}.weight': qweight_interleaved,
    f'{tllm_prex}.weights_scaling_factor': scales_fp16,
    f'{tllm_prex}.zero': zeros_x_scales_fp16,
}
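
For context on that ValueError: it is the error safetensors raises when asked to serialize a strided view. A transposed or sliced tensor shares storage with its parent and is not contiguous, and calling `.contiguous()` before saving, as the message itself suggests, is the standard remedy. A minimal illustration with hypothetical tensors (not the converter's actual code):

import torch

scales = torch.randn(4, 8)
view = scales.t()             # transposing returns a strided view, not a copy
print(view.is_contiguous())   # False -> this is what safetensors rejects
fixed = view.contiguous()     # materialize a contiguous copy before saving
print(fixed.is_contiguous())  # True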

Then, I try to build the engine:

trtllm-build \
    --checkpoint_dir   ./Qwen-72B-Chat-Int4-TRT/tllm_checkpoint_4gpu_tp4_gptq/ \
    --output_dir       ./Qwen-72B-Chat-Int4-TRT/trt_engines/int4-gptq/4-gpu/ \
    --max_batch_size   1    \
    --max_input_len    2048 \
    --max_output_len   512  \
    --gather_all_token_logits \
    --gemm_plugin float16 \
    --tp_size 4

Then I got:

RuntimeError: Encounter error 'The value updated is not the same shape as the original. Updated: (8192, 6144), original: (8192, 1536)' for parameter 'transformer.layers.0.attention.qkv.weight'

I don't know how to fix it. (The updated width is exactly tp_size times the expected one, 6144 = 4 × 1536, which suggests the fused QKV weight was saved unsplit while the TP=4 model expects the per-rank slice.)

@Hukongtao (Author)

One more point: under commit 4bb65f2, it runs successfully.

@Hukongtao (Author)

Looking forward to your reply @byshiue

@adamydwang

I hit this problem too: qwen-72b-chat, tp=4, smoothquant.

@ZhangJinxin1

I hit this problem too: qwen-72b-chat, tp=8, smoothquant.

@HermitSun

+1
qwen-72b-chat, tp=4, gptq

@Hukongtao (Author)

> +1 qwen-72b-chat, tp=4, gptq

Is the error message you encountered the same as mine?

@HermitSun commented Apr 10, 2024

> +1 qwen-72b-chat, tp=4, gptq
>
> Is the error message you encountered the same as mine?

Yes. And I used the same commands to build the engine under trt-llm 0.9.0.dev2024040200.

@Hukongtao (Author)

@byshiue @Tracin Do you have any plan to fix this bug?

@Hukongtao (Author)

[screenshot of a code modification]
If I modify the code like this, the code runs successfully, but the inference result (run.py) is weird.

[screenshot of the garbled inference output]

@Tracin (Collaborator) commented Apr 17, 2024

@Hukongtao Sure, I am working on this. I will keep you posted.

@Tracin (Collaborator) commented Apr 17, 2024

@Hukongtao @HermitSun
I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with:

        for suf in suffixs:
            qkv_part = model_params[prefix + "attn.c_attn." + suf]
            split_dim = qkv_part.shape[1] // 3
            q_part = torch_split(qkv_part[:, :split_dim], dim=1)
            k_part = torch_split(qkv_part[:, split_dim : split_dim * 2], dim=1)
            v_part = torch_split(qkv_part[:, -split_dim:], dim=1)
            qkv_part = torch.cat([q_part, k_part, v_part], 1)
            qkv_weight_list.append(qkv_part)
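
For intuition, here is a small toy illustration (not repository code; toy sizes stand in for hidden=8192 and tp_size=4, assuming the fused [in, Q|K|V] layout the snippet implies) of why each projection must be split separately:

import torch

hidden, tp = 8, 4                                 # toy stand-ins for 8192 and tp_size=4
qkv = torch.arange(3 * hidden).repeat(hidden, 1)  # fused [in, Q|K|V] weight, columns labeled 0..23

# Naive TP split of the fused tensor: rank 0's slice comes entirely from Q.
naive_rank0 = torch.chunk(qkv, tp, dim=1)[0]
print(naive_rank0[0].tolist())  # [0, 1, 2, 3, 4, 5] -> all columns from the Q block (0..7)

# Per-projection split, as in the fix above: slice Q, K, V, split each, re-fuse.
q, k, v = torch.chunk(qkv, 3, dim=1)
rank0 = torch.cat([torch.chunk(t, tp, dim=1)[0] for t in (q, k, v)], dim=1)
print(rank0[0].tolist())        # [0, 1, 8, 9, 16, 17] -> matching slices of Q, K, and V

Both slices end up the same width, but only the second gives each rank a consistent (Q, K, V) triple; the unsplit path is presumably what surfaced as the shape mismatch reported above.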

@Tracin (Collaborator) commented Apr 17, 2024

@adamydwang @ZhangJinxin1
As for the SmoothQuant problem, do we have an issue on GitHub for it? Let's move the discussion there.

@HermitSun

> @Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with: [the snippet above]

It works in my situation. Thank you for your effort!

@Hukongtao (Author)

> @Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with: [the snippet above]

Thank you for your work!
