[Bug] Error when AWQ-quantizing a fine-tuned qwen2 model #1836
Comments
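Can you paste the output of running
Below is the full output from running AWQ quantization in the lmdeploy environment:
Not that one. Please run the command “lmdeploy check_env”, which prints the environment information. We want to see in which environment this issue can be reproduced.
Related to #1786, env is listed
Running lmdeploy check_env failed. The error message is as follows:
This may be related to the torch version. I also hit the NaN problem with torch 2.1.0 + cu118, but it works fine with torch 2.1.2 + cu12. Could you create a CUDA 12 environment and try it?
It is related to the torch version. In the same environment on my side, downgrading from torch 2.1.2 + cu118 to torch 2.1.0 + cu118 produces NaN. The torch version in the released docker image may need to be updated.
OK, thanks. I will try it in a CUDA 12 environment.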
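The version dependence reported in the comments above can be checked before starting a long calibration run. The following is only a minimal sketch: the names `REPORTED_NAN_BUILDS`, `parse_torch_version`, and `is_reported_nan_build` are hypothetical and not part of lmdeploy or torch, and the table contains only the one combination this thread reports as producing NaN (torch 2.1.0 + cu118). It is not an official compatibility list.

```python
import re

# Builds reported in this thread as producing NaN during AWQ quantization.
# Assumption: taken only from the comments above, not from lmdeploy itself.
REPORTED_NAN_BUILDS = {("2.1.0", "cu118")}


def parse_torch_version(version: str) -> tuple[str, str]:
    """Split a string such as '2.1.0+cu118' into ('2.1.0', 'cu118').

    The local part (after '+') is returned as '' when it is absent.
    """
    match = re.fullmatch(r"(\d+\.\d+\.\d+)(?:\+([\w.]+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized torch version string: {version!r}")
    return match.group(1), match.group(2) or ""


def is_reported_nan_build(version: str) -> bool:
    """Return True if the build matches one reported as affected in this thread."""
    return parse_torch_version(version) in REPORTED_NAN_BUILDS
```

With torch installed, the check could be called as `is_reported_nan_build(torch.__version__)`. A `False` result only means the build was not reported here as affected; it does not guarantee that quantization will succeed.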
Checklist
Describe the bug
Running lmdeploy lite auto_awq to AWQ-quantize a qwen2-7b model after SFT fails with the error: assert torch.isnan(p).sum() == 0
Reproduction
lmdeploy lite auto_awq \
  qwen2-sft-checkpoint-1506-merged \
  --calib-dataset 'c4' \
  --calib-samples 128 \
  --calib-seqlen 4096 \
  --work-dir qwen2_7b_qg_2_epoch_awq
Environment
Error traceback