
Does the chatglm2-6B example support building and running codegeex2-6b? #93

Closed
thendwk opened this issue Oct 24, 2023 · 9 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)


thendwk commented Oct 24, 2023

I tried building and running codegeex2-6b with the chatglm2-6B example, but the result is incorrect.
The result is as follows:
root@***:/app/tensorrt_llm/examples/chatglm2-6b# python3 run.py --input_text '# language: Python\n# write a bubble sort function\n'
[10/24/2023-08:24:41] [TRT] [I] Loaded engine size: 11921 MiB
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 12584, GPU 12164 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 12586, GPU 12174 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11912, now: CPU 0, GPU 11912 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 12585, GPU 15910 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 12586, GPU 15918 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 11912 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 12680, GPU 15940 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 12680, GPU 15950 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 11912 (MiB)


Input --->

# language: Python\n# write a bubble sort function\n

Output --->
\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\n


Finished!

byshiue (Collaborator) commented Oct 24, 2023

Can you share the expected results and the steps you used to generate the TRT-LLM results?

byshiue self-assigned this Oct 24, 2023
byshiue added the triaged (Issue has been triaged by maintainers) label Oct 24, 2023
thendwk (Author) commented Oct 24, 2023

> Can you share the expected results and the steps you used to generate the TRT-LLM results?

The expected result is something like this:

def bubble_sort(list):
    for i in range(len(list) - 1):
        for j in range(len(list) - 1):
            if list[j] > list[j + 1]:
                list[j], list[j + 1] = list[j + 1], list[j]
    return list

print(bubble_sort([5, 2, 1, 8, 4]))

My steps are as follows:

1. Clone the model: git clone https://huggingface.co/THUDM/codegeex2-6b

2. Run build.py:

python3 build.py --model_dir=/docker_storage/codegeex/codegeex2-6b/ \
    --dtype float16 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --max_input_len 2048 \
    --max_output_len 1024

Doing the above throws an exception, so I modified the build.py source code as follows:

hf_model = transformers.AutoModel.from_pretrained(
    args.model_dir, trust_remote_code=True,
    torch_dtype=torch.float16).cpu()

Maybe codegeex2-6b is trained in bf16.

3. Run prediction:

python3 run.py --input_text '# language: Python\n# write a bubble sort function\n'
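Side note on the prompt: with single quotes, the shell passes the two-character sequence \n through literally, so unless run.py unescapes it itself (not verified here), the model sees a literal backslash-n rather than real newlines. In a bash-compatible shell, ANSI-C quoting would embed actual newlines:

python3 run.py --input_text $'# language: Python\n# write a bubble sort function\n'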

byshiue (Collaborator) commented Oct 24, 2023

The TRT-LLM result you shared at the beginning is not very different from the HF result. Can you try building the engine with FP32 first (chatglm2 does not support BF16 yet)?
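For reference, a minimal sketch of such an FP32 build, adapted from the FP16 command above (whether the attention and GEMM plugins accept float32 here is an assumption; if they do not, drop those two flags):

python3 build.py --model_dir=/docker_storage/codegeex/codegeex2-6b/ \
    --dtype float32 \
    --use_gpt_attention_plugin float32 \
    --use_gemm_plugin float32 \
    --max_input_len 2048 \
    --max_output_len 1024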

thendwk (Author) commented Oct 24, 2023

> The TRT-LLM result you shared at the beginning is not very different from the HF result. Can you try building the engine with FP32 first (chatglm2 does not support BF16 yet)?

Building in default mode throws an exception as follows:

[screenshot of the exception]

Does this indicate that codegeex2-6B is trained in bf16 and that TensorRT-LLM does not support building codegeex2-6B?
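One quick way to check the stored dtype (a side sketch, not a step from the thread): many Hugging Face checkpoints record it in config.json under "torch_dtype", so inspecting the local clone from step 1 can answer the bf16 question when that field is present:

import json

# Path assumed from the clone step above.
with open("/docker_storage/codegeex/codegeex2-6b/config.json") as f:
    config = json.load(f)

# "torch_dtype" is the dtype the weights were saved in, when recorded.
print(config.get("torch_dtype"))  # "bfloat16" would confirm the guess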

byshiue (Collaborator) commented Oct 24, 2023

> Building in default mode throws an exception as follows: [screenshot of the exception] Does this indicate that codegeex2-6B is trained in bf16 and that TensorRT-LLM does not support building codegeex2-6B?

TensorRT-LLM's chatglm2 does not support BF16 weights yet; supporting them requires adding some flags and a data-type converter. That's why I suggested trying FP32 first.
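Absent native BF16 support, the data-type converter mentioned above amounts to something like the following sketch (a hypothetical standalone helper; in this thread the same effect came from passing torch_dtype=torch.float16 to from_pretrained in build.py):

import torch

# Cast every BF16 tensor in a checkpoint's state dict to FP16,
# leaving tensors of other dtypes untouched.
def cast_bf16_to_fp16(state_dict):
    return {
        name: t.to(torch.float16) if t.dtype == torch.bfloat16 else t
        for name, t in state_dict.items()
    }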

thendwk (Author) commented Oct 24, 2023

> TensorRT-LLM's chatglm2 does not support BF16 weights yet; supporting them requires adding some flags and a data-type converter. That's why I suggested trying FP32 first.

OK, got it, thanks.

byshiue (Collaborator) commented Oct 25, 2023

> hf_model = transformers.AutoModel.from_pretrained(args.model_dir, trust_remote_code=True, torch_dtype=torch.float16).cpu()

I took a try and found that chatglm2 only supports FP16 now, so you cannot run it with FP32. We will fix it soon.

thendwk (Author) commented Oct 25, 2023

> I took a try and found that chatglm2 only supports FP16 now, so you cannot run it with FP32. We will fix it soon.

OK, thanks.

byshiue (Collaborator) commented Oct 27, 2023

This issue is fixed by MR #148; you can try the latest main branch. Closing this bug. Feel free to reopen if needed.

byshiue closed this as completed Oct 27, 2023
byshiue added the bug (Something isn't working) label Oct 27, 2023