Tried to use 8 workers to convert the model checkpoint in parallel, but hit a CUDA out-of-memory (OOM) error in TensorRT-LLM during the conversion step.
Expected behavior
The checkpoint conversion completes with 8 workers without running out of GPU memory.
Actual behavior
INFO LmiUtils convert_py: Loading checkpoint shards: 100%|██████████| 30/30 [00:42<00:00, 1.22s/it]
INFO LmiUtils convert_py: Loading checkpoint shards: 100%|██████████| 30/30 [00:42<00:00, 1.42s/it]
INFO LmiUtils convert_py: Traceback (most recent call last):
INFO LmiUtils convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/convert_checkpoint.py", line 409, in execute
INFO LmiUtils convert_py: future.result()
INFO LmiUtils convert_py: File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
INFO LmiUtils convert_py: return self.__get_result()
INFO LmiUtils convert_py: File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
INFO LmiUtils convert_py: raise self._exception
INFO LmiUtils convert_py: File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
INFO LmiUtils convert_py: result = self.fn(*self.args, **self.kwargs)
INFO LmiUtils convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/convert_checkpoint.py", line 367, in convert_and_save_rank
INFO LmiUtils convert_py: llama = LLaMAForCausalLM.from_hugging_face(
INFO LmiUtils convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 317, in from_hugging_face
INFO LmiUtils convert_py: weights = load_weights_from_hf_model(hf_model, config)
INFO LmiUtils convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1128, in load_weights_from_hf_model
INFO LmiUtils convert_py: convert_layer(l)
INFO LmiUtils convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1006, in convert_layer
INFO LmiUtils convert_py: mlp_gate_weight = get_weight(model_params, prefix + 'mlp.up_proj',
INFO LmiUtils convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 431, in get_weight
INFO LmiUtils convert_py: config[prefix + '.weight'].data = config[prefix + '.weight'].to(dtype)
INFO LmiUtils convert_py: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU
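The traceback shows each worker thread independently casting full weight tensors to the target dtype on the GPU (`config[prefix + '.weight'].to(dtype)`), so peak device memory grows roughly linearly with the worker count. The sketch below is a hypothetical illustration of that failure mode and of one way to cap it: a semaphore bounds how many "transfers" are in flight at once, regardless of pool size. The names, the 448 MiB tensor size (taken from the error message), and the `MAX_CONCURRENT_TRANSFERS` budget are all illustrative assumptions, not TensorRT-LLM APIs; the memory accounting is simulated with plain counters rather than real CUDA allocations.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed budget for this sketch -- not a real TensorRT-LLM flag.
MAX_CONCURRENT_TRANSFERS = 2
gpu_gate = threading.Semaphore(MAX_CONCURRENT_TRANSFERS)

lock = threading.Lock()
current_bytes = 0   # simulated device memory currently in use
peak_bytes = 0      # high-water mark across all workers

def convert_rank(rank: int, tensor_bytes: int = 448 * 1024 * 1024) -> int:
    """Simulate one rank casting a 448 MiB weight tensor on the GPU."""
    global current_bytes, peak_bytes
    with gpu_gate:  # at most MAX_CONCURRENT_TRANSFERS ranks inside at once
        with lock:
            current_bytes += tensor_bytes
            peak_bytes = max(peak_bytes, current_bytes)
        # ... real code would call weight.to(dtype) here ...
        with lock:
            current_bytes -= tensor_bytes  # tensor released after conversion
    return rank

# 8 workers submitted, but the semaphore keeps peak usage bounded.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(convert_rank, range(8)))

print(f"peak simulated device memory: {peak_bytes / 2**20:.0f} MiB")
```

With an unbounded pool, all 8 workers could hold a tensor simultaneously (~3.5 GiB transient in this toy model); the gate caps that at 2 × 448 MiB while still letting the other workers run their CPU-side work concurrently.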
Additional notes
Reducing the number of workers to 1 mitigates the issue, but conversion then becomes very slow.
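Rather than dropping all the way to 1 worker, an intermediate worker count may fit within the available headroom. The helper below is a hypothetical back-of-the-envelope estimator (the function name, the per-worker transient estimate, and the free-memory figure are all illustrative assumptions; actual usage depends on the model and TensorRT-LLM version): it picks the largest worker count whose combined transient allocations fit in the free GPU memory.

```python
def max_safe_workers(free_bytes: int,
                     per_worker_bytes: int,
                     hard_cap: int = 8) -> int:
    """Largest worker count whose combined transient allocations fit in
    free_bytes. Illustrative estimate only; real usage varies by model."""
    if per_worker_bytes <= 0:
        raise ValueError("per_worker_bytes must be positive")
    return max(1, min(hard_cap, free_bytes // per_worker_bytes))

# Example: ~4 GiB free headroom, ~1.5 GiB transient per worker -> 2 workers.
print(max_safe_workers(4 * 2**30, int(1.5 * 2**30)))
```

In practice one could measure free memory (e.g. with `torch.cuda.mem_get_info()`) after the HF model is loaded, estimate the per-worker transient footprint from the largest weight tensor, and pass the result as the worker count instead of a hard-coded 8 or 1.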
System Info
A100 40GB x8, Ubuntu 22.04