[0.11.0] Model Conversion OOMed in P4D.24xl #2093

Open
lanking520 opened this issue Aug 6, 2024 · 1 comment
Labels: not a bug (Some known limitation, but not a bug.), triaged (Issue has been triaged by maintainers)

Comments

@lanking520

System Info

A100 40GB x8, Ubuntu 22.04

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 llama/convert_checkpoint.py --model_dir meta-llama/Meta-Llama-3-70B-Instruct --dtype float16 --output_dir /tmp/trtllm_llama_ckpt/ --tp_size 8 --pp_size 1 --workers 8

Tried to use 8 workers to convert the model in parallel. However, the conversion hit an out-of-memory error with TRT-LLM.

Expected behavior

The model conversion completes without running out of memory.

Actual behavior

INFO  LmiUtils convert_py: Loading checkpoint shards: 100%|██████████| 30/30 [00:42<00:00,  1.22s/it]
INFO  LmiUtils convert_py: Loading checkpoint shards: 100%|██████████| 30/30 [00:42<00:00,  1.42s/it]
INFO  LmiUtils convert_py: Traceback (most recent call last):
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/convert_checkpoint.py", line 409, in execute
INFO  LmiUtils convert_py:     future.result()
INFO  LmiUtils convert_py:   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
INFO  LmiUtils convert_py:     return self.__get_result()
INFO  LmiUtils convert_py:   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
INFO  LmiUtils convert_py:     raise self._exception
INFO  LmiUtils convert_py:   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
INFO  LmiUtils convert_py:     result = self.fn(*self.args, **self.kwargs)
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/convert_checkpoint.py", line 367, in convert_and_save_rank
INFO  LmiUtils convert_py:     llama = LLaMAForCausalLM.from_hugging_face(
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 317, in from_hugging_face
INFO  LmiUtils convert_py:     weights = load_weights_from_hf_model(hf_model, config)
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1128, in load_weights_from_hf_model
INFO  LmiUtils convert_py:     convert_layer(l)
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1006, in convert_layer
INFO  LmiUtils convert_py:     mlp_gate_weight = get_weight(model_params, prefix + 'mlp.up_proj',
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 431, in get_weight
INFO  LmiUtils convert_py:     config[prefix + '.weight'].data = config[prefix + '.weight'].to(dtype)
INFO  LmiUtils convert_py: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU
INFO  LmiUtils convert_py: Traceback (most recent call last):
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/convert_checkpoint.py", line 409, in execute
INFO  LmiUtils convert_py:     future.result()
INFO  LmiUtils convert_py:   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
INFO  LmiUtils convert_py:     return self.__get_result()
INFO  LmiUtils convert_py:   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
INFO  LmiUtils convert_py:     raise self._exception
INFO  LmiUtils convert_py:   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
INFO  LmiUtils convert_py:     result = self.fn(*self.args, **self.kwargs)
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/convert_checkpoint.py", line 367, in convert_and_save_rank
INFO  LmiUtils convert_py:     llama = LLaMAForCausalLM.from_hugging_face(
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 317, in from_hugging_face
INFO  LmiUtils convert_py:     weights = load_weights_from_hf_model(hf_model, config)
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1128, in load_weights_from_hf_model
INFO  LmiUtils convert_py:     convert_layer(l)
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1006, in convert_layer
INFO  LmiUtils convert_py:     mlp_gate_weight = get_weight(model_params, prefix + 'mlp.up_proj',
INFO  LmiUtils convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 431, in get_weight
INFO  LmiUtils convert_py:     config[prefix + '.weight'].data = config[prefix + '.weight'].to(dtype)
INFO  LmiUtils convert_py: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU

Additional notes

Changing the number of workers to 1 mitigates the issue, but the conversion becomes very slow.
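
For reference, the slow single-worker workaround is just the reproduction command rerun with --workers 1 (same model, dtype, and paths as above):

python3 llama/convert_checkpoint.py --model_dir meta-llama/Meta-Llama-3-70B-Instruct --dtype float16 --output_dir /tmp/trtllm_llama_ckpt/ --tp_size 8 --pp_size 1 --workers 1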

lanking520 added the bug (Something isn't working) label on Aug 6, 2024
@Kefeng-Duan
Collaborator

@lanking520 please enable load_by_shard when you set 8 workers; otherwise each worker loads the full weights.
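
A sketch of the suggested fix, assuming the convert_checkpoint.py in this toolkit exposes the --load_by_shard flag (which loads the HF checkpoint shard by shard instead of materializing the full weights in every worker):

python3 llama/convert_checkpoint.py --model_dir meta-llama/Meta-Llama-3-70B-Instruct --dtype float16 --output_dir /tmp/trtllm_llama_ckpt/ --tp_size 8 --pp_size 1 --workers 8 --load_by_shard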

lfr-0531 added the not a bug (Some known limitation, but not a bug.) and triaged (Issue has been triaged by maintainers) labels and removed the bug (Something isn't working) label on Sep 4, 2024