
world_size = 2 will raise error "array split does not result in an equal division" #374

Closed

yank666 opened this issue Nov 14, 2023 · 6 comments

Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

yank666 commented Nov 14, 2023

Model: YeungNLP/bloomz-2b6-zh

Exec command:

python build.py --model_dir bloomz-2b6 \
    --dtype float16 \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./bloom_trt/trt_engines/fp16/2-gpu/ \
    --world_size 2
ERROR log:
[TRT-LLM] [I] Loading weights from HF BLOOM...
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/bloom/build.py", line 515, in
build(0, args)
File "/app/tensorrt_llm/examples/bloom/build.py", line 485, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "/app/tensorrt_llm/examples/bloom/build.py", line 364, in build_rank_engine
load_from_hf_bloom(tensorrt_llm_bloom,
File "/app/tensorrt_llm/examples/bloom/weight.py", line 200, in load_from_hf_bloom
tensorrt_llm_bloom.lm_head.weight.value = split_matrix_tp(
File "/app/tensorrt_llm/examples/bloom/weight.py", line 90, in split_matrix_tp
return np.ascontiguousarray(split(v, tensor_parallel, rank, dim=dim))
File "/app/tensorrt_llm/examples/bloom/weight.py", line 34, in split
return np.ascontiguousarray(np.split(v, tp_size, axis=dim)[idx])
File "<array_function internals>", line 180, in split
File "/usr/local/lib/python3.10/dist-packages/numpy/lib/shape_base.py", line 872, in split
raise ValueError(
ValueError: array split does not result in an equal division
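
For reference, the failure reduces to numpy refusing an uneven split of the lm_head vocab dimension across the tensor-parallel ranks. A minimal standalone illustration (the sizes below are made up, not the actual bloomz-2b6-zh dimensions):

    import numpy as np

    vocab_size, hidden_size, tp_size = 46145, 64, 2   # odd vocab size, illustrative only
    lm_head = np.zeros((vocab_size, hidden_size), dtype=np.float16)

    # 46145 rows cannot be split into 2 equal parts, so this raises:
    # ValueError: array split does not result in an equal division
    np.split(lm_head, tp_size, axis=0)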

byshiue (Collaborator) commented Nov 14, 2023

Thank you for the report.
You can replace the code here https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/bloom/weight.py#L197-L199 with:

    if not share_embedding_table:
        vocab_size = embed_w.shape[0]
        lm_head_weight = embed_w.copy()
        if vocab_size % tensor_parallel != 0:
            # pad the vocab dimension so it divides evenly across tensor_parallel ranks
            vocab_size_padded = tensorrt_llm_bloom.lm_head.out_features * tensor_parallel
            pad_width = vocab_size_padded - vocab_size
            lm_head_weight = np.pad(lm_head_weight, ((pad_width, 0), (0, 0)),
                                    'constant',
                                    constant_values=0)
        tensorrt_llm_bloom.lm_head.weight.value = split_matrix_tp(
            lm_head_weight, tensor_parallel, rank, dim=0)
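
As a standalone sanity check of the pad-then-split idea (numpy only; the sizes and the rounded-up padded size below are illustrative, whereas the snippet above takes vocab_size_padded from lm_head.out_features):

    import numpy as np

    vocab_size, hidden_size, tensor_parallel = 46145, 64, 2   # illustrative sizes
    lm_head_weight = np.zeros((vocab_size, hidden_size), dtype=np.float16)

    # Round the vocab dimension up to the next multiple of tensor_parallel and
    # pad with zero rows at the front, as in the snippet above.
    vocab_size_padded = -(-vocab_size // tensor_parallel) * tensor_parallel
    pad_width = vocab_size_padded - vocab_size
    lm_head_weight = np.pad(lm_head_weight, ((pad_width, 0), (0, 0)),
                            'constant', constant_values=0)

    # The split is now even: each rank gets (vocab_size_padded // tensor_parallel, hidden_size).
    for rank in range(tensor_parallel):
        shard = np.split(lm_head_weight, tensor_parallel, axis=0)[rank]
        print(rank, shard.shape)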

We will fix it soon.

byshiue self-assigned this Nov 14, 2023
byshiue added the bug and triaged labels Nov 14, 2023
cocoking99 commented

@byshiue
I had a similar problem when building the llama 13b model.
The error is as follows:
python3 build.py --model_dir=llama13b --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --enable_context_fmha --remove_input_padding --output_dir trtModel_tp --world_size 2 --tp_size 2

/TensorRT-LLM/examples/llama/weight.py:237 in load_from_hf_llama │
│ │
│ 234 │ │ elif 'lm_head.weight' in k: │
│ 235 │ │ │ if mapping.is_last_pp_rank(): │
│ 236 │ │ │ │ tensorrt_llm_llama.lm_head.weight.value = np.ascontiguousarray( │
│ ❱ 237 │ │ │ │ │ split(v, mapping.tp_size, mapping.tp_rank)) │
│ 238 │ │ else: │
│ 239 │ │ │ layer_idx = extract_layer_idx(k) │
│ 240 │ │ │ if layer_idx is None or int(layer_idx) not in layers_range: │
│ │
│ /TensorRT-LLM/examples/llama/weight.py:145 in split │
│ │
│ 142 │ if len(v.shape) == 1: │
│ 143 │ │ return np.ascontiguousarray(np.split(v, tp_size)[idx]) │
│ 144 │ else: │
│ ❱ 145 │ │ return np.ascontiguousarray(np.split(v, tp_size, axis=dim)[idx]) │
│ 146 │
│ 147 │
│ 148 def dup_kv_weight(v, num_head, tp_size): │
│ │
│ /usr/local/lib/python3.10/dist-packages/numpy/lib/shape_base.py:864 in split │
│ │
│ 861 │ │ sections = indices_or_sections │
│ 862 │ │ N = ary.shape[axis] │
│ 863 │ │ if N % sections: │
│ ❱ 864 │ │ │ raise ValueError( │
│ 865 │ │ │ │ 'array split does not result in an equal division') from None │
│ 866 │ return array_split(ary, indices_or_sections, axis) │
│ 867 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: array split does not result in an equal division

How can I fix it?

byshiue (Collaborator) commented Nov 14, 2023

> @byshiue I had a similar problem when building the llama 13b model. The error is as follows: [...] ValueError: array split does not result in an equal division
> How can I fix it?

Can you share the config.json of your model? By default, the vocab_size of llama-13b is 32000, which is divisible by 2.

cocoking99 commented

> Can you share the config.json of your model? By default, the vocab_size of llama-13b is 32000, which is divisible by 2.

In my model, the vocab_size is 151851.

byshiue (Collaborator) commented Nov 14, 2023

You can refer to the idea above to add padding to the lm_head.
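
For example, a generic helper along these lines (a hypothetical sketch, not actual TensorRT-LLM code) could be applied to the lm_head.weight branch in examples/llama/weight.py before the split, assuming the network's lm_head was built with the padded vocab size. With vocab_size = 151851 and tp_size = 2 it pads to 151852 rows, so the split becomes even:

    import numpy as np

    def pad_vocab_for_tp(weight, tp_size):
        """Pad the vocab (first) dimension up to the next multiple of tp_size."""
        vocab_size = weight.shape[0]
        vocab_size_padded = -(-vocab_size // tp_size) * tp_size
        pad_rows = vocab_size_padded - vocab_size
        if pad_rows == 0:
            return weight
        # Zero rows are added at the front, matching the bloom snippet above.
        return np.pad(weight, ((pad_rows, 0), (0, 0)), 'constant', constant_values=0)

    # e.g. a (151851, hidden_size) lm_head weight becomes (151852, hidden_size),
    # and np.split(padded, 2, axis=0) no longer raises.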

yank666 (Author) commented Nov 15, 2023

Solved.
