
world_size = 2 will raise error "array split does not result in an equal division" #374

Closed

yank666 opened this issue Nov 14, 2023 · 6 comments

Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

yank666 commented Nov 14, 2023

Model: YeungNLP/bloomz-2b6-zh

Exec command:

python build.py --model_dir bloomz-2b6 \
    --dtype float16 \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./bloom_trt/trt_engines/fp16/2-gpu/ \
    --world_size 2
ERROR log:
[TRT-LLM] [I] Loading weights from HF BLOOM...
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/bloom/build.py", line 515, in
build(0, args)
File "/app/tensorrt_llm/examples/bloom/build.py", line 485, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "/app/tensorrt_llm/examples/bloom/build.py", line 364, in build_rank_engine
load_from_hf_bloom(tensorrt_llm_bloom,
File "/app/tensorrt_llm/examples/bloom/weight.py", line 200, in load_from_hf_bloom
tensorrt_llm_bloom.lm_head.weight.value = split_matrix_tp(
File "/app/tensorrt_llm/examples/bloom/weight.py", line 90, in split_matrix_tp
return np.ascontiguousarray(split(v, tensor_parallel, rank, dim=dim))
File "/app/tensorrt_llm/examples/bloom/weight.py", line 34, in split
return np.ascontiguousarray(np.split(v, tp_size, axis=dim)[idx])
File "<array_function internals>", line 180, in split
File "/usr/local/lib/python3.10/dist-packages/numpy/lib/shape_base.py", line 872, in split
raise ValueError(
ValueError: array split does not result in an equal division
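
For reference, the failure reduces to numpy refusing an uneven split of the lm_head vocab dimension across the tensor-parallel ranks. A minimal standalone illustration (the sizes below are made up, not the actual bloomz-2b6-zh dimensions):

    import numpy as np

    vocab_size, hidden_size, tp_size = 46145, 64, 2   # odd vocab size, illustrative only
    lm_head = np.zeros((vocab_size, hidden_size), dtype=np.float16)

    # 46145 rows cannot be split into 2 equal parts, so this raises:
    # ValueError: array split does not result in an equal division
    np.split(lm_head, tp_size, axis=0)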

byshiue (Collaborator) commented Nov 14, 2023

Thank you for the report.
You can replace the code here https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/bloom/weight.py#L197-L199 with:

    if not share_embedding_table:
        vocab_size = embed_w.shape[0]
        lm_head_weight = embed_w.copy()
        if vocab_size % tensor_parallel != 0:
            # pad the vocab dimension so it divides evenly across tensor_parallel ranks
            vocab_size_padded = tensorrt_llm_bloom.lm_head.out_features * tensor_parallel
            pad_width = vocab_size_padded - vocab_size
            lm_head_weight = np.pad(lm_head_weight, ((pad_width, 0), (0, 0)),
                                    'constant',
                                    constant_values=0)
        tensorrt_llm_bloom.lm_head.weight.value = split_matrix_tp(
            lm_head_weight, tensor_parallel, rank, dim=0)
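
As a standalone sanity check of the pad-then-split idea (numpy only; the sizes and the rounded-up padded size below are illustrative, whereas the snippet above takes vocab_size_padded from lm_head.out_features):

    import numpy as np

    vocab_size, hidden_size, tensor_parallel = 46145, 64, 2   # illustrative sizes
    lm_head_weight = np.zeros((vocab_size, hidden_size), dtype=np.float16)

    # Round the vocab dimension up to the next multiple of tensor_parallel and
    # pad with zero rows at the front, as in the snippet above.
    vocab_size_padded = -(-vocab_size // tensor_parallel) * tensor_parallel
    pad_width = vocab_size_padded - vocab_size
    lm_head_weight = np.pad(lm_head_weight, ((pad_width, 0), (0, 0)),
                            'constant', constant_values=0)

    # The split is now even: each rank gets (vocab_size_padded // tensor_parallel, hidden_size).
    for rank in range(tensor_parallel):
        shard = np.split(lm_head_weight, tensor_parallel, axis=0)[rank]
        print(rank, shard.shape)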

We will fix it soon.

byshiue self-assigned this Nov 14, 2023
byshiue added the bug and triaged labels Nov 14, 2023
cocoking99 commented

@byshiue
I had a similar problem when building the llama 13b model.
The error is as follows:
python3 build.py --model_dir=llama13b --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --enable_context_fmha --remove_input_padding --output_dir trtModel_tp --world_size 2 --tp_size 2

/TensorRT-LLM/examples/llama/weight.py:237 in load_from_hf_llama │
│ │
│ 234 │ │ elif 'lm_head.weight' in k: │
│ 235 │ │ │ if mapping.is_last_pp_rank(): │
│ 236 │ │ │ │ tensorrt_llm_llama.lm_head.weight.value = np.ascontiguousarray( │
│ ❱ 237 │ │ │ │ │ split(v, mapping.tp_size, mapping.tp_rank)) │
│ 238 │ │ else: │
│ 239 │ │ │ layer_idx = extract_layer_idx(k) │
│ 240 │ │ │ if layer_idx is None or int(layer_idx) not in layers_range: │
│ │
│ /TensorRT-LLM/examples/llama/weight.py:145 in split │
│ │
│ 142 │ if len(v.shape) == 1: │
│ 143 │ │ return np.ascontiguousarray(np.split(v, tp_size)[idx]) │
│ 144 │ else: │
│ ❱ 145 │ │ return np.ascontiguousarray(np.split(v, tp_size, axis=dim)[idx]) │
│ 146 │
│ 147 │
│ 148 def dup_kv_weight(v, num_head, tp_size): │
│ │
│ /usr/local/lib/python3.10/dist-packages/numpy/lib/shape_base.py:864 in split │
│ │
│ 861 │ │ sections = indices_or_sections │
│ 862 │ │ N = ary.shape[axis] │
│ 863 │ │ if N % sections: │
│ ❱ 864 │ │ │ raise ValueError( │
│ 865 │ │ │ │ 'array split does not result in an equal division') from None │
│ 866 │ return array_split(ary, indices_or_sections, axis) │
│ 867 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: array split does not result in an equal division

How can I fix it?

byshiue (Collaborator) commented Nov 14, 2023

> @byshiue I had a similar problem when building the llama 13b model. The error is as follows: [...] ValueError: array split does not result in an equal division
> How can I fix it?

Can you share the config.json of your model? By default, the vocab_size of llama-13b is 32000, which is divisible by 2.

cocoking99 commented

> Can you share the config.json of your model? By default, the vocab_size of llama-13b is 32000, which is divisible by 2.

In my model, the vocab_size is 151851.

byshiue (Collaborator) commented Nov 14, 2023

You can refer to the idea above to add padding to the lm_head.
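
For example, a generic helper along these lines (a hypothetical sketch, not actual TensorRT-LLM code) could be applied to the lm_head.weight branch in examples/llama/weight.py before the split, assuming the network's lm_head was built with the padded vocab size. With vocab_size = 151851 and tp_size = 2 it pads to 151852 rows, so the split becomes even:

    import numpy as np

    def pad_vocab_for_tp(weight, tp_size):
        """Pad the vocab (first) dimension up to the next multiple of tp_size."""
        vocab_size = weight.shape[0]
        vocab_size_padded = -(-vocab_size // tp_size) * tp_size
        pad_rows = vocab_size_padded - vocab_size
        if pad_rows == 0:
            return weight
        # Zero rows are added at the front, matching the bloom snippet above.
        return np.pad(weight, ((pad_rows, 0), (0, 0)), 'constant', constant_values=0)

    # e.g. a (151851, hidden_size) lm_head weight becomes (151852, hidden_size),
    # and np.split(padded, 2, axis=0) no longer raises.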

yank666 (Author) commented Nov 15, 2023

Solved.
