
Mixtral-8x7b build fails with custom_all_reduce #825

Closed
rohithkrn opened this issue Jan 5, 2024 · 2 comments

@rohithkrn
Env:
TRT-LLM 0.7.1
Host: p4d.24xlarge EC2 instance (A100)

Model: Mixtral-8x7b
Build args: TP=8, use_custom_all_reduce

python3 /usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/build.py \
    --world_size 8 \
    --tp_size 8 \
    --dtype float16 \
    --max_input_len 1024 \
    --max_output_len 512 \
    --max_batch_size 32 \
    --max_beam_width 1 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --enable_context_fmha \
    --use_inflight_batching \
    --remove_input_padding \
    --paged_kv_cache \
    --tokens_per_block 128 \
    --rotary_base 10000.0 \
    --output_dir /tmp/trtllm/5dfe24d9115c5ddd8811d9e898e0598c37a274ad/f840fbd912400603858dbefdc15e2597d94438de/1 \
    --parallel_build \
    --use_custom_all_reduce \
    --model_dir /tmp/download/f840fbd912400603858dbefdc15e2597d94438de

Error log (the build fails with AttributeError: 'NoneType' object has no attribute 'trt_tensor'):

[INFO ] 2024-01-05 17:43:41 LmiUtils - convert_py: [01/05/2024-17:43:41] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[INFO ] 2024-01-05 17:43:41 LmiUtils - convert_py: [01/05/2024-17:43:41] [TRT-LLM] [W] Custom allreduce has already used id 0
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: Traceback (most recent call last):
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/build.py", line 902, in <module>
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     mp.spawn(build, nprocs=args.world_size, args=(args, ))
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 246, in spawn
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     while not context.join():
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 163, in join
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     raise ProcessRaisedException(msg, error_index, failed_process.pid)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: torch.multiprocessing.spawn.ProcessRaisedException: 
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: 
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: -- Process 2 terminated with the following error:
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: Traceback (most recent call last):
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     fn(i, *args)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/build.py", line 850, in build
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     engine = build_rank_engine(builder, builder_config, engine_name,
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/llama/build.py", line 777, in build_rank_engine
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     tensorrt_llm_llama(*inputs)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     return self.forward(*args, **kwargs)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 398, in forward
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     hidden_states = super().forward(input_ids, position_ids, use_cache,
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 277, in forward
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     hidden_states = layer(
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     return self.forward(*args, **kwargs)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 150, in forward
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     hidden_states = self.mlp(hidden_states,
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     return self.forward(*args, **kwargs)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/moe.py", line 302, in forward
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     routing = self.router(routing_input)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     return self.forward(*args, **kwargs)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 220, in forward
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     return self.multiply_reduce(x,
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 206, in multiply_reduce
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     x = allreduce(x, self.tp_group, workspace, self.instance_id)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 2876, in allreduce
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py:     plug_inputs.append(workspace.trt_tensor)
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: AttributeError: 'NoneType' object has no attribute 'trt_tensor'
[INFO ] 2024-01-05 17:44:09 LmiUtils - convert_py: 
[INFO ] 2024-01-05 17:44:10 LmiUtils - convert_py: /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
@rohithkrn (Author)

The build works fine without --use_custom_all_reduce.

@symphonylyh (Collaborator)

@rohithkrn thanks for reporting this!

Mixtral is an MOE model, so its MLP layer calls into layers/moe.py, where the function signature is forward(self, hidden_states, finished=None, workspace=None, ...). finished is an additional field specific to MOE that is not present in MLP/GatedMLP/FusedGatedMLP, so the workspace argument gets bound positionally to finished and workspace stays None; we should use the kwarg workspace=all_reduce_workspace at this line (see the sketch below).
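
For anyone who needs the workaround before the release, here is a minimal sketch of the local change, assuming the call site passes all_reduce_workspace positionally (as the traceback suggests; the exact surrounding arguments may differ):

```python
# tensorrt_llm/models/llama/model.py, around line 150 (file/line taken from
# the traceback above; the call site is simplified here and may differ).
#
# layers/moe.py defines:
#     def forward(self, hidden_states, finished=None, workspace=None, ...)
# so passing the workspace positionally binds it to `finished`, leaving
# `workspace` as None, which later fails inside allreduce().

# Before: workspace lands in the MOE-specific `finished` slot
hidden_states = self.mlp(hidden_states, all_reduce_workspace)

# After: pass it by keyword so it reaches the custom all-reduce plugin
hidden_states = self.mlp(hidden_states, workspace=all_reduce_workspace)
```

Passing workspace by keyword keeps the call site compatible with both the MOE signature and the plain MLP/GatedMLP/FusedGatedMLP signatures, since only MOE takes the extra finished argument.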

I will fix this internally and it will land in the next main branch release. Please apply this local change in the meantime, thanks!

Closing for now. If you still have issues after this fix, please feel free to re-open!
