
moe router tp removed #1091

Closed
wants to merge 1 commit into from

Conversation

@megha95 (Contributor) commented Feb 16, 2024

As the title suggests, this PR removes TP (tensor parallelism) for the MoE router. Duplicating the router across GPUs removes an allreduce for each MoE layer. This small change gives a 4-18% speedup in decoding for Mixtral-8x7B-v0.1 (4% at batch size 1, 10-18% at batch sizes 2-16), measured on 2xA100-80GB.
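For context, here is a minimal PyTorch-style sketch of the idea (not the TensorRT-LLM code; the class names, shapes, and sharding scheme are illustrative assumptions). A tensor-parallel gate sliced along the hidden dimension produces only partial sums of the expert logits on each rank, forcing an all-reduce on every MoE layer, whereas a replicated gate computes the full logits locally with no collective:

```python
import torch
import torch.nn as nn
import torch.distributed as dist


class RowParallelRouter(nn.Module):
    # Hypothetical TP router: the gate weight is sliced along the hidden
    # dimension, so each rank produces only a partial sum of the expert
    # logits and an all-reduce is required on every MoE layer.
    def __init__(self, hidden_size: int, num_experts: int, tp_size: int):
        super().__init__()
        assert hidden_size % tp_size == 0
        self.local_gate = nn.Linear(hidden_size // tp_size, num_experts, bias=False)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        partial_logits = self.local_gate(x_shard)              # [tokens, num_experts], partial sums
        dist.all_reduce(partial_logits, op=dist.ReduceOp.SUM)  # extra collective per MoE layer
        return partial_logits


class ReplicatedRouter(nn.Module):
    # What this PR switches to: the full gate weight (hidden_size x num_experts)
    # is duplicated on every rank, so each rank computes the complete expert
    # logits locally and no collective is needed.
    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gate(x)                                    # [tokens, num_experts], no all-reduce
```

The gate is tiny (for Mixtral-8x7B, roughly 4096 x 8), so the memory cost of duplicating it across GPUs is negligible compared to the per-layer collective it eliminates.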

@juney-nvidia (Collaborator) commented:

@megha95 thanks for the contribution!
We will integrate this MR into our internal repo and then push it to GitHub, quoting this MR to acknowledge your contribution.

June

@kaiyux mentioned this pull request Mar 12, 2024
@juney-nvidia (Collaborator) commented:

@megha95 Closing this since it has already been merged in this push.
