FSDP: Mistral(mathstral) sbatch file - MISTRAL MODEL SUPPORT #385

Merged: 6 commits into main on Aug 26, 2024

Conversation

@nithiyn (Collaborator) commented Jul 23, 2024

Adds support for Mistral models when using FSDP on HyperPod; previously only Mixtral was supported. This sbatch file is part of a HyperPod blog on training Mathstral. This PR is one of several that Armando and I will incorporate as part of the Mathstral sbatch job, which has already been tested on HyperPod.
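For context, a minimal sketch of the kind of launch invocation such an sbatch file wraps. Only --model_type and the three args quoted in the review below come from this PR; the job name, node count, process count, and the train.py entry point are placeholders for illustration, not the actual file contents:

```bash
#!/bin/bash
# Hypothetical job name and node count; size --nodes to your HyperPod cluster.
#SBATCH --job-name=mathstral-fsdp
#SBATCH --nodes=2

# Launch one torchrun per node. train.py is a stand-in for the repo's
# FSDP training entry point; --model_type=mistral is the support this
# PR adds, and the last three args mirror the snippet reviewed below.
srun torchrun \
  --nnodes="$SLURM_JOB_NUM_NODES" \
  --nproc_per_node=8 \
  train.py \
  --model_type=mistral \
  --val_batch_size=1 \
  --max_steps=5000 \
  --seed=42
```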

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…g-mixtral.sbatch

Renaming to mixtral because the args here, referenced in the model_type flag, are specific to Mixtral models
Contains the updated sbatch for Mistral models. Before these changes, only Mixtral was supported. This PR is one of several that Armando and I will incorporate. This has already been tested on HyperPod.
--val_batch_size=1 \
--max_steps=5000 \
--seed=42 \
# torch.get_default_dtype() returns fp32
Contributor:
remove

Collaborator Author:
Hey Pierre, would you like us to remove just the comment with fp32, or the other lines you've highlighted as well?

Contributor:
Comment only :) The thing is, you either keep it and document why it's there and commented out, or you remove it. Typically these are leftovers from coding, or they're left in for a good reason, in which case they need a bit more info.

Collaborator Author:
Understood! Thanks for the quick response

Collaborator Author:
Have removed the comment

@perifaws (Contributor) commented:
Hey @nithiyn, you'll want to amend the README file. For example, you could add a subsection in 3. Launch Training to show how to run this new case (e.g., an example of correct output).

@nithiyn (Collaborator Author) commented Aug 15, 2024

> Hey @nithiyn, you'll want to amend the README file. For example, you could add a subsection in 3. Launch Training to show how to run this new case (e.g., an example of correct output).

Sorry for the late response, I somehow missed this, but yes, can do.

@nithiyn (Collaborator Author) commented Aug 25, 2024

> Hey @nithiyn, you'll want to amend the README file. For example, you could add a subsection in 3. Launch Training to show how to run this new case (e.g., an example of correct output).

Hey @perifaws, I have updated the README to include Mathstral and the output shown when launching training. Please let me know if any other changes are required. Thanks!
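For readers following along, submitting and checking the job would look roughly like this; the sbatch file name here is an assumption based on this PR's branch name, and the real name lives in the updated README:

```bash
# Submit the Mathstral FSDP job to the HyperPod Slurm cluster.
# File name assumed from the PR branch; check the README for the real one.
sbatch mistral-mathstral.sbatch

# Watch the job in the queue until it starts running.
squeue -u "$USER"
```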

@perifaws merged commit fe01050 into main on Aug 26, 2024.
@perifaws deleted the mistral(mathstral)-sbatch-file branch on August 26, 2024 at 12:44.