
Pull requests: aws-samples/awsome-distributed-training


Pull requests list

Update Megatron-LM base image
#402 opened Aug 8, 2024 by KeitaW
add maxtext test case [label: enhancement]
#397 opened Aug 5, 2024 by KeitaW Draft
Update bionemo test case + propose subdirectories per orchestrator [label: documentation]
#396 opened Aug 5, 2024 by KeitaW Draft
SMHP: add features in LCS utils
#392 opened Aug 1, 2024 by gmgtamz
ESM-2 on SageMaker HyperPod
#387 opened Jul 25, 2024 by awsankur
FSDP: Add mistral model type support [label: New model]
#384 opened Jul 23, 2024 by arm-diaz
update dependencies of PyTorch base image
#375 opened Jul 15, 2024 by KeitaW
Neuron distributed
#359 opened Jun 13, 2024 by KeitaW
End-to-End LLM Model Development with Torchtitan and Torchtune [label: enhancement]
#341 opened May 20, 2024 by KeitaW
Llama training with FP8
#331 opened May 15, 2024 by pbelevich Draft
Add draft gpu troubles
#290 opened Apr 30, 2024 by mhuguesaws Draft
[WIP] torchtune usecase
#260 opened Apr 12, 2024 by pbelevich Draft
Bump pytorch dockerfile template
#211 opened Mar 12, 2024 by verdimrc
SMHP: slurm exporter to report gpu metrics
#181 opened Mar 6, 2024 by verdimrc
Update organization and tag to V1
#150 opened Feb 22, 2024 by perifaws
megatron-lm test case: update README
#114 opened Jan 25, 2024 by verdimrc Draft