01 Training

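NVIDIA/Megatron-LM provides a solid training framework, but if you need to use it with DeepSpeed, use the Microsoft/Megatron-DeepSpeed fork instead. Microsoft reworked part of the sample code into examples_deepspeed, which supports Azure and BERT training.
Microsoft's examples_deepspeed also includes a LLaMA training sample that Megatron-LM does not have, which makes it an important reference when training your own LLaMA model.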

For the LLaMA training shell scripts, Microsoft starts from pretrain_gpt.py as the entry point and changes quite a lot around it:

# LLAMA
pretrain_llama_distributed.sh

# LLAMA2
pretrain_llama2_distributed.sh
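
As a rough illustration of what these scripts end up doing, the sketch below assembles a launch command for pretrain_gpt.py with LLaMA-style architecture switches. It is a minimal sketch, not a copy of Microsoft's scripts: the helper name `build_llama_launch_cmd` is hypothetical, the flag names are assumptions based on my reading of the Megatron-DeepSpeed argument parser, and model-size, data, and tokenizer arguments are left out. Check everything against the actual scripts before use.

```python
import shlex


def build_llama_launch_cmd(num_gpus=8, tp=1, pp=1, ds_config="ds_config.json"):
    """Return an argv list that launches pretrain_gpt.py through the deepspeed launcher.

    Hypothetical helper for illustration only. It only builds the command,
    it does not run anything.
    """
    # DeepSpeed launcher on a single node, followed by the entry-point script.
    launcher = ["deepspeed", "--num_gpus", str(num_gpus), "pretrain_gpt.py"]

    # Megatron model-parallel layout.
    parallel = [
        "--tensor-model-parallel-size", str(tp),
        "--pipeline-model-parallel-size", str(pp),
    ]

    # LLaMA-style architecture switches on top of the GPT entry point.
    # Assumed flag names: verify them in the fork's arguments.py.
    llama_arch = [
        "--use-rotary-position-embeddings",
        "--swiglu",
        "--normalization", "rmsnorm",
        "--disable-bias-linear",
        "--untie-embeddings-and-output-weights",
    ]

    # Hand control of the training engine to DeepSpeed.
    ds = ["--deepspeed", "--deepspeed_config", ds_config]

    return launcher + parallel + llama_arch + ds


cmd = build_llama_launch_cmd(num_gpus=8, tp=2, pp=1)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps quoting problems out of the picture and makes it easy to diff against the arguments in the real shell scripts.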

Microsoft also provides a set of bootstrap scripts for launching DeepSpeed and Megatron, which can serve as a reference.
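
One job such a bootstrap typically handles is generating the DeepSpeed JSON config so that its batch sizes agree with the Megatron parallel layout. The following is a minimal sketch under stated assumptions: `make_ds_config` is a hypothetical helper, the numeric values are illustrative, and the real bootstrap scripts may set different or additional keys.

```python
import json


def make_ds_config(micro_batch, grad_accum, num_gpus, tp=1, pp=1, zero_stage=1):
    """Build a DeepSpeed config dict whose batch sizes are mutually consistent.

    Hypothetical helper for illustration. With tensor (tp) and pipeline (pp)
    parallelism, only num_gpus // (tp * pp) ranks are data-parallel replicas.
    """
    if num_gpus % (tp * pp) != 0:
        raise ValueError("num_gpus must be divisible by tp * pp")
    dp_size = num_gpus // (tp * pp)

    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        # DeepSpeed expects: global batch = micro batch * accumulation * data-parallel size.
        "train_batch_size": micro_batch * grad_accum * dp_size,
        "zero_optimization": {"stage": zero_stage},
        "bf16": {"enabled": True},
        "gradient_clipping": 1.0,
        "steps_per_print": 10,
    }


config = make_ds_config(micro_batch=4, grad_accum=8, num_gpus=8, tp=2, pp=1)
print(json.dumps(config, indent=2))
```

The resulting dict can be written to the file passed via `--deepspeed_config`; the global batch size given to Megatron should match `train_batch_size`.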

HabanaAI provides a fairly complete bootstrap shell script that is worth consulting: https://github.com/HabanaAI/Model-References/blob/master/PyTorch/nlp/DeepSpeedExamples/Megatron-DeepSpeed/scripts/run_llama13b.sh

References