moose-mini

Pretraining an LLM like ChatGPT from scratch with SOTA techniques 😀

Much of the code is borrowed from Andrej Karpathy, Umar Jamil, and Evin Tunador.
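
Here pretraining means the standard next-token-prediction objective: shift the token sequence by one position and minimize cross-entropy between the model's logits and the actual next token. The snippet below is only an illustrative sketch of a single such step with a toy stand-in model; none of it is this repo's actual training code.

import torch
import torch.nn.functional as F

# toy stand-ins: a vocabulary of 256 tokens and a random batch of token ids
vocab_size, batch, seq_len = 256, 4, 32
tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # targets are inputs shifted by one

# placeholder "language model": embeddings straight to logits
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

logits = model(inputs)                             # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"one pretraining step, loss = {loss.item():.3f}")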

To try the model, download the model definition and pretrained weights from the Hugging Face Hub and generate a sample:

import torch
from huggingface_hub import hf_hub_download

# download the model definition and the pretrained weights from the Hub
moose = hf_hub_download(repo_id="namanbnsl/moose-mini", filename="model.py")
weights = hf_hub_download(repo_id="namanbnsl/moose-mini", filename="model.pth")

# executing model.py defines ModelArgs and Moose in the current namespace
exec(open(moose).read())

params = ModelArgs()
model = Moose(params)
# map_location keeps the load working even if the weights were saved on a different device
model.load_state_dict(torch.load(weights, map_location=params.device))
model.to(params.device)
model.eval()  # inference mode

print(model.generate("Once upon a time, there was a little car named Beep."))
  • Trained on only 100M tokens
  • Uses the Llama architecture; a minimal sketch of two of its building blocks is shown below
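
The Llama recipe replaces LayerNorm with RMSNorm and uses a SwiGLU gated feed-forward block (alongside rotary position embeddings and pre-normalization). The snippet below is only an illustrative sketch of two of those building blocks, with made-up class names and dimensions; it is not the repo's actual model.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm used by Llama: no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # scale by the inverse RMS over the feature dimension
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Llama-style gated feed-forward block (SiLU gate instead of ReLU/GELU)."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# quick shape check
x = torch.randn(2, 16, 64)
print(SwiGLU(64, 256)(RMSNorm(64)(x)).shape)  # torch.Size([2, 16, 64])

RMSNorm skips mean-centering and the bias term, which makes it slightly cheaper than LayerNorm while training comparably well.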

GitHub

Hugging Face 🤗
