GPT from scratch

Building & training a transformer on the first 325 episodes of the Lex Fridman Podcast to answer questions.

Architecture

Byte pair encoding (BPE)

The text is encoded with byte pair encoding (BPE) to obtain a vocabulary of 1,000 tokens.
After encoding, the token sequence is roughly 60% as long as the original text measured in characters.

Here's an example of the encoding process:

tokens = encode("I think this is going to be awesome.")
>>>
tensor([360, 237, 153,  61, 158,  61, 158, 253, 194, 186, 280,  53,  75, 169,
         67, 183,  11], device='cuda:0')
len("I think this is going to be awesome.") # 36
len(tokens) # 17

decode(tokens)
>>>
"I think this is going to be awesome."

Inference

It's not very good yet, but it can mimic some English.

prompt = "What do you think about language models?"
answer = prompt_model(model, prompt, max_new_tokens=800, topk=2)
print(answer)
>>>
I think that the sort that.
 But know?
 And there's a lot one the because the but the comple to the of the somether and of comple
 of of the because a look, the so the blange,
 but I don't some the sort of an and that the be there any had the to,
 but I'm unders to don't there there to the some of the sorther.
 And that the some that the bractive,
 but that.
 But the because actory the be the because this to that start of the some the call the of the
 and there's they're going the be exconce,
 the same that the some to through an that and of it
 of they're good, when the ARLOL the good the bedher a conver of of a conver the be of the see
 of they're good on That think to, I don't going of,
 the can the say, they like,
 they they world, you can toper one of the becople
 freed that the sorld?
 Yeah, they
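
The topk argument restricts sampling to the k most likely next tokens at each step. Here's a rough sketch of that kind of top-k autoregressive sampling; the model interface, block_size, and the encode/decode helpers are assumptions for illustration, not the repo's exact prompt_model code.

import torch

@torch.no_grad()
def sample_topk(model, prompt, max_new_tokens=800, topk=2, block_size=256):
    # Assumes the model maps (B, T) token ids to (B, T, vocab_size) logits
    # and that encode/decode are the BPE helpers shown above.
    idx = encode(prompt).unsqueeze(0)                # (1, T)
    for _ in range(max_new_tokens):
        logits = model(idx[:, -block_size:])         # crop to the context window
        logits = logits[:, -1, :]                    # logits for the last position
        v, _ = torch.topk(logits, topk)
        logits[logits < v[:, [-1]]] = -float("inf")  # keep only the top-k logits
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)       # append and continue
    return decode(idx[0])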

Notes

You can find my notes on the implementation details here: 🤖 Transformer blog post.
The implementation is based on the "Attention Is All You Need" paper and the "Let's build GPT" tutorial by Andrej Karpathy.

Lex Fridman Podcast Dataset

The transcribed subtitles for the first 325 episodes of the Lex Fridman Podcast come from Andrej Karpathy's Lexicap project, which used OpenAI's Whisper model to transcribe them. I cleaned the data with some regular expressions to get one big corpus of text for training the transformer model.
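
The exact cleaning steps aren't shown here; as a rough sketch, cleaning the Lexicap .vtt subtitle files mostly means dropping the WEBVTT header, the timestamp lines, and extra blank lines. The directory name and regular expressions below are illustrative, not the ones actually used.

import re
from pathlib import Path

def clean_vtt(path):
    # Strip VTT boilerplate, keeping only the spoken text.
    text = Path(path).read_text(encoding="utf-8")
    text = re.sub(r"WEBVTT.*\n", "", text)                         # file header
    text = re.sub(r"\d{2}:\d{2}:\d{2}\.\d{3} --> .*\n", "", text)  # timestamp lines
    text = re.sub(r"\n{2,}", "\n", text)                           # collapse blank lines
    return text.strip()

# Hypothetical layout: one .vtt file per episode in a "lexicap/" folder.
corpus = "\n".join(clean_vtt(p) for p in sorted(Path("lexicap").glob("*.vtt")))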

Training

The model was trained for ~5 hours on a GPU.
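
The training loop itself isn't reproduced in this README. As a generic sketch (the hyperparameters and the model/data interface are illustrative, not this repo's actual values), next-token training samples random windows from the token tensor and minimizes cross-entropy:

import torch
import torch.nn.functional as F

def train(model, data, max_steps=5000, block_size=256, batch_size=64, lr=3e-4, device="cuda"):
    # `data` is assumed to be a 1-D LongTensor of BPE token ids.
    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(max_steps):
        # Sample a random batch of input windows and their shifted targets.
        ix = torch.randint(len(data) - block_size - 1, (batch_size,))
        x = torch.stack([data[i:i + block_size] for i in ix]).to(device)
        y = torch.stack([data[i + 1:i + block_size + 1] for i in ix]).to(device)
        logits = model(x)                                            # (B, T, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
    return model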

References

Vaswani et al.: Attention Is All You Need - Link
Andrej Karpathy: Let's build GPT: from scratch, in code, spelled out - Link
Rasa: Rasa Algorithm Whiteboard - Transformers & Attention 1: Self Attention - Link
Thumbnail: Link
AI Coffee Break with Letitia: Positional embeddings in transformers EXPLAINED - Demystifying positional encodings - Link
