> [!WARNING]
> This project is still a work in progress.
A GPT-2 [1] implementation with manually computed gradients, inspired by karpathy/llm.c and karpathy/nanoGPT. It also includes a BPE tokenizer [2]. The plan is to eventually rewrite this in C++ with hand-optimized CPU kernels, targeting small language models on hardware-constrained devices.
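As a rough illustration of what "manually computed gradients" means here (a minimal sketch, not code from this repo), the backward pass of a single linear layer can be derived by hand and checked against PyTorch autograd:

```python
# Illustrative sketch only: hand-derived gradients for y = x W, verified against autograd.
import torch

torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)  # batch of inputs
w = torch.randn(8, 3, requires_grad=True)  # weight matrix
y = x @ w                                  # forward pass: y = x W
loss = y.sum()
loss.backward()                            # autograd reference gradients

dy = torch.ones_like(y)                    # dL/dy for loss = sum(y)
dx_manual = dy @ w.detach().T              # dL/dx = dL/dy W^T
dw_manual = x.detach().T @ dy              # dL/dW = x^T dL/dy

print(torch.allclose(dx_manual, x.grad))   # True
print(torch.allclose(dw_manual, w.grad))   # True
```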
Requires PyTorch >= 2.6.0. Run `python src/gpt.py` to train on the Shakespeare dataset and print a sample of generated text.
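The BPE tokenizer follows Sennrich et al. [2]. A minimal sketch of the paper's core merge loop (illustrative only, not this repo's implementation, and using a naive `str.replace` instead of the paper's boundary-aware regex):

```python
# Illustrative sketch of BPE training from Sennrich et al. [2]:
# repeatedly merge the most frequent adjacent symbol pair in the vocabulary.
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the (word -> frequency) vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation (naive version)."""
    merged, joined = " ".join(pair), "".join(pair)
    return {word.replace(merged, joined): freq for word, freq in vocab.items()}

# Toy corpus: words as space-separated symbols, ending with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(best)   # the learned merge rules, in order
```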
- [1] Phuong, Mary, and Marcus Hutter. ‘Formal Algorithms for Transformers’. arXiv [Cs.LG], 2022, http://arxiv.org/abs/2207.09238.
- [2] Sennrich, Rico, et al. ‘Neural Machine Translation of Rare Words with Subword Units’. arXiv [Cs.CL], 2016, http://arxiv.org/abs/1508.07909.