Make GPT model training simple and fun! A visual training platform based on karpathy/nanoGPT.
Mini-NanoGPT is a tool that helps you easily get started with GPT models. Whether you are:
- 🎓 A deep learning beginner
- 👨‍🔬 A researcher
- 🛠️ A developer
or simply someone curious about large models, you can train models through a simple graphical interface!
- 📱 Visual Interface: Say goodbye to the command line and complete training with just a few clicks.
- 🌍 Bilingual (Chinese and English): Full support for both Chinese and English interfaces.
- 🎯 One-Click Operations: Data processing, training, and text generation can all be done with a single click.
- 🔤 Flexible Tokenization: Supports character-level and GPT-2 tokenizers (GPT-2 works better for English text).
- 🚄 Efficient Training: Supports multi-processing acceleration and distributed training.
- 📊 Real-Time Feedback: Displays training progress and results in real time.
- ⚙️ Parameter Visualization: All training parameters can be directly adjusted in the interface.
```bash
# Clone the repository
git clone https://github.com/yourname/mini-nanogpt.git
cd mini-nanogpt

# Install dependencies (Python 3.7+)
pip install -r requirements.txt

# Launch the interface
python main.py
```
Open the link shown in your terminal (usually http://localhost:7860) in your browser to see the training interface!
- Open the "Data Processing" page, select or paste your training text, and choose a tokenization method. For English text, the GPT-2 tokenizer usually yields better results; for Chinese, you can keep the default.
- If you do not want to use a validation set for now, check "Do not use a validation set."
- When you are done, click "Start Processing."
Here's an example using a small piece of text:
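Roughly, the difference between the two tokenization options looks like this (an illustrative sketch, not the project's actual code):

```python
# Character-level tokenization: each distinct character gets its own id.
text = "hello world"
chars = sorted(set(text))                  # vocabulary = the unique characters
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text                 # encoding round-trips exactly
print(len(chars), ids)                     # tiny vocabulary, one id per char

# The GPT-2 tokenizer (byte-pair encoding with a fixed ~50k-token
# vocabulary, available via the `tiktoken` package) instead maps frequent
# substrings to single tokens, which is why it suits English text better.
```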
- Switch to the "Training" page and adjust the parameters as needed (if you just want to experience it, you can keep the default values).
- The training page displays loss curves for the training and validation sets in real time. If you generated a validation set in Step 1, you should see two curves below: blue for training loss, orange for validation loss.
- If only one curve is displayed, check the terminal output. If you see a message similar to:

  ```
  Error while evaluating val loss: Dataset too small: minimum dataset(val) size is 147, but block size is 512. Either reduce block size or add more data.
  ```

  it means the block size you set is larger than your validation set. Reduce the block size (for example, to 128), and you should then see two dynamically updating curves.
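The constraint behind that error can be sketched in a few lines (illustrative, not the project's actual data loader): drawing one training example requires a window of `block_size + 1` consecutive tokens, so each split must contain more tokens than the block size.

```python
import random

def sample_batch(data, block_size):
    # A split must hold more than block_size tokens, because one example
    # is a window of block_size inputs plus one shifted target each.
    if len(data) <= block_size:
        raise ValueError(
            f"Dataset too small: size is {len(data)}, block size is {block_size}"
        )
    i = random.randrange(len(data) - block_size)
    x = data[i : i + block_size]            # input tokens
    y = data[i + 1 : i + 1 + block_size]    # targets, shifted by one
    return x, y

tokens = list(range(147))                   # pretend the val split has 147 tokens
x, y = sample_batch(tokens, block_size=128) # fine: 128 < 147
print(len(x), len(y))                       # prints: 128 128
# sample_batch(tokens, block_size=512) would raise, just like the error above.
```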
- Click "Start Training" and wait for the model training to complete.
- This mode lets you evaluate the model's loss on the validation set. Set "Number of Evaluation Seeds" to any value greater than 0 to enable evaluation-only mode and see the model's loss under different seeds.
- Go to the "Inference" page
- Enter an opening text
- Click "Generate" to see what the model writes!
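A rough sketch of the sampling step behind "Generate" (illustrative, not the project's code): the model's logits are divided by a temperature before the softmax, so low temperatures make output more deterministic and high temperatures make it more varied.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Divide by temperature first: T < 1 sharpens, T > 1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature=1.0):
    # Draw one token index from the temperature-adjusted distribution.
    probs = softmax(logits, temperature)
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
cold = softmax(logits, temperature=0.2)      # nearly all mass on the top token
hot = softmax(logits, temperature=5.0)       # close to uniform
print(round(cold[0], 3), round(hot[0], 3))
```

Generation repeats this step, feeding each sampled token back in as input until the requested length is reached.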
```
mini-nanogpt/
├── main.py      # Entry point
├── config/      # Configuration files
├── data/        # Data processing and storage
├── models/      # Model definitions
├── trainer/     # Training logic
├── inference/   # Text generation
└── app/         # Interface code
```
- 💡 Reduce the `batch_size` or the model size.
- 💡 Using a GPU will significantly speed up training.
- 💡 Increase the evaluation interval.
- 💡 Try increasing the amount of training data.
- 💡 Adjust the model parameters appropriately.
- 💡 Change the temperature parameter during generation.
- 💡 On the "Training" page, select "resume" in the "Initialization Method."
- 💡 Specify the previous output directory.
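Conceptually, resuming just means loading the last checkpoint from that output directory and continuing from its recorded step. A minimal sketch (the file name and layout here are illustrative, not the project's actual checkpoint format):

```python
import json
import os
import tempfile

def save_checkpoint(out_dir, step):
    # Persist the training state; real checkpoints also store model weights.
    with open(os.path.join(out_dir, "ckpt.json"), "w") as f:
        json.dump({"step": step}, f)

def resume(out_dir):
    # If a checkpoint exists in out_dir, continue from its step count.
    path = os.path.join(out_dir, "ckpt.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["step"]
    return 0                                 # no checkpoint: start from scratch

out_dir = tempfile.mkdtemp()                 # stands in for your output directory
save_checkpoint(out_dir, step=500)
print(resume(out_dir))                       # prints: 500
```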
Suggestions and improvements are welcome! You can contribute in the following ways:
- Submit an Issue
- Submit a Pull Request
- Share your usage experience
This project is open-sourced under the MIT License.
🎉 Start Your GPT Journey Now!