This release doubles down on transformers and introduces a training loop program, `hala`. Pretraining bidirectional models with the token denoising objective (aka masked LM) is available via `hala --objective denoise`. The first training run on the uk4b dataset is happening here: https://wandb.ai/stud76/ha/runs/tjoqx491?workspace=user-stud76
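For readers unfamiliar with the objective: "token denoising" is the masked-LM recipe popularized by BERT. Below is a minimal PyTorch sketch of the corruption step; the 15% mask probability and the 80/10/10 replacement mix follow the BERT paper and are assumptions here, not necessarily what `hala` uses.

```python
import torch

def mask_for_denoising(tokens, mask_id, vocab_size, mask_prob=0.15):
    """BERT-style corruption: sample ~mask_prob of positions as targets,
    replace 80% of them with the mask token, 10% with a random token,
    and leave 10% unchanged; loss is computed on targets only."""
    targets = tokens.clone()
    is_target = torch.rand(tokens.shape) < mask_prob
    targets[~is_target] = -100  # ignore_index for F.cross_entropy

    corrupted = tokens.clone()
    r = torch.rand(tokens.shape)
    corrupted[is_target & (r < 0.8)] = mask_id
    rand_tok = torch.randint(vocab_size, tokens.shape)
    swap = is_target & (r >= 0.8) & (r < 0.9)
    corrupted[swap] = rand_tok[swap]
    return corrupted, targets
```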
Existing causal models can now be finetuned with the conditional language modeling objective via `hala --objective cond`.
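Here "conditional" presumably means the next-token loss is restricted to the completion that follows a conditioning prefix. A minimal sketch of such a loss, assuming a per-batch prompt length; the function name and prefix convention are illustrative, not `hala`'s actual code:

```python
import torch.nn.functional as F

def cond_lm_loss(logits, tokens, prompt_len):
    """Next-token cross-entropy over the completion only: predictions
    whose target still lies inside the conditioning prefix are ignored."""
    targets = tokens[:, 1:].clone()
    targets[:, : prompt_len - 1] = -100  # prefix tokens carry no loss
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
    )
```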
`hat` is now a REPL for both causal and bidirectional models. The `hat` REPL now supports history thanks to readline.
![image](https://private-user-images.githubusercontent.com/66214/246396210-546217e0-9f24-4f6d-8df7-bd70beb205cb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0Mzc1NTgsIm5iZiI6MTczOTQzNzI1OCwicGF0aCI6Ii82NjIxNC8yNDYzOTYyMTAtNTQ2MjE3ZTAtOWYyNC00ZjZkLThkZjctYmQ3MGJlYjIwNWNiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDA5MDA1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU2YmIyNGExZDNlMjViMTFiMTEyMmJmOTNkMDEwODIyNzNhYjUwMzlhNmQ0Zjc0YmRkNzk2NmYyNDVhMWZkMzMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.dZC12ebU9vqtH8Wp12grYFvM0HkDO-nppbe1REJ1n8w)
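History support in a Python REPL usually takes nothing more than the standard library; the sketch below shows the common readline pattern (the `~/.hat_history` path is hypothetical, not necessarily what `hat` uses):

```python
import atexit
import os
import readline

# Merely importing readline gives input() line editing and in-session
# history; these lines persist the history across sessions.
histfile = os.path.expanduser("~/.hat_history")  # hypothetical path
try:
    readline.read_history_file(histfile)
except FileNotFoundError:
    pass
atexit.register(readline.write_history_file, histfile)
```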
The RNN training program `hal` now supports training from `u16` binary datasets, like `hala` does. This allowed me to train a world model on VQ-VAE-tokenized images.
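If you want to prepare such a dataset yourself, a `u16` binary dataset is presumably just a flat array of uint16 token ids on disk (the filename below is illustrative):

```python
import numpy as np

# Token ids (e.g. VQ-VAE codebook indices) as a flat uint16 array.
tokens = np.array([17, 4242, 9, 1000], dtype=np.uint16)
tokens.tofile("train.u16")  # illustrative filename

# Training code can memory-map the file instead of loading it whole.
data = np.memmap("train.u16", dtype=np.uint16, mode="r")
print(data[:4])  # [  17 4242    9 1000]
```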
New randomly initialized checkpoints can be created with the new `hai` program.