This release doubles down on transformers and introduces a training loop program, `hala`. Pretraining bidirectional models with the token denoising objective (aka masked LM) is available via `hala --objective denoise`.
The first training run on the uk4b dataset is happening here: https://wandb.ai/stud76/ha/runs/tjoqx491?workspace=user-stud76
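For reference, here is a minimal sketch of what a BERT-style token denoising objective typically looks like; this is the general recipe (with the usual 80/10/10 corruption split), not `hala`'s actual implementation, and `mask_id` is a placeholder for whatever mask token the tokenizer uses:

```python
import torch
import torch.nn.functional as F

def denoise_batch(ids, mask_id, vocab_size, p=0.15):
    """BERT-style token denoising: corrupt ~p of positions, predict the originals."""
    ids = ids.clone()
    labels = ids.clone()
    target = torch.rand(ids.shape) < p                # positions to predict
    labels[~target] = -100                            # ignored by cross_entropy
    roll = torch.rand(ids.shape)
    ids[target & (roll < 0.8)] = mask_id              # 80% of targets: [MASK]
    corrupt = target & (roll >= 0.8) & (roll < 0.9)   # 10%: random token
    ids[corrupt] = torch.randint(vocab_size, ids.shape)[corrupt]
    return ids, labels                                # remaining 10%: unchanged

# loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
```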
Existing causal models can now be finetuned with the conditional language modeling objective via `hala --objective cond`.
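I'm reading "conditional" here as the standard prefix-LM setup, where next-token loss is computed only on the continuation and the conditioning prefix is excluded; a hedged sketch of that loss (the `cond_lm_loss` name and the fixed `prompt_len` are mine, not the repo's):

```python
import torch
import torch.nn.functional as F

def cond_lm_loss(logits, ids, prompt_len):
    # Next-token prediction, but only tokens after the conditioning
    # prefix contribute to the loss.
    # logits: (B, T, V), ids: (B, T)
    targets = ids[:, 1:].clone()
    targets[:, : prompt_len - 1] = -100   # mask out predictions of prefix tokens
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
    )
```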
`hat` is now a REPL for both causal and bidirectional models. The `hat` REPL now supports history thanks to readline.
![image](https://user-images.githubusercontent.com/66214/246396210-546217e0-9f24-4f6d-8df7-bd70beb205cb.png)
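Python's `readline` module is all it takes to get line editing plus persistent history in an `input()` loop; a minimal sketch of such a REPL (the history file path and the `generate` call are placeholders, not `hat`'s actual code):

```python
import atexit
import os
import readline  # importing it enables arrow-key editing and history for input()

histfile = os.path.expanduser("~/.hat_history")  # hypothetical path
try:
    readline.read_history_file(histfile)
except FileNotFoundError:
    pass
atexit.register(readline.write_history_file, histfile)

while True:
    try:
        prompt = input(">>> ")
    except (EOFError, KeyboardInterrupt):
        break
    print(generate(prompt))  # stand-in for the actual model call
```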
The RNN training program `hal` now supports training from `u16` binary datasets, like `hala` does. This allowed me to train a world model on VQ-VAE-tokenized images.
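A `u16` binary dataset is just a flat file of `uint16` token ids, two bytes per token. A sketch of the common `numpy.memmap` way to batch from one (the file name and batch shape are illustrative, and I'm assuming the usual nanoGPT-style layout rather than `hal`'s exact format):

```python
import numpy as np
import torch

# tokens.bin: flat array of uint16 token ids, one token per 2 bytes
data = np.memmap("tokens.bin", dtype=np.uint16, mode="r")

def get_batch(block_size=256, batch_size=32):
    # sample random windows; y is x shifted by one for next-token prediction
    ix = np.random.randint(len(data) - block_size - 1, size=batch_size)
    x = torch.stack([torch.from_numpy(data[i : i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1 : i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y
```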
New randomly initialized checkpoints can be created with the new `hai` program.
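Creating a fresh checkpoint essentially amounts to saving a randomly initialized `state_dict`; a sketch under the assumption of a plain PyTorch checkpoint (the `TinyLM` stand-in and the `{"model": ...}` layout are illustrative, not `hai`'s real architecture or file format):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):  # stand-in for the real model architecture
    def __init__(self, vocab=50257, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

torch.manual_seed(0)  # make the random initialization reproducible
model = TinyLM()
torch.save({"model": model.state_dict()}, "init.pt")  # hypothetical checkpoint layout
```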