Expanded README, added requirements.txt #40
Open: rjurney wants to merge 2 commits into openai:master from rjurney:master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,89 @@ | ||
**Status:** Archive (code is provided as-is, no updates expected) | ||
## `finetune-transformer-lm`: Code for Improving Language Understanding by Generative pre-Training | ||
|
||
# finetune-transformer-lm | ||
Code and model for the paper "Improving Language Understanding by Generative Pre-Training" | ||
This project contains the code and model for the paper ["Improving Language Understanding by Generative Pre-Training"](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf). | ||
|
||
**Note: This project is no longer actively developed. This code is provided as-is, and no updates are expected.** | ||
|
||
From the abstract: | ||
|
||
> We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task... we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI). | ||
|
||
The blog post describing this work is ["Improving Language Understanding with Unsupervised Learning"](https://blog.openai.com/language-unsupervised/). | ||
|
||
Authors: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever | ||
|
||
### License | ||
|
||
This code is Copyright OpenAI and published under the MIT License. | ||
|
||
### Requirements | ||
|
||
This code is verified to run on Python 2.7 and 3.3.6 in a clean conda environment. It requires the following modules: | ||
|
||
``` | ||
ftfy | ||
joblib | ||
numpy | ||
pandas | ||
sklearn | ||
spacy | ||
tensorflow | ||
tqdm | ||
``` | ||
|
||
### Setup | ||
|
||
To install requirements, run: | ||
|
||
``` | ||
pip install -r requirements.txt | ||
``` | ||
|
||
#### Python 2.7 | ||
|
||
Note: `ftfy` dropped Python 2 support after version 4.4.3, so `requirements.txt` pins `ftfy==4.4.3` on Python 2 and requires `ftfy>=5.0.0` on Python 3. | ||
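As a rough illustration of what the two `ftfy` lines in `requirements.txt` express, the sketch below mirrors the same version split in plain Python. The function name `ftfy_requirement` is hypothetical and is not part of this repository; pip evaluates the environment markers itself, so this code is only meant to make the selection logic explicit.

```python
import sys


def ftfy_requirement(version_info=sys.version_info):
    """Return the ftfy specifier matching the given interpreter version.

    Hypothetical helper: it mirrors the two marker lines in requirements.txt
    (python_version < '3.0' and python_version >= '3.0').
    """
    if version_info[0] < 3:
        # Last ftfy release line that still supports Python 2.
        return "ftfy==4.4.3"
    return "ftfy>=5.0.0"


if __name__ == "__main__":
    # Show which specifier applies to the interpreter running this script.
    print(ftfy_requirement())
```

Passing a tuple such as `(2, 7, 15)` instead of the default shows which pin a Python 2 environment would receive.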
|
||
#### spaCy Models | ||
|
||
You need to download the `en` model for spaCy: | ||
|
||
``` | ||
python -m spacy download en | ||
``` | ||
|
||
#### Data | ||
|
||
To train the model, you need to download all of the ROCStories 2016 datasets from the [ROCStories website](http://cs.rochester.edu/nlp/rocstories/). Doing so requires filling out a form so that the data's creators can track who is using it. The location of the data is passed as a command line argument (`--data_dir`). | ||
|
||
Once you've downloaded them, the files should look something like this: | ||
|
||
``` | ||
data/ROCStories__spring2016 - ROCStories_spring2016.csv | ||
data/cloze_test_test__spring2016 - cloze_test_ALL_test.csv | ||
data/cloze_test_test__spring2016 - cloze_test_ALL_test.tsv | ||
data/cloze_test_val__spring2016 - cloze_test_ALL_val.csv | ||
data/cloze_test_val__spring2016 - cloze_test_ALL_val.tsv | ||
``` | ||
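Before starting a long training run, it may help to confirm that the files listed above are actually present. The following is a minimal sketch, assuming the file names match the listing exactly; `EXPECTED_FILES` and `missing_data_files` are hypothetical names introduced here and do not exist in the repository.

```python
import os

# File names copied from the listing above; adjust them if your download differs.
EXPECTED_FILES = [
    "ROCStories__spring2016 - ROCStories_spring2016.csv",
    "cloze_test_test__spring2016 - cloze_test_ALL_test.csv",
    "cloze_test_test__spring2016 - cloze_test_ALL_test.tsv",
    "cloze_test_val__spring2016 - cloze_test_ALL_val.csv",
    "cloze_test_val__spring2016 - cloze_test_ALL_val.tsv",
]


def missing_data_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # "data" is an assumed location; pass whatever you give to --data_dir.
    for name in missing_data_files("data"):
        print("missing: " + name)
```

An empty result means every listed file was found in the directory you plan to pass as `--data_dir`.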
|
||
#### Model | ||
|
||
The model is precomputed and stored in the [`model`](model) directory. | ||
|
||
### Training | ||
|
||
Currently this code implements the ROCStories Cloze Test result reported in the paper by running: | ||
`python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]` | ||
|
||
``` | ||
python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here] | ||
``` | ||
|
||
You can put the data files anywhere and change the `--data_dir` value accordingly. | ||
|
||
Note: The code is currently non-deterministic due to various GPU ops. The median accuracy of 10 runs with this codebase (using default hyperparameters) is 85.8% - slightly lower than the reported single run of 86.5% from the paper. | ||
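Because individual runs are non-deterministic, comparing a median over several runs is more informative than a single number. The sketch below shows one way to summarize per-run accuracies that you have collected yourself; `summarize_runs` is a hypothetical helper, not part of this codebase, and it relies on the `statistics` module, which is only available in Python 3.4 and later.

```python
from statistics import median  # standard library, Python 3.4+


def summarize_runs(accuracies):
    """Summarize accuracies (in percent) gathered from repeated training runs."""
    if not accuracies:
        raise ValueError("at least one run is required")
    return {
        "runs": len(accuracies),
        "median": median(accuracies),
        "min": min(accuracies),
        "max": max(accuracies),
    }
```

Call it with the list of accuracies from your own runs, for example the values reported by each invocation of `train.py`.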
|
||
The ROCStories dataset can be downloaded from the associated [website](http://cs.rochester.edu/nlp/rocstories/). | ||
### Testing | ||
|
||
This code was tested by training the model in Python 2.7 and 3 on Ubuntu Linux 17.10/artful with the 4.13.0-46-generic kernel. Each of the two Python processes consumed 24 GB of RAM (12 GB remained free) on a 12-core machine with 64 GB of RAM and a GTX 1080, and used all 12 CPU cores in addition to the GPU. Training ran overnight. | ||
|
||
Your results may vary. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
ftfy==4.4.3; python_version < '3.0' | ||
ftfy>= 5.0.0; python_version >= '3.0' | ||
joblib | ||
numpy | ||
pandas | ||
scikit-learn | ||
spacy | ||
tensorflow | ||
tqdm | ||
The code in this repo is not compatible with the latest TF, so some restriction here is needed.