kofzera/goodreads

The datasets were collected from Goodreads in late 2017. Details of the datasets are described on the dataset website.

We collected these datasets for academic use only. Please do not redistribute them or use them for commercial purposes.

Citations

If you are using our dataset, please cite the following papers:

Notebooks/Code Samples

We've created several notebooks (in Python 3.7) to illustrate how to download and read these datasets, and to provide some basic explorations of the data.

  • download.ipynb: Shows how to download the files from bash or Python, for those who prefer not to use a GUI. Note: It requires the gdown package, because the datasets are hosted on Google Drive.
  • samples.ipynb: Shows how to read the '.json.gz' files line by line and displays sample records from each file.
  • statistics.ipynb: Calculates some basic statistics of the datasets (except the largest file, the complete interaction file 'goodreads_interactions.csv'). Running this notebook may take a while.
  • distributions.ipynb: Operates on the complete interaction file 'goodreads_interactions.csv' and explores the distributions of these interactions. Note: Run this notebook only on a machine with a large amount of memory (32 GB or more recommended).
  • reviews.ipynb: Calculates some statistics of the review datasets.
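
As a rough illustration of the line-by-line reading that samples.ipynb covers, here is a minimal sketch. It assumes each '.json.gz' file holds one JSON object per line; the helper names `iter_json_gz` and `head` are hypothetical and are not taken from the notebooks.

```python
import gzip
import json
from itertools import islice


def iter_json_gz(path):
    """Yield one parsed record per non-empty line of a gzip-compressed file.

    Assumption: the file stores one JSON object per line.
    """
    # "rt" opens the compressed stream in text mode, so lines arrive as str.
    with gzip.open(path, "rt", encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def head(path, n=5):
    """Return the first n records without reading the rest of the file."""
    return list(islice(iter_json_gz(path), n))
```

Calling `head("path/to/file.json.gz", 3)` (a placeholder path) would return the first three records as dictionaries. Because the generator reads lazily, memory use stays small even for large files.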
