Running your own copy

rspeer edited this page Sep 17, 2014 · 29 revisions

As of ConceptNet 5.3, you can look up data in ConceptNet on your own computer, in a SQLite database that can be accessed from pure Python. Setting up this database relies on the widely used make build system. On Ubuntu, you can get make with:

sudo apt-get install build-essential

There are a few routes to getting this database.

First, get the code

You'll want to download and extract the ConceptNet code first. I don't recommend using Python's pip package manager for this -- it'll install the code just fine, but you won't have the data directory or the Makefile anywhere.

You can download the code from PyPI and extract it, or if you have git, you can run:

git clone https://github.com/commonsense/conceptnet5

Python dependencies

Once you've extracted the code, activate your favorite user-mode Python environment.

I highly recommend using virtualenv to do this. I also highly recommend using an environment that runs Python 3, the language that ConceptNet is developed in.

In this Python environment, install the ConceptNet code and its dependencies with:

python setup.py develop

(If you get a message saying you don't have permission and need to use sudo, it means you're trying to install into your system's global Python environment. Go set up virtualenv.)
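If you're not sure which kind of environment is active, a small check like the one below can tell you before you run setup.py. This is only a sketch: the function name environment_report is made up for this page, and the test relies on sys.base_prefix (set by venv and recent virtualenv) and sys.real_prefix (set by older virtualenv releases).

```python
import sys


def environment_report():
    """Return (is_python3, in_virtualenv) for the running interpreter."""
    is_python3 = sys.version_info[0] >= 3
    # venv and recent virtualenv make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base_prefix
    return is_python3, in_virtualenv


if __name__ == "__main__":
    py3, venv = environment_report()
    print("Python 3:", py3)
    print("Inside a virtual environment:", venv)
```

If the second value is False, python setup.py develop will probably try to write to the global environment and hit the permission error described above.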

Now you have a few options for how to get the data.

Getting the database

The high-bandwidth, low-computation way

The easiest way to get the ConceptNet database is just to download it from conceptnet5.media.mit.edu, by running this command:

make download_db

The main problem with this approach is that the database is not very space-efficient. You'll be downloading over 5 GB of data. If you can handle this, our server probably can too, so go for it.
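Before starting a download of that size, it may be worth checking that the target disk has room. The sketch below uses only the standard library; the helper name and the 5 GB threshold are assumptions based on the figure quoted above, and the unpacked database may need more than that.

```python
import shutil

# "Over 5 GB" per this page; treat it as a lower bound, not an exact size.
REQUIRED_BYTES = 5 * 1024 ** 3


def has_room_for_download(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    print("Enough free space:", has_room_for_download("."))
```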

The low-bandwidth, high-computation way

Because the database uses space so inefficiently, it can make more sense to download just the data and run the code that loads it into the database on your own computer.

Using SQLite is a tradeoff; we can't write to it as quickly as we did to Solr, but on the other hand, once the data is in it, it's easier to run and uses less RAM than Solr. To make the write speed bearable at all, we need APSW, a Python library for SQLite that's more specialized than the one built into Python. Compile and set up this optional dependency by typing:

make apsw

(You will need a C compiler.)
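To confirm that the build actually made APSW importable in your environment, you can ask Python whether it can find the module. This is a generic standard-library check, not something the ConceptNet Makefile provides; module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("apsw importable:", module_available("apsw"))
```

If this prints False after make apsw, the most likely causes are a failed compile or a different Python environment being active.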

Once that's set up, you can run:

make download_assertions build_db

Then go do something else for the rest of the day. This process takes about 8 hours for me.

The medium-bandwidth, high-computation way

"Why would you even choose this method?", you may ask. This is the method that builds ConceptNet 5 from scratch. If you just want the DB, it's clearly inferior to other options, but if you want to tinker with the code and include new data, this is the path you'll want to take.

This approach absolutely requires Python 3. It won't work on Python 2.7.

You'll need to start by downloading about 1 GB of raw data that ConceptNet is built from. You'll also need APSW, as in the previous option. Get them both with:

make apsw download

Now it's time to do the computation. How many cores does your CPU have? You'll want to use all of them. Let's say that number is 4. You can build ConceptNet in 4 parallel processes with:

make -j4 all
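If you'd rather not count cores by hand, Python can report the number for you. The sketch below only builds the command line; make_command is a made-up helper, and it assumes you will run the result from the directory that contains the Makefile.

```python
import os


def make_command(target="all", jobs=None):
    """Build a make invocation that uses one job per CPU core by default."""
    if jobs is None:
        # os.cpu_count() can return None when the count is unknown.
        jobs = os.cpu_count() or 1
    return ["make", "-j{}".format(jobs), target]


if __name__ == "__main__":
    print(" ".join(make_command()))
```

The returned list can be passed to subprocess.check_call, or you can simply copy the printed command into your shell.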

Querying ConceptNet

Now that you have this all set up, the conceptnet5.query module will let you find ConceptNet assertions. Read its docstrings for more details, but here's an example that should work if everything's set up:

>>> from conceptnet5.query import lookup
>>> for assertion in lookup('/c/en/example'):
...     print(assertion)
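A common concept can match a lot of assertions, so you may want to cap how many you print. The helper below is a sketch that works on any iterable; take is a name invented here, and the only thing it assumes about lookup is what the example above already shows, namely that its result can be iterated over.

```python
from itertools import islice


def take(results, limit=10):
    """Return at most `limit` items from any iterable of results."""
    return list(islice(results, limit))


# Intended use, assuming `lookup` behaves as in the example above:
#   from conceptnet5.query import lookup
#   for assertion in take(lookup('/c/en/example'), 5):
#       print(assertion)
```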

Running the API server

To run a local server for the API:

  • As a quick and dirty way to start the API, you could just run python -m conceptnet5.api.
  • To be more robust, set up a WSGI server such as uwsgi, gunicorn, or Apache's mod_wsgi. The entry point you run is conceptnet5.wsgi_api.

You can also run the rendered Web interface by running conceptnet5.wsgi_web instead of conceptnet5.wsgi_api.
