
Running your own copy

Rob Speer edited this page Oct 12, 2017 · 29 revisions

Running ConceptNet on Amazon Web Services

Probably the most reliable way to run a copy of ConceptNet is to use the AMI that runs it on Amazon Web Services.

This is not essential. If you have a computer that can run Docker Compose, skip to the "Running ConceptNet using Docker" section. Or, to run ConceptNet outside of a container on a machine that meets the requirements, you can use the standalone build process.

  • Go to https://aws.amazon.com/ec2/, sign up if necessary, and log in if necessary.
  • Click "Launch Instance".
  • Choose "Community AMIs", search for "conceptnet-5.5.5", and select that image (ami-99cec98f).
  • Choose a machine type to launch. You can run the API on a t2.medium or better (currently less than 5 cents per hour). If you want to be able to modify and rebuild the data, however, you'll need an r4.xlarge or better.
  • Proceed to "Configure Instance Details". Set "Auto-assign Public IP" to "Enable".
  • Proceed to "Add Storage". Ensure that the disk size is at least 120 GiB for running the API, or 240 GiB if you want to rebuild.
  • Proceed to "Configure Security Group". Add a rule allowing HTTP. The default IP range of 0.0.0.0/0, ::/0 (all addresses) is probably what you want.
  • "Review and Launch". Download the security key that you'll need to log into the system.
  • On your EC2 instances list, take note of the public IP of your new machine. Let's call it YOUR.IP.ADDR.
  • After a few minutes, go to http://YOUR.IP.ADDR/ and you should get a response from the ConceptNet API. However, you won't be able to query anything, because the database hasn't loaded yet. That will take a few hours.
  • You can SSH to the system, using the security key you downloaded, by following Amazon's instructions.
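If you prefer the command line to the EC2 console, the launch steps above can be approximated with the AWS CLI. This is a minimal sketch, not a tested recipe. The key pair name, security group ID, and root device name are hypothetical placeholders that you must replace. AMI IDs are specific to one AWS region, so the CLI must be configured for the region that hosts ami-99cec98f.

```shell
#!/usr/bin/env bash
# Launch the ConceptNet AMI from the command line instead of the EC2 console.
# HYPOTHETICAL placeholders: replace these with values from your own account.
KEY_NAME="my-conceptnet-key"
SECURITY_GROUP_ID="sg-0123456789abcdef0"  # must allow inbound HTTP (port 80)
ROOT_DEVICE="/dev/sda1"                   # check the AMI's actual root device name

launch_conceptnet() {
  # $1: instance type (t2.medium for the API, r4.xlarge or better to rebuild)
  # $2: root disk size in GiB (at least 120 for the API, 240 to rebuild)
  local instance_type="${1:-t2.medium}"
  local disk_gib="${2:-120}"
  aws ec2 run-instances \
    --image-id ami-99cec98f \
    --count 1 \
    --instance-type "$instance_type" \
    --key-name "$KEY_NAME" \
    --security-group-ids "$SECURITY_GROUP_ID" \
    --associate-public-ip-address \
    --block-device-mappings "[{\"DeviceName\":\"${ROOT_DEVICE}\",\"Ebs\":{\"VolumeSize\":${disk_gib}}}]"
}
```

For example, launch_conceptnet r4.xlarge 240 requests a machine large enough to rebuild the data. You still need to look up the instance's public IP afterwards, as in the console steps.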

ConceptNet is running inside Docker on the system. docker logs conceptnet5_db_1 and docker logs conceptnet5_api_1 will show you the logs.

A systemd service is responsible for running the ConceptNet containers. You can stop it with sudo systemctl stop conceptnet, and restart it with sudo systemctl start conceptnet. If you merely stop the containers using Docker, the service will quickly restart them.
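The log and service commands above can be wrapped in a couple of shell functions for use on the instance. This is a sketch: the container names and the conceptnet systemd unit come from the text, while the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Convenience wrappers for managing ConceptNet on the AWS instance.
# The function names are hypothetical; the container and unit names are not.

# Show the last 50 log lines of a container and keep following it.
# $1: "db" or "api" (default: db, which shows PostgreSQL's loading progress)
cn_logs() {
  docker logs --tail 50 --follow "conceptnet5_${1:-db}_1"
}

# Restart the containers through systemd. Stopping them with Docker alone
# doesn't help, because the service would immediately restart them.
cn_restart() {
  sudo systemctl stop conceptnet && sudo systemctl start conceptnet
}
```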

The service configuration is in ~/conceptnet5/conceptnet.service, and the Docker Compose configuration is in ~/conceptnet5/docker-compose.yml. For more details about the Docker setup, read on.

Running ConceptNet using Docker

Docker is a platform for making software reproducible, by reproducing the entire Linux environment it runs in.

ConceptNet 5.5 uses Docker as the primary way of making its build process reproducible. If you want, you can ignore Docker and set up all the dependencies of ConceptNet separately, but Docker will make sure you have a container that satisfies all of ConceptNet's dependencies.

If you find that Docker is not making the situation any easier, you can also get all the dependencies and run the process without any containerization. See Build process.

You will need:

  • 120 GB of disk space on the drive that Docker runs on
  • 4 GB of RAM
  • The time and bandwidth to download about 12 GB of raw data
  • An OS that can run Docker: Linux with kernel 3.10 or later, or possibly macOS or Windows (see "Operating system support" below)
  • Docker 1.12 or later
  • Docker Compose 1.8 or later
  • Git

These versions of the Docker tools are newer than what your OS is likely to package, so you will probably need to download them separately. At the time of writing, Docker Compose 1.5 is the latest version packaged for Ubuntu, and it definitely won't work.
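Before going further, you can check that your installed tools meet the minimum versions listed above. This sketch assumes a sort command that supports -V (version sort), as GNU coreutils does; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare installed Docker tools with the minimum versions.

# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_docker_versions() {
  local docker_v compose_v
  docker_v="$(docker version --format '{{.Server.Version}}')" || return 1
  compose_v="$(docker-compose version --short)" || return 1
  version_ge "$docker_v" 1.12 || { echo "Docker $docker_v is older than 1.12" >&2; return 1; }
  version_ge "$compose_v" 1.8 || { echo "Docker Compose $compose_v is older than 1.8" >&2; return 1; }
  echo "OK: Docker $docker_v, Docker Compose $compose_v"
}
```

Version sort matters here: compared as plain strings, "1.12" would sort before "1.8" and the check would be wrong.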

Now get the ConceptNet 5.5 repository if you don't already have it:

git clone git@github.com:commonsense/conceptnet5 -b version5.5
cd conceptnet5

Operating system support

Docker is designed to run Linux containers on Linux systems, but now there are ways to run it inside a virtual machine on Windows 10 or macOS. I've only minimally experimented with these.

Most of these setups run Docker inside a virtual machine. The most important thing is to make sure that the virtual machine meets the requirements above, particularly the 120 GB of disk space. (Docker Machine only allocates 10 GB of disk space by default.)
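If you use Docker Machine with its VirtualBox driver, you can request a larger disk when you create the VM. This is a sketch for that driver only; other Docker setups on macOS or Windows configure disk size differently. The machine name "conceptnet" is a hypothetical choice.

```shell
#!/usr/bin/env bash
# Sketch: create a Docker Machine VM (VirtualBox driver) large enough for
# ConceptNet instead of accepting the small default disk.
create_conceptnet_vm() {
  local name="${1:-conceptnet}"   # hypothetical machine name
  # The VirtualBox driver takes sizes in MB: 120 GiB of disk, 4 GiB of RAM.
  docker-machine create \
    --driver virtualbox \
    --virtualbox-disk-size $((120 * 1024)) \
    --virtualbox-memory $((4 * 1024)) \
    "$name"
}
```

After it finishes, eval "$(docker-machine env conceptnet)" points your shell's Docker commands at the new VM.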

If you're on Linux, you should either add yourself to the docker group or put sudo before all of the following commands.

Running ConceptNet for the first time

At the command line, in the root of the ConceptNet repository (which contains a file called docker-compose.yml), just run this command:

docker-compose up --build

This will start downloading the data and loading it into PostgreSQL, and also start serving the Web interface on localhost. (It uses the standard Web port, port 80. If your machine already runs a Web server, edit docker-compose.yml and change the ports entry from "80:80" to something else like "8000:80".)
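The port change described above can be scripted with sed. This sketch assumes the mapping appears literally as 80:80 in docker-compose.yml; it keeps a backup as docker-compose.yml.bak, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: serve ConceptNet on another host port by rewriting the 80:80
# mapping in docker-compose.yml (a backup is kept with a .bak suffix).
set_host_port() {
  local port="$1" file="${2:-docker-compose.yml}"
  # Only match 80:80 when it is not preceded by a digit, so an already
  # changed mapping such as 8080:80 is left alone.
  sed -E -i.bak "s/([^0-9])80:80/\\1${port}:80/" "$file"
}
```

For example, set_host_port 8000 followed by docker-compose up --build serves the API at http://localhost:8000/.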

The catch is that you won't be able to browse ConceptNet yet, because the database will still be loading. Loading this much data into PostgreSQL unavoidably takes a while; in my experience, it takes about 3 hours.
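Instead of checking by hand, you can start the containers in the background with docker-compose up -d --build and poll the API. This sketch rests on an unverified assumption: that a concept lookup fails with an HTTP error status while the database is still loading. The URL, the 60-second interval, and the function name are arbitrary choices.

```shell
#!/usr/bin/env bash
# Poll a ConceptNet URL until it answers with a successful HTTP status.
# ASSUMPTION: while PostgreSQL is loading, the lookup returns an HTTP error,
# which makes curl --fail exit with a non-zero status.
wait_for_conceptnet() {
  local url="${1:-http://localhost/c/en/example}"
  local interval="${2:-60}"     # seconds between attempts
  local max_tries="${3:-300}"   # 300 tries at 60 s is about 5 hours
  local i=1
  while [ "$i" -le "$max_tries" ]; do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "ConceptNet is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Gave up after $max_tries attempts" >&2
  return 1
}
```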

The loading process will output several warnings that aren't important:

  • "WARNING: No password has been set for the database."

    • This is okay because the database is not accessible from the network, only from inside of the container.
  • "WARNING: you are running uWSGI as root !!! (use the --uid flag)"

    • Docker has complete power over the containers it creates. It's always root.
  • "LOG: checkpoints are occurring too frequently (16 seconds apart)"

    • PostgreSQL complains about this when it's loading lots of data. It'll be fine once the data is done loading.

You will also notice that the container named "conceptnet5_conceptnet_1" gets built and immediately stops. This is normal. This container has no persistent process of its own; it's just there to make sure that a container of the ConceptNet code and its dependencies gets built.

Managing your Docker volumes

ConceptNet creates three named Docker volumes where it stores data, so that the second and subsequent times you start it up, it doesn't have to go through the long downloading-and-loading process.

If you're changing the ConceptNet code, or if you start a build that fails (perhaps because you ran out of disk space), you may want to remove these volumes to start fresh. To remove a volume, type docker volume rm followed by the volume name.

Here's what the volumes contain:

  • conceptnet5_psql: The PostgreSQL database, in its loaded, non-portable form. Remove this to build a fresh database.
  • conceptnet5_data: Where input data is downloaded to, and where intermediate data goes if you run the full build. If you want to restart the build process from scratch, remove this and conceptnet5_psql.
  • conceptnet5_cache: The Web server's cache. The Web server takes advantage of the fact that each version of ConceptNet is immutable: once it renders a particular page, it saves it in the cache and never has to render it again. If you change the data or the page layout, remove conceptnet5_cache to clear the cache.
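The removal choices in the list above can be collected into one helper. Docker refuses to remove a volume that a container is still using, so this sketch first runs docker-compose down, which removes the containers but not the named volumes. Run it from the root of the repository; the function name and its modes are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: remove ConceptNet's named volumes for a fresh start.
cn_reset() {
  local volumes
  case "${1:-}" in
    db)    volumes="conceptnet5_psql" ;;                   # build a fresh database
    build) volumes="conceptnet5_psql conceptnet5_data" ;;  # restart the build from scratch
    cache) volumes="conceptnet5_cache" ;;                  # clear the Web server's cache
    all)   volumes="conceptnet5_psql conceptnet5_data conceptnet5_cache" ;;
    *) echo "usage: cn_reset db|build|cache|all" >&2; return 2 ;;
  esac
  # Stop and remove the containers so the volumes are no longer in use.
  docker-compose down || return 1
  # $volumes is deliberately unquoted so it splits into separate names.
  docker volume rm $volumes
}
```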

Building ConceptNet from raw data

This step is for people who want to make changes to ConceptNet's code or input data and build their own version. It's not essential. You can skip it if you want!

Remove all of ConceptNet's data volumes (see above), then run:

docker-compose run conceptnet build.sh

Several hours later, you will have your own edition of ConceptNet, built from your local code.

This step has higher system requirements than the others. You'll need about 240 GB of disk space and 60 GB of RAM.
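Because the build needs far more memory and disk than serving the API, a quick check before running build.sh can save a failed multi-hour run. This is a Linux-only sketch: it reads /proc/meminfo, and it assumes Docker keeps its data under /var/lib/docker (the usual Linux default). The DOCKER_DIR variable and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Preflight sketch before running build.sh: compare RAM and free disk
# with the figures above (about 60 GB of RAM and 240 GB of disk).

# Total RAM in whole GiB (MemTotal is reported in kB).
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Free space in whole GiB on the filesystem that holds path $1.
free_disk_gb() {
  df -Pk "${1:-/var/lib/docker}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

build_preflight() {
  local docker_dir="${DOCKER_DIR:-/var/lib/docker}" ram disk ok=0
  ram="$(mem_total_gb)"
  disk="$(free_disk_gb "$docker_dir")"
  [ "$ram" -ge 60 ] || { echo "Only ${ram} GiB of RAM (about 60 needed)" >&2; ok=1; }
  [ "$disk" -ge 240 ] || { echo "Only ${disk} GiB free disk (about 240 needed)" >&2; ok=1; }
  echo "RAM: ${ram} GiB, free disk: ${disk} GiB"
  return "$ok"
}
```

Usage: build_preflight && docker-compose run conceptnet build.sh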

Reproducing the word embedding evaluation

To reproduce an evaluation showing that ConceptNet Numberbatch 16.09 is a good system of word embeddings that outperforms others, within the Docker framework, run:

docker-compose run conceptnet scripts/reproduce-evaluation.sh

Browsing your local ConceptNet

Once the container is running, you should be able to see the ConceptNet API at http://localhost/. For example, you can get the API response for a concept at http://localhost/c/en/example.

Serving the API at these URLs is the default as of version 5.5.5; earlier versions served the Web frontend by default.
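From the command line, you can query the same API with curl. This sketch builds concept URIs in the /c/LANGUAGE/TERM shape of the example URL above. It only lowercases the term and replaces spaces with underscores, which is an assumption that covers simple English terms and is not ConceptNet's full text normalization. The function names and the CN_BASE variable are hypothetical.

```shell
#!/usr/bin/env bash
# Build a concept URI such as /c/en/ice_cream from a language code and a term.
# Simplified normalization (an assumption): lowercase, spaces -> underscores.
cn_uri() {
  local lang="$1" term="$2"
  term="$(printf '%s' "$term" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')"
  printf '/c/%s/%s\n' "$lang" "$term"
}

# Fetch the API's JSON response for a term from your local copy.
# Set CN_BASE to e.g. http://localhost:8000 if you changed the port mapping.
cn_get() {
  curl --silent "${CN_BASE:-http://localhost}$(cn_uri "$1" "$2")"
}
```

For example, cn_get en "Ice Cream" requests http://localhost/c/en/ice_cream.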

The Web frontend is also being served, but it requires a different hostname to distinguish it. One hostname it accepts is www.localhost. If you configure your hosts file to map the hostname www.localhost to 127.0.0.1 (on Linux, add this name to /etc/hosts), you can see the rendered Web interface at URLs like http://www.localhost/c/en/example.
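The hosts-file change can be made with a short script that doesn't add a duplicate line if you run it twice. This is a sketch: it appends a "127.0.0.1 www.localhost" line, uses sudo only when the file isn't writable by the current user, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map the hostname www.localhost to 127.0.0.1 in a hosts file,
# skipping the change if that mapping is already present.
add_www_localhost() {
  local file="${1:-/etc/hosts}" sudo_cmd=""
  # awk exits 0 only if some "127.0.0.1" line already lists www.localhost.
  if awk '$1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == "www.localhost") found = 1 }
          END { exit !found }' "$file"; then
    return 0
  fi
  [ -w "$file" ] || sudo_cmd="sudo"
  # $sudo_cmd is unquoted on purpose: when empty, it expands to nothing.
  printf '127.0.0.1\twww.localhost\n' | $sudo_cmd tee -a "$file" > /dev/null
}
```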
