Containerized pipeline used for annotation of Slovenian text corpora, composed for the MARCELL sustainability project.

clarinsi/marcell-annotation-pipeline

Deploying

Docker

Building

Build the Docker image:

$ ./build.sh

Deploying

Run with docker-compose:

$ docker-compose up -d

or manually:

$ docker run --name marcell-sl-pipeline -d \
    -p 127.0.0.1:5000:80 \
    marcell-sl-pipeline:latest

GPU support

To enable Docker GPU support on your host, please refer to the NVIDIA docs. Once you have nvidia-container-runtime set up, add the following runtime definition to /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

and uncomment lines containing nvidia in docker-compose.yml.
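If /etc/docker/daemon.json already holds other settings, the runtime definition has to be merged into it rather than pasted over it. The following is a minimal sketch of such a merge, not part of this repository; the helper names `add_nvidia_runtime` and `update_daemon_json` are hypothetical, and the script assumes it is run with permission to write the file.

```python
import copy
import json
from pathlib import Path

# Runtime definition taken from the JSON snippet above.
NVIDIA_RUNTIME = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}


def add_nvidia_runtime(config):
    """Return a copy of a daemon.json mapping with the nvidia runtime added.

    Existing top-level keys and other runtimes are preserved; the input
    mapping is not modified.
    """
    merged = copy.deepcopy(config)
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    return merged


def update_daemon_json(path="/etc/docker/daemon.json"):
    """Hypothetical helper: merge the runtime into the file at `path`."""
    path = Path(path)
    # Start from an empty config if the file does not exist yet.
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_nvidia_runtime(config), indent=4) + "\n")
```

Calling `update_daemon_json()` rewrites the file in place. The Docker daemon has to be restarted or reloaded before it picks up the change.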

Usage

The API listens for HTTP POST requests on the /annotate path. Requests carry the following form data:

  • "text": raw text data
  • "meta": standoff metadata in JSON format

An example of standoff metadata:

{
    "doc_id":"sl-test123",
    "language":"sl",
    "date":"2020-06-30",
    "title":"Poskusni dokument",
    "type":"poskus",
    "entype":"test"
}

You can test the API with cURL:

$ curl -X POST -F 'text=Pozdravljen, svet!' -F 'meta={"doc_id":"sl-test123", "language":"sl", "date":"2020-06-30", "title":"Poskusni dokument", "type":"poskus", "entype":"test"}' http://localhost:5000/annotate 
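The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the service is reachable at http://localhost:5000 as in the cURL example; the function names `build_annotate_request` and `annotate` are hypothetical. The response format is not described here, so the sketch returns the raw response bytes.

```python
import json
import urllib.request
import uuid


def build_annotate_request(text, meta, url="http://localhost:5000/annotate"):
    """Build a multipart/form-data POST request with "text" and "meta" fields."""
    boundary = uuid.uuid4().hex
    fields = {
        "text": text,
        # The metadata is sent as a JSON string, as in the cURL example.
        "meta": json.dumps(meta, ensure_ascii=False),
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def annotate(text, meta, url="http://localhost:5000/annotate"):
    """Send the request and return the raw response body as bytes."""
    request = build_annotate_request(text, meta, url)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `annotate("Pozdravljen, svet!", {"doc_id": "sl-test123", "language": "sl", "date": "2020-06-30", "title": "Poskusni dokument", "type": "poskus", "entype": "test"})` sends the same form data as the cURL command.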

Issues

  • Preloading with Gunicorn doesn't work yet, so every worker loads the whole pipeline into memory separately instead of sharing a single instance.
