spaCy REST services

This repository provides REST microservices for Explosion AI's interactive demos and visualisers. All requests and responses are JSON-encoded as text/string, so all requests require the header Content-Type: text/string.

The displaCy service is a simple Falcon app that exposes a spaCy dependency parser and a spaCy named entity recognition model as a REST microservice. Its output is formatted for the displaCy.js and displaCy ENT visualisers. For more info on how the front-end renders the data produced by this service, see this blog post.

The service exposes three endpoints that accept POST requests (two for parsing, one for training), and two endpoints that accept GET requests to describe the available models and schemas.


POST /dep/

Example request:

{
    "text": "They ate the pizza with anchovies",
    "model":"en",
    "collapse_punctuation": 0,
    "collapse_phrases": 1
}

| Name | Type | Description |
| --- | --- | --- |
| `text` | string | text to be parsed |
| `model` | string | identifier string for a model installed on the server |
| `collapse_punctuation` | boolean | Merge punctuation onto the preceding token? |
| `collapse_phrases` | boolean | Merge noun chunks and named entities into single tokens? |
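
A minimal sketch of how a client might build this request in Python, using only the standard library. The base URL `http://localhost:8000` and the helper name `build_dep_request` are assumptions for illustration; adjust them to wherever the service is actually deployed. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: the service is reachable locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_dep_request(text, model="en", collapse_punctuation=False,
                      collapse_phrases=True, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /dep/ endpoint."""
    payload = {
        "text": text,
        "model": model,
        # The example request above encodes the two flags as 0/1.
        "collapse_punctuation": int(collapse_punctuation),
        "collapse_phrases": int(collapse_phrases),
    }
    return urllib.request.Request(
        url=base_url + "/dep/",
        data=json.dumps(payload).encode("utf-8"),
        # The README states that all requests need this header.
        headers={"Content-Type": "text/string"},
        method="POST",
    )


request = build_dep_request("They ate the pizza with anchovies")

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     parse = json.loads(response.read().decode("utf-8"))
```

The same pattern should work for `/ent/` by changing the path and dropping the two `collapse_*` fields.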

Example response:

{
    "arcs": [
        { "dir": "left", "start": 0, "end": 1, "label": "nsubj" },
        { "dir": "right", "start": 1, "end": 2, "label": "dobj" },
        { "dir": "right", "start": 1, "end": 3, "label": "prep" },
        { "dir": "right", "start": 3, "end": 4, "label": "pobj" },
        { "dir": "left", "start": 2, "end": 3, "label": "prep" }
    ],
    "words": [
        { "tag": "PRP", "text": "They" },
        { "tag": "VBD", "text": "ate" },
        { "tag": "NN", "text": "the pizza" },
        { "tag": "IN", "text": "with" },
        { "tag": "NNS", "text": "anchovies" }
    ]
}

| Name | Type | Description |
| --- | --- | --- |
| `arcs` | array | data to generate the arrows |
| `dir` | string | direction of arrow (`"left"` or `"right"`) |
| `start` | integer | offset of word the arrow starts on |
| `end` | integer | offset of word the arrow ends on |
| `label` | string | dependency label |
| `words` | array | data to generate the words |
| `tag` | string | part-of-speech tag |
| `text` | string | token |
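
As a rough illustration of how the `arcs` and `words` arrays fit together, the sketch below turns a response into (head, label, dependent) triples. It assumes that `start` is always the leftmost word and that `dir` tells which end the arrow points at, so a `"left"` arc has its dependent at `start`. This reading matches the example above but is an interpretation, not something the service guarantees. The function name `arcs_to_triples` is hypothetical.

```python
def arcs_to_triples(parse):
    """Convert a /dep/ response into (head, label, dependent) triples.

    Assumption: "left" arcs point at the word at `start`, "right" arcs
    point at the word at `end`.
    """
    words = [word["text"] for word in parse["words"]]
    triples = []
    for arc in parse["arcs"]:
        if arc["dir"] == "left":
            head, dependent = arc["end"], arc["start"]
        else:
            head, dependent = arc["start"], arc["end"]
        triples.append((words[head], arc["label"], words[dependent]))
    return triples


# First two arcs and all words from the example response above.
parse = {
    "arcs": [
        {"dir": "left", "start": 0, "end": 1, "label": "nsubj"},
        {"dir": "right", "start": 1, "end": 2, "label": "dobj"},
    ],
    "words": [
        {"tag": "PRP", "text": "They"},
        {"tag": "VBD", "text": "ate"},
        {"tag": "NN", "text": "the pizza"},
        {"tag": "IN", "text": "with"},
        {"tag": "NNS", "text": "anchovies"},
    ],
}

triples = arcs_to_triples(parse)
```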

POST /ent/

Example request:

{
    "text": "When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously.",
    "model": "en"
}

| Name | Type | Description |
| --- | --- | --- |
| `text` | string | text to be parsed |
| `model` | string | identifier string for a model installed on the server |

Example response:

[
    { "end": 20, "start": 5,  "type": "PERSON" },
    { "end": 67, "start": 61, "type": "ORG" },
    { "end": 75, "start": 71, "type": "DATE" }
]

| Name | Type | Description |
| --- | --- | --- |
| `end` | integer | character offset the entity ends after |
| `start` | integer | character offset the entity starts on |
| `type` | string | entity type |
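
Because `start` is inclusive and `end` is the offset just past the entity, the two values can be used directly as a Python slice. A small sketch, using the example request and response above (the helper name `entity_spans` is hypothetical):

```python
def entity_spans(text, entities):
    """Pair each entity's surface text with its type, using character offsets."""
    return [(text[ent["start"]:ent["end"]], ent["type"]) for ent in entities]


text = ("When Sebastian Thrun started working on self-driving cars at Google "
        "in 2007, few people outside of the company took him seriously.")
entities = [
    {"end": 20, "start": 5, "type": "PERSON"},
    {"end": 67, "start": 61, "type": "ORG"},
    {"end": 75, "start": 71, "type": "DATE"},
]

spans = entity_spans(text, entities)
```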

POST /train/ent

Example request:

{
    "text": "Google es una empresa.",
    "model": "es",
    "tags": [
      {
        "start": 0,
        "len": 6,
        "type": "ORG"
      }
    ]
}

| Name | Type | Description |
| --- | --- | --- |
| `text` | string | text to be parsed |
| `tags` | array | entities to be used for training named entity recognition |
| `model` | string | identifier string for a model installed on the server |

Example response:

[
    { "end": 6, "start": 0,  "type": "ORG" }
]

| Name | Type | Description |
| --- | --- | --- |
| `end` | integer | character offset the entity ends after |
| `start` | integer | character offset the entity starts on |
| `type` | string | entity type |
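
Note that the training request describes each entity with `start` and `len`, while the response uses `start` and `end`. A sketch of a helper that builds the request body from (start, end, type) spans, so the same offsets can be used on both sides. The function name `make_train_payload` and the bounds check are assumptions for illustration, not part of the service.

```python
def make_train_payload(text, model, spans):
    """Build a /train/ent request body from (start, end, type) spans."""
    tags = []
    for start, end, ent_type in spans:
        # Reject spans that fall outside the text or are empty.
        if not 0 <= start < end <= len(text):
            raise ValueError("span (%d, %d) does not fit the text" % (start, end))
        tags.append({"start": start, "len": end - start, "type": ent_type})
    return {"text": text, "model": model, "tags": tags}


payload = make_train_payload("Google es una empresa.", "es", [(0, 6, "ORG")])
```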

GET /models

List the names of models installed on the server.

Example request:

GET /models

Example response:

["en", "de"]

GET /{model}/schema/

Example request:

GET /en/schema

| Name | Type | Description |
| --- | --- | --- |
| `model` | string | identifier string for a model installed on the server |

Example response:

{
  "dep_types": ["ROOT", "nsubj"],
  "ent_types": ["PERSON", "LOC", "ORG"],
  "pos_types": ["NN", "VBZ", "SP"]
}
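
One possible use of the schema on the client side is to check which labels a front-end needs to handle. The sketch below reports labels that a given schema does not list. The helper name `labels_missing_from_schema` and the sample label list are hypothetical.

```python
def labels_missing_from_schema(schema, kind, labels):
    """Return the labels, in first-seen order, that schema[kind] does not list."""
    known = set(schema[kind])
    missing = []
    for label in labels:
        if label not in known and label not in missing:
            missing.append(label)
    return missing


# Schema copied from the example response above.
schema = {
    "dep_types": ["ROOT", "nsubj"],
    "ent_types": ["PERSON", "LOC", "ORG"],
    "pos_types": ["NN", "VBZ", "SP"],
}

# Hypothetical labels collected from some /ent/ responses.
missing = labels_missing_from_schema(schema, "ent_types", ["PERSON", "DATE", "ORG"])
```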

The sense2vec service is a simple Falcon app that exposes a sense2vec model as a REST microservice, as used in the sense2vec demo.

The service exposes a single endpoint over GET.


GET /{word|POS}

Example query:

GET /natural_language_processing%7CNOUN

Example response:

[
    {
        "score": 0.1,
        "key": "computational_linguistics|NOUN",
        "text": "computational linguistics",
        "count": 20,
        "head": "linguistics"
    }
]

| Name | Type | Description |
| --- | --- | --- |
| `score` | float | similarity to query |
| `key` | string | identifier string |
| `text` | string | human-readable token |
| `count` | integer | absolute frequency in training corpus |
| `head` | string | head word in text |
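
The `|` separator has to be percent-encoded in the URL, as in the example query above. A minimal sketch of building the query URL in Python follows. The base URL and the helper name `sense2vec_url` are assumptions, and replacing spaces with underscores is inferred from the example keys rather than documented behaviour.

```python
from urllib.parse import quote

# Assumption: the sense2vec service is reachable locally on port 8000.
BASE_URL = "http://localhost:8000"


def sense2vec_url(word, pos, base_url=BASE_URL):
    """Build the GET URL for a word and its part-of-speech tag."""
    # Keys in the examples join multi-word phrases with underscores.
    key = word.replace(" ", "_") + "|" + pos
    # safe="" makes sure "|" (and any "/") is percent-encoded.
    return base_url + "/" + quote(key, safe="")


url = sense2vec_url("natural language processing", "NOUN")
```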