🪐 spaCy Project: Polar Component

This example project shows how to implement a simple stateful component to score docs on semantic poles.

The method here is based on SemAxis (An et al. 2018). Given a set of word vectors and a pair of seed poles, such as "bad-good", you can calculate a reference vector for each pole. The distance of a document vector from those reference vectors then acts as a sentiment or polar score for the document. While not as sophisticated as a trained model, this approach is easy to test with existing data.
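As a rough illustration of the idea, here is a minimal sketch in plain Python. It assumes one common formulation: the axis is the mean of the positive pole vectors minus the mean of the negative pole vectors, and the score is the cosine similarity between a vector and that axis. The function names (`axis_vector`, `polar_score`) and the three-dimensional toy vectors are made up for this example and are not taken from the project code, which may compute the score differently.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def axis_vector(negative_vectors, positive_vectors):
    """Hypothetical helper: positive pole mean minus negative pole mean."""
    negative = mean_vector(negative_vectors)
    positive = mean_vector(positive_vectors)
    return [p - n for p, n in zip(positive, negative)]


def polar_score(vector, axis):
    """Score in [-1, 1]; higher means closer to the positive pole."""
    return cosine(vector, axis)


# Toy vectors invented for illustration, not real embeddings.
TOY_VECTORS = {
    "bad": [-1.0, 0.2, 0.0],
    "good": [1.0, 0.2, 0.0],
    "great": [0.9, 0.1, 0.3],
    "awful": [-0.8, 0.3, 0.1],
}

axis = axis_vector([TOY_VECTORS["bad"]], [TOY_VECTORS["good"]])
score_great = polar_score(TOY_VECTORS["great"], axis)
score_awful = polar_score(TOY_VECTORS["awful"], axis)
```

With real data, the toy vectors would be replaced by vectors from a pipeline that ships word vectors, and the scored vector would be the document vector rather than a single word.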

If you use enough poles, the scores can serve as semantic vectors in which each dimension corresponds to a named pole, which can make downstream tasks explainable. This is explored in the SemAxis paper as well as in Mathew et al. 2020, "The Polar Framework". (Incorporating semantic vectors as features in a spaCy model is left as an exercise for the reader.)
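To make the "scores as semantic vectors" idea concrete, the sketch below scores one document vector against several axes and keeps the results keyed by pole name. It assumes cosine similarity as the score. The axis values, the pole names, and the `polar_features` helper are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def polar_features(doc_vector, axes):
    """Hypothetical helper: one score per named pole, in insertion order."""
    return {name: cosine(doc_vector, axis) for name, axis in axes.items()}


# Invented axis vectors; in practice each would be derived from seed words.
AXES = {
    "bad-good": [2.0, 0.0, 0.0],
    "slow-fast": [0.0, 0.0, 1.5],
}

doc_vector = [0.5, 0.1, -0.4]
features = polar_features(doc_vector, AXES)
```

Because each entry is labelled with its pole, a downstream model's reliance on a feature can be read directly as "this document leans toward good" or "toward slow", which is the explainability benefit described above.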

Note: Because the data is hosted on Kaggle, it can't be automatically downloaded by spacy project assets, so you'll have to download it yourself. See the assets section of this README for the link.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| evaluate | Check output on sample data |

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| all | evaluate |

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets/IMDB Dataset.csv | Local | IMDB Review Corpus. Download from Kaggle. |