This is a Flask app with a Naive Bayes ML model that checks whether a message is spam. It is oriented toward Ukrainian text.
The app also trains and uses a Count Vectorizer, so two separate models take part in each prediction.
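To illustrate how the two models fit together, here is a minimal pure-Python sketch of a bag-of-words vocabulary (the role the Count Vectorizer plays) feeding a multinomial Naive Bayes classifier. It is a stand-in for explanation only, not the project's code: the function names, the "spam"/"ham" labels, and the tiny training set are all hypothetical.

```python
import math
from collections import Counter

def fit_vocabulary(texts):
    """Map every distinct whitespace token to a column index (bag-of-words)."""
    vocab = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def train_naive_bayes(texts, labels, vocab, alpha=1.0):
    """Return, per class, a log prior and Laplace-smoothed token log likelihoods."""
    class_docs = Counter(labels)
    token_counts = {c: Counter() for c in class_docs}
    for text, label in zip(texts, labels):
        token_counts[label].update(t for t in text.split() if t in vocab)
    model = {}
    for c in class_docs:
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        model[c] = (
            math.log(class_docs[c] / len(labels)),
            {t: math.log((token_counts[c][t] + alpha) / total) for t in vocab},
        )
    return model

def spam_probability(text, vocab, model):
    """Posterior probability of the hypothetical 'spam' class for one message."""
    scores = {}
    for c, (log_prior, log_like) in model.items():
        # Tokens outside the vocabulary are ignored, as a count vectorizer would do.
        scores[c] = log_prior + sum(log_like[t] for t in text.split() if t in vocab)
    top = max(scores.values())  # subtract the max before exp() for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    return weights["spam"] / sum(weights.values())

# Hypothetical toy data, only to show the call order.
texts = ["win free prize", "free money now", "see you at lunch", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]
vocab = fit_vocabulary(texts)
model = train_naive_bayes(texts, labels, vocab)
```

The vocabulary has to be fitted before the classifier is trained, and the same vocabulary must be used at prediction time, which is why the two models are stored and swapped together.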
All text preparation functions are in ml.preprocess_text.py:
- Text Preprocessing (removing emails, mentions, urls)
- Replacing symbols (such as ₽, @, 0, €) with letters
- Tokenizer
- StopWords
- Lemmatizer
- Stemmer
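The first steps of the list above could be sketched roughly as follows. This is an illustration under assumptions, not the contents of ml.preprocess_text.py: the regular expressions, the function names, and the symbol-to-letter table are hypothetical, and stop-word removal, lemmatizing, and stemming are left out because they depend on the language resources the project uses.

```python
import re

# Hypothetical substitution table; the project's real mapping may differ.
SYMBOL_TO_LETTER = {"₽": "р", "@": "а", "0": "о", "€": "е"}

EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")  # "@name" that does not sit inside a word

def clean_text(text):
    """Remove emails, URLs, and mentions, then map look-alike symbols to letters."""
    text = EMAIL_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    for symbol, letter in SYMBOL_TO_LETTER.items():
        text = text.replace(symbol, letter)
    return text.lower()

def tokenize(text):
    """Split into runs of letters (any alphabet), dropping digits and punctuation."""
    return re.findall(r"[^\W\d_]+", text)
```

Emails and mentions are removed before the symbol replacement so that a leftover "@" inside a word can safely be treated as a disguised letter.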
The project is started with docker-compose.yml. For production, use docker-compose.prod.yml.
You can also use a Python venv for development. All required Python packages are listed in requirements.txt.
Docker Compose reads its configuration from the .env.dev or .env.prod files.
The Flask configuration is in services/web/project/config.py.
You must add a Bearer token (API_KEY) to each request. The available endpoints are:
- /healthy (check app)
- /static/path:filename (get static file)
- /upload (upload file to media folder)
- /web/spam-predict (simple UI form for checking Spam)
- /api/spam-predict (API request for checking Spam)
- /api/set-models (API request for setting the Naive Bayes and Count Vectorizer ML models)
- /api/train-data-upload (API request for uploading a new training file for ML)
- /api/train-models (train new Naive Bayes and Count Vectorizer models)
- /api/translit-convertor (Cyrillic to rs converter)
- /api/spec-symb (replacing special characters with letters)
- /api/test-compare-models (for testing and comparing old and new models)
For example, to check a message for spam, send a JSON request:
POST http://host:5000/api/spam-predict
BODY
{
    "message": "Я спав а він стріляв",
    "rate": 0.8
}
The response will look like:
{
    "result": {
        "deleteRank": 0.8,
        "isSpam": false,
        "spamRate": 0.2528499848085302,
        "tensorToken": "я спат а стріл"
    },
    "time": 6.2989999999999995
}
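The same request can be sent from Python. The sketch below uses only the standard library and makes two assumptions that this README does not confirm: that the token is passed as an "Authorization: Bearer <API_KEY>" header, and that the body is sent with a JSON content type. The helper names build_spam_request and check_spam are hypothetical.

```python
import json
import urllib.request

def build_spam_request(base_url, api_key, message, rate):
    """Prepare a POST to /api/spam-predict (header format is an assumption)."""
    payload = json.dumps({"message": message, "rate": rate}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/spam-predict",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def check_spam(base_url, api_key, message, rate=0.8):
    """Send the request and return the "result" object of the response."""
    request = build_spam_request(base_url, api_key, message, rate)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]
```

With a running instance, check_spam("http://host:5000", API_KEY, "Я спав а він стріляв") would return the "result" object shown above, including the isSpam and spamRate fields.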