#90 Implement item search (#150)
simo86 authored and GendoIkari committed Jun 5, 2017
1 parent 047d3ac commit 7d5ae57
Showing 13 changed files with 697 additions and 26 deletions.
3 changes: 2 additions & 1 deletion app.py
@@ -18,7 +18,7 @@
from views.address import AddressesHandler, AddressHandler
from views.auth import LoginHandler
from views.orders import OrdersHandler, OrderHandler
from views.items import ItemHandler, ItemsHandler
from views.items import ItemHandler, ItemsHandler, SearchItemHandler
from views.user import UsersHandler, UserHandler
from views.pictures import PictureHandler, ItemPictureHandler
from views.favorites import FavoritesHandler, FavoriteHandler
@@ -57,6 +57,7 @@ def database_disconnect(response):
api.add_resource(ItemsHandler, "/items/")
api.add_resource(ItemHandler, "/items/<uuid:item_uuid>")
api.add_resource(ItemPictureHandler, '/items/<uuid:item_uuid>/pictures/')
api.add_resource(SearchItemHandler, "/items/db/")
api.add_resource(OrdersHandler, '/orders/')
api.add_resource(OrderHandler, '/orders/<uuid:order_uuid>')
api.add_resource(UsersHandler, '/users/')
3 changes: 3 additions & 0 deletions docs/source/api/quick_reference/index.rst
@@ -19,3 +19,6 @@ This API documentation section is autogenerated from source code.
tests.test_utils
tests.conftest
exceptions
search
search.core
search.utils
115 changes: 115 additions & 0 deletions docs/source/api/search.rst
@@ -0,0 +1,115 @@
Search Engine Package
=====================

Full text search engine for the application database.

.. contents:: :local:

Introduction
------------

Main functionality is exposed through :func:`search.core.search`, which can be
imported directly from ``search`` (see `Basic usage`_).

.. note::
This is a simple search algorithm that implements just a few checks
and while it tries to do a full text search in an efficient way, as of now
it cannot be relied upon with the utmost certainty.

This piece of code is still under development and will most probably be removed
from the final implementation of the search in favor of other libraries such
as `Whoosh <https://goo.gl/hGs11I>`_.

It can still search any collection of objects, as long as
``getattr(object, '<attribute>')`` returns a value for each of them, so it does
the trick for a quick search implementation, as a placeholder, or for testing.


Basic usage
-----------

.. code-block:: python

    from search import search

    collection = Model.select()  # returns an iterable of objects
    results = search('query', ['attr'], collection)

Algorithm
---------

The basic search functionality attempts a `sort-of` full text search. It relies
on the `Jaro-Winkler <https://goo.gl/b59g4v>`_ algorithm to calculate the
distance between words in a `query * term` matrix, on a `movement cost` for each
word in the phrases (words that are out of position count for less), and on a
weight value when searching through multiple model attributes.

Since I'm no mathematician I can't actually put down a formula for you, sorry.
Feel free to check the code and come up with something :)
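
As a rough illustration of the three ingredients described above (word
similarity matrix, movement cost, attribute weights), here is a minimal sketch.
It is not the code in :any:`search.core`: ``difflib.SequenceMatcher`` stands in
for Jaro-Winkler so the example runs with the standard library only, and the
names ``similarity``, ``phrase_score``, ``object_score`` and ``Product``, as
well as the ``1 / (1 + distance)`` position penalty, are assumptions made for
the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for Jaro-Winkler (assumption): any function that returns
    # a score between 0 and 1 could be plugged in here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def phrase_score(query, text):
    """Score `text` against `query` with a query-words x text-words matrix."""
    query_words = query.split()
    text_words = text.split()
    if not query_words or not text_words:
        return 0.0
    total = 0.0
    for qi, qword in enumerate(query_words):
        # One matrix row: this query word against every word of the text.
        # Dividing by (1 + distance) makes out-of-position words worth less.
        row = [
            similarity(qword, tword) / (1 + abs(qi - ti))
            for ti, tword in enumerate(text_words)
        ]
        total += max(row)
    return total / len(query_words)


def object_score(query, obj, attributes, weights):
    """Weighted average of the scores of each searched attribute."""
    scores = [phrase_score(query, str(getattr(obj, a))) for a in attributes]
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


class Product:
    # Hypothetical plain object; anything that supports getattr() works.
    def __init__(self, name, description):
        self.name = name
        self.description = description


shoes = Product('red shoes', 'comfortable running shoes')
table = Product('oak table', 'solid wooden table')
ranked = sorted(
    [table, shoes],
    key=lambda p: object_score('shoes', p, ['name', 'description'], [2, 1]),
    reverse=True,
)
```

In this sketch an exact phrase in the right order scores 1.0, while the same
words in swapped positions score lower, which is the effect the movement cost
is meant to have.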


Tweaking
--------

The basic algorithm configuration lives in :any:`search.config`, which allows
some tweaking of how words are filtered and how scores are weighted.
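
To make the effect of these options more concrete, the sketch below shows one
plausible reading of the constants defined in ``search/config.py``. The
constant values are copied from that module; the helper names ``tokenize``,
``combine`` and ``is_match``, and the exact way the engine combines the values
(including ``>=`` for the threshold), are assumptions for the example and may
differ from the real implementation.

```python
import re

# Values copied from search/config.py in this commit.
MATCH_WEIGHT = 0.2
DIST_WEIGHT = 0.8
THRESHOLD = 0.75
MIN_WORD_LENGTH = 3
STR_SPLIT_REGEX = r'\W+'


def tokenize(text):
    """Split on non-word characters and drop words that are too short."""
    return [w for w in re.split(STR_SPLIT_REGEX, text)
            if len(w) >= MIN_WORD_LENGTH]


def combine(match, position):
    """Weighted average of the equality match and the positional coefficient."""
    total_weight = MATCH_WEIGHT + DIST_WEIGHT
    return (match * MATCH_WEIGHT + position * DIST_WEIGHT) / total_weight


def is_match(score):
    """Keep a resource only when its score reaches the threshold."""
    return score >= THRESHOLD


words = tokenize('A red, red-hot pair of shoes!')
```

With the values above, a perfect word match that sits far from its expected
position (``combine(1.0, 0.5)``) falls below ``THRESHOLD``, so raising
``MATCH_WEIGHT`` relative to ``DIST_WEIGHT`` would make word order matter less.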


E-commerce API implementation
-----------------------------

The package is implemented in our REST API through the database models.
:any:`BaseModel` has a new method (:any:`BaseModel.search`) that wraps the
search functionality on the callee.

By default, models are not allowed to run a search: calling ``search()``
raises an :class:`exceptions.SearchAttributeMismatch`.
To `enable` the search functionality, a model can define two class attributes:

* ``_search_attributes``, which specifies the fields to search
* ``_search_weights``, which specifies the weight of each field. This is optional

.. code-block:: python

    class Item(BaseModel):
        # ...
        _search_attributes = ['name', 'category', 'description']
        _search_weights = [3, 2, 1]  # optional

If the weights are not specified, the attributes are ranked in the order they
appear in ``_search_attributes``, with the first being the most important.
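
One way to picture the default ranking is the sketch below. The helper names
``default_weights`` and ``resolve_weights`` are hypothetical, and falling back
to the defaults when the lengths do not match is an assumption based on the
``search()`` docstring, which only says that mismatched weights are ignored.

```python
def default_weights(attributes):
    """Descending integer weights: the first attribute weighs the most."""
    return list(range(len(attributes), 0, -1))


def resolve_weights(attributes, weights=None):
    """Use the given weights only when there is one per attribute."""
    if weights is None or len(weights) != len(attributes):
        # Assumption: "ignored" means falling back to the default ranking.
        return default_weights(attributes)
    return list(weights)


attrs = ['name', 'category', 'description']
implicit = resolve_weights(attrs)
explicit = resolve_weights(attrs, [5, 1, 1])
mismatched = resolve_weights(attrs, [5, 1])
```

Under these assumptions, leaving ``_search_weights`` unset for ``Item`` behaves
like the explicit ``[3, 2, 1]`` shown in the class definition.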

Another quick option is to pass the attributes to look up at call time:

.. code-block:: python

    result = Item.search('query', Item.select(), limit=10, attributes=['name'])

This overrides any class attributes that have been set up (so there is no
search in the ``category`` and ``description`` fields).
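
The precedence between call-time and class-level attributes can be sketched
without a database. The classes below are simplified stand-ins for the real
Peewee models, and ``resolve_attributes`` is a hypothetical helper that
isolates the first lines of ``BaseModel.search``; only the
``attributes or cls._search_attributes`` fallback and the raised exception
mirror the actual code.

```python
class SearchAttributeMismatch(Exception):
    """Raised when no attributes to search are available."""


class BaseModel:
    # Stand-in for the Peewee base model: search is disabled by default.
    _search_attributes = None

    @classmethod
    def resolve_attributes(cls, attributes=None):
        # Call-time attributes take precedence over the class-level default.
        attributes = attributes or cls._search_attributes
        if not attributes:
            raise SearchAttributeMismatch(
                'Attributes to look for not defined for {}.'.format(
                    cls.__name__))
        return attributes


class Item(BaseModel):
    _search_attributes = ['name', 'category', 'description']


class Order(BaseModel):
    pass  # no _search_attributes: searching is not enabled


class_level = Item.resolve_attributes()
call_time = Item.resolve_attributes(['name'])
on_the_fly = Order.resolve_attributes(['uuid'])
```

Note that, because of the ``or``, passing an empty list at call time falls back
to the class-level attributes instead of disabling them.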


APIs
----

search.core
+++++++++++

.. automodule:: search.core
:members:


search.utils
++++++++++++

.. automodule:: search.utils
:members:


search.config
+++++++++++++

.. automodule:: search.config
:members:
7 changes: 7 additions & 0 deletions exceptions.py
@@ -17,3 +17,10 @@ def __init__(self, item, requested_quantity):

class WrongQuantity(Exception):
    pass


class SearchAttributeMismatch(Exception):
    """Raised when a model tries to call its ``search`` method but no
    fields to lookup are set, either as class attributes or at call time.
    """
    pass
80 changes: 70 additions & 10 deletions models.py
@@ -3,18 +3,20 @@
"""
import datetime
import os
from exceptions import (InsufficientAvailabilityException,
WrongQuantity, SearchAttributeMismatch)
from uuid import uuid4

from flask_login import UserMixin
from passlib.hash import pbkdf2_sha256
from peewee import DateTimeField, TextField, CharField, BooleanField
from peewee import DecimalField, PostgresqlDatabase
from peewee import UUIDField, ForeignKeyField, IntegerField
from peewee import (BooleanField, CharField, DateTimeField, DecimalField,
ForeignKeyField, IntegerField, PostgresqlDatabase,
TextField, UUIDField)
from playhouse.signals import Model, post_delete, pre_delete

from exceptions import InsufficientAvailabilityException, WrongQuantity
from schemas import (ItemSchema, UserSchema, OrderSchema, OrderItemSchema,
BaseSchema, AddressSchema, PictureSchema, FavoriteSchema)
from schemas import (AddressSchema, BaseSchema, FavoriteSchema, ItemSchema,
OrderItemSchema, OrderSchema, PictureSchema, UserSchema)
import search
from utils import remove_image


@@ -57,6 +59,16 @@ class BaseModel(Model):
    updated_at = DateTimeField(default=datetime.datetime.now)
    _schema = BaseSchema

    #: Each model that needs to implement the search functionality `should`
    #: override this attribute with the fields that need to be checked while
    #: searching.
    #: The attribute should be a list of names of class attributes (strings).
    _search_attributes = None
    #: Attribute weights can be specified with a list of numbers, each one
    #: mapped to the attribute at the same index in
    #: :any:`BaseModel._search_attributes`.
    _search_weights = None

    def save(self, *args, **kwargs):
        """
        Overrides Peewee ``save`` method to automatically update
@@ -116,6 +128,53 @@ def validate_input(cls, data, partial=False):
        """
        return cls._schema.validate_input(data, partial=partial)

    @classmethod
    def search(cls, query, dataset, limit=-1,
               attributes=None, weights=None,
               threshold=search.config.THRESHOLD):
        """
        Search a list of resources with the callee class.

        Arguments:
            query (str): Query to look up
            dataset (iterable): sequence of resource objects to search
            limit (int): maximum number of resources to return
                (default -1, all)
            attributes (list): model attribute names. Can be set as default
                inside the model definition or specified on the fly while
                searching.
            weights (list): attribute weight values; indexes should
                match the attribute position in the `attributes` argument.
                If the length does not match, it will be ignored.
            threshold (float): value between 0 and 1 that identifies the
                matching threshold for a result to be included.

        Returns:
            list: list of resources that may match the query.

        Raises:
            SearchAttributeMismatch:
                if ``attributes`` are missing, either as model
                default in ``<Model>._search_attributes`` or as a parameter,
                or if one of the objects does not have one of the given
                attribute(s).

        Examples:
            .. code-block:: python

                results = Item.search('shoes', Item.select(), limit=20)
        """

        attributes = attributes or cls._search_attributes
        weights = weights or cls._search_weights

        if not attributes:
            # Implicit string concatenation keeps stray indentation
            # whitespace out of the error message.
            raise SearchAttributeMismatch(
                'Attributes to look for not defined for {}. '
                'Please update the Model or specify during search call.'
                .format(cls.__name__))

        return search.search(
            query, attributes, dataset, limit, threshold, weights)


class Item(BaseModel):
    """
@@ -137,6 +196,7 @@ class Item(BaseModel):
    availability = IntegerField()
    category = TextField()
    _schema = ItemSchema
    _search_attributes = ['name', 'category', 'description']

    def __str__(self):
        return '{}, {}, {}, {}'.format(
@@ -268,10 +328,10 @@ def verify_password(self, password):
def add_favorite(user, item):
"""Link the favorite item to user."""
return Favorite.create(
uuid=uuid4(),
item=item,
user=user,
)
uuid=uuid4(),
item=item,
user=user,
)

def delete_favorite(self, obj):
obj.delete_instance()
4 changes: 3 additions & 1 deletion requirements.txt
@@ -2,6 +2,7 @@ aniso8601==1.2.0
click==6.7
colorama==0.3.9
coverage==4.3.4
distance==0.1.3
Faker==0.7.11
flake8==3.3.0
Flask==0.12
@@ -35,4 +36,5 @@ Sphinx==1.6.1
sphinx-autobuild==0.6.0
sphinx-rtd-theme==0.2.4
six==1.10.0
Werkzeug==0.12.1
Werkzeug==0.12.1
jellyfish==0.5.6
1 change: 1 addition & 0 deletions search/__init__.py
@@ -0,0 +1 @@
from search.core import search # noqa: F401
19 changes: 19 additions & 0 deletions search/config.py
@@ -0,0 +1,19 @@
"""
This module contains constants and configuration options for the search engine
that make it possible to quickly customize the threshold, the matching
parameters' weights and other options without having to touch the code.
"""
#: string equality weight for weighted average with positional coefficient
MATCH_WEIGHT = 0.2

#: positional coeff weight for weighted avg with equality match
DIST_WEIGHT = 0.8

#: matching threshold for a resource to be considered for the inclusion.
THRESHOLD = 0.75

#: minimum length for a word to be considered in the search
MIN_WORD_LENGTH = 3

#: Regex that will be used to split a string into separate chunks
STR_SPLIT_REGEX = r'\W+'