maneelusf/extpersonalization

About

The purpose of this repository is to improve the explainability of sequential recommender models using SHAP values. It currently relies on two packages:
- [ReChorus](https://github.com/THUwangcy/ReChorus): A general PyTorch framework for Top-K recommendation with implicit feedback.
- [TimeSHAP](https://github.com/feedzai/timeshap): A model-agnostic, recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes event/timestamp-, feature-, and cell-level attributions.

In this repository, we use GRU4Rec as the sequential recommender system and provide local-level explanations for a few interactions on the MovieLens 1M dataset.

Prerequisites

pip install -r requirements.txt

Getting Started

  1. Install Anaconda with Python >= 3.5
  2. Clone the repository
git clone https://github.com/maneelusf/extpersonalization
  3. Install the requirements and step into the src folder
cd src
  4. Run the model with the built-in dataset
python argcorpus.py --model_name GRU4Rec --emb_size 64 --lr 1e-3 --l2 1e-6 --dataset Grocery_and_Gourmet_Food
python main.py --model_name GRU4Rec --emb_size 64 --lr 1e-3 --l2 1e-6 --dataset Grocery_and_Gourmet_Food
  5. (optional) Run the Jupyter notebook in the data folder to download and build new datasets, or prepare your own datasets according to the Guideline in data
  6. (optional) Implement your own models according to the Guideline in src
  7. Then move into the notebooks folder
cd ../Notebooks

Approach

  1. The first step is to generate the list of top K recommended items and to calculate their scores by multiplying the output vector from the model's forward pass with each item's embedding. These scores are expected to be strongly positive.
  2. When the sequence is perturbed with a baseline item, the output vector from the forward pass is scored against the original top K recommended items, rather than against each individual item in the sequence.
  3. This approach is motivated by the hypothesis that the original top K items should not receive a higher score once the sequence is perturbed. The objective is to measure how perturbing items changes the recommendation score of the originally recommended items.
  4. For example, the Shapley values for events -1 to -5 (the most recent events) are expected to be the highest, so perturbing these events should produce an output vector that is less similar to the recommended items. Conversely, perturbing the events furthest back in the sequence (e.g. events -20 and onwards) should produce an output vector that remains similar to the recommended items, as these events have less impact on the subsequent items in the sequence.
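
The steps above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the repository's code: `toy_forward` is a hypothetical stand-in for the GRU4Rec forward pass (a recency-weighted average of item embeddings), and the embeddings, the sequence, the `BASELINE` item, and K = 2 are all made up for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def toy_forward(sequence, item_embs, decay=0.5):
    """Hypothetical stand-in for the model's forward pass (NOT GRU4Rec):
    a recency-weighted average of item embeddings, newest event weighted most."""
    n = len(sequence)
    weights = [decay ** (n - 1 - pos) for pos in range(n)]
    total = sum(weights)
    dim = len(item_embs[0])
    return [
        sum(w * item_embs[item][d] for w, item in zip(weights, sequence)) / total
        for d in range(dim)
    ]

def score_items(output_vec, item_embs, item_ids):
    """Score each item as the dot product of the output vector and its embedding."""
    return [dot(output_vec, item_embs[i]) for i in item_ids]

# Made-up item embeddings; item 0 plays the role of the baseline item.
BASELINE = 0
item_embs = [
    [0.0, 0.0],  # 0: baseline
    [1.0, 0.0],  # 1
    [0.9, 0.1],  # 2
    [0.0, 1.0],  # 3
    [0.8, 0.3],  # 4
]
sequence = [1, 2, 1, 4]  # oldest event first, most recent event last

# Step 1: score every item and keep the top K (here K = 2).
output_vec = toy_forward(sequence, item_embs)
all_scores = score_items(output_vec, item_embs, range(len(item_embs)))
top_k = sorted(range(len(item_embs)), key=lambda i: all_scores[i], reverse=True)[:2]
original_score = sum(score_items(output_vec, item_embs, top_k))

def drop_when_perturbed(position):
    """Steps 2-3: replace one event with the baseline item and re-score
    the ORIGINAL top-K items with the perturbed output vector."""
    perturbed = list(sequence)
    perturbed[position] = BASELINE
    perturbed_vec = toy_forward(perturbed, item_embs)
    return original_score - sum(score_items(perturbed_vec, item_embs, top_k))

# Step 4: in this toy setup, perturbing the most recent event
# lowers the score more than perturbing the oldest one.
drop_recent = drop_when_perturbed(-1)
drop_oldest = drop_when_perturbed(0)
print(top_k, round(drop_recent, 4), round(drop_oldest, 4))
```

The recency effect here is built into the toy encoder by construction; with a trained GRU4Rec model it is the hypothesis being tested, not a guarantee.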

Calculating SHAP values

After training a model, run the following [notebook](https://github.com/maneelusf/extpersonalization/blob/main/notebooks/Notebook%20to%20generate%20top%20K%20recommendations.ipynb) to generate SHAP values.
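
To make concrete what an event-level SHAP value measures, the sketch below computes exact Shapley values by enumerating every subset of events. It is an illustration only and does not use TimeSHAP's API: `toy_value` is a hypothetical value function standing in for "top K score when only these events are kept and the rest are replaced by the baseline", and its numbers are invented. Exact enumeration is only feasible for very short sequences, which is why KernelSHAP-style methods approximate it by sampling.

```python
from itertools import combinations
from math import factorial

def shapley_values(n_events, value_fn):
    """Exact Shapley value of each event for value_fn(frozenset of kept event indices)."""
    phi = [0.0] * n_events
    for i in range(n_events):
        others = [p for p in range(n_events) if p != i]
        for size in range(len(others) + 1):
            # Shapley weight of a coalition of this size that excludes event i.
            weight = factorial(size) * factorial(n_events - size - 1) / factorial(n_events)
            for subset in combinations(others, size):
                kept = frozenset(subset)
                phi[i] += weight * (value_fn(kept | {i}) - value_fn(kept))
    return phi

# Hypothetical per-event contributions to the top-K score (index 3 = most recent event).
CONTRIBUTIONS = [0.1, 0.2, 0.4, 0.8]

def toy_value(kept):
    """Made-up value function: additive contributions plus an interaction
    bonus when the two most recent events are both kept."""
    value = sum(CONTRIBUTIONS[i] for i in kept)
    if 2 in kept and 3 in kept:
        value += 0.3
    return value

phi = shapley_values(4, toy_value)
print([round(p, 4) for p in phi])
```

The interaction bonus is split evenly between the two events that create it, and the attributions sum to the difference between the full-sequence score and the all-baseline score.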

Support

Reach out to the maintainer at one of the following places:
