
Source codes and results for a paper "Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce"


HT2020

Source code, raw data, and complete results for the paper "Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce", accepted at the HT2020 conference.

Both the paper and this repository are partially based on a preliminary version (https://arxiv.org/abs/1809.03186) presented at the REVEAL 2018 workshop. For the source code of the recommending algorithms and instructions on running them, please refer to the https://github.com/lpeska/REVEAL2018 repository. This repository contains scripts that process the on-line and off-line evaluation results, regression models that aim to predict the on-line results from the off-line ones, and the results of a second on-line experiment designed to evaluate these models.

  • off-line results can be generated with the offLineResultsConstructor.ipynb notebook; the generated results are stored in the resultsWithNovDiv_32_0dot01Temporal.rar archive
  • on-line results are processed in the OnlineResultsEvaluation.ipynb notebook in the OnlineResults folder
  • several off-line to on-line prediction models are evaluated in the same notebook
  • results of the follow-up on-line experiment are in the onlineResults_secondExperiment folder

Abstract

In this paper, we present our work towards comparing on-line and off-line evaluation metrics in the context of small e-commerce recommender systems. Recommending in small e-commerce enterprises is rather challenging due to the lower volume of interactions and low user loyalty, which rarely extends beyond a single session. On the other hand, we usually have to deal with lower volumes of objects, which are easier for users to discover through various browsing/searching GUIs.

The main goal of this paper is to determine the applicability of off-line evaluation metrics in learning the true usability of recommender systems (evaluated on-line in A/B testing). In total, 800 variants of recommenders were evaluated off-line w.r.t. 18 metrics covering rating-based, ranking-based, novelty and diversity evaluation. The off-line results were afterwards compared with the on-line evaluation of 12 selected recommender variants and, based on the results, we tried to learn and utilize an off-line to on-line results prediction model.
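As an illustration of what a ranking-based off-line metric measures, the following minimal sketch computes reciprocal rank (averaged over users, this gives MRR) and nDCG for one recommendation list. It assumes binary relevance, and the function names and object identifiers are hypothetical; the definitions used in the paper and in the notebooks may differ in detail.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / position of the first relevant object, or 0.0 if none is found."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains (assumption: every relevant object has gain 1)."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    # Ideal ranking: all relevant objects placed at the top positions.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example with made-up object ids.
ranked_list = ["o3", "o1", "o7", "o2"]
relevant_objects = {"o1", "o2"}
print(reciprocal_rank(ranked_list, relevant_objects))
print(ndcg_at_k(ranked_list, relevant_objects, k=4))
```

Both functions score a single list; an off-line evaluation would typically average them over all test users.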

Off-line results showed a great variance in performance w.r.t. different metrics, with the Pareto front covering 64% of the approaches. Furthermore, we observed that on-line results are considerably affected by the seniority of users. On-line metrics correlate positively with ranking-based metrics (AUC, MRR, nDCG) for novice users, while too high values of novelty had a negative impact on the on-line results for them.
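To make the Pareto front notion concrete: a recommender variant lies on the front if no other variant is at least as good on every metric and strictly better on at least one. The sketch below is a minimal illustration under the assumption that higher is better for every metric; the variant names and scores are made up and are not results from the paper.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on all metrics and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of variants that are not dominated by any other variant."""
    return [
        name
        for name, own in scores.items()
        if not any(
            dominates(other, own)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical scores: [ranking metric, diversity, novelty], higher is better.
toy_scores = {
    "variant_a": [0.70, 0.20, 0.10],
    "variant_b": [0.60, 0.30, 0.40],
    "variant_c": [0.55, 0.25, 0.30],  # worse than variant_b on every metric
    "variant_d": [0.50, 0.10, 0.90],
}
front = pareto_front(toy_scores)
print(front, len(front) / len(toy_scores))
```

When metrics pull in different directions, as ranking accuracy and novelty often do, a large share of variants can end up on the front because each one is best at some trade-off.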

Novelty compared to the previous version

This paper contains numerous extensions compared to our preliminary work in this field (https://arxiv.org/abs/1809.03186). The main ones are redefined VRR and novelty metrics, a more thorough analysis of the results including the effect of user profile size, an evaluation of regression algorithms that aim to learn on-line CTR and VRR results from the off-line metrics, and additional on-line evaluations.
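The idea of learning an on-line result from off-line metrics can be sketched with a one-feature least-squares fit. This is only an illustration, not the regression setup used in the notebooks: the single predictor, the leave-one-out evaluation (a common choice when only a handful of variants were tested on-line), the helper names and all numbers below are assumptions made for the example.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (xs must not all be equal)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_out_mae(xs: List[float], ys: List[float]) -> float:
    """Mean absolute error when each variant is predicted from all the others."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(slope * xs[i] + intercept - ys[i]))
    return mean(errors)


# Made-up values for illustration only (not taken from the paper).
offline_metric = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
online_ctr = [0.010, 0.012, 0.015, 0.016, 0.020, 0.021]

slope, intercept = fit_line(offline_metric, online_ctr)
print(slope, intercept, leave_one_out_mae(offline_metric, online_ctr))
```

With several off-line metrics as predictors, the same evaluation loop applies to a multivariate model in place of `fit_line`.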
