↕️ ir_axioms

Intuitive axiomatic retrieval experimentation.

ir_axioms is a Python framework for experimenting with axioms in information retrieval in a declarative way. It includes reference implementations of many commonly used retrieval axioms and is well integrated with the PyTerrier framework and the Pyserini toolkit. Re-rank your search results today with ir_axioms and understand your retrieval systems better by analyzing axiomatic preferences!

Presentation video on YouTube Poster

Usage

The ir_axioms framework is easy to use. We've prepared some example notebooks that showcase its main features. If you have questions or need assistance, please contact us.

Example Notebooks

We include several example notebooks to demonstrate re-ranking and preference evaluation in PyTerrier using ir_axioms. You can find all examples in the examples/ directory.

Backends

You can experiment with ir_axioms in PyTerrier and Pyserini. We recommend PyTerrier, however, because not all features are implemented for the Pyserini backend.

PyTerrier (Terrier index)

To use ir_axioms with a Terrier index, please use our PyTerrier transformers (modules):

| Transformer class | Type | Description |
|---|---|---|
| AggregatedPreferences | 𝑅 → 𝑅𝑓 | Aggregate axiom preferences for each document. |
| EstimatorKwikSortReranker | 𝑅 → 𝑅′ | Train an estimator for ORACLE and use it to re-rank with KwikSort. |
| KwikSortReranker | 𝑅 → 𝑅′ | Re-rank using axiom preferences aggregated by KwikSort. |
| PreferenceMatrix | 𝑅 → (𝑅×𝑅)𝑓 | Compute an axiom's preference matrix. |
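To illustrate what a preference matrix and KwikSort re-ranking do conceptually, here is a minimal, self-contained sketch that does not use ir_axioms itself. It assumes a pairwise preference function where a positive value means the first document should rank above the second. The names `preference_matrix`, `kwiksort`, and `toy_pref` are hypothetical, and this is not the library's implementation: it uses the first document as a deterministic pivot, whereas classic KwikSort picks a random pivot.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical pairwise preference: > 0 means the first document should rank higher.
Preference = Callable[[str, str], float]
Matrix = Dict[str, Dict[str, float]]


def preference_matrix(docs: Sequence[str], pref: Preference) -> Matrix:
    """Compute pref(a, b) for every ordered pair of documents."""
    return {a: {b: pref(a, b) for b in docs} for a in docs}


def kwiksort(docs: List[str], matrix: Matrix) -> List[str]:
    """KwikSort: a quicksort that uses the preference matrix as its comparator."""
    if len(docs) <= 1:
        return list(docs)
    # Deterministic pivot (first document); classic KwikSort picks a random one.
    pivot, rest = docs[0], docs[1:]
    # Documents preferred over the pivot go before it; ties and dispreferred
    # documents go after it, each group keeping its original relative order.
    before = [d for d in rest if matrix[d][pivot] > 0]
    after = [d for d in rest if matrix[d][pivot] <= 0]
    return kwiksort(before, matrix) + [pivot] + kwiksort(after, matrix)


# Toy axiom (hypothetical): prefer documents with a higher toy score.
scores = {"d1": 1.0, "d2": 3.0, "d3": 2.0}


def toy_pref(a: str, b: str) -> float:
    return scores[a] - scores[b]


ranking = ["d1", "d2", "d3"]
matrix = preference_matrix(ranking, toy_pref)
print(kwiksort(ranking, matrix))  # ['d2', 'd3', 'd1']
```

With the pivot fixed to the first document, an axiom that is indifferent to every pair leaves the input ranking unchanged, which is a convenient property for re-ranking.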

You can also directly instantiate an index context object from a Terrier index if you want to build custom axiomatic modules:

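from ir_axioms.backend.pyterrier import TerrierIndexContext

# Open the Terrier index (replace the path with your own index directory).
context = TerrierIndexContext("/path/to/index/dir")
# `axiom`, `query`, `doc1`, and `doc2` are placeholders for an axiom instance, a query, and two documents.
axiom.preference(context, query, doc1, doc2)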
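When building a custom axiomatic module, it can help to see the kind of rule an axiom's preference encodes. The sketch below is a standalone, TFC1-style rule on raw text. It assumes (not confirmed here) that a positive preference means `doc1` should be ranked above `doc2`. `ToyDocument`, `query_term_count`, `tfc1_like_preference`, and the relative length tolerance `margin` are all hypothetical; real ir_axioms axioms read term statistics through the index context instead of raw strings.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document; stores only its raw text."""
    id: str
    contents: str


def query_term_count(query: str, doc: ToyDocument) -> int:
    """Count how often any query term occurs in the document (case-insensitive)."""
    words = doc.contents.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def tfc1_like_preference(
    query: str, doc1: ToyDocument, doc2: ToyDocument, margin: float = 0.1
) -> float:
    """TFC1-style rule: for documents of roughly equal length, prefer the one
    with more query-term occurrences. Returns 1.0 (doc1 first), -1.0 (doc2
    first), or 0.0 (no preference). `margin` is an assumed relative tolerance."""
    len1 = len(doc1.contents.split())
    len2 = len(doc2.contents.split())
    # The rule only applies when the lengths are approximately equal.
    if abs(len1 - len2) > margin * max(len1, len2, 1):
        return 0.0
    diff = query_term_count(query, doc1) - query_term_count(query, doc2)
    return float((diff > 0) - (diff < 0))


d1 = ToyDocument("d1", "axioms guide retrieval axioms")
d2 = ToyDocument("d2", "models guide retrieval tasks")
print(tfc1_like_preference("axioms", d1, d2))  # 1.0
```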

Pyserini (Anserini index)

We don't provide Pyserini modules for re-ranking or analyzing results out of the box. However, you can still compute axiom preferences to integrate retrieval axioms into your search pipeline:

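from ir_axioms.backend.pyserini import AnseriniIndexContext

# Open the Anserini index (replace the path with your own index directory).
context = AnseriniIndexContext("/path/to/index/dir")
# `axiom`, `query`, `doc1`, and `doc2` are placeholders for an axiom instance, a query, and two documents.
axiom.preference(context, query, doc1, doc2)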
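Since there is no ready-made Pyserini re-ranker, one simple way to use pairwise preferences in a pipeline is to sum each document's preferences against all other documents and sort by that total. The sketch below shows this Copeland-style aggregation; it is a swapped-in technique for illustration, not necessarily what ir_axioms' `AggregatedPreferences` or `KwikSortReranker` compute. `rerank_by_net_preference` and `prefer_shorter` are hypothetical names, and a positive preference is assumed to mean the first document should rank higher.

```python
from typing import Callable, Dict, List

# A pairwise preference over document IDs: > 0 means the first should rank higher.
PairwisePreference = Callable[[str, str], float]


def rerank_by_net_preference(
    ranking: List[str], pref: PairwisePreference
) -> List[str]:
    """Re-rank documents by the sum of their pairwise preferences against
    all other documents (a Copeland-style aggregation). Python's sort is
    stable, so documents with equal totals keep their original order."""
    totals: Dict[str, float] = {
        doc: sum(pref(doc, other) for other in ranking if other != doc)
        for doc in ranking
    }
    return sorted(ranking, key=lambda doc: -totals[doc])


# Toy preference for illustration: prefer shorter documents.
lengths = {"a": 5, "b": 2, "c": 9}


def prefer_shorter(doc1: str, doc2: str) -> float:
    return float((lengths[doc2] > lengths[doc1]) - (lengths[doc2] < lengths[doc1]))


print(rerank_by_net_preference(["a", "b", "c"], prefer_shorter))  # ['b', 'a', 'c']
```

In a real pipeline, `pref` could wrap `axiom.preference(context, query, docs[id1], docs[id2])`, where `docs` is a hypothetical mapping from document IDs to whatever document objects the axiom expects. Note that this aggregation computes all pairs, so it is best applied to a short top-k list.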

TIRA

Here's an example of how ir_axioms can be used to compute axiomatic preferences for a run in TIRA:

tira-run \
  --input-directory ${PWD}/data/tira/input-of-re-ranker \
  --input-run ${PWD}/data/tira/output-of-indexer \
  --output-directory ${PWD}/data/tira/output \
  --image webis/ir_axioms \
  --command '/venv/bin/python -m ir_axioms --offline --terrier-version 5.7 --terrier-helper-version 0.0.7 preferences --run-file $inputDataset/run.jsonl --run-format jsonl --index-dir $inputRun/index --output-dir $outputDir AND ANTI-REG ASPECT-REG DIV LB1 LNC1 LEN-AND LEN-DIV LEN-M-AND LEN-M-TDC M-AND M-TDC PROX1 PROX2 PROX3 PROX4 PROX5 REG STMC1 STMC2 TF-LNC TFC1 TFC3'

Citation

If you use this package or its components in your research, please cite the following paper describing the ir_axioms framework and its use-cases:

Alexander Bondarenko, Maik Fröbe, Jan Heinrich Reimer, Benno Stein, Michael Völske, and Matthias Hagen. Axiomatic Retrieval Experimentation with ir_axioms. In 45th International ACM Conference on Research and Development in Information Retrieval (SIGIR 2022), July 2022. ACM.

You can use the following BibTeX entry for citation:

@InProceedings{bondarenko:2022d,
  author =    {Alexander Bondarenko and
               Maik Fr{\"o}be and
               {Jan Heinrich} Reimer and
               Benno Stein and
               Michael V{\"o}lske and
               Matthias Hagen},
  booktitle = {45th International ACM Conference on Research and Development
               in Information Retrieval (SIGIR 2022)},
  month =     jul,
  publisher = {ACM},
  site =      {Madrid, Spain},
  title =     {{Axiomatic Retrieval Experimentation with ir_axioms}},
  year =      2022
}

Development

To build ir_axioms and contribute to its development, you need the build, setuptools, and wheel packages:

pip install build setuptools wheel

(On most systems, these packages are already pre-installed.)

Installation

Install dependencies for developing the ir_axioms package:

pip install -e .

If you want to develop the Pyserini backend, install dependencies like this:

pip install -e ".[pyserini]"

If you want to develop the PyTerrier backend, install dependencies like this:

pip install -e ".[pyterrier]"

Testing

Install test dependencies:

pip install -e ".[test]"

Verify your changes against our test suite:

flake8 ir_axioms tests
pylint -E ir_axioms tests.unit --ignore-paths=^ir_axioms.backend
pytest ir_axioms/ tests/unit/ --ignore=ir_axioms/backend/

Please also add tests for the axioms or integrations you've added.
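As a hedged illustration of what such tests might check, the sketch below tests a toy axiom for three properties that many pairwise preferences should satisfy: it fires in the expected direction, it is antisymmetric (swapping the documents flips the sign), and it is indifferent between identical documents. The file name, `count_query_terms`, and `prefer_more_query_terms` are hypothetical; a real test would import the axiom you added from ir_axioms instead of defining a toy one.

```python
# Hypothetical file: tests/unit/test_toy_axiom.py (run with pytest).


def count_query_terms(query: str, text: str) -> int:
    """Count occurrences of all query terms in the text (case-insensitive)."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def prefer_more_query_terms(query: str, doc1: str, doc2: str) -> float:
    """Toy axiom: prefer the document with more query-term occurrences."""
    diff = count_query_terms(query, doc1) - count_query_terms(query, doc2)
    return float((diff > 0) - (diff < 0))


def test_prefers_document_with_more_query_terms():
    assert prefer_more_query_terms("axioms", "axioms and axioms", "no match") == 1.0


def test_preference_is_antisymmetric():
    doc1, doc2 = "retrieval axioms", "axioms axioms axioms"
    assert prefer_more_query_terms("axioms", doc1, doc2) == -prefer_more_query_terms(
        "axioms", doc2, doc1
    )


def test_no_preference_between_identical_documents():
    assert prefer_more_query_terms("axioms", "same text", "same text") == 0.0
```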

Testing backend integrations

Install test dependencies (replace <BACKEND> with either pyserini or pyterrier):

pip install -e ".[<BACKEND>]"

Verify your changes against our test suite:

pylint -E ir_axioms.backend.<BACKEND> tests.integration.<BACKEND>
pytest tests/integration/<BACKEND>/

Build wheel

A wheel for this package can be built by running:

python -m build

Support

If you hit any problems using ir_axioms or reproducing our experiments, please write us an email or file an issue.

We're happy to help!

License

This repository is released under the MIT license. If you use ir_axioms in your research, we'd be glad if you'd cite us.

Abstract

Axiomatic approaches to information retrieval have played a key role in determining basic constraints that characterize good retrieval models. Beyond their importance in retrieval theory, axioms have been operationalized to improve an initial ranking, to “guide” retrieval, or to explain some model’s rankings. However, recent open-source retrieval frameworks like PyTerrier and Pyserini, which made it easy to experiment with sparse and dense retrieval models, have not included any retrieval axiom support so far. To fill this gap, we propose ir_axioms, an open-source Python framework that integrates retrieval axioms with common retrieval frameworks. We include reference implementations for 25 retrieval axioms, as well as components for preference aggregation, re-ranking, and evaluation. New axioms can easily be defined by implementing an abstract data type or by intuitively combining existing axioms with Python operators or regression. Integration with PyTerrier and ir_datasets makes standard retrieval models, corpora, topics, and relevance judgments—including those used at TREC—immediately accessible for axiomatic experimentation. Our experiments on the TREC Deep Learning tracks showcase some potential research questions that ir_axioms can help to address.
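The abstract mentions combining existing axioms with Python operators. The sketch below shows the general pattern with operator overloading; it does not reproduce ir_axioms' actual classes or operator set. `ToyAxiom`, `sign`, `more_terms`, and `shorter` are hypothetical, and here `+` sums preferences, `*` scales them by a weight, and `&` keeps a preference only when both axioms agree on its direction.

```python
from typing import Callable

# Preference function: (query, doc1, doc2) -> score; > 0 means doc1 ranks higher.
PreferenceFn = Callable[[str, str, str], float]


def sign(x: float) -> float:
    return float((x > 0) - (x < 0))


class ToyAxiom:
    """Hypothetical axiom wrapper showing operator-based composition."""

    def __init__(self, fn: PreferenceFn) -> None:
        self.fn = fn

    def preference(self, query: str, doc1: str, doc2: str) -> float:
        return self.fn(query, doc1, doc2)

    def __add__(self, other: "ToyAxiom") -> "ToyAxiom":
        # Sum the two axioms' preferences.
        return ToyAxiom(lambda q, a, b: self.preference(q, a, b) + other.preference(q, a, b))

    def __mul__(self, weight: float) -> "ToyAxiom":
        # Scale this axiom's preference by a constant weight.
        return ToyAxiom(lambda q, a, b: weight * self.preference(q, a, b))

    __rmul__ = __mul__  # Allows `2 * axiom` as well as `axiom * 2`.

    def __and__(self, other: "ToyAxiom") -> "ToyAxiom":
        # Express a preference only when both axioms agree on its direction.
        def both(q: str, a: str, b: str) -> float:
            s1 = sign(self.preference(q, a, b))
            s2 = sign(other.preference(q, a, b))
            return s1 if s1 == s2 else 0.0

        return ToyAxiom(both)


def _count(query: str, doc: str) -> int:
    words = doc.lower().split()
    return sum(words.count(t) for t in query.lower().split())


# Two toy base axioms: more query terms wins; shorter document wins.
more_terms = ToyAxiom(lambda q, a, b: sign(_count(q, a) - _count(q, b)))
shorter = ToyAxiom(lambda q, a, b: sign(len(b.split()) - len(a.split())))

q, a, b = "axioms", "axioms axioms retrieval", "axioms"
print((more_terms + shorter).preference(q, a, b))      # 0.0
print((2 * more_terms + shorter).preference(q, a, b))  # 1.0
print((more_terms & shorter).preference(q, a, b))      # 0.0
```

Because each operator returns a new axiom object, compositions can be nested arbitrarily, which is what makes this style of declarative axiom combination convenient.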