
Update the documentation and citation of mauve #416

Merged · 3 commits · Nov 2, 2023
Changes from 1 commit
16 changes: 11 additions & 5 deletions metrics/mauve/mauve.py

@@ -27,20 +27,26 @@

 _CITATION = """\
 @inproceedings{pillutla-etal:mauve:neurips2021,
-title={MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers},
+title={{MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers}},
 author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
 booktitle = {NeurIPS},
 year = {2021}
 }

+@article{pillutla-etal:mauve:arxiv2022,
+title={{MAUVE Scores for Generative Models: Theory and Practice}},
+author={Pillutla, Krishna and Liu, Lang and Thickstun, John and Welleck, Sean and Swayamdipta, Swabha and Zellers, Rowan and Oh, Sewoong and Choi, Yejin and Harchaoui, Zaid},
+journal={arXiv Preprint},
+year={2022}
+}
 """

 _DESCRIPTION = """\
-MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE measure.
-
-MAUVE summarizes both Type I and Type II errors measured softly using Kullback–Leibler (KL) divergences.
+MAUVE is a measure of the statistical gap between two text distributions, e.g., how far the text written by a model is from the distribution of human text, using samples from both distributions.

-For details, see the MAUVE paper: https://arxiv.org/abs/2102.01454 (Neurips, 2021).
+MAUVE is obtained by computing Kullback–Leibler (KL) divergences between the two distributions in a quantized embedding space of a large language model.
+It can quantify differences in the quality of generated text based on the size of the model, the decoding algorithm, and the length of the generated text.
+MAUVE was found to correlate the strongest with human evaluations over baseline metrics for open-ended text generation.

 This metric is a wrapper around the official implementation of MAUVE:
 https://github.com/krishnap25/mauve
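The new description packs the whole computation into one sentence, so here is a toy sketch of the divergence-frontier idea it refers to. This is illustrative only: the names `kl` and `mauve_sketch`, the mixture grid, and the scaling constant are assumptions, and the official implementation first embeds each text with a large language model and quantizes the embeddings with k-means, whereas this sketch starts from two already-quantized histograms.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # Kullback–Leibler divergence KL(p || q) between two histograms.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def mauve_sketch(p, q, num_mixtures=25, scale=5.0):
    # Trace the divergence frontier with mixtures r = w*p + (1 - w)*q,
    # bracketed by the extreme points (1, 0) and (0, 1). The score is the
    # area under this curve: 1.0 when the histograms coincide, lower as
    # the distributions drift apart.
    xs, ys = [1.0], [0.0]
    for w in np.linspace(0.0, 1.0, num_mixtures):
        r = w * p + (1 - w) * q
        xs.append(float(np.exp(-scale * kl(q, r))))  # one softened KL term
        ys.append(float(np.exp(-scale * kl(p, r))))  # the other softened KL term
    xs.append(0.0)
    ys.append(1.0)
    # Trapezoidal area under the curve; xs decreases monotonically from 1 to 0.
    return sum(0.5 * (ys[i] + ys[i - 1]) * (xs[i - 1] - xs[i])
               for i in range(1, len(xs)))

# Histograms standing in for quantized embeddings of human vs. model text.
human = np.array([0.25, 0.25, 0.25, 0.25])
model = np.array([0.40, 0.30, 0.20, 0.10])
print(mauve_sketch(human, human))  # identical distributions -> 1.0
print(mauve_sketch(human, model))  # mismatched distributions -> lower score
```

The two softened KL terms are the "Type I and Type II errors" that the old description mentioned; the frontier summarizes both at once instead of picking a single mixture.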
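Since the docstring pitches the metric as a wrapper around the official implementation, a minimal usage sketch may also be worth pinning down here. It follows the example on the metric card; the two-string lists are placeholders, and MAUVE is a distributional measure, so real scores only become meaningful with many samples per side.

```python
import evaluate

# Loading "mauve" pulls in the official `mauve-text` package; the first
# compute() call also downloads a featurization model, so this needs
# network access (and ideally a GPU).
mauve = evaluate.load("mauve")

predictions = ["hello there", "general kenobi"]  # model-generated samples
references = ["hello there", "general kenobi"]   # human-written samples
results = mauve.compute(predictions=predictions, references=references)
print(results.mauve)  # score in (0, 1]; higher means the distributions are closer
```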