diff --git a/.gitignore b/.gitignore
index 82f92755..8cac0bcd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,9 +3,6 @@ __pycache__/
*.py[cod]
*$py.class
-# C extensions
-*.so
-
# Distribution / packaging
.Python
build/
@@ -26,16 +23,6 @@ share/python-wheels/
*.egg
MANIFEST
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
# Unit test / coverage reports
htmlcov/
.tox/
@@ -51,112 +38,14 @@ coverage.xml
.pytest_cache/
cover/
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
# Jupyter Notebook
.ipynb_checkpoints
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-# in version control.
-# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
-.pdm.toml
-.pdm-python
-.pdm-build/
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
# Environments
.env
.venv
env/
venv/
ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.DS_Store
\ No newline at end of file
diff --git a/README.md b/README.md
index 3984c9cb..7b8b48bc 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,36 @@
-# respect
-Home for the paper "Retrospective Learning from Interactions"
+# Retrospective Learning from Interactions
+
+Project page:
+
+Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frustration, or pivoting to an alternative task. Such signals are task-independent and occupy a relatively constrained subspace of language, allowing the LLM to identify them even if it fails on the actual task. This creates an avenue for continually learning from interactions without additional annotations. We introduce ReSpect, a method to learn from such signals in past interactions via retrospection. We deploy ReSpect in a new multimodal interaction scenario, where humans instruct an LLM to solve an abstract reasoning task with a combinatorial solution space. Through thousands of interactions with humans, we show how ReSpect gradually improves task completion rate from 31% to 82%, all without any external annotation.
+
+We deploy an LLM policy $\pi_{\theta_{\rho}}(a \mid x)$ to interact with users in multi-turn interactions. Following each round, the LLM reasons retrospectively about each of its actions to decode feedback given the interaction context, including follow-up utterances. After each round, the model is retrained using all data aggregated so far, $D_{\leq \rho}$. The LLM improves over time without any external annotations: in our experiments, the task completion rate improves from 31% to 82% over six rounds.
+
+MultiRef is a multi-turn reference game. A speaker and a listener both observe a shared set of tangram shapes, but in different orders. The goal of the speaker is to describe a subset of target shapes for the listener to select. Because the target set contains multiple abstract shapes, humans often communicate the targets gradually over multiple turns. As an interaction progresses, the speaker naturally produces implicit feedback signals that validate or reject the listener's actions.
+
+We present deployment results across three rounds for six concurrent systems, and three more rounds for the top system (B-SUP), together with human-human references (HH) and a redeployment of the initial policy $\pi_{\theta_0}$ (CONTROL):
+
+- Left: interaction-level success rate ($\uparrow$, higher is better).
+- Center: interaction-level efficiency, measured by the number of turns per interaction ($\downarrow$, lower is better).
+- Right: micro-level performance, measured by click accuracy ($\uparrow$).
+
+More granularly, we present the turn-level performance of B-SUP and controls, evaluated by post-hoc human annotation:
+
+- Left: % of turns where the policy's action $\hat a$ exactly matches the human listener's action $a^*$ ($\uparrow$).
+- Center: similarity between the policy's action and the human listener's action ($\uparrow$). Even actions that receive negative feedback in deployment (NEG FB) become increasingly similar to human actions.
+- Right: % of turns annotated as having received positive implicit feedback from human listeners ($\uparrow$).
+
+## Citation
+
+```bibtex
+@misc{chen2024retrospective,
+  title={Retrospective Learning from Interactions},
+  author={Zizhao Chen and Mustafa Omer Gul and Yiwei Chen and Gloria Geng and Anne Wu and Yoav Artzi},
+  year={2024},
+  eprint={2410.13852},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2410.13852},
+}
+```
+