To further fine-tune language models and align them with human preferences, we need to collect preference feedback on model responses. There are two types of data points to collect: unpaired and paired.
To collect unpaired preference data, we want to add a thumbs up/down button to each assistant response. When the user presses one of these buttons, we want to record the following information:
immediate model response
previous user message
conversation ID
model ID
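As a rough illustration of the fields listed above, the record could be sketched as follows. This is a minimal sketch, not a proposed schema: the names `FeedbackRecord`, `Rating`, and `to_json` are hypothetical, and the `rating` field is an assumption (the list above does not name it, but a thumbs up/down press implies some label must be stored).

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Rating(Enum):
    """Hypothetical label for the button the user pressed (an assumption)."""
    UP = "up"
    DOWN = "down"


@dataclass(frozen=True)
class FeedbackRecord:
    """One unpaired preference data point; field names are illustrative."""
    model_response: str          # immediate model response
    previous_user_message: str   # previous user message
    conversation_id: str         # conversation ID
    model_id: str                # model ID
    rating: Rating               # assumed: which button was pressed


def to_json(record: FeedbackRecord) -> str:
    """Serialize a record to JSON, storing the rating as a plain string."""
    payload = asdict(record)
    payload["rating"] = record.rating.value
    return json.dumps(payload)


example = FeedbackRecord(
    model_response="Paris is the capital of France.",
    previous_user_message="What is the capital of France?",
    conversation_id="conv-123",
    model_id="example-model",
    rating=Rating.UP,
)
print(to_json(example))
```

Keeping the record immutable (`frozen=True`) is one way to reflect that a feedback event, once captured, should not be edited in place; whether a later press should overwrite or append a new record is left open here.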
Unpaired preference data is used for unpaired preference optimization, as described by the following paper: UPO: Unpaired Preference Optimization for Large Language Models.
For instance, consider how the UI appears in the following popular chat assistant:
Example of a response with thumbs up/down buttons:
Example of the thumbs up/down buttons:
This issue depends on #13 and #394