This issue was moved to a discussion.

Newbie Qs: RLHF fine-tuning & dataset #6470

Closed
vtharmalingam opened this issue Dec 28, 2024 · 0 comments
Labels
wontfix This will not be worked on

Comments

vtharmalingam commented Dec 28, 2024

First off, thank you for the awesome library!

I want to fine-tune Qwen with RLHF.

Here is the use case: my LLM responds to user queries, and each query and response is logged for human validation. The human feedback is a scalar value between 0 and 1. These query, response, and feedback records make up the fine-tuning dataset.

My questions are:

What dataset format is accepted? Will the format below work for fine-tuning? Also, is the dataset structure flexible enough to let me add an extra key/value to the JSON for my domain-specific needs? If so, which Python file or configuration do I need to edit to add the new field?

```json
[
    {
        "query": "What are the benefits of regular exercise?",
        "response": "Regular exercise boosts physical health, improves mental health, and enhances overall well-being. It helps in weight management and reduces the risk of chronic diseases.",
        "feedback": 0.9
    },
    {
        "query": "Explain the theory of relativity in simple terms.",
        "response": "The theory of relativity states that the laws of physics are the same for all non-accelerating observers, and that the speed of light is constant no matter how fast you are moving. It includes both special and general relativity.",
        "feedback": 0.8
    }
]
```
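To make the intended structure concrete, here is a minimal sketch of how records in this shape could be loaded and sanity-checked with plain Python before any training. It does not use the library's API. The function name `validate_records` and the extra `domain` key are hypothetical, chosen only for illustration, and the sample records are shortened versions of the ones above.

```python
import json

# Field names taken from the sample above.
REQUIRED_KEYS = ("query", "response", "feedback")


def validate_records(records):
    """Check that every record has the required fields and a feedback score in [0, 1].

    Extra keys (for example a domain tag) are left untouched.
    Hypothetical helper, not part of any library.
    """
    for i, rec in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
        score = rec["feedback"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ValueError(f"record {i}: feedback must be a number")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"record {i}: feedback {score} is outside [0, 1]")
    return records


# Shortened sample; "domain" is an assumed extra key, not a field any tool requires.
sample = json.loads("""
[
    {"query": "What are the benefits of regular exercise?",
     "response": "Regular exercise boosts physical health.",
     "feedback": 0.9},
    {"query": "Explain the theory of relativity in simple terms.",
     "response": "The laws of physics are the same for all non-accelerating observers.",
     "feedback": 0.8,
     "domain": "physics"}
]
""")

validated = validate_records(sample)
print(f"{len(validated)} records passed validation")
```

A check like this only confirms that the file is well-formed. Whether a training pipeline accepts these field names, or reads the extra key, depends on that pipeline's own dataset configuration.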

Thanks,
Tharma

github-actions bot added the pending (This problem is yet to be addressed) label on Dec 28, 2024
Repository owner locked and limited conversation to collaborators on Jan 9, 2025
hiyouga converted this issue into discussion #6587 on Jan 9, 2025
hiyouga added the wontfix (This will not be worked on) label and removed the pending (This problem is yet to be addressed) label on Jan 9, 2025
