I have generated a dataset #261

Closed
abrahimzaman360 opened this issue Oct 7, 2023 · 2 comments

Comments

@abrahimzaman360

I generated the dataset using the method shown in the docs.
Now, when I try to fine-tune the BLOOM 1.3B base model on it, I get this error:

[2023-10-07 06:39:28,529] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainable params: 1179648 || all params: 1066493952 || trainable%: 0.11060990995662018
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xturing/datasets/instruction_dataset.py", line 89, in from_jsonl
    data["text"].append(json_line["text"])
KeyError: 'text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/main.py", line 8, in <module>
    dataset = InstructionDataset('./dataset/tasks.jsonl')
  File "/usr/local/lib/python3.10/dist-packages/xturing/datasets/instruction_dataset.py", line 64, in __init__
    self.data = {"train": HFDataset.from_dict(self.from_jsonl(path))}
  File "/usr/local/lib/python3.10/dist-packages/xturing/datasets/instruction_dataset.py", line 93, in from_jsonl
    raise ValueError(
ValueError: The jsonl file should have keys text, instruction and target
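
The final error message names three required keys: text, instruction and target. A minimal sketch for finding which rows of a JSONL file lack them, using only the standard library; the helper name `find_bad_rows` is hypothetical and not part of xturing:

```python
import json

# Keys named in the ValueError message above.
REQUIRED_KEYS = {"text", "instruction", "target"}


def find_bad_rows(lines):
    """Return (line_number, missing_keys) for every row lacking a required key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        missing = REQUIRED_KEYS - set(row)
        if missing:
            problems.append((number, sorted(missing)))
    return problems


# Usage (path taken from the traceback above):
# with open("./dataset/tasks.jsonl", encoding="utf-8") as f:
#     for number, missing in find_bad_rows(f):
#         print(f"line {number}: missing {missing}")
```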

@abrahimzaman360
Author

{"id": "seed_task_2", "instruction": "What are the major matters related to the enactment and revision?", "instances": [{"input": "", "output": "Answer: The major matters include integrating existing , addressing overlaps and conflicts, and incorporating parts."}]}

This is a sample row generated from my custom data using your instruction dataset generation method.
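
The row above has the keys id, instruction and instances, while the error asks for text, instruction and target. One possible workaround is to flatten each instance into its own record before loading the file. This is only a sketch: mapping `input` to `text` and `output` to `target` is an assumption based on the error message, and `convert_row` and `convert_file` are hypothetical helper names, not part of xturing.

```python
import json


def convert_row(row):
    """Flatten one generated row into records with text/instruction/target keys."""
    records = []
    for instance in row.get("instances", []):
        records.append({
            "instruction": row["instruction"],
            # Assumption: the instance "input" plays the role of "text".
            "text": instance.get("input", ""),
            # Assumption: the instance "output" plays the role of "target".
            "target": instance.get("output", ""),
        })
    return records


def convert_file(src_path, dst_path):
    """Read a generated JSONL file and write the flattened records as JSONL."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                for record in convert_row(json.loads(line)):
                    dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The converted file, rather than the original `tasks.jsonl`, would then be passed to `InstructionDataset`. Whether the generated format is intended to be loaded directly is not settled in this thread.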

@StochasticRomanAgeev
Contributor

Hi @abrahimzaman360,
Can you please share the generation code?
