I have generated a dataset #261

abrahimzaman360 · 2023-10-07T06:42:40Z

I generated the dataset using the method shown in the docs.
Now, when I fine-tune the BLOOM 1.3B base model on it, I get this error:

[2023-10-07 06:39:28,529] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainable params: 1179648 || all params: 1066493952 || trainable%: 0.11060990995662018
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xturing/datasets/instruction_dataset.py", line 89, in from_jsonl
    data["text"].append(json_line["text"])
KeyError: 'text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/main.py", line 8, in <module>
    dataset = InstructionDataset('./dataset/tasks.jsonl')
  File "/usr/local/lib/python3.10/dist-packages/xturing/datasets/instruction_dataset.py", line 64, in __init__
    self.data = {"train": HFDataset.from_dict(self.from_jsonl(path))}
  File "/usr/local/lib/python3.10/dist-packages/xturing/datasets/instruction_dataset.py", line 93, in from_jsonl
    raise ValueError(
ValueError: The jsonl file should have keys text, instruction and target

abrahimzaman360 · 2023-10-07T06:44:59Z

{"id": "seed_task_2", "instruction": "What are the major matters related to the enactment and revision?", "instances": [{"input": "", "output": "Answer: The major matters include integrating existing , addressing overlaps and conflicts, and incorporating parts."}]}

This is a sample row generated by your InstructionSet Dataset method using custom data.

StochasticRomanAgeev · 2023-10-30T08:11:06Z

Hi @abrahimzaman360,
Can you please share the generation code?

StochasticRomanAgeev closed this as completed Dec 5, 2023
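
For readers who hit the same error: the ValueError above says the JSONL file needs the keys text, instruction and target, while the sample row posted in this thread has id, instruction and instances. The thread does not confirm a fix, so the following is only a minimal sketch of one possible workaround that flattens each generated row into the expected keys. The helper names `convert_row` and `convert_file` are hypothetical, and mapping each instance's "input" to text and "output" to target is an assumption, not something the maintainers stated.

```python
import json


def convert_row(row):
    """Flatten one generated row into records keyed by text/instruction/target.

    Assumption (not confirmed in the thread): each instance's "input"
    becomes "text" and its "output" becomes "target".
    """
    records = []
    for instance in row.get("instances", []):
        records.append(
            {
                "text": instance.get("input", ""),
                "instruction": row["instruction"],
                "target": instance.get("output", ""),
            }
        )
    return records


def convert_file(src_path, dst_path):
    """Read generated rows from src_path and write flattened rows to dst_path."""
    with open(src_path, encoding="utf-8") as src, open(
        dst_path, "w", encoding="utf-8"
    ) as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            for record in convert_row(json.loads(line)):
                dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling `convert_file('./dataset/tasks.jsonl', './dataset/tasks_converted.jsonl')` and then passing the converted path to `InstructionDataset` should at least get past the key check shown in the traceback. Whether the loader has further requirements is not established here.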
