Fine tuning #147
Right, without fine-tuning it's not very valuable.
I recommend the following fine-tune: # [email protected]
import torch
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images
from datasets import Dataset
# Define the model path
model_name = "deepseek-ai/Janus-Pro-7B"
# Load the processor and tokenizer
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_name)
tokenizer = vl_chat_processor.tokenizer
# Load model
vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
# ---- Build a sample dataset ----
dataset_samples = [
    {"question": "What is this image about?", "image": "path/to/image1.jpg", "answer": "This image is about AI."},
    {"question": "Describe the object in the image.", "image": "path/to/image2.jpg", "answer": "It is a red car."},
    {"question": "What can you infer from this?", "image": "path/to/image3.jpg", "answer": "It seems like a festival."},
]
dataset = Dataset.from_dict({
    "question": [item["question"] for item in dataset_samples],
    "image": [item["image"] for item in dataset_samples],
    "answer": [item["answer"] for item in dataset_samples],
})
# ---- Prepare the training data ----
def preprocess_function(examples):
    conversations = [
        {
            "role": "<|User|>",
            "content": f"<image_placeholder>\n{q}",
            "images": [img],
        }
        for q, img in zip(examples["question"], examples["image"])
    ]
    pil_images = load_pil_images(conversations)
    inputs = vl_chat_processor(
        conversations=conversations, images=pil_images, force_batchify=True
    ).to(vl_gpt.device)
    # The processor returns token ids and pixel values; the fused embeddings
    # come from the model, so compute them explicitly rather than indexing
    # the processor output (which has no "inputs_embeds" key).
    inputs_embeds = vl_gpt.prepare_inputs_embeds(**inputs)
    labels = tokenizer(
        examples["answer"], padding="max_length", truncation=True, return_tensors="pt"
    )["input_ids"]
    return {"inputs_embeds": inputs_embeds, "labels": labels}
tokenized_datasets = dataset.map(preprocess_function, batched=True)
# ---- Train the model ----
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    weight_decay=0.01,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=vl_gpt,
    args=training_args,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
)
trainer.train()
# ---- Save the fine-tuned model ----
vl_gpt.save_pretrained("./fine_tuned_Janus_Pro_7B")
tokenizer.save_pretrained("./fine_tuned_Janus_Pro_7B")
print("Fine-tuning complete!")
@ntanhfai Thank you for the script! I'm actually more interested in fine-tuning it for image generation. Is there a parameter I should change to output an image instead of text? Or is it possible to do both?
@ntanhfai KeyError: 'inputs_embeds'???
Hi @ntanhfai, I am trying to get your basic script to work, but I'm having issues with the default HuggingFace Trainer. From the looks of it, I may have the same issue as @7125messi. I either get this error when using your dictionary structure from the
Or when I alter the structure to use
I'm not very familiar with the Trainer, but if I'm hitting the
Also, were you able to run the script you shared in the thread, or was that pseudo-code? Thank you!!
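For what it's worth, Trainer errors like the ones described above often come down to batching: `dataset.map` stores per-example fields, and the Trainer's default collator cannot stack variable-length sequences into a single tensor. One hedged workaround is passing a custom padding collator via `Trainer(data_collator=...)`. The sketch below is my own illustration, not from the script in this thread; the `pad_collate` name and the `pad_id` default are hypothetical choices.

```python
import torch

def pad_collate(features, pad_id=0):
    """Right-pad each 1-D tensor field to the longest example in the
    batch, then stack into a single (batch, seq_len) tensor."""
    batch = {}
    for key in features[0]:
        tensors = [torch.as_tensor(f[key]) for f in features]
        max_len = max(t.shape[0] for t in tensors)
        padded = [
            # Append pad_id tokens until every sequence has length max_len.
            torch.cat([t, torch.full((max_len - t.shape[0],), pad_id, dtype=t.dtype)])
            for t in tensors
        ]
        batch[key] = torch.stack(padded)
    return batch
```

It would then be wired in with `Trainer(..., data_collator=pad_collate)`. A real setup would likely need the tokenizer's actual pad token id, and masking of pad positions in `labels` (e.g. with -100) so they are ignored by the loss.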
It would be amazing if there were a fine-tuning script to use with this amazing model!