
Fine tuning #147

Open
a-hamdi opened this issue Feb 6, 2025 · 5 comments

Comments


a-hamdi commented Feb 6, 2025

It would be amazing if there were a fine-tuning script for this amazing model!


ntanhfai commented Feb 7, 2025

Right, without fine-tuning it's not very valuable.


ntanhfai commented Feb 7, 2025

I recommend the following fine-tuning script:

# [email protected]

import torch
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images
from datasets import Dataset

# Define the model path
model_name = "deepseek-ai/Janus-Pro-7B"

# Load processor and tokenizer
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_name)
tokenizer = vl_chat_processor.tokenizer

# Load model
vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda()  # keep in train mode for fine-tuning (no .eval())

# ---- Build a sample dataset ----
dataset_samples = [
    {"question": "What is this image about?", "image": "path/to/image1.jpg", "answer": "This image is about AI."},
    {"question": "Describe the object in the image.", "image": "path/to/image2.jpg", "answer": "It is a red car."},
    {"question": "What can you infer from this?", "image": "path/to/image3.jpg", "answer": "It seems like a festival."},
]

dataset = Dataset.from_dict({
    "question": [item["question"] for item in dataset_samples],
    "image": [item["image"] for item in dataset_samples],
    "answer": [item["answer"] for item in dataset_samples],
})

# ---- Prepare the training data ----
def preprocess_function(examples):
    conversations = [
        {
            "role": "<|User|>",
            "content": f"<image_placeholder>\n{q}",
            "images": [img],
        }
        for q, img in zip(examples["question"], examples["image"])
    ]
    
    pil_images = load_pil_images(conversations)
    inputs = vl_chat_processor(
        conversations=conversations, images=pil_images, force_batchify=True
    ).to(vl_gpt.device)
    
    labels = tokenizer(examples["answer"], padding="max_length", truncation=True, return_tensors="pt")["input_ids"]
    
    return {"inputs_embeds": inputs["inputs_embeds"], "labels": labels}

tokenized_datasets = dataset.map(preprocess_function, batched=True)

# ---- Train the model ----
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    weight_decay=0.01,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=True
)

trainer = Trainer(
    model=vl_gpt,
    args=training_args,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
)

trainer.train()

# ---- Save the fine-tuned model ----
vl_gpt.save_pretrained("./fine_tuned_Janus_Pro_7B")
tokenizer.save_pretrained("./fine_tuned_Janus_Pro_7B")

print("Fine-tuning complete!")


a-hamdi commented Feb 7, 2025

@ntanhfai Thank you for the script! I'm actually more interested in fine-tuning it for image generation. Is there a parameter I should change to output an image instead of text? Or is it possible to do both?

@7125messi

@ntanhfai KeyError: 'inputs_embeds'???
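
A likely cause of that KeyError: the `VLChatProcessor` output carries `input_ids` and `pixel_values`, not a precomputed `inputs_embeds` key; in the Janus inference example the embeddings are produced by the model itself via `vl_gpt.prepare_inputs_embeds(**prepare_inputs)`, which the script above never calls. A minimal stand-in (plain torch, not the Janus API) for what that step does on the text side:

```python
import torch
import torch.nn as nn

# Toy embedding table standing in for the model's token embedder; Janus's
# prepare_inputs_embeds additionally splices image-encoder features into
# the text embedding sequence.
embed = nn.Embedding(num_embeddings=100, embedding_dim=8)

input_ids = torch.tensor([[1, 5, 7]])   # what the processor actually returns
inputs_embeds = embed(input_ids)        # what the model has to compute from it

print(inputs_embeds.shape)              # torch.Size([1, 3, 8])
```

So `inputs["inputs_embeds"]` in `preprocess_function` fails because that key only exists after the model's embedding step has been run on the processor output.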


SouLeo commented Feb 10, 2025

Hi @ntanhfai , I am trying to get your basic script to work, but I'm having issues with the default HuggingFace Trainer. From the looks of it, I may have the same issue as @7125messi.

I either get this error, when using your dictionary structure from the preprocess_function:

TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_embeds'

Or when I alter the structure to use input_ids, I get this error:

TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I'm not very familiar with the Trainer, but if I'm hitting the _forward_unimplemented() method, does that mean the code base doesn't support fine-tuning? I'm sorry for my confusion.

Also, were you able to run the script you shared in the thread or was that pseudo-code?

Thank you!!
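
For what it's worth, the `_forward_unimplemented` error above can be reproduced without Janus at all: it is the default `forward` stub on `torch.nn.Module`, so hitting it means the Trainer called the top-level model directly while that model class never defines its own `forward()`. A minimal reproduction (toy module, not the Janus code):

```python
import torch
import torch.nn as nn

class NoForward(nn.Module):
    """Toy module that never defines forward(), mirroring a wrapper model
    whose top-level class leaves nn.Module's default stub in place."""
    def __init__(self):
        super().__init__()
        self.inner = nn.Linear(4, 4)

model = NoForward()
try:
    model(input_ids=torch.zeros(1, 4, dtype=torch.long))
except TypeError as e:
    print(e)  # same "unexpected keyword argument" TypeError as in the thread
```

This suggests the fix is not a different key name but routing the batch to a submodule that does implement `forward` (e.g. the wrapped language model), for instance via a custom Trainer `compute_loss`.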
