
Fine tuning #147

Open
a-hamdi opened this issue Feb 6, 2025 · 5 comments

Comments


a-hamdi commented Feb 6, 2025

It would be amazing if there were a fine-tuning script for this amazing model!


ntanhfai commented Feb 7, 2025

Right, without fine-tuning it's not very valuable.


ntanhfai commented Feb 7, 2025

I recommend the following fine-tuning script:

# [email protected]

import torch
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images
from datasets import Dataset

# Define the model path
model_name = "deepseek-ai/Janus-Pro-7B"

# Load processor and tokenizer
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_name)
tokenizer = vl_chat_processor.tokenizer

# Load model
vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda()  # keep in train mode for fine-tuning (no .eval())

# ---- Build a sample dataset ----
dataset_samples = [
    {"question": "What is this image about?", "image": "path/to/image1.jpg", "answer": "This image is about AI."},
    {"question": "Describe the object in the image.", "image": "path/to/image2.jpg", "answer": "It is a red car."},
    {"question": "What can you infer from this?", "image": "path/to/image3.jpg", "answer": "It seems like a festival."},
]

dataset = Dataset.from_dict({
    "question": [item["question"] for item in dataset_samples],
    "image": [item["image"] for item in dataset_samples],
    "answer": [item["answer"] for item in dataset_samples],
})

# ---- Prepare the training data ----
def preprocess_function(examples):
    conversations = [
        {
            "role": "<|User|>",
            "content": f"<image_placeholder>\n{q}",
            "images": [img],
        }
        for q, img in zip(examples["question"], examples["image"])
    ]
    
    pil_images = load_pil_images(conversations)
    inputs = vl_chat_processor(
        conversations=conversations, images=pil_images, force_batchify=True
    ).to(vl_gpt.device)
    
    labels = tokenizer(examples["answer"], padding="max_length", truncation=True, return_tensors="pt")["input_ids"]
    
    return {"inputs_embeds": inputs["inputs_embeds"], "labels": labels}

tokenized_datasets = dataset.map(preprocess_function, batched=True)

# ---- Train the model ----
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    weight_decay=0.01,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=True
)

trainer = Trainer(
    model=vl_gpt,
    args=training_args,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
)

trainer.train()

# ---- Save the fine-tuned model ----
vl_gpt.save_pretrained("./fine_tuned_Janus_Pro_7B")
tokenizer.save_pretrained("./fine_tuned_Janus_Pro_7B")

print("Fine-tuning complete!")


a-hamdi commented Feb 7, 2025

@ntanhfai Thank you for the script! I'm actually more interested in fine-tuning it for image generation. Is there a parameter I should change to output an image instead of text? Or is it possible to do both?

@7125messi

@ntanhfai KeyError: 'inputs_embeds'???
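
A likely cause of that KeyError: the `VLChatProcessor` output carries `input_ids` and `pixel_values`, not a precomputed `inputs_embeds` key; in the Janus inference example the embeddings are produced by the model itself via `vl_gpt.prepare_inputs_embeds(**prepare_inputs)`, which the script above never calls. A minimal stand-in (plain torch, not the Janus API) for what that step does on the text side:

```python
import torch
import torch.nn as nn

# Toy embedding table standing in for the model's token embedder; Janus's
# prepare_inputs_embeds additionally splices image-encoder features into
# the text embedding sequence.
embed = nn.Embedding(num_embeddings=100, embedding_dim=8)

input_ids = torch.tensor([[1, 5, 7]])   # what the processor actually returns
inputs_embeds = embed(input_ids)        # what the model has to compute from it

print(inputs_embeds.shape)              # torch.Size([1, 3, 8])
```

So `inputs["inputs_embeds"]` in `preprocess_function` fails because that key only exists after the model's embedding step has been run on the processor output.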


SouLeo commented Feb 10, 2025

Hi @ntanhfai , I am trying to get your basic script to work, but I'm having issues with the default HuggingFace Trainer. From the looks of it, I may have the same issue as @7125messi.

I either get this error, when using your dictionary structure from the preprocess_function:

TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_embeds'

Or when I alter the structure to use input_ids, I get this error:

TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I'm not very familiar with the Trainer, but if I'm hitting the _forward_unimplemented() method, does that mean the code base doesn't support fine-tuning? I'm sorry for my confusion.

Also, were you able to run the script you shared in the thread or was that pseudo-code?

Thank you!!
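
For what it's worth, the `_forward_unimplemented` error above can be reproduced without Janus at all: it is the default `forward` stub on `torch.nn.Module`, so hitting it means the Trainer called the top-level model directly while that model class never defines its own `forward()`. A minimal reproduction (toy module, not the Janus code):

```python
import torch
import torch.nn as nn

class NoForward(nn.Module):
    """Toy module that never defines forward(), mirroring a wrapper model
    whose top-level class leaves nn.Module's default stub in place."""
    def __init__(self):
        super().__init__()
        self.inner = nn.Linear(4, 4)

model = NoForward()
try:
    model(input_ids=torch.zeros(1, 4, dtype=torch.long))
except TypeError as e:
    print(e)  # same "unexpected keyword argument" TypeError as in the thread
```

This suggests the fix is not a different key name but routing the batch to a submodule that does implement `forward` (e.g. the wrapped language model), for instance via a custom Trainer `compute_loss`.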
