maestro Florence-2 fine-tuning #33
Conversation
Commits:
- …add_timm — fix: current Florence-2 training pipeline is missing `timm`
- …readme — updated project `README` to showcase new project profile
- …checkpoints — Feature/foundations of training checkpoints
- …ns_of_cli — merge commit; conflicts in `maestro/trainer/common/utils/metrics_tracing.py`, `maestro/trainer/models/florence_2/entities.py`, `maestro/trainer/models/florence_2/training.py`
- Add first scratch of implementation for maestro CLI
Overall it looks well prepared. Since it is an ongoing project, I left some general questions here, because I'm still learning the overall pipeline and code style the Roboflow team has established. (Luckily, it seems very similar to the transformers pipeline.)

- Is `training.py` in florence_2 missing or still in progress? (paligemma already has a training.py.)
- Definitely consider multi-GPU circumstances when thinking about real user scenarios (a rough sketch of one approach follows below).

This is my ongoing zero-shot object detection pipeline in HuggingFace: huggingface/transformers#32483
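Not part of the original thread — as a hedged illustration of the multi-GPU point above, here is a minimal sketch of wrapping a plain PyTorch loop with Hugging Face `accelerate`. The `model`, `optimizer`, and `train_loader` names are placeholders, not maestro APIs:

```python
# Minimal sketch, assuming an existing single-process training loop.
# model / optimizer / train_loader are placeholders, not maestro objects.
from accelerate import Accelerator

accelerator = Accelerator()  # picks up single-GPU, DDP, etc. from the launch environment
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

for batch in train_loader:
    outputs = model(**batch)
    loss = outputs.loss
    accelerator.backward(loss)  # replaces loss.backward(); handles gradient sync
    optimizer.step()
    optimizer.zero_grad()
```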
```python
num_workers=config.num_workers,
test_loaders_workers=config.val_num_workers,
)
peft_model = prepare_peft_model(
```
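For orientation only (this is not the PR's implementation): a minimal sketch of what a `prepare_peft_model` helper like the one called above typically does with the `peft` library. The LoRA hyperparameters and `target_modules` below are illustrative assumptions, not values taken from maestro:

```python
# Hedged sketch of a prepare_peft_model helper built on the peft library.
from peft import LoraConfig, get_peft_model

def prepare_peft_model(model, r=8, alpha=16, dropout=0.05):
    config = LoraConfig(
        r=r,
        lora_alpha=alpha,
        lora_dropout=dropout,
        # assumed attention projection names; real target modules depend on the model
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    peft_model = get_peft_model(model, config)
    peft_model.print_trainable_parameters()  # sanity check: only LoRA params should train
    return peft_model
```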
Approaching this with peft is also a good way to start. FYI, I have tried three different techniques:

- Full fine-tuning
- Partial fine-tuning (freezing the encoder-like part)
- peft

It turns out 2 and 3 are robust to other hyperparameter options, while I couldn't find any stable configuration for 1.
We don't have 1 and 2 yet. I'm just wondering how to solve 2. In theory, users might want to freeze larger or smaller parts of the graph. Do you think such flexibility would be useful, or can we just offer a pre-defined freeze? (See the sketch below for what a configurable freeze could look like.)
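To make the flexibility question concrete — a hypothetical sketch of a configurable partial freeze (option 2 above). The module names here are stand-ins, not Florence-2's actual layout:

```python
import torch.nn as nn

def freeze_modules(model: nn.Module, prefixes: tuple[str, ...] = ("vision_tower",)) -> None:
    """Freeze every parameter whose name starts with one of the given prefixes."""
    for name, param in model.named_parameters():
        if any(name.startswith(p) for p in prefixes):
            param.requires_grad = False

# Demo on a stand-in module; a pre-defined freeze would hard-code the prefixes,
# while a flexible API would expose them to the user.
model = nn.ModuleDict({"vision_tower": nn.Linear(4, 4), "language_model": nn.Linear(4, 4)})
freeze_modules(model, prefixes=("vision_tower",))
print([n for n, p in model.named_parameters() if p.requires_grad])  # only language_model.*
```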
I think supporting only peft would be enough for now (meaning the others are not the highest priority), since the project is just starting to grow. If there is retention or other user inquiries, we can add support at that point.
I think so too! For the time being, let's just keep in mind that such an option may arise at some point.
```python
# Postprocess prediction for mean average precision calculation
prediction = processor.post_process_generation(generated_text, task="<OD>", image_size=image.size)
prediction = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, prediction, resolution_wh=image.size)
prediction = prediction[np.isin(prediction["class_name"], classes)]
```
I also agree that this is one option: using the text-based output to calculate traditional OD metrics. However, some predictions are very close to the `classes` but `np.isin` will not be able to catch them, e.g. prediction: apple, ground truth: apples.

I also considered calculating the distance between vectorized text embeddings, or using other heuristic methods such as CIDEr, to make this more robust. It would be great to consider VLM metrics as well, e.g. CIDEr, BLEU, etc.
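As one possible remedy for the exact-match problem described above — a hedged sketch that normalizes names and falls back to `difflib` similarity instead of raw `np.isin`; the crude singularization and the 0.8 threshold are arbitrary assumptions:

```python
# Hypothetical sketch: map a free-form predicted class name onto the known
# class list with normalization + fuzzy matching, e.g. "apples" -> "apple".
from difflib import SequenceMatcher

def match_class(predicted: str, classes: list[str], threshold: float = 0.8):
    pred = predicted.strip().lower().rstrip("s")  # crude plural handling
    best, best_score = None, 0.0
    for cls in classes:
        score = SequenceMatcher(None, pred, cls.strip().lower().rstrip("s")).ratio()
        if score > best_score:
            best, best_score = cls, score
    return best if best_score >= threshold else None

print(match_class("apples", ["apple", "banana"]))  # -> "apple"
```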
> I also agree that this is one option: using the text-based output to calculate traditional OD metrics. However, some predictions are very close to the `classes` but `np.isin` will not be able to catch them, e.g. prediction: apple, ground truth: apples.

Good catch! I've experienced that myself. I don't have the time to address it right now, but I'll add a task for it. Maybe one of the external contributors would like to implement this feature.
> I also considered calculating the distance between vectorized text embeddings, or using other heuristic methods such as CIDEr, to make this more robust. It would be great to consider VLM metrics as well, e.g. CIDEr, BLEU, etc.

Do you have any resources (papers) where I could read about alternative metrics?
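Not an answer from the thread, but for reference: CIDEr was introduced by Vedantam et al., "CIDEr: Consensus-based Image Description Evaluation" (CVPR 2015), and BLEU by Papineni et al., "BLEU: a Method for Automatic Evaluation of Machine Translation" (ACL 2002). A minimal example of a BLEU-style overlap score using `nltk` (illustrative only, not maestro code):

```python
# Sentence-level BLEU between a generated caption and a reference, via nltk.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "red", "apple", "on", "a", "table"]]   # list of tokenized references
candidate = ["a", "red", "apple", "on", "the", "table"]   # tokenized hypothesis
smooth = SmoothingFunction().method1  # avoids zero scores for short texts
print(sentence_bleu(reference, candidate, smoothing_function=smooth))
```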
```python
with torch.amp.autocast(device.type, torch.float16):
lora_layers = filter(lambda p: p.requires_grad, peft_model.parameters())
optimizer = optim.SGD(lora_layers, lr=learning_rate)
scheduler = optim.lr_scheduler.CosineAnnealingLR(
```
Any reason for using `CosineAnnealingLR`?
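For context on the question — `CosineAnnealingLR` decays the learning rate from its initial value toward `eta_min` along a half cosine over `T_max` steps. A standalone demo (not maestro code):

```python
# Print the LR produced by CosineAnnealingLR over 10 steps.
import torch
from torch import optim

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = optim.SGD(params, lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.0)

for step in range(10):
    optimizer.step()   # step the optimizer first to avoid a scheduler ordering warning
    scheduler.step()
    print(step, round(optimizer.param_groups[0]["lr"], 4))
```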
Hi @SangbumChoi 👋🏻 First of all, thank you so much for taking the time to look at the code.
Since this PR is merged, let me run this repo and discuss in Slack!
Referenced in this pull request: `README.md`; update `maestro` CLI with `train` and `evaluate` commands; `MeanAveragePrecisionMetric`.