Loss staying high while trying to tune network #166

Open
picosankaricpp opened this issue Feb 15, 2025 · 3 comments

Comments

@picosankaricpp

I'm trying to fine-tune the obj365+coco checkpoint on my custom dataset, which has a single class. This is the class information from the COCO annotations: "categories": [{"supercategory": "none", "id": 0, "name": "0"}]}. I have set num_classes: 2 in custom_detection.yml and made sure that remap_mscoco_category: False is set in the same file. I'm training from PowerShell with this command: $env:CUDA_VISIBLE_DEVICES="0"; python train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml --use-amp --seed=0 -t dfine_s_obj2coco.pth. My loss stays high (20-30) during training, and every evaluation comes back as 0. I know the dataset is trainable because I have successfully trained a YOLO network on it. What am I doing wrong here?
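Before changing the training setup, it may help to check the annotation file itself. The following is a minimal sketch, not part of the D-FINE repository: the function name `summarize_coco` and the sample dictionary are hypothetical, and it only assumes the standard COCO layout (`images`, `annotations`, `categories`, with `bbox` given as `[x, y, width, height]`). It reports which category ids are declared, whether any annotation uses an undeclared id, how many images have no annotations, and how many boxes have zero or negative size. The `largest_category_id_plus_one` field is only meaningful under the assumption that, with remapping disabled, category ids are used directly as class indices.

```python
from collections import Counter


def summarize_coco(coco: dict) -> dict:
    """Summarize category ids and basic annotation health for a COCO-style dict."""
    category_ids = sorted(c["id"] for c in coco.get("categories", []))
    annotations = coco.get("annotations", [])

    # Count how many annotations reference each category id.
    per_category = Counter(a["category_id"] for a in annotations)

    # Category ids used by annotations but never declared in "categories".
    unknown = sorted(set(per_category) - set(category_ids))

    # Images that no annotation points to.
    image_ids = {img["id"] for img in coco.get("images", [])}
    annotated_ids = {a["image_id"] for a in annotations}

    # COCO boxes are [x, y, width, height]; width or height <= 0 is degenerate.
    degenerate = sum(1 for a in annotations if a["bbox"][2] <= 0 or a["bbox"][3] <= 0)

    return {
        "category_ids": category_ids,
        # Assumption: ids are used as class indices when remapping is off.
        "largest_category_id_plus_one": max(category_ids) + 1 if category_ids else 0,
        "annotations_per_category": dict(per_category),
        "unknown_category_ids": unknown,
        "images_without_annotations": len(image_ids - annotated_ids),
        "degenerate_boxes": degenerate,
    }


# Hypothetical sample that mirrors the single category quoted in the issue.
sample = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"supercategory": "none", "id": 0, "name": "0"}],
    "annotations": [
        {"image_id": 1, "category_id": 0, "bbox": [10, 10, 20, 20]},
    ],
}
print(summarize_coco(sample))
```

To run it on a real file, load the annotations with `json.load` and pass the resulting dictionary to `summarize_coco`.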

@mirza298

Hi there, what is the size of your dataset? How many epochs did you train for? Have you experimented with the hyperparameters? What is the goal of your dataset—what are you trying to detect?

You might want to try DEIM. The repository is similar, and according to their paper, it should converge way faster than RT-DETR and D-FINE.

Have you tried converting the D-FINE (or RT-DETR) model to ONNX and TensorRT? If so, what has your experience been with the model's inference speed? Did you encounter any errors during the conversion?

One more question: Have you encountered this issue on Windows or Linux with your custom dataset:

[rank0]: NotImplementedError: Caught NotImplementedError in DataLoader worker process 0.

I successfully trained D-FINE and RT-DETR a month ago, but for some reason I can't get past this error now. I'm really pissed off.

@picosankaricpp
Author

picosankaricpp commented Feb 15, 2025

My dataset is small (180 images), and I am trying to detect visible blobs in my images. The images are greyscale, but I kept them in RGB format (3 channels). I can look into DEIM, but I don't understand why this isn't working. I trained for 64 epochs and did not change any of the hyperparameters. Is this simply too few epochs? I did convert the model to ONNX. Inference is fast but very inaccurate, which is consistent with the high loss. There were no errors during the conversion. I am training on Windows and did initially run into a NotImplementedError. I think I solved it by going back to an older torch version (2.3.1). What steps did you use to train on your custom dataset? I feel like I'm missing something obvious.
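To put "64 epochs on 180 images" in perspective, it can help to count optimizer updates rather than epochs. This is a rough sketch: the helper `optimizer_steps` is hypothetical, and the batch size of 8 is an assumed value for illustration only, since the thread does not state the batch size that was used.

```python
import math


def optimizer_steps(num_images: int, epochs: int, batch_size: int, drop_last: bool = False) -> int:
    """Return the number of optimizer updates for a run with one update per batch."""
    if drop_last:
        # Incomplete final batch is discarded each epoch.
        batches_per_epoch = num_images // batch_size
    else:
        # Incomplete final batch still counts as one update.
        batches_per_epoch = math.ceil(num_images / batch_size)
    return batches_per_epoch * epochs


# 180 images and 64 epochs come from the thread; batch_size=8 is an assumption.
print(optimizer_steps(num_images=180, epochs=64, batch_size=8))
```

Under that assumed batch size the run performs on the order of 1,500 updates, which is a small budget for adapting a detection transformer, although the thread does not establish that this is the cause of the high loss.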

@mirza298

If you had a perfect architecture for your problem, you'd only need a few data points. Since that is rarely the case, I would add more images to the dataset: https://en.m.wikipedia.org/wiki/Neural_scaling_law

I haven't tried these models on smaller datasets, but transformers tend to perform worse than convolutional networks in low-data scenarios. Transformers allow each part of the image to influence the representation of every other part, while convolutional networks focus on local regions, with downsampling enabling deeper layers to capture broader context. The YOLO Darknet repository has shown that fine-tuning requires only a few images (around 20 is sufficient), even for relatively complex scenes. However, transformers are far more complex, and adjusting their weights to an optimal point requires a larger dataset, especially for understanding complex scenes.
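The local-versus-global point can be illustrated with a toy example. This is a simplified sketch in plain Python, not how D-FINE or YOLO are implemented: it uses a 1D signal, a single 3-tap convolution, and a single-head self-attention with identity projections (no learned weights). It changes the last element of the signal and checks whether the output at position 0 is affected.

```python
import math


def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output has the same length as x."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the signal act as zeros
                acc += w * x[j]
        out.append(acc)
    return out


def self_attention(x):
    """Scalar self-attention: each output is a softmax-weighted mean of all inputs."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


signal = [0.1, 0.4, 0.2, 0.9, 0.3, 0.7, 0.5, 0.6]
perturbed = list(signal)
perturbed[-1] += 1.0  # change only the far end of the signal

kernel = [0.25, 0.5, 0.25]
conv_delta = abs(conv1d_same(signal, kernel)[0] - conv1d_same(perturbed, kernel)[0])
attn_delta = abs(self_attention(signal)[0] - self_attention(perturbed)[0])

print("conv output at position 0 changed:", conv_delta > 1e-9)
print("attention output at position 0 changed:", attn_delta > 1e-9)
```

The convolution output at position 0 only sees its immediate neighbours, so it is unaffected, while the attention output at position 0 shifts because every position contributes to it. Stacking convolutions with downsampling widens the receptive field gradually, which is the contrast described above.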

This is an interesting paper for this question: https://openreview.net/forum?id=SCN8UaetXx

Ultimately the simplest route to a good transformer model is a ton of data.
