Loss staying high while trying to tune network #166
Hi there, what is the size of your dataset? How many epochs did you train for? Have you experimented with the hyperparameters? What is the goal of your dataset—what are you trying to detect?

You might want to try DEIM. The repository is similar, and according to their paper, it should converge much faster than RT-DETR and D-FINE.

Have you tried converting the D-FINE (or RT-DETR) model to ONNX and TensorRT? If so, what has your experience been with the model's inference speed? Did you encounter any errors during the conversion?

One more question: have you encountered this issue on Windows or Linux with your custom dataset: `[rank0]: NotImplementedError: Caught NotImplementedError in DataLoader worker process 0`? I successfully trained D-FINE and RT-DETR a month ago, but for some reason I can't get past this error now. It's really frustrating.
My dataset is small (180 images), and I am trying to detect visible blobs in my images. My images are grayscale, but I kept them in RGB format (3 channels). I can look into DEIM, but I don't understand why this isn't working. I trained for 64 epochs and did not change any of the hyperparameters. Is 64 simply too few epochs? I did convert the model to ONNX; it makes fast but very inaccurate inferences, which aligns with the high loss. There were no errors during the conversion. I am training on Windows and did initially run into a `NotImplementedError`. I think I solved it by downgrading to an older torch version (2.3.1). What steps did you use to train on your custom dataset? I feel like I'm missing something obvious.
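For what it's worth, the Windows `NotImplementedError` in DataLoader workers can often be sidestepped without downgrading torch by disabling worker processes. A minimal config sketch, assuming your D-FINE config exposes the usual dataloader keys (the exact section names may differ between repo versions, so treat this as illustrative):

```yaml
# Hypothetical fragment of custom_detection.yml: force single-process
# data loading, which avoids worker spawn issues on Windows.
train_dataloader:
  num_workers: 0
val_dataloader:
  num_workers: 0
```

Setting `num_workers: 0` makes data loading slower, but everything runs in the main process, which tends to be more robust on Windows.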
If you had a perfect architecture for your problem, you would only need a few datapoints; I would add more images to the dataset: https://en.m.wikipedia.org/wiki/Neural_scaling_law

I haven't tried these models on smaller datasets, but transformers tend to perform worse than convolutional networks in low-data scenarios. Transformers allow each part of the image to influence the representation of every other part, while convolutional networks focus on local regions, with downsampling enabling deeper layers to capture broader context. The YOLO Darknet repository has shown that fine-tuning requires only a few images (around 20 is sufficient), even for relatively complex scenes. Transformers, however, are far more complex, and adjusting their weights to an optimal point requires a larger dataset, especially for understanding complex scenes. This is an interesting paper on the question: https://openreview.net/forum?id=SCN8UaetXx

Ultimately, the simplest route to a good transformer model is a large amount of data.
I'm trying to fine-tune the obj365+coco checkpoint on my custom dataset. I have 1 class. This is the category entry from my COCO annotations: `"categories": [{"supercategory": "none", "id": 0, "name": "0"}]`. I have set `num_classes: 2` in custom_detection.yml and have also made sure that `remap_mscoco_category: False` is set there. I'm training from PowerShell with this command:

```
$env:CUDA_VISIBLE_DEVICES="0"; python train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml --use-amp --seed=0 -t dfine_s_obj2coco.pth
```

My loss stays high (20-30) during training, and all of my evals come back as 0. I know the dataset is trainable; I've successfully trained a YOLO network on it. What am I doing wrong here?
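Since zero evals with a custom single-class dataset often come down to a mismatch between the annotation category ids and the configured `num_classes`, a quick sanity check on the annotation file can rule that out. Below is a sketch of such a check; `check_categories` is a hypothetical helper written for this post, not part of the D-FINE repo, and the contiguity/`num_classes` conditions it tests are general COCO-style conventions rather than something confirmed for this specific config:

```python
# Hypothetical sanity check for a COCO annotation dict before training:
# category ids should be contiguous, and the configured num_classes
# should be strictly greater than the largest category id.
def check_categories(coco, num_classes):
    ids = sorted(c["id"] for c in coco["categories"])
    return {
        "max_id": ids[-1],
        "contiguous": ids == list(range(ids[0], ids[0] + len(ids))),
        "fits_num_classes": ids[-1] < num_classes,
    }

# The single-category entry from the annotations above, with num_classes: 2.
ann = {"categories": [{"supercategory": "none", "id": 0, "name": "0"}]}
print(check_categories(ann, num_classes=2))
# → {'max_id': 0, 'contiguous': True, 'fits_num_classes': True}
```

If any of these checks fail on your full annotation file (for example, ids starting at 1 while the config assumes 0-based labels), that would be a plausible cause of a loss that never drops and evals stuck at 0.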