APT results #11

Open · wants to merge 8 commits into main

Conversation

@mkabra commented Jan 23, 2024

Hi,

We have benchmarked three multi-animal networks from APT on the DeepLabCut benchmark datasets. We tested our submission by running "python -m benchmark", and we get the following output, which we believe suggests that the submission is working:

benchmark  method                     version  RMSE  mAP
trimouse   DLCRNet_ms4 (30k)          2.3.8    NaN   NaN
           EfficientNet B7_s4 (30k)   2.3.8    NaN   NaN
           ResNet50 (30k)             2.3.8    NaN   NaN
parenting  DLCRNet_ms4 (30k)          2.3.8    NaN   NaN
           EfficientNet B7 (30k)      2.3.8    NaN   NaN
           EfficientNet B7_s4 (30k)   2.3.8    NaN   NaN
marmosets  DLCRNet (200k)             2.3.8    NaN   NaN
           DLCRNet_ms4 (200k)         2.3.8    NaN   NaN
           EfficientNet B7 (200k)     2.3.8    NaN   NaN
           EfficientNet B7_s4 (200k)  2.3.8    NaN   NaN
fish       DLCRNet_ms4 (30k)          2.3.8    NaN   NaN
           EfficientNet B7_s4 (30k)   2.3.8    NaN   NaN
           ResNet50_s4 (30k)          2.3.8    NaN   NaN
trimouse   GRONe                      2.3.8    NaN   NaN
           MMPose-CiD                 2.3.8    NaN   NaN
           DeTR+GRONe                 2.3.8    NaN   NaN
marmosets  GRONe                      2.3.8    NaN   NaN
           MMPose-CiD                 2.3.8    NaN   NaN
           DeTR+GRONe                 2.3.8    NaN   NaN
parenting  GRONe                      2.3.8    NaN   NaN
           MMPose-CiD                 2.3.8    NaN   NaN
           DeTR+GRONe                 2.3.8    NaN   NaN
fish       GRONe                      2.3.8    NaN   NaN
           MMPose-CiD                 2.3.8    NaN   NaN
           DeTR+GRONe                 2.3.8    NaN   NaN

The documentation for creating the submission is slightly out of date, though. We had to make the following changes to get the test to work:

  • Add __init__.py to benchmark/submissions
  • Run python -m benchmark from the DEEPLABCUT conda environment. This is probably obvious, but it could help other users if it were mentioned explicitly.
  • Change the imports in the .py file to
from deeplabcut import benchmark
from deeplabcut.benchmark.benchmarks import TriMouseBenchmark, MarmosetBenchmark, ParentingMouseBenchmark, FishBenchmark

instead of

import benchmark
from benchmark.benchmarks import TriMouseBenchmark
  • In the class definition of the submissions, we did not need to inherit from DLCBenchMixin (see the sketch below)
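
For illustration, here is a rough sketch of a submission file with these changes applied. The class name is made up, and the @benchmark.register decorator and the get_predictions method name are our best guesses at the expected interface, so please double-check them against the benchmark base classes:

    # Rough sketch of a submission file after the changes above.
    # NOTE: the registration decorator and the method name are assumptions
    # about the expected interface, not taken from the benchmark docs.
    from deeplabcut import benchmark
    from deeplabcut.benchmark.benchmarks import TriMouseBenchmark


    @benchmark.register
    class ExampleTriMouseSubmission(TriMouseBenchmark):  # no DLCBenchMixin needed
        """Illustrative submission that returns pre-computed predictions."""

        def get_predictions(self):
            # Load pre-computed predictions (e.g. from a results file stored
            # in benchmark/submissions/) and return them in the format the
            # benchmark expects.
            ...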

Hope everything else is as expected.
Best,
Mayank

@stes (Collaborator) commented Jan 23, 2024

Hi @mkabra, thanks for flagging!

I will look into the proposed updates for the docs and confirm here once I've checked whether your submission is working.

@stes (Collaborator) commented Feb 15, 2024

Hi @mkabra, thanks for adding the most recent comments. I'm looking into this over the next few days and will try to get back to you by early next week.

Thanks again for the contribution!

@MMathisLab (Member)

bump @stes

@n-poulsen (Collaborator)

@mkabra sorry for the really slow response time - there are still a few issues in your code that I've fixed on my end, and I'll push those changes.

Meanwhile, do you have confidence scores available for your predictions? I've noticed our docs were incorrect and stated that the results should be given in the format

      return {
         "path/to/image.png" : (
            # animal 1
            {
               "snout" : (0, 1),
               "leftear" : (2, 3),
               ...
            },
            # animal 2
            {
               "snout" : (0, 1),
               "leftear" : (2, 3),
               ...
            },
         ),
         ...
      }

when they should be given in the format

      return {
         "path/to/image.png" : (
            # animal 1
            {
               "pose": {
                 "snout" : (12, 17),
                 "leftear" : (15, 13),
                 ...
               },
               "score": 0.9172,
            },
            ...
         ),
         ...
      }

The model confidence is very important for computing the evaluation metrics. I've been able to evaluate your model with random scores, but to get the true performance I would also need the score for each individual.
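
To make that structure concrete, here is a small sketch of how a submission could assemble its return value; the helper name and the layout of raw_detections are made up for illustration and are not part of the benchmark code:

    # Illustrative helper (not benchmark code): converts hypothetical raw
    # detections into the expected {"pose": ..., "score": ...} format.
    def build_results(raw_detections):
        """raw_detections is assumed to map image paths to a list of animals,
        each with keypoint coordinates and a detection confidence."""
        results = {}
        for image_path, animals in raw_detections.items():
            results[image_path] = tuple(
                {
                    "pose": {
                        bodypart: (float(x), float(y))
                        for bodypart, (x, y) in animal["keypoints"].items()
                    },
                    # per-individual confidence, needed to rank detections
                    # when computing the evaluation metrics
                    "score": float(animal["confidence"]),
                }
                for animal in animals
            )
        return results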

@n-poulsen (Collaborator)

@mkabra I just pushed the updated code (in 413b8e5), which changes the JSON format to include "pose" and "score" keys for each individual, and the paths to the data files (which need to be relative to the root of the repository).

The score for each prediction is currently generated by giving the first prediction for each image the highest score, the second the second highest, and so on (updating the JSON files to include real per-individual scores will make this workaround unnecessary and produce the correct evaluation results).
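
For clarity, the rank-based placeholder scoring works roughly like the sketch below (my illustration, not the exact code that was pushed):

    # Illustration of rank-based placeholder scores: the first individual
    # listed for an image gets the highest score, the last the lowest.
    def add_rank_based_scores(predictions):
        """predictions maps image paths to a sequence of {"pose": {...}} dicts
        in prediction order; returns the same structure with "score" added."""
        scored = {}
        for image_path, individuals in predictions.items():
            n = len(individuals)
            scored[image_path] = tuple(
                {**individual, "score": (n - rank) / n}  # 1.0, ..., 1/n
                for rank, individual in enumerate(individuals)
            )
        return scored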
