Simulation stage - results #52
fpetric announced in Announcements
The results for all three benchmarks, and the overall results, are in.
Results for FBM1
Results for FBM2
Results for simulation
Overall results
Scoring scheme
As you can see from the tables, each benchmark has been assigned a maximum number of points, and these maxima sum to one hundred:
FBM1 scoring
The score for FBM1 was calculated in the following way:
score = 30 - (RMSE_team - min(RMSE for all teams))
if the team's RMSE was within 10 meters of the best estimate; otherwise, points were awarded based on the total error. The score for the team was the score achieved with the final submission.
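For concreteness, here is a minimal Python sketch of this rule. The team names and RMSE values are hypothetical, and since the exact mapping from total error to points for submissions outside the 10-meter window is not spelled out here, the sketch simply assigns zero in that branch.

```python
def fbm1_scores(rmse_by_team, window_m=10.0, max_points=30.0):
    """Hypothetical sketch of the FBM1 scoring described above."""
    best = min(rmse_by_team.values())
    scores = {}
    for team, rmse in rmse_by_team.items():
        if rmse - best <= window_m:
            # Within 10 m of the best estimate: one point lost per meter
            # of RMSE above the best team's RMSE.
            scores[team] = max_points - (rmse - best)
        else:
            # The text says points were then based on the total error;
            # the exact mapping is not given, so assign 0 as a placeholder.
            scores[team] = 0.0
    return scores

print(fbm1_scores({"team_a": 2.4, "team_b": 5.1, "team_c": 19.7}))
```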
FBM2 scoring
The score for FBM2 was calculated in the following way:
score = 30 * CSI_team / max(CSI for all teams)
The score for the team was the score achieved with the final submission.
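As an illustrative sketch of the same scaling (team names and CSI values are hypothetical):

```python
def fbm2_scores(csi_by_team, max_points=30.0):
    """Scale each team's CSI so the best CSI earns the full 30 points."""
    best = max(csi_by_team.values())
    return {team: max_points * csi / best for team, csi in csi_by_team.items()}

print(fbm2_scores({"team_a": 0.62, "team_b": 0.48}))
```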
Simulation scoring
The calculation of the score for the simulation benchmark was a bit more complicated. We evaluated submissions in two different arenas (the one we provided, and the same arena rotated/mirrored). The total score was calculated as
score = 1/3 * score(known_arena) + 2/3 * score(unknown_arena)
The score within one run was calculated based on the number of points of interest visited, the number of true positive detections, and the number of false positive reports. If the UAV reported more than half of the cracks, the time taken to complete the mission (if lower than the time limit) was also considered; this was intended to avoid rewarding speed alone, without actually trying to visit all points of interest and search for cracks. The time limit was set to 10 minutes. Given that no team reported more than 50% of the cracks within the time limit, the scoring for a run was simplified to
score = 25 * (percentage of PoIs visited) + 15 * (percentage of correct detections) - 0.5 * (number of false positives)
For the known arena there were 10 PoIs with 5 cracks to be detected; for the unknown arena there were 10 PoIs with 7 cracks. The percentage of correct detections was calculated as the number of correctly identified cracked tiles divided by the total number of cracked tiles in the arena. Finally, the total score was scaled to 40 points so that the best team received the maximum number of points for the simulation benchmark.
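Putting the pieces together, here is a hypothetical Python sketch of the simplified scoring: the per-run formula above (with the percentages taken as fractions in [0, 1]), the 1/3 known + 2/3 unknown arena weighting, and the final scaling so the best team gets the full 40 points. All input numbers are made up.

```python
def run_score(pois_visited, pois_total, correct, cracks_total, false_positives):
    """Simplified per-run score from the formula above."""
    return (25.0 * pois_visited / pois_total
            + 15.0 * correct / cracks_total
            - 0.5 * false_positives)

def simulation_scores(runs_by_team, max_points=40.0):
    # runs_by_team maps team -> (known-arena run, unknown-arena run),
    # each run given as (PoIs visited, PoIs total, correct detections,
    # cracks total, false positives).
    raw = {team: run_score(*known) / 3.0 + 2.0 * run_score(*unknown) / 3.0
           for team, (known, unknown) in runs_by_team.items()}
    best = max(raw.values())
    # Scale so the best team receives the maximum 40 points.
    return {team: max_points * s / best for team, s in raw.items()}

print(simulation_scores({
    "team_a": ((8, 10, 3, 5, 1), (6, 10, 4, 7, 0)),
    "team_b": ((5, 10, 2, 5, 0), (4, 10, 2, 7, 2)),
}))
```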
Penalties
Since crashes prevented a team from advancing towards more points of interest, we considered that part of the score to have already penalized the UAV crashing, so no additional penalties were added. We did not observe any UAV leaving the flying area without crashing, so there was no special penalty for that case either.
Complaints
We tried to the best of our abilities to get your solutions to work, performing a number of steps depending on the issue observed.
We firmly believe we have given each team a fair chance for their code to work, especially with the time slots for submission before the final deadline. However, if you think we have made a critical error in evaluating your code, let us know.
Detailed feedback
Detailed feedback will be sent to team leaders via email.
Edit notes:
At the request of the team, PAIR lab was renamed to AIRo Lab.