Simulation stage - results #52
fpetric announced in Announcements
The results for all three benchmarks, and the overall results, are in.
Results for FBM1
Results for FBM2
Results for simulation
Overall results
Scoring scheme
As you can see from the tables, each benchmark has been assigned a maximum number of points, and these maxima sum to one hundred:
FBM1 scoring
The score for FBM1 was calculated in the following way:
score = 30 - (RMSE_team - min(RMSE for all teams))
if the team's RMSE was within 10 meters of the best estimate; otherwise, points were awarded based on the total error. The score for the team was the score achieved with the final submission.
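For concreteness, here is a minimal Python sketch of this rule. The team names and RMSE values are hypothetical, and since the exact mapping from total error to points for submissions outside the 10-meter window is not spelled out here, the sketch simply assigns zero in that branch.

```python
def fbm1_scores(rmse_by_team, window_m=10.0, max_points=30.0):
    """Hypothetical sketch of the FBM1 scoring described above."""
    best = min(rmse_by_team.values())
    scores = {}
    for team, rmse in rmse_by_team.items():
        if rmse - best <= window_m:
            # Within 10 m of the best estimate: one point lost per meter
            # of RMSE above the best team's RMSE.
            scores[team] = max_points - (rmse - best)
        else:
            # The text says points were then based on the total error;
            # the exact mapping is not given, so assign 0 as a placeholder.
            scores[team] = 0.0
    return scores

print(fbm1_scores({"team_a": 2.4, "team_b": 5.1, "team_c": 19.7}))
```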
FBM2 scoring
The score for FBM2 was calculated in the following way:
score = 30 * CSI_team / max(CSI for all teams)
The score for the team was the score achieved with the final submission.
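As an illustrative sketch of the same scaling (team names and CSI values are hypothetical):

```python
def fbm2_scores(csi_by_team, max_points=30.0):
    """Scale each team's CSI so the best CSI earns the full 30 points."""
    best = max(csi_by_team.values())
    return {team: max_points * csi / best for team, csi in csi_by_team.items()}

print(fbm2_scores({"team_a": 0.62, "team_b": 0.48}))
```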
Simulation scoring
The calculation of the score for the simulation benchmark was a bit more complicated. We evaluated submissions in two different arenas (the one we provided, and the same arena rotated/mirrored). The total score was calculated as
score = 1/3 * score(known_arena) + 2/3 * score(unknown_arena)
The score within one run was calculated based on the number of points of interest visited, the number of true positive detections, and the number of false positive reports. If the UAV reported more than half of the cracks, the time taken to complete the mission (if lower than the time limit) was also considered; this was intended to avoid rewarding speed alone, without actually trying to visit all points of interest and search for cracks. The time limit was set to 10 minutes. Given that no team reported more than 50% of the cracks within the time limit, the scoring for a run was simplified to
score = 25 * (percentage of PoIs visited) + 15 * (percentage of correct detections) - 0.5 * (number of false positives)
For the known arena there were 10 PoIs with 5 cracks to be detected; for the unknown arena there were 10 PoIs with 7 cracks. The percentage of correct detections was calculated as the number of correctly identified cracked tiles divided by the total number of cracked tiles in the arena. Finally, the total score was scaled to 40 points so that the best team received the maximum number of points for the simulation benchmark.
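Putting the pieces together, here is a hypothetical Python sketch of the simplified scoring: the per-run formula above (with the percentages taken as fractions in [0, 1]), the 1/3 known + 2/3 unknown arena weighting, and the final scaling so the best team gets the full 40 points. All input numbers are made up.

```python
def run_score(pois_visited, pois_total, correct, cracks_total, false_positives):
    """Simplified per-run score from the formula above."""
    return (25.0 * pois_visited / pois_total
            + 15.0 * correct / cracks_total
            - 0.5 * false_positives)

def simulation_scores(runs_by_team, max_points=40.0):
    # runs_by_team maps team -> (known-arena run, unknown-arena run),
    # each run given as (PoIs visited, PoIs total, correct detections,
    # cracks total, false positives).
    raw = {team: run_score(*known) / 3.0 + 2.0 * run_score(*unknown) / 3.0
           for team, (known, unknown) in runs_by_team.items()}
    best = max(raw.values())
    # Scale so the best team receives the maximum 40 points.
    return {team: max_points * s / best for team, s in raw.items()}

print(simulation_scores({
    "team_a": ((8, 10, 3, 5, 1), (6, 10, 4, 7, 0)),
    "team_b": ((5, 10, 2, 5, 0), (4, 10, 2, 7, 2)),
}))
```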
Penalties
Since crashes prevented a team from advancing towards more points of interest, we considered that part of the score to have already penalized the UAV crashing, so no additional penalties were added. We did not observe any UAV leaving the flying area without crashing, so there was no special penalty for that case either.
Complaints
We tried to the best of our abilities to get your solutions to work, performing a number of steps depending on the issue observed.
We firmly believe we have given each team a fair chance for their code to work, especially with the time slots for submission before the final deadline. However, if you think we have made a critical error in evaluating your code, let us know.
Detailed feedback
Detailed feedback will be sent to team leaders via email.
Edit notes:
At the request of the team, PAIR lab was renamed to AIRo Lab.