-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement quality stats #469
Comments
Some extra information in the connection.h of the statistics . We should try to replicate these same stats Currently I'm assuming that a connection statistic is per 1 crazyradio <-> 1 crazyflie pair
|
In the cflib there are some link quality functionality already, which looks at the number of retries within a certain sliding window. So it returns a percentage of the retries done? crazyflie-lib-python/cflib/crtp/radiodriver.py Lines 610 to 618 in be04130
Git blame and it has been implemented 8 years ago, but a typical thing that kind of dissappeard in memory.... but it has been used in the cfclient though! See number 3 in the main gui: https://www.bitcraze.io/documentation/repository/crazyflie-clients-python/master/userguides/userguide_client/#main-ui I guess the question is, since we've got to leave this to prevent things from breaking, would we want to add some things on the side of this since cpp link has more values? Also broadcasting hasn't been implemented yet either so we can't add all (see #472 ) |
Alright, so the thing here is, that cpp link looks at the full amount of packets send over the entire connection which is perhaps a bit much, but this looks at the retry percentage the last 100 packets which is way too little... But I kind of like the sliding window approach. How about we collect the stats of the last 3 seconds (perhaps configurable)? And what would we like to monitor in that sense? Technically retries is an way to get a sense of the link quality as it does that if there is no ack anyway, the difference is that perhaps the crazyflie link cpp is doing the retries in code and the cflib sets the nordic chip settings for retries? Which means that there is a clear difference and perhaps we need to be talking about some standards than regular counts. Update: cpp link does also setArc to 3 as well (see this permalink). See here the settings for cflib: crazyflie-lib-python/cflib/drivers/crazyradio.py Lines 153 to 158 in be04130
|
Note that at the link-level it's "total" count, but then the client side can easily implement a sliding-window approach. in crazyflie-server-(cpp) it's a 1second window (which basically just looks at the difference of the "total" counts of now vs. one second ago). I believe the window-size was configurable there, too. |
Thanks for the explanation @whoenig. We had a discussion today with @ataffanel @gemenerik @ToveRumar and we determined that the link quality as it has been implemented on the Cflib side is not a good indication of the actual link quality as it only looks at number of retries of the nrf (which is either a 100 good or bad). We could also go more towards how crazyswarm2 but we had doubts on if that would be the best values to look at, so we let that a bit lose and just made a list (from highest to lowest priority):
Latency It looks to the actual time that it takes for one packet to be sent and for the ack of that packet to come back. That is collected and the minimum latency is returned as result. That by itself is strange though... as I would say that the max or at least the average would be more suitable. At one point we could differentiate between the latency of the usb radio, nrf-air-nrf, nrf-stm, but let's first get the 'big picture' latency in first. Bandwidth @ataffanel mentioned that a better way is to look at actual null packets send for both up and down link versus all packets sends, as that will give a better idea of how congested the bandwidth actually is. RSSI Having said that, RSSI is perhaps a indication of how strong the signal strenght is, it does not say how congested the communication bandwidth is or how many other desturbances are in the area, so by itself it is not a good link quality indicator but it will be a good 'side kick' for the others. Moreover it will be quite easy to implement as this is an value that comes from the NRF chip directly and doesn't require a measuring window (unless the average is necessary) uplink/downlink packet loss This is a good metric but we have put it on the lowest priority as we think that the other values are more useful in determining actual link quality, since packetloss is something that is intergral to the NRF-NRF communication and should work the best out of all of these. Next steps |
Great summary. Just a few comments. Latency: Using the minimum is often done in the networking community. In Crazyswarm2, we "inject" the measurement once per second (user-configurable) and then report the resulting latency unfiltered. This is also done to not take away too much bandwidth just by measuring the latency. Bandwidth: I actually like and use a simple packet count (as counted on the Crazyflie firmware itself) of received unicast and broadcast packets. This tells you a lot, especially when a radio is shared across multiple CFs. In Crazyswarm2, we show (side-by-side) the sent packets, so one can pretty easily see if there is any significant packet loss (when looking at broadcasts, at least). I agree that RSSI doesn't really provide very meaningful values. At least good RSSI values do not always indicate a stable link. Poor RSSI values often do correlate with a poor link. I never tried to get values for up/downlink, so not sure how useful it would be in practice - neat idea, though. |
So the packet loss is something that we can measure separately from the bandwidth congestion based on ratio of null packets send to and received from the Crazyflie. Today I was able to do a quick proof of concept to check null packets versus total packets for both the up and downlink and it was pretty accurate. (see this commit) As soon as I used a teleop from the client the uplink congestion went up and as soon enabled a lot of logging lot, the downlink congestion went up. Once we have both latency and bandwidth congestion implemented we can see what happens if we communicate with more Crazyflies or do stress tests with the communication. But currently still ongoing work just to see if we got it at the right end. For the latency, we choose to look at the 95 percentile instead since that is actually more common these days but I'll let @gemenerik comment on that since he's been working on that now. |
Also, I found out that in the Cflib this piece of code with the ifstatement of the relaxation of the connection doesn't really work anymore... crazyflie-lib-python/cflib/crtp/radiodriver.py Lines 634 to 646 in 418a681
Even the null packets have a data length of the channel 0xf3 with a bit countdown, so best is to check that first data point with a bitmask for the data in like:
We should probably make an issue of this? Or do we want to even have to fix this since it perhaps never even worked in the first place maybe... we can perhaps do something better now we are getting better stats. |
Cpp link already has certain quality stats implemented that looks at amount of packets sent, how many acks have been returned and the delay of the messages being received (like here: bitcraze/crazyflie-link-cpp#28). We should mirror this in cflib such that we can look at the communication in a more qualitative way.
Currently broadcasting is not implemented in the cflib so we can't mirror that, but anything unicast we can ofcourse.
The text was updated successfully, but these errors were encountered: