-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RSDK-9618: Bake in knowledge for rate based stats to FTDC parsing. #4658
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Some initial comments.
// ratioMetricToFields is a global variable identifying the metric names that are to be graphed as | ||
// some ratio. The two members (`Numerator` and `Denominator`) refer to the suffix* of a metric | ||
// name. For example, `UserCPUSecs` will appear under `proc.viam-server.UserCPUSecs` as well as | ||
// `proc.modules.<foo>.UserCPUSecs`. If the `Denominator` is the empty string, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smart to use suffixes here to allow one mapping in ratioMetricToFields
to handle both viam-server
and module CPU usage calculations...
// | ||
// When computing rates for metrics across two "readings", we simply subtract the numerators and | ||
// denominator and divide the differences. We use the `windowSizeSecs` to pick which "readings" | ||
// should be compared. This creates a sliding window. We (currently) bias this window to better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bias this window
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you understand the idea of windowing -- but because the usage of bias
is heavily influenced, I'll deconstruct the general idea first:
So when we create a datapoint for a graph that is stating "what percent of the CPU this is process using", it's not helpful to talk about at that specific instance in time. It's either running right now (i.e: "100%") or it's not (i.e: "0%"). So instead we talk about CPU usage over some "window" of time. When one does ps aux
, the CPU percentage there is over the lifetime of the process.
So if a viam-server runs for 9 minutes doing nothing, that would say a number close to 0%. And if at that point a user then opened a video stream that stayed on the CPU for an entire minute (modulo some system scheduling overhead), ps aux
wouldn't report a usage of 100%, but rather slowly go up from 0% -> 10% over the course of that one minute.
So regarding the window, we have 2 extremes. What's happening this very instant and what's happened over the lifetime of the process. I don't like either of those for this case, so I chose something in between. I felt like a process that flips from completely idle to max CPU should see that reflected in a graph over the coarse of 5 seconds.
I feel that choice skews more towards the "what's happening this instant" side of the scale. So "bias this window" refers to two properties:
- This is a human choice. There are other valid options.
- The choice is not a binary A or B. But rather a relatively continuous range of options to select from.
The context where I hear "bias" used in this way is (not coincidentally) from my college OS class talking about the (in theory) linux CPU scheduler. Where it wants to prioritize running programs that used its entire time slice to do work. And it similarly has to age out older program behavior. Which gets modeled by some variable r
[ecency bias]:
new_priority = (1-r)old_priority + (r)percent_of_timeslice_used_in_last_run
So choosing r=1 gives a priority influenced completely by the last run. And choosing smaller values believe that a program's future behavior is better predicting by its more historical behavior.
Sorry -- this was huge. Probably should have clarified offline. Definitely recommend a change (maybe just avoid mentioning what our window size preference is?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big thanks for the thoughtful explanation here. The concept generally makes sense to me and I don't think I need much more in the way of documentation. I think if you're finding that 5s is a time window that's providing reasonable output wrt CPU usage, then it seems like a perfectly valid choice to me.
ftdc/cmd/parser.go
Outdated
} | ||
|
||
func (rr ratioReading) toValue() (float32, error) { | ||
if math.Abs(rr.Denominator) < 1e-9 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Do you have a pre-existing epsilon constant lying around in this package somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right in that I have one (that I completely forgot about!)
But it's (right now) in a different package
|
||
value, err := diff.toValue() | ||
if err != nil { | ||
// The denominator did not change -- divide by zero error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Might as well report the created error here, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
forCompare := idx - windowSizeSecs | ||
if forCompare < 0 { | ||
// If we haven't | ||
forCompare = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this create a divide by zero issue? Comparing with self?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition (and the above one) isn't written in the most readable way: https://github.com/viamrobotics/rdk/pull/4658/files#diff-a6de85d19062a282c0d7f2c427b26f6598fcf073ef34752a97ec02b053a62f18R295-R296
edit the above was perhaps cryptic -- my point is idx == forCompare
can not happen. Because the above if-statement skips the whole idx == 0
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sat and thought about how to change this to make it more obvious. I failed.
I did realize that the abstraction of ratioMetrics
(as I describe them) is incorrect. All of this windowing code only makes sense for time-based things. And in that case we can be (fairly) confident the denominator is always increasing.
There can be other "metrics that are computed as a ratio of things" that are not time based. And consequently don't need this windowing logic. And therefore need not worry about a denominator being 0 after some subtraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha maybe you can confirm my understanding: is this lie setting the value we should compare to to the earliest possible recorded value in the case that 5 seconds have not actually elapsed?
As for the concern about ratioMetric
vs something like rateMetric
, I don't think it matters too much for now. If we start adding more non-time-based ratio metrics, then we could probably revisit this (unexported) naming...
Co-authored-by: Benjamin Rewis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will push some changes tomorrow
ftdc/cmd/parser.go
Outdated
} | ||
|
||
func (rr ratioReading) toValue() (float32, error) { | ||
if math.Abs(rr.Denominator) < 1e-9 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right in that I have one (that I completely forgot about!)
But it's (right now) in a different package
forCompare := idx - windowSizeSecs | ||
if forCompare < 0 { | ||
// If we haven't | ||
forCompare = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition (and the above one) isn't written in the most readable way: https://github.com/viamrobotics/rdk/pull/4658/files#diff-a6de85d19062a282c0d7f2c427b26f6598fcf073ef34752a97ec02b053a62f18R295-R296
edit the above was perhaps cryptic -- my point is idx == forCompare
can not happen. Because the above if-statement skips the whole idx == 0
case.
|
||
value, err := diff.toValue() | ||
if err != nil { | ||
// The denominator did not change -- divide by zero error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % my one question to check my understanding.
// | ||
// When computing rates for metrics across two "readings", we simply subtract the numerators and | ||
// denominator and divide the differences. We use the `windowSizeSecs` to pick which "readings" | ||
// should be compared. This creates a sliding window. We (currently) bias this window to better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big thanks for the thoughtful explanation here. The concept generally makes sense to me and I don't think I need much more in the way of documentation. I think if you're finding that 5s is a time window that's providing reasonable output wrt CPU usage, then it seems like a perfectly valid choice to me.
forCompare := idx - windowSizeSecs | ||
if forCompare < 0 { | ||
// If we haven't | ||
forCompare = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha maybe you can confirm my understanding: is this lie setting the value we should compare to to the earliest possible recorded value in the case that 5 seconds have not actually elapsed?
As for the concern about ratioMetric
vs something like rateMetric
, I don't think it matters too much for now. If we start adding more non-time-based ratio metrics, then we could probably revisit this (unexported) naming...
No description provided.