RSDK-9618: Bake in knowledge for rate based stats to FTDC parsing. #4658

dgottlieb · 2024-12-30T21:00:46Z

No description provided.

dgottlieb · 2025-01-02T14:12:44Z

Sample output

benjirewis

Nice! Some initial comments.

benjirewis · 2025-01-02T16:43:07Z

ftdc/cmd/parser.go

+// ratioMetricToFields is a global variable identifying the metric names that are to be graphed as
+// some ratio. The two members (`Numerator` and `Denominator`) refer to the suffix* of a metric
+// name. For example, `UserCPUSecs` will appear under `proc.viam-server.UserCPUSecs` as well as
+// `proc.modules.<foo>.UserCPUSecs`. If the `Denominator` is the empty string, the


Smart to use suffixes here to allow one mapping in ratioMetricToFields to handle both viam-server and module CPU usage calculations...
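To make the suffix idea concrete, here is a minimal sketch of how one mapping entry could cover both the viam-server and module metric names. Only the `Numerator` and `Denominator` field names come from the doc comment above; the `ratioMetric` type, the `exampleRatioMetrics` map, the `ElapsedTimeSecs` denominator, and `matchRatioMetric` are hypothetical names for illustration, not the contents of the real `ratioMetricToFields`.

```go
package main

import (
	"fmt"
	"strings"
)

// ratioMetric names the numerator and denominator suffixes for one graphed ratio.
// The field names follow the doc comment in the diff; the type name is hypothetical.
type ratioMetric struct {
	Numerator   string
	Denominator string
}

// exampleRatioMetrics is an illustrative mapping only. "ElapsedTimeSecs" is an
// assumed denominator name, not taken from the PR.
var exampleRatioMetrics = map[string]ratioMetric{
	"UserCPU": {Numerator: "UserCPUSecs", Denominator: "ElapsedTimeSecs"},
}

// matchRatioMetric reports which ratio graph, if any, a full metric name feeds,
// by comparing only the final dot-separated suffix of the name.
func matchRatioMetric(metricName string) (string, bool) {
	for graphName, rm := range exampleRatioMetrics {
		if strings.HasSuffix(metricName, "."+rm.Numerator) {
			return graphName, true
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{
		"proc.viam-server.UserCPUSecs",
		"proc.modules.foo.UserCPUSecs",
		"proc.viam-server.RssMB",
	} {
		graph, ok := matchRatioMetric(name)
		fmt.Println(name, graph, ok)
	}
}
```

Both the viam-server and the module metric resolve to the same entry, while an unrelated metric does not match.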

benjirewis · 2025-01-02T16:44:04Z

ftdc/cmd/parser.go

+//
+// When computing rates for metrics across two "readings", we simply subtract the numerators and
+// denominator and divide the differences. We use the `windowSizeSecs` to pick which "readings"
+// should be compared. This creates a sliding window. We (currently) bias this window to better


bias this window

What does this mean?

I think you understand the idea of windowing -- but because the usage of "bias" is heavily influenced by it, I'll deconstruct the general idea first:

So when we create a datapoint for a graph that is stating "what percent of the CPU is this process using", it's not helpful to talk about that specific instant in time. It's either running right now (i.e., "100%") or it's not (i.e., "0%"). So instead we talk about CPU usage over some "window" of time. When one does ps aux, the CPU percentage there is over the lifetime of the process.

So if a viam-server runs for 9 minutes doing nothing, ps aux would report a number close to 0%. And if at that point a user then opened a video stream that stayed on the CPU for an entire minute (modulo some system scheduling overhead), ps aux wouldn't report a usage of 100%, but would rather slowly go up from 0% -> 10% over the course of that one minute.

So regarding the window, we have 2 extremes: what's happening this very instant, and what's happened over the lifetime of the process. I don't like either of those for this case, so I chose something in between. I felt like a process that flips from completely idle to max CPU should see that reflected in a graph over the course of 5 seconds.

I feel that choice skews more towards the "what's happening this instant" side of the scale. So "bias this window" refers to two properties:

This is a human choice. There are other valid options.

The choice is not a binary A or B. But rather a relatively continuous range of options to select from.
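The 9-minutes-idle, 1-minute-busy scenario above can be worked through numerically. The sketch below assumes one reading per second with two cumulative counters; the `reading` type, `simulate`, and `usage` are hypothetical helpers, not code from the PR. It compares the lifetime figure (what ps aux would show) with a 5-second window.

```go
package main

import "fmt"

// reading is a hypothetical pair of cumulative counters, sampled once per second.
type reading struct {
	cpuSecs     float64
	elapsedSecs float64
}

// simulate returns readings for t = 0..idleSecs+busySecs, where the process is
// idle for idleSecs and then fully on the CPU for busySecs.
func simulate(idleSecs, busySecs int) []reading {
	readings := make([]reading, 0, idleSecs+busySecs+1)
	cpu := 0.0
	for t := 0; t <= idleSecs+busySecs; t++ {
		if t > idleSecs {
			cpu++ // one full CPU second consumed during the last wall-clock second
		}
		readings = append(readings, reading{cpuSecs: cpu, elapsedSecs: float64(t)})
	}
	return readings
}

// usage returns the CPU percentage between two readings: subtract the
// numerators, subtract the denominators, and divide the differences.
func usage(readings []reading, from, to int) float64 {
	dCPU := readings[to].cpuSecs - readings[from].cpuSecs
	dTime := readings[to].elapsedSecs - readings[from].elapsedSecs
	return 100 * dCPU / dTime
}

func main() {
	readings := simulate(540, 60) // 9 minutes idle, then 1 minute busy
	for _, t := range []int{540, 542, 545, 600} {
		fmt.Printf("t=%ds lifetime=%.1f%% window=%.1f%%\n",
			t, usage(readings, 0, t), usage(readings, t-5, t))
	}
}
```

Five seconds into the busy period the windowed value already reads 100%, while the lifetime value only reaches 10% at the end of the full minute.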

The context where I hear "bias" used in this way is (not coincidentally) from my college OS class, talking about the (in theory) Linux CPU scheduler. It wants to prioritize running programs that used their entire time slice to do work, and it similarly has to age out older program behavior. That gets modeled by some variable r[ecency bias]:

new_priority = (1-r)*old_priority + r*percent_of_timeslice_used_in_last_run

So choosing r=1 gives a priority influenced completely by the last run. And choosing smaller values reflects a belief that a program's future behavior is better predicted by its longer-term history.
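As a small illustration of how r changes the outcome, the sketch below applies the update rule repeatedly. `updatePriority` is a hypothetical helper that only encodes the formula above; it is not taken from any real scheduler.

```go
package main

import "fmt"

// updatePriority blends the old priority with the most recent observation.
// r in [0, 1] is the recency bias: r=1 trusts only the last run, r=0 never updates.
func updatePriority(oldPriority, lastRunUsage, r float64) float64 {
	return (1-r)*oldPriority + r*lastRunUsage
}

func main() {
	// A program with a low historical priority (0.1) uses its entire
	// time slice (1.0) three runs in a row.
	for _, r := range []float64{1.0, 0.5, 0.1} {
		p := 0.1
		for run := 0; run < 3; run++ {
			p = updatePriority(p, 1.0, r)
		}
		fmt.Printf("r=%.1f priority after 3 runs: %.3f\n", r, p)
	}
}
```

Larger values of r move the priority toward the recent behavior faster, which is the same trade-off as choosing a shorter window for the CPU graphs.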

Sorry -- this was huge. Probably should have clarified offline. Definitely recommend a change (maybe just avoid mentioning what our window size preference is?)

Big thanks for the thoughtful explanation here. The concept generally makes sense to me and I don't think I need much more in the way of documentation. I think if you're finding that 5s is a time window that's providing reasonable output wrt CPU usage, then it seems like a perfectly valid choice to me.

benjirewis · 2025-01-02T16:45:43Z

ftdc/cmd/parser.go

+}
+
+func (rr ratioReading) toValue() (float32, error) {
+	if math.Abs(rr.Denominator) < 1e-9 {


[nit] Do you have a pre-existing epsilon constant lying around in this package somewhere?

You're right in that I have one (that I completely forgot about!)

But it's (right now) in a different package

ftdc/cmd/parser.go

benjirewis · 2025-01-02T17:02:09Z

ftdc/cmd/parser.go

+
+			value, err := diff.toValue()
+			if err != nil {
+				// The denominator did not change -- divide by zero error.


[nit] Might as well report the created error here, no?

benjirewis · 2025-01-02T17:03:53Z

ftdc/cmd/parser.go

+		forCompare := idx - windowSizeSecs
+		if forCompare < 0 {
+			// If we haven't
+			forCompare = 0


Won't this create a divide by zero issue? Comparing with self?

This condition (and the above one) isn't written in the most readable way: https://github.com/viamrobotics/rdk/pull/4658/files#diff-a6de85d19062a282c0d7f2c427b26f6598fcf073ef34752a97ec02b053a62f18R295-R296

Edit: the above was perhaps cryptic -- my point is that idx == forCompare cannot happen, because the above if-statement skips the whole idx == 0 case.

I sat and thought about how to change this to make it more obvious. I failed.

I did realize that the abstraction of ratioMetrics (as I describe them) is incorrect. All of this windowing code only makes sense for time-based things. And in that case we can be (fairly) confident the denominator is always increasing.

There can be other "metrics that are computed as a ratio of things" that are not time based. And consequently don't need this windowing logic. And therefore need not worry about a denominator being 0 after some subtraction.

Gotcha. Maybe you can confirm my understanding: is this line setting the value we should compare against to the earliest recorded value in the case that 5 seconds have not actually elapsed?

As for the concern about ratioMetric vs something like rateMetric, I don't think it matters too much for now. If we start adding more non-time-based ratio metrics, then we could probably revisit this (unexported) naming...
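The index arithmetic discussed in this thread can be sketched in isolation. This assumes one reading per second, so that `windowSizeSecs` doubles as an index offset as in the diff; `compareIndex` is a hypothetical helper that folds the "skip idx == 0" check and the clamp into one place, not the structure of the actual parser code.

```go
package main

import "fmt"

// windowSizeSecs matches the 5-second window discussed in the review.
const windowSizeSecs = 5

// compareIndex returns the index of the earlier reading to compare idx against.
// It returns false for idx == 0, which has no earlier reading. For all other
// indexes the result is clamped to 0 (the earliest reading) when a full window
// has not yet elapsed, so the result is always strictly less than idx.
func compareIndex(idx int) (int, bool) {
	if idx == 0 {
		return 0, false
	}
	forCompare := idx - windowSizeSecs
	if forCompare < 0 {
		forCompare = 0
	}
	return forCompare, true
}

func main() {
	for idx := 0; idx <= 7; idx++ {
		forCompare, ok := compareIndex(idx)
		if !ok {
			fmt.Printf("idx=%d skipped\n", idx)
			continue
		}
		fmt.Printf("idx=%d compared with %d\n", idx, forCompare)
	}
}
```

Because idx 0 is skipped and every other index is at least 1, the clamp to 0 never makes a reading compare against itself.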

Co-authored-by: Benjamin Rewis <[email protected]>

dgottlieb

Will push some changes tomorrow

dgottlieb · 2025-01-02T21:36:55Z

ftdc/cmd/parser.go

+}
+
+func (rr ratioReading) toValue() (float32, error) {
+	if math.Abs(rr.Denominator) < 1e-9 {


You're right in that I have one (that I completely forgot about!)

But it's (right now) in a different package

dgottlieb · 2025-01-02T21:39:07Z

ftdc/cmd/parser.go

+		forCompare := idx - windowSizeSecs
+		if forCompare < 0 {
+			// If we haven't
+			forCompare = 0


This condition (and the above one) isn't written in the most readable way: https://github.com/viamrobotics/rdk/pull/4658/files#diff-a6de85d19062a282c0d7f2c427b26f6598fcf073ef34752a97ec02b053a62f18R295-R296

Edit: the above was perhaps cryptic -- my point is that idx == forCompare cannot happen, because the above if-statement skips the whole idx == 0 case.

dgottlieb · 2025-01-02T21:39:49Z

ftdc/cmd/parser.go

+
+			value, err := diff.toValue()
+			if err != nil {
+				// The denominator did not change -- divide by zero error.


benjirewis

LGTM % my one question to check my understanding.

benjirewis · 2025-01-03T20:28:19Z

ftdc/cmd/parser.go

+//
+// When computing rates for metrics across two "readings", we simply subtract the numerators and
+// denominator and divide the differences. We use the `windowSizeSecs` to pick which "readings"
+// should be compared. This creates a sliding window. We (currently) bias this window to better


Big thanks for the thoughtful explanation here. The concept generally makes sense to me and I don't think I need much more in the way of documentation. I think if you're finding that 5s is a time window that's providing reasonable output wrt CPU usage, then it seems like a perfectly valid choice to me.

benjirewis · 2025-01-03T20:31:45Z

ftdc/cmd/parser.go

+		forCompare := idx - windowSizeSecs
+		if forCompare < 0 {
+			// If we haven't
+			forCompare = 0


Gotcha. Maybe you can confirm my understanding: is this line setting the value we should compare against to the earliest recorded value in the case that 5 seconds have not actually elapsed?

As for the concern about ratioMetric vs something like rateMetric, I don't think it matters too much for now. If we start adding more non-time-based ratio metrics, then we could probably revisit this (unexported) naming...

dgottlieb added 2 commits December 30, 2024 15:59

RSDK-9618: Bake in knowledge for rate based stats to FTDC parsing.

1cf65bd

FTDC

5ee4e7a

viambot added the safe to test This pull request is marked safe to test from a trusted zone label Dec 30, 2024

document

3882e54

viambot added and removed the "safe to test" label Dec 30, 2024

lint

bb70f7e

viambot added and removed the "safe to test" label Dec 30, 2024

more words

61da580

viambot added and removed the "safe to test" label Dec 31, 2024

ratioGraphs -> defferredReadings

56a4664

viambot added and removed the "safe to test" label Dec 31, 2024

better name the out parameter

41d71a3

viambot added and removed the "safe to test" label Dec 31, 2024

dgottlieb requested a review from benjirewis December 31, 2024 19:35

lint

f8a1c9c

viambot added and removed the "safe to test" label Jan 2, 2025

benjirewis reviewed Jan 2, 2025

Update ftdc/cmd/parser.go

17f1461

Co-authored-by: Benjamin Rewis <[email protected]>

viambot added and removed the "safe to test" label Jan 2, 2025

dgottlieb commented Jan 2, 2025

cleanup

df0896b

viambot removed the "safe to test" label Jan 3, 2025

dgottlieb requested a review from benjirewis January 3, 2025 20:15

viambot added the "safe to test" label Jan 3, 2025

benjirewis approved these changes Jan 3, 2025
