Adapt Periodic Puma Metrics #3521
Conversation
The current 'thread_info' metrics are tightly coupled to Thin/EventMachine, so don't emit them when running the Puma webserver; other thread-related metrics for Puma may be added at a later point in time.
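A guard for this could look roughly like the following sketch (the `webserver` config key and the surrounding method names are assumptions for illustration, not necessarily the PR's actual code):

```ruby
# Illustrative sketch: skip the EventMachine-specific thread metrics
# when Cloud Controller is configured to run Puma.
def update_thread_info!
  return if config.get(:webserver) == 'puma' # no EM reactor to inspect

  thread_info = {
    thread_count: Thread.list.size,
    event_machine: {
      connection_count: EventMachine.connection_count
    }
  }
  updaters.each { |updater| updater.update_thread_info(thread_info) }
end
```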
Return the summed-up memory bytes and CPU for the Puma process and its worker subprocesses.
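One hedged way to implement this, assuming `ps` is available and that the workers are direct children of the Puma main process (method name and fields are illustrative):

```ruby
# Illustrative sketch: sum RSS (converted to bytes) and %CPU over the
# Puma main process and its direct children, as reported by `ps`.
def puma_process_stats
  main_pid = Process.pid
  rows = `ps ax -o pid=,ppid=,rss=,pcpu=`.each_line.map(&:split)
  family = rows.select { |pid, ppid, _rss, _pcpu| [pid.to_i, ppid.to_i].include?(main_pid) }
  {
    mem_bytes: family.sum { |_pid, _ppid, rss, _pcpu| rss.to_i * 1024 }, # ps reports RSS in KiB
    cpu: family.sum { |_pid, _ppid, _rss, pcpu| pcpu.to_f }
  }
end
```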
- Replace VCAP::CloudController::Metrics::StatsdUpdater.new with CloudController::DependencyLocator.instance.statsd_updater
- Replace Statsd.new with CloudController::DependencyLocator.instance.statsd_client (locator wiring sketched below)
- Add statsd_host + statsd_port to ClockSchema; don't rely on default values
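A sketch of the locator side of this change; the config keys follow the commit message, but the constructor arguments (statsd-ruby's client takes host and port) and whether the updater receives the client are assumptions:

```ruby
# Illustrative sketch: expose one shared statsd client and updater via
# the dependency locator instead of constructing them at each call site.
module CloudController
  class DependencyLocator
    def statsd_client
      @statsd_client ||= Statsd.new(config.get(:statsd_host), config.get(:statsd_port))
    end

    def statsd_updater
      @statsd_updater ||= VCAP::CloudController::Metrics::StatsdUpdater.new(statsd_client)
    end
  end
end
```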
When using Puma, the periodic_updater shall run in the main process only; thus direct invocations of 'update!' from a controller (executed within a worker process) must be removed.
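A rough sketch of how main-process-only scheduling could be wired up via Puma's launcher events; the launcher setup, interval, and updater names are assumptions:

```ruby
# Illustrative sketch: start the periodic updater from the Puma main
# process once the server has booted; workers never schedule it.
launcher = Puma::Launcher.new(puma_configuration)
launcher.events.on_booted do
  Thread.new do
    loop do
      periodic_updater.update! # emits once per VM, not once per worker
      sleep 30                 # assumed interval
    end
  end
end
launcher.run
```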
- When running Puma, the DirectFileStore data store is used for Prometheus metrics (see the sketch below).
- Aggregation methods for (gauge) metrics are specified and applied when using the DirectFileStore.
- For 'cc_requests_outstanding_total' the values per worker are summed up.
- For metrics that are collected by the periodic_updater running in the main process, the aggregation method MOST_RECENT has been specified to eliminate the unnecessary 'pid' label.
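With the prometheus-client gem this maps to the DirectFileStore plus per-metric store_settings; a minimal sketch (the directory and the second metric name are illustrative):

```ruby
require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

# Each process writes its values to files in this directory; the
# exporter aggregates across those files at scrape time.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/cc_prometheus')

registry = Prometheus::Client.registry

# Summed over all worker processes -> one value per VM:
registry.gauge(:cc_requests_outstanding_total,
               docstring: 'Requests outstanding',
               store_settings: { aggregation: :sum })

# Written only by the main process; :most_recent avoids per-pid series:
registry.gauge(:cc_thread_count,
               docstring: 'Number of threads (illustrative metric)',
               store_settings: { aggregation: :most_recent })
```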
Looks good! Just a few conceptual questions.
Review comment on:

```ruby
  File.unlink(file_path)
end

Prometheus::Client.config.data_store = Prometheus::Client::DataStores::DirectFileStore.new(dir: prometheus_dir)
```
What's the reason for using a file datastore instead of an in-memory one, if we want to reset the metrics on startup anyway?
The in-memory data store does not work for metrics collected by different processes (e.g. the outstanding requests), so we need some kind of "central" storage; the direct file store is the easiest option.
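A tiny illustration of the failure mode with the default in-memory store (metric name invented; requires a fork-capable platform):

```ruby
require 'prometheus/client'

# With the default in-memory store, each process only sees its own
# copy of the metric after fork - the child's update is lost.
gauge = Prometheus::Client.registry.gauge(:demo_outstanding, docstring: 'demo gauge')
gauge.set(5)

pid = fork { gauge.set(8) } # happens in the child's copy of memory
Process.wait(pid)

puts gauge.get # => 5.0; the child's value never reaches the parent
```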
Review comment on:

```diff
   { type: :counter, name: :cc_staging_requests_total, docstring: 'Number of staging requests' },
   { type: :histogram, name: :cc_staging_succeeded_duration_seconds, docstring: 'Durations of successful staging events', buckets: DURATION_BUCKETS },
   { type: :histogram, name: :cc_staging_failed_duration_seconds, docstring: 'Durations of failed staging events', buckets: DURATION_BUCKETS },
-  { type: :gauge, name: :cc_requests_outstanding_total, docstring: 'Requests outstanding' },
+  { type: :gauge, name: :cc_requests_outstanding_total, docstring: 'Requests outstanding', aggregation: :sum },
```
It would be interesting to see what the metrics look like if we don't aggregate them. Would the different values emitted by each worker be stored in the same time series, using a label like "worker", or would they simply overwrite each other and lead to wrong metrics?
The metrics would be labelled with the pid of the worker processes. I'm in favor of aggregating them to be compatible with the Thin metrics, i.e. to have one metric per VM. In the end we are not even interested in the outstanding requests per VM but across all APIs, so I don't see a use case for having metrics per process.
Description
- The current thread_info is tightly coupled to Thin/EventMachine; don't emit those metrics when running the Puma webserver; maybe we will come up with other thread-related metrics for Puma at a later point in time.
- When using Puma, the periodic_updater shall run in the main process only; thus direct invocations of update! from a controller (executed within a worker process) must be removed.
- When running Puma, the DirectFileStore data store is used for Prometheus metrics.
- Aggregation methods for (gauge) metrics are specified and applied when using the DirectFileStore.
- For cc_requests_outstanding_total the values per worker are summed up.
- For metrics that are collected by the periodic_updater running in the main process, the aggregation method MOST_RECENT has been specified to eliminate the unnecessary pid label.
- Replace VCAP::CloudController::Metrics::StatsdUpdater.new with CloudController::DependencyLocator.instance.statsd_updater.
- Replace Statsd.new with CloudController::DependencyLocator.instance.statsd_client.
- Add statsd_host + statsd_port to ClockSchema; don't rely on default values (see "Add statsd_host + statsd_port to cloud_controller_clock job" capi-release#359).

Checklist
- I have reviewed the contributing guide
- I have viewed, signed, and submitted the Contributor License Agreement
- I have made this pull request to the main branch
- I have run all the unit tests using bundle exec rake
- I have run CF Acceptance Tests