
Adapt Periodic Puma Metrics #3521

Merged

Conversation

@philippthun (Member) commented Nov 21, 2023

  • The current thread_info is tightly coupled to Thin/EventMachine; don't emit those metrics when running the Puma webserver. We may come up with other thread-related metrics for Puma at a later point in time.
  • Return the summed memory bytes and CPU for the Puma main process and its worker subprocesses.
  • When using Puma, the periodic_updater shall run in the main process only; thus direct invocations of update! from a controller (executed within a worker process) must be removed.
  • When running Puma, the DirectFileStore data store is used for Prometheus metrics.
  • Aggregation methods for (gauge) metrics are specified and applied when using the DirectFileStore.
    • For cc_requests_outstanding_total, the values per worker are summed.
    • For metrics collected by the periodic_updater running in the main process, the aggregation method MOST_RECENT is specified to eliminate the unnecessary pid label.
  • Replace VCAP::CloudController::Metrics::StatsdUpdater.new with CloudController::DependencyLocator.instance.statsd_updater.
  • Replace Statsd.new with CloudController::DependencyLocator.instance.statsd_client.
  • Add statsd_host + statsd_port to ClockSchema; don't rely on default values (see "Add statsd_host + statsd_port to cloud_controller_clock job" capi-release#359)
  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@philippthun philippthun changed the title thread_info Adapt Periodic Puma Metrics Nov 21, 2023
@philippthun philippthun force-pushed the adapt-periodic-puma-metrics branch from 0d83e8d to a105b64 Compare November 22, 2023 14:33
@philippthun philippthun marked this pull request as ready for review November 22, 2023 14:38
@philippthun philippthun force-pushed the adapt-periodic-puma-metrics branch 2 times, most recently from bfa6629 to 120b311 Compare November 27, 2023 15:07
@philippthun philippthun force-pushed the adapt-periodic-puma-metrics branch from 120b311 to 64bb224 Compare November 29, 2023 12:49
@svkrieger (Contributor) left a comment:

Looks good! Just a few conceptual questions.

File.unlink(file_path)
end

Prometheus::Client.config.data_store = Prometheus::Client::DataStores::DirectFileStore.new(dir: prometheus_dir)
Contributor:

What's the reason for using a file data store instead of an in-memory one, if we want to reset the metrics on startup anyway?

philippthun (Member, Author):

The in-memory data store does not work for metrics collected by different processes (i.e. the outstanding requests), thus we need some kind of "central" storage; the easiest option is the direct file store.
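A tiny illustration (not from the PR, requires POSIX fork) of why an in-memory store falls short under a forking server: after fork, each worker mutates its own copy of the process memory, so the parent never sees the update:

```ruby
# An in-memory "metric" held in the parent process.
counter = { outstanding: 0 }

# Simulate a Puma worker handling a request after fork.
pid = fork do
  counter[:outstanding] += 1 # mutates the child's copy-on-write memory only
end
Process.wait(pid)

# The parent's value is unchanged; the worker's increment is invisible here.
puts counter[:outstanding] # => 0
```

A shared medium such as the DirectFileStore's per-pid files sidesteps this, since every process persists its values where the scraping process can read them.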

  { type: :counter, name: :cc_staging_requests_total, docstring: 'Number of staging requests' },
  { type: :histogram, name: :cc_staging_succeeded_duration_seconds, docstring: 'Durations of successful staging events', buckets: DURATION_BUCKETS },
  { type: :histogram, name: :cc_staging_failed_duration_seconds, docstring: 'Durations of failed staging events', buckets: DURATION_BUCKETS },
- { type: :gauge, name: :cc_requests_outstanding_total, docstring: 'Requests outstanding' },
+ { type: :gauge, name: :cc_requests_outstanding_total, docstring: 'Requests outstanding', aggregation: :sum },
Contributor:

It would be interesting to see how the metrics look if we don't aggregate them. Would the different values emitted by each worker be stored in the same time series, using a "worker" label, or would they simply overwrite each other and lead to wrong metrics?

philippthun (Member, Author):

The metrics would be labelled with the pid of the worker process. I'm in favor of aggregating them to be compatible with the Thin metrics, i.e. having one metric per VM. In the end we are not even interested in the outstanding requests per VM, but across all APIs, so I don't see a use case for per-process metrics.
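The two aggregation modes discussed here can be sketched in plain Ruby, mimicking what the file store does with per-pid values (the readings are made up, and "latest" is approximated by insertion order):

```ruby
# Hypothetical per-worker readings for cc_requests_outstanding_total,
# keyed by worker pid, as the file store would otherwise label them.
per_pid = { 1001 => 2.0, 1002 => 3.0, 1003 => 0.0 }

# :sum collapses the pid label into one series per VM.
summed = per_pid.values.sum # => 5.0

# :most_recent keeps only the latest write, also dropping the pid label;
# appropriate when a single process (the periodic_updater) writes the value.
most_recent = per_pid.values.last # => 0.0

puts summed
puts most_recent
```

This is why :sum fits a value genuinely produced by many workers, while :most_recent fits values only the main process ever writes.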

@svkrieger svkrieger self-requested a review December 5, 2023 10:56
@philippthun philippthun merged commit 64115aa into cloudfoundry:main Dec 5, 2023
5 checks passed