
Metrics Required for the Performance Tests

ShyamB97 edited this page Oct 3, 2024 · 1 revision

Metrics from readout performance test template

Copy of version: https://docs.google.com/document/d/1LoKOntjo5ESYEkcmfScwtCp7qwCInjv2G0lgnaopAXI/edit

Failure criteria and potential metrics to identify them:

  • Data reception failures
    • missed and dropped packets
  • Trigger primitive generation failure
    • post processing error messages
    • TP rate per channel
  • SNB recording failure
    • error messages
    • data written for one APA must be ~876.25 GB
    • data written for one CRP must be ~1024 GB
  • Excessive system utilisation:
    • CPU utilisation for all cores must be less than 80% (only average available)
    • memory bandwidth on each NUMA node must be less than 80% (unclear which NUMA node is presented)
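
The numeric criteria above could be checked automatically after a run. The following is a minimal sketch, not the project's actual tooling: the function names, the metric names, and the 5% tolerance used to interpret "~" are all assumptions made for illustration.

```python
# Expected SNB data volumes and the utilisation limit come from the list above.
EXPECTED_SNB_GB = {"APA": 876.25, "CRP": 1024.0}
MAX_UTILISATION_PCT = 80.0

# Assumption: "~" is read as within +/- 5% of the expected value.
SNB_TOLERANCE = 0.05


def check_snb_size(detector_unit: str, written_gb: float) -> bool:
    """Return True if the written data volume is close to the expected one."""
    expected = EXPECTED_SNB_GB[detector_unit]
    return abs(written_gb - expected) <= SNB_TOLERANCE * expected


def over_limit(utilisation_pct: dict) -> list:
    """Return the names of metrics that are not below the 80% limit."""
    return [
        name
        for name, value in utilisation_pct.items()
        if value >= MAX_UTILISATION_PCT
    ]
```

For example, `over_limit({"cpu_avg": 75.0, "numa0_mem_bw": 85.0})` would flag only the hypothetical `numa0_mem_bw` entry.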

Technical specifications of the machine tested must be logged in each report:

  • Motherboard and chipset
  • CPU details (incl. caches)
  • DRAM (incl. speed, capacity, and available channels)
  • NICs (incl. for readout and backend with available ports, bandwidth and features.)
  • SNB store (incl. drive specs and their configuration, e.g. RAID)
  • OS and software (OS, kernel version, external dependencies used)
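
Part of this list can be gathered programmatically. The sketch below uses only the Python standard library and therefore covers just the OS and software items. Motherboard, DRAM, NIC and drive details would need other tools and are not shown. The function name and the dictionary keys are assumptions.

```python
import os
import platform


def collect_software_specs() -> dict:
    """Gather the OS-level details visible to the standard library."""
    return {
        "hostname": platform.node(),
        "os": platform.system(),
        "kernel_version": platform.release(),
        "architecture": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "python_version": platform.python_version(),
    }
```

The returned dictionary can be serialised (for instance with `json.dumps`) and attached to each report.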

Concurrency:

  • Active user sessions on the RU during the load test
  • Active processes and services running on the RU during the load test (top ten dominant processes, usage of each DAQ application on the RU)
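
Selecting the top ten dominant processes could look like the sketch below. It assumes the per-process CPU usage has already been sampled by some monitoring tool. The `ProcessSample` record and the process names in the usage note are hypothetical.

```python
from typing import NamedTuple


class ProcessSample(NamedTuple):
    """One sampled process (hypothetical record layout)."""
    pid: int
    name: str
    cpu_pct: float


def top_processes(samples, n=10):
    """Return the n processes with the highest CPU usage, highest first."""
    return sorted(samples, key=lambda s: s.cpu_pct, reverse=True)[:n]
```

Calling `top_processes(samples)` on a list of `ProcessSample` entries returns at most ten entries, ordered from the most to the least CPU-hungry.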

Post test Analysis:

  • Overall RU utilization in terms of:

    • CPU instructions per second and percentile per core
    • Memory BW utilization per channel (what are channels in this context?)
    • CPU cache utilization in terms of misses and hits
  • Every application and their threads’ utilization in terms of:

    • Peak and average CPU percentile
    • Peak and average memory BW utilization
    • Peak and average CPU cache utilization in terms of misses and hits
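
The peak and average figures above can be derived from a time series of samples per application or thread. This is a minimal sketch assuming the samples are already available as plain lists of numbers. The function names and the thread names in the usage note are illustrative only.

```python
from statistics import mean


def summarise(samples):
    """Return the peak and average of one metric's samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {"peak": max(samples), "average": mean(samples)}


def summarise_by_thread(samples_by_thread):
    """Apply summarise() to every thread's samples, keyed by thread name."""
    return {
        thread: summarise(samples)
        for thread, samples in samples_by_thread.items()
    }
```

For instance, `summarise_by_thread({"consumer-0": [10.0, 30.0, 20.0]})` gives a peak of 30.0 and an average of 20.0 for the hypothetical thread `consumer-0`.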

Additional information in reports:

  • Kernel’s and other services’ resource utilization
  • Available headroom on the RU, in terms of:
    • Unused cores
    • Unused memory bandwidth
    • Downstream network bandwidth available
  • Configuration information for readout
  • Configuration information for trigger primitive generation
  • CPU pinning information
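
The headroom items in this list reduce to simple arithmetic once capacity and usage are known. The sketch below is one possible way to compute them. The 5% idle threshold used to call a core "unused" is an assumption, as are the function names.

```python
def headroom(capacity: float, used: float) -> float:
    """Return the unused part of a resource, never below zero."""
    return max(capacity - used, 0.0)


def unused_cores(per_core_util_pct: dict, idle_threshold_pct: float = 5.0) -> list:
    """Return the core IDs whose utilisation is below the idle threshold.

    Assumption: a core below 5% utilisation counts as unused.
    """
    return [
        core
        for core, util in sorted(per_core_util_pct.items())
        if util < idle_threshold_pct
    ]
```

The same `headroom` helper applies to memory bandwidth and to downstream network bandwidth, as long as capacity and usage are given in the same unit.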

What do instructions per cycle and "any retired" tell us? Why are the metrics split by socket number rather than by NUMA node?