LDMS connector
The Lightweight Distributed Metric Service (LDMS) is a health monitoring system that tracks the performance of a set of independent applications running on a particular supercomputer or supercomputing platform. LDMS is developed at Sandia National Laboratories and is used across U.S. DOE laboratories. The LDMS Kokkos Tools connector collects data on Kokkos kernels and converts it into a format that LDMS understands.
Because the volume of gathered data is very large, often tens of terabytes per day, the LDMS Kokkos Tools connector should be used together with the sampler utility of Kokkos Tools, which extracts samples of profiling data from a Kokkos application.
- Collected LDMS data is already on node, and queryable, with little to no overhead.
- LDMS can be cloned and installed from here: https://github.com/ovis-hpc/ovis
- Information and Quickstart guide can be found here: https://ovis-hpcreadthedocs.readthedocs.io/en/latest/
- The following environment variables can be adjusted by users in their run scripts:
    export KOKKOS_TOOLS_SAMPLER_RATE=101
    export KOKKOS_LDMS_VERBOSE=0
- The tool's environment variable KOKKOS_SAMPLER_RATE sets the sampling rate of kernel function calls. The default is 1%.
- When the tool's environment variable KOKKOS_LDMS_VERBOSE is set to a non-zero integer, all Kokkos messages that are sent to LDMS are also written to an output file.
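As a minimal sketch of how a run script might set these variables, the fragment below reuses the names and values from the export lines above and adds a sanity check that each value is a non-negative integer. The check and the `require_uint` helper are illustrative assumptions and are not provided by Kokkos Tools or LDMS. The application launch line is left out because it depends on the local installation.

```shell
#!/bin/sh
# Illustrative run-script fragment. require_uint is a hypothetical helper,
# not something shipped with Kokkos Tools or LDMS.

require_uint() {
    # Succeeds only if $2 is a non-empty string of decimal digits.
    case "$2" in
        ''|*[!0-9]*)
            echo "error: $1 must be a non-negative integer, got '$2'" >&2
            return 1
            ;;
    esac
}

# Values taken from the example above; anything already set in the
# environment takes precedence.
KOKKOS_TOOLS_SAMPLER_RATE="${KOKKOS_TOOLS_SAMPLER_RATE:-101}"
KOKKOS_LDMS_VERBOSE="${KOKKOS_LDMS_VERBOSE:-0}"

require_uint KOKKOS_TOOLS_SAMPLER_RATE "$KOKKOS_TOOLS_SAMPLER_RATE" || exit 1
require_uint KOKKOS_LDMS_VERBOSE "$KOKKOS_LDMS_VERBOSE" || exit 1

export KOKKOS_TOOLS_SAMPLER_RATE KOKKOS_LDMS_VERBOSE

# Launch the Kokkos application here (site-specific, so not shown).
```

Validating the values before launch avoids starting a long job with a mistyped setting.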
All data collected by LDMS are stored in the DSOS storage system that is set up as part of the LDMS tutorials linked above.
Data can be visualized using Grafana. Information about setting up and using Grafana can be found here: https://ovis-hpcreadthedocs.readthedocs.io/en/latest/grafanapanel.html
SAND2017-3786