Sync meeting on EESSI test suite (2023-05-31)
Kenneth Hoste edited this page May 31, 2023
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-05-17)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-04-20)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-03-30)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-03-10) (incl. 2023-02-23)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-02-09)
- merged PRs
  - PR #28 by Sam: extend scales + add constants
    - list of supported scales should be documented in README?
    - extra 1_gpu, 2_gpu scales?
      - core count can be picked based on available cores/gpus per node
    - 1_4_node didn't work for Kenneth on Hortense to get 1 GPU runs
      - was using `hortense` branch not properly updated with latest `main` :man-facepalming:
    - hardcoded limit of 30min in GROMACS tests (see open issue)
  - PR #33 by Xin: update README
  - PR #42 by Sam: add more comments, docs to the gromacs test
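To make the PR #28 idea of picking the core count from what a node offers a bit more concrete, here is a minimal sketch. It assumes a scale is described either by a fixed core count, a fraction of a node (`node_part`) or a number of GPUs; the `SCALES` table, its keys, the `Resources` class and `resolve_scale` are all hypothetical and are not the actual contents of the test suite's constants.

```python
from dataclasses import dataclass

# Hypothetical scale table (not the real test suite constants): a scale is
# either a fixed core count, a fraction of a node ("node_part" = 4 means a
# quarter node), or an explicit number of GPUs per node.
SCALES = {
    "1_core": {"num_nodes": 1, "num_cpus_per_node": 1},
    "1_4_node": {"num_nodes": 1, "node_part": 4},
    "1_2_node": {"num_nodes": 1, "node_part": 2},
    "1_node": {"num_nodes": 1, "node_part": 1},
    "2_nodes": {"num_nodes": 2, "node_part": 1},
    "1_gpu": {"num_nodes": 1, "num_gpus_per_node": 1},
    "2_gpu": {"num_nodes": 1, "num_gpus_per_node": 2},
}


@dataclass
class Resources:
    num_nodes: int
    cpus_per_node: int
    gpus_per_node: int


def resolve_scale(scale: str, node_cpus: int, node_gpus: int = 0) -> Resources:
    """Translate a (hypothetical) scale name into cores/GPUs for one node type."""
    spec = SCALES[scale]
    nodes = spec["num_nodes"]
    if "num_cpus_per_node" in spec:
        # Fixed core count, independent of the node size.
        return Resources(nodes, spec["num_cpus_per_node"], 0)
    if "num_gpus_per_node" in spec:
        gpus = spec["num_gpus_per_node"]
        if gpus > node_gpus:
            raise ValueError(f"{scale} needs {gpus} GPUs, node has {node_gpus}")
        # Give each requested GPU an equal share of the node's cores.
        return Resources(nodes, gpus * node_cpus // node_gpus, gpus)
    # A fraction of a node gets the same fraction of its cores and GPUs.
    part = spec["node_part"]
    return Resources(nodes, node_cpus // part, node_gpus // part)
```

On a hypothetical node with 48 cores and 4 GPUs, `resolve_scale("1_4_node", 48, 4)` yields 1 node with 12 cores and 1 GPU, the kind of 1-GPU run discussed above. Sharing cores evenly between GPUs is just one possible choice for the GPU scales.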
- open PRs
  - PR #24 by Sam+Kenneth: add configuration file for VSC Tier-1 Hortense
    - was updated to have all partitions on Hortense (GPU + CPU)
    - on GPU nodes, both CPU-only and GPU test variants are submitted
      - Do we want this?
      - We cannot include `cpu` in features for the GPU partitions
    - we should also specify `num_sockets` along with `num_cpus` for all partitions
  - PR #36 by Caspar: Expand developer instructions
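The question about CPU-only variants on GPU nodes (PR #24) comes down to how tests select partitions by feature. The sketch below borrows the shape of a ReFrame site configuration (partitions with `features`, a `processor` block with `num_cpus` and `num_sockets`, and `devices`), but the partition names and counts are made up, and `select_partitions` is a hypothetical, simplified model of the `+feature` syntax in a test's `valid_systems`, not ReFrame's actual implementation.

```python
# Illustrative partitions in the shape of a ReFrame site configuration.
# Names and counts are hypothetical, not Hortense's real values.
partitions = [
    {
        "name": "cpu_partition",
        "features": ["cpu"],
        "processor": {"num_cpus": 128, "num_sockets": 2},
    },
    {
        "name": "gpu_partition",
        # No "cpu" feature here, so CPU-only test variants skip GPU nodes.
        "features": ["gpu"],
        "processor": {"num_cpus": 48, "num_sockets": 2},
        "devices": [{"type": "gpu", "num_devices": 4}],
    },
]


def select_partitions(parts, valid_systems):
    """Simplified model of feature-based selection ('+cpu', '+gpu', ...).

    A partition is selected if at least one entry of `valid_systems` only
    requires features that the partition has; any other syntax is ignored.
    """
    selected = []
    for part in parts:
        feats = set(part["features"])
        for spec in valid_systems:
            required = {tok[1:] for tok in spec.split() if tok.startswith("+")}
            if required <= feats:
                selected.append(part["name"])
                break
    return selected


def cpus_per_socket(part):
    """With both num_cpus and num_sockets set, per-socket counts follow directly."""
    proc = part["processor"]
    return proc["num_cpus"] // proc["num_sockets"]
```

With these definitions, `select_partitions(partitions, ["+cpu"])` returns only `cpu_partition`; adding `cpu` to the GPU partition's `features` would make it return both, i.e. CPU-only variants would again be submitted to GPU nodes.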
- WIP PRs
  - PR #38 by Caspar: WIP TensorFlow
    - how to specify binding behaviour in a portable way?
      - options depend on launcher being used (mpirun, srun, etc.)
      - something to ask ReFrame developers?
      - can we set environment variables that control bindings for all known launchers?
        - `$OMPI_*`, `$IMPI_*`, `$SRUN_*`
      - https://hpc-wiki.info/hpc/Binding/Pinning
  - PR #44 by Kenneth: add GitHub Actions workflow to run EESSI test suite
  - PR #45 by Kenneth: refactor namespace to `eessi.testsuite.*` (WIP)
    - remove `__init__.py` files? (may be needed for 'pip install' => add CI for that)
    - `eessi/testsuite/utils/constants.py` -> `eessi/testsuite/constants.py`
    - example configuration files in `config` dir directly in `test-suite` repo
      - `config/vsc_hortense.py`, `config/surf_snellius.py`, `config/aws_eessi_citc.py`, `config/github_actions.py`, `config/local_cpu.py`, `config/local_cpu_gpu.py`, `config/local_gpu.py`, `config/eessi.py`
    - need to also update `README`!
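One way to explore the PR #38 idea of setting binding variables for all known launchers at once is to build a single environment dict covering Open MPI, Intel MPI and Slurm's `srun`, so the test does not need to know which launcher is in use. The variable names are real: `OMPI_MCA_hwloc_base_binding_policy` (Open MPI 4.x; Open MPI 5 delegates binding to PRRTE, so the name may differ there), `I_MPI_PIN`/`I_MPI_PIN_DOMAIN` (Intel MPI's variables use the `I_MPI_` prefix rather than `IMPI_`), and `SLURM_CPU_BIND`, which `srun` reads as the equivalent of `--cpu-bind`. The policy mapping and the `binding_env` helper are a hypothetical sketch; check each launcher's documentation for the values it accepts.

```python
# Hypothetical mapping from a launcher-neutral policy to each launcher's
# spelling of it: (Open MPI, Intel MPI pin domain, srun --cpu-bind).
_POLICIES = {
    "core": ("core", "core", "cores"),
    "socket": ("socket", "socket", "sockets"),
    "none": ("none", None, "none"),
}


def binding_env(policy: str) -> dict:
    """Return env vars requesting `policy` from every launcher we know about.

    Each launcher should only react to its own variables, so the same dict
    can be set regardless of which launcher ends up being used.
    """
    ompi, impi, srun = _POLICIES[policy]
    env = {
        "OMPI_MCA_hwloc_base_binding_policy": ompi,  # Open MPI 4.x (mpirun)
        "SLURM_CPU_BIND": srun,  # srun: same as --cpu-bind
    }
    if impi is None:
        env["I_MPI_PIN"] = "0"  # Intel MPI: disable pinning altogether
    else:
        env["I_MPI_PIN"] = "1"
        env["I_MPI_PIN_DOMAIN"] = impi
    return env
```

In a ReFrame 4 test, such a dict could be merged into the test's `env_vars`; whether that is preferable to launcher-specific options is exactly the open question for the ReFrame developers.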
- Satish is stuck with a problem with the configuration file (issue #43)
  - maybe due to `hostnames`?
  - consider using remote detection support of ReFrame in example configuration file
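When no system is given explicitly, ReFrame picks the configured system whose `hostnames` regular expressions match the current host name, so a pattern that does not match the machine where `reframe` runs is a plausible cause of the configuration problem. The sketch below mimics that lookup to test patterns quickly; the system names and patterns are made up, and matching from the start of the host name with `re.match` is an assumption about how ReFrame applies the patterns.

```python
import re
import socket

# Made-up systems, each with the host name patterns it claims.
systems = [
    {"name": "cluster_a", "hostnames": [r"login\d+\.cluster-a", r"node\d+\.cluster-a"]},
    {"name": "cluster_b", "hostnames": [r"gpu\d+", r"cpu\d+"]},
]


def detect_system(hostname, systems):
    """Return the first system with a hostname pattern matching `hostname`."""
    for system in systems:
        for pattern in system["hostnames"]:
            if re.match(pattern, hostname):
                return system["name"]
    return None  # no pattern matched: the configuration does not cover this host


# Check which (made-up) system the current machine would be detected as.
print(detect_system(socket.gethostname(), systems))
```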
- Caspar: running tests with EESSI
  - requires `source /cvmfs/pilot.eessi-hpc.org/latest/init/bash` in the `prepare_cmds` of the partition
  - but also still requires running the `source` command in the shell session where `reframe` is run
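A minimal sketch of a partition that sources the EESSI init script via `prepare_cmds` (a ReFrame partition option whose commands are emitted in the generated job script, so they run on the compute node) and uses the `default` environment, which must also be defined in the top-level `environments` section. The system, partition, scheduler and launcher values are placeholders. Sourcing the init script in the shell where `reframe` is launched remains a separate step, since `prepare_cmds` only ends up in the job script.

```python
# Sketch of a ReFrame site configuration file: a plain Python module that
# defines `site_configuration`. All names below are placeholders.
EESSI_INIT = "source /cvmfs/pilot.eessi-hpc.org/latest/init/bash"

site_configuration = {
    "systems": [
        {
            "name": "example_system",
            "descr": "placeholder system running tests with EESSI",
            "hostnames": [r".*"],
            "partitions": [
                {
                    "name": "cpu",
                    "scheduler": "slurm",
                    "launcher": "mpirun",
                    "access": [],
                    # Emitted in the job script, so EESSI is set up on the compute node.
                    "prepare_cmds": [EESSI_INIT],
                    # The "default" environment must exist in "environments" below.
                    "environs": ["default"],
                }
            ],
        }
    ],
    "environments": [{"name": "default"}],
}
```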
- [Kenneth] from README to docs (https://eessi.github.io/docs/software_testing/ or https://eessi.github.io/docs/test-suite)
  - configuration
    - "default" environment is required
    - features: cpu and/or gpu
    - logging (cfr. https://github.com/EESSI/test-suite/issues/40)
    - etc.
  - test filtering
    - tags: `--tag CI --tag '1_node|2_nodes'`
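For the test filtering docs, a small model of how `--tag CI --tag '1_node|2_nodes'` is meant to combine, as we understand ReFrame's behaviour: each `--tag` value is a regular expression, and when the option is repeated a test must satisfy all of them, so this selects tests tagged `CI` that also carry a `1_node` or `2_nodes` tag. The `selected` helper and the test names are hypothetical, and using `re.fullmatch` (whole-tag matching) is an assumption; the ReFrame command-line reference is the authority on the exact semantics.

```python
import re


def selected(test_tags, tag_patterns):
    """True if every --tag pattern matches at least one of the test's tags."""
    return all(
        any(re.fullmatch(pattern, tag) for tag in test_tags)
        for pattern in tag_patterns
    )


patterns = ["CI", "1_node|2_nodes"]  # as in: --tag CI --tag '1_node|2_nodes'

# Hypothetical tests and their tags.
tests = {
    "GROMACS_1_node": {"CI", "1_node"},
    "GROMACS_2_nodes": {"CI", "2_nodes"},
    "GROMACS_16_nodes": {"16_nodes"},
}
print([name for name, tags in tests.items() if selected(tags, patterns)])
```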
- should we be relying on the `hpctestlib` provided by ReFrame?
  - it is being maintained, according to Victor & Vasileios, but won't be extended with new tests soon
  - as long as implementing tests on top of it works, that's OK for now (cfr. our current GROMACS test)
  - same approach for OSU Microbenchmarks, although the test in the library does compile + run, while we only want to run
    - Satish: seems like we can only derive from the run test (`osu_run`, cfr. https://github.com/reframe-hpc/reframe/blob/develop/hpctestlib/microbenchmarks/mpi/osu.py)
  - not using the TensorFlow test from the test library, because it uses Horovod (https://github.com/reframe-hpc/reframe/blob/develop/hpctestlib/ml/tensorflow/horovod.py)
- next steps
  - Kenneth
    - CI PR #44
    - rework namespace - PR #45
    - proper docs (README to EESSI docs)
  - Caspar
    - TensorFlow test
  - Satish
    - OSU Microbenchmarks test
    - maybe OpenFOAM
    - ReFrame in EESSI
    - running test in AWS cluster (https://github.com/EESSI/hackathons/tree/main/2022-01/citc)
  - Sam
    - logging of logic to set up resources (issue #35)
    - CUDA vs NVIDIA feature
  - Lara
    - run test suite on VSC Tier-1 Hortense + HPC-UGent Tier-2
    - HPL/STREAM/WRF from internal HPC-UGent test suite (https://github.ugent.be/hpcugent/vsc-testing private repo)
  - additional tests (for Xin)
    - Bioconductor demo: https://github.com/EESSI/eessi-demo/blob/main/Bioconductor/run.sh
    - maybe ESPResSo for MultiXscale, see https://github.com/multixscale/planning/issues/53
    - OpenFOAM is pretty complex, can be done later (or by someone more experienced with ReFrame)