Azure meeting Mar 18 2022
Kenneth Hoste edited this page Mar 22, 2022 · 1 revision
- Overview of spent credits
- EESSI hackathon January 2022
- EUM'22
- Monitoring
- Repository for hosting datasets (e.g. for WRF)
- EESSI paper published!
- Azure support for CitC
- European project proposal (funding for EESSI)
- Bob Dröge (RUG)
- Kenneth Hoste (HPC-UGent)
- Laura Redfern (MS Azure)
- Alan O'Cais (CECAM)
- Hugo Meiland (MS Azure)
- Julian Kreuk (MS Azure)
- Ahmad Hesam (SURF)
- January 2022: ~€1195
- temporary Magic Castle cluster EESSI hackathon (Jan'22): ~€610
- AMD Milan build node: ~€350
- Stratum-1 in us-east: ~€200
- February 2022: ~€185
- March 2022 (partial): ~€103
- in last 1.5 months: basically only Stratum-1 in us-east
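As a quick check on the figures above, the spend can be totalled as follows. This is a minimal sketch: it assumes the three itemised costs (Magic Castle cluster, AMD Milan build node, Stratum-1) are the breakdown of the January total, and all amounts are the approximate values from the notes.

```python
# Approximate monthly spend in euros, as listed in the notes.
monthly_spend_eur = {
    "January 2022": 1195,
    "February 2022": 185,
    "March 2022 (partial)": 103,
}

# Itemised costs, assumed here to be the breakdown of the January total.
january_items_eur = {
    "Magic Castle cluster (hackathon)": 610,
    "AMD Milan build node": 350,
    "Stratum-1 in us-east": 200,
}


def total(costs):
    """Sum the values of a {label: amount} mapping."""
    return sum(costs.values())


total_spend = total(monthly_spend_eur)
january_itemised = total(january_items_eur)
# Part of the January total not covered by the three itemised costs.
january_other = monthly_spend_eur["January 2022"] - january_itemised

print(f"Total over the period: ~€{total_spend}")
print(f"January itemised: ~€{january_itemised} (other: ~€{january_other})")
```

Under that assumption, roughly €35 of the January spend is not covered by the three listed items.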
- https://github.com/EESSI/meetings/wiki/EESSI-hackathon-Jan'22
- show & tell meeting on last day to wrap up the hackathon
- good progress on:
- installing software on top of EESSI (experiments with LAMMPS)
- good progress on support for running software on NVIDIA GPUs
- see https://github.com/EESSI/hackathons/tree/05_gpu/2022-01/05_gpu
- working solution (proof-of-concept) for Rocky Linux (should work on RHEL8-based Linux distros)
- some form of approval from NVIDIA to ship CUDA runtime in EESSI (but nothing more than that)
- still leaves the CUDA compilers
- even if we include CUDA runtime, we'll still need a script to ensure things work properly
- can also include sanity checks
- Hugo: what would be needed in Azure VM images for this?
- Alan: basically only reasonably recent GPU drivers
- script can install compat libs in userspace to fill the gaps where needed
- scripts we have should get integrated in EESSI to make it easier for people to play with it
- Hugo also has some contacts in NVIDIA that could be helpful
- Hugo is interested in playing with proof-of-concept GPU support with Devito (see https://github.com/easybuilders/easybuild-easyconfigs/pull/14984)
- archiving EESSI software installations into a container image
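The GPU discussion above mentions a script with sanity checks and the need for reasonably recent GPU drivers. The following is a minimal sketch of what such a driver check could look like, not the script from the hackathon. The minimum version `MIN_DRIVER_VERSION` and the function names are hypothetical, chosen only for illustration; the only external tool used is `nvidia-smi` with its `--query-gpu=driver_version` option.

```python
import shutil
import subprocess

# Hypothetical threshold, chosen only for illustration.
MIN_DRIVER_VERSION = (470, 0)


def parse_driver_version(text):
    """Turn a version string such as '470.57.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_recent_enough(version, minimum=MIN_DRIVER_VERSION):
    """Compare version tuples element by element."""
    return version >= minimum


def detect_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # One line is printed per GPU; the driver version is the same for all.
    return parse_driver_version(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    version = detect_driver_version()
    if version is None:
        print("No usable NVIDIA driver found (nvidia-smi missing or failing)")
    elif driver_is_recent_enough(version):
        print("Driver version", version, "looks recent enough")
    else:
        print("Driver version", version, "is older than", MIN_DRIVER_VERSION)
```

A real check would also need to cover the compat libraries installed in userspace, which this sketch leaves out.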
- https://easybuild.io/eum22
- incl. talk by Hugo & Davide on running WRF on Azure via EESSI
- see https://easybuild.io/eum22/#azure-eessi-wrf
- some additional progress has been made there
- question in this context that came up was whether Intel oneAPI compilers and tools (incl. MKL) could be included in EESSI
- EESSI community has not contacted Intel yet at all for this
- two talks on EESSI:
- Getting Started with EESSI
- Semi-automated workflow for adding software to EESSI
- good progress, work done by Terje @ Univ. of Oslo
- https://monitoring.eessi-infra.org
- see Hugo's proposal at https://github.com/EESSI/filesystem-layer/pull/112
- WRF input data is a handful of 1 GB files
- should be doable with a "normal" data repository?
- Bob will look into this
- expectation is that additional data (seismic data) would be pulled in soon
- some concerns there w.r.t. client CernVM-FS cache
- larger cache than 10GB would be needed for this
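To make the cache concern concrete, here is a rough sizing sketch. The number of files and the headroom factor are hypothetical; the notes only say "a handful of 1 GB files". `CVMFS_QUOTA_LIMIT` is the CernVM-FS client setting for the cache quota, expressed in megabytes.

```python
def suggested_quota_mb(file_sizes_gb, headroom=1.5):
    """Suggest a client cache quota in MB: dataset size times a headroom factor.

    The headroom factor is an assumption, leaving room in the cache for
    software files next to the dataset.
    """
    total_mb = sum(file_sizes_gb) * 1000
    return int(total_mb * headroom)


# Hypothetical dataset: eight input files of 1 GB each.
quota = suggested_quota_mb([1.0] * 8)
print(f"CVMFS_QUOTA_LIMIT={quota}")
```

With these assumed numbers the suggested quota already exceeds a 10 GB cache, which is in line with the concern raised above.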
- EESSI: A cross-platform ready-to-use optimised scientific software stack
- https://doi.org/10.1002/spe.3075
- Open access
- "Software: Practice and Experience" special issue on "New Trends in HPC"
- work done by Hugo
- resulted in PRs to CitC:
- Matt Williams (main developer of CitC) has been granted access to EESSI Azure credits for testing CitC on Azure
- maybe we should try and get Matt to join next EESSI hackathon
- Magic Castle already has Azure support, but no auto-power-down of unused workernodes there
- Hugo could help here too
- Alan: doing things securely is a big concern here
- seeking letter of support from Azure
- if possible, also some confirmation that sponsored Azure credits could be made available to project (what we basically already have)
- any plans?
- half-day EasyBuild tutorial where EESSI will be briefly featured
- Laura will check if there could be a session set up
- integration is done by Hugo, using EESSI is opt-in
- was well received
- Hugo is making it very clear that EESSI is still pre-production
- Hugo: what is the best way to spend time on things that are helpful for EESSI?
- Kenneth: blockers for stable EESSI are:
- dedicated manpower (would be resolved if European project proposal gets accepted)
- proper support for NVIDIA GPUs
- Alan could post instructions on how to play with the proof-of-concept that was pieced together during the last hackathon
- GitHub App to automate workflow for community contributions
- starting point by Kenneth: https://github.com/boegel/pyghee
- set up central Stratum-0 server securely (physical box, yubikeys)
- any help on these aspects is welcome :)