All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Support for NVIDIA MIG (PR #276)
- Support for AWS EFA (PR #331)
- Bumped Slurm version to 23.02 (PR #276)
- Removed support for Slurm 20.11 (PR #276)
- Added class `software_stack` (PR #304)
- Enabled Prometheus alertmanager for mgmt (PR #318)
- Added support for ssh authorized keys options (PR #344)
- Moved software stack content out of cvmfs class into its own class (PR #304)
- Fixed cvmfs configuration order in Puppet (PR #310)
- Improved error handling in mkhome and mkproject daemons by implementing a pipeline with a shell FIFO (PR #328)
- Improved mkproject daemon handling of errors and of group locking (PR #346)
- Removed profile::mfa::provider from common.yaml
- Bumped puppet-jupyterhub to 5.0.4
- Moved Default=YES from PartitionName=DEFAULT to PartitionName=cpubase_by_core1 in slurm.conf
- Updated EESSI for software.eessi.io (#294)
- Moved default partition parameters to DEFAULT
- Added a parameter to configure slurm.conf addendum (#257)
- Defined a value for ReturnToService in slurm.conf (#288)
- Defined a value for ResumeFailProgram in slurm.conf (#291)
- Defined PrivateData=cloud in slurm.conf for Slurm < 23.02 (#293)
- Removed scontrol_update_state resource from profile::slurm::node
No changes to Puppet code.
Refer to magic_castle changelog
- Added support for FreeIPA automembership rule (#287)
- Bumped Slurm autoscale TFE version to 0.5.1 (#296)
- Changed default MOTD to an empty string
- Moved mkhome and mkproject logic into bash functions (#287)
- Added ability to define login node MOTD using puppetlabs-motd
- Moved mkhome and mkscratch logic into bash functions.
- Removed infinite while loop in mkhome (#281)
- File `/etc/hosts` is now generated from a template instead of appending hosts (#282)
- prepare4image.sh now removes cluster-specific content from /etc/hosts (#277)
- Added selinux module for Caddy read access to somaxconn
- Added resource collectors to define requirements for service `slurmd`. The resource collector requirements only apply if their corresponding class is included in the site definition.
- Added `include mysql::server` in classes that need a MySQL server.
- Added missing sssd service include to jupyterhub::hub
- Added include epel in fail2ban and ceph
- Added include profile::gpu in profile::slurm::node
- Added include epel to slurm::base
- Ensured that /etc/ssh/ssh_known_hosts exists
- Added require Class['consul::reload_service'] to wait_for slurmctld host
- Added missing include consul_template in cvmfs.pp
- Added ability to add extra environment variables to CVMFS site.sh
- Added ability for user to shuffle module include in site.yaml (`magic_castle::site::enable_chaos`)
- Added support for Rocky / Alma Linux 9
- Refactored `profile::reverse_proxy` to allow arbitrary subdomain and proxy definition
- Replaced `require profile::accounts` by resource collection in `ldap_users`
- Replaced bootstrap.sh puppet command by application of all resources tagged as `mc_bootstrap`
- Replaced require by resource collector in profile::accounts
- Moved `mysql::server` out of `slurm::accounting` and into hieradata
- Redirected output of ipa-client-uninstall_bad-hostname onlyif curl command to /dev/null
- Updated fail2ban module to 4.2.0
- Moved swap file definition from profile::base to common.yaml
- Moved xauth from profile::base to profile::slurm::base
- Moved puppet cache mode change from profile::base to profile::consul::puppet_watch
- Moved ssh config and ssh_known_hosts from profile::base to their own classes
- Moved consul_template from profile::base to profile::consul
- Merged profile::consul::server and profile::consul::client classes
- Moved CentOS powertools repo enabling to its own class
- Moved /etc/hosts definition to its own class
- Moved sssd service to its own class
- Replaced site.pp by site.yaml
- Fixed sed_fqdn onlyif
- Replaced sed_host_puppet by sed_host_wo_fqdn
- Fixed key used by slurm_compute_weights to retrieve instances' memory ('ram' instead of 'realmemory')
- Bumped slurm-autoscale-tfe version to v0.4.0
- Improved prepare4image.sh script (#256)
- Removed service clean-nfs-rbind
- Removed class profile::mfa
- Removed `require profile::base` from profile::nfs::server
- Disabled unused epel repos
- Removed profile::nfs exec `exportfs -ua; cat...; exportfs -a`
- Removed puppet alias from /etc/hosts
- Fixed VGPU identification facts when dealing with NVIDIA A100 and more than one GPU. (#268)
- Fixed Compute Canada CVMFS rpm package name and source.
- Fixed regression introduced in #263
- Added a `fail()` call if `computecanada` is being initialized on an instance with a non-`x86-64` CPU.
- Moved cvmfs.pp code related to `RSNT_ARCH` under if `computecanada` branch.
- Bumped cmdntrf-consul_template to v2.3.5 to support aarch64
- Fixed issue with CVMFS configuration when there is no `/scratch` NFS export (#262)
- Updated puppetlabs-mysql to 13.3.0 (#261)
- Fixed `slurm_compute_weights` sort on ram instead of realmemory
- Added GPU monitoring with Prometheus and improved global compute node monitoring configuration (#237)
- Added `2` as a possible return code when creating HBAC rules
- Added definition `seluser` to alien cache folder.
- Updated consul to 1.15 (#245)
- Enabled multi-servers consul configuration (#245)
- Moved from puppet facts to Terraform data to identify the ethernet interface connected to the local network. (#247)
- Bumped puppet-jupyterhub to v4.6.4
- Added support for CVMFS alien cache (#204)
- Removed hardcoding of python 3.6 for slurm autoscale virtualenv
- Added automembership rule for users who self sign-up with Mokey
- Added HBAC rules to allow self signup user to connect
- Defined missing variable `$cidr` in `profile::nfs::server::export_volume`.
- Added a before `Package['cvmfs']` clause for `cvmfs` user and `cvmfs-reserved` group.
- Changed the default FreeIPA user shell from `/bin/sh` to `/bin/bash`.
- Bumped puppet-jupyterhub to v4.6.1
- Fixed mkhome daemon to retry initial rsync of a LDAP user's home (#218, #219)
- Fixed LDAP TLS certificate to add ipa subdomain (#215)
- Moved `consul_template::watch` of `slurm-consul.conf` in `slurm::base` (#221, #222)
- Consolidated generation of `/etc/hosts` in a single class `profile::base` (#221, #225)
- Activated SSH hostbased authentication on compute nodes, from login and compute nodes. (#5, #217)
- Added automatic generation of HBAC rules for LDAP users based on instance tags (#221, #225)
- Added a mount bind of NFS exports on the NFS server if the LDAP users can connect to it (#221, #224)
- Added missing profile::slurm::submitter class to profile::jupyterhub::hub (#227, #230)
- Added creation of Slurm partitions based on the compute node hostname prefixes (#38, #226)
- Removed Singularity class. `apptainer` is now provided by CVMFS. (#216)
- Bumped puppet-jupyterhub to 4.5.0
- Added eyaml lookup in hiera.yaml
- Added generation of ipa admin password to bootstrap.sh
- Added resource allowing reset of ipa admin password
- Added generation of consul token, freeipa admin password, mysql password, munge token to bootstrap script
- Defined a specific password for directory server
- Added a specific password for slurmdbd
- Added management of LDAP user password in Puppet - guest password can now be reset by changing the hieradata
- Added documentation
- Replaced mokey password lookup by class variable
- [account] Added rsync package installation (package was missing from some base images)
- [account] Added unique filter on username when creating accounts
- [base] Added magic-castle-release file in `/etc` (PR #208)
- [base] Added generation of `/etc/hosts` from `terraform_data.yaml` information for compute nodes (PR #208)
- [base] Added definition of /etc/ssh/ssh_known_hosts for compute node (PR #208)
- [base] Added a script `prepare4image.sh` that prepares an instance to be snapshotted. (PR #208)
- [cvmfs] Added cvmfs local user
- [singularity] Added singularity to the list of EPEL exclusion
- [slurm] Added definition of `node.conf` using `terraform_data.yaml` information (PR #208)
- [slurm] Added ResumeProgram and SuspendProgram options allowing Slurm to autoscale with Terraform Cloud (PR #208)
- [slurm] Added virtual environment and installation of Slurm TFE autoscale Python package (PR #208)
- [slurm] Added new file `/etc/slurm/env.secrets` containing environment variables to interact with TFE (PR #208)
- [slurm] Added version to slurm and slurmdbd package name installation
- [consul] Added timestamp payload support in `puppet_event_handler` and improved logic
- [cvmfs] Updated source of cvmfs-repo
- [freeipa] Moved `kinit_wrapper` creation to `freeipa::server` (PR #208)
- [gpu] Improved GPU driver symlink creation to avoid creating invalid symlinks on first Puppet run (PR #209)
- [gpu] Fixed nvidia-persistenced /var/run folder
- [nfs] Fixed volume pool to keep only unique volumes
- [puppetfile] Bumped puppet-jupyterhub version to v4.3.6
- [slurm] Moved node weight computation from a consul plugin to a Puppet function (PR #208)
- [slurm] Simplified COPR slurm yumrepo definition (PR #208)
- [slurm] Moved definition of `gres.conf` from node to base (PR #208)
- [slurm] Fixed slurmdbd regex (PR #208)
- [slurm] Fixed source of spank-cc-tmpfs_mount for Slurm 22.05 (PR #208)
- [slurm] Configured default state of compute node to CLOUD
- [base] Removed magic castle plugin rpm install
- [base] Removed owner and group definition in `/var/puppet/cache`
- [base] Removed ssh-rsa from HostKeyAlgorithms and PubkeyAcceptedKeyTypes
- [freeipa] Removed PTR record creation from `freeipa::client`
- [gpu] Removed nvidia_driver_version fact (PR #209)
- [slurm] Removed consul-template generation of node.conf (PR #208)
- [slurm] Removed support for Slurm 19.08 (PR #208)
- [slurm] Dropped NVML usage in gres.conf (incompatible with cloud state node) (PR #208)
- [slurm] Removed NVML enabled Slurm yum repo