Apply performance tuning when mounting FSx for Lustre
We have to create a custom mount helper so that the performance tuning parameters persist across instance reboots and head node instance type updates (the tuning depends on the instance's vCPU count and memory size, which can change with the instance type).
lctl set_param commands must be executed after the file systems are mounted, and their settings do not persist across an instance reboot. The FSx documentation advises using a boot cron job to set the parameters after a reboot. However, a boot cron job is not compatible with our use case, because FSx for Lustre file systems are mounted upon first access rather than at instance boot (see code (https://github.com/aws/aws-parallelcluster-cookbook/blob/develop/cookbooks/aws-parallelcluster-config/resources/manage_fsx.rb#L60)). Therefore, we have to create a custom mount helper (see mount man page (https://linux.die.net/man/8/mount)).
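Per the mount(8) man page, mount -t TYPE looks for an executable helper named /sbin/mount.TYPE and, if one exists, runs it with the device, the mount point, and the options; the Lustre client itself ships /sbin/mount.lustre. The following is a minimal sketch of that lookup under stated assumptions, not util-linux's actual code: resolve_mount_helper is a hypothetical name, and the search directory is a parameter (defaulting to /sbin) so the logic can be exercised against a scratch directory.

```shell
#!/bin/bash
# Hypothetical sketch of the helper lookup that mount(8) performs for
# "mount -t TYPE"; util-linux does this internally, not via this function.
# Arguments: filesystem type, optional directory to search (defaults to /sbin).
resolve_mount_helper() {
  local fstype="$1"
  local sbin_dir="${2:-/sbin}"
  local helper="${sbin_dir}/mount.${fstype}"
  if [ -x "$helper" ]; then
    # An executable helper exists: mount would run it with the device,
    # the mount point, and the options.
    echo "$helper"
  else
    # No usable helper: mount would perform the mount(2) system call itself.
    return 1
  fi
}
```

On a node where the Chef template resource has installed the helper with mode 0755, resolve_mount_helper lustre_with_performance_tuning would print /sbin/mount.lustre_with_performance_tuning. That helper forwards its arguments unchanged to mount -t lustre, which in turn resolves to /sbin/mount.lustre.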

Q: Do these operations affect the FSx client configuration or the server configuration?
A: Client only.

Q: How will it work if a customer mounts FSx manually?
A: If they use lustre as the mount type, the performance tuning will not be applied. It is applied only when the mount type is lustre_with_performance_tuning.

Q: How will customers know that they have to use the mount helper?
A: They will have to read the official ParallelCluster documentation.
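For the manual-mount case above: the tuning is applied only when lustre_with_performance_tuning is passed as the mount type, because mount(8) then runs the helper, which mounts with type lustre and applies the lctl settings that match the instance size. A dry-run sketch, assuming hypothetical placeholder values for the DNS name, mount name, and /fsx mount point; it prints both commands instead of running them.

```shell
#!/bin/bash
# Hypothetical placeholders: substitute your file system's real values.
dns_name="fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com"
mount_name="abcdefgh"
mount_dir="/fsx"
device="${dns_name}@tcp:/${mount_name}"

# Plain Lustre mount: mount(8) runs /sbin/mount.lustre, no lctl tuning.
untuned_cmd=(sudo mount -t lustre "$device" "$mount_dir")
# Tuned mount: mount(8) runs /sbin/mount.lustre_with_performance_tuning,
# which mounts with type lustre and then applies the lctl settings.
tuned_cmd=(sudo mount -t lustre_with_performance_tuning "$device" "$mount_dir")

# Dry run: print the commands instead of executing them.
echo "${untuned_cmd[*]}"
echo "${tuned_cmd[*]}"
```

This assumes the helper is already installed, as it is on nodes configured by this cookbook.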

Signed-off-by: Hanwen <[email protected]>
hanwen-cluster committed Jan 24, 2023
1 parent 8b1a85a commit 41f0cde
Showing 2 changed files with 44 additions and 2 deletions.
8 changes: 6 additions & 2 deletions cookbooks/aws-parallelcluster-config/resources/manage_fsx.rb
@@ -23,6 +23,10 @@
 default_action :mount
 
 action :mount do
+  template "/sbin/mount.lustre_with_performance_tuning" do
+    source 'shared_storages/mount.lustre_with_performance_tuning.erb'
+    mode '0755'
+  end
   fsx_fs_id_array = new_resource.fsx_fs_id_array.dup
   fsx_fs_type_array = new_resource.fsx_fs_type_array.dup
   fsx_shared_dir_array = new_resource.fsx_shared_dir_array.dup
@@ -63,7 +67,7 @@
 
 mount fsx_shared_dir do
   device "#{dns_name}@tcp:/#{mount_name}"
-  fstype 'lustre'
+  fstype 'lustre_with_performance_tuning'
   dump 0
   pass 0
   options mount_options
@@ -75,7 +79,7 @@
 
 mount fsx_shared_dir do
   device "#{dns_name}@tcp:/#{mount_name}"
-  fstype 'lustre'
+  fstype 'lustre_with_performance_tuning'
   dump 0
   pass 0
   options mount_options
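With this change, an fstab entry written by the Chef mount resource (when the resource is enabled) names lustre_with_performance_tuning as its type, so every mount after a reboot goes through the helper and reapplies the tuning. The resources' action and the value of mount_options are outside the lines shown, so this is only a minimal sketch of the fstab(5) field order (device, mount point, type, options, dump, pass), using a hypothetical fstab_line function with a placeholder device and placeholder options.

```shell
#!/bin/bash
# Format one /etc/fstab entry: device, mount point, type, options, dump, pass.
# fstab_line is a hypothetical name used only for illustration.
fstab_line() {
  local device="$1" dir="$2" fstype="$3" options="$4" dump="${5:-0}" pass="${6:-0}"
  printf '%s %s %s %s %s %s\n' "$device" "$dir" "$fstype" "$options" "$dump" "$pass"
}

# Placeholder device and options; dump 0 and pass 0 match the resources above.
fstab_line "fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com@tcp:/abcdefgh" \
  /fsx lustre_with_performance_tuning "defaults,_netdev" 0 0
```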
38 changes: 38 additions & 0 deletions, new file (template source 'shared_storages/mount.lustre_with_performance_tuning.erb')
@@ -0,0 +1,38 @@
#!/bin/bash

# Copyright 2013-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the
# License. A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
# limitations under the License.

set -ex

if [ "$(ohai cpu/total)" -gt 64 ] && (! (/sbin/lsmod | grep -q ^lustre) || [ "$(/sbin/lsmod | grep ^lustre | awk '{print $3}')" -eq 0 ]); then
  modprobe_conf_path="/etc/modprobe.d/modprobe.conf"
  ptlrpcd_per_cpt_max="options ptlrpc ptlrpcd_per_cpt_max"
  ksocklnd_credits="options ksocklnd credits"
  if ! grep -q "$ptlrpcd_per_cpt_max" "$modprobe_conf_path" && ! grep -q "$ksocklnd_credits" "$modprobe_conf_path"; then
    sudo sh -c "echo $ptlrpcd_per_cpt_max=32 >> $modprobe_conf_path"
    sudo sh -c "echo $ksocklnd_credits=2560 >> $modprobe_conf_path"
    # Reload the Lustre kernel modules to apply the two settings above
    sudo lustre_rmmod && sudo modprobe lustre
  fi
fi

sudo mount -t lustre "$@"

if [ "$(ohai cpu/total)" -gt 64 ]; then
  sudo lctl set_param 'osc.*OST*.max_rpcs_in_flight=32'
  sudo lctl set_param 'mdc.*.max_rpcs_in_flight=64'
  sudo lctl set_param 'mdc.*.max_mod_rpcs_in_flight=50'
fi
total_memory_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
if [ "$total_memory_kb" -gt 274877907 ]; then
  sudo lctl set_param "llite.*.max_cached_mb=$((total_memory_kb/10000))" # about 10% of the client instance's physical memory, in MB
fi
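The helper script makes three size-dependent decisions. The sketch below restates them as side-effect-free functions so the logic can be checked without root access, ohai, or a Lustre client. All function names are hypothetical, and each one takes its inputs (vCPU count, lsmod-style text, file paths) as arguments instead of querying the system. Two details are worth spelling out. First, the third lsmod column is the module's use count, and the kernel refuses to unload a module that is in use, which is why the options are rewritten only when lustre is not loaded or is unused. Second, MemTotal is reported in KiB, so the 274877907 KiB threshold is about 262 GiB, and KiB / 10000 is roughly 10% of memory in MiB (exactly 10% would be KiB / 10240). For example, a 512 GiB client (536870912 KiB) gets max_cached_mb = 536870912 / 10000 = 53687.

```shell
#!/bin/bash
# Hypothetical, side-effect-free restatement of the mount helper's decisions.
# Inputs are passed as arguments instead of calling ohai, lsmod, or lctl.

# Succeeds when the module options should be written and the module reloaded:
# more than 64 vCPUs, and the lustre module is either not loaded or unused.
needs_module_tuning() {
  local cpu_count="$1" lsmod_output="$2" use_count
  [ "$cpu_count" -gt 64 ] || return 1
  # Column 3 of lsmod output is the use count (exact match on the name "lustre").
  use_count=$(printf '%s\n' "$lsmod_output" | awk '$1 == "lustre" {print $3}')
  [ -z "$use_count" ] || [ "$use_count" -eq 0 ]
}

# Appends the two module options to CONF only if neither is present yet,
# so repeated mounts do not add duplicate lines.
append_module_options() {
  local conf="$1"
  if ! grep -q "options ptlrpc ptlrpcd_per_cpt_max" "$conf" 2>/dev/null &&
     ! grep -q "options ksocklnd credits" "$conf" 2>/dev/null; then
    echo "options ptlrpc ptlrpcd_per_cpt_max=32" >> "$conf"
    echo "options ksocklnd credits=2560" >> "$conf"
  fi
}

# Prints (does not run) the post-mount lctl commands for a vCPU count.
# The patterns are quoted so the shell does not expand them; lctl matches them.
lctl_tuning_commands() {
  local cpu_count="$1"
  if [ "$cpu_count" -gt 64 ]; then
    echo "lctl set_param 'osc.*OST*.max_rpcs_in_flight=32'"
    echo "lctl set_param 'mdc.*.max_rpcs_in_flight=64'"
    echo "lctl set_param 'mdc.*.max_mod_rpcs_in_flight=50'"
  fi
}

# Prints the max_cached_mb value for a meminfo-format file, or nothing when
# MemTotal (in KiB) is at or below the 274877907 KiB threshold.
max_cached_mb_for() {
  local meminfo="${1:-/proc/meminfo}" total_kb
  total_kb=$(awk '$1 == "MemTotal:" {print $2}' "$meminfo")
  if [ "$total_kb" -gt 274877907 ]; then
    # Integer division, as in the script: roughly 10% of memory, in MiB.
    echo $((total_kb / 10000))
  fi
}
```

For instance, needs_module_tuning 96 "lustre 1028096 0" succeeds, while lctl_tuning_commands 64 prints nothing, matching the script's strictly-greater-than-64 check.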
