Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add payu sync cmd for syncing archive to a remote directory #360

Merged
merged 3 commits into from
Nov 2, 2023

Conversation

jo-basevi
Copy link
Collaborator

@jo-basevi jo-basevi commented Aug 24, 2023

Add a new payu command to sync output to a remote directory. I've extended the expt.postprocess(), which either runs after collation or end of archive if not collating, to also call 'payu sync' if syncing is enabled.

An example config currently is

sync:
    enable: true
    # PBS specific 
    # queue: copyq # Default
    # mem: 2GB # Default
    # ncpus: 1 # Default

    # rsync specific
    path: /g/data/{Project}/{user}/{model}/{experiment name}/archive  # Absolute path to sync data to
    # rsync_flags: -vrltoD --safe-links # Default
    exclude: 
        - 'iceh.????-??-??.nc' 
        - '*-DEPRECATED' 
        - '*-DELETE' 
        - '*-IN-PROGRESS'

   # exclude_uncollated: false # Default is true if collation is enabled.
      
    # Extra paths to sync to remote-archive
    extra_paths:
       - metadata.yaml
       - run_summary*.csv
     
    # Sync permanently saved restarts
    # restarts: false # Default

    # remove_local_files: false # Default
    # remove_local_dirs: false # Default

    # Add a bare repo for the runlog 
    # runlog: true # Default 

userscripts:
    # User defined scripts/commands to be called before remote syncing model archive - useful for any post-processing
    sync: postprocess_archive.sh

The cosima's sync_data.sh (https://github.com/COSIMA/1deg_jra55_iaf/blob/master/sync_data.sh) does some extra processing, e.g: runs a run_summary python script and concatenates ice daily files.

An example of postprocess_archive.sh could look like for 1deg_jra55_iaf
  #!/bin/bash
  
  # Load/use required enviroment modules
  if [ -z "$MODULESHOME" ]; then
      echo "warning: No Environment Modules found; skipping $command call."
      exit 1
  fi
  
  modulecmd="$MODULESHOME/bin/modulecmd bash"
  
  # Modules for ice concatenation 
  eval "$($modulecmd load nco)"
  
  # Modules for run summary 
  eval "$($modulecmd use /g/data/hh5/public/modules)"
  eval "$($modulecmd load conda/analysis3)"
  eval "$($modulecmd load python3-as-python)"
  
  # Concatenate ice daily files
  for d in archive/output*/ice/OUTPUT; do
      for f in $d/iceh.????-??-01.nc; do
          if [[ ! -f ${f/-01.nc/-IN-PROGRESS} ]] && [[ ! -f ${f/-01.nc/-daily.nc} ]];
          then
              touch ${f/-01.nc/-IN-PROGRESS}
              echo "doing ncrcat -O -L 5 -7 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc}"
              ncrcat -O -L 5 -7 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc} && chmod g+r ${f/-01.nc/-daily.nc} && rm ${f/-01.nc/-IN-PROGRESS}
              if [[ ! -f ${f/-01.nc/-IN-PROGRESS} ]] && [[ -f ${f/-01.nc/-daily.nc} ]];
              then
                  for daily in ${f/-01.nc/-??.nc}
                  do
                      # mv $daily $daily-DELETE  # rename individual daily files - user to delete
                      rm $daily
                  done
              else
                  rm ${f/-01.nc/-IN-PROGRESS}
              fi
          fi
      done
  done
  
  sourcepath="$PWD"
  cd archive || exit 1
  
  # Delete any cice log files that only have a 105-character header and nothing else
  find output* -size 105c -iname "ice.log.task_*" -delete
  
  # Update and commit run summary - do this last in case it doesn't work
  cd $sourcepath
  ./run_summary.py --no_header
  git add run_summary*.csv
  git commit -m "update run summary"
  
  echo "$0 completed successfully"

Edited: 2/12/23

@coveralls
Copy link

coveralls commented Aug 24, 2023

Coverage Status

coverage: 47.069% (+1.5%) from 45.539% when pulling 9ac075a on jo-basevi:200-restoring-remote-archive into a336968 on payu-org:master.

@aidanheerdegen
Copy link
Collaborator

Closes #200

@jo-basevi jo-basevi force-pushed the 200-restoring-remote-archive branch from 4c5f149 to a05168e Compare October 15, 2023 23:24
@pep8speaks
Copy link

pep8speaks commented Oct 15, 2023

Hello @jo-basevi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2023-11-02 00:36:33 UTC

@jo-basevi
Copy link
Collaborator Author

jo-basevi commented Oct 23, 2023

Extra Notes/Questions

Syncing and postscript post-processing

Currently if post-script is configured AND automatic syncing is enabled, the latest output (and lastest restarts, if syncing restarts is enabled) will not be synced. To manually sync the latest outputs after the post-script job has completed, run:
payu sync

Instead of post-script, to automatically ensure post-processing happens before syncing, there's a sync userscript hook. This will run at the start of the sync pbs job and before running any rsync commands. There are options to modify pbs resources/queue under the sync namespace in config.yaml.

If post-processing needs to run on a different queue to sync job (if need network access for syncing to a remote machine and post-processing >1 NCPUs), or it requires significant resources, an option could be to call payu sync at the end of a post-script and disable sync being called automatically by payu.

If the above won't work, an option could be to add to payu a post-processing command to submit a post-processing job which runs any shell/python post-processing userscripts and then either runs payu.sync() after completion or submits a separate copyq job to run payu.sync().

Local delete options

There is a config option to remove local files once synced. As this leaves behind empty directories and files excluded from rsync, there is another config option to remove local restart/output directories. This also happens after the directory has been successfully synced.

The latest output (and last "permanently-saved" restart and subsequent restarts) are protected from both local delete options. Any additional configured paths to sync in config.yaml are also protected. The last "permanently-saved" restart is protected because it is used as a checkpoint for date-based restart pruning.

A procedure to remove all local files and directories - including last outputs/restarts could be:

  1. Run payu sync to sync all outputs, or payu sync --sync-restarts to sync all restarts, as well.
  2. Check files have been successfully synced by checking logs for errors and inspect remote archive.
  3. Then run payu sweep --hard to delete everything in the local archive.

#### Default path for the remove archive?.

If not syncing to a remote machine, I've automated the destination path for remote archive to be
/g/data/${PROJECT}/${USER}/${MODEL}/${EXPERIMENT_NAME}/archive.

I'm unsure if this is the best idea as it could risk outputs being overwritten if experiment name is not unique.. I am thinking it might be best to remove this, until experiments have a unique ID. Edit: This has been removed

Syncing to remote machine

I've kept most of the logic in pre-exisiting remote_archive() for rsyncing to a remote machine, which seems to be in-line with nci's guide for setting up restricted rsync calls with ssh-keys.

I've attempted to go through the above guide above and rsync small files from gadi to my local desktop but haven't had any success. So I've left the remote sync options (e.g rsync_protocol, user and url) out of the documentation for now, as I've haven't been able to test it.

Uncollated files

Currently, --exclude *.nc.* is always added to the rsync commands to ignore uncollated files.
Would syncing uncollated files be an option? If so, I could add a ignore_uncollated flag to config.yaml, with True being the default value.

docs/source/config.rst Outdated Show resolved Hide resolved
@jo-basevi jo-basevi marked this pull request as ready for review October 24, 2023 22:01
@aekiss
Copy link
Contributor

aekiss commented Oct 24, 2023

Just checking - the current sync script rsyncs all outputs (not just the latest) every time it's called, to handle asynchronous collation. Does payu sync have the same behaviour?

@jo-basevi
Copy link
Collaborator Author

Just checking - the current sync script rsyncs all outputs (not just the latest) every time it's called, to handle asynchronous collation. Does payu sync have the same behaviour?

Yes, it rsyncs all the outputs directories every time it's called. I had looked into rsyncing only the current output or current/ prior restart- if collation is enabled, however realised there wasn't much point as rsync would be able to quickly pass over already synced directories in archive anyway..

aidanheerdegen
aidanheerdegen previously approved these changes Oct 31, 2023
Copy link
Collaborator

@aidanheerdegen aidanheerdegen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a complex thing to achieve in a legacy codebase, so good work, thanks.

I do appreciate the thorough testing!

There are few minor changes, which I think should be addressed, but otherwise looks great.

docs/source/config.rst Outdated Show resolved Hide resolved
docs/source/config.rst Outdated Show resolved Hide resolved
docs/source/config.rst Outdated Show resolved Hide resolved
docs/source/config.rst Outdated Show resolved Hide resolved
payu/experiment.py Show resolved Hide resolved
payu/sync.py Outdated Show resolved Hide resolved
payu/sync.py Outdated Show resolved Hide resolved
payu/sync.py Outdated Show resolved Hide resolved
payu/sync.py Show resolved Hide resolved
test/test_sync.py Outdated Show resolved Hide resolved
@jo-basevi
Copy link
Collaborator Author

Thanks @aidanheerdegen for the review! I've just added in the requested changes. I've tested it locally but will have to wait until gadi is back up and running to run through a couple of actual model output tests.

aidanheerdegen
aidanheerdegen previously approved these changes Nov 1, 2023
Copy link
Collaborator

@aidanheerdegen aidanheerdegen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed to want another review, so .... LGTM!

Jo Basevi added 3 commits November 2, 2023 11:18
- Extended postprocess() to run `payu sync`, if syncing is enabled
- Automatically check sync path for storage paths
- If syncing restarts is enabled, only sync restarts that will be permanently archived
- Add config options for syncing to remote machine,
- Add payu sync cmd options to sync all restarts
- Add tests
- Change logic to automatically sync all output (rather than just the lastest output)
- Update storage path check to look for sync path in config.yaml
- Add options for local delete of files/dirs after syncing
   - Add protected paths in get_archive_paths_to_sync. This is protect the last output, and last saved restart (needed for date-based restart pruning) from delete local options
   - remove_local_files config flag for removing local files once synced
   - remove_local_dirs config flag for removing local restart/output dirs onced synced. This will remove any empty dirs after rsync operation and any files that were excluded from rsync.
- Add excludes options
- Add single or list options to extra paths to sync and exclude
- Add documention for sync configuration options and usage
- Add runlog option to sync which defaults to True
- Remove hyperthreading in sync command, and explicitly add a default walltime
- Raise error when sync path is not defined
- Remove sync ssh keys
- Add flag for syncing uncollated files which defaults to True when collation is enabled.
@jo-basevi
Copy link
Collaborator Author

I've tested it all works on gadi, I just added on line in the tests to disable attempting to sync runlog. I also tidied up the commit history

Copy link
Collaborator

@aidanheerdegen aidanheerdegen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM!

@jo-basevi jo-basevi merged commit bb84f66 into payu-org:master Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants