Concatenate CICE daily output #31
Conversation
❌ Automated testing cannot be run on this branch ❌
It would be good to get an idea of how long it takes to concatenate some of the high res data and post the results in the PR or Issue.
If it's time-consuming, the script could have PBS directives added and be run as a postscript, which is then submitted to the queue (a sketch of such a header follows).
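A minimal sketch of what that header might look like, assuming an NCI-style PBS setup; the queue, walltime, and memory values are illustrative placeholders, not tested:

#!/bin/bash
#PBS -q copyq              # placeholder queue for post-processing jobs
#PBS -l walltime=00:30:00  # placeholder resource requests
#PBS -l mem=4GB
#PBS -l wd                 # start in the submission directory

# ... concatenation commands as in the script below ...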
tools/concat_ice_daily.sh (outdated)

#concatenate sea-ice daily output
#script inspired from https://github.com/COSIMA/1deg_jra55_ryf/blob/master/sync_data.sh#L87-L108

for d in archive/output*/ice/OUTPUT; do
This loops over all the output directories. If we're running this each time, we shouldn't have to do that.
I can see two options:
- Determine the most recent output directory and just run there (see the sketch at the end of this comment)
- Invoke this with the run userscript hook and do the concatenation in the work/ice/OUTPUT directory before it is archived.

The issue with option 2 is that there is already a run userscript. I honestly have no idea what would happen if you tried to run two scripts in a single line, say with && or separated with a ;.
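For option 1, an untested sketch of picking only the most recent output directory (a lexical sort is enough because payu zero-pads the output numbers):

# untested: operate on the latest output directory only
d=$(ls -d archive/output??? | sort | tail -1)/ice/OUTPUT

For option 2, one workaround would be a small wrapper registered as the single run userscript; existing_userscript.sh is a hypothetical stand-in for whatever is configured now:

# hypothetical wrapper calling the current userscript, then the concatenation
./tools/existing_userscript.sh && ./tools/concat_ice_daily.sh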
Another point: apparently there has been a requirement to concatenate 6-hourly data in the past:
https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle4/concat_ice_6hourlies.sh#L6
I wonder if there is a way we could accommodate that use case as well in a general way.
If we assume that the user might configure the output to be saved at any number of hours, this gets hard. We could assume there will always be data saved at 12 hours (i.e. every combination of 1/2/3/4/6/12-hourly output would save a 12-hour file), then we can find the days by something like $output_dir/iceh*.????-??-01-43200.nc. Messy but probably ok.
The other complexity is that CICE timestamps are at the end of the time period, e.g. with sub-daily data there is a file named for the midnight at the end of the month. So for January, a file archive/output001/ice/OUTPUT/iceh_03h.1901-02-01-00000.nc is made, but it contains January data.
So I don't know how, in Bash, to make a list of all the files for a month that handles both those conditions? We would probably need to use a calendar tool? (A rough sketch follows.)
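For what it's worth, GNU date can serve as the calendar tool; an untested sketch of building one month's file list under the end-of-period stamping convention (the iceh_03h pattern and $output_dir follow the examples above):

# untested sketch: list one month of 3-hourly files
year=1901; month=01
next=$(date -d "${year}-${month}-01 +1 month" +%Y-%m)   # GNU date month arithmetic
files=()
for f in $output_dir/iceh_03h.${year}-${month}-*.nc \
         $output_dir/iceh_03h.${next}-01-00000.nc; do
    # skip the midnight file at the start of this month: with end-of-period
    # timestamps it holds the previous month's final interval
    [[ $f == *".${year}-${month}-01-00000.nc" ]] && continue
    files+=("$f")
done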
Thanks for the thoughtful engagement. It sounds like we should probably shoot for the common use case to begin with and make an issue to update to a more general form at a later date.
tools/concat_ice_daily.sh (outdated)
for f in $d/iceh.????-??-01.nc; do
    if [[ ! -f ${f/-01.nc/-IN-PROGRESS} ]] && [[ ! -f ${f/-01.nc/-daily.nc} ]];
    then
        touch ${f/-01.nc/-IN-PROGRESS}
        echo "doing ncrcat -O -L 5 -4 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc}"
        ${PAYU_PATH}/ncrcat -O -L 5 -4 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc} && chmod g+r ${f/-01.nc/-daily.nc} && rm ${f/-01.nc/-IN-PROGRESS}
        if [[ ! -f ${f/-01.nc/-IN-PROGRESS} ]] && [[ -f ${f/-01.nc/-daily.nc} ]];
        then
            for daily in ${f/-01.nc/-??.nc}
            do
                # mv $daily $daily-DELETE # rename individual daily files - user to delete
                rm $daily
            done
        else
            rm ${f/-01.nc/-IN-PROGRESS}
        fi
    fi
done
Suggested change:

# Don't error if there are no matching patterns
shopt -s nullglob
# Assuming `$d` contains the directory where the data resides
for first_file in $d/iceh.????-??-01.nc
do
    # Make a list of all files we wish to concatenate
    icefiles=(${first_file/-01.nc/-??.nc})
    if [ ${#icefiles[@]} -gt 0 ]
    then
        iceout="${first_file/-01.nc/-daily.nc}"
        ncrcat -O -L 5 -4 "${icefiles[@]}" ${iceout} && rm "${icefiles[@]}"
    fi
done
Personally I prefer to just delete the files if the return status of the ncrcat command is ok. Making temporary files ends up introducing extra logic to deal with them.
Note the above is untested, just a suggestion for how to reduce the complexity of the logic.
Making temporary files ends up introducing extra logic to deal with them.
This is copied from the COSIMA scripts. I assume the temporary files were needed for some edge case? @aekiss - Do you know why the temporary files were used?
Sorry to badger you @aekiss but I'm also curious if there were cases of data loss that prompted the design you implemented?
Andrew said it was just for sanity checking / debugging in case of failure. I am happy to remove it.
How would one do this? Are there ways to log the PBS "CPU Time" between user scripts?
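One low-tech way to get a number, independent of PBS accounting, is to time the step inside the script itself; a minimal sketch, reusing the icefiles/iceout names from the suggestion above:

# minimal sketch: log the wall time of the concatenation step
start=$SECONDS
ncrcat -O -L 5 -4 "${icefiles[@]}" ${iceout}
echo "ncrcat took $((SECONDS - start)) s"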
With one month of 0.1 degree data, this takes ~2.5 minutes to run on the login node, compared to approx 1.6 hours of walltime for the model to run. (Amazingly it turns 3.6GB into 1.5GB too!) This doesn't parallelise, and at ~2-3% of the walltime, so I guess it is worth worrying about? (Reducing compression to level 1 reduces the time to ~1m50s, but the file size goes up ~6%.)
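For anyone re-running that comparison, the two cases differ only in the deflate level; the file names below are illustrative and the timings are the ones quoted above:

# illustrative: compare deflate levels on one month of daily files
time ncrcat -O -L 5 -4 iceh.1901-01-??.nc iceh.1901-01-daily.nc   # ~2.5 min, 3.6GB -> 1.5GB
time ncrcat -O -L 1 -4 iceh.1901-01-??.nc iceh.1901-01-daily.nc   # ~1m50s, ~6% larger file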
Instead of using an …
IKR. There is a reason this is worth doing.
That is a good idea, thanks @jo-basevi. I think it does run into the issue that if someone turns … I did wonder if we couldn't define a …
I think moving to a payu postscript is the best plan: as it runs as a separate PBS job, this reduces the resources held waiting for a single-PE job to complete.
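For concreteness, a sketch of what that might look like in config.yaml, assuming payu's postscript option (which submits the named script as its own PBS job; see the payu docs linked below):

postscript: tools/concat_ice_daily.sh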
I think we might get rid of the need for this step in OM3, or at least remove the grid from the CICE output. Also, we've added 'nco' as a dependency in some cases. Do we need to document this somewhere (for users who don't use 'vk83')?
It looks like setting this as a postscript would stop the sync from running? https://payu.readthedocs.io/en/latest/config.html#postprocessing
❌ Automated testing cannot be run on this branch ❌
@aidan - I have updated based on the review comments. Back to you. I switched to using the system nco module, rather than adding to payu-env? I cleaned up the script to remove the unneeded operations and only check the last archive folder.
Yeah, if postscript is used, and … Also, if syncing is enabled, a …
#concatenate sea-ice daily output
#script inspired from https://github.com/COSIMA/1deg_jra55_ryf/blob/master/sync_data.sh#L87-L108

out_dir=$(ls -td archive/output??? | head -1)/ice/OUTPUT #latest output dir only
Suggested change:

out_dir=$(ls -dr archive/output??? | head -1)/ice/OUTPUT #latest output dir only
for f in $out_dir/iceh.????-??-01.nc; do
    #concat daily files for this month
    echo "doing ncrcat -O -L 5 -4 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc}"
    ncrcat -O -L 5 -4 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc}
Suggested change:

${PAYU_PATH}/ncrcat -O -L 5 -4 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc}
modules:
  load:
    - nco/5.0.5
Suggested change (delete this line):

- nco/5.0.5
Looking down the barrel of adding this to every config, and not being confident we wouldn't have to update it in the future (see the conversation about 6-hourly concatenation), I've made a new repo and moved the code to a PR there. When we've got that merged I'll manually pop it in. Sorry for mucking you about @anton-seaice
Ok - no worries. I'll put my changes there. Do we still need to update the config.yaml here?
Maybe we'll leave this open and just update the …
This was superseded by a script in a separate repo in this PR
Add a script to concatenate daily CICE output into one file per month (still containing daily data) and delete the individual daily files. This relies on adding nco to the payu environment per ACCESS-NRI/payu-condaenv#24.
Following Aidan's suggestion, the script is taken from https://github.com/COSIMA/1deg_jra55_ryf/blob/master/sync_data.sh#L87-L108, and on that basis I haven't tested beyond checking that it concatenates data.
The only change I made was to switch the netCDF output type to -4 (netcdf4) instead of -7 (netcdf4_classic).
The ncdump of the output looks correct, i.e. it shows time and time-bounds dimensions of length 31 (days) for January.
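To repeat that check, something like the following works (the path below is illustrative):

# illustrative check: the January file should show 31 records on the time dimension
ncdump -h archive/output000/ice/OUTPUT/iceh.1901-01-daily.nc | grep "time ="
# expected output along the lines of:  time = UNLIMITED ; // (31 currently)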
Maybe @aekiss would like to review too?