Updated Field API variants of the clouds dwarf (CPU and GPU) #96
Changes from 18 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,12 +111,20 @@ endif() | |
ecbuild_add_option( FEATURE FIELD_API | ||
DESCRIPTION "Use field_api to manage GPU data offload and copyback" | ||
REQUIRED_PACKAGES "field_api" | ||
CONDITION HAVE_CUDA | ||
DEFAULT ON ) | ||
|
||
ecbuild_find_package( NAME loki ) | ||
ecbuild_find_package( NAME atlas ) | ||
|
||
ecbuild_add_option( FEATURE FIELD_API_DISABLE_MAPPED_MEMORY | ||
DESCRIPTION "Use ACC mapped memory by default in Field API objects" | ||
Review comment: That description should probably read something like "Disable the use of ACC mapped memory in Field API objects".
Reply: Yes, thank you for spotting that. I forgot to update the description when changing this option. I have changed it now.
||
CONDITION HAVE_FIELD_API AND field_api_HAVE_ACC AND field_api_HAVE_CUDA | ||
DEFAULT OFF ) | ||
if( HAVE_FIELD_API_DISABLE_MAPPED_MEMORY ) | ||
list(APPEND CLOUDSC_DEFINITIONS FIELD_API_DISABLE_MAPPED_MEMORY) | ||
endif() | ||
|
||
|
||
# Add option for single-precision builds | ||
ecbuild_add_option( FEATURE SINGLE_PRECISION | ||
DESCRIPTION "Build CLOUDSC in single precision" DEFAULT OFF | ||
|
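For reference, a minimal sketch of how the new option in the hunk above might be switched on at configure time. It assumes the usual ecbuild convention that an option declared as `FEATURE <name>` is controlled through `-DENABLE_<name>=ON`. The `ENABLE_CUDA` flag and the source path placeholder are assumptions, not taken from this PR. Per the `CONDITION` in the hunk, the option only takes effect when the field_api installation itself provides OpenACC and CUDA support.

```shell
# Assemble illustrative configure flags (an assumption, not part of the PR).
# ecbuild exposes each FEATURE <name> as a -DENABLE_<name> cache option.
CONFIGURE_FLAGS="-DENABLE_CUDA=ON"   # assumed to be what provides HAVE_CUDA
CONFIGURE_FLAGS="$CONFIGURE_FLAGS -DENABLE_FIELD_API=ON"
CONFIGURE_FLAGS="$CONFIGURE_FLAGS -DENABLE_FIELD_API_DISABLE_MAPPED_MEMORY=ON"

# Print the command instead of running it; the source path is a placeholder.
echo "cmake $CONFIGURE_FLAGS <path-to-source>"
```

When the feature is enabled, the hunk appends `FIELD_API_DISABLE_MAPPED_MEMORY` to `CLOUDSC_DEFINITIONS`, so the choice is fixed at build time rather than at runtime.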
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,9 @@ Balthasar Reuter ([email protected]) | |
prototype that validates runs against platform and language-agnostic | ||
off-line reference data via HDF5 or the Serialbox package. The kernel code | ||
also is slightly cleaner than the original version. | ||
- **dwarf-cloudsc-fortran-field**: A fortran version of CLOUDSC that uses Field API | ||
for the data structures. The intent of this version is to show how | ||
Field API is used in newer versions of the IFS. | ||
- **dwarf-cloudsc-c**: Standalone C version of the kernel that has | ||
been generated by ECMWF tools. This relies exclusively on the Serialbox | ||
validation mechanism. | ||
|
@@ -81,13 +84,18 @@ Balthasar Reuter ([email protected]) | |
- **dwarf-cloudsc-gpu-scc-field**: GPU-enabled and optimized version of | ||
CLOUDSC that uses the SCC loop layout, and uses [FIELD API](https://github.com/ecmwf-ifs/field_api) (a Fortran library purpose-built for IFS data-structures that facilitates the | ||
creation and management of field objects in scientific code) to perform device offload | ||
and copyback. The intent is to demonstrate the explicit use of pinned host memory to speed-up | ||
data transfers, as provided by the shipped prototype implmentation, and | ||
investigate the effect of different data storage allocation layouts. | ||
and copyback. | ||
The field api variant supports modern features of the FIELD API such as *field gangs* that group | ||
multiple fields and allocates them in one larger field, in order to reduce allocations and | ||
data transfers. Field gang support can be enabled at runtime by setting the environment | ||
variable `CLOUDSC_PACKED_STORAGE=ON`. If CUDA is available, then the field api variant also supports | ||
the use of allocating fields in pinned memory. This is enabled by setting the | ||
environment variable `CLOUDSC_FIELD_API_PINNED=ON` and will speed up data transfers between host and device. | ||
To enable this variant, a suitable CUDA installation is required and the | ||
`--with-cuda` flag needs to be passed at the build stage. This variant lets the CUDA runtime | ||
manage temporary arrays and needs a large `NV_ACC_CUDA_HEAPSIZE` | ||
(eg. `NV_ACC_CUDA_HEAPSIZE=8GB` for 160K columns.) | ||
manage temporary arrays and needs a large `NV_ACC_CUDA_HEAPSIZE` (eg. `NV_ACC_CUDA_HEAPSIZE=8GB` for 160K columns.). | ||
It is possible to disable Field API registering fields in the OpenACC data map, by passing the | ||
`--without-mapped-fields` flag at build stage. | ||
- **cloudsc-pyiface.py**: a combination of the cloudsc/cloudsc-driver routines | ||
of cloudsc-fortran with the uppermost `dwarf` program replaced with a | ||
corresponding Python script capable of HDF5 data load and | ||
|
@@ -320,8 +328,9 @@ transfer overheads will dominate timings, and that most supported GPU | |
variants aim to optimise compute kernel timings only. However, a | ||
dedicated variant `dwarf-cloudsc-gpu-scc-field` has been added to | ||
explore host-side memory pinning, which improves data transfer times | ||
and alternative data layout strategies. By default, this will allocate | ||
each array variable individually in pinned memory. A runtime flag | ||
and alternative data layout strategies. By default, pinned memory is turned off | ||
but can be turned on by setting the environment variable `CLOUDSC_FIELD_API_PINNED=ON`. | ||
This will allocate each array variable individually in pinned memory. A runtime flag | ||
`CLOUDSC_PACKED_STORAGE=ON` can be used to enable "packed" storage, | ||
where multiple arrays are stored in a single base allocation, eg. | ||
|
||
|
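A minimal sketch of how the runtime switches described in the README changes above might be set before launching the `dwarf-cloudsc-gpu-scc-field` variant. The binary location is hypothetical, and the README text does not say whether the two switches are meant to be combined, so treat enabling both at once as an assumption.

```shell
# Runtime switches named in the README diff above.
export CLOUDSC_FIELD_API_PINNED=ON   # allocate fields in pinned host memory (off by default)
export CLOUDSC_PACKED_STORAGE=ON     # enable field gangs ("packed" storage)
export NV_ACC_CUDA_HEAPSIZE=8GB      # README example value for 160K columns

# Hypothetical binary location; adjust to the actual build tree.
CLOUDSC_BIN="./bin/dwarf-cloudsc-gpu-scc-field"
if [ -x "$CLOUDSC_BIN" ]; then
  "$CLOUDSC_BIN"
else
  echo "binary not found at $CLOUDSC_BIN; only the environment was set up"
fi
```

Because both switches are read from the environment at runtime, pinned and non-pinned runs can be compared without rebuilding.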
[No action required!] I think we can skip the 21.9 builds. That compiler version is fairly outdated now and the 23.5 tests are much more meaningful. But that can be dealt with in a subsequent CI-cleanup that is on the horizon.