See also the results overview.
Name | Main Characteristics / Demonstrated Features |
---|---|
3D Diffusion | Memory-bandwidth-bound stencil code, full time integration on device. Uses pointers for device memory swap between timesteps. |
Particle Push | Compute-bound, full time integration on device. Uses pointers for device memory swap between timesteps. Demonstrates high speedup for trigonometric functions on GPU. |
Poisson on FEM Solver with Jacobi Approximation | Memory-bandwidth-bound Jacobi stencil code in a complete solver setup with multiple kernels. Reduction using GPU-compatible BLAS calls. Uses pointers for device memory swap between iterations. |
MIDACO Ant Colony Solver with MINLP Example | Heavily compute-bound problem function, parallelized on two levels for optimal distribution on both CPU and GPU. Automatic privatization of 1D code to 3D version for GPU parallelization. Data is copied between host and device for every iteration (solver currently only running on CPU). |
Simple Stencil Example | Stencil code. |
Stencil With Local Array Example | Stencil code with local array. Tests Hybrid Fortran's array reshaping in conjunction with stencil codes. |
Stencil With Passed In Scalar From Array Example | Stencil code with a scalar input that's being passed in as a single value from an array in the wrapper. |
Parallel Vector and Reduction Example | Separate parallelizations for CPU/GPU with unified codebase, parallel vector calculations without communication. Automatic privatization of 1D code to 3D version for GPU parallelization. Shows a reduction as well. |
Simple OpenACC Example | Based on the Parallel Vector Example, shows the OpenACC backend and the use of multiple parallel regions in one subroutine. |
OpenACC Branching Example | Based on the OpenACC example, tests branches around parallel regions implemented using OpenACC. |
OpenACC Module Data Example | Tests different ways of using module data with an OpenACC implementation. |
OpenACC with Hybrid Code (Device + Host code callable) Example | Hybrid Fortran kernel subroutines are callable from host-only code when using the OpenACC implementation. This example demonstrates that feature. |
Mixed Implementations Example | Tests the @scheme directive which can be used to have different implementations for different parts of your code. |
Strides Example | Like the Parallel Vector example, but uses blocking of the data domain (for cases where GPU memory is too small). |
Tracing Example | Tests kernels with different real and integer data types using the tracing implementation, which automatically tracks down errors. |
Early Returns Example | Tests different return statements within your kernels. |
Array Accessor Functions Example | Tests more complicated array access patterns like 'a(min(n_max,i),j)' with the Hybrid Fortran parser. |
5D Parallel Vector Example | Tests parallel (in two dimensions) computation of up to 5D data in different configurations. This emulates the data setup of many physical process packages. |
Simple Weather | An unscientifically simple weather model, accelerated with Hybrid Fortran, used as an academic example to explain the framework. |
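
Several examples in the table above (3D Diffusion, Particle Push, the Poisson solver) swap pointers between timesteps or iterations instead of copying device memory. The idea can be sketched on the host as double buffering. This is a minimal illustration in Python rather than Hybrid Fortran, reduced to a 1D stencil with fixed boundary values; the function name `diffuse` and the parameter `alpha` are hypothetical and not taken from the examples.

```python
def diffuse(initial, steps, alpha=0.1):
    """Run `steps` explicit diffusion updates on a 1D field with fixed boundaries.

    Hypothetical helper for illustration only; not part of Hybrid Fortran.
    """
    current = list(initial)  # buffer that is read during a step
    updated = list(initial)  # buffer that is written during a step
    for _ in range(steps):
        # Three-point stencil on the interior; boundary cells are never written,
        # so both buffers keep the initial boundary values.
        for i in range(1, len(current) - 1):
            updated[i] = current[i] + alpha * (
                current[i - 1] - 2.0 * current[i] + current[i + 1]
            )
        # Swap the two references instead of copying the data back.
        current, updated = updated, current
    return current
```

After each step only the two references are exchanged, so the cost of moving to the next timestep does not grow with the size of the field.
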
Name | Source | Root Mean Square Error Bounds | Reference C Implementation (OpenACC + OpenMP) | Reference CUDA C Implementation | Reference Fortran Implementation (OpenACC) |
---|---|---|---|---|---|
3D Diffusion | Link | 1E-8 [3] | Yes | Yes | Yes |
Particle Push | Link | 1E-11 | Yes | Yes | Yes |
Poisson on FEM Solver with Jacobi Approximation | Link | 1E-7 [1] | No | No | No |
MIDACO Ant Colony Solver with MINLP Example | Link | 1E-3 [3] | No | No | No |
Simple Stencil Example | Link | 1E-8 | No | No | No |
Stencil With Local Array Example | Link | 1E-8 | No | No | No |
Stencil With Passed In Scalar From Array Example | Link | 1E-8 | No | No | No |
Parallel Vector and Reduction Example | Link [2] | 1E-8 | No | No | No |
Simple OpenACC Example | Link | 1E-8 | No | No | No |
OpenACC Branching Example | Link | 1E-8 | No | No | No |
OpenACC Module Data Example | Link | 1E-8 | No | No | No |
OpenACC with Hybrid Code (Device + Host code callable) Example | Link | 1E-8 | No | No | No |
Mixed Implementations Example | Link | 1E-8 | No | No | No |
Strides Example | Link | 1E-8 | No | No | No |
Tracing Example | Link | 1E-8 | No | No | No |
Early Returns Example | Link | 1E-8 | No | No | No |
Array Accessor Functions Example | Link | 1E-8 | No | No | No |
5D Parallel Vector Example | Link | 1E-8 | No | No | No |
Simple Weather | Link | n/a | No | No | No |
[1]: The number of iterations needed to reach this error level depends on the problem domain size. The given value is an upper bound on the error after a sufficiently long, unspecified runtime; the solver eventually converges. Note that this solver's algorithm is not suitable for production use and is included here for demonstration purposes only.
[2]: Example obtained when typing 'make example' in the Hybrid Fortran directory.
[3]: Compared to the analytic solution.
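
The root mean square error bounds listed above can be read as a pass/fail criterion: an example's output is compared to a reference (for instance the analytic solution mentioned in footnote [3]) and the resulting error must stay at or below the bound. How Hybrid Fortran's own tooling performs this comparison is not described here, so the following is only a sketch under that assumption; `rmse` and `within_bound` are hypothetical names.

```python
import math


def rmse(result, reference):
    """Root mean square error between two equally long sequences of numbers."""
    if len(result) != len(reference) or len(result) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((r - e) ** 2 for r, e in zip(result, reference))
    return math.sqrt(squared / len(result))


def within_bound(result, reference, bound=1e-8):
    """Return True if the RMSE does not exceed `bound` (default taken from the 1E-8 rows)."""
    return rmse(result, reference) <= bound
```

For example, `within_bound(output, analytic, bound=1e-11)` would correspond to the stricter bound listed for Particle Push, where `output` and `analytic` are placeholder sequences supplied by the caller.
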