Samples Overview

See also the results overview.

Characteristics

| Name | Main Characteristics / Demonstrated Features |
|---|---|
| 3D Diffusion | Memory-bandwidth-bound stencil code with full time integration on the device. Uses pointers to swap device memory between timesteps. |
| Particle Push | Compute-bound, with full time integration on the device. Uses pointers to swap device memory between timesteps. Demonstrates a high speedup for trigonometric functions on the GPU. |
| Poisson on FEM Solver with Jacobi Approximation | Memory-bandwidth-bound Jacobi stencil code in a complete solver setup with multiple kernels. Reduction using GPU-compatible BLAS calls. Uses pointers to swap device memory between iterations. |
| MIDACO Ant Colony Solver with MINLP Example | Heavily compute-bound problem function, parallelized on two levels for optimal distribution on both CPU and GPU. Automatic privatization of 1D code to a 3D version for GPU parallelization. Data is copied between host and device in every iteration (the solver itself currently runs only on the CPU). |
| Simple Stencil Example | Stencil code. |
| Stencil With Local Array Example | Stencil code with a local array. Tests Hybrid Fortran's array reshaping in conjunction with stencil codes. |
| Stencil With Passed In Scalar From Array Example | Stencil code with a scalar input that is passed in as a single element of an array in the wrapper. |
| Parallel Vector and Reduction Example | Separate parallelizations for CPU and GPU with a unified codebase; parallel vector calculations without communication. Automatic privatization of 1D code to a 3D version for GPU parallelization. Also shows a reduction. |
| Simple OpenACC Example | Based on the Parallel Vector Example; shows the OpenACC backend and the use of multiple parallel regions in one subroutine. |
| OpenACC Branching Example | Based on the OpenACC example; tests branches around parallel regions implemented using OpenACC. |
| OpenACC Module Data Example | Tests different ways of using module data with an OpenACC implementation. |
| OpenACC with Hybrid Code (Device + Host code callable) Example | When using the OpenACC implementation, Hybrid Fortran kernel subroutines are callable from host-only code. This example demonstrates that feature. |
| Mixed Implementations Example | Tests the @scheme directive, which can be used to apply different implementations to different parts of your code. |
| Strides Example | Like the Parallel Vector Example, but uses blocking of the data domain (for cases where GPU memory is too small). |
| Tracing Example | Tests kernels with different real and integer data types using the tracing implementation, which automatically tracks down errors. |
| Early Returns Example | Tests different return statements within your kernels. |
| Array Accessor Functions Example | Tests more complicated array access patterns such as 'a(min(n_max,i),j)' with the Hybrid Fortran parser. |
| 5D Parallel Vector Example | Tests parallel computation (in two dimensions) of up to 5D data in different configurations. This emulates the data layout of many physical process packages. |
| Simple Weather | An unscientifically simple weather model, accelerated with Hybrid Fortran, used as an academic example to explain the framework. |
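Several samples (3D Diffusion, Particle Push, Poisson) swap device memory pointers between timesteps instead of copying the updated field back into the input array. The sketch below illustrates only that double-buffering idea, in plain Python on the host; it is not Hybrid Fortran code, and the function name `diffuse_3d`, the nested-list grid and the fixed-boundary treatment are illustrative assumptions. Rebinding `cur, nxt = nxt, cur` swaps references without copying any data, which plays the role of the pointer swap.

```python
def diffuse_3d(u, steps, alpha):
    """Explicit 7-point diffusion stencil on a cubic grid stored as nested lists.

    Hypothetical illustration: two buffers are allocated once and reused.
    After each step the two names are swapped (a reference swap, no copy),
    analogous to swapping device pointers between timesteps.
    Boundary cells keep their initial values.
    """
    n = len(u)
    cur = [[row[:] for row in plane] for plane in u]  # working copy; input stays untouched
    nxt = [[row[:] for row in plane] for plane in u]  # second buffer with the same boundaries
    for _ in range(steps):
        # Update interior cells only; boundaries are never written in either buffer.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                for k in range(1, n - 1):
                    lap = (cur[i - 1][j][k] + cur[i + 1][j][k]
                           + cur[i][j - 1][k] + cur[i][j + 1][k]
                           + cur[i][j][k - 1] + cur[i][j][k + 1]
                           - 6.0 * cur[i][j][k])
                    nxt[i][j][k] = cur[i][j][k] + alpha * lap
        cur, nxt = nxt, cur  # swap buffers instead of copying the new field back
    return cur
```

For this explicit scheme in 3D, `alpha` (the combined diffusion coefficient times dt/dx²) must not exceed 1/6 for the time integration to stay stable.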

Link to Sources, Available Versions and Implementation Accuracy

| Name | Source | Root Mean Square Error Bounds | Reference C Implementation (OpenACC + OpenMP) | Reference CUDA C Implementation | Reference Fortran Implementation (OpenACC) |
|---|---|---|---|---|---|
| 3D Diffusion | Link | 1E-8 [3] | Yes | Yes | Yes |
| Particle Push | Link | 1E-11 | Yes | Yes | Yes |
| Poisson on FEM Solver with Jacobi Approximation | Link | 1E-7 [1] | No | No | No |
| MIDACO Ant Colony Solver with MINLP Example | Link | 1E-3 [3] | No | No | No |
| Simple Stencil Example | Link | 1E-8 | No | No | No |
| Stencil With Local Array Example | Link | 1E-8 | No | No | No |
| Stencil With Passed In Scalar From Array Example | Link | 1E-8 | No | No | No |
| Parallel Vector and Reduction Example | Link [2] | 1E-8 | No | No | No |
| Simple OpenACC Example | Link | 1E-8 | No | No | No |
| OpenACC Branching Example | Link | 1E-8 | No | No | No |
| OpenACC Module Data Example | Link | 1E-8 | No | No | No |
| OpenACC with Hybrid Code (Device + Host code callable) Example | Link | 1E-8 | No | No | No |
| Mixed Implementations Example | Link | 1E-8 | No | No | No |
| Strides Example | Link | 1E-8 | No | No | No |
| Tracing Example | Link | 1E-8 | No | No | No |
| Early Returns Example | Link | 1E-8 | No | No | No |
| Array Accessor Functions Example | Link | 1E-8 | No | No | No |
| 5D Parallel Vector Example | Link | 1E-8 | No | No | No |
| Simple Weather | Link | n/a | No | No | No |

[1]: The number of iterations needed to reach this error level depends on the problem domain size. The given value is an upper bound on the error after a sufficiently long (unspecified) runtime; the solver converges 'eventually'. This solver's algorithm is therefore not suitable for production use; it is included for demonstration purposes only.

[2]: This is the example obtained when running 'make example' in the Hybrid Fortran directory.

[3]: Compared against the analytic solution.
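The accuracy table lists root mean square error bounds, but this overview does not say how the samples compute the error. Assuming the standard definition, RMSE = sqrt(sum((x_i - r_i)^2) / N) over all N output values, where r is either the output of a reference implementation or, for samples marked [3], the analytic solution, a check against a bound such as 1E-8 could look like the following sketch. The function names `rmse` and `within_bound` are hypothetical.

```python
import math


def rmse(computed, reference):
    """Root mean square error between two equally sized flat sequences."""
    if not computed or len(computed) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_sum = sum((c - r) ** 2 for c, r in zip(computed, reference))
    return math.sqrt(squared_sum / len(computed))


def within_bound(computed, reference, bound):
    """True if the RMSE does not exceed the given bound (e.g. 1e-8)."""
    return rmse(computed, reference) <= bound
```

Multidimensional results would need to be flattened into one sequence (in the same element order for both arrays) before calling these functions.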