Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenACC Model #34

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
*~
*.o
*/XSBench
7 changes: 5 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,12 +44,15 @@ This version of XSBench is written in CUDA for use with NVIDIA GPU architectures
4. **XSBench/opencl**
This version of XSBench is written in OpenCL, and can be used for CPU, GPU, FPGA, or other architectures that support OpenCL. It was written with GPUs in mind, so if running on other architectures you may need to heavily re-optimize the code. You will also likely need to edit the makefile to supply the path to your OpenCL compiler.

4. **XSBench/sycl**
5. **XSBench/sycl**
This version of XSBench is written in SYCL, and can be used for CPU, GPU, FPGA, or other architectures that support OpenCL and SYCL. It was written with GPUs in mind, so if running on other architectures you may need to heavily re-optimize the code. You will also likely need to edit the makefile to supply the path to your SYCL compiler.

5. **XSBench/hip**
6. **XSBench/hip**
This version of XSBench is written in HIP for use with GPU architectures. This version is derived from CUDA using an automatic conversion tool with only a few small manual changes.

7. XSBench/openacc
This verison of XSBench is written in OpenACC for use with GPU architectures. This version is derived from the OpenMP 4.5 implementation. You will likely need to modify the Makefile to supply the appropriate compiler and compiler flags.

## Compilation

To compile XSBench with default settings, navigate to your selected source directory and use the following command:
Expand Down
177 changes: 177 additions & 0 deletions openacc/GridInit.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
#include "XSbench_header.h"

SimulationData grid_init_do_not_profile( Inputs in, int mype )
{
// Structure to hold all allocated simuluation data arrays
SimulationData SD;

// Keep track of how much data we're allocating
size_t nbytes = 0;

// Set the initial seed value
uint64_t seed = 42;

////////////////////////////////////////////////////////////////////
// Initialize Nuclide Grids
////////////////////////////////////////////////////////////////////

if(mype == 0) printf("Intializing nuclide grids...\n");

// First, we need to initialize our nuclide grid. This comes in the form
// of a flattened 2D array that hold all the information we need to define
// the cross sections for all isotopes in the simulation.
// The grid is composed of "NuclideGridPoint" structures, which hold the
// energy level of the grid point and all associated XS data at that level.
// An array of structures (AOS) is used instead of
// a structure of arrays, as the grid points themselves are accessed in
// a random order, but all cross section interaction channels and the
// energy level are read whenever the gridpoint is accessed, meaning the
// AOS is more cache efficient.

// Initialize Nuclide Grid
SD.length_nuclide_grid = in.n_isotopes * in.n_gridpoints;
SD.nuclide_grid = (NuclideGridPoint *) malloc( SD.length_nuclide_grid * sizeof(NuclideGridPoint));
assert(SD.nuclide_grid != NULL);
nbytes += SD.length_nuclide_grid * sizeof(NuclideGridPoint);
for( int i = 0; i < SD.length_nuclide_grid; i++ )
{
SD.nuclide_grid[i].energy = LCG_random_double(&seed);
SD.nuclide_grid[i].total_xs = LCG_random_double(&seed);
SD.nuclide_grid[i].elastic_xs = LCG_random_double(&seed);
SD.nuclide_grid[i].absorbtion_xs = LCG_random_double(&seed);
SD.nuclide_grid[i].fission_xs = LCG_random_double(&seed);
SD.nuclide_grid[i].nu_fission_xs = LCG_random_double(&seed);
}

// Sort so that each nuclide has data stored in ascending energy order.
for( int i = 0; i < in.n_isotopes; i++ )
qsort( &SD.nuclide_grid[i*in.n_gridpoints], in.n_gridpoints, sizeof(NuclideGridPoint), NGP_compare);

// error debug check
/*
for( int i = 0; i < in.n_isotopes; i++ )
{
printf("NUCLIDE %d ==============================\n", i);
for( int j = 0; j < in.n_gridpoints; j++ )
printf("E%d = %lf\n", j, SD.nuclide_grid[i * in.n_gridpoints + j].energy);
}
*/


////////////////////////////////////////////////////////////////////
// Initialize Acceleration Structure
////////////////////////////////////////////////////////////////////

if( in.grid_type == NUCLIDE )
{
SD.length_unionized_energy_array = 0;
SD.length_index_grid = 0;
}

if( in.grid_type == UNIONIZED )
{
if(mype == 0) printf("Intializing unionized grid...\n");

// Allocate space to hold the union of all nuclide energy data
SD.length_unionized_energy_array = in.n_isotopes * in.n_gridpoints;
SD.unionized_energy_array = (double *) malloc( SD.length_unionized_energy_array * sizeof(double));
assert(SD.unionized_energy_array != NULL );
nbytes += SD.length_unionized_energy_array * sizeof(double);

// Copy energy data over from the nuclide energy grid
for( int i = 0; i < SD.length_unionized_energy_array; i++ )
SD.unionized_energy_array[i] = SD.nuclide_grid[i].energy;

// Sort unionized energy array
qsort( SD.unionized_energy_array, SD.length_unionized_energy_array, sizeof(double), double_compare);

// Allocate space to hold the acceleration grid indices
SD.length_index_grid = SD.length_unionized_energy_array * in.n_isotopes;
SD.index_grid = (int *) malloc( SD.length_index_grid * sizeof(int));
assert(SD.index_grid != NULL);
nbytes += SD.length_index_grid * sizeof(int);

// Generates the double indexing grid
int * idx_low = (int *) calloc( in.n_isotopes, sizeof(int));
assert(idx_low != NULL );
double * energy_high = (double *) malloc( in.n_isotopes * sizeof(double));
assert(energy_high != NULL );

for( int i = 0; i < in.n_isotopes; i++ )
energy_high[i] = SD.nuclide_grid[i * in.n_gridpoints + 1].energy;

for( long e = 0; e < SD.length_unionized_energy_array; e++ )
{
double unionized_energy = SD.unionized_energy_array[e];
for( long i = 0; i < in.n_isotopes; i++ )
{
if( unionized_energy < energy_high[i] )
SD.index_grid[e * in.n_isotopes + i] = idx_low[i];
else if( idx_low[i] == in.n_gridpoints - 2 )
SD.index_grid[e * in.n_isotopes + i] = idx_low[i];
else
{
idx_low[i]++;
SD.index_grid[e * in.n_isotopes + i] = idx_low[i];
energy_high[i] = SD.nuclide_grid[i * in.n_gridpoints + idx_low[i] + 1].energy;
}
}
}

free(idx_low);
free(energy_high);
}

if( in.grid_type == HASH )
{
if(mype == 0) printf("Intializing hash grid...\n");
SD.length_unionized_energy_array = 0;
SD.length_index_grid = in.hash_bins * in.n_isotopes;
SD.index_grid = (int *) malloc( SD.length_index_grid * sizeof(int));
assert(SD.index_grid != NULL);
nbytes += SD.length_index_grid * sizeof(int);

double du = 1.0 / in.hash_bins;

// For each energy level in the hash table
#pragma omp parallel for
for( long e = 0; e < in.hash_bins; e++ )
{
double energy = e * du;

// We need to determine the bounding energy levels for all isotopes
for( long i = 0; i < in.n_isotopes; i++ )
{
SD.index_grid[e * in.n_isotopes + i] = grid_search_nuclide( in.n_gridpoints, energy, SD.nuclide_grid + i * in.n_gridpoints, 0, in.n_gridpoints-1);
}
}
}

////////////////////////////////////////////////////////////////////
// Initialize Materials and Concentrations
////////////////////////////////////////////////////////////////////
if(mype == 0) printf("Intializing material data...\n");

// Set the number of nuclides in each material
SD.num_nucs = load_num_nucs(in.n_isotopes);
SD.length_num_nucs = 12; // There are always 12 materials in XSBench

// Intialize the flattened 2D grid of material data. The grid holds
// a list of nuclide indices for each of the 12 material types. The
// grid is allocated as a full square grid, even though not all
// materials have the same number of nuclides.
SD.mats = load_mats(SD.num_nucs, in.n_isotopes, &SD.max_num_nucs);
SD.length_mats = SD.length_num_nucs * SD.max_num_nucs;

// Intialize the flattened 2D grid of nuclide concentration data. The grid holds
// a list of nuclide concentrations for each of the 12 material types. The
// grid is allocated as a full square grid, even though not all
// materials have the same number of nuclides.
SD.concs = load_concs(SD.num_nucs, SD.max_num_nucs);
SD.length_concs = SD.length_mats;

if(mype == 0) printf("Intialization complete. Allocated %.0lf MB of data.\n", nbytes/1024.0/1024.0 );

return SD;

}
115 changes: 115 additions & 0 deletions openacc/Main.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
#include "XSbench_header.h"

#ifdef MPI
#include<mpi.h>
#endif

int main( int argc, char* argv[] )
{
// =====================================================================
// Initialization & Command Line Read-In
// =====================================================================
int version = 20;
int mype = 0;
double omp_start, omp_end;
int nprocs = 1;
unsigned long long verification;

#ifdef MPI
MPI_Status stat;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &mype);
#endif

// Process CLI Fields -- store in "Inputs" structure
Inputs in = read_CLI( argc, argv );

// Set number of OpenMP Threads
//omp_set_num_threads(in.nthreads);

// Print-out of Input Summary
if( mype == 0 )
print_inputs( in, nprocs, version );

// =====================================================================
// Prepare Nuclide Energy Grids, Unionized Energy Grid, & Material Data
// This is not reflective of a real Monte Carlo simulation workload,
// therefore, do not profile this region!
// =====================================================================

SimulationData SD;

// If read from file mode is selected, skip initialization and load
// all simulation data structures from file instead
if( in.binary_mode == READ )
SD = binary_read(in);
else
SD = grid_init_do_not_profile( in, mype );

// If writing from file mode is selected, write all simulation data
// structures to file
if( in.binary_mode == WRITE && mype == 0 )
binary_write(in, SD);


// =====================================================================
// Cross Section (XS) Parallel Lookup Simulation
// This is the section that should be profiled, as it reflects a
// realistic continuous energy Monte Carlo macroscopic cross section
// lookup kernel.
// =====================================================================

if( mype == 0 )
{
printf("\n");
border_print();
center_print("SIMULATION", 79);
border_print();
}

// Start Simulation Timer
omp_start = omp_get_wtime();

// Run simulation
if( in.simulation_method == EVENT_BASED )
{
if( in.kernel_id == 0 )
verification = run_event_based_simulation(in, SD, mype);
else
{
printf("Error: No kernel ID %d found!\n", in.kernel_id);
exit(1);
}
}
else
{
printf("History-based simulation not implemented in OpenMP offload code. Instead,\nuse the event-based method with \"-m event\" argument.\n");
exit(1);
}

if( mype == 0)
{
printf("\n" );
printf("Simulation complete.\n" );
}

// End Simulation Timer
omp_end = omp_get_wtime();

// =====================================================================
// Output Results & Finalize
// =====================================================================

// Final Hash Step
verification = verification % 999983;

// Print / Save Results and Exit
int is_invalid_result = print_results( in, mype, omp_end-omp_start, nprocs, verification );

#ifdef MPI
MPI_Finalize();
#endif

return is_invalid_result;
}
Loading