A systematic framework for automatically deciding the right execution model for OpenCL applications on FPGAs (FPGA'20)
This repository contains:
- PART ONE: 11 OpenCL applications on FPGAs
  - Source code of all the optimization combinations under four execution models
  - Detailed resource consumption and absolute performance numbers of our experiments
- PART TWO: an LLVM-based automatic tool to determine whether to use two emerging OpenCL features
This work explores optimization combinations on FPGAs under four high-level execution models: NDRange kernel (NDR), single work-item kernel (SWI), NDRange kernel with OpenCL channels (NDR+C), and single work-item kernel with OpenCL channels (SWI+C).
Our goal is to help OpenCL programmers determine the most suitable execution model based on the presence or absence of three OpenCL patterns: Atomic Operation (AO), Multi-Pass Scheme (MPS), and Kernel-to-Kernel Communication (KKC).
The 11 OpenCL applications include:
Application | Description | Source |
---|---|---|
BFS | Breadth-First Search | Chai benchmarks |
RSCD | RANSAC | Chai benchmarks |
TQH | Task Queue System | Chai benchmarks |
HSTI | Histogram | Chai benchmarks |
SC | Stream Compaction | Chai benchmarks |
PAD | Padding | Chai benchmarks |
CEDD | Canny Edge Detection | Chai benchmarks |
KM | K-Means | Rodinia benchmarks |
MM | Matrix Multiplication | Intel OpenCL demos |
MS | Mandelbrot Set | Intel OpenCL demos |
PS | Prefix Sum | CUDA demos |
For more information, you can refer to Chai benchmarks, Rodinia benchmarks, Intel OpenCL demos and CUDA demos.
The evaluation of the work requires Intel Quartus Prime software (including OpenCL SDK for FPGA), its license, and FPGA hardware. The FPGA synthesis software and FPGA hardware used in this work are listed below:
- Quartus Prime Design Software 16.1
- Intel FPGA SDK for OpenCL 16.1
- Intel FPGA Runtime Environment for OpenCL 16.1
- FPGA board support package (BSP) provided by Terasic
- FPGA device: Terasic DE5a-Net board
- Operating system: Windows 7
- Host compiler: Microsoft Visual Studio 2010
(Take CEDD application as an example:)
Path | Description |
---|---|
`CEDD\` | OpenCL application |
`CEDD_test\` | a test sample using `baseline.cl` (in `NDRange\`) |
`cedd.sln` | Microsoft Visual Studio project for the host program |
`bin\` | host program, AOCX files |
`device\` | top-level OpenCL kernel files |
`host\src\` | host source files |
`input\` | input files |
`common\` | common configuration implementations |
`NDRange\` | source code of all the optimization combinations under the NDRange execution model |
`SWI\` | source code of all the optimization combinations under the single work-item (SWI) execution model |
`NDRange+Channel\` | source code of all the optimization combinations under the NDRange+Channel execution model |
`SWI+Channel\` | source code of all the optimization combinations under the SWI+Channel execution model |
`CEDD_data.xlsx` | detailed resource consumption and absolute performance numbers |
(Take CEDD application as an example:)
We recommend first compiling and running the project in `CEDD_test\`, which implements `baseline.cl` from `NDRange\`, by following the steps below. Once it runs successfully, you can replace the relevant code with code from `NDRange\`, `SWI\`, `NDRange+Channel\`, or `SWI+Channel\` to try whichever implementations you are interested in.
NOTE: Please make sure the OpenCL+FPGA environment is set up beforehand.
- See the Intel FPGA SDK for OpenCL Pro Edition: Getting Started Guide for details on installing the Intel OpenCL SDK.
- Refer to the vendor's manual for detailed steps on installing the FPGA board and driver. The installation steps can differ between FPGA boards and operating systems. For the Terasic DE5a-Net board, please refer to the DE5a-Net OpenCL Manual.
To compile the OpenCL kernel, run:
aoc device\baseline.cl -o bin\baseline.aocx --board <board>
where <board> matches the board you want to target. If you are unsure of the boards available, use the following command to list available boards:
aoc --list-boards
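When trying several kernel variants in turn, it can help to generate the `aoc` command line for each kernel file instead of typing it by hand. The following is a minimal sketch, not part of this repository: the helper name `aoc_command` and the board name `my_board` are hypothetical, and the function only assembles the arguments used in the command above without invoking the compiler.

```python
from pathlib import PureWindowsPath


def aoc_command(kernel, board, out_dir="bin"):
    """Build the aoc argument list for one kernel file (hypothetical helper).

    Uses the same flags as the command above: -o for the output .aocx file
    and --board for the target board.
    """
    kernel_path = PureWindowsPath(kernel)
    if kernel_path.suffix != ".cl":
        raise ValueError(f"expected an OpenCL .cl file, got {kernel!r}")
    # The output keeps the kernel's base name and uses the .aocx extension.
    output = PureWindowsPath(out_dir) / (kernel_path.stem + ".aocx")
    return ["aoc", str(kernel_path), "-o", str(output), "--board", board]


# "my_board" is a placeholder; use a name reported by `aoc --list-boards`.
print(" ".join(aoc_command(r"device\baseline.cl", "my_board")))
```

The returned list can be passed to `subprocess.run` on a machine where `aoc` is installed and licensed.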
To compile the host program, build the project in Visual Studio 2010 (or later). The compiled host program will be located at `bin\host`.
Make sure both the OpenCL kernel and the host program have been compiled before running the host program. To launch the host program, use Ctrl + F5 or the following command:
bin\host
In our work, Clang 9.0.0 and LLVM 9.0.0 were installed on Ubuntu 14.04. Please see Getting Started with the LLVM System - Requirements for detailed software and hardware requirements.
- The `apps` folder contains the OpenCL kernel code and the host C/C++ code of the 11 OpenCL applications.
- The `Transforms` folder mainly contains the LLVM passes for IR analysis.
- The `run.sh` file is the shell script.
Please refer to LLVM's documentation for details on configuring and compiling LLVM. An LLVM getting started guide can be found here. Alternatively, you can quickly start using our LLVM tool (in a Linux environment) by following the steps below:
If you have already finished configuring the LLVM environment, please start with Step 3.
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" ../llvm
make
make install
Then you can try it out:
clang --version
You may get:
clang version 9.0.0 (https://github.com/llvm/llvm-project.git 75afc0105c089171f9d85d59038617fb222c38cd)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
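Since this work reports using Clang 9.0.0, it may be useful to check the installed version programmatically before running the analysis. The sketch below is an illustration only: the function name `clang_version` is hypothetical, and the sample string is copied from the output shown above.

```python
import re


def clang_version(version_output):
    """Extract (major, minor, patch) from the text printed by `clang --version`."""
    match = re.search(r"clang version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'clang version X.Y.Z' text found")
    return tuple(int(part) for part in match.groups())


# Sample line copied from the output shown above.
sample = (
    "clang version 9.0.0 (https://github.com/llvm/llvm-project.git "
    "75afc0105c089171f9d85d59038617fb222c38cd)"
)
print(clang_version(sample))  # prints (9, 0, 0)
```

On a real installation, the input text could come from running `clang --version` and capturing its standard output.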
- Move the `apps` folder and the `run.sh` file to the same directory as the `llvm-project` folder.
- Add all the folders in the `analysis_passes` folder to the `llvm-project/llvm/lib/Transforms` folder.
- Copy the following lines into `llvm-project/llvm/lib/Transforms/CMakeLists.txt`:
  add_subdirectory(HasAO)
  add_subdirectory(NumOfKernels)
  add_subdirectory(IsSameBuff)
  add_subdirectory(IsRdWr)
  add_subdirectory(IsSameMAP)
  add_subdirectory(IsSequential)
  add_subdirectory(VarBuffInHost)
  add_subdirectory(VarBuffInKernel)
  add_subdirectory(Model)
- Change directory to llvm-project/.. and run:
  cd llvm-project/..
  ./run.sh
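The `CMakeLists.txt` step above can also be scripted so that re-running it does not add duplicate entries. The following is a minimal sketch under stated assumptions: the function name `register_passes` is hypothetical, and the pass directory names are taken from the list above.

```python
from pathlib import Path

# Pass directory names exactly as listed in the CMakeLists.txt step above.
PASS_DIRS = [
    "HasAO", "NumOfKernels", "IsSameBuff", "IsRdWr", "IsSameMAP",
    "IsSequential", "VarBuffInHost", "VarBuffInKernel", "Model",
]


def register_passes(cmake_file, pass_dirs=PASS_DIRS):
    """Append any missing add_subdirectory(...) lines; return the names added."""
    path = Path(cmake_file)
    text = path.read_text()
    existing = {line.strip() for line in text.splitlines()}
    added = [d for d in pass_dirs if f"add_subdirectory({d})" not in existing]
    if added:
        # Make sure the appended lines start on a fresh line.
        if text and not text.endswith("\n"):
            text += "\n"
        text += "".join(f"add_subdirectory({d})\n" for d in added)
        path.write_text(text)
    return added
```

For example, calling `register_passes("llvm-project/llvm/lib/Transforms/CMakeLists.txt")` from the directory that contains `llvm-project` would append the entries once; a second call returns an empty list and leaves the file unchanged.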