Merge pull request #16 from CNugteren/development
Various improvements
CNugteren committed May 18, 2015
2 parents 645bfd2 + 2416b3b commit fe52b45
Showing 16 changed files with 353 additions and 223 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG
@@ -1,4 +1,12 @@

Version 1.5.1
- Improved the GEMM example to support the Intel MIC (Xeon Phi) accelerators
- Updated compiler check and compiler flags
- Adds support for multiple OpenCL kernel files at once (e.g. when wanting to include a header file)
- Adds support for the std::complex data-types
- Fixed some compilation warnings regarding size_t conversions
- Updated the FindOpenCL.cmake file

Version 1.5.0
- OpenCL local work size and memory size constraints are now automatically handled
- Greatly improved the new 2D convolution example:
30 changes: 26 additions & 4 deletions CMakeLists.txt
@@ -27,7 +27,7 @@ cmake_minimum_required(VERSION 2.8)
project("cltune" CXX)
set(cltune_VERSION_MAJOR 1)
set(cltune_VERSION_MINOR 5)
set(cltune_VERSION_PATCH 0)
set(cltune_VERSION_PATCH 1)

# Options
option(SAMPLES "Enable compilation of sample programs" ON)
@@ -46,12 +46,34 @@ set(CMAKE_INSTALL_RPATH_USE_LINK_PATH false) # Don't add the automatically deter
# Compiler-version check
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 4.9)
message(FATAL_ERROR "GCC version must be at least 4.9 (for full C++11 compatibility)")
message(FATAL_ERROR "GCC version must be at least 4.9")
endif()
elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 3.3)
message(FATAL_ERROR "Clang version must be at least 3.3")
endif()
elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5.0)
message(FATAL_ERROR "Clang version must be at least 5.0")
endif()
elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 14.0)
message(FATAL_ERROR "ICC version must be at least 14.0")
endif()
elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 18.0)
message(FATAL_ERROR "Visual Studio version must be at least 18.0")
endif()
endif()

# C++11 compiler settings
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -std=c++11 -Wall -Wno-comment")
# C++ compiler settings
set(FLAGS "-O3 -std=c++11")
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
set(FLAGS "${FLAGS} -Wall -Wno-comment")
elseif ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
#set(FLAGS "${FLAGS} -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-padded")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${FLAGS}")

# ==================================================================================================

6 changes: 3 additions & 3 deletions README.md
@@ -47,9 +47,9 @@ Before we start using the tuner, we'll have to create one. The constructor takes

cltune::Tuner my_tuner(0, 1); // Tuner on device 1 of OpenCL platform 0

Now that we have a tuner, we can add a tuning kernel. This is done by providing the path to an OpenCL kernel (first argument), the name of the kernel (second argument), a list of global thread dimensions (third argument), and a list of local thread or workgroup dimensions (fourth argument). Here is an example:
Now that we have a tuner, we can add a tuning kernel. This is done by providing a list of paths to OpenCL kernel files (first argument), the name of the kernel (second argument), a list of global thread dimensions (third argument), and a list of local thread or workgroup dimensions (fourth argument). Here is an example:

int id = my_tuner.AddKernel("path/to/kernel.opencl", "my_kernel", {1024,512}, {16,8});
size_t id = my_tuner.AddKernel({"path/to/kernel.opencl"}, "my_kernel", {1024,512}, {16,8});

Notice that the AddKernel function returns an integer: it is the ID of the added kernel. We'll need this ID when we want to add tuning parameters to this kernel. Let's say that our kernel has two pre-processor parameters named `PARAM_1` and `PARAM_2`:

@@ -58,7 +58,7 @@ Notice that the AddKernel function returns an integer: it is the ID of the added

Now that we've added a kernel and its parameters, we can add another one if we wish. When we're done, there are a couple of things left to be done. Let's start with adding a reference kernel. This reference kernel can provide the tuner with the ground truth and is optional: the tuner performs verification checks to ensure correctness only when it is provided.

my_tuner.SetReference("path/to/reference.opencl", "my_reference", {8192}, {128});
my_tuner.SetReference({"path/to/reference.opencl"}, "my_reference", {8192}, {128});
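
Both `AddKernel` and `SetReference` now take a list of kernel files instead of a single path. The diff does not show how the tuner combines these files, so the following is only a minimal sketch of one plausible approach: read each file in the order given and join the contents into a single source string. The helper name `LoadSources` is hypothetical and not part of CLTune.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of CLTune): reads every file in 'filenames' and joins the
// contents in the order given, so that a header listed first precedes the kernel that uses it.
std::string LoadSources(const std::vector<std::string> &filenames) {
  std::string combined;
  for (const auto &filename : filenames) {
    std::ifstream file(filename);
    if (!file) { throw std::runtime_error("Could not open kernel file: " + filename); }
    // Appends the whole file, followed by a newline to separate it from the next file
    combined.append(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());
    combined += "\n";
  }
  return combined;
}
```

Under this assumption the order of the list matters: a call such as `LoadSources({"path/to/header.opencl", "path/to/kernel.opencl"})` places the header's definitions ahead of the kernel code.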

The tuner also needs to know which arguments the kernels take. Scalar arguments can be provided as-is and are passed-by-value, whereas arrays have to be provided as C++ `std::vector`s. That's right, we won't have to create OpenCL buffers, CLTune will handle that for us! Here is an example:

156 changes: 69 additions & 87 deletions cmake/Modules/FindOpenCL.cmake
@@ -1,104 +1,86 @@
# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
# ==================================================================================================
# This file is part of the CLTune project, which loosely follows the Google C++ styleguide and uses
# a tab-size of two spaces and a max-width of 100 characters per line.
#
# Author: [email protected] (Cedric Nugteren)
#
# Defines the following variables:
# OPENCL_FOUND Boolean holding whether or not the OpenCL library was found
# OPENCL_INCLUDE_DIRS The OpenCL include directory
# OPENCL_LIBRARIES The OpenCL library
#
# In case OpenCL is not installed in the default directory, set the OPENCL_ROOT variable to point to
# the root of OpenCL, such that 'OpenCL/cl.h' or 'CL/cl.h' can be found in $OPENCL_ROOT/include.
# This can either be done using an environmental variable (e.g. export OPENCL_ROOT=/path/to/opencl)
# or using a CMake variable (e.g. cmake -DOPENCL_ROOT=/path/to/opencl ..).
#
# --------------------------------------------------------------------------------------------------
#
# Copyright 2014 SURFsara
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ########################################################################


# Locate an OpenCL implementation.
# Currently supports AMD APP SDK (http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx/)
#
# Defines the following variables:
#
# OPENCL_FOUND - Found the OPENCL framework
# OPENCL_INCLUDE_DIRS - Include directories
#
# Also defines the library variables below as normal
# variables. These contain debug/optimized keywords when
# a debugging library is found.
#
# OPENCL_LIBRARIES - libopencl
#
# Accepts the following variables as input:
#
# OPENCL_ROOT - (as a CMake or environment variable)
# The root directory of the OpenCL implementation found
#
# FIND_LIBRARY_USE_LIB64_PATHS - Global property that controls whether findOpenCL should search for
# 64bit or 32bit libs
#-----------------------
# Example Usage:
#
# find_package(OPENCL REQUIRED)
# include_directories(${OPENCL_INCLUDE_DIRS})
#
# add_executable(foo foo.cc)
# target_link_libraries(foo ${OPENCL_LIBRARIES})
#
#-----------------------
# ==================================================================================================

find_path(OPENCL_INCLUDE_DIRS
NAMES OpenCL/cl.h CL/cl.h
HINTS
${OPENCL_ROOT}/include
$ENV{AMDAPPSDKROOT}/include
$ENV{CUDA_PATH}/include
PATHS
/usr/include
/usr/local/include
/usr/local/cuda/include
/opt/cuda/include
DOC "OpenCL header file path"
# Sets the possible install locations
set(OPENCL_HINTS
${OPENCL_ROOT}
$ENV{OPENCL_ROOT}
$ENV{AMDAPPSDKROOT}
$ENV{CUDA_PATH}
$ENV{INTELOCLSDKROOT}
$ENV{NVSDKCOMPUTE_ROOT}
$ENV{ATISTREAMSDKROOT}
)
set(OPENCL_PATHS
/usr/local/cuda
/opt/cuda
/usr
/usr/local
)
mark_as_advanced( OPENCL_INCLUDE_DIRS )

# Search for 64bit libs if FIND_LIBRARY_USE_LIB64_PATHS is set to true in the global environment, 32bit libs else
get_property( LIB64 GLOBAL PROPERTY FIND_LIBRARY_USE_LIB64_PATHS )
# Finds the include directories
find_path(OPENCL_INCLUDE_DIRS
NAMES OpenCL/cl.h CL/cl.h
HINTS ${OPENCL_HINTS}
PATH_SUFFIXES include OpenCL/common/inc inc include/x86_64 include/x64
PATHS ${OPENCL_PATHS}
DOC "OpenCL include header OpenCL/cl.h or CL/cl.h"
)
mark_as_advanced(OPENCL_INCLUDE_DIRS)

if( LIB64 )
find_library( OPENCL_LIBRARIES
NAMES OpenCL
HINTS
${OPENCL_ROOT}/lib
$ENV{AMDAPPSDKROOT}/lib
$ENV{CUDA_PATH}/lib
DOC "OpenCL dynamic library path"
PATH_SUFFIXES x86_64 x64
PATHS
/usr/lib
/usr/local/cuda/lib
/opt/cuda/lib
)
else( )
find_library( OPENCL_LIBRARIES
NAMES OpenCL
HINTS
${OPENCL_ROOT}/lib
$ENV{AMDAPPSDKROOT}/lib
$ENV{CUDA_PATH}/lib
DOC "OpenCL dynamic library path"
PATH_SUFFIXES x86 Win32
PATHS
/usr/lib
/usr/local/cuda/lib
/opt/cuda/lib
)
endif( )
mark_as_advanced( OPENCL_LIBRARIES )
# Finds the library
find_library(OPENCL_LIBRARIES
NAMES OpenCL
HINTS ${OPENCL_HINTS}
PATH_SUFFIXES lib lib64 lib/x86_64 lib/x64 lib/x86 lib/Win32 OpenCL/common/lib/x64
PATHS ${OPENCL_PATHS}
DOC "OpenCL library"
)
mark_as_advanced(OPENCL_LIBRARIES)

include( FindPackageHandleStandardArgs )
FIND_PACKAGE_HANDLE_STANDARD_ARGS( OPENCL DEFAULT_MSG OPENCL_LIBRARIES OPENCL_INCLUDE_DIRS )
# ==================================================================================================

if( NOT OPENCL_FOUND )
message( STATUS "FindOpenCL looked for libraries named: OpenCL" )
# Notification messages
if(NOT OPENCL_INCLUDE_DIRS)
message(STATUS "Could NOT find 'OpenCL/cl.h' or 'CL/cl.h', install OpenCL or set OPENCL_ROOT")
endif()
if(NOT OPENCL_LIBRARIES)
message(STATUS "Could NOT find OpenCL library, install it or set OPENCL_ROOT")
endif()

# Determines whether or not OpenCL was found
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(OpenCL DEFAULT_MSG OPENCL_INCLUDE_DIRS OPENCL_LIBRARIES)

# ==================================================================================================
28 changes: 17 additions & 11 deletions include/cltune.h
@@ -67,7 +67,7 @@ class Tuner {

// Helper structure to store an OpenCL memory argument for a kernel
struct MemArgument {
int index; // The OpenCL kernel-argument index
size_t index; // The OpenCL kernel-argument index
size_t size; // The number of elements (not bytes)
MemType type; // The data-type (e.g. float)
cl::Buffer buffer; // The host memory and OpenCL buffer on the device
@@ -91,25 +91,25 @@ class Tuner {

// Initialize either with platform 0 and device 0 or with a custom platform/device
explicit Tuner();
explicit Tuner(int platform_id, int device_id);
explicit Tuner(size_t platform_id, size_t device_id);
~Tuner();

// Adds a new kernel to the list of tuning-kernels and returns a unique ID (to be used when
// adding tuning parameters)
int AddKernel(const std::string &filename, const std::string &kernel_name,
const cl::NDRange &global, const cl::NDRange &local);
size_t AddKernel(const std::vector<std::string> &filenames, const std::string &kernel_name,
const cl::NDRange &global, const cl::NDRange &local);

// Sets the reference kernel. Same as the AddKernel function, but in this case there is only one
// reference kernel. Calling this function again will overwrite the previous reference kernel.
void SetReference(const std::string &filename, const std::string &kernel_name,
void SetReference(const std::vector<std::string> &filenames, const std::string &kernel_name,
const cl::NDRange &global, const cl::NDRange &local);

// Adds a new tuning parameter for a kernel with a specific ID. The parameter has a name, the
// number of values, and a list of values.
// TODO: Remove all following functions (those that take "const size_t id" as first argument) and
// make the KernelInfo class publicly accessible instead.
void AddParameter(const size_t id, const std::string parameter_name,
const std::initializer_list<int> values);
void AddParameter(const size_t id, const std::string &parameter_name,
const std::initializer_list<size_t> &values);

// Modifies the global or local thread-size (in NDRange form) by one of the parameters (in
// StringRange form). The modifier can be multiplication or division.
@@ -137,8 +137,8 @@ class Tuner {

// Configures a specific search method. The default search method is "FullSearch"
void UseFullSearch();
void UseRandomSearch(const float fraction);
void UseAnnealing(const float fraction, const double max_temperature);
void UseRandomSearch(const double fraction);
void UseAnnealing(const double fraction, const double max_temperature);
void UsePSO(const double fraction, const size_t swarm_size, const double influence_global,
const double influence_local, const double influence_random);

@@ -173,6 +173,7 @@ class Tuner {
// Downloads the output of a tuning run and compares it against the reference run
bool VerifyOutput();
template <typename T> bool DownloadAndCompare(const MemArgument &device_buffer, const size_t i);
template <typename T> double AbsoluteDifference(const T reference, const T result);

// Prints results of a particular kernel run
void PrintResult(FILE* fp, const TunerResult &result, const std::string &message) const;
@@ -197,11 +198,16 @@ class Tuner {
std::vector<double> search_args_;

// Storage of kernel sources, arguments, and parameters
int argument_counter_;
size_t argument_counter_;
std::vector<KernelInfo> kernels_;
std::vector<MemArgument> arguments_input_;
std::vector<MemArgument> arguments_output_;
std::vector<std::pair<int,int>> arguments_scalar_;
std::vector<std::pair<size_t,int>> arguments_int_;
std::vector<std::pair<size_t,size_t>> arguments_size_t_;
std::vector<std::pair<size_t,float>> arguments_float_;
std::vector<std::pair<size_t,double>> arguments_double_;
std::vector<std::pair<size_t,float2>> arguments_float2_;
std::vector<std::pair<size_t,double2>> arguments_double2_;

// Storage for the reference kernel and output
std::unique_ptr<KernelInfo> reference_kernel_;
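
The header above only declares the new `AbsoluteDifference` helper; its definition is not part of the visible diff. As a rough sketch of what such a helper could look like, the version below is written as free functions (an assumption, the original is a member of `Tuner`) and uses the modulus of the difference for complex values. The actual commit may measure complex differences in another way.

```cpp
#include <cmath>
#include <complex>

// Generic case: magnitude of the difference between two real-valued numbers
template <typename T>
double AbsoluteDifference(const T reference, const T result) {
  return std::fabs(static_cast<double>(reference) - static_cast<double>(result));
}

// Complex case (selected for std::complex arguments because it is more specialized):
// uses the modulus of the complex difference. This choice is an assumption.
template <typename T>
double AbsoluteDifference(const std::complex<T> reference, const std::complex<T> result) {
  return static_cast<double>(std::abs(reference - result));
}
```

A verification loop could then sum these differences over all output elements and compare the total against a small threshold, regardless of whether the buffer holds `int`, `float`, `double`, `float2`, or `double2` values.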
10 changes: 5 additions & 5 deletions include/cltune/kernel_info.h
@@ -55,14 +55,14 @@ class KernelInfo {
// Helper structure holding a parameter name and a list of all values
struct Parameter {
std::string name;
std::vector<int> values;
std::vector<size_t> values;
};

// Helper structure holding a setting: a name and a value. Multiple settings combined make a
// single configuration.
struct Setting {
std::string name;
int value;
size_t value;
std::string GetDefine() const { return "#define "+name+" "+GetValueString()+"\n"; }
std::string GetConfig() const { return name+" "+GetValueString(); }
std::string GetDatabase() const { return "{\""+name+"\","+GetValueString()+"}"; }
@@ -78,14 +78,14 @@ class KernelInfo {

// Helper structure holding a constraint on parameters. This constraint consists of a constraint
// function object and a vector of parameter names represented as strings.
using ConstraintFunction = std::function<bool(std::vector<int>)>;
using ConstraintFunction = std::function<bool(std::vector<size_t>)>;
struct Constraint {
ConstraintFunction valid_if;
std::vector<std::string> parameters;
};

// As above, but for local memory size.
using LocalMemoryFunction = std::function<size_t(std::vector<int>)>;
using LocalMemoryFunction = std::function<size_t(std::vector<size_t>)>;
struct LocalMemory {
LocalMemoryFunction amount;
std::vector<std::string> parameters;
@@ -116,7 +116,7 @@ class KernelInfo {
void set_local_base(cl::NDRange local) { local_base_ = local; local_ = local; }

// Adds a new parameter with a name and a vector of possible values
void AddParameter(const std::string name, const std::vector<int> values);
void AddParameter(const std::string &name, const std::vector<size_t> &values);

// Checks whether a parameter exists, returns "true" if it does exist
bool ParameterExists(const std::string parameter_name);
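
Because parameter values change from `int` to `size_t` in this file, user-supplied constraint functions now receive unsigned values. The sketch below shows two constraints written against the new `ConstraintFunction` alias; the function names are hypothetical and only illustrate one pitfall that comes with the unsigned type.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Same alias as in the diff above
using ConstraintFunction = std::function<bool(std::vector<size_t>)>;

// Hypothetical constraint: the first parameter must be a multiple of the second
bool IsMultipleOf(std::vector<size_t> v) {
  return v[1] != 0 && (v[0] % v[1]) == 0;
}

// Hypothetical constraint: the first parameter must be at least as large as the second. The
// values are compared directly: a test such as (v[0] - v[1]) >= 0 would always be true,
// because unsigned subtraction wraps around instead of becoming negative.
bool IsAtLeast(std::vector<size_t> v) {
  return v[0] >= v[1];
}
```

Either function can be stored in a `ConstraintFunction` and paired with the names of the two parameters it applies to, as the `Constraint` structure in the diff suggests.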
9 changes: 8 additions & 1 deletion include/cltune/memory.h
@@ -34,14 +34,21 @@
#include <vector>
#include <stdexcept>
#include <memory>
#include <complex>

#include "cltune/opencl.h"

namespace cltune {
// =================================================================================================

// Shorthands for complex data-types
using float2 = std::complex<float>; // cl_float2;
using double2 = std::complex<double>; // cl_double2;

// =================================================================================================

// Enumeration of currently supported data-types by this class
enum class MemType { kInt, kFloat, kDouble };
enum class MemType { kInt, kFloat, kDouble, kFloat2, kDouble2 };

// See comment at top of file for a description of the class
template <typename T>
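
The `float2` and `double2` aliases above map onto `std::complex`, with `cl_float2` and `cl_double2` mentioned in the trailing comments. One likely reason this works is that C++11 guarantees `std::complex<T>` is laid out as two consecutive `T` values (real part first), so a host vector of complex numbers can be viewed as interleaved scalars. The sketch below illustrates that guarantee; the helper `AsInterleaved` is hypothetical, and whether CLTune relies on this layout when copying to the device is an assumption.

```cpp
#include <complex>
#include <vector>

using float2 = std::complex<float>;  // Same alias as in the diff above

// Guaranteed by the standard: a complex number occupies exactly two scalars
static_assert(sizeof(float2) == 2 * sizeof(float), "unexpected std::complex layout");

// Hypothetical helper: views a vector of float2 as interleaved floats (re0, im0, re1, im1, ...).
// The standard explicitly permits this reinterpret_cast for arrays of std::complex.
const float* AsInterleaved(const std::vector<float2> &values) {
  return reinterpret_cast<const float*>(values.data());
}
```

With this view, a buffer of `n` complex elements occupies `2 * n * sizeof(float)` bytes on the host, which is the size one would expect for `n` two-component float vectors.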