diff --git a/CMakeLists.txt b/CMakeLists.txt
index 62c0e59..96b8d63 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -92,10 +92,12 @@ list(SORT sources)
 source_group(Headers FILES ${headers})
 source_group(Sources FILES ${sources})
 
-#add_subdirectory(stream_compaction) # TODO: uncomment if using your stream compaction
+add_subdirectory(stream_compaction)
+add_subdirectory(oidn)
 
 cuda_add_executable(${CMAKE_PROJECT_NAME} ${sources} ${headers})
 target_link_libraries(${CMAKE_PROJECT_NAME}
     ${LIBRARIES}
-    #stream_compaction # TODO: uncomment if using your stream compaction
+    stream_compaction
+    OpenImageDenoise
     )
diff --git a/README.md b/README.md
index 110697c..c339ec0 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,165 @@ CUDA Path Tracer
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Dewang Sultania
+  * [LinkedIn](https://www.linkedin.com/in/dewang-sultania/)
+* Tested on: Windows 10, Intel Xeon E-2176M @ 2.70GHz 16GB, Quadro P2000 4GB (Personal Computer)
 
-### (TODO: Your README)
+![](img/main.png)
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the project, and we will not be able to grade you without a good README.
+### Table of Contents
+1. [Overview](#overview)
+2. [Graphics Features](#graphics)
+   1. [Diffusion](#diffusion)
+   2. [Reflection](#reflection)
+   3. [Refraction with Fresnel effects using Schlick's approximation](#refraction)
+   4. [Anti Aliasing](#anti-alias)
+   5. [Motion Blur](#motion-blur)
+   6. [Open Image AI Denoiser](#denoiser)
+3. [Optimization Features](#optimization)
+   1. [Stream Compaction](#stream)
+   2. [Material Sorting](#material-sort)
+   3. [Cache First Bounce](#cache)
+4. [References](#references)
+
+## Overview
+
+This repository contains a GPU implementation of a Monte Carlo path tracer. Path tracing generates an image by tracing paths of light through each pixel of an image plane and simulating the effects of their encounters with virtual objects. The technique can produce a very high degree of visual realism, usually higher than that of typical scanline rendering methods, but at a greater computational cost. This makes it best suited for applications where a relatively long render time per frame can be tolerated, such as still images and film and television visual effects, and more poorly suited for real-time applications such as video games, where speed is critical. Ray tracing can simulate a wide variety of optical effects, such as reflection, refraction, scattering, and dispersion phenomena (such as chromatic aberration).
+
+![](img/path_tracer.png)
+
+## Graphics Features
+
+This section describes the graphics features that were implemented and shows their results.
+
+#### Diffusion
+
+Diffuse shading is obtained with a cosine-weighted sampling function: a perfectly diffuse (Lambertian) surface scatters incident light over the whole hemisphere around the surface normal, and new ray directions are sampled with probability proportional to the cosine of their angle to the normal.
+
+#### Reflection
+
+Perfectly specular reflection is implemented using glm::reflect.
+
+![](img/reflection.jpg)
+
+#### Refraction with Fresnel effects using Schlick's approximation
+
+![](img/refraction.png)
+
+Refraction is implemented using glm::refract, with a toggle for approximating the Fresnel reflectance with Schlick's approximation. The special case of total internal reflection is also handled.
+
+Without Schlick's approximation | With Schlick's approximation
+:-------------------------:|:-------------------------:
+![](img/refraction_no_fresnel.png) | ![](img/fresnel.png)
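+
+A simplified, illustrative sketch of how these three scattering modes can be implemented with glm is shown below. This is not a copy of the project's kernels; the `scatter` helper, the material flags, and the random numbers `u1`/`u2` are hypothetical names used only for this example.
+
+```cpp
+#include <cmath>
+#include <glm/glm.hpp>
+#include <glm/gtc/constants.hpp>
+
+// Cosine-weighted sample of the hemisphere around `normal`.
+// u1 and u2 are uniform random numbers in [0, 1).
+glm::vec3 cosineSampleHemisphere(glm::vec3 normal, float u1, float u2) {
+    float up = sqrtf(u1);           // cos(theta)
+    float over = sqrtf(1.0f - u1);  // sin(theta)
+    float around = u2 * glm::two_pi<float>();
+
+    // Build an orthonormal basis around the normal.
+    glm::vec3 axis = fabsf(normal.x) < 0.577f ? glm::vec3(1, 0, 0) : glm::vec3(0, 1, 0);
+    glm::vec3 tangent = glm::normalize(glm::cross(normal, axis));
+    glm::vec3 bitangent = glm::cross(normal, tangent);
+
+    return up * normal + cosf(around) * over * tangent + sinf(around) * over * bitangent;
+}
+
+// Schlick's approximation of the Fresnel reflectance.
+float schlickFresnel(float cosTheta, float ior) {
+    float r0 = (1.0f - ior) / (1.0f + ior);
+    r0 = r0 * r0;
+    return r0 + (1.0f - r0) * powf(1.0f - cosTheta, 5.0f);
+}
+
+// Returns the scattered ray direction; `incident` must be normalized.
+glm::vec3 scatter(glm::vec3 incident, glm::vec3 normal, bool reflective, bool refractive,
+                  float ior, float u1, float u2) {
+    if (reflective) {
+        return glm::reflect(incident, normal);  // perfect mirror
+    }
+    if (refractive) {
+        bool entering = glm::dot(incident, normal) < 0.0f;
+        glm::vec3 n = entering ? normal : -normal;
+        float eta = entering ? 1.0f / ior : ior;
+        float cosTheta = fabsf(glm::dot(incident, n));
+
+        glm::vec3 refracted = glm::refract(incident, n, eta);
+
+        // glm::refract returns a zero vector on total internal reflection; otherwise
+        // choose between reflection and refraction using Schlick's approximation.
+        if (glm::length(refracted) < 1e-6f || u1 < schlickFresnel(cosTheta, ior)) {
+            return glm::reflect(incident, n);
+        }
+        return refracted;
+    }
+    return cosineSampleHemisphere(normal, u1, u2);  // diffuse
+}
+```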
+
+#### Anti Aliasing
+
+Anti-aliasing is achieved by uniformly jittering the position within each pixel through which its camera ray is shot, so that over many iterations edges are averaged over slightly different sample locations.
+
+With Anti Aliasing | Without Anti Aliasing
+:-------------------------:|:-------------------------:
+![](img/alias.JPG) | ![](img/no-alias.JPG)
+
+#### Motion Blur
+
+Motion blur is obtained by averaging many samples taken while an object moves, which smears the object along its direction of motion.
+
+![](img/motion_blur.png)
+
+#### Open Image AI Denoiser
+
+I was able to get the denoiser mostly working, with all credit to Intel's out-of-this-world documentation (really, only aliens can understand it). Here is my blooper reel from the attempt.
+
+Blooper 1 | Blooper 2 | Blooper 3 | Blooper 4
+:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:
+![](img/denoise_blooper.png) | ![](img/denoise_blooper2.png)| ![](img/denoise_blooper3.png) | ![](img/denoise_blooper4.png)
+
+Finally, after fixing my issues, I was able to get it working.
+
+According to the documentation, the library expects the pixel values in little-endian format, so I had written a ReverseFloat function to convert from big-endian to little-endian. Using that function produced the blooper reel above; when I dropped it, I got the following output after 5 iterations.
+
+Original | Denoised
+:-------------------------:|:-------------------------:
+![](img/denoise_orig.png) | ![](img/denoise_decent.png)
+
+Then I also passed the albedo and normal buffers to the library; the results were:
+
+Original | Denoised
+:-------------------------:|:-------------------------:
+![](img/original_albedo_nromal.png) | ![](img/denoise_albedo_nromal.png)
+
+Setting it up and building it was a hard task, so I have listed the steps I had to take below. They can serve as a quick guide for getting it up and running, since the official instructions are not easy to follow.
+
+* Install TBB from here: https://github.com/intel/tbb/releases
+* Then run this command: ```git clone --recursive https://github.com/OpenImageDenoise/oidn.git```
+* Then copy the oidn folder into your Path Tracer folder
+* Now in your CMakeLists.txt add the line ```add_subdirectory(oidn)``` and add ```OpenImageDenoise``` to target_link_libraries
+* Then run ```cmake-gui ..``` and add the following four entries before clicking configure:
+  * ```TBB_ROOT```, which is something like ```C:/Users/dewan/Desktop/tbb2019_20190605oss_win/tbb2019_20190605oss```
+  * ```TBB_INCLUDE_DIR```, which is something like ```C:\Users\dewan\Desktop\tbb2019_20190605oss_win\tbb2019_20190605oss\include```
+  * ```TBB_LIBRARY```, which is something like ```C:\Users\dewan\Desktop\tbb2019_20190605oss_win\tbb2019_20190605oss\lib\intel64\vc14\tbb_debug.lib```
+  * ```TBB_LIBRARY_MALLOC```, which is something like ```C:\Users\dewan\Desktop\tbb2019_20190605oss_win\tbb2019_20190605oss\lib\intel64\vc14\tbbmalloc_debug.lib```
+* Now install oidn from ```https://github.com/OpenImageDenoise/oidn/releases``` and copy ```OpenImageDenoise.dll, tbb.dll, tbbmalloc.dll``` from the bin folder to your Windows System32 folder.
+
+The code should build now (at least it did for me), but I make no guarantees, since these steps are the result of working through all the error messages that were thrown at me along the way.
+
+## Optimization Features
+
+#### Stream Compaction
+
+After each bounce, some rays hit the light source and terminate. We can stop the threads assigned to these rays or, equivalently, launch fewer threads for the next bounce. thrust::partition is used to keep all of the still-active rays together after every bounce, so only those need to be launched for the next one.
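+
+A minimal sketch of this compaction step is shown below. The `PathSegment` struct, the `IsAlive` predicate, and the `compactPaths` helper are simplified, hypothetical stand-ins for the project's actual types, shown only to illustrate the thrust::partition call.
+
+```cpp
+#include <thrust/partition.h>
+#include <thrust/execution_policy.h>
+
+// Simplified stand-in for the per-ray state kept by the path tracer.
+struct PathSegment {
+    int remainingBounces;
+    // ... ray origin/direction, accumulated color, pixel index, etc.
+};
+
+// Predicate: a path is still alive if it has bounces left.
+struct IsAlive {
+    __host__ __device__ bool operator()(const PathSegment& p) const {
+        return p.remainingBounces > 0;
+    }
+};
+
+// Move all live paths to the front of the device array and return their count,
+// so the next bounce only launches threads for paths that are still alive.
+int compactPaths(PathSegment* dev_paths, int num_paths) {
+    PathSegment* new_end =
+        thrust::partition(thrust::device, dev_paths, dev_paths + num_paths, IsAlive());
+    return static_cast<int>(new_end - dev_paths);
+}
+```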
+
+#### Material Sort
+
+This optimization is based on the fact that if neighboring threads are shading the same material type, they run the same instructions, which results in less warp divergence. The intersections are therefore sorted by material ID before shading.
+
+#### Cache First Bounce
+
+The camera rays always start at the same pixel locations and shoot out in the same directions, so the first-bounce intersections are identical in every iteration. We can therefore cache them during the first iteration and reuse them instead of recomputing them (this only applies while features that jitter the initial rays, such as anti-aliasing and motion blur, are disabled).
+
+A performance comparison of these optimizations can be seen below:
+
+![](img/perf.JPG)
+
+## References
+
+1. https://en.wikipedia.org/wiki/Ray_tracing_(graphics)
+2. https://en.wikipedia.org/wiki/Schlick%27s_approximation
+3. http://viclw17.github.io/2018/07/17/raytracing-camera-and-msaa/
+4. https://www.andrew.cmu.edu/user/hgifford/projects/msaa.pdf
+5. https://github.com/RayTracing/InOneWeekend
diff --git a/img/alias.JPG b/img/alias.JPG new file mode 100644 index 0000000..609b54b Binary files /dev/null and b/img/alias.JPG differ
diff --git a/img/anti-alias.JPG b/img/anti-alias.JPG new file mode 100644 index 0000000..580334e Binary files /dev/null and b/img/anti-alias.JPG differ
diff --git a/img/denoise_albedo_nromal.png b/img/denoise_albedo_nromal.png new file mode 100644 index 0000000..d34ba8a Binary files /dev/null and b/img/denoise_albedo_nromal.png differ
diff --git a/img/denoise_blooper.png b/img/denoise_blooper.png new file mode 100644 index 0000000..0dd97a9 Binary files /dev/null and b/img/denoise_blooper.png differ
diff --git a/img/denoise_blooper2.png b/img/denoise_blooper2.png new file mode 100644 index 0000000..4ee66a6 Binary files /dev/null and b/img/denoise_blooper2.png differ
diff --git a/img/denoise_blooper3.png b/img/denoise_blooper3.png new file mode 100644 index 0000000..f6c86c6 Binary files /dev/null and b/img/denoise_blooper3.png differ
diff --git a/img/denoise_blooper4.png b/img/denoise_blooper4.png new file mode 100644 index 0000000..3e33f7f Binary files /dev/null and b/img/denoise_blooper4.png differ
diff --git a/img/denoise_decent.png b/img/denoise_decent.png new file mode 100644 index 0000000..6aa9881 Binary files /dev/null and b/img/denoise_decent.png differ
diff --git a/img/denoise_orig.png b/img/denoise_orig.png new file mode 100644 index 0000000..73b84ff Binary files /dev/null and b/img/denoise_orig.png differ
diff --git a/img/fresnel.png b/img/fresnel.png new file mode 100644 index 0000000..2881603 Binary files /dev/null and b/img/fresnel.png differ
diff --git a/img/main.png b/img/main.png new file mode 100644 index 0000000..d08aaa1 Binary files /dev/null and b/img/main.png differ
diff --git a/img/motion_blur.png b/img/motion_blur.png new file mode 100644 index 0000000..ba36bdf Binary files /dev/null and b/img/motion_blur.png differ
diff --git a/img/no-alias.JPG b/img/no-alias.JPG new file mode 100644 index 0000000..c546eb0 Binary files /dev/null and b/img/no-alias.JPG differ
diff --git a/img/original_albedo_nromal.png b/img/original_albedo_nromal.png new file mode 100644 index 0000000..ac3dc38 Binary files /dev/null and b/img/original_albedo_nromal.png differ
diff --git a/img/path_tracer.png b/img/path_tracer.png new file mode 100644 index
0000000..54fef69 Binary files /dev/null and b/img/path_tracer.png differ diff --git a/img/perf.JPG b/img/perf.JPG new file mode 100644 index 0000000..0129444 Binary files /dev/null and b/img/perf.JPG differ diff --git a/img/reflection.jpg b/img/reflection.jpg new file mode 100644 index 0000000..fdfebe7 Binary files /dev/null and b/img/reflection.jpg differ diff --git a/img/refraction.png b/img/refraction.png new file mode 100644 index 0000000..d831d98 Binary files /dev/null and b/img/refraction.png differ diff --git a/img/refraction_no_fresnel.png b/img/refraction_no_fresnel.png new file mode 100644 index 0000000..e2b2ad5 Binary files /dev/null and b/img/refraction_no_fresnel.png differ diff --git a/oidn/.gitignore b/oidn/.gitignore new file mode 100644 index 0000000..10a8f91 --- /dev/null +++ b/oidn/.gitignore @@ -0,0 +1,87 @@ +# This file is used to ignore files which are generated +# ---------------------------------------------------------------------------- + +*~ +*.autosave +*.a +*.core +*.moc +*.o +*.obj +*.orig +*.rej +*.so +*.so.* +*_pch.h.cpp +*_resource.rc +*.qm +.#* +*.*# +core +!core/ +tags +.DS_Store +.directory +*.debug +*.prl +*.app +moc_*.cpp +ui_*.h +qrc_*.cpp +Thumbs.db +*.res +*.rc +/.qmake.cache +/.qmake.stash + +# Qt Creator generated files +*.txt.user* +*.pro.user* + +# xemacs temporary files +*.flc + +# Vim temporary files +.*.swp + +# Visual Studio generated files +*.ib_pdb_index +*.idb +*.ilk +*.pdb +*.sln +*.suo +*.vcproj +*vcproj.*.*.user +*.ncb +*.sdf +*.opensdf +*.vcxproj +*vcxproj.* +*.log + +# Visual Studio Code generated files +.vscode + +# MinGW generated files +*.Debug +*.Release + +# Python byte code +*.pyc + +# Binaries +*.dll +*.exe + +# Build directories +build* + +# Dependencies +deps + +# Data directories +images + +# Generated files +include/OpenImageDenoise/version.h diff --git a/oidn/.gitmodules b/oidn/.gitmodules new file mode 100644 index 0000000..d15912e --- /dev/null +++ b/oidn/.gitmodules @@ -0,0 +1,6 @@ +[submodule "mkl-dnn"] + path = mkl-dnn + url = ../mkl-dnn.git +[submodule "weights"] + path = weights + url = ../oidn-weights.git diff --git a/oidn/CHANGELOG.md b/oidn/CHANGELOG.md new file mode 100644 index 0000000..57107ef --- /dev/null +++ b/oidn/CHANGELOG.md @@ -0,0 +1,51 @@ +Version History +--------------- + +### Changes in v1.0.0: + +- Improved denoising quality + - More details preserved + - Less artifacts (e.g. noisy spots, color bleeding with albedo/normal) +- Added `maxMemoryMB` filter parameter for limiting the maximum memory + consumption regardless of the image resolution, potentially at the cost + of lower denoising speed. 
This is internally implemented by denoising the + image in tiles +- Significantly reduced memory consumption (but slightly lower performance) + for high resolutions (> 2K) by default: limited to about 6 GB +- Added `alignment` and `overlap` filter parameters that can be queried for + manual tiled denoising +- Added `verbose` device parameter for setting the verbosity of the console + output, and disabled all console output by default +- Fixed crash for zero-sized images + +### Changes in v0.9.0: + +- Reduced memory consumption by about 38% +- Added support for progress monitor callback functions +- Enabled fully concurrent execution when using multiple devices +- Clamp LDR input and output colors to 1 +- Fixed issue where some memory allocation errors were not reported + +### Changes in v0.8.2: + +- Fixed wrong HDR output when the input contains infinities/NaNs +- Fixed wrong output when multiple filters were executed concurrently on + separate devices with AVX-512 support. Currently the filter executions are + serialized as a temporary workaround, and a full fix will be included in a + future release. +- Added OIDN_STATIC_LIB CMake option for building as a static library + (requires CMake 3.13.0 or later) +- Fixed CMake error when adding the library with add_subdirectory() to a project + +### Changes in v0.8.1: + +- Fixed wrong path to TBB in the generated CMake configs +- Fixed wrong rpath in the binaries +- Fixed compile error on some macOS systems +- Fixed minor compile issues with Visual Studio +- Lowered the CPU requirement to SSE4.1 +- Minor example update + +### Changes in v0.8.0: + +- Initial beta release diff --git a/oidn/CMakeLists.txt b/oidn/CMakeLists.txt new file mode 100644 index 0000000..e5e2472 --- /dev/null +++ b/oidn/CMakeLists.txt @@ -0,0 +1,191 @@ +## ======================================================================== ## +## Copyright 2009-2019 Intel Corporation ## +## ## +## Licensed under the Apache License, Version 2.0 (the "License"); ## +## you may not use this file except in compliance with the License. ## +## You may obtain a copy of the License at ## +## ## +## http://www.apache.org/licenses/LICENSE-2.0 ## +## ## +## Unless required by applicable law or agreed to in writing, software ## +## distributed under the License is distributed on an "AS IS" BASIS, ## +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## +## See the License for the specific language governing permissions and ## +## limitations under the License. ## +## ======================================================================== ## + +cmake_minimum_required(VERSION 3.1) + +set(OIDN_VERSION_MAJOR 1) +set(OIDN_VERSION_MINOR 0) +set(OIDN_VERSION_PATCH 0) +set(OIDN_VERSION_NOTE "") + +set(OIDN_VERSION ${OIDN_VERSION_MAJOR}.${OIDN_VERSION_MINOR}.${OIDN_VERSION_PATCH}) +math(EXPR OIDN_VERSION_NUMBER "10000*${OIDN_VERSION_MAJOR} + 100*${OIDN_VERSION_MINOR} + ${OIDN_VERSION_PATCH}") + +project(OpenImageDenoise + VERSION ${OIDN_VERSION} + LANGUAGES CXX +) + +set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${PROJECT_SOURCE_DIR}/cmake") + +# Build as shared or static library +if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.13.0") + option(OIDN_STATIC_LIB "Build Open Image Denoise as a static library.") + mark_as_advanced(CLEAR OIDN_STATIC_LIB) +else() + set(OIDN_STATIC_LIB OFF CACHE BOOL "Build Open Image Denoise as a static library." 
FORCE) + mark_as_advanced(OIDN_STATIC_LIB) +endif() +if(OIDN_STATIC_LIB) + set(OIDN_LIB_TYPE STATIC) +else() + set(OIDN_LIB_TYPE SHARED) +endif() + +# Configuration types +set(CONFIGURATION_TYPES "Debug;Release;RelWithDebInfo") +if(win32) + if(NOT OIDN_DEFAULT_CMAKE_CONFIGURATION_TYPES_SET) + set(CMAKE_CONFIGURATION_TYPES "${CONFIGURATION_TYPES}" + CACHE STRING "List of generated configurations." FORCE) + set(OOIDN_DEFAULT_CMAKE_CONFIGURATION_TYPES_SET ON + CACHE INTERNAL "Default CMake configuration types set.") + endif() +else() + if(NOT CMAKE_BUILD_TYPE) + set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Choose the build type." FORCE) + endif() + set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS ${CONFIGURATION_TYPES}) +endif() + +# Output paths +set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}") +set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}") +set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}") + +# Configure packaging +include(package) + +## ---------------------------------------------------------------------------- +## MKL-DNN +## ---------------------------------------------------------------------------- + +# Configure MKL-DNN +set(MKLDNN_LIBRARY_TYPE "STATIC" CACHE INTERNAL "") +set(MKLDNN_THREADING "TBB" CACHE INTERNAL "") +set(MKLDNN_USE_MKL "NONE" CACHE INTERNAL "") +set(MKLDNN_ENABLE_CONCURRENT_EXEC ON CACHE INTERNAL "") +set(MKLDNN_USE_CLANG_SANITIZER "" CACHE INTERNAL "") +set(MKLDNN_VERBOSE OFF CACHE BOOL "") +set(MKLDNN_BUILD_EXAMPLES OFF CACHE INTERNAL "") +set(MKLDNN_BUILD_TESTS OFF CACHE INTERNAL "") +set(BENCHDNN_USE_RDPMC OFF CACHE INTERNAL "") + +# Add modified version of MKL-DNN +add_subdirectory(mkl-dnn EXCLUDE_FROM_ALL) + +# Include some modules from MKL-DNN +include(mkl-dnn/cmake/Threading.cmake) +include(mkl-dnn/cmake/TBB.cmake) +include(mkl-dnn/cmake/platform.cmake) +include(mkl-dnn/cmake/SDL.cmake) + +# Propagate no warning flags +append(CMAKE_C_FLAGS "${CMAKE_CCXX_NOWARN_FLAGS}") +append(CMAKE_CXX_FLAGS "${CMAKE_CCXX_NOWARN_FLAGS}") + +## ---------------------------------------------------------------------------- +## Open Image Denoise library +## ---------------------------------------------------------------------------- + +# Generate version.h +configure_file( + "${PROJECT_SOURCE_DIR}/include/OpenImageDenoise/version.h.in" + "${PROJECT_SOURCE_DIR}/include/OpenImageDenoise/version.h" +) + +add_subdirectory(common EXCLUDE_FROM_ALL) + +set(CORE_SOURCES + include/OpenImageDenoise/oidn.h + include/OpenImageDenoise/oidn.hpp + include/OpenImageDenoise/version.h + core/api.cpp + core/common.h + core/math.h + core/device.h + core/device.cpp + core/buffer.h + core/image.h + core/filter.h + core/filter.cpp + core/node.h + core/input_reorder.h + core/output_reorder.h + core/weights_reorder.h + core/transfer_function.h + core/upsample.h + core/network.h + core/autoencoder.h +) + +set(CORE_SOURCES_SSE41 + core/network.cpp + core/autoencoder.cpp + core/transfer_function.cpp +) + +include(resource) +generate_cpp_resources(WEIGHTS_SOURCES "oidn::weights" + weights/rt_ldr.tza + weights/rt_ldr_alb.tza + weights/rt_ldr_alb_nrm.tza + weights/rt_hdr.tza + weights/rt_hdr_alb.tza + weights/rt_hdr_alb_nrm.tza +) + +set_source_files_properties(${CORE_SOURCES_SSE41} PROPERTIES COMPILE_FLAGS "${ISA_FLAGS_SSE41}") + +add_library(${PROJECT_NAME} ${OIDN_LIB_TYPE} ${CORE_SOURCES} ${CORE_SOURCES_SSE41} ${WEIGHTS_SOURCES}) + +if(OIDN_STATIC_LIB) + target_compile_definitions(${PROJECT_NAME} INTERFACE -DOIDN_STATIC_LIB) +endif() + 
+target_include_directories(${PROJECT_NAME} + PUBLIC + $ + $ + PRIVATE + ${PROJECT_SOURCE_DIR}/mkl-dnn/include + ${PROJECT_SOURCE_DIR}/mkl-dnn/src + ${PROJECT_SOURCE_DIR}/mkl-dnn/src/common + ${PROJECT_SOURCE_DIR}/mkl-dnn/src/cpu/xbyak +) + +target_link_libraries(${PROJECT_NAME} + PRIVATE + common mkldnn +) + +set_property(TARGET ${PROJECT_NAME} PROPERTY VERSION ${PROJECT_VERSION}) +set_property(TARGET ${PROJECT_NAME} PROPERTY SOVERSION "0") + +## ---------------------------------------------------------------------------- +## Open Image Denoise examples +## ---------------------------------------------------------------------------- + +add_subdirectory(examples) + +## ---------------------------------------------------------------------------- +## Open Image Denoise install and packaging +## ---------------------------------------------------------------------------- + +include(install) + +# Has to be last +include(CPack) diff --git a/oidn/LICENSE.txt b/oidn/LICENSE.txt new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/oidn/LICENSE.txt @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/oidn/README.md b/oidn/README.md new file mode 100644 index 0000000..cee01ea --- /dev/null +++ b/oidn/README.md @@ -0,0 +1,848 @@ +# Intel® Open Image Denoise + +This is release v1.0.0 of Open Image Denoise. For changes and new +features see the [changelog](CHANGELOG.md). Visit +http://www.openimagedenoise.org for more information. + +# Open Image Denoise Overview + +Intel® Open Image Denoise is an open source library of high-performance, +high-quality denoising filters for images rendered with ray tracing. +Open Image Denoise is part of the [Intel Rendering +Framework](https://software.intel.com/en-us/rendering-framework) and is +released under the permissive [Apache 2.0 +license](http://www.apache.org/licenses/LICENSE-2.0). + +The purpose of Open Image Denoise is to provide an open, high-quality, +efficient, and easy-to-use denoising library that allows one to +significantly reduce rendering times in ray tracing based rendering +applications. 
It filters out the Monte Carlo noise inherent to +stochastic ray tracing methods like path tracing, reducing the amount of +necessary samples per pixel by even multiple orders of magnitude +(depending on the desired closeness to the ground truth). A simple but +flexible C/C++ API ensures that the library can be easily integrated +into most existing or new rendering solutions. + +At the heart of the Open Image Denoise library is an efficient deep +learning based denoising filter, which was trained to handle a wide +range of samples per pixel (spp), from 1 spp to almost fully converged. +Thus it is suitable for both preview and final-frame rendering. The +filters can denoise images either using only the noisy color (*beauty*) +buffer, or, to preserve as much detail as possible, can optionally +utilize auxiliary feature buffers as well (e.g. albedo, normal). Such +buffers are supported by most renderers as arbitrary output variables +(AOVs) or can be usually implemented with little effort. + +Open Image Denoise supports Intel® 64 architecture based CPUs and +compatible architectures, and runs on anything from laptops, to +workstations, to compute nodes in HPC systems. It is efficient enough to +be suitable not only for offline rendering, but, depending on the +hardware used, also for interactive ray tracing. + +Open Image Denoise internally builds on top of [Intel® Math Kernel +Library for Deep Neural Networks +(MKL-DNN)](https://github.com/intel/mkl-dnn), and automatically exploits +modern instruction sets like Intel SSE4, AVX2, and AVX-512 to achieve +high denoising performance. A CPU with support for at least SSE4.1 is +required to run Open Image Denoise. + +## Support and Contact + +Open Image Denoise is under active development, and though we do our +best to guarantee stable release versions a certain number of bugs, +as-yet-missing features, inconsistencies, or any other issues are still +possible. Should you find any such issues please report them immediately +via the [Open Image Denoise GitHub Issue +Tracker](https://github.com/OpenImageDenoise/oidn/issues) (or, if you +should happen to have a fix for it, you can also send us a pull +request); for missing features please contact us via email at +. + +For recent news, updates, and announcements, please see our complete +[news/updates](https://openimagedenoise.github.io/news.html) page. + +Join our [mailing +list](https://groups.google.com/d/forum/openimagedenoise/) to receive +release announcements and major news regarding Open Image Denoise. + +# Building Open Image Denoise from Source + +The latest Open Image Denoise sources are always available at the [Open +Image Denoise GitHub +repository](http://github.com/OpenImageDenoise/oidn). The default +`master` branch should always point to the latest tested bugfix release. + +## Prerequisites + +Open Image Denoise currently supports 64-bit Linux, Windows, and macOS +operating systems. In addition, before you can build Open Image Denoise +you need the following prerequisites: + + - You can clone the latest Open Image Denoise sources + via: + + git clone --recursive https://github.com/OpenImageDenoise/oidn.git + + - To build Open Image Denoise you need [CMake](http://www.cmake.org) + 3.1 or later, a C++11 compiler (we recommend using Clang, but also + support GCC, Microsoft Visual Studio 2015 or later, and [Intel® C++ + Compiler](https://software.intel.com/en-us/c-compilers) 17.0 or + later), and Python 2.7 or later. 
+ + - Additionally you require a copy of [Intel® Threading Building + Blocks](https://www.threadingbuildingblocks.org/) (TBB) 2017 or + later. + +Depending on your Linux distribution you can install these dependencies +using `yum` or `apt-get`. Some of these packages might already be +installed or might have slightly different names. + +Type the following to install the dependencies using `yum`: + + sudo yum install cmake + sudo yum install tbb-devel + +Type the following to install the dependencies using `apt-get`: + + sudo apt-get install cmake-curses-gui + sudo apt-get install libtbb-dev + +Under macOS these dependencies can be installed using +[MacPorts](http://www.macports.org/): + + sudo port install cmake tbb + +Under Windows please directly use the appropriate installers or packages +for [CMake](https://cmake.org/download/), +[Python](https://www.python.org/downloads/), and +[TBB](https://github.com/01org/tbb/releases). + +## Compiling Open Image Denoise on Linux/macOS + +Assuming the above prerequisites are all fulfilled, building Open Image +Denoise through CMake is easy: + + - Create a build directory, and go into it + + mkdir oidn/build + cd oidn/build + + (We do recommend having separate build directories for different + configurations such as release, debug, etc.). + + - The compiler CMake will use by default will be whatever the `CC` and + `CXX` environment variables point to. Should you want to specify a + different compiler, run cmake manually while specifying the desired + compiler. The default compiler on most Linux machines is `gcc`, but + it can be pointed to `clang` instead by executing the following: + + cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang .. + + CMake will now use Clang instead of GCC. If you are OK with using + the default compiler on your system, then simply skip this step. + Note that the compiler variables cannot be changed after the first + `cmake` or `ccmake` run. + + - Open the CMake configuration dialog + + ccmake .. + + - Make sure to properly set the build mode and enable the components + you need, etc.; then type ’c’onfigure and ’g’enerate. When back on + the command prompt, build it using + + make + + - You should now have `libOpenImageDenoise.so` as well as a set of + example applications. + +## Compiling Open Image Denoise on Windows + +On Windows using the CMake GUI (`cmake-gui.exe`) is the most convenient +way to configure Open Image Denoise and to create the Visual Studio +solution files: + + - Browse to the Open Image Denoise sources and specify a build + directory (if it does not exist yet CMake will create it). + + - Click “Configure” and select as generator the Visual Studio version + you have (Open Image Denoise needs Visual Studio 14 2015 or newer), + for Win64 (32-bit builds are not supported), e.g., “Visual Studio 15 + 2017 Win64”. + + - If the configuration fails because some dependencies could not be + found then follow the instructions given in the error message, e.g., + set the variable `TBB_ROOT` to the folder where TBB was installed. + + - Optionally change the default build options, and then click + “Generate” to create the solution and project files in the build + directory. + + - Open the generated `OpenImageDenoise.sln` in Visual Studio, select + the build configuration and compile the project. + +Alternatively, Open Image Denoise can also be built without any GUI, +entirely on the console. 
In the Visual Studio command prompt type: + + cd path\to\oidn + mkdir build + cd build + cmake -G "Visual Studio 15 2017 Win64" [-D VARIABLE=value] .. + cmake --build . --config Release + +Use `-D` to set variables for CMake, e.g., the path to TBB with “`-D +TBB_ROOT=\path\to\tbb`”. + +## CMake Configuration + +The default CMake configuration in the configuration dialog should be +appropriate for most usages. The following list describes the options +that can be configured in CMake: + + - `CMAKE_BUILD_TYPE`: Can be used to switch between Debug mode + (Debug), Release mode (Release) (default), and Release mode with + enabled assertions and debug symbols (RelWithDebInfo). + + - `OIDN_STATIC_LIB`: Builds Open Image Denoise as a static library + (OFF by default). CMake 3.13.0 or later is required to enable this + option. When using the statically compiled Open Image Denoise + library, you either have to use the generated CMake configuration + files (recommended), or you have to manually define + `OIDN_STATIC_LIB` before including the library headers in your + application. + + - `TBB_ROOT`: The path to the TBB installation (autodetected by + default). + +# Documentation + +The following [API +documentation](https://github.com/OpenImageDenoise/oidn/blob/master/readme.pdf "Open Image Denoise Documentation") +of Open Image Denoise can also be found as a [pdf +document](https://github.com/OpenImageDenoise/oidn/blob/master/readme.pdf "Open Image Denoise Documentation"). + +# Open Image Denoise API + +Open Image Denoise provides a C99 API (also compatible with C++) and a +C++11 wrapper API as well. For simplicity, this document mostly refers +to the C99 version of the API. + +The API is designed in an object-oriented manner, e.g. it contains +device objects (`OIDNDevice` type), buffer objects (`OIDNBuffer` type), +and filter objects (`OIDNFilter` type). All objects are +reference-counted, and handles can be released by calling the +appropriate release function (e.g. `oidnReleaseDevice`) or retained by +incrementing the reference count (e.g. `oidnRetainDevice`). + +An important aspect of objects is that setting their parameters do not +have an immediate effect (with a few exceptions). Instead, objects with +updated parameters are in an unusable state until the parameters get +explicitly committed to a given object. The commit semantic allows for +batching up multiple small changes, and specifies exactly when changes +to objects will occur. + +All API calls are thread-safe, but operations that use the same device +will be serialized, so the amount of API calls from different threads +should be minimized. + +To have a quick overview of the C99 and C++11 APIs, see the following +simple example code snippets. + +### C99 API Example + +``` cpp +#include +... 
+// Create an Open Image Denoise device +OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT); +oidnCommitDevice(device); + +// Create a denoising filter +OIDNFilter filter = oidnNewFilter(device, "RT"); // generic ray tracing filter +oidnSetSharedFilterImage(filter, "color", colorPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); +oidnSetSharedFilterImage(filter, "albedo", albedoPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); // optional +oidnSetSharedFilterImage(filter, "normal", normalPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); // optional +oidnSetSharedFilterImage(filter, "output", outputPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); +oidnSetFilter1b(filter, "hdr", true); // image is HDR +oidnCommitFilter(filter); + +// Filter the image +oidnExecuteFilter(filter); + +// Check for errors +const char* errorMessage; +if (oidnGetDeviceError(device, &errorMessage) != OIDN_ERROR_NONE) + printf("Error: %s\n", errorMessage); + +// Cleanup +oidnReleaseFilter(filter); +oidnReleaseDevice(device); +``` + +### C++11 API Example + +``` cpp +#include +... +// Create an Open Image Denoise device +oidn::DeviceRef device = oidn::newDevice(); +device.commit(); + +// Create a denoising filter +oidn::FilterRef filter = device.newFilter("RT"); // generic ray tracing filter +filter.setImage("color", colorPtr, oidn::Format::Float3, width, height); +filter.setImage("albedo", albedoPtr, oidn::Format::Float3, width, height); // optional +filter.setImage("normal", normalPtr, oidn::Format::Float3, width, height); // optional +filter.setImage("output", outputPtr, oidn::Format::Float3, width, height); +filter.set("hdr", true); // image is HDR +filter.commit(); + +// Filter the image +filter.execute(); + +// Check for errors +const char* errorMessage; +if (device.getError(errorMessage) != oidn::Error::None) + std::cout << "Error: " << errorMessage << std::endl; +``` + +## Device + +Open Image Denoise supports a device concept, which allows different +components of the application to use the Open Image Denoise API without +interfering with each other. An application first needs to create a +device with + +``` cpp +OIDNDevice oidnNewDevice(OIDNDeviceType type); +``` + +where the `type` enumeration maps to a specific device implementation, +which can be one of the +following: + +| Name | Description | +| :-------------------------- | :-------------------------------------- | +| OIDN\_DEVICE\_TYPE\_DEFAULT | select the approximately fastest device | +| OIDN\_DEVICE\_TYPE\_CPU | CPU device (requires SSE4.1 support) | + +Supported device types, i.e., valid constants of type `OIDNDeviceType`. + +Once a device is created, you can call + +``` cpp +void oidnSetDevice1b(OIDNDevice device, const char* name, bool value); +void oidnSetDevice1i(OIDNDevice device, const char* name, int value); +bool oidnGetDevice1b(OIDNDevice device, const char* name); +int oidnGetDevice1i(OIDNDevice device, const char* name); +``` + +to set and get parameter values on the device. Note that some parameters +are constants, thus trying to set them is an error. See the tables below +for the parameters supported by +devices. 
+ +| Type | Name | Default | Description | +| :-------- | :----------- | ------: | :---------------------------------------------------------------------------------------------------------------------------------------- | +| const int | version | | combined version number (major.minor.patch) with two decimal digits per component | +| const int | versionMajor | | major version number | +| const int | versionMinor | | minor version number | +| const int | versionPatch | | patch version number | +| int | verbose | 0 | verbosity level of the console output between 0–3; when set to 0, no output is printed, when set to a higher level more output is printed | + +Parameters supported by all +devices. + +| Type | Name | Default | Description | +| :--- | :---------- | ------: | :--------------------------------------------------------------------------------------------------------------------- | +| int | numThreads | 0 | maximum number of threads which Open Image Denoise should use; 0 will set it automatically to get the best performance | +| bool | setAffinity | true | bind software threads to hardware threads if set to true (improves performance); false disables binding | + +Additional parameters supported only by CPU devices. + +Note that the CPU device heavily relies on setting the thread affinities +to achieve optimal performance, so it is highly recommended to leave +this option enabled. However, this may interfere with the application if +that also sets the thread affinities, potentially causing performance +degradation. In such cases, the recommended solution is to either +disable setting the affinities in the application or in Open Image +Denoise, or to always set/reset the affinities before/after each +parallel region in the application (e.g., if using TBB, with +`tbb::task_arena` and `tbb::task_scheduler_observer`). + +Once parameters are set on the created device, the device must be +committed with + +``` cpp +void oidnCommitDevice(OIDNDevice device); +``` + +This device can then be used to construct further objects, such as +buffers and filters. Note that a device can be committed only once +during its lifetime. Before the application exits, it should release all +devices by invoking + +``` cpp +void oidnReleaseDevice(OIDNDevice device); +``` + +Note that Open Image Denoise uses reference counting for all object +types, so this function decreases the reference count of the device, and +if the count reaches 0 the device will automatically get deleted. It is +also possible to increase the reference count by calling + +``` cpp +void oidnRetainDevice(OIDNDevice device); +``` + +An application typically creates only a single device. If required +differently, it should only use a small number of devices at any given +time. + +### Error Handling + +Each user thread has its own error code per device. If an error occurs +when calling an API function, this error code is set to the occurred +error if it stores no previous error. The currently stored error can be +queried by the application +via + +``` cpp +OIDNError oidnGetDeviceError(OIDNDevice device, const char** outMessage); +``` + +where `outMessage` can be a pointer to a C string which will be set to a +more descriptive error message, or it can be `NULL`. This function also +clears the error code, which assures that the returned error code is +always the first error occurred since the last invocation of +`oidnGetDeviceError` on the current thread. 
Note that the optionally +returned error message string is valid only until the next invocation of +the function. + +Alternatively, the application can also register a callback function of +type + +``` cpp +typedef void (*OIDNErrorFunction)(void* userPtr, OIDNError code, const char* message); +``` + +via + +``` cpp +void oidnSetDeviceErrorFunction(OIDNDevice device, OIDNErrorFunction func, void* userPtr); +``` + +to get notified when errors occur. Only a single callback function can +be registered per device, and further invocations overwrite the +previously set callback function, which do *not* require also calling +the `oidnCommitDevice` function. Passing `NULL` as function pointer +disables the registered callback function. When the registered callback +function is invoked, it gets passed the user-defined payload (`userPtr` +argument as specified at registration time), the error code (`code` +argument) of the occurred error, as well as a string (`message` +argument) that further describes the error. The error code is always set +even if an error callback function is registered. It is recommended to +always set a error callback function, to detect all errors. + +When the device construction fails, `oidnNewDevice` returns `NULL` as +device. To detect the error code of a such failed device construction, +pass `NULL` as device to the `oidnGetDeviceError` function. For all +other invocations of `oidnGetDeviceError`, a proper device handle must +be specified. + +The following errors are currently used by Open Image +Denoise: + +| Name | Description | +| :--------------------------------- | :----------------------------------------- | +| OIDN\_ERROR\_NONE | no error occurred | +| OIDN\_ERROR\_UNKNOWN | an unknown error occurred | +| OIDN\_ERROR\_INVALID\_ARGUMENT | an invalid argument was specified | +| OIDN\_ERROR\_INVALID\_OPERATION | the operation is not allowed | +| OIDN\_ERROR\_OUT\_OF\_MEMORY | not enough memory to execute the operation | +| OIDN\_ERROR\_UNSUPPORTED\_HARDWARE | the hardware (e.g., CPU) is not supported | +| OIDN\_ERROR\_CANCELLED | the operation was cancelled by the user | + +Possible error codes, i.e., valid constants of type `OIDNError`. + +## Buffer + +Large data like images can be passed to Open Image Denoise either via +pointers to memory allocated and managed by the user (this is the +recommended, often easier and more efficient approach, if supported by +the device) or by creating buffer objects (supported by all devices). To +create a new data buffer with memory allocated and owned by the device, +holding `byteSize` number of bytes, use + +``` cpp +OIDNBuffer oidnNewBuffer(OIDNDevice device, size_t byteSize); +``` + +The created buffer is bound to the specified device (`device` argument). +The specified number of bytes are allocated at buffer construction time +and deallocated when the buffer is destroyed. + +It is also possible to create a “shared” data buffer with memory +allocated and managed by the user +with + +``` cpp +OIDNBuffer oidnNewSharedBuffer(OIDNDevice device, void* ptr, size_t byteSize); +``` + +where `ptr` points to the user-managed memory and `byteSize` is its size +in bytes. At buffer construction time no buffer data is allocated, but +the buffer data provided by the user is used. The buffer data must +remain valid for as long as the buffer may be used, and the user is +responsible to free the buffer data when no longer required. 
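+
+As a minimal illustrative sketch (not part of the original documentation), wrapping an existing user-managed color image in a shared buffer could look like the following; the image size is an arbitrary example value:
+
+``` cpp
+#include <OpenImageDenoise/oidn.h>
+#include <vector>
+
+int main()
+{
+  OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
+  oidnCommitDevice(device);
+
+  // User-managed memory for a 640x480 float3 color image; it must stay valid
+  // for as long as the buffer may be used.
+  std::vector<float> color(640 * 480 * 3);
+  OIDNBuffer colorBuf =
+    oidnNewSharedBuffer(device, color.data(), color.size() * sizeof(float));
+
+  // ... bind colorBuf to a filter, execute it, etc. ...
+
+  oidnReleaseBuffer(colorBuf); // releases the handle; the vector is freed by the application
+  oidnReleaseDevice(device);
+  return 0;
+}
+```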
+ +Similar to device objects, buffer objects are also reference-counted and +can be retained and released by calling the following functions: + +``` cpp +void oidnRetainBuffer(OIDNBuffer buffer); +void oidnReleaseBuffer(OIDNBuffer buffer); +``` + +Accessing the data stored in a buffer object is possible by mapping it +into the address space of the application +using + +``` cpp +void* oidnMapBuffer(OIDNBuffer buffer, OIDNAccess access, size_t byteOffset, size_t byteSize) +``` + +where `access` is the desired access mode of the mapped memory, +`byteOffset` is the offset to the beginning of the mapped memory region +in bytes, and `byteSize` is the number of bytes to map. The function +returns a pointer to the mapped buffer data. If the specified `byteSize` +is 0, the maximum available amount of memory will be mapped. The +`access` argument must be one of the access modes in the following +table: + +| Name | Description | +| :--------------------------- | :------------------------------------------------------------ | +| OIDN\_ACCESS\_READ | read-only access | +| OIDN\_ACCESS\_WRITE | write-only access | +| OIDN\_ACCESS\_READ\_WRITE | read and write access | +| OIDN\_ACCESS\_WRITE\_DISCARD | write-only access but the previous contents will be discarded | + +Access modes for memory regions mapped with `oidnMapBuffer`, i.e., valid +constants of type `OIDNAccess`. + +After accessing the mapped data in the buffer, the memory region must be +unmapped with + +``` cpp +void oidnUnmapBuffer(OIDNBuffer buffer, void* mappedPtr); +``` + +where `mappedPtr` must be a pointer returned by a call to +`oidnMapBuffer` for the specified buffer. Any change to the mapped data +is guaranteed to take effect only after unmapping the memory region. + +### Data Format + +Buffers store opaque data and thus have no information about the type +and format of the data. Other objects, e.g. filters, typically require +specifying the format of the data stored in buffers or shared via +pointers. This can be done using the `OIDNFormat` enumeration +type: + +| Name | Description | +| :------------------------- | :-------------------------------------------- | +| OIDN\_FORMAT\_UNDEFINED | undefined format | +| OIDN\_FORMAT\_FLOAT | 32-bit single-precision floating point scalar | +| OIDN\_FORMAT\_FLOAT\[234\] | … and \[234\]-element vector | + +Supported data formats, i.e., valid constants of type `OIDNFormat`. + +## Filter + +Filters are the main objects in Open Image Denoise that are responsible +for the actual denoising. The library ships with a collection of filters +which are optimized for different types of images and use cases. To +create a filter object, call + +``` cpp +OIDNFilter oidnNewFilter(OIDNDevice device, const char* type); +``` + +where `type` is the name of the filter type to create. The supported +filter types are documented later in this section. Once created, filter +objects can be retained and released with + +``` cpp +void oidnRetainFilter(OIDNFilter filter); +void oidnReleaseFilter(OIDNFilter filter); +``` + +After creating a filter, it needs to be set up by specifying the input +and output image buffers, and potentially setting other parameter values +as well. 
+ +To bind image buffers to the filter, you can use one of the following +functions: + +``` cpp +void oidnSetFilterImage(OIDNFilter filter, const char* name, + OIDNBuffer buffer, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride); + +void oidnSetSharedFilterImage(OIDNFilter filter, const char* name, + void* ptr, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride); +``` + +It is possible to specify either a data buffer object (`buffer` +argument) with the `oidnSetFilterImage` function, or directly a pointer +to shared user-managed data (`ptr` argument) with the +`oidnSetSharedFilterImage` function. + +In both cases, you must also specify the name of the image parameter to +set (`name` argument, e.g. `"color"`, `"output"`), the pixel format +(`format` argument), the width and height of the image in number of +pixels (`width` and `height` arguments), the starting offset of the +image data (`byteOffset` argument), the pixel stride (`bytePixelStride` +argument) and the row stride (`byteRowStride` argument), in number of +bytes. Note that the row stride must be an integer multiple of the pixel +stride. + +If the pixels and/or rows are stored contiguously (tightly packed +without any gaps), you can set `bytePixelStride` and/or `byteRowStride` +to 0 to let the library compute the actual strides automatically, as a +convenience. + +Filters may have parameters other than buffers as well, which you can +set and get using the following functions: + +``` cpp +void oidnSetFilter1b(OIDNFilter filter, const char* name, bool value); +void oidnSetFilter1i(OIDNFilter filter, const char* name, int value); +bool oidnGetFilter1b(OIDNFilter filter, const char* name); +int oidnGetFilter1i(OIDNFilter filter, const char* name); +``` + +Filters support a progress monitor callback mechanism that can be used +to report progress of filter operations and to cancel them as well. +Calling `oidnSetFilterProgressMonitorFunction` registers a progress +monitor callback function (`func` argument) with payload (`userPtr` +argument) for the specified filter (`filter` argument): + +``` cpp +typedef bool (*OIDNProgressMonitorFunction)(void* userPtr, double n); + +void oidnSetFilterProgressMonitorFunction(OIDNFilter filter, + OIDNProgressMonitorFunction func, + void* userPtr); +``` + +Only a single callback function can be registered per filter, and +further invocations overwrite the previously set callback function. +Passing `NULL` as function pointer disables the registered callback +function. Once registered, Open Image Denoise will invoke the callback +function multiple times during filter operations, by passing the payload +as set at registration time (`userPtr` argument), and a `double` in the +range \[0, 1\] which estimates the progress of the operation (`n` +argument). When returning `true` from the callback function, Open Image +Denoise will continue the filter operation normally. When returning +`false`, the library will cancel the filter operation with the +`OIDN_ERROR_CANCELLED` error code. + +After setting all necessary parameters for the filter, the changes must +be commmitted by calling + +``` cpp +void oidnCommitFilter(OIDNFilter filter); +``` + +The parameters can be updated after committing the filter, but it must +be re-committed for the changes to take effect. 
+ +Finally, an image can be filtered by executing the filter with + +``` cpp +void oidnExecuteFilter(OIDNFilter filter); +``` + +which will read the input image data from the specified buffers and +produce the denoised output image. + +In the following we describe the different filters that are currently +implemented in Open Image Denoise. + +### RT + +The `RT` (**r**ay **t**racing) filter is a generic ray tracing denoising +filter which is suitable for denoising images rendered with Monte Carlo +ray tracing methods like unidirectional and bidirectional path tracing. +It supports depth of field and motion blur as well, but it is *not* +temporally stable. The filter is based on a deep learning based +denoising algorithm, and it aims to provide a good balance between +denoising performance and quality for a wide range of samples per pixel. + +It accepts either a low dynamic range (LDR) or high dynamic range (HDR) +color image as input. Optionally, it also accepts auxiliary *feature* +images, e.g. albedo and normal, which improve the denoising quality, +preserving more details in the image. + +The `RT` filter has certain limitations regarding the supported input +images. Most notably, it cannot denoise images that were not rendered +with ray tracing. Another important limitation is related to +anti-aliasing filters. Most renderers use a high-quality pixel +reconstruction filter instead of a trivial box filter to minimize +aliasing artifacts (e.g. Gaussian, Blackman-Harris). The `RT` filter +does support such pixel filters but only if implemented with importance +sampling. Weighted pixel sampling (sometimes called *splatting*) +introduces correlation between neighboring pixels, which causes the +denoising to fail (the noise will not be filtered), thus it is not +supported. + +The filter can be created by passing `"RT"` to the `oidnNewFilter` +function as the filter type. 
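Putting the pieces together, a hedged end-to-end sketch might look like the following. It reuses the hypothetical `setupFilter` helper from the previous snippet, and checks the device's error state with `oidnGetDeviceError` as described in the device section earlier.

``` cpp
// Hedged sketch only: setupFilter is the hypothetical helper from the previous
// snippet; device creation, commit, and error querying follow the device
// section earlier in this document.
#include <OpenImageDenoise/oidn.h>
#include <cstdio>

OIDNFilter setupFilter(OIDNDevice device, float* colorPixels, float* outputPixels,
                       size_t width, size_t height); // from the previous snippet

void denoise(float* colorPixels, float* outputPixels, size_t width, size_t height)
{
  OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
  oidnCommitDevice(device);

  OIDNFilter filter = setupFilter(device, colorPixels, outputPixels, width, height);
  oidnExecuteFilter(filter); // reads the inputs and writes the denoised output

  const char* message;
  if (oidnGetDeviceError(device, &message) != OIDN_ERROR_NONE)
    std::fprintf(stderr, "Open Image Denoise error: %s\n", message);

  oidnReleaseFilter(filter);
  oidnReleaseDevice(device);
}
```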
The filter supports the following +parameters: + +| Type | Format | Name | Default | Description | +| :-------- | :----- | :---------- | ------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Image | float3 | color | | input color image (LDR values in \[0, 1\] or HDR values in \[0, +∞)) | +| Image | float3 | albedo | | input feature image containing the albedo (values in \[0, 1\]) of the first hit per pixel; *optional* | +| Image | float3 | normal | | input feature image containing the shading normal (world-space or view-space, arbitrary length, values in (−∞, +∞)) of the first hit per pixel; *optional*, requires setting the albedo image too | +| Image | float3 | output | | output image; can be one of the input images | +| bool | | hdr | false | whether the color is HDR | +| bool | | srgb | false | whether the color is encoded with the sRGB (or 2.2 gamma) curve (LDR only) or is linear; the output will be encoded with the same curve | +| int | | maxMemoryMB | 6000 | approximate maximum amount of memory to use in megabytes (actual memory usage may be higher); limiting memory usage may cause slower denoising due to internally splitting the image into overlapping tiles, but cannot cause the denoising to fail | +| const int | | alignment | | when manually denoising the image in tiles, the tile size and offsets should be multiples of this amount of pixels to avoid artifacts; note that manual tiled denoising is supported *only* for LDR images | +| const int | | overlap | | when manually denoising the image in tiles, the tiles should overlap by this amount of pixels | + +Parameters supported by the `RT` filter. + +All specified images must have the same dimensions. + +![](https://openimagedenoise.github.io/images/mazda_512spp_color.jpg) +Example noisy color image rendered using unidirectional path tracing +(512 spp). *Scene by +Evermotion.* + +![](https://openimagedenoise.github.io/images/mazda_512spp_oidn.jpg) +Example output image denoised using color and auxiliary feature images +(albedo and +normal). + +Using auxiliary feature images like albedo and normal helps preserving +fine details and textures in the image thus can significantly improve +denoising quality. These images should typically contain feature values +for the first hit (i.e. the surface which is directly visible) per +pixel. This works well for most surfaces but does not provide any +benefits for reflections and objects visible through transparent +surfaces (compared to just using the color as input). However, in +certain cases this issue can be fixed by storing feature values for a +subsequent hit (i.e. the reflection and/or refraction) instead of the +first hit. For example, it usually works well to follow perfect specular +(*delta*) paths and store features for the first diffuse or glossy +surface hit instead (e.g. for perfect specular dielectrics and mirrors). +This can greatly improve the quality of reflections and transmission. We +will describe this approach in more detail in the following subsections. + +The auxiliary feature images should be as noise-free as possible. It is +not a strict requirement but too much noise in the feature images may +cause residual noise in the output. Also, all feature images should use +the same pixel reconstruction filter as the color image. 
Using a
+properly anti-aliased color image but aliased albedo or normal images
+will likely introduce artifacts around edges.
+
+#### Albedo
+
+The albedo image is the feature image that usually provides the biggest
+quality improvement. It should contain the approximate color of the
+surfaces independent of illumination and viewing angle.
+
+For simple matte surfaces this means using the diffuse color/texture as
+the albedo. For other, more complex surfaces it is not always obvious
+what is the best way to compute the albedo, but the denoising filter is
+flexible to a certain extent and works well with differently computed
+albedos. Thus it is not necessary to compute the strict, exact albedo
+values, but they must always be between 0 and 1.
+
+![](https://openimagedenoise.github.io/images/mazda_512spp_albedo_firsthit.jpg)
+Example albedo image obtained using the first hit. Note that the
+albedos of all transparent surfaces are
+1.
+
+![](https://openimagedenoise.github.io/images/mazda_512spp_albedo_nondeltahit.jpg)
+Example albedo image obtained using the first diffuse or glossy
+(non-delta) hit. Note that the albedos of perfect specular (delta)
+transparent surfaces are computed as the Fresnel blend of the reflected
+and transmitted
+albedos.
+
+For metallic surfaces the albedo should be either the reflectivity at
+normal incidence (e.g. from the artist friendly metallic Fresnel model)
+or the average reflectivity; or if these are constant (not textured) or
+unknown, the albedo can be simply 1 as well.
+
+The albedo for dielectric surfaces (e.g. glass) should be either 1 or,
+if the surface is perfect specular (i.e. has a delta BSDF), the Fresnel
+blend of the reflected and transmitted albedos (as previously
+discussed). The latter usually works better but *only* if it does not
+introduce too much additional noise due to random sampling. Thus we
+recommend splitting the path into a reflected and a transmitted path at
+the first hit, and perhaps falling back to an albedo of 1 for subsequent
+dielectric hits, to avoid noise. The reflected albedo in itself can be
+used for mirror-like surfaces as well.
+
+The albedo for layered surfaces can be computed as the weighted sum of
+the albedos of the individual layers. Non-absorbing clear coat layers
+can be simply ignored (or the albedo of the perfect specular reflection
+can be used as well) but absorption should be taken into account.
+
+#### Normal
+
+The normal image should contain the shading normals of the surfaces
+either in world-space or view-space. It is recommended to include normal
+maps to preserve as much detail as possible.
+
+Just like any other input image, the normal image should be anti-aliased
+(i.e. by accumulating the normalized normals per pixel). The final
+accumulated normals do not have to be normalized but must be in a range
+symmetric about 0 (i.e. normals mapped to \[0, 1\] are *not* acceptable
+and must be remapped to e.g. \[−1, 1\]).
+
+Similar to the albedo, the normal can be stored for either the first or
+a subsequent hit (if the first hit has a perfect specular/delta BSDF).
+
+![](https://openimagedenoise.github.io/images/mazda_512spp_normal_firsthit.jpg)
+Example normal image obtained using the first hit (the values are
+actually in \[−1, 1\] but were mapped to \[0, 1\] for illustration
+purposes).
+
+![](https://openimagedenoise.github.io/images/mazda_512spp_normal_nondeltahit.jpg)
+Example normal image obtained using the first diffuse or glossy
+(non-delta) hit.
Note that the normals of perfect specular (delta) +transparent surfaces are computed as the Fresnel blend of the reflected +and transmitted +normals. + +# Examples + +## Denoise + +A minimal working example demonstrating how to use Open Image Denoise +can be found at `examples/denoise.cpp`, which uses the C++11 convenience +wrappers of the C99 API. + +This example is a simple command-line application that denoises the +provided image, which can optionally have auxiliary feature images as +well (e.g. albedo and normal). The images must be stored in the +[Portable FloatMap](http://www.pauldebevec.com/Research/HDR/PFM/) (PFM) +format, and the color values must be encoded in little-endian format. + +Running `./denoise` without any arguments will bring up a list of +command line options. diff --git a/oidn/cmake/install.cmake b/oidn/cmake/install.cmake new file mode 100644 index 0000000..fb018da --- /dev/null +++ b/oidn/cmake/install.cmake @@ -0,0 +1,94 @@ +## ======================================================================== ## +## Copyright 2009-2019 Intel Corporation ## +## ## +## Licensed under the Apache License, Version 2.0 (the "License"); ## +## you may not use this file except in compliance with the License. ## +## You may obtain a copy of the License at ## +## ## +## http://www.apache.org/licenses/LICENSE-2.0 ## +## ## +## Unless required by applicable law or agreed to in writing, software ## +## distributed under the License is distributed on an "AS IS" BASIS, ## +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## +## See the License for the specific language governing permissions and ## +## limitations under the License. ## +## ======================================================================== ## + +## ---------------------------------------------------------------------------- +## Install library +## ---------------------------------------------------------------------------- + +install(TARGETS ${PROJECT_NAME} + EXPORT + ${PROJECT_NAME}_Export + ARCHIVE + DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT devel + LIBRARY + DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT devel + # On Windows put the dlls into bin + RUNTIME + DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT lib +) + +if(OIDN_STATIC_LIB) + install(TARGETS common mkldnn + EXPORT + ${PROJECT_NAME}_Export + ARCHIVE + DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT devel + ) +endif() + +## ---------------------------------------------------------------------------- +## Install headers +## ---------------------------------------------------------------------------- + +install(DIRECTORY include/OpenImageDenoise + DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} + COMPONENT devel + PATTERN "*.in" EXCLUDE +) + +## ---------------------------------------------------------------------------- +## Install documentation +## ---------------------------------------------------------------------------- + +install(FILES ${PROJECT_SOURCE_DIR}/LICENSE.txt DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT lib) +install(FILES ${PROJECT_SOURCE_DIR}/CHANGELOG.md DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT lib) +install(FILES ${PROJECT_SOURCE_DIR}/README.md DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT lib) +install(FILES ${PROJECT_SOURCE_DIR}/readme.pdf DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT lib) + +## ---------------------------------------------------------------------------- +## Install dependencies +## ---------------------------------------------------------------------------- + +# Install TBB +if(OIDN_ZIP_MODE) 
+ if(WIN32) + install(PROGRAMS ${TBB_BIN_DIR}/tbb.dll ${TBB_BIN_DIR}/tbbmalloc.dll DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT lib) + install(PROGRAMS ${TBB_LIBRARY} ${TBB_LIBRARY_MALLOC} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT lib) + elseif(APPLE) + install(PROGRAMS ${TBB_ROOT}/lib/libtbb.dylib ${TBB_ROOT}/lib/libtbbmalloc.dylib DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT lib) + else() + install(PROGRAMS ${TBB_ROOT}/lib/intel64/gcc4.4/libtbb.so.2 ${TBB_ROOT}/lib/intel64/gcc4.4/libtbbmalloc.so.2 DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT lib) + endif() +endif() + +## ---------------------------------------------------------------------------- +## Install CMake configuration files +## ---------------------------------------------------------------------------- + +install(EXPORT ${PROJECT_NAME}_Export + DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/${PROJECT_NAME} + #NAMESPACE ${PROJECT_NAME}:: + FILE ${PROJECT_NAME}Config.cmake + COMPONENT devel +) + +include(CMakePackageConfigHelpers) +write_basic_package_version_file(${PROJECT_NAME}ConfigVersion.cmake + COMPATIBILITY SameMajorVersion) +install(FILES ${CMAKE_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake + DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/${PROJECT_NAME} + COMPONENT devel +) diff --git a/oidn/cmake/package.cmake b/oidn/cmake/package.cmake new file mode 100644 index 0000000..ded24ba --- /dev/null +++ b/oidn/cmake/package.cmake @@ -0,0 +1,106 @@ +## ======================================================================== ## +## Copyright 2009-2019 Intel Corporation ## +## ## +## Licensed under the Apache License, Version 2.0 (the "License"); ## +## you may not use this file except in compliance with the License. ## +## You may obtain a copy of the License at ## +## ## +## http://www.apache.org/licenses/LICENSE-2.0 ## +## ## +## Unless required by applicable law or agreed to in writing, software ## +## distributed under the License is distributed on an "AS IS" BASIS, ## +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## +## See the License for the specific language governing permissions and ## +## limitations under the License. 
## +## ======================================================================== ## + +option(OIDN_ZIP_MODE off) +mark_as_advanced(OIDN_ZIP_MODE) + +## ---------------------------------------------------------------------------- +## Set install directories +## ---------------------------------------------------------------------------- + +include(GNUInstallDirs) + +if(OIDN_ZIP_MODE) + set(CMAKE_INSTALL_BINDIR bin) + set(CMAKE_INSTALL_LIBDIR lib) + set(CMAKE_INSTALL_DOCDIR doc) +endif() + +## ---------------------------------------------------------------------------- +## Set rpath +## ---------------------------------------------------------------------------- + +if(OIDN_ZIP_MODE) + # In tgz / zip let's have relative rpath + set(CMAKE_SKIP_INSTALL_RPATH OFF) + if(APPLE) + set(CMAKE_MACOSX_RPATH ON) + set(CMAKE_INSTALL_RPATH "@executable_path/" "@executable_path/../lib") + else() + set(CMAKE_INSTALL_RPATH "\$ORIGIN:\$ORIGIN/../lib") + endif() +else() + set(CMAKE_INSTALL_NAME_DIR ${CMAKE_INSTALL_FULL_LIBDIR}) + if(APPLE) + # Use rpath on macOS + set(CMAKE_SKIP_INSTALL_RPATH OFF) + else() + # We do not want any rpath for installed binaries + set(CMAKE_SKIP_INSTALL_RPATH ON) + endif() +endif() + +## ---------------------------------------------------------------------------- +## Configure CPack +## ---------------------------------------------------------------------------- + +set(CPACK_PACKAGE_NAME ${PROJECT_NAME}) +set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "Open Image Denoise library") +set(CPACK_PACKAGE_VENDOR "Intel Corporation") +set(CPACK_PACKAGE_INSTALL_DIRECTORY ${CPACK_PACKAGE_NAME}) +set(CPACK_PACKAGE_VERSION ${PROJECT_VERSION}) +set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR}) +set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR}) +set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH}) +set(CPACK_PACKAGE_FILE_NAME oidn-${CPACK_PACKAGE_VERSION}) +set(CPACK_VERBATIM_VARIABLES YES) + +if(WIN32) + + # Windows specific settings + if(CMAKE_SIZEOF_VOID_P EQUAL 8) + set(ARCH x64) + set(CPACK_PACKAGE_NAME "${CPACK_PACKAGE_NAME} x64") + else() + set(ARCH win32) + set(CPACK_PACKAGE_NAME "${CPACK_PACKAGE_NAME} Win32") + endif() + + if(MSVC12) + set(VCVER vc12) + elseif(MSVC14) # also for VC15, which is toolset v141 + set(VCVER vc14) + endif() + + set(CPACK_GENERATOR ZIP) + set(CPACK_PACKAGE_FILE_NAME "${CPACK_PACKAGE_FILE_NAME}.${ARCH}.${VCVER}.windows") + set(CPACK_MONOLITHIC_INSTALL 1) + +elseif(APPLE) + + # macOS specific settings + set(CPACK_GENERATOR TGZ) + set(CPACK_PACKAGE_FILE_NAME "${CPACK_PACKAGE_FILE_NAME}.x86_64.macos") + set(CPACK_MONOLITHIC_INSTALL 1) + +else() + + # Linux specific settings + set(CPACK_GENERATOR TGZ) + set(CPACK_PACKAGE_FILE_NAME "${CPACK_PACKAGE_FILE_NAME}.x86_64.linux") + set(CPACK_MONOLITHIC_INSTALL 1) + +endif() \ No newline at end of file diff --git a/oidn/cmake/resource.cmake b/oidn/cmake/resource.cmake new file mode 100644 index 0000000..6a4ac46 --- /dev/null +++ b/oidn/cmake/resource.cmake @@ -0,0 +1,39 @@ +## ======================================================================== ## +## Copyright 2009-2019 Intel Corporation ## +## ## +## Licensed under the Apache License, Version 2.0 (the "License"); ## +## you may not use this file except in compliance with the License. 
## +## You may obtain a copy of the License at ## +## ## +## http://www.apache.org/licenses/LICENSE-2.0 ## +## ## +## Unless required by applicable law or agreed to in writing, software ## +## distributed under the License is distributed on an "AS IS" BASIS, ## +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## +## See the License for the specific language governing permissions and ## +## limitations under the License. ## +## ======================================================================== ## + +# Generates C++ files from the specified binary resource files +find_package(PythonInterp REQUIRED) +function(generate_cpp_resources out_sources namespace) + set(${out_sources}) + foreach(in_file ${ARGN}) + get_filename_component(in_file_we ${in_file} NAME_WE) + get_filename_component(in_dir ${in_file} PATH) + get_filename_component(in_path ${in_file} ABSOLUTE) + set(out_dir ${CMAKE_CURRENT_BINARY_DIR}/${in_dir}) + set(out_path ${out_dir}/${in_file_we}.cpp) + list(APPEND ${out_sources} ${out_path}) + add_custom_command( + OUTPUT ${out_path} + COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir} + COMMAND ${PYTHON_EXECUTABLE} + ARGS ${PROJECT_SOURCE_DIR}/scripts/resource_to_cpp.py ${in_path} -o ${out_path} -n ${namespace} + DEPENDS ${in_path} + COMMENT "Generating CXX resource object ${out_path}" + VERBATIM) + endforeach() + set_source_files_properties(${${out_sources}} PROPERTIES GENERATED TRUE) + set(${out_sources} ${${out_sources}} PARENT_SCOPE) +endfunction() diff --git a/oidn/common/CMakeLists.txt b/oidn/common/CMakeLists.txt new file mode 100644 index 0000000..955f30c --- /dev/null +++ b/oidn/common/CMakeLists.txt @@ -0,0 +1,47 @@ +## ======================================================================== ## +## Copyright 2009-2019 Intel Corporation ## +## ## +## Licensed under the Apache License, Version 2.0 (the "License"); ## +## you may not use this file except in compliance with the License. ## +## You may obtain a copy of the License at ## +## ## +## http://www.apache.org/licenses/LICENSE-2.0 ## +## ## +## Unless required by applicable law or agreed to in writing, software ## +## distributed under the License is distributed on an "AS IS" BASIS, ## +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## +## See the License for the specific language governing permissions and ## +## limitations under the License. ## +## ======================================================================== ## + +add_library(common STATIC + platform.h + platform.cpp + exception.h + ref.h + barrier.h + timer.h + thread.h + thread.cpp + tasking.h + tasking.cpp + tensor.h + tensor.cpp +) + +if(OIDN_STATIC_LIB) + target_compile_definitions(common PUBLIC -DOIDN_STATIC_LIB) +endif() + +target_include_directories(common + PUBLIC + $ +) + +target_link_libraries(common PUBLIC ${TBB_LIBRARIES}) +if(UNIX) + target_link_libraries(common PUBLIC pthread) + if(NOT(APPLE)) + target_link_libraries(common PUBLIC rt) + endif() +endif() diff --git a/oidn/common/barrier.h b/oidn/common/barrier.h new file mode 100644 index 0000000..b20f670 --- /dev/null +++ b/oidn/common/barrier.h @@ -0,0 +1,52 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. 
// +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "platform.h" +#include +#include + +namespace oidn { + + class Barrier + { + private: + std::mutex m; + std::condition_variable cv; + volatile int count; + + public: + Barrier(int count) : count(count) {} + + void wait() + { + std::unique_lock lk(m); + count--; + + if (count == 0) + { + lk.unlock(); + cv.notify_all(); + } + else + { + cv.wait(lk, [&]{ return count == 0; }); + } + } + }; + +} // namespace oidn diff --git a/oidn/common/exception.h b/oidn/common/exception.h new file mode 100644 index 0000000..18069c6 --- /dev/null +++ b/oidn/common/exception.h @@ -0,0 +1,45 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include +#include "platform.h" + +namespace oidn { + + class Exception : public std::exception + { + private: + Error error; + const char* message; + + public: + Exception(Error error, const char* message) + : error(error), message(message) {} + + Error code() const noexcept + { + return error; + } + + const char* what() const noexcept override + { + return message; + } + }; + +} // namespace oidn diff --git a/oidn/common/platform.cpp b/oidn/common/platform.cpp new file mode 100644 index 0000000..59a14ff --- /dev/null +++ b/oidn/common/platform.cpp @@ -0,0 +1,114 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#include "platform.h" + +namespace oidn { + + // ---------------------------------------------------------------------------- + // Common functions + // ---------------------------------------------------------------------------- + + void* alignedMalloc(size_t size, size_t alignment) + { + if (size == 0) + return nullptr; + + assert((alignment & (alignment-1)) == 0); + void* ptr = _mm_malloc(size, alignment); + + if (ptr == nullptr) + throw std::bad_alloc(); + + return ptr; + } + + void alignedFree(void* ptr) + { + if (ptr) + _mm_free(ptr); + } + + // ---------------------------------------------------------------------------- + // System information + // ---------------------------------------------------------------------------- + + std::string getPlatformName() + { + std::string name; + + #if defined(__linux__) + name = "Linux"; + #elif defined(__FreeBSD__) + name = "FreeBSD"; + #elif defined(__CYGWIN__) + name = "Cygwin"; + #elif defined(_WIN32) + name = "Windows"; + #elif defined(__APPLE__) + name = "macOS"; + #elif defined(__unix__) + name = "Unix"; + #else + return "Unknown"; + #endif + + #if defined(__x86_64__) || defined(_M_X64) || defined(__ia64__) || defined(__aarch64__) + name += " (64-bit)"; + #else + name += " (32-bit)"; + #endif + + return name; + } + + std::string getCompilerName() + { + #if defined(__INTEL_COMPILER) + int mayor = __INTEL_COMPILER / 100 % 100; + int minor = __INTEL_COMPILER % 100; + std::string version = "Intel Compiler "; + version += toString(mayor); + version += "." + toString(minor); + #if defined(__INTEL_COMPILER_UPDATE) + version += "." + toString(__INTEL_COMPILER_UPDATE); + #endif + return version; + #elif defined(__clang__) + return "Clang " __clang_version__; + #elif defined(__GNUC__) + return "GCC " __VERSION__; + #elif defined(_MSC_VER) + std::string version = toString(_MSC_FULL_VER); + version.insert(4, "."); + version.insert(9, "."); + version.insert(2, "."); + return "Visual C++ Compiler " + version; + #else + return "Unknown"; + #endif + } + + std::string getBuildName() + { + #if defined(NDEBUG) + return "Release"; + #else + return "Debug"; + #endif + } + +} // namespace oidn diff --git a/oidn/common/platform.h b/oidn/common/platform.h new file mode 100644 index 0000000..a9dd871 --- /dev/null +++ b/oidn/common/platform.h @@ -0,0 +1,131 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#if defined(_WIN32) + #define WIN32_LEAN_AND_MEAN + #define NOMINMAX + #include +#elif defined(__APPLE__) + #include +#endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "include/OpenImageDenoise/oidn.hpp" + +namespace oidn { + + // ---------------------------------------------------------------------------- + // Macros + // ---------------------------------------------------------------------------- + + #if defined(_WIN32) + // Windows + #if !defined(__noinline) + #define __noinline __declspec(noinline) + #endif + #else + // Unix + #if !defined(__forceinline) + #define __forceinline inline __attribute__((always_inline)) + #endif + #if !defined(__noinline) + #define __noinline __attribute__((noinline)) + #endif + #endif + + #ifndef UNUSED + #define UNUSED(x) ((void)x) + #endif + #ifndef MAYBE_UNUSED + #define MAYBE_UNUSED(x) UNUSED(x) + #endif + + // ---------------------------------------------------------------------------- + // Error handling and debugging + // ---------------------------------------------------------------------------- + + struct Verbose + { + int verbose; + + Verbose(int v = 0) : verbose(v) {} + __forceinline bool isVerbose(int v = 1) const { return v <= verbose; } + }; + + #define OIDN_WARNING(message) { if (isVerbose()) std::cerr << "Warning: " << message << std::endl; } + #define OIDN_FATAL(message) throw std::runtime_error(message); + + // ---------------------------------------------------------------------------- + // Common functions + // ---------------------------------------------------------------------------- + + using std::min; + using std::max; + + template + __forceinline T clamp(const T& value, const T& minValue, const T& maxValue) + { + return min(max(value, minValue), maxValue); + } + + void* alignedMalloc(size_t size, size_t alignment); + void alignedFree(void* ptr); + + template + inline std::string toString(const T& a) + { + std::stringstream sm; + sm << a; + return sm.str(); + } + +#if defined(__APPLE__) + template + bool getSysctl(const char* name, T& value) + { + int64_t result = 0; + size_t size = sizeof(result); + + if (sysctlbyname(name, &result, &size, nullptr, 0) != 0) + return false; + + value = T(result); + return true; + } +#endif + + // ---------------------------------------------------------------------------- + // System information + // ---------------------------------------------------------------------------- + + std::string getPlatformName(); + std::string getCompilerName(); + std::string getBuildName(); + +} // namespace oidn + diff --git a/oidn/common/ref.h b/oidn/common/ref.h new file mode 100644 index 0000000..8b2b2de --- /dev/null +++ b/oidn/common/ref.h @@ -0,0 +1,163 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
// +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "platform.h" + +namespace oidn { + + class RefCount + { + private: + std::atomic count; + + public: + __forceinline RefCount(int count = 0) noexcept : count(count) {} + + __forceinline size_t incRef() noexcept + { + return count.fetch_add(1) + 1; + } + + __forceinline size_t decRef() + { + const size_t newCount = decRefKeep(); + if (newCount == 0) + destroy(); + return newCount; + } + + __forceinline size_t decRefKeep() noexcept + { + return count.fetch_add(-1) - 1; + } + + __forceinline void destroy() + { + delete this; + } + + protected: + // Disable copying + RefCount(const RefCount&) = delete; + RefCount& operator =(const RefCount&) = delete; + + virtual ~RefCount() noexcept = default; + }; + + template + class Ref + { + private: + T* ptr; + + public: + __forceinline Ref() noexcept : ptr(nullptr) {} + __forceinline Ref(std::nullptr_t) noexcept : ptr(nullptr) {} + __forceinline Ref(const Ref& other) noexcept : ptr(other.ptr) { if (ptr) ptr->incRef(); } + __forceinline Ref(Ref&& other) noexcept : ptr(other.ptr) { other.ptr = nullptr; } + __forceinline Ref(T* ptr) noexcept : ptr(ptr) { if (ptr) ptr->incRef(); } + + template + __forceinline Ref(const Ref& other) noexcept : ptr(other.get()) { if (ptr) ptr->incRef(); } + + template + __forceinline explicit Ref(Y* ptr) noexcept : ptr(ptr) { if (ptr) ptr->incRef(); } + + __forceinline ~Ref() { if (ptr) ptr->decRef(); } + + __forceinline Ref& operator =(const Ref& other) + { + if (other.ptr) + other.ptr->incRef(); + if (ptr) + ptr->decRef(); + ptr = other.ptr; + return *this; + } + + __forceinline Ref& operator =(Ref&& other) + { + if (ptr) + ptr->decRef(); + ptr = other.ptr; + other.ptr = nullptr; + return *this; + } + + __forceinline Ref& operator =(T* other) + { + if (other) + other->incRef(); + if (ptr) + ptr->decRef(); + ptr = other; + return *this; + } + + __forceinline Ref& operator =(std::nullptr_t) + { + if (ptr) + ptr->decRef(); + ptr = nullptr; + return *this; + } + + __forceinline operator bool() const noexcept { return ptr != nullptr; } + + __forceinline T& operator *() const noexcept { return *ptr; } + __forceinline T* operator ->() const noexcept { return ptr; } + + __forceinline T* get() const noexcept { return ptr; } + + __forceinline T* detach() noexcept + { + T* res = ptr; + ptr = nullptr; + return res; + } + }; + + template __forceinline bool operator < (const Ref& a, const Ref& b) noexcept { return a.ptr < b.ptr; } + + template __forceinline bool operator ==(const Ref& a, std::nullptr_t) noexcept { return a.ptr == nullptr; } + template __forceinline bool operator ==(std::nullptr_t, const Ref& b) noexcept { return nullptr == b.ptr; } + template __forceinline bool operator ==(const Ref& a, const Ref& b) noexcept { return a.ptr == b.ptr; } + + template __forceinline bool operator !=(const Ref& a, std::nullptr_t) noexcept { return a.ptr != nullptr; } + template __forceinline bool operator !=(std::nullptr_t, const Ref& b) noexcept { return nullptr != b.ptr; } + template __forceinline bool operator !=(const Ref& a, const Ref& b) noexcept { return a.ptr != b.ptr; } + + template + __forceinline Ref makeRef(Args&&... 
args) + { + return Ref(new T(std::forward(args)...)); + } + + template + __forceinline Ref staticRefCast(const Ref& a) + { + return Ref(static_cast(a.get())); + } + + template + __forceinline Ref dynamicRefCast(const Ref& a) + { + return Ref(dynamic_cast(a.get())); + } + +} // namespace oidn diff --git a/oidn/common/tasking.cpp b/oidn/common/tasking.cpp new file mode 100644 index 0000000..e30c179 --- /dev/null +++ b/oidn/common/tasking.cpp @@ -0,0 +1,55 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include "tasking.h" + +namespace oidn { + + // -------------------------------------------------------------------------- + // PinningObserver + // -------------------------------------------------------------------------- + + PinningObserver::PinningObserver(const std::shared_ptr& affinity) + : affinity(affinity) + { + observe(true); + } + + PinningObserver::PinningObserver(const std::shared_ptr& affinity, tbb::task_arena& arena) + : tbb::task_scheduler_observer(arena), + affinity(affinity) + { + observe(true); + } + + PinningObserver::~PinningObserver() + { + observe(false); + } + + void PinningObserver::on_scheduler_entry(bool isWorker) + { + const int threadIndex = tbb::this_task_arena::current_thread_index(); + affinity->set(threadIndex); + } + + void PinningObserver::on_scheduler_exit(bool isWorker) + { + const int threadIndex = tbb::this_task_arena::current_thread_index(); + affinity->restore(threadIndex); + } + +} // namespace oidn diff --git a/oidn/common/tasking.h b/oidn/common/tasking.h new file mode 100644 index 0000000..3856bee --- /dev/null +++ b/oidn/common/tasking.h @@ -0,0 +1,50 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#include "thread.h" + +#define TBB_PREVIEW_LOCAL_OBSERVER 1 +#include "tbb/task_scheduler_init.h" +#include "tbb/task_scheduler_observer.h" +#include "tbb/task_arena.h" +#include "tbb/parallel_for.h" +#include "tbb/parallel_reduce.h" +#include "tbb/blocked_range.h" +#include "tbb/blocked_range2d.h" + +namespace oidn { + + // -------------------------------------------------------------------------- + // PinningObserver + // -------------------------------------------------------------------------- + + class PinningObserver : public tbb::task_scheduler_observer + { + private: + std::shared_ptr affinity; + + public: + explicit PinningObserver(const std::shared_ptr& affinity); + PinningObserver(const std::shared_ptr& affinity, tbb::task_arena& arena); + ~PinningObserver(); + + void on_scheduler_entry(bool isWorker) override; + void on_scheduler_exit(bool isWorker) override; + }; + +} // namespace oidn diff --git a/oidn/common/tensor.cpp b/oidn/common/tensor.cpp new file mode 100644 index 0000000..0249f2e --- /dev/null +++ b/oidn/common/tensor.cpp @@ -0,0 +1,83 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#include "exception.h" +#include "tensor.h" + +namespace oidn { + + std::map parseTensors(void* buffer) + { + char* input = (char*)buffer; + + // Parse the magic value + const int magic = *(unsigned short*)input; + if (magic != 0x41D7) + throw Exception(Error::InvalidOperation, "invalid tensor archive"); + input += sizeof(unsigned short); + + // Parse the version + const int majorVersion = *(unsigned char*)input++; + const int minorVersion = *(unsigned char*)input++; + UNUSED(minorVersion); + if (majorVersion > 1) + throw Exception(Error::InvalidOperation, "unsupported tensor archive version"); + + // Parse the number of tensors + const int numTensors = *(int*)input; + input += sizeof(int); + + // Parse the tensors + std::map tensorMap; + for (int i = 0; i < numTensors; ++i) + { + Tensor tensor; + + // Parse the name + const int nameLen = *(unsigned char*)input++; + std::string name(input, nameLen); + input += nameLen; + + // Parse the number of dimensions + const int ndims = *(unsigned char*)input++; + + // Parse the shape of the tensor + tensor.dims.resize(ndims); + for (int i = 0; i < ndims; ++i) + tensor.dims[i] = ((int*)input)[i]; + input += ndims * sizeof(int); + + // Parse the format of the tensor + tensor.format = std::string(input, input + ndims); + input += ndims; + + // Parse the data type of the tensor + const char type = *(unsigned char*)input++; + if (type != 'f') // only float32 is supported + throw Exception(Error::InvalidOperation, "unsupported tensor data type"); + + // Skip the data + tensor.data = (float*)input; + input += tensor.size() * sizeof(float); + + // Add the tensor to the map + tensorMap.emplace(name, std::move(tensor)); + } + + return tensorMap; + } + +} // namespace oidn diff --git a/oidn/common/tensor.h b/oidn/common/tensor.h new file mode 100644 index 0000000..48e7d11 --- /dev/null +++ b/oidn/common/tensor.h @@ -0,0 +1,66 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#include "platform.h" +#include +#include + +namespace oidn { + + template + using shared_vector = std::shared_ptr>; + + // Generic tensor + struct Tensor + { + float* data; + std::vector dims; + std::string format; + shared_vector buffer; // optional, only for reference counting + + __forceinline Tensor() : data(nullptr) {} + + __forceinline Tensor(const std::vector& dims, const std::string& format) + : dims(dims), + format(format) + { + buffer = std::make_shared>(size() * sizeof(float)); + data = (float*)buffer->data(); + } + + __forceinline operator bool() const { return data != nullptr; } + + __forceinline int ndims() const { return (int)dims.size(); } + + // Returns the number of values + __forceinline size_t size() const + { + size_t size = 1; + for (int i = 0; i < ndims(); ++i) + size *= dims[i]; + return size; + } + + __forceinline float& operator [](size_t i) { return data[i]; } + __forceinline const float& operator [](size_t i) const { return data[i]; } + }; + + // Parses tensors from a buffer + std::map parseTensors(void* buffer); + +} // namespace oidn diff --git a/oidn/common/thread.cpp b/oidn/common/thread.cpp new file mode 100644 index 0000000..48c489c --- /dev/null +++ b/oidn/common/thread.cpp @@ -0,0 +1,297 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#if defined(_MSC_VER) + #pragma warning (disable : 4146) // unary minus operator applied to unsigned type, result still unsigned +#endif + +#if defined(__APPLE__) + #include + #include +#endif + +#include "thread.h" +#include + +namespace oidn { + +#if defined(_WIN32) + + // -------------------------------------------------------------------------- + // ThreadAffinity - Windows + // -------------------------------------------------------------------------- + + ThreadAffinity::ThreadAffinity(int numThreadsPerCore, int verbose) + : Verbose(verbose) + { + HMODULE hLib = GetModuleHandle(TEXT("kernel32")); + pGetLogicalProcessorInformationEx = (GetLogicalProcessorInformationExFunc)GetProcAddress(hLib, "GetLogicalProcessorInformationEx"); + pSetThreadGroupAffinity = (SetThreadGroupAffinityFunc)GetProcAddress(hLib, "SetThreadGroupAffinity"); + + if (pGetLogicalProcessorInformationEx && pSetThreadGroupAffinity) + { + // Get logical processor information + PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX buffer = nullptr; + DWORD bufferSize = 0; + + // First call the function with an empty buffer to get the required buffer size + BOOL result = pGetLogicalProcessorInformationEx(RelationProcessorCore, buffer, &bufferSize); + if (result || GetLastError() != ERROR_INSUFFICIENT_BUFFER) + { + OIDN_WARNING("GetLogicalProcessorInformationEx failed"); + return; + } + + // Allocate the buffer + buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)malloc(bufferSize); + if (!buffer) + { + OIDN_WARNING("SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX allocation failed"); + return; + } + + // Call again the function but now with the properly sized buffer + result = pGetLogicalProcessorInformationEx(RelationProcessorCore, buffer, &bufferSize); + if (!result) + { + OIDN_WARNING("GetLogicalProcessorInformationEx failed"); + free(buffer); + return; + } + + // Iterate over the logical processor information structures + // There should be one structure for each physical core + char* ptr = (char*)buffer; + while (ptr < (char*)buffer + bufferSize) + { + PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX item = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)ptr; + if (item->Relationship == RelationProcessorCore && item->Processor.GroupCount > 0) + { + // Iterate over the groups + int numThreads = 0; + for (int group = 0; (group < item->Processor.GroupCount) && (numThreads < numThreadsPerCore); ++group) + { + GROUP_AFFINITY coreAffinity = item->Processor.GroupMask[group]; + while ((coreAffinity.Mask != 0) && (numThreads < numThreadsPerCore)) + { + // Extract the next set bit/thread from the mask + GROUP_AFFINITY threadAffinity = coreAffinity; + threadAffinity.Mask = threadAffinity.Mask & -threadAffinity.Mask; + + // Push the affinity for this thread + affinities.push_back(threadAffinity); + oldAffinities.push_back(threadAffinity); + numThreads++; + + // Remove this bit/thread from the mask + coreAffinity.Mask ^= threadAffinity.Mask; + } + } + } + + // Next structure + ptr += item->Size; + } + + // Free the buffer + free(buffer); + } + } + + void ThreadAffinity::set(int threadIndex) + { + if (threadIndex >= (int)affinities.size()) + return; + + // Save the current affinity and set the new one + const HANDLE thread = GetCurrentThread(); + if (!pSetThreadGroupAffinity(thread, &affinities[threadIndex], &oldAffinities[threadIndex])) + OIDN_WARNING("SetThreadGroupAffinity failed"); + } + + void ThreadAffinity::restore(int threadIndex) + { + if (threadIndex >= 
(int)affinities.size()) + return; + + // Restore the original affinity + const HANDLE thread = GetCurrentThread(); + if (!pSetThreadGroupAffinity(thread, &oldAffinities[threadIndex], nullptr)) + OIDN_WARNING("SetThreadGroupAffinity failed"); + } + +#elif defined(__linux__) + + // -------------------------------------------------------------------------- + // ThreadAffinity - Linux + // -------------------------------------------------------------------------- + + ThreadAffinity::ThreadAffinity(int numThreadsPerCore, int verbose) + : Verbose(verbose) + { + std::vector threadIds; + + // Parse the thread/CPU topology + for (int cpuId = 0; ; cpuId++) + { + std::fstream fs; + std::string cpu = std::string("/sys/devices/system/cpu/cpu") + std::to_string(cpuId) + std::string("/topology/thread_siblings_list"); + fs.open(cpu.c_str(), std::fstream::in); + if (fs.fail()) break; + + int i; + int j = 0; + while ((j < numThreadsPerCore) && (fs >> i)) + { + if (std::none_of(threadIds.begin(), threadIds.end(), [&](int id) { return id == i; })) + threadIds.push_back(i); + + if (fs.peek() == ',') + fs.ignore(); + j++; + } + + fs.close(); + } + + #if 0 + for (size_t i = 0; i < thread_ids.size(); ++i) + std::cout << "thread " << i << " -> " << thread_ids[i] << std::endl; + #endif + + // Create the affinity structures + affinities.resize(threadIds.size()); + oldAffinities.resize(threadIds.size()); + + for (size_t i = 0; i < threadIds.size(); ++i) + { + cpu_set_t affinity; + CPU_ZERO(&affinity); + CPU_SET(threadIds[i], &affinity); + + affinities[i] = affinity; + oldAffinities[i] = affinity; + } + } + + void ThreadAffinity::set(int threadIndex) + { + if (threadIndex >= (int)affinities.size()) + return; + + const pthread_t thread = pthread_self(); + + // Save the current affinity + if (pthread_getaffinity_np(thread, sizeof(cpu_set_t), &oldAffinities[threadIndex]) != 0) + { + OIDN_WARNING("pthread_getaffinity_np failed"); + oldAffinities[threadIndex] = affinities[threadIndex]; + return; + } + + // Set the new affinity + if (pthread_setaffinity_np(thread, sizeof(cpu_set_t), &affinities[threadIndex]) != 0) + OIDN_WARNING("pthread_setaffinity_np failed"); + } + + void ThreadAffinity::restore(int threadIndex) + { + if (threadIndex >= (int)affinities.size()) + return; + + const pthread_t thread = pthread_self(); + + // Restore the original affinity + if (pthread_setaffinity_np(thread, sizeof(cpu_set_t), &oldAffinities[threadIndex]) != 0) + OIDN_WARNING("pthread_setaffinity_np failed"); + } + +#elif defined(__APPLE__) + + // -------------------------------------------------------------------------- + // ThreadAffinity - macOS + // -------------------------------------------------------------------------- + + ThreadAffinity::ThreadAffinity(int numThreadsPerCore, int verbose) + : Verbose(verbose) + { + // Query the thread/CPU topology + int numPhysicalCpus; + int numLogicalCpus; + + if (!getSysctl("hw.physicalcpu", numPhysicalCpus) || !getSysctl("hw.logicalcpu", numLogicalCpus)) + { + OIDN_WARNING("sysctlbyname failed"); + return; + } + + if ((numLogicalCpus % numPhysicalCpus != 0) && (numThreadsPerCore > 1)) + return; // this shouldn't happen + const int maxThreadsPerCore = numLogicalCpus / numPhysicalCpus; + + // Create the affinity structures + // macOS doesn't support binding a thread to a specific core, but we can at least group threads which + // should be on the same core together + for (int core = 1; core <= numPhysicalCpus; ++core) // tags start from 1! 
+ { + thread_affinity_policy affinity; + affinity.affinity_tag = core; + + for (int thread = 0; thread < min(numThreadsPerCore, maxThreadsPerCore); ++thread) + { + affinities.push_back(affinity); + oldAffinities.push_back(affinity); + } + } + } + + void ThreadAffinity::set(int threadIndex) + { + if (threadIndex >= (int)affinities.size()) + return; + + const auto thread = mach_thread_self(); + + // Save the current affinity + mach_msg_type_number_t policyCount = THREAD_AFFINITY_POLICY_COUNT; + boolean_t getDefault = FALSE; + if (thread_policy_get(thread, THREAD_AFFINITY_POLICY, (thread_policy_t)&oldAffinities[threadIndex], &policyCount, &getDefault) != KERN_SUCCESS) + { + OIDN_WARNING("thread_policy_get failed"); + oldAffinities[threadIndex] = affinities[threadIndex]; + return; + } + + // Set the new affinity + if (thread_policy_set(thread, THREAD_AFFINITY_POLICY, (thread_policy_t)&affinities[threadIndex], THREAD_AFFINITY_POLICY_COUNT) != KERN_SUCCESS) + OIDN_WARNING("thread_policy_set failed"); + } + + void ThreadAffinity::restore(int threadIndex) + { + if (threadIndex >= (int)affinities.size()) + return; + + const auto thread = mach_thread_self(); + + // Restore the original affinity + if (thread_policy_set(thread, THREAD_AFFINITY_POLICY, (thread_policy_t)&oldAffinities[threadIndex], THREAD_AFFINITY_POLICY_COUNT) != KERN_SUCCESS) + OIDN_WARNING("thread_policy_set failed"); + } + +#endif + +} // namespace oidn diff --git a/oidn/common/thread.h b/oidn/common/thread.h new file mode 100644 index 0000000..2c73136 --- /dev/null +++ b/oidn/common/thread.h @@ -0,0 +1,202 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#include "platform.h" + +#if !defined(_WIN32) + #include + #include + #if defined(__APPLE__) + #include + #endif +#endif + +#include +#include + +namespace oidn { + + // -------------------------------------------------------------------------- + // ThreadLocal + // -------------------------------------------------------------------------- + + // Wrapper which makes any variable thread-local + template + class ThreadLocal : public Verbose + { + private: + #if defined(_WIN32) + DWORD key; + #else + pthread_key_t key; + #endif + + std::vector instances; + std::mutex mutex; + + public: + ThreadLocal(int verbose = 0) + : Verbose(verbose) + { + #if defined(_WIN32) + key = TlsAlloc(); + if (key == TLS_OUT_OF_INDEXES) + OIDN_FATAL("TlsAlloc failed"); + #else + if (pthread_key_create(&key, nullptr) != 0) + OIDN_FATAL("pthread_key_create failed"); + #endif + } + + ~ThreadLocal() + { + std::lock_guard lock(mutex); + for (T* ptr : instances) + delete ptr; + + #if defined(_WIN32) + if (!TlsFree(key)) + OIDN_WARNING("TlsFree failed"); + #else + if (pthread_key_delete(key) != 0) + OIDN_WARNING("pthread_key_delete failed"); + #endif + } + + T& get() + { + #if defined(_WIN32) + T* ptr = (T*)TlsGetValue(key); + #else + T* ptr = (T*)pthread_getspecific(key); + #endif + + if (ptr) + return *ptr; + + ptr = new T; + std::lock_guard lock(mutex); + instances.push_back(ptr); + + #if defined(_WIN32) + if (!TlsSetValue(key, ptr)) + OIDN_FATAL("TlsSetValue failed"); + #else + if (pthread_setspecific(key, ptr) != 0) + OIDN_FATAL("pthread_setspecific failed"); + #endif + + return *ptr; + } + }; + +#if defined(_WIN32) + + // -------------------------------------------------------------------------- + // ThreadAffinity - Windows + // -------------------------------------------------------------------------- + + class ThreadAffinity : public Verbose + { + private: + typedef BOOL (WINAPI *GetLogicalProcessorInformationExFunc)(LOGICAL_PROCESSOR_RELATIONSHIP, + PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX, + PDWORD); + + typedef BOOL (WINAPI *SetThreadGroupAffinityFunc)(HANDLE, + CONST GROUP_AFFINITY*, + PGROUP_AFFINITY); + + GetLogicalProcessorInformationExFunc pGetLogicalProcessorInformationEx = nullptr; + SetThreadGroupAffinityFunc pSetThreadGroupAffinity = nullptr; + + std::vector affinities; // thread affinities + std::vector oldAffinities; // original thread affinities + + public: + ThreadAffinity(int numThreadsPerCore = INT_MAX, int verbose = 0); + + int getNumThreads() const + { + return (int)affinities.size(); + } + + // Sets the affinity (0..numThreads-1) of the thread after saving the current affinity + void set(int threadIndex); + + // Restores the affinity of the thread + void restore(int threadIndex); + }; + +#elif defined(__linux__) + + // -------------------------------------------------------------------------- + // ThreadAffinity - Linux + // -------------------------------------------------------------------------- + + class ThreadAffinity : public Verbose + { + private: + std::vector affinities; // thread affinities + std::vector oldAffinities; // original thread affinities + + public: + ThreadAffinity(int numThreadsPerCore = INT_MAX, int verbose = 0); + + int getNumThreads() const + { + return (int)affinities.size(); + } + + // Sets the affinity (0..numThreads-1) of the thread after saving the current affinity + void set(int threadIndex); + + // Restores the affinity of the thread + void restore(int 
threadIndex); + }; + +#elif defined(__APPLE__) + + // -------------------------------------------------------------------------- + // ThreadAffinity - macOS + // -------------------------------------------------------------------------- + + class ThreadAffinity : public Verbose + { + private: + std::vector affinities; // thread affinities + std::vector oldAffinities; // original thread affinities + + public: + ThreadAffinity(int numThreadsPerCore = INT_MAX, int verbose = 0); + + int getNumThreads() const + { + return (int)affinities.size(); + } + + // Sets the affinity (0..numThreads-1) of the thread after saving the current affinity + void set(int threadIndex); + + // Restores the affinity of the thread + void restore(int threadIndex); + }; + +#endif + +} // namespace oidn diff --git a/oidn/common/timer.h b/oidn/common/timer.h new file mode 100644 index 0000000..62aaaa1 --- /dev/null +++ b/oidn/common/timer.h @@ -0,0 +1,49 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "platform.h" +#include + +namespace oidn { + + class Timer + { + private: + using clock = std::chrono::high_resolution_clock; + + std::chrono::time_point start; + + public: + Timer() + { + reset(); + } + + void reset() + { + start = clock::now(); + } + + double query() const + { + auto end = clock::now(); + return std::chrono::duration_cast>(end - start).count(); + } + }; + +} // namespace oidn diff --git a/oidn/core/api.cpp b/oidn/core/api.cpp new file mode 100644 index 0000000..48b3b8e --- /dev/null +++ b/oidn/core/api.cpp @@ -0,0 +1,387 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#ifdef _WIN32 +# define OIDN_API extern "C" __declspec(dllexport) +#else +# define OIDN_API extern "C" __attribute__ ((visibility ("default"))) +#endif + +// Locks the device that owns the specified object +// Use *only* inside OIDN_TRY/CATCH! 
+#define OIDN_LOCK(obj) \ + std::lock_guard lock(obj->getDevice()->getMutex()); + +// Try/catch for converting exceptions to errors +#define OIDN_TRY \ + try { + +#define OIDN_CATCH(obj) \ + } catch (Exception& e) { \ + Device::setError(obj ? obj->getDevice() : nullptr, e.code(), e.what()); \ + } catch (std::bad_alloc&) { \ + Device::setError(obj ? obj->getDevice() : nullptr, Error::OutOfMemory, "out of memory"); \ + } catch (mkldnn::error& e) { \ + if (e.status == mkldnn_out_of_memory) \ + Device::setError(obj ? obj->getDevice() : nullptr, Error::OutOfMemory, "out of memory"); \ + else \ + Device::setError(obj ? obj->getDevice() : nullptr, Error::Unknown, e.message); \ + } catch (std::exception& e) { \ + Device::setError(obj ? obj->getDevice() : nullptr, Error::Unknown, e.what()); \ + } catch (...) { \ + Device::setError(obj ? obj->getDevice() : nullptr, Error::Unknown, "unknown exception caught"); \ + } + +#include "device.h" +#include "filter.h" +#include + +namespace oidn { + + namespace + { + __forceinline void checkHandle(void* handle) + { + if (handle == nullptr) + throw Exception(Error::InvalidArgument, "invalid handle"); + } + + template + __forceinline void retainObject(T* obj) + { + if (obj) + { + obj->incRef(); + } + else + { + OIDN_TRY + checkHandle(obj); + OIDN_CATCH(obj) + } + } + + template + __forceinline void releaseObject(T* obj) + { + if (obj == nullptr || obj->decRefKeep() == 0) + { + OIDN_TRY + checkHandle(obj); + OIDN_LOCK(obj); + obj->destroy(); + OIDN_CATCH(obj) + } + } + + template<> + __forceinline void releaseObject(Device* obj) + { + if (obj == nullptr || obj->decRefKeep() == 0) + { + OIDN_TRY + checkHandle(obj); + // Do NOT lock the device because it owns the mutex + obj->destroy(); + OIDN_CATCH(obj) + } + } + } + + OIDN_API OIDNDevice oidnNewDevice(OIDNDeviceType type) + { + Ref device = nullptr; + OIDN_TRY + if (type == OIDN_DEVICE_TYPE_CPU || type == OIDN_DEVICE_TYPE_DEFAULT) + device = makeRef(); + else + throw Exception(Error::InvalidArgument, "invalid device type"); + OIDN_CATCH(device) + return (OIDNDevice)device.detach(); + } + + OIDN_API void oidnRetainDevice(OIDNDevice hDevice) + { + Device* device = (Device*)hDevice; + retainObject(device); + } + + OIDN_API void oidnReleaseDevice(OIDNDevice hDevice) + { + Device* device = (Device*)hDevice; + releaseObject(device); + } + + OIDN_API void oidnSetDevice1b(OIDNDevice hDevice, const char* name, bool value) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + device->set1i(name, value); + OIDN_CATCH(device) + } + + OIDN_API void oidnSetDevice1i(OIDNDevice hDevice, const char* name, int value) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + device->set1i(name, value); + OIDN_CATCH(device) + } + + OIDN_API bool oidnGetDevice1b(OIDNDevice hDevice, const char* name) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + return device->get1i(name); + OIDN_CATCH(device) + return false; + } + + OIDN_API int oidnGetDevice1i(OIDNDevice hDevice, const char* name) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + return device->get1i(name); + OIDN_CATCH(device) + return 0; + } + + OIDN_API void oidnSetDeviceErrorFunction(OIDNDevice hDevice, OIDNErrorFunction func, void* userPtr) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + device->setErrorFunction((ErrorFunction)func, userPtr); 
+ OIDN_CATCH(device) + } + + OIDN_API OIDNError oidnGetDeviceError(OIDNDevice hDevice, const char** outMessage) + { + Device* device = (Device*)hDevice; + OIDN_TRY + return (OIDNError)Device::getError(device, outMessage); + OIDN_CATCH(device) + if (outMessage) *outMessage = ""; + return OIDN_ERROR_UNKNOWN; + } + + OIDN_API void oidnCommitDevice(OIDNDevice hDevice) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + device->commit(); + OIDN_CATCH(device) + } + + OIDN_API OIDNBuffer oidnNewBuffer(OIDNDevice hDevice, size_t byteSize) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + Ref buffer = device->newBuffer(byteSize); + return (OIDNBuffer)buffer.detach(); + OIDN_CATCH(device) + return nullptr; + } + + OIDN_API OIDNBuffer oidnNewSharedBuffer(OIDNDevice hDevice, void* ptr, size_t byteSize) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + Ref buffer = device->newBuffer(ptr, byteSize); + return (OIDNBuffer)buffer.detach(); + OIDN_CATCH(device) + return nullptr; + } + + OIDN_API void oidnRetainBuffer(OIDNBuffer hBuffer) + { + Buffer* buffer = (Buffer*)hBuffer; + retainObject(buffer); + } + + OIDN_API void oidnReleaseBuffer(OIDNBuffer hBuffer) + { + Buffer* buffer = (Buffer*)hBuffer; + releaseObject(buffer); + } + + OIDN_API void* oidnMapBuffer(OIDNBuffer hBuffer, OIDNAccess access, size_t byteOffset, size_t byteSize) + { + Buffer* buffer = (Buffer*)hBuffer; + OIDN_TRY + checkHandle(hBuffer); + OIDN_LOCK(buffer); + return buffer->map(byteOffset, byteSize); + OIDN_CATCH(buffer) + return nullptr; + } + + OIDN_API void oidnUnmapBuffer(OIDNBuffer hBuffer, void* mappedPtr) + { + Buffer* buffer = (Buffer*)hBuffer; + OIDN_TRY + checkHandle(hBuffer); + OIDN_LOCK(buffer); + return buffer->unmap(mappedPtr); + OIDN_CATCH(buffer) + } + + OIDN_API OIDNFilter oidnNewFilter(OIDNDevice hDevice, const char* type) + { + Device* device = (Device*)hDevice; + OIDN_TRY + checkHandle(hDevice); + OIDN_LOCK(device); + Ref filter = device->newFilter(type); + return (OIDNFilter)filter.detach(); + OIDN_CATCH(device) + return nullptr; + } + + OIDN_API void oidnRetainFilter(OIDNFilter hFilter) + { + Filter* filter = (Filter*)hFilter; + retainObject(filter); + } + + OIDN_API void oidnReleaseFilter(OIDNFilter hFilter) + { + Filter* filter = (Filter*)hFilter; + releaseObject(filter); + } + + OIDN_API void oidnSetFilterImage(OIDNFilter hFilter, const char* name, + OIDNBuffer hBuffer, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + checkHandle(hBuffer); + OIDN_LOCK(filter); + Ref buffer = (Buffer*)hBuffer; + if (buffer->getDevice() != filter->getDevice()) + throw Exception(Error::InvalidArgument, "the specified objects are bound to different devices"); + Image data(buffer, (Format)format, (int)width, (int)height, byteOffset, bytePixelStride, byteRowStride); + filter->setImage(name, data); + OIDN_CATCH(filter) + } + + OIDN_API void oidnSetSharedFilterImage(OIDNFilter hFilter, const char* name, + void* ptr, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + Image data(ptr, (Format)format, (int)width, (int)height, byteOffset, bytePixelStride, byteRowStride); + 
filter->setImage(name, data); + OIDN_CATCH(filter) + } + + OIDN_API void oidnSetFilter1b(OIDNFilter hFilter, const char* name, bool value) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + filter->set1i(name, int(value)); + OIDN_CATCH(filter) + } + + OIDN_API void oidnSetFilter1i(OIDNFilter hFilter, const char* name, int value) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + filter->set1i(name, value); + OIDN_CATCH(filter) + } + + OIDN_API bool oidnGetFilter1b(OIDNFilter hFilter, const char* name) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + return filter->get1i(name); + OIDN_CATCH(filter) + return false; + } + + OIDN_API int oidnGetFilter1i(OIDNFilter hFilter, const char* name) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + return filter->get1i(name); + OIDN_CATCH(filter) + return 0; + } + + OIDN_API void oidnSetFilterProgressMonitorFunction(OIDNFilter hFilter, OIDNProgressMonitorFunction func, void* userPtr) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + filter->setProgressMonitorFunction(func, userPtr); + OIDN_CATCH(filter) + } + + OIDN_API void oidnCommitFilter(OIDNFilter hFilter) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + filter->commit(); + OIDN_CATCH(filter) + } + + OIDN_API void oidnExecuteFilter(OIDNFilter hFilter) + { + Filter* filter = (Filter*)hFilter; + OIDN_TRY + checkHandle(hFilter); + OIDN_LOCK(filter); + filter->execute(); + OIDN_CATCH(filter) + } + +} // namespace oidn diff --git a/oidn/core/autoencoder.cpp b/oidn/core/autoencoder.cpp new file mode 100644 index 0000000..a53d009 --- /dev/null +++ b/oidn/core/autoencoder.cpp @@ -0,0 +1,484 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
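The C API above (`oidn/core/api.cpp`) is the surface the path tracer calls into. Roughly, the call sequence looks like the sketch below; the function and variable names (`denoiseImage`, `colorPtr`, `outputPtr`) are illustrative placeholders, not the project's actual code, and albedo/normal images can be attached the same way with `oidnSetSharedFilterImage`:

```
// Minimal usage sketch (hypothetical names): denoise a float3 color buffer.
// Error handling goes through oidnGetDeviceError, defined above.
#include <OpenImageDenoise/oidn.h>
#include <cstdio>

void denoiseImage(float* colorPtr, float* outputPtr, int width, int height)
{
  OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
  oidnCommitDevice(device);

  OIDNFilter filter = oidnNewFilter(device, "RT");  // generic ray tracing denoiser (RTFilter)
  oidnSetSharedFilterImage(filter, "color",  colorPtr,  OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
  oidnSetSharedFilterImage(filter, "output", outputPtr, OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
  oidnSetFilter1b(filter, "hdr", true);             // raw radiance values, not gamma-corrected
  oidnCommitFilter(filter);
  oidnExecuteFilter(filter);

  const char* errorMessage;
  if (oidnGetDeviceError(device, &errorMessage) != OIDN_ERROR_NONE)
    std::printf("OIDN error: %s\n", errorMessage);

  oidnReleaseFilter(filter);
  oidnReleaseDevice(device);
}
```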
// +// ======================================================================== // + +#include "autoencoder.h" + +namespace oidn { + + // -------------------------------------------------------------------------- + // AutoencoderFilter + // -------------------------------------------------------------------------- + + AutoencoderFilter::AutoencoderFilter(const Ref& device) + : Filter(device) + { + } + + void AutoencoderFilter::setImage(const std::string& name, const Image& data) + { + if (name == "color") + color = data; + else if (name == "albedo") + albedo = data; + else if (name == "normal") + normal = data; + else if (name == "output") + output = data; + + dirty = true; + } + + void AutoencoderFilter::set1i(const std::string& name, int value) + { + if (name == "hdr") + hdr = value; + else if (name == "srgb") + srgb = value; + else if (name == "maxMemoryMB") + maxMemoryMB = value; + + dirty = true; + } + + int AutoencoderFilter::get1i(const std::string& name) + { + if (name == "hdr") + return hdr; + else if (name == "srgb") + return srgb; + else if (name == "maxMemoryMB") + return maxMemoryMB; + else if (name == "alignment") + return alignment; + else if (name == "overlap") + return overlap; + else + throw Exception(Error::InvalidArgument, "invalid parameter"); + } + + void AutoencoderFilter::commit() + { + if (!dirty) + return; + + device->executeTask([&]() + { + if (mayiuse(avx512_common)) + net = buildNet<16>(); + else + net = buildNet<8>(); + }); + + dirty = false; + } + + void AutoencoderFilter::execute() + { + if (dirty) + throw Exception(Error::InvalidOperation, "changes to the filter are not committed"); + + if (!net) + return; + + device->executeTask([&]() + { + Progress progress; + progress.func = progressFunc; + progress.userPtr = progressUserPtr; + progress.taskCount = tileCountH * tileCountW; + + // Iterate over the tiles + int tileIndex = 0; + + for (int i = 0; i < tileCountH; ++i) + { + const int h = i * (tileH - 2*overlap); // input tile position (including overlap) + const int overlapBeginH = i > 0 ? overlap : 0; // overlap on the top + const int overlapEndH = i < tileCountH-1 ? overlap : 0; // overlap on the bottom + const int tileH1 = min(H - h, tileH); // input tile size (including overlap) + const int tileH2 = tileH1 - overlapBeginH - overlapEndH; // output tile size + const int alignOffsetH = tileH - roundUp(tileH1, alignment); // align to the bottom in the tile buffer + + for (int j = 0; j < tileCountW; ++j) + { + const int w = j * (tileW - 2*overlap); // input tile position (including overlap) + const int overlapBeginW = j > 0 ? overlap : 0; // overlap on the left + const int overlapEndW = j < tileCountW-1 ? 
overlap : 0; // overlap on the right + const int tileW1 = min(W - w, tileW); // input tile size (including overlap) + const int tileW2 = tileW1 - overlapBeginW - overlapEndW; // output tile size + const int alignOffsetW = tileW - roundUp(tileW1, alignment); // align to the right in the tile buffer + + // Set the input tile + inputReorder->setTile(h, w, + alignOffsetH, alignOffsetW, + tileH1, tileW1); + + // Set the output tile + outputReorder->setTile(alignOffsetH + overlapBeginH, alignOffsetW + overlapBeginW, + h + overlapBeginH, w + overlapBeginW, + tileH2, tileW2); + + //printf("Tile: %d %d -> %d %d\n", w+overlapBeginW, h+overlapBeginH, w+overlapBeginW+tileW2, h+overlapBeginH+tileH2); + + // Denoise the tile + net->execute(progress, tileIndex); + + // Next tile + tileIndex++; + } + } + }); + } + + void AutoencoderFilter::computeTileSize() + { + const int minTileSize = 3*overlap; + const int estimatedBytesPerPixel = mayiuse(avx512_common) ? estimatedBytesPerPixel16 : estimatedBytesPerPixel8; + const int64_t maxTilePixels = (int64_t(maxMemoryMB)*1024*1024 - estimatedBytesBase) / estimatedBytesPerPixel; + + tileCountH = 1; + tileCountW = 1; + tileH = roundUp(H, alignment); + tileW = roundUp(W, alignment); + + // Divide the image into tiles until the tile size gets below the threshold + while (int64_t(tileH) * tileW > maxTilePixels) + { + if (tileH > minTileSize && tileH > tileW) + { + tileCountH++; + tileH = max(roundUp(ceilDiv(H - 2*overlap, tileCountH), alignment) + 2*overlap, minTileSize); + } + else if (tileW > minTileSize) + { + tileCountW++; + tileW = max(roundUp(ceilDiv(W - 2*overlap, tileCountW), alignment) + 2*overlap, minTileSize); + } + else + break; + } + + // Compute the final number of tiles + tileCountH = (H > tileH) ? ceilDiv(H - 2*overlap, tileH - 2*overlap) : 1; + tileCountW = (W > tileW) ? ceilDiv(W - 2*overlap, tileW - 2*overlap) : 1; + + if (device->isVerbose(2)) + { + std::cout << "Tile size : " << tileW << "x" << tileH << std::endl; + std::cout << "Tile count: " << tileCountW << "x" << tileCountH << std::endl; + } + } + + template + std::shared_ptr AutoencoderFilter::buildNet() + { + H = color.height; + W = color.width; + + // Configure the network + int inputC; + void* weightPtr; + + if (srgb && hdr) + throw Exception(Error::InvalidOperation, "srgb and hdr modes cannot be enabled at the same time"); + + if (color && !albedo && !normal && weightData.hdr) + { + inputC = 3; + weightPtr = hdr ? weightData.hdr : weightData.ldr; + } + else if (color && albedo && !normal && weightData.hdr_alb) + { + inputC = 6; + weightPtr = hdr ? weightData.hdr_alb : weightData.ldr_alb; + } + else if (color && albedo && normal && weightData.hdr_alb_nrm) + { + inputC = 9; + weightPtr = hdr ? 
weightData.hdr_alb_nrm : weightData.ldr_alb_nrm; + } + else + { + throw Exception(Error::InvalidOperation, "unsupported combination of input features"); + } + + if (!output) + throw Exception(Error::InvalidOperation, "output image not specified"); + + if ((color.format != Format::Float3) + || (albedo && albedo.format != Format::Float3) + || (normal && normal.format != Format::Float3) + || (output.format != Format::Float3)) + throw Exception(Error::InvalidOperation, "unsupported image format"); + + if ((albedo && (albedo.width != W || albedo.height != H)) + || (normal && (normal.width != W || normal.height != H)) + || (output.width != W || output.height != H)) + throw Exception(Error::InvalidOperation, "image size mismatch"); + + // Compute the tile size + computeTileSize(); + + // If the image size is zero, there is nothing else to do + if (H <= 0 || W <= 0) + return nullptr; + + // Parse the weights + const auto weightMap = parseTensors(weightPtr); + + // Create the network + std::shared_ptr> net = std::make_shared>(device, weightMap); + + // Compute the tensor sizes + const auto inputDims = memory::dims({1, inputC, tileH, tileW}); + const auto inputReorderDims = net->getInputReorderDims(inputDims, alignment); //-> concat0 + + const auto conv1Dims = net->getConvDims("conv1", inputReorderDims); //-> temp0 + const auto conv1bDims = net->getConvDims("conv1b", conv1Dims); //-> temp1 + const auto pool1Dims = net->getPoolDims(conv1bDims); //-> concat1 + const auto conv2Dims = net->getConvDims("conv2", pool1Dims); //-> temp0 + const auto pool2Dims = net->getPoolDims(conv2Dims); //-> concat2 + const auto conv3Dims = net->getConvDims("conv3", pool2Dims); //-> temp0 + const auto pool3Dims = net->getPoolDims(conv3Dims); //-> concat3 + const auto conv4Dims = net->getConvDims("conv4", pool3Dims); //-> temp0 + const auto pool4Dims = net->getPoolDims(conv4Dims); //-> concat4 + const auto conv5Dims = net->getConvDims("conv5", pool4Dims); //-> temp0 + const auto pool5Dims = net->getPoolDims(conv5Dims); //-> temp1 + const auto upsample4Dims = net->getUpsampleDims(pool5Dims); //-> concat4 + const auto concat4Dims = net->getConcatDims(upsample4Dims, pool4Dims); + const auto conv6Dims = net->getConvDims("conv6", concat4Dims); //-> temp0 + const auto conv6bDims = net->getConvDims("conv6b", conv6Dims); //-> temp1 + const auto upsample3Dims = net->getUpsampleDims(conv6bDims); //-> concat3 + const auto concat3Dims = net->getConcatDims(upsample3Dims, pool3Dims); + const auto conv7Dims = net->getConvDims("conv7", concat3Dims); //-> temp0 + const auto conv7bDims = net->getConvDims("conv7b", conv7Dims); //-> temp1 + const auto upsample2Dims = net->getUpsampleDims(conv7bDims); //-> concat2 + const auto concat2Dims = net->getConcatDims(upsample2Dims, pool2Dims); + const auto conv8Dims = net->getConvDims("conv8", concat2Dims); //-> temp0 + const auto conv8bDims = net->getConvDims("conv8b", conv8Dims); //-> temp1 + const auto upsample1Dims = net->getUpsampleDims(conv8bDims); //-> concat1 + const auto concat1Dims = net->getConcatDims(upsample1Dims, pool1Dims); + const auto conv9Dims = net->getConvDims("conv9", concat1Dims); //-> temp0 + const auto conv9bDims = net->getConvDims("conv9b", conv9Dims); //-> temp1 + const auto upsample0Dims = net->getUpsampleDims(conv9bDims); //-> concat0 + const auto concat0Dims = net->getConcatDims(upsample0Dims, inputReorderDims); + const auto conv10Dims = net->getConvDims("conv10", concat0Dims); //-> temp0 + const auto conv10bDims = net->getConvDims("conv10b", conv10Dims); //-> temp1 + 
const auto conv11Dims = net->getConvDims("conv11", conv10bDims); //-> temp0 + + const auto outputDims = memory::dims({1, 3, tileH, tileW}); + + // Allocate two temporary ping-pong buffers to decrease memory usage + const auto temp0Dims = getMaxTensorDims({ + conv1Dims, + conv2Dims, + conv3Dims, + conv4Dims, + conv5Dims, + conv6Dims, + conv7Dims, + conv8Dims, + conv9Dims, + conv10Dims, + conv11Dims + }); + + const auto temp1Dims = getMaxTensorDims({ + conv1bDims, + pool5Dims, + conv6bDims, + conv7bDims, + conv8bDims, + conv9bDims, + conv10bDims, + }); + + auto temp0 = net->allocTensor(temp0Dims); + auto temp1 = net->allocTensor(temp1Dims); + + // Allocate enough memory to hold the concat outputs. Then use the first + // half to hold the previous conv output and the second half to hold the + // pool/orig image output. This works because everything is C dimension + // outermost, padded to K floats, and all the concats are on the C dimension. + auto concat0Dst = net->allocTensor(concat0Dims); + auto concat1Dst = net->allocTensor(concat1Dims); + auto concat2Dst = net->allocTensor(concat2Dims); + auto concat3Dst = net->allocTensor(concat3Dims); + auto concat4Dst = net->allocTensor(concat4Dims); + + // Input reorder + auto inputReorderDst = net->castTensor(inputReorderDims, concat0Dst, upsample0Dims); + if (srgb) + { + transferFunc = std::make_shared(); + inputReorder = net->addInputReorder(color, albedo, normal, + std::static_pointer_cast(transferFunc), + alignment, inputReorderDst); + } + else if (hdr) + { + transferFunc = std::make_shared(); + + net->addAutoexposure(color, + std::static_pointer_cast(transferFunc)); + + inputReorder = net->addInputReorder(color, albedo, normal, + std::static_pointer_cast(transferFunc), + alignment, inputReorderDst); + } + else + { + transferFunc = std::make_shared(); + inputReorder = net->addInputReorder(color, albedo, normal, + std::static_pointer_cast(transferFunc), + alignment, inputReorderDst); + } + + // conv1 + auto conv1 = net->addConv("conv1", inputReorder->getDst(), temp0); + + // conv1b + auto conv1b = net->addConv("conv1b", conv1->getDst(), temp1); + + // pool1 + // Adjust pointer for pool1 to eliminate concat1 + auto pool1Dst = net->castTensor(pool1Dims, concat1Dst, upsample1Dims); + auto pool1 = net->addPool(conv1b->getDst(), pool1Dst); + + // conv2 + auto conv2 = net->addConv("conv2", pool1->getDst(), temp0); + + // pool2 + // Adjust pointer for pool2 to eliminate concat2 + auto pool2Dst = net->castTensor(pool2Dims, concat2Dst, upsample2Dims); + auto pool2 = net->addPool(conv2->getDst(), pool2Dst); + + // conv3 + auto conv3 = net->addConv("conv3", pool2->getDst(), temp0); + + // pool3 + // Adjust pointer for pool3 to eliminate concat3 + auto pool3Dst = net->castTensor(pool3Dims, concat3Dst, upsample3Dims); + auto pool3 = net->addPool(conv3->getDst(), pool3Dst); + + // conv4 + auto conv4 = net->addConv("conv4", pool3->getDst(), temp0); + + // pool4 + // Adjust pointer for pool4 to eliminate concat4 + auto pool4Dst = net->castTensor(pool4Dims, concat4Dst, upsample4Dims); + auto pool4 = net->addPool(conv4->getDst(), pool4Dst); + + // conv5 + auto conv5 = net->addConv("conv5", pool4->getDst(), temp0); + + // pool5 + auto pool5 = net->addPool(conv5->getDst(), temp1); + + // upsample4 + auto upsample4Dst = net->castTensor(upsample4Dims, concat4Dst); + auto upsample4 = net->addUpsample(pool5->getDst(), upsample4Dst); + + // conv6 + auto conv6 = net->addConv("conv6", concat4Dst, temp0); + + // conv6b + auto conv6b = net->addConv("conv6b", conv6->getDst(), 
temp1); + + // upsample3 + auto upsample3Dst = net->castTensor(upsample3Dims, concat3Dst); + auto upsample3 = net->addUpsample(conv6b->getDst(), upsample3Dst); + + // conv7 + auto conv7 = net->addConv("conv7", concat3Dst, temp0); + + // conv7b + auto conv7b = net->addConv("conv7b", conv7->getDst(), temp1); + + // upsample2 + auto upsample2Dst = net->castTensor(upsample2Dims, concat2Dst); + auto upsample2 = net->addUpsample(conv7b->getDst(), upsample2Dst); + + // conv8 + auto conv8 = net->addConv("conv8", concat2Dst, temp0); + + // conv8b + auto conv8b = net->addConv("conv8b", conv8->getDst(), temp1); + + // upsample1 + auto upsample1Dst = net->castTensor(upsample1Dims, concat1Dst); + auto upsample1 = net->addUpsample(conv8b->getDst(), upsample1Dst); + + // conv9 + auto conv9 = net->addConv("conv9", concat1Dst, temp0); + + // conv9b + auto conv9b = net->addConv("conv9b", conv9->getDst(), temp1); + + // upsample0 + auto upsample0Dst = net->castTensor(upsample0Dims, concat0Dst); + auto upsample0 = net->addUpsample(conv9b->getDst(), upsample0Dst); + + // conv10 + auto conv10 = net->addConv("conv10", concat0Dst, temp0); + + // conv10b + auto conv10b = net->addConv("conv10b", conv10->getDst(), temp1); + + // conv11 + auto conv11 = net->addConv("conv11", conv10b->getDst(), temp0, false /* no relu */); + + // Output reorder + if (srgb) + outputReorder = net->addOutputReorder(conv11->getDst(), std::static_pointer_cast(transferFunc), output); + else if (hdr) + outputReorder = net->addOutputReorder(conv11->getDst(), std::static_pointer_cast(transferFunc), output); + else + outputReorder = net->addOutputReorder(conv11->getDst(), std::static_pointer_cast(transferFunc), output); + + net->finalize(); + return net; + } + + // -------------------------------------------------------------------------- + // RTFilter + // -------------------------------------------------------------------------- + + namespace weights + { + // LDR + extern unsigned char rt_ldr[]; // color + extern unsigned char rt_ldr_alb[]; // color, albedo + extern unsigned char rt_ldr_alb_nrm[]; // color, albedo, normal + + // HDR + extern unsigned char rt_hdr[]; // color + extern unsigned char rt_hdr_alb[]; // color, albedo + extern unsigned char rt_hdr_alb_nrm[]; // color, albedo, normal + } + + RTFilter::RTFilter(const Ref& device) + : AutoencoderFilter(device) + { + weightData.ldr = weights::rt_ldr; + weightData.ldr_alb = weights::rt_ldr_alb; + weightData.ldr_alb_nrm = weights::rt_ldr_alb_nrm; + weightData.hdr = weights::rt_hdr; + weightData.hdr_alb = weights::rt_hdr_alb; + weightData.hdr_alb_nrm = weights::rt_hdr_alb_nrm; + } + +} // namespace oidn diff --git a/oidn/core/autoencoder.h b/oidn/core/autoencoder.h new file mode 100644 index 0000000..ae5f585 --- /dev/null +++ b/oidn/core/autoencoder.h @@ -0,0 +1,100 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#include "filter.h" +#include "network.h" +#include "transfer_function.h" + +namespace oidn { + + // -------------------------------------------------------------------------- + // AutoencoderFilter - Direct-predicting autoencoder + // -------------------------------------------------------------------------- + + class AutoencoderFilter : public Filter + { + private: + static constexpr int alignment = 32; // required spatial alignment in pixels (padding may be necessary) + static constexpr int receptiveField = 222; // receptive field in pixels + static constexpr int overlap = roundUp(receptiveField / 2, alignment); // required spatial overlap between tiles in pixels + + static constexpr int estimatedBytesBase = 16*1024*1024; // estimated base memory usage + static constexpr int estimatedBytesPerPixel8 = 889; // estimated memory usage per pixel for K=8 + static constexpr int estimatedBytesPerPixel16 = 2185; // estimated memory usage per pixel for K=16 + + Image color; + Image albedo; + Image normal; + Image output; + bool hdr = false; + bool srgb = false; + int maxMemoryMB = 6000; // approximate maximum memory usage in MBs + + int H = 0; // image height + int W = 0; // image width + int tileH = 0; // tile height + int tileW = 0; // tile width + int tileCountH = 1; // number of tiles in H dimension + int tileCountW = 1; // number of tiles in W dimension + + std::shared_ptr net; + std::shared_ptr inputReorder; + std::shared_ptr outputReorder; + std::shared_ptr transferFunc; + + protected: + struct + { + void* ldr = nullptr; + void* ldr_alb = nullptr; + void* ldr_alb_nrm = nullptr; + void* hdr = nullptr; + void* hdr_alb = nullptr; + void* hdr_alb_nrm = nullptr; + } weightData; + + explicit AutoencoderFilter(const Ref& device); + + public: + void setImage(const std::string& name, const Image& data) override; + void set1i(const std::string& name, int value) override; + int get1i(const std::string& name) override; + + void commit() override; + void execute() override; + + private: + void computeTileSize(); + + template + std::shared_ptr buildNet(); + + bool isCommitted() const { return bool(net); } + }; + + // -------------------------------------------------------------------------- + // RTFilter - Generic ray tracing denoiser + // -------------------------------------------------------------------------- + + class RTFilter : public AutoencoderFilter + { + public: + explicit RTFilter(const Ref& device); + }; + +} // namespace oidn diff --git a/oidn/core/buffer.h b/oidn/core/buffer.h new file mode 100644 index 0000000..b951091 --- /dev/null +++ b/oidn/core/buffer.h @@ -0,0 +1,75 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
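A note on the tiling constants declared in `autoencoder.h` above: `maxMemoryMB` is writable and decides when `computeTileSize()` splits the frame into tiles, while `alignment` and `overlap` are read-only values derived from the network's receptive field. A small hypothetical helper that pokes at them through the C API (`configureTiling` and the 2048 MB cap are just examples):

```
// Sketch (hypothetical helper): cap activation memory so big frames get tiled,
// and read back the tiling constants. The filter's images still have to be set
// and committed elsewhere before it can execute.
#include <OpenImageDenoise/oidn.h>
#include <cstdio>

void configureTiling(OIDNFilter filter)
{
  oidnSetFilter1i(filter, "maxMemoryMB", 2048);  // tile once activations would exceed ~2 GB

  const int alignment = oidnGetFilter1i(filter, "alignment");  // tile extents are rounded up to this
  const int overlap   = oidnGetFilter1i(filter, "overlap");    // border shared by neighbouring tiles
  std::printf("tile alignment = %d px, overlap = %d px\n", alignment, overlap);
}
```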
// +// ======================================================================== // + +#pragma once + +#include "common.h" +#include "device.h" + +namespace oidn { + + class Device; + + // Buffer which may or may not own its data + class Buffer : public RefCount + { + private: + char* ptr; + size_t byteSize; + bool shared; + Ref device; + + public: + __forceinline Buffer(const Ref& device, size_t size) + : ptr((char*)alignedMalloc(size, 64)), + byteSize(size), + shared(false), + device(device) {} + + __forceinline Buffer(const Ref& device, void* data, size_t size) + : ptr((char*)data), + byteSize(size), + shared(true), + device(device) + { + if (data == nullptr) + throw Exception(Error::InvalidArgument, "buffer pointer null"); + } + + __forceinline ~Buffer() + { + if (!shared) + alignedFree(ptr); + } + + __forceinline char* data() { return ptr; } + __forceinline const char* data() const { return ptr; } + __forceinline size_t size() const { return byteSize; } + + void* map(size_t offset, size_t size) + { + if (offset + size > byteSize) + throw Exception(Error::InvalidArgument, "buffer region out of range"); + + return ptr + offset; + } + + void unmap(void* mappedPtr) {} + + Device* getDevice() { return device.get(); } + }; + +} // namespace oidn diff --git a/oidn/core/common.h b/oidn/core/common.h new file mode 100644 index 0000000..a3a7e8a --- /dev/null +++ b/oidn/core/common.h @@ -0,0 +1,134 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
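The `Buffer` class in `buffer.h` above can either own its storage or wrap an existing pointer. When OIDN owns it, the data is filled through map/unmap; a hypothetical example (`bindOwnedColorBuffer` and `srcRGB` are placeholder names):

```
// Sketch (hypothetical): let OIDN own the image storage instead of sharing a raw pointer.
// On the CPU device, Buffer::map() above just returns ptr + offset after a bounds check,
// and the Image created by oidnSetFilterImage keeps a reference to the buffer, which is
// why the buffer handle can be released right after binding.
#include <OpenImageDenoise/oidn.h>
#include <cstring>

void bindOwnedColorBuffer(OIDNDevice device, OIDNFilter filter,
                          const float* srcRGB, int width, int height)
{
  const size_t byteSize = size_t(width) * height * 3 * sizeof(float);
  OIDNBuffer colorBuf = oidnNewBuffer(device, byteSize);

  void* dst = oidnMapBuffer(colorBuf, OIDN_ACCESS_WRITE, 0, byteSize);
  std::memcpy(dst, srcRGB, byteSize);
  oidnUnmapBuffer(colorBuf, dst);

  oidnSetFilterImage(filter, "color", colorBuf, OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
  oidnReleaseBuffer(colorBuf);
}
```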
// +// ======================================================================== // + +#pragma once + +#include "common/platform.h" + +#include "mkl-dnn/include/mkldnn.hpp" +#include "mkl-dnn/include/mkldnn_debug.h" +#include "mkl-dnn/src/common/mkldnn_thread.hpp" +#include "mkl-dnn/src/common/type_helpers.hpp" +#include "mkl-dnn/src/cpu/jit_generator.hpp" + +#include "common/ref.h" +#include "common/exception.h" +#include "common/thread.h" +#include "common/tasking.h" +#include "math.h" + +namespace oidn { + + using namespace mkldnn; + using namespace mkldnn::impl::cpu; + using mkldnn::impl::parallel_nd; + using mkldnn::impl::memory_desc_matches_tag; + + + inline size_t getFormatBytes(Format format) + { + switch (format) + { + case Format::Undefined: return 1; + case Format::Float: return sizeof(float); + case Format::Float2: return sizeof(float)*2; + case Format::Float3: return sizeof(float)*3; + case Format::Float4: return sizeof(float)*4; + } + assert(0); + return 0; + } + + + inline memory::dims getTensorDims(const std::shared_ptr& mem) + { + const mkldnn_memory_desc_t& desc = mem->get_desc().data; + return memory::dims(&desc.dims[0], &desc.dims[desc.ndims]); + } + + inline memory::data_type getTensorType(const std::shared_ptr& mem) + { + const mkldnn_memory_desc_t& desc = mem->get_desc().data; + return memory::data_type(desc.data_type); + } + + // Returns the number of values in a tensor + inline size_t getTensorSize(const memory::dims& dims) + { + size_t res = 1; + for (int i = 0; i < (int)dims.size(); ++i) + res *= dims[i]; + return res; + } + + inline memory::dims getMaxTensorDims(const std::vector& dims) + { + memory::dims result; + size_t maxSize = 0; + + for (const auto& d : dims) + { + const size_t size = getTensorSize(d); + if (size > maxSize) + { + result = d; + maxSize = size; + } + } + + return result; + } + + inline size_t getTensorSize(const std::shared_ptr& mem) + { + return getTensorSize(getTensorDims(mem)); + } + + + template + inline int getPadded(int dim) + { + return (dim + (K-1)) & ~(K-1); + } + + template + inline memory::dims getPadded_nchw(const memory::dims& dims) + { + assert(dims.size() == 4); + memory::dims padDims = dims; + padDims[1] = getPadded(dims[1]); // pad C + return padDims; + } + + + template + struct BlockedFormat; + + template<> + struct BlockedFormat<8> + { + static constexpr memory::format_tag nChwKc = memory::format_tag::nChw8c; + static constexpr memory::format_tag OIhwKiKo = memory::format_tag::OIhw8i8o; + }; + + template<> + struct BlockedFormat<16> + { + static constexpr memory::format_tag nChwKc = memory::format_tag::nChw16c; + static constexpr memory::format_tag OIhwKiKo = memory::format_tag::OIhw16i16o; + }; + +} // namespace oidn diff --git a/oidn/core/device.cpp b/oidn/core/device.cpp new file mode 100644 index 0000000..5e5a849 --- /dev/null +++ b/oidn/core/device.cpp @@ -0,0 +1,216 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
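The `BlockedFormat` tags in `common.h` above (nChw8c / nChw16c) are the layouts the whole network runs in: channels are grouped into blocks of K = 8 or 16 floats so that each pixel's channel block is contiguous for SIMD. The input reorder node later in this diff indexes that layout as shown below; this is a standalone restatement of the same formula for reference, not new library code:

```
// Flat offset of element (c, h, w) in the blocked nChwKc layout, K = 8 (AVX2) or 16 (AVX-512).
// Mirrors InputReorderNode::store() further down: the channel-block index (c / K) is the
// outermost dimension and the channel-within-block (c % K) is the innermost.
#include <cstddef>

inline size_t nChwKcOffset(int c, int h, int w, int H, int W, int K)
{
  return size_t(H) * W * K * (c / K)   // skip whole K-channel blocks
       + size_t(h) * W * K             // skip rows inside the block
       + size_t(w) * K                 // skip pixels inside the row
       + (c % K);                      // channel inside the K-wide block
}
```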
// +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include "device.h" +#include "autoencoder.h" + +namespace oidn { + + thread_local Device::ErrorState Device::globalError; + + Device::Device() + { + if (!mayiuse(sse41)) + throw Exception(Error::UnsupportedHardware, "SSE4.1 support is required at minimum"); + } + + Device::~Device() + { + observer.reset(); + } + + void Device::setError(Device* device, Error code, const std::string& message) + { + // Update the stored error only if the previous error was queried + if (device) + { + ErrorState& curError = device->error.get(); + + if (curError.code == Error::None) + { + curError.code = code; + curError.message = message; + } + + // Print the error message in verbose mode + if (device->isVerbose()) + std::cerr << "Error: " << message << std::endl; + + // Call the error callback function + ErrorFunction errorFunc; + void* errorUserPtr; + + { + std::lock_guard lock(device->mutex); + errorFunc = device->errorFunc; + errorUserPtr = device->errorUserPtr; + } + + if (errorFunc) + errorFunc(errorUserPtr, code, (code == Error::None) ? nullptr : message.c_str()); + } + else + { + if (globalError.code == Error::None) + { + globalError.code = code; + globalError.message = message; + } + } + } + + Error Device::getError(Device* device, const char** outMessage) + { + // Return and clear the stored error code, but keep the error message so pointers to it will + // remain valid until the next getError call + if (device) + { + ErrorState& curError = device->error.get(); + const Error code = curError.code; + if (outMessage) + *outMessage = (code == Error::None) ? nullptr : curError.message.c_str(); + curError.code = Error::None; + return code; + } + else + { + const Error code = globalError.code; + if (outMessage) + *outMessage = (code == Error::None) ? nullptr : globalError.message.c_str(); + globalError.code = Error::None; + return code; + } + } + + void Device::setErrorFunction(ErrorFunction func, void* userPtr) + { + errorFunc = func; + errorUserPtr = userPtr; + } + + int Device::get1i(const std::string& name) + { + if (name == "numThreads") + return numThreads; + else if (name == "setAffinity") + return setAffinity; + else if (name == "verbose") + return verbose; + else if (name == "version") + return OIDN_VERSION; + else if (name == "versionMajor") + return OIDN_VERSION_MAJOR; + else if (name == "versionMinor") + return OIDN_VERSION_MINOR; + else if (name == "versionPatch") + return OIDN_VERSION_PATCH; + else + throw Exception(Error::InvalidArgument, "invalid parameter"); + } + + void Device::set1i(const std::string& name, int value) + { + if (name == "numThreads") + numThreads = value; + else if (name == "setAffinity") + setAffinity = value; + else if (name == "verbose") + { + verbose = value; + error.verbose = value; + } + + dirty = true; + } + + void Device::commit() + { + if (isCommitted()) + throw Exception(Error::InvalidOperation, "device can be committed only once"); + + // Get the optimal thread affinities + if (setAffinity) + { + affinity = std::make_shared(1, verbose); // one thread per core + if (affinity->getNumThreads() == 0) + affinity.reset(); + } + + // Create the task arena + const int maxNumThreads = affinity ? affinity->getNumThreads() : tbb::this_task_arena::max_concurrency(); + numThreads = (numThreads > 0) ? 
min(numThreads, maxNumThreads) : maxNumThreads; + arena = std::make_shared(numThreads); + + // Automatically set the thread affinities + if (affinity) + observer = std::make_shared(affinity, *arena); + + dirty = false; + + if (isVerbose()) + print(); + } + + void Device::checkCommitted() + { + if (dirty) + throw Exception(Error::InvalidOperation, "changes to the device are not committed"); + } + + Ref Device::newBuffer(size_t byteSize) + { + checkCommitted(); + return makeRef(Ref(this), byteSize); + } + + Ref Device::newBuffer(void* ptr, size_t byteSize) + { + checkCommitted(); + return makeRef(Ref(this), ptr, byteSize); + } + + Ref Device::newFilter(const std::string& type) + { + checkCommitted(); + + Ref filter; + + if (type == "RT") + filter = makeRef(Ref(this)); + else + throw Exception(Error::InvalidArgument, "unknown filter type"); + + return filter; + } + + void Device::print() + { + std::cout << std::endl; + + std::cout << "Open Image Denoise " << OIDN_VERSION_STRING << std::endl; + std::cout << " Compiler: " << getCompilerName() << std::endl; + std::cout << " Build : " << getBuildName() << std::endl; + std::cout << " Platform: " << getPlatformName() << std::endl; + + std::cout << " Tasking :"; + std::cout << " TBB" << TBB_VERSION_MAJOR << "." << TBB_VERSION_MINOR; + std::cout << " TBB_header_interface_" << TBB_INTERFACE_VERSION << " TBB_lib_interface_" << tbb::TBB_runtime_interface_version(); + std::cout << std::endl; + + std::cout << std::endl; + } + +} // namespace oidn diff --git a/oidn/core/device.h b/oidn/core/device.h new file mode 100644 index 0000000..c2df714 --- /dev/null +++ b/oidn/core/device.h @@ -0,0 +1,95 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
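Two details worth noting from `device.cpp` above: the stored per-thread error is only overwritten after the previous one has been queried, but the error callback fires on every error, so registering one is the more reliable way to catch problems. A hypothetical setup (`oidnErrorCallback` and `createDenoiseDevice` are placeholder names):

```
// Sketch (hypothetical): register an error callback and turn on verbose output before commit.
#include <OpenImageDenoise/oidn.h>
#include <cstdio>

void oidnErrorCallback(void* userPtr, OIDNError code, const char* message)
{
  std::fprintf(stderr, "OIDN error %d: %s\n", int(code), message);
}

OIDNDevice createDenoiseDevice()
{
  OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
  oidnSetDeviceErrorFunction(device, oidnErrorCallback, nullptr);
  oidnSetDevice1i(device, "verbose", 1);  // prints the version/TBB banner from Device::print()
  oidnCommitDevice(device);               // a device can be committed only once
  return device;
}
```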
// +// ======================================================================== // + +#pragma once + +#include "common.h" + +namespace oidn { + + class Buffer; + class Filter; + + class Device : public RefCount, public Verbose + { + private: + // Thread-safety + std::mutex mutex; + + // Error handling + struct ErrorState + { + Error code = Error::None; + std::string message; + }; + + static thread_local ErrorState globalError; + ThreadLocal error; + ErrorFunction errorFunc = nullptr; + void* errorUserPtr = nullptr; + + // Tasking + std::shared_ptr arena; + std::shared_ptr observer; + std::shared_ptr affinity; + + // Parameters + int numThreads = 0; // autodetect by default + bool setAffinity = true; + + bool dirty = true; + + public: + Device(); + ~Device(); + + static void setError(Device* device, Error code, const std::string& message); + static Error getError(Device* device, const char** outMessage); + + void setErrorFunction(ErrorFunction func, void* userPtr); + + int get1i(const std::string& name); + void set1i(const std::string& name, int value); + + void commit(); + + template + void executeTask(F& f) + { + arena->execute(f); + } + + template + void executeTask(const F& f) + { + arena->execute(f); + } + + Ref newBuffer(size_t byteSize); + Ref newBuffer(void* ptr, size_t byteSize); + Ref newFilter(const std::string& type); + + __forceinline Device* getDevice() { return this; } + __forceinline std::mutex& getMutex() { return mutex; } + + private: + bool isCommitted() const { return bool(arena); } + void checkCommitted(); + + void print(); + }; + +} // namespace oidn diff --git a/oidn/core/filter.cpp b/oidn/core/filter.cpp new file mode 100644 index 0000000..ec1f10a --- /dev/null +++ b/oidn/core/filter.cpp @@ -0,0 +1,27 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include "filter.h" + +namespace oidn { + + void Filter::setProgressMonitorFunction(ProgressMonitorFunction func, void* userPtr) + { + progressFunc = func; + progressUserPtr = userPtr; + } + +} // namespace oidn diff --git a/oidn/core/filter.h b/oidn/core/filter.h new file mode 100644 index 0000000..a4ea1e2 --- /dev/null +++ b/oidn/core/filter.h @@ -0,0 +1,50 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and      //
+// limitations under the License.                                            //
+// ======================================================================== //
+
+#pragma once
+
+#include "common.h"
+#include "device.h"
+#include "image.h"
+
+namespace oidn {
+
+  class Filter : public RefCount
+  {
+  protected:
+    Ref<Device> device;
+
+    ProgressMonitorFunction progressFunc = nullptr;
+    void* progressUserPtr = nullptr;
+
+    bool dirty = true;
+
+  public:
+    explicit Filter(const Ref<Device>& device) : device(device) {}
+
+    virtual void setImage(const std::string& name, const Image& data) = 0;
+    virtual void set1i(const std::string& name, int value) = 0;
+    virtual int get1i(const std::string& name) = 0;
+
+    void setProgressMonitorFunction(ProgressMonitorFunction func, void* userPtr);
+
+    virtual void commit() = 0;
+    virtual void execute() = 0;
+
+    Device* getDevice() { return device.get(); }
+  };
+
+} // namespace oidn
diff --git a/oidn/core/image.h b/oidn/core/image.h
new file mode 100644
index 0000000..748f49c
--- /dev/null
+++ b/oidn/core/image.h
@@ -0,0 +1,111 @@
+// ======================================================================== //
+// Copyright 2009-2019 Intel Corporation                                    //
+//                                                                          //
+// Licensed under the Apache License, Version 2.0 (the "License");          //
+// you may not use this file except in compliance with the License.         //
+// You may obtain a copy of the License at                                  //
+//                                                                          //
+//     http://www.apache.org/licenses/LICENSE-2.0                           //
+//                                                                          //
+// Unless required by applicable law or agreed to in writing, software      //
+// distributed under the License is distributed on an "AS IS" BASIS,        //
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. //
+// See the License for the specific language governing permissions and      //
+// limitations under the License.
// +// ======================================================================== // + +#pragma once + +#include "common.h" +#include "buffer.h" + +namespace oidn { + + struct Image + { + static constexpr int maxSize = 65536; + + char* ptr; // pointer to the first pixel + int width; // width in number of pixels + int height; // height in number of pixels + size_t bytePixelStride; // pixel stride in number of *bytes* + size_t rowStride; // row stride in number of *pixel strides* + Format format; // pixel format + Ref buffer; // buffer containing the image data + + Image() : ptr(nullptr), width(0), height(0), bytePixelStride(0), rowStride(0), format(Format::Undefined) {} + + Image(void* ptr, Format format, int width, int height, size_t byteOffset, size_t inBytePixelStride, size_t inByteRowStride) + { + if (ptr == nullptr) + throw Exception(Error::InvalidArgument, "buffer pointer null"); + + init((char*)ptr + byteOffset, format, width, height, inBytePixelStride, inByteRowStride); + } + + Image(const Ref& buffer, Format format, int width, int height, size_t byteOffset, size_t inBytePixelStride, size_t inByteRowStride) + { + init(buffer->data() + byteOffset, format, width, height, inBytePixelStride, inByteRowStride); + + if (byteOffset + height * rowStride * bytePixelStride > buffer->size()) + throw Exception(Error::InvalidArgument, "buffer region out of range"); + } + + void init(char* ptr, Format format, int width, int height, size_t inBytePixelStride, size_t inByteRowStride) + { + assert(width >= 0); + assert(height >= 0); + if (width > maxSize || height > maxSize) + throw Exception(Error::InvalidArgument, "image size too large"); + + this->ptr = ptr; + this->width = width; + this->height = height; + + const size_t pixelSize = getFormatBytes(format); + if (inBytePixelStride != 0) + { + if (inBytePixelStride < pixelSize) + throw Exception(Error::InvalidArgument, "pixel stride smaller than pixel size"); + + this->bytePixelStride = inBytePixelStride; + } + else + { + this->bytePixelStride = pixelSize; + } + + if (inByteRowStride != 0) + { + if (inByteRowStride < width * this->bytePixelStride) + throw Exception(Error::InvalidArgument, "row stride smaller than width * pixel stride"); + if (inByteRowStride % this->bytePixelStride != 0) + throw Exception(Error::InvalidArgument, "row stride not integer multiple of pixel stride"); + + this->rowStride = inByteRowStride / this->bytePixelStride; + } + else + { + this->rowStride = width; + } + + this->format = format; + } + + __forceinline char* get(int y, int x) + { + return ptr + ((size_t(y) * rowStride + size_t(x)) * bytePixelStride); + } + + __forceinline const char* get(int y, int x) const + { + return ptr + ((size_t(y) * rowStride + size_t(x)) * bytePixelStride); + } + + operator bool() const + { + return ptr != nullptr; + } + }; + +} // namespace oidn diff --git a/oidn/core/input_reorder.h b/oidn/core/input_reorder.h new file mode 100644 index 0000000..966856a --- /dev/null +++ b/oidn/core/input_reorder.h @@ -0,0 +1,232 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. 
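The pixel and row strides in `Image` above are what make it possible to hand OIDN an interleaved framebuffer without repacking; `get(y, x)` resolves to `ptr + (y * rowStride + x) * bytePixelStride`. For example, a hypothetical tightly packed float4 RGBA buffer where only RGB should be denoised (`bindRGBAAsColor` and `rgba` are placeholder names):

```
// Sketch (hypothetical): bind a float4 RGBA framebuffer as a float3 color image.
// The pixel stride skips the alpha channel; the row stride covers one full row of RGBA texels.
#include <OpenImageDenoise/oidn.h>

void bindRGBAAsColor(OIDNFilter filter, float* rgba, int width, int height)
{
  const size_t bytePixelStride = 4 * sizeof(float);        // one RGBA texel
  const size_t byteRowStride   = width * bytePixelStride;  // no extra row padding assumed
  oidnSetSharedFilterImage(filter, "color", rgba, OIDN_FORMAT_FLOAT3,
                           width, height,
                           /*byteOffset=*/0, bytePixelStride, byteRowStride);
}
```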
// +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "node.h" +#include "image.h" + +namespace oidn { + + // Input reorder node + template + class InputReorderNode : public Node + { + private: + // Source + Image color; + Image albedo; + Image normal; + + // Destination + std::shared_ptr dst; + float* dstPtr; + int C2; + int H2; + int W2; + + // Tile + int h1Begin; + int w1Begin; + int h2Begin; + int w2Begin; + int H; + int W; + + std::shared_ptr transferFunc; + + public: + InputReorderNode(const Image& color, + const Image& albedo, + const Image& normal, + const std::shared_ptr& dst, + const std::shared_ptr& transferFunc) + : color(color), albedo(albedo), normal(normal), + dst(dst), + h1Begin(0), w1Begin(0), + H(color.height), W(color.width), + transferFunc(transferFunc) + { + const mkldnn_memory_desc_t& dstDesc = dst->get_desc().data; + assert(memory_desc_matches_tag(dstDesc, mkldnn_format_tag_t(BlockedFormat::nChwKc))); + assert(dstDesc.ndims == 4); + assert(dstDesc.data_type == memory::data_type::f32); + assert(dstDesc.dims[0] == 1); + //assert(dstDesc.dims[1] >= getPadded(C1)); + + dstPtr = (float*)dst->get_data_handle(); + C2 = dstDesc.dims[1]; + H2 = dstDesc.dims[2]; + W2 = dstDesc.dims[3]; + } + + void setTile(int h1, int w1, int h2, int w2, int H, int W) override + { + h1Begin = h1; + w1Begin = w1; + h2Begin = h2; + w2Begin = w2; + this->H = H; + this->W = W; + } + + void execute(stream& sm) override + { + assert(H + h1Begin <= color.height); + assert(W + w1Begin <= color.width); + assert(H + h2Begin <= H2); + assert(W + w2Begin <= W2); + + parallel_nd(H2, [&](int h2) + { + const int h = h2 - h2Begin; + + if (h >= 0 && h < H) + { + const int h1 = h + h1Begin; + + // Zero pad + for (int w2 = 0; w2 < w2Begin; ++w2) + { + int c = 0; + while (c < C2) + store(h2, w2, c, 0.f); + } + + // Reorder + for (int w = 0; w < W; ++w) + { + const int w1 = w + w1Begin; + const int w2 = w + w2Begin; + + int c = 0; + storeColor(h2, w2, c, (float*)color.get(h1, w1)); + if (albedo) + storeAlbedo(h2, w2, c, (float*)albedo.get(h1, w1)); + if (normal) + storeNormal(h2, w2, c, (float*)normal.get(h1, w1)); + while (c < C2) + store(h2, w2, c, 0.f); + } + + // Zero pad + for (int w2 = W + w2Begin; w2 < W2; ++w2) + { + int c = 0; + while (c < C2) + store(h2, w2, c, 0.f); + } + } + else + { + // Zero pad + for (int w2 = 0; w2 < W2; ++w2) + { + int c = 0; + while (c < C2) + store(h2, w2, c, 0.f); + } + } + }); + } + + std::shared_ptr getDst() const override { return dst; } + + private: + // Stores a single value + __forceinline void store(int h, int w, int& c, float value) + { + // Destination is in nChwKc format + float* dst_c = dstPtr + (H2*W2*K*(c/K)) + h*W2*K + w*K + (c%K); + *dst_c = value; + c++; + } + + // Stores a color + __forceinline void storeColor(int h, int w, int& c, const float* values) + { + #pragma unroll + for (int i = 0; i < 3; ++i) + { + // Load the value + float x = values[i]; + + // Sanitize the value + x = maxSafe(x, 0.f); + + // Apply the transfer function + x = transferFunc->forward(x); 
+ + // Store the value + store(h, w, c, x); + } + } + + // Stores an albedo + __forceinline void storeAlbedo(int h, int w, int& c, const float* values) + { + #pragma unroll + for (int i = 0; i < 3; ++i) + { + // Load the value + float x = values[i]; + + // Sanitize the value + x = clampSafe(x, 0.f, 1.f); + + // Store the value + store(h, w, c, x); + } + } + + // Stores a normal + __forceinline void storeNormal(int h, int w, int& c, const float* values) + { + // Load the normal + float x = values[0]; + float y = values[1]; + float z = values[2]; + + // Compute the length of the normal + const float lengthSqr = sqr(x) + sqr(y) + sqr(z); + + // Normalize the normal and transform it to [0..1] + if (isfinite(lengthSqr)) + { + const float invLength = (lengthSqr > minVectorLengthSqr) ? rsqrt(lengthSqr) : 1.f; + + const float scale = invLength * 0.5f; + const float offset = 0.5f; + + x = x * scale + offset; + y = y * scale + offset; + z = z * scale + offset; + } + else + { + x = 0.f; + y = 0.f; + z = 0.f; + } + + // Store the normal + store(h, w, c, x); + store(h, w, c, y); + store(h, w, c, z); + } + }; + +} // namespace oidn diff --git a/oidn/core/math.h b/oidn/core/math.h new file mode 100644 index 0000000..abcf7af --- /dev/null +++ b/oidn/core/math.h @@ -0,0 +1,77 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "common/platform.h" + +namespace oidn { + + constexpr float minVectorLength = 1e-10f; + constexpr float minVectorLengthSqr = minVectorLength * minVectorLength; + + using std::log; + using std::log2; + using std::exp; + using std::exp2; + using std::pow; + using std::isfinite; + + __forceinline float sqr(float x) + { + return x * x; + } + + __forceinline float rcp(float x) + { + __m128 r = _mm_rcp_ss(_mm_set_ss(x)); + return _mm_cvtss_f32(_mm_sub_ss(_mm_add_ss(r, r), _mm_mul_ss(_mm_mul_ss(r, r), _mm_set_ss(x)))); + } + + __forceinline float rsqrt(float x) + { + __m128 r = _mm_rsqrt_ss(_mm_set_ss(x)); + return _mm_cvtss_f32(_mm_add_ss(_mm_mul_ss(_mm_set_ss(1.5f), r), + _mm_mul_ss(_mm_mul_ss(_mm_mul_ss(_mm_set_ss(x), _mm_set_ss(-0.5f)), r), _mm_mul_ss(r, r)))); + } + + __forceinline float maxSafe(float value, float minValue) + { + return isfinite(value) ? max(value, minValue) : minValue; + } + + __forceinline float clampSafe(float value, float minValue, float maxValue) + { + return isfinite(value) ? 
clamp(value, minValue, maxValue) : minValue; + } + + // Returns ceil(a / b) for non-negative integers + template + __forceinline constexpr Int ceilDiv(Int a, Int b) + { + //assert(a >= 0); + //assert(b > 0); + return (a + b - 1) / b; + } + + // Returns a rounded up to multiple of b + template + __forceinline constexpr Int roundUp(Int a, Int b) + { + return ceilDiv(a, b) * b; + } + +} // namespace oidn diff --git a/oidn/core/network.cpp b/oidn/core/network.cpp new file mode 100644 index 0000000..1f6d3ab --- /dev/null +++ b/oidn/core/network.cpp @@ -0,0 +1,368 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include "upsample.h" +#include "weights_reorder.h" +#include "network.h" + +namespace oidn { + + template + Network::Network(const Ref& device, const std::map& weightMap) + : device(device), + eng(engine::cpu, 0), + sm(eng), + weightMap(weightMap) + { + } + + template + void Network::execute(const Progress& progress, int taskIndex) + { + if (progress.func) + { + const double value = double(taskIndex) / double(progress.taskCount); + if (!progress.func(progress.userPtr, value)) + throw Exception(Error::Cancelled, "execution was cancelled"); + } + + for (size_t i = 0; i < nodes.size(); ++i) + { + nodes[i]->execute(sm); + + if (progress.func) + { + const double value = (double(taskIndex) + double(i+1) / double(nodes.size())) / double(progress.taskCount); + if (!progress.func(progress.userPtr, value)) + throw Exception(Error::Cancelled, "execution was cancelled"); + } + } + } + + template + std::shared_ptr Network::allocTensor(const memory::dims& dims, + memory::format_tag format, + void* data) + { + if (format == memory::format_tag::any) + { + if (dims.size() == 4) + format = BlockedFormat::nChwKc; + else if (dims.size() == 1) + format = memory::format_tag::x; + else + assert(0); + } + memory::desc desc(dims, memory::data_type::f32, format); + if (data == nullptr) + { + const size_t bytes = getTensorSize(dims) * sizeof(float); + if (format == BlockedFormat::nChwKc) + activationAllocBytes += bytes; + totalAllocBytes += bytes; + + return std::make_shared(desc, eng); + } + else + { + return std::make_shared(desc, eng, data); + } + } + + template + std::shared_ptr Network::castTensor(const memory::dims& dims, + const std::shared_ptr& src, + size_t srcOffset, + memory::format_tag format) + { + const mkldnn_memory_desc_t& srcDesc = src->get_desc().data; + MAYBE_UNUSED(srcDesc); + assert(srcDesc.data_type == memory::data_type::f32); + assert(getTensorSize(src) >= srcOffset + getTensorSize(dims)); + + if (format == memory::format_tag::any) + { + if (dims.size() == 4) + format = BlockedFormat::nChwKc; + else if (dims.size() == 1) + format = memory::format_tag::x; + else + assert(0); + } + memory::desc desc(dims, memory::data_type::f32, format); + 
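+      // The returned tensor aliases the source tensor's memory at the given
+      // element offset; no data is copied.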
float* srcPtr = (float*)src->get_data_handle() + srcOffset; + return std::make_shared(desc, eng, srcPtr); + } + + template + std::shared_ptr Network::castTensor(const memory::dims& dims, + const std::shared_ptr& src, + const memory::dims& srcOffset) + { + return castTensor(dims, src, getTensorSize(srcOffset)); + } + + template + void Network::zeroTensor(const std::shared_ptr& dst) + { + assert(getTensorType(dst) == memory::data_type::f32); + memset(dst->get_data_handle(), 0, getTensorSize(dst)*sizeof(float)); + } + + template + memory::dims Network::getInputReorderDims(const memory::dims& srcDims, int alignment) + { + memory::dims dstDims = srcDims; + dstDims[1] = getPadded(srcDims[1]); // round up C + dstDims[2] = roundUp(srcDims[2], memory::dim(alignment)); // round up H + dstDims[3] = roundUp(srcDims[3], memory::dim(alignment)); // round up W + return dstDims; + } + + template + memory::dims Network::getConvDims(const std::string& name, const memory::dims& srcDims) + { + auto b = weightMap[name + "/b"]; + memory::dims dstDims = srcDims; + dstDims[1] = getPadded(b.dims[0]); // dstDims[C] = getPadded(OC) + return dstDims; + } + + template + std::shared_ptr Network::addConv(const std::string& name, + const std::shared_ptr& src, + const std::shared_ptr& userDst, + bool relu) + { + const memory::dims strides = {1, 1}; + const memory::dims padding = {1, 1}; + + memory::dims srcDims = getTensorDims(src); + + // Get the weights + const auto& W = weightMap[name + "/W"]; + if (W.ndims() != 4 || W.format != "oihw") + throw Exception(Error::InvalidOperation, "invalid convolution weights"); + memory::dims weightsDims = W.dims; + auto userWeights = allocTensor(weightsDims, memory::format_tag::oihw, W.data); + + // Pad the weights + memory::dims weightsPadDims = weightsDims; + weightsPadDims[1] = getPadded(weightsDims[1]); // IC + weightsPadDims[0] = getPadded(weightsDims[0]); // OC + assert(srcDims[1] == weightsPadDims[1]); // srcDims[C] == weightsPadDims[IC] + auto weightsPad = allocTensor(weightsPadDims, memory::format_tag::oihw); + WeightsReorderNode(userWeights, weightsPad).execute(sm); + + // Get the biases + const auto& b = weightMap[name + "/b"]; + if (b.ndims() != 1) + throw Exception(Error::InvalidOperation, "invalid convolution biases"); + memory::dims biasDims = b.dims; + + // Copy/pad the biases + memory::dims biasPadDims = {getPadded(biasDims[0])}; + auto bias = allocTensor(biasPadDims); + if (biasDims[0] != biasPadDims[0]) + memset(bias->get_data_handle(), 0, biasPadDims[0]*sizeof(float)); + memcpy(bias->get_data_handle(), b.data, biasDims[0]*sizeof(float)); + + // Allocate memory for destination + memory::dims dstDims = srcDims; + dstDims[1] = weightsPadDims[0]; // dstDims[C] = weightsPadDims[OC] + + std::shared_ptr dst; + if (!userDst) + dst = allocTensor(dstDims); + else if (getTensorDims(userDst) == dstDims) + dst = userDst; + else + dst = castTensor(dstDims, userDst); + + // Create a convolution + // Let the convolution primitive choose the weights format + auto weightsDesc = memory::desc({ weightsPadDims }, memory::data_type::f32, memory::format_tag::any); + + auto convAlgo = (K == 16) ? 
convolution_winograd : convolution_direct; + auto convDesc = convolution_forward::desc( + prop_kind::forward_inference, convAlgo, + src->get_desc(), + weightsDesc, + bias->get_desc(), + dst->get_desc(), + strides, padding, padding, padding_kind::zero); + + // Incorporate relu + mkldnn::primitive_attr convAttr; + if (relu) + { + mkldnn::post_ops ops; + ops.append_eltwise( + 1.f, // scale factor, not used + algorithm::eltwise_relu, + 0.f, // max with + 0.f // unused + ); + convAttr.set_post_ops(ops); + } + convAttr.set_scratchpad_mode(scratchpad_mode_user); + + auto convPrimDesc = convolution_forward::primitive_desc(convDesc, convAttr, eng); + + // Reorder the weights to the final format, if necessary + auto weights = weightsPad; + if (convPrimDesc.weights_desc() != weightsPad->get_desc()) + { + weights = std::make_shared(convPrimDesc.weights_desc(), eng); + ReorderNode(weightsPad, weights).execute(sm); + } + + // Create convolution node and add it to the net + auto node = std::make_shared(convPrimDesc, src, weights, bias, dst); + nodes.push_back(node); + return node; + } + + template + memory::dims Network::getPoolDims(const memory::dims& srcDims) + { + memory::dims dstDims = srcDims; + dstDims[2] /= 2; // H/2 + dstDims[3] /= 2; // W/2 + return dstDims; + } + + template + std::shared_ptr Network::addPool(const std::shared_ptr& src, + const std::shared_ptr& userDst) + { + const memory::dims kernel = {2, 2}; + const memory::dims strides = {2, 2}; + const memory::dims padding = {0, 0}; + + memory::dims srcDims = getTensorDims(src); + memory::dims dstDims = getPoolDims(srcDims); + + std::shared_ptr dst; + if (!userDst) + dst = allocTensor(dstDims); + else if (getTensorDims(userDst) == dstDims) + dst = userDst; + else + dst = castTensor(dstDims, userDst); + + auto poolDesc = pooling_forward::desc( + prop_kind::forward_inference, pooling_max, + src->get_desc(), + dst->get_desc(), + strides, kernel, padding, padding, padding_kind::zero); + + mkldnn::primitive_attr poolAttr; + poolAttr.set_scratchpad_mode(scratchpad_mode_user); + + auto poolPrimDesc = pooling_forward::primitive_desc(poolDesc, poolAttr, eng); + + auto node = std::make_shared(poolPrimDesc, src, dst); + nodes.push_back(node); + return node; + } + + template + memory::dims Network::getUpsampleDims(const memory::dims& srcDims) + { + memory::dims dstDims = srcDims; + dstDims[2] *= 2; // H*2 + dstDims[3] *= 2; // W*2 + return dstDims; + } + + template + std::shared_ptr Network::addUpsample(const std::shared_ptr& src, + const std::shared_ptr& userDst) + { + memory::dims srcDims = getTensorDims(src); + memory::dims dstDims = getUpsampleDims(srcDims); + + std::shared_ptr dst; + if (!userDst) + dst = allocTensor(dstDims); + else if (getTensorDims(userDst) == dstDims) + dst = userDst; + else + dst = castTensor(dstDims, userDst); + + // Create upsampling node and add it to net + auto node = std::make_shared>(src, dst); + nodes.push_back(node); + return node; + } + + template + memory::dims Network::getConcatDims(const memory::dims& src1Dims, const memory::dims& src2Dims) + { + assert(src1Dims[0] == src2Dims[0]); // N + assert(src1Dims[2] == src2Dims[2]); // H + assert(src1Dims[3] == src2Dims[3]); // W + + memory::dims dstDims = src1Dims; + dstDims[1] += src2Dims[1]; // C + return dstDims; + } + + template + std::shared_ptr Network::addAutoexposure(const Image& color, + const std::shared_ptr& transferFunc) + { + auto node = std::make_shared(color, transferFunc); + nodes.push_back(node); + return node; + } + + template + void Network::finalize() 
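+  // finalize() is called once the full graph has been built: it allocates a single
+  // scratchpad shared by all nodes and then releases the raw weights.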
+ { + // Compute the size of the scratchpad + size_t scratchpadSize = 0; + for (const auto& node : nodes) + scratchpadSize = max(scratchpadSize, node->getScratchpadSize()); + + // Allocate the scratchpad + memory::dims scratchpadDims = { memory::dim(scratchpadSize) }; + memory::desc scratchpadDesc(scratchpadDims, memory::data_type::u8, memory::format_tag::x); + auto scratchpad = std::make_shared(scratchpadDesc, eng); + activationAllocBytes += scratchpadSize; + totalAllocBytes += scratchpadSize; + + // Set the scratchpad for the nodes + for (auto& node : nodes) + node->setScratchpad(scratchpad); + + // Free the weights + weightMap.clear(); + + // Print statistics + if (device->isVerbose(2)) + { + std::cout << "Activation bytes: " << activationAllocBytes << std::endl; + std::cout << "Scratchpad bytes: " << scratchpadSize << std::endl; + std::cout << "Total bytes : " << totalAllocBytes << std::endl; + } + } + + template class Network<8>; + template class Network<16>; + +} // namespace oidn diff --git a/oidn/core/network.h b/oidn/core/network.h new file mode 100644 index 0000000..81a9d1e --- /dev/null +++ b/oidn/core/network.h @@ -0,0 +1,158 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#include "common/tensor.h" +#include "image.h" +#include "node.h" +#include "input_reorder.h" +#include "output_reorder.h" +#include "transfer_function.h" + +#pragma once + +namespace oidn { + + // Progress state + struct Progress + { + ProgressMonitorFunction func; + void* userPtr; + int taskCount; + }; + + class Executable + { + public: + virtual ~Executable() {} + virtual void execute(const Progress& progress, int taskIndex) = 0; + }; + + template + class Network : public Executable + { + public: + Network(const Ref& device, const std::map& weightMap); + + void execute(const Progress& progress, int taskIndex) override; + + std::shared_ptr allocTensor(const memory::dims& dims, + memory::format_tag format = memory::format_tag::any, + void* data = nullptr); + + std::shared_ptr castTensor(const memory::dims& dims, + const std::shared_ptr& src, + size_t srcOffset = 0, + memory::format_tag format = memory::format_tag::any); + + std::shared_ptr castTensor(const memory::dims& dims, + const std::shared_ptr& src, + const memory::dims& srcOffset); + + void zeroTensor(const std::shared_ptr& dst); + + memory::dims getInputReorderDims(const memory::dims& srcDims, int alignment); + + template + std::shared_ptr addInputReorder(const Image& color, + const Image& albedo, + const Image& normal, + const std::shared_ptr& transferFunc, + int alignment, + const std::shared_ptr& userDst = nullptr); + + template + std::shared_ptr addOutputReorder(const std::shared_ptr& src, + const std::shared_ptr& transferFunc, + const Image& output); + + memory::dims getConvDims(const std::string& name, const memory::dims& srcDims); + std::shared_ptr addConv(const std::string& name, + const std::shared_ptr& src, + const std::shared_ptr& userDst = nullptr, + bool relu = true); + + memory::dims getPoolDims(const memory::dims& srcDims); + std::shared_ptr addPool(const std::shared_ptr& src, + const std::shared_ptr& userDst = nullptr); + + memory::dims getUpsampleDims(const memory::dims& srcDims); + std::shared_ptr addUpsample(const std::shared_ptr& src, + const std::shared_ptr& userDst = nullptr); + + memory::dims getConcatDims(const memory::dims& src1Dims, const memory::dims& src2Dims); + + std::shared_ptr addAutoexposure(const Image& color, + const std::shared_ptr& transferFunc); + + void finalize(); + + private: + Ref device; + engine eng; + stream sm; + std::vector> nodes; + std::map weightMap; + + // Memory allocation statistics + size_t activationAllocBytes = 0; // number of allocated activation bytes + size_t totalAllocBytes = 0; // total number of allocated bytes + }; + + + template + template + std::shared_ptr Network::addInputReorder(const Image& color, + const Image& albedo, + const Image& normal, + const std::shared_ptr& transferFunc, + int alignment, + const std::shared_ptr& userDst) + { + assert(color); + int inputC = 3; + if (albedo) inputC += 3; + if (normal) inputC += 3; + + memory::dims srcDims = {1, inputC, color.height, color.width}; + memory::dims dstDims = getInputReorderDims(srcDims, alignment); + + // Allocate padded memory + auto dst = userDst; + if (!dst) + dst = allocTensor(dstDims); + + // Push node + auto node = std::make_shared>(color, albedo, normal, dst, transferFunc); + nodes.push_back(node); + return node; + } + + template + template + std::shared_ptr Network::addOutputReorder(const std::shared_ptr& src, + const std::shared_ptr& transferFunc, + const Image& output) + { + memory::dims srcDims = 
getTensorDims(src); + assert(srcDims[1] == K); + + // Push node + auto node = std::make_shared>(src, output, transferFunc); + nodes.push_back(node); + return node; + } + +} // namespace oidn diff --git a/oidn/core/node.h b/oidn/core/node.h new file mode 100644 index 0000000..b9ffe90 --- /dev/null +++ b/oidn/core/node.h @@ -0,0 +1,142 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "common.h" +#include + +namespace oidn { + + class Node + { + public: + virtual ~Node() = default; + + virtual void execute(stream& sm) = 0; + + virtual std::shared_ptr getDst() const { return nullptr; } + + virtual size_t getScratchpadSize() const { return 0; } + virtual void setScratchpad(const std::shared_ptr& mem) {} + + virtual void setTile(int h1, int w1, int h2, int w2, int H, int W) + { + assert(0); // not supported + } + }; + + // Node wrapping an MKL-DNN primitive + class MklNode : public Node + { + private: + primitive prim; + std::unordered_map args; + std::shared_ptr scratchpad; + + public: + MklNode(const primitive& prim, const std::unordered_map& args) + : prim(prim), + args(args) + {} + + size_t getScratchpadSize() const override + { + const auto primDesc = prim.get_primitive_desc(); + const mkldnn_memory_desc_t* scratchpadDesc = mkldnn_primitive_desc_query_md(primDesc, mkldnn_query_scratchpad_md, 0); + if (scratchpadDesc == nullptr) + return 0; + return mkldnn_memory_desc_get_size(scratchpadDesc); + } + + void setScratchpad(const std::shared_ptr& mem) override + { + scratchpad = mem; + args.insert(std::make_pair(MKLDNN_ARG_SCRATCHPAD, *scratchpad)); + } + + void execute(stream& sm) override + { + prim.execute(sm, args); + } + }; + + // Convolution node + class ConvNode : public MklNode + { + private: + std::shared_ptr src; + std::shared_ptr weights; + std::shared_ptr bias; + std::shared_ptr dst; + + public: + ConvNode(const convolution_forward::primitive_desc& desc, + const std::shared_ptr& src, + const std::shared_ptr& weights, + const std::shared_ptr& bias, + const std::shared_ptr& dst) + : MklNode(convolution_forward(desc), + { { MKLDNN_ARG_SRC, *src }, + { MKLDNN_ARG_WEIGHTS, *weights }, + { MKLDNN_ARG_BIAS, *bias }, + { MKLDNN_ARG_DST, *dst } }), + src(src), weights(weights), bias(bias), dst(dst) + {} + + std::shared_ptr getDst() const override { return dst; } + }; + + // Pooling node + class PoolNode : public MklNode + { + private: + std::shared_ptr src; + std::shared_ptr dst; + + public: + PoolNode(const pooling_forward::primitive_desc& desc, + const std::shared_ptr& src, + const std::shared_ptr& dst) + : MklNode(pooling_forward(desc), + { { MKLDNN_ARG_SRC, *src }, + { MKLDNN_ARG_DST, *dst } }), + src(src), dst(dst) + {} + + std::shared_ptr getDst() const override { return dst; } + }; + + // Reorder node + class 
ReorderNode : public MklNode + { + private: + std::shared_ptr src; + std::shared_ptr dst; + + public: + ReorderNode(const std::shared_ptr& src, + const std::shared_ptr& dst) + : MklNode(reorder(reorder::primitive_desc(*src, *dst)), + { { MKLDNN_ARG_SRC, *src }, + { MKLDNN_ARG_DST, *dst } }), + src(src), dst(dst) + {} + + std::shared_ptr getDst() const override { return dst; } + }; + +} // namespace oidn diff --git a/oidn/core/output_reorder.h b/oidn/core/output_reorder.h new file mode 100644 index 0000000..7918d48 --- /dev/null +++ b/oidn/core/output_reorder.h @@ -0,0 +1,126 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "node.h" +#include "image.h" + +namespace oidn { + + // Output reorder node + template + class OutputReorderNode : public Node + { + private: + // Source + std::shared_ptr src; + const float* srcPtr; + int H1; + int W1; + + // Destination + Image output; + + // Tile + int h1Begin; + int w1Begin; + int h2Begin; + int w2Begin; + int H; + int W; + + std::shared_ptr transferFunc; + + public: + OutputReorderNode(const std::shared_ptr& src, + const Image& output, + const std::shared_ptr& transferFunc) + : src(src), + output(output), + h1Begin(0), w1Begin(0), + h2Begin(0), w2Begin(0), + H(output.height), W(output.width), + transferFunc(transferFunc) + { + const mkldnn_memory_desc_t& srcDesc = src->get_desc().data; + MAYBE_UNUSED(srcDesc); + assert(memory_desc_matches_tag(srcDesc, mkldnn_format_tag_t(BlockedFormat::nChwKc))); + assert(srcDesc.ndims == 4); + assert(srcDesc.data_type == memory::data_type::f32); + assert(srcDesc.dims[0] == 1); + // We assume output data is <= K OC + assert(srcDesc.dims[1] == K); + + srcPtr = (float*)src->get_data_handle(); + H1 = srcDesc.dims[2]; + W1 = srcDesc.dims[3]; + } + + void setTile(int h1, int w1, int h2, int w2, int H, int W) override + { + h1Begin = h1; + w1Begin = w1; + h2Begin = h2; + w2Begin = w2; + this->H = H; + this->W = W; + } + + void execute(stream& sm) override + { + assert(h1Begin + H <= H1); + assert(w1Begin + W <= W1); + assert(h2Begin + H <= output.height); + assert(w2Begin + W <= output.width); + + const int C1 = K; + + parallel_nd(H, [&](int h) + { + const int h1 = h + h1Begin; + const int h2 = h + h2Begin; + + for (int w = 0; w < W; ++w) + { + const int w1 = w + w1Begin; + const int w2 = w + w2Begin; + float* dstPtr_C = (float*)output.get(h2, w2); + + // Source is in nChwKc format. 
In this case C is 1 so this is really nhwc + const float* srcPtr_C = srcPtr + h1*W1*C1 + w1*C1; + + #pragma unroll + for (int i = 0; i < 3; ++i) + { + // Load the value + float x = srcPtr_C[i]; + + // The CNN output may contain negative values or even NaNs, so it must be sanitized + x = maxSafe(x, 0.f); + + // Apply the inverse transfer function + x = transferFunc->inverse(x); + + // Sanitize and store the final value + dstPtr_C[i] = max(x, 0.f); + } + } + }); + } + }; + +} // namespace oidn diff --git a/oidn/core/transfer_function.cpp b/oidn/core/transfer_function.cpp new file mode 100644 index 0000000..d13d823 --- /dev/null +++ b/oidn/core/transfer_function.cpp @@ -0,0 +1,93 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include "transfer_function.h" + +namespace oidn { + + const float HDRTransferFunction::xScale = 1.f / HDRTransferFunction::pqxForward(HDRTransferFunction::yMax * HDRTransferFunction::yScale); + + float AutoexposureNode::autoexposure(const Image& color) + { + assert(color.format == Format::Float3); + + constexpr float key = 0.18f; + constexpr float eps = 1e-8f; + constexpr int K = 16; // downsampling amount + + // Downsample the image to minimize sensitivity to noise + const int H = color.height; // original height + const int W = color.width; // original width + const int HK = (H + K/2) / K; // downsampled height + const int WK = (W + K/2) / K; // downsampled width + + // Compute the average log luminance of the downsampled image + using Sum = std::pair; + + Sum sum = + tbb::parallel_reduce( + tbb::blocked_range2d(0, HK, 0, WK), + Sum(0.f, 0), + [&](const tbb::blocked_range2d& r, Sum sum) -> Sum + { + // Iterate over blocks + for (int i = r.rows().begin(); i != r.rows().end(); ++i) + { + for (int j = r.cols().begin(); j != r.cols().end(); ++j) + { + // Compute the average luminance in the current block + const int beginH = int(ptrdiff_t(i) * H / HK); + const int beginW = int(ptrdiff_t(j) * W / WK); + const int endH = int(ptrdiff_t(i+1) * H / HK); + const int endW = int(ptrdiff_t(j+1) * W / WK); + + float L = 0.f; + + for (int h = beginH; h < endH; ++h) + { + for (int w = beginW; w < endW; ++w) + { + const float* rgb = (const float*)color.get(h, w); + + const float r = maxSafe(rgb[0], 0.f); + const float g = maxSafe(rgb[1], 0.f); + const float b = maxSafe(rgb[2], 0.f); + + L += luminance(r, g, b); + } + } + + L /= (endH - beginH) * (endW - beginW); + + // Accumulate the log luminance + if (L > eps) + { + sum.first += log2(L); + sum.second++; + } + } + } + + return sum; + }, + [](Sum a, Sum b) -> Sum { return Sum(a.first+b.first, a.second+b.second); }, + tbb::static_partitioner() + ); + + return (sum.second > 0) ? 
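+      // Exposure = key / geometric mean of the block luminances (exp2 of the
+      // average log2 luminance); default to 1 if the image is (nearly) black.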
(key / exp2(sum.first / float(sum.second))) : 1.f; + } + +} // namespace oidn diff --git a/oidn/core/transfer_function.h b/oidn/core/transfer_function.h new file mode 100644 index 0000000..65b931f --- /dev/null +++ b/oidn/core/transfer_function.h @@ -0,0 +1,166 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "image.h" +#include "node.h" + +namespace oidn { + + __forceinline float luminance(float r, float g, float b) + { + return 0.212671f * r + 0.715160f * g + 0.072169f * b; + } + + // Color transfer function + class TransferFunction + { + public: + virtual ~TransferFunction() = default; + + virtual float forward(float y) const = 0; + virtual float inverse(float x) const = 0; + }; + + // LDR transfer function: linear + class LDRLinearTransferFunction : public TransferFunction + { + public: + __forceinline float forward(float y) const override + { + return min(y, 1.f); + } + + __forceinline float inverse(float x) const override + { + return min(x, 1.f); + } + }; + + // LDR transfer function: sRGB curve + class LDRTransferFunction : public TransferFunction + { + public: + __forceinline float forward(float y) const override + { + return min(pow(y, 1.f/2.2f), 1.f); + } + + __forceinline float inverse(float x) const override + { + return min(pow(x, 2.2f), 1.f); + } + }; + + // HDR transfer function: PQX curve + // Compresses [0..65504] to [0..1] + class HDRTransferFunction : public TransferFunction + { + private: + static constexpr float m1 = 2610.f / 4096.f / 4.f; + static constexpr float m2 = 2523.f / 4096.f * 128.f; + static constexpr float c1 = 3424.f / 4096.f; + static constexpr float c2 = 2413.f / 4096.f * 32.f; + static constexpr float c3 = 2392.f / 4096.f * 32.f; + static constexpr float a = 3711.f / 4096.f / 8.f; + + static constexpr float yMax = 65504.f; + static constexpr float yScale = 100.f / 10000.f; + static const float xScale; + + float exposure; + float rcpExposure; + + public: + HDRTransferFunction(float exposure = 1.f) + { + setExposure(exposure); + } + + void setExposure(float exposure) + { + this->exposure = exposure; + this->rcpExposure = 1.f / exposure; + } + + __forceinline float forward(float y) const override + { + y *= exposure; + return pqxForward(y * yScale) * xScale; + } + + __forceinline float inverse(float x) const override + { + return pqxInverse(x * (1.f/xScale)) * (1.f/yScale) * rcpExposure; + } + + private: + static __forceinline float pqForward(float y) + { + const float yPow = pow(y, m1); + return pow((c1 + c2 * yPow) * rcp(1.f + c3 * yPow), m2); + } + + static __forceinline float pqxForward(float y) + { + if (y <= 1.f) + return pqForward(y); + else + return a * log(y) + 1.f; + } + + static __forceinline float pqInverse(float x) + { + const float xPow = pow(x, 1.f/m2); + return 
pow(max((xPow - c1) * rcp(c2 - c3 * xPow), 0.f), 1.f/m1); + } + + static __forceinline float pqxInverse(float x) + { + if (x <= 1.f) + return pqInverse(x); + else + return exp((x - 1.f) * (1.f/a)); + } + }; + + // Autoexposure node + class AutoexposureNode : public Node + { + private: + Image color; + std::shared_ptr transferFunc; + + public: + AutoexposureNode(const Image& color, + const std::shared_ptr& transferFunc) + : color(color), + transferFunc(transferFunc) + {} + + void execute(stream& sm) override + { + const float exposure = autoexposure(color); + //printf("exposure = %f\n", exposure); + transferFunc->setExposure(exposure); + } + + private: + static float autoexposure(const Image& color); + }; + +} // namespace oidn diff --git a/oidn/core/upsample.h b/oidn/core/upsample.h new file mode 100644 index 0000000..f6cace4 --- /dev/null +++ b/oidn/core/upsample.h @@ -0,0 +1,92 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "node.h" + +namespace oidn { + + // 2x2 nearest-neighbor upsampling node + template + class UpsampleNode : public Node + { + private: + std::shared_ptr src; + std::shared_ptr dst; + + public: + UpsampleNode(const std::shared_ptr& src, + const std::shared_ptr& dst) + : src(src), + dst(dst) + { + const mkldnn_memory_desc_t& srcDesc = src->get_desc().data; + const mkldnn_memory_desc_t& dstDesc = dst->get_desc().data; + MAYBE_UNUSED(srcDesc); + MAYBE_UNUSED(dstDesc); + assert(memory_desc_matches_tag(srcDesc, mkldnn_format_tag_t(BlockedFormat::nChwKc))); + assert(memory_desc_matches_tag(dstDesc, mkldnn_format_tag_t(BlockedFormat::nChwKc))); + assert(srcDesc.ndims == 4); + assert(dstDesc.ndims == 4); + assert(srcDesc.data_type == memory::data_type::f32); + assert(dstDesc.data_type == memory::data_type::f32); + assert(srcDesc.dims[0] == 1); + assert(dstDesc.dims[0] == 1); + // 2x2 upsampling + assert(dstDesc.dims[2] == srcDesc.dims[2] * 2); + assert(dstDesc.dims[3] == srcDesc.dims[3] * 2); + } + + void execute(stream& sm) override + { + const mkldnn_memory_desc_t& srcDesc = src->get_desc().data; + + const float* srcPtr = (float*)src->get_data_handle(); + float* dstPtr = (float*)dst->get_data_handle(); + + const int C = srcDesc.dims[1]; + const int H = srcDesc.dims[2]; + const int W = srcDesc.dims[3]; + const int CK = C / K; + + parallel_nd(CK, H, [&](int ck, int h) + { + const size_t offset = ck*H*W*K + h*W*K; + const float* srcPtr_line = srcPtr + offset; + float* dstPtr_line0 = dstPtr + offset * 4; + float* dstPtr_line1 = dstPtr_line0 + W*2*K; // next line + + for (int w = 0; w < W; ++w) + { + #pragma unroll + for (int k = 0; k < K; k += 4) + { + const __m128 m = _mm_load_ps(&srcPtr_line[w*K + k]); + + _mm_stream_ps(&dstPtr_line0[w*2*K + k], m); + _mm_stream_ps(&dstPtr_line0[w*2*K+K + k], m); + 
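+            // Replicate the same K-wide channel block into the two corresponding
+            // pixels of the next output row (2x2 nearest-neighbor upsampling).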
_mm_stream_ps(&dstPtr_line1[w*2*K + k], m); + _mm_stream_ps(&dstPtr_line1[w*2*K+K + k], m); + } + } + }); + } + + std::shared_ptr getDst() const override { return dst; } + }; + +} // namespace oidn diff --git a/oidn/core/weights_reorder.h b/oidn/core/weights_reorder.h new file mode 100644 index 0000000..6c5dacb --- /dev/null +++ b/oidn/core/weights_reorder.h @@ -0,0 +1,99 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include "node.h" + +namespace oidn { + + // Reorders weights from oihw to padded oihw format + template + class WeightsReorderNode : public Node + { + private: + std::shared_ptr src; + std::shared_ptr dst; + + public: + WeightsReorderNode(const std::shared_ptr& src, + const std::shared_ptr& dst) + : src(src), + dst(dst) + { + const mkldnn_memory_desc_t& srcDesc = src->get_desc().data; + const mkldnn_memory_desc_t& dstDesc = dst->get_desc().data; + MAYBE_UNUSED(srcDesc); + MAYBE_UNUSED(dstDesc); + assert(memory_desc_matches_tag(srcDesc, mkldnn_format_tag_t(memory::format_tag::oihw))); + assert(memory_desc_matches_tag(dstDesc, mkldnn_format_tag_t(memory::format_tag::oihw))); + assert(srcDesc.ndims == 4); + assert(dstDesc.ndims == 4); + assert(srcDesc.data_type == memory::data_type::f32); + assert(dstDesc.data_type == memory::data_type::f32); + assert(getPadded(srcDesc.dims[0]) == dstDesc.dims[0]); // OC + assert(getPadded(srcDesc.dims[1]) == dstDesc.dims[1]); // IC + assert(srcDesc.dims[2] == dstDesc.dims[2]); + assert(srcDesc.dims[3] == dstDesc.dims[3]); + } + + void execute(stream& sm) override + { + const mkldnn_memory_desc_t& srcDesc = src->get_desc().data; + const mkldnn_memory_desc_t& dstDesc = dst->get_desc().data; + + const float* srcPtr = (float*)src->get_data_handle(); + float* dstPtr = (float*)dst->get_data_handle(); + + const int OC1 = srcDesc.dims[0]; + const int OC2 = dstDesc.dims[0]; + const int IC1 = srcDesc.dims[1]; + const int IC2 = dstDesc.dims[1]; + const int H = dstDesc.dims[2]; + const int W = dstDesc.dims[3]; + + for (int oc = 0; oc < OC2; ++oc) + { + for (int ic = 0; ic < IC2; ++ic) + { + for (int h = 0; h < H; ++h) + { + for (int w = 0; w < W; ++w) + { + // Output is in oihw format + float* dstPtr_c = dstPtr + oc*IC2*H*W + ic*H*W + h*W + w; + + if (oc < OC1 && ic < IC1) + { + // Input is in oihw format + const float* srcPtr_c = srcPtr + oc*IC1*H*W + ic*H*W + h*W + w; + *dstPtr_c = *srcPtr_c; + } + else + { + // padding + *dstPtr_c = 0; + } + } + } + } + } + } + + std::shared_ptr getDst() const override { return dst; } + }; + +} // namespace oidn diff --git a/oidn/doc/.gitignore b/oidn/doc/.gitignore new file mode 100644 index 0000000..706faca --- /dev/null +++ b/oidn/doc/.gitignore @@ -0,0 +1,6 @@ +# Generated files and folders +changelog.md +www +tmp +__pycache__ +images diff --git 
a/oidn/doc/api.md b/oidn/doc/api.md new file mode 100644 index 0000000..2933b76 --- /dev/null +++ b/oidn/doc/api.md @@ -0,0 +1,586 @@ +Open Image Denoise API +====================== + +Open Image Denoise provides a C99 API (also compatible with C++) and a +C++11 wrapper API as well. For simplicity, this document mostly refers to the +C99 version of the API. + +The API is designed in an object-oriented manner, e.g. it contains device +objects (`OIDNDevice` type), buffer objects (`OIDNBuffer` type), and filter +objects (`OIDNFilter` type). All objects are reference-counted, and handles +can be released by calling the appropriate release function (e.g. +`oidnReleaseDevice`) or retained by incrementing the reference count (e.g. +`oidnRetainDevice`). + +An important aspect of objects is that setting their parameters do not have +an immediate effect (with a few exceptions). Instead, objects with updated +parameters are in an unusable state until the parameters get explicitly +committed to a given object. The commit semantic allows for batching up +multiple small changes, and specifies exactly when changes to objects will +occur. + +All API calls are thread-safe, but operations that use the same device will be +serialized, so the amount of API calls from different threads should be minimized. + +To have a quick overview of the C99 and C++11 APIs, see the following +simple example code snippets. + +### C99 API Example + + #include + ... + // Create an Open Image Denoise device + OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT); + oidnCommitDevice(device); + + // Create a denoising filter + OIDNFilter filter = oidnNewFilter(device, "RT"); // generic ray tracing filter + oidnSetSharedFilterImage(filter, "color", colorPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); + oidnSetSharedFilterImage(filter, "albedo", albedoPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); // optional + oidnSetSharedFilterImage(filter, "normal", normalPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); // optional + oidnSetSharedFilterImage(filter, "output", outputPtr, + OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0); + oidnSetFilter1b(filter, "hdr", true); // image is HDR + oidnCommitFilter(filter); + + // Filter the image + oidnExecuteFilter(filter); + + // Check for errors + const char* errorMessage; + if (oidnGetDeviceError(device, &errorMessage) != OIDN_ERROR_NONE) + printf("Error: %s\n", errorMessage); + + // Cleanup + oidnReleaseFilter(filter); + oidnReleaseDevice(device); + +### C++11 API Example + + #include + ... 
+ // Create an Open Image Denoise device + oidn::DeviceRef device = oidn::newDevice(); + device.commit(); + + // Create a denoising filter + oidn::FilterRef filter = device.newFilter("RT"); // generic ray tracing filter + filter.setImage("color", colorPtr, oidn::Format::Float3, width, height); + filter.setImage("albedo", albedoPtr, oidn::Format::Float3, width, height); // optional + filter.setImage("normal", normalPtr, oidn::Format::Float3, width, height); // optional + filter.setImage("output", outputPtr, oidn::Format::Float3, width, height); + filter.set("hdr", true); // image is HDR + filter.commit(); + + // Filter the image + filter.execute(); + + // Check for errors + const char* errorMessage; + if (device.getError(errorMessage) != oidn::Error::None) + std::cout << "Error: " << errorMessage << std::endl; + + +Device +------ + +Open Image Denoise supports a device concept, which allows different components +of the application to use the Open Image Denoise API without interfering with +each other. An application first needs to create a device with + + OIDNDevice oidnNewDevice(OIDNDeviceType type); + +where the `type` enumeration maps to a specific device implementation, which +can be one of the following: + +Name Description +------------------------ ------------------------------------------------------ +OIDN_DEVICE_TYPE_DEFAULT select the approximately fastest device +OIDN_DEVICE_TYPE_CPU CPU device (requires SSE4.1 support) +------------------------ ------------------------------------------------------ +: Supported device types, i.e., valid constants of type `OIDNDeviceType`. + +Once a device is created, you can call + + void oidnSetDevice1b(OIDNDevice device, const char* name, bool value); + void oidnSetDevice1i(OIDNDevice device, const char* name, int value); + bool oidnGetDevice1b(OIDNDevice device, const char* name); + int oidnGetDevice1i(OIDNDevice device, const char* name); + +to set and get parameter values on the device. Note that some parameters are +constants, thus trying to set them is an error. See the tables below for the +parameters supported by devices. + +--------- ------------ -------- ----------------------------------------------- +Type Name Default Description +--------- ------------ -------- ----------------------------------------------- +const int version combined version number (major.minor.patch) + with two decimal digits per component + +const int versionMajor major version number + +const int versionMinor minor version number + +const int versionPatch patch version number + +int verbose 0 verbosity level of the console output between + 0--3; when set to 0, no output is printed, when + set to a higher level more output is printed +--------- ------------ -------- ----------------------------------------------- +: Parameters supported by all devices. + +------ ------------ -------- -------------------------------------------------- +Type Name Default Description +------ ------------ -------- -------------------------------------------------- +int numThreads 0 maximum number of threads which Open Image Denoise + should use; 0 will set it automatically to get the + best performance + +bool setAffinity true bind software threads to hardware threads if set + to true (improves performance); false disables + binding +------ ------------ -------- -------------------------------------------------- +: Additional parameters supported only by CPU devices. 
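+
+For example, a device could be configured like this before use (a minimal
+sketch using only the functions and parameter names documented above; the
+chosen values are illustrative):
+
+    OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
+    oidnSetDevice1i(device, "numThreads", 4);     // 0 would select automatically
+    oidnSetDevice1b(device, "setAffinity", true); // keep thread binding enabled
+    oidnSetDevice1i(device, "verbose", 2);        // more console output
+    oidnCommitDevice(device);
+
+    int versionMajor = oidnGetDevice1i(device, "versionMajor"); // read-only constant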
+ +Note that the CPU device heavily relies on setting the thread affinities to +achieve optimal performance, so it is highly recommended to leave this option +enabled. However, this may interfere with the application if that also sets +the thread affinities, potentially causing performance degradation. In such +cases, the recommended solution is to either disable setting the affinities +in the application or in Open Image Denoise, or to always set/reset +the affinities before/after each parallel region in the application (e.g., +if using TBB, with `tbb::task_arena` and `tbb::task_scheduler_observer`). + +Once parameters are set on the created device, the device must be committed with + + void oidnCommitDevice(OIDNDevice device); + +This device can then be used to construct further objects, such as buffers and +filters. Note that a device can be committed only once during its lifetime. +Before the application exits, it should release all devices by invoking + + void oidnReleaseDevice(OIDNDevice device); + +Note that Open Image Denoise uses reference counting for all object types, so +this function decreases the reference count of the device, and if the count +reaches 0 the device will automatically get deleted. It is also possible to +increase the reference count by calling + + void oidnRetainDevice(OIDNDevice device); + +An application typically creates only a single device. If required differently, +it should only use a small number of devices at any given time. + +### Error Handling + +Each user thread has its own error code per device. If an error occurs when +calling an API function, this error code is set to the occurred error if it +stores no previous error. The currently stored error can be queried by the +application via + + OIDNError oidnGetDeviceError(OIDNDevice device, const char** outMessage); + +where `outMessage` can be a pointer to a C string which will be set to a more +descriptive error message, or it can be `NULL`. This function also clears the +error code, which assures that the returned error code is always the first +error occurred since the last invocation of `oidnGetDeviceError` on the current +thread. Note that the optionally returned error message string is valid only +until the next invocation of the function. + +Alternatively, the application can also register a callback function of type + + typedef void (*OIDNErrorFunction)(void* userPtr, OIDNError code, const char* message); + +via + + void oidnSetDeviceErrorFunction(OIDNDevice device, OIDNErrorFunction func, void* userPtr); + +to get notified when errors occur. Only a single callback function can be +registered per device, and further invocations overwrite the previously set +callback function, which do *not* require also calling the `oidnCommitDevice` +function. Passing `NULL` as function pointer disables the registered callback +function. When the registered callback function is invoked, it gets passed the +user-defined payload (`userPtr` argument as specified at registration time), +the error code (`code` argument) of the occurred error, as well as a string +(`message` argument) that further describes the error. The error code is always +set even if an error callback function is registered. It is recommended to +always set a error callback function, to detect all errors. + +When the device construction fails, `oidnNewDevice` returns `NULL` as device. +To detect the error code of a such failed device construction, pass `NULL` as +device to the `oidnGetDeviceError` function. 
For all other invocations of +`oidnGetDeviceError`, a proper device handle must be specified. + +The following errors are currently used by Open Image Denoise: + +------------------------------- ----------------------------------------------- +Name Description +------------------------------- ----------------------------------------------- +OIDN_ERROR_NONE no error occurred + +OIDN_ERROR_UNKNOWN an unknown error occurred + +OIDN_ERROR_INVALID_ARGUMENT an invalid argument was specified + +OIDN_ERROR_INVALID_OPERATION the operation is not allowed + +OIDN_ERROR_OUT_OF_MEMORY not enough memory to execute the operation + +OIDN_ERROR_UNSUPPORTED_HARDWARE the hardware (e.g., CPU) is not supported + +OIDN_ERROR_CANCELLED the operation was cancelled by the user +------------------------------- ------------------------------------------------ +: Possible error codes, i.e., valid constants of type `OIDNError`. + + +Buffer +------ + +Large data like images can be passed to Open Image Denoise either via pointers +to memory allocated and managed by the user (this is the recommended, often +easier and more efficient approach, if supported by the device) or by creating +buffer objects (supported by all devices). To create a new data buffer with +memory allocated and owned by the device, holding `byteSize` number of bytes, +use + + OIDNBuffer oidnNewBuffer(OIDNDevice device, size_t byteSize); + +The created buffer is bound to the specified device (`device` argument). The +specified number of bytes are allocated at buffer construction time and +deallocated when the buffer is destroyed. + +It is also possible to create a "shared" data buffer with memory allocated and +managed by the user with + + OIDNBuffer oidnNewSharedBuffer(OIDNDevice device, void* ptr, size_t byteSize); + +where `ptr` points to the user-managed memory and `byteSize` is its size in +bytes. At buffer construction time no buffer data is allocated, but the buffer +data provided by the user is used. The buffer data must remain valid for as +long as the buffer may be used, and the user is responsible to free the buffer +data when no longer required. + +Similar to device objects, buffer objects are also reference-counted and can be +retained and released by calling the following functions: + + void oidnRetainBuffer(OIDNBuffer buffer); + void oidnReleaseBuffer(OIDNBuffer buffer); + +Accessing the data stored in a buffer object is possible by mapping it into the +address space of the application using + + void* oidnMapBuffer(OIDNBuffer buffer, OIDNAccess access, size_t byteOffset, size_t byteSize) + +where `access` is the desired access mode of the mapped memory, `byteOffset` is +the offset to the beginning of the mapped memory region in bytes, and +`byteSize` is the number of bytes to map. The function returns a pointer to +the mapped buffer data. If the specified `byteSize` is 0, the maximum +available amount of memory will be mapped. The `access` argument must be one of +the access modes in the following table: + +Name Description +------------------------- ----------------------------------------------------- +OIDN_ACCESS_READ read-only access +OIDN_ACCESS_WRITE write-only access +OIDN_ACCESS_READ_WRITE read and write access +OIDN_ACCESS_WRITE_DISCARD write-only access but the previous contents will be discarded +------------------------- ----------------------------------------------------- +: Access modes for memory regions mapped with `oidnMapBuffer`, i.e., valid + constants of type `OIDNAccess`. 
+ +After accessing the mapped data in the buffer, the memory region must be +unmapped with + + void oidnUnmapBuffer(OIDNBuffer buffer, void* mappedPtr); + +where `mappedPtr` must be a pointer returned by a call to `oidnMapBuffer` for +the specified buffer. Any change to the mapped data is guaranteed to take +effect only after unmapping the memory region. + +### Data Format + +Buffers store opaque data and thus have no information about the type and +format of the data. Other objects, e.g. filters, typically require specifying +the format of the data stored in buffers or shared via pointers. This can be +done using the `OIDNFormat` enumeration type: + +Name Description +---------------------- -------------------------------------------------------- +OIDN_FORMAT_UNDEFINED undefined format +OIDN_FORMAT_FLOAT 32-bit single-precision floating point scalar +OIDN_FORMAT_FLOAT[234] ... and [234]-element vector +---------------------- -------------------------------------------------------- +: Supported data formats, i.e., valid constants of type `OIDNFormat`. + + +Filter +------ + +Filters are the main objects in Open Image Denoise that are responsible for the +actual denoising. The library ships with a collection of filters which are +optimized for different types of images and use cases. To create a filter +object, call + + OIDNFilter oidnNewFilter(OIDNDevice device, const char* type); + +where `type` is the name of the filter type to create. The supported filter +types are documented later in this section. Once created, filter objects can be +retained and released with + + void oidnRetainFilter(OIDNFilter filter); + void oidnReleaseFilter(OIDNFilter filter); + +After creating a filter, it needs to be set up by specifying the input and +output image buffers, and potentially setting other parameter values as well. + +To bind image buffers to the filter, you can use one of the following functions: + + void oidnSetFilterImage(OIDNFilter filter, const char* name, + OIDNBuffer buffer, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride); + + void oidnSetSharedFilterImage(OIDNFilter filter, const char* name, + void* ptr, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride); + +It is possible to specify either a data buffer object (`buffer` argument) with +the `oidnSetFilterImage` function, or directly a pointer to shared user-managed +data (`ptr` argument) with the `oidnSetSharedFilterImage` function. + +In both cases, you must also specify the name of the image parameter to set +(`name` argument, e.g. `"color"`, `"output"`), the pixel format (`format` +argument), the width and height of the image in number of pixels (`width` and +`height` arguments), the starting offset of the image data (`byteOffset` +argument), the pixel stride (`bytePixelStride` argument) and the row stride +(`byteRowStride` argument), in number of bytes. Note that the row stride must +be an integer multiple of the pixel stride. + +If the pixels and/or rows are stored contiguously (tightly packed without any +gaps), you can set `bytePixelStride` and/or `byteRowStride` to 0 to let the +library compute the actual strides automatically, as a convenience. 
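+
+As a brief sketch of the buffer path described above (the filter, the image
+size, and the `noisyColor` source pointer are illustrative), a color image can
+be stored in a device-owned buffer, filled by mapping it, and then bound to a
+filter:
+
+    // Allocate a device-owned buffer for a tightly packed float3 image
+    const size_t byteSize = width * height * 3 * sizeof(float);
+    OIDNBuffer colorBuf = oidnNewBuffer(device, byteSize);
+
+    // Map the buffer, copy the noisy color image into it, then unmap it
+    float* ptr = (float*)oidnMapBuffer(colorBuf, OIDN_ACCESS_WRITE_DISCARD, 0, byteSize);
+    memcpy(ptr, noisyColor, byteSize);
+    oidnUnmapBuffer(colorBuf, ptr);
+
+    // Bind the buffer to the filter; zero strides mean tightly packed data
+    oidnSetFilterImage(filter, "color", colorBuf, OIDN_FORMAT_FLOAT3,
+                       width, height, 0, 0, 0);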
+ +Filters may have parameters other than buffers as well, which you can set and +get using the following functions: + + void oidnSetFilter1b(OIDNFilter filter, const char* name, bool value); + void oidnSetFilter1i(OIDNFilter filter, const char* name, int value); + bool oidnGetFilter1b(OIDNFilter filter, const char* name); + int oidnGetFilter1i(OIDNFilter filter, const char* name); + +Filters support a progress monitor callback mechanism that can be used to report +progress of filter operations and to cancel them as well. Calling +`oidnSetFilterProgressMonitorFunction` registers a progress monitor callback +function (`func` argument) with payload (`userPtr` argument) for the specified +filter (`filter` argument): + + typedef bool (*OIDNProgressMonitorFunction)(void* userPtr, double n); + + void oidnSetFilterProgressMonitorFunction(OIDNFilter filter, + OIDNProgressMonitorFunction func, + void* userPtr); + +Only a single callback function can be registered per filter, and further +invocations overwrite the previously set callback function. Passing `NULL` as +function pointer disables the registered callback function. Once registered, +Open Image Denoise will invoke the callback function multiple times during +filter operations, by passing the payload as set at registration time +(`userPtr` argument), and a `double` in the range [0, 1] which estimates the +progress of the operation (`n` argument). When returning `true` from the +callback function, Open Image Denoise will continue the filter operation +normally. When returning `false`, the library will cancel the filter operation +with the `OIDN_ERROR_CANCELLED` error code. + +After setting all necessary parameters for the filter, the changes must be +commmitted by calling + + void oidnCommitFilter(OIDNFilter filter); + +The parameters can be updated after committing the filter, but it must be +re-committed for the changes to take effect. + +Finally, an image can be filtered by executing the filter with + + void oidnExecuteFilter(OIDNFilter filter); + +which will read the input image data from the specified buffers and produce the +denoised output image. + +In the following we describe the different filters that are currently +implemented in Open Image Denoise. + +### RT + +The `RT` (**r**ay **t**racing) filter is a generic ray tracing denoising filter +which is suitable for denoising images rendered with Monte Carlo ray tracing +methods like unidirectional and bidirectional path tracing. It supports depth +of field and motion blur as well, but it is *not* temporally stable. The filter +is based on a deep learning based denoising algorithm, and it aims to provide a +good balance between denoising performance and quality for a wide range of +samples per pixel. + +It accepts either a low dynamic range (LDR) or high dynamic range (HDR) color +image as input. Optionally, it also accepts auxiliary *feature* images, e.g. +albedo and normal, which improve the denoising quality, preserving more details +in the image. + +The `RT` filter has certain limitations regarding the supported input images. +Most notably, it cannot denoise images that were not rendered with ray tracing. +Another important limitation is related to anti-aliasing filters. Most +renderers use a high-quality pixel reconstruction filter instead of a trivial +box filter to minimize aliasing artifacts (e.g. Gaussian, Blackman-Harris). The +`RT` filter does support such pixel filters but only if implemented with +importance sampling. 
Weighted pixel sampling (sometimes called *splatting*) +introduces correlation between neighboring pixels, which causes the denoising +to fail (the noise will not be filtered), thus it is not supported. + +The filter can be created by passing `"RT"` to the `oidnNewFilter` function +as the filter type. The filter supports the following parameters: + +--------- -------- ----------- -------- --------------------------------------- +Type Format Name Default Description +--------- -------- ----------- -------- --------------------------------------- +Image float3 color input color image (LDR values in [0, 1] + or HDR values in [0, +∞)) + +Image float3 albedo input feature image containing the + albedo (values in [0, 1]) of the first + hit per pixel; *optional* + +Image float3 normal input feature image containing the + shading normal (world-space or + view-space, arbitrary length, values in + (−∞, +∞)) of the first hit per + pixel; *optional*, requires setting the + albedo image too + +Image float3 output output image; can be one of the input + images + +bool hdr false whether the color is HDR + +bool srgb false whether the color is encoded with the + sRGB (or 2.2 gamma) curve (LDR only) or + is linear; the output will be encoded + with the same curve + +int maxMemoryMB 6000 approximate maximum amount of memory to + use in megabytes (actual memory usage + may be higher); limiting memory usage + may cause slower denoising due to + internally splitting the image into + overlapping tiles, but cannot cause the + denoising to fail + +const int alignment when manually denoising the image in + tiles, the tile size and offsets should + be multiples of this amount of pixels + to avoid artifacts; note that manual + tiled denoising is supported *only* for + LDR images + +const int overlap when manually denoising the image in + tiles, the tiles should overlap by this + amount of pixels + +--------- -------- ----------- -------- --------------------------------------- +: Parameters supported by the `RT` filter. + +All specified images must have the same dimensions. + +![Example noisy color image rendered using unidirectional path tracing (512 +spp). *Scene by Evermotion.*][imgMazdaColor] + +![Example output image denoised using color and auxiliary feature images +(albedo and normal).][imgMazdaDenoised] + +Using auxiliary feature images like albedo and normal helps preserving fine +details and textures in the image thus can significantly improve denoising +quality. These images should typically contain feature values for the first +hit (i.e. the surface which is directly visible) per pixel. This works well for +most surfaces but does not provide any benefits for reflections and objects +visible through transparent surfaces (compared to just using the color as +input). However, in certain cases this issue can be fixed by storing feature +values for a subsequent hit (i.e. the reflection and/or refraction) instead of +the first hit. For example, it usually works well to follow perfect specular +(*delta*) paths and store features for the first diffuse or glossy surface hit +instead (e.g. for perfect specular dielectrics and mirrors). This can greatly +improve the quality of reflections and transmission. We will describe this +approach in more detail in the following subsections. + +The auxiliary feature images should be as noise-free as possible. It is not a +strict requirement but too much noise in the feature images may cause residual +noise in the output. 
Also, all feature images should use the same pixel
reconstruction filter as the color image. Using a properly anti-aliased color
image but aliased albedo or normal images will likely introduce artifacts
around edges.

#### Albedo

The albedo image is the feature image that usually provides the biggest
quality improvement. It should contain the approximate color of the surfaces
independent of illumination and viewing angle.

For simple matte surfaces this means using the diffuse color/texture as the
albedo. For other, more complex surfaces it is not always obvious what is the
best way to compute the albedo, but the denoising filter is flexible to a
certain extent and works well with differently computed albedos. Thus it is
not necessary to compute the strict, exact albedo values, but they must always
be between 0 and 1.

![Example albedo image obtained using the first hit. Note that the albedos of
all transparent surfaces are 1.][imgMazdaAlbedoFirstHit]

![Example albedo image obtained using the first diffuse or glossy (non-delta)
hit. Note that the albedos of perfect specular (delta) transparent surfaces
are computed as the Fresnel blend of the reflected and transmitted
albedos.][imgMazdaAlbedoNonDeltaHit]

For metallic surfaces the albedo should be either the reflectivity at normal
incidence (e.g. from the artist friendly metallic Fresnel model) or the
average reflectivity; or if these are constant (not textured) or unknown, the
albedo can be simply 1 as well.

The albedo for dielectric surfaces (e.g. glass) should be either 1 or, if the
surface is perfect specular (i.e. has a delta BSDF), the Fresnel blend of the
reflected and transmitted albedos (as previously discussed). The latter
usually works better, but *only* if it does not introduce too much additional
noise due to random sampling. Thus we recommend splitting the path into a
reflected and a transmitted path at the first hit, and perhaps falling back to
an albedo of 1 for subsequent dielectric hits, to avoid noise. The reflected
albedo in itself can be used for mirror-like surfaces as well.

The albedo for layered surfaces can be computed as the weighted sum of the
albedos of the individual layers. Non-absorbing clear coat layers can be
simply ignored (or the albedo of the perfect specular reflection can be used
as well), but absorption should be taken into account.

#### Normal

The normal image should contain the shading normals of the surfaces either in
world-space or view-space. It is recommended to include normal maps to
preserve as much detail as possible.

Just like any other input image, the normal image should be anti-aliased (i.e.
by accumulating the normalized normals per pixel). The final accumulated
normals do not have to be normalized but must be in a range symmetric about 0
(i.e. normals mapped to [0, 1] are *not* acceptable and must be remapped to
e.g. [−1, 1]).

Similar to the albedo, the normal can be stored for either the first or a
subsequent hit (if the first hit has a perfect specular/delta BSDF).

![Example normal image obtained using the first hit (the values are actually
in [−1, 1] but were mapped to [0, 1] for illustration
purposes).][imgMazdaNormalFirstHit]

![Example normal image obtained using the first diffuse or glossy (non-delta)
hit.
Note that the normals of perfect specular (delta) transparent surfaces +are computed as the Fresnel blend of the reflected and transmitted +normals.][imgMazdaNormalNonDeltaHit] diff --git a/oidn/doc/compilation.md b/oidn/doc/compilation.md new file mode 100644 index 0000000..acd0424 --- /dev/null +++ b/oidn/doc/compilation.md @@ -0,0 +1,149 @@ +Building Open Image Denoise from Source +======================================= + +The latest Open Image Denoise sources are always available at the +[Open Image Denoise GitHub repository](http://github.com/OpenImageDenoise/oidn). +The default `master` branch should always point to the latest tested bugfix +release. + +Prerequisites +------------- + +Open Image Denoise currently supports 64-bit Linux, Windows, and macOS +operating systems. In addition, before you can build Open Image Denoise +you need the following prerequisites: + +- You can clone the latest Open Image Denoise sources via: + + git clone --recursive https://github.com/OpenImageDenoise/oidn.git + +- To build Open Image Denoise you need [CMake](http://www.cmake.org) 3.1 or + later, a C++11 compiler (we recommend using Clang, but also support GCC, + Microsoft Visual Studio 2015 or later, and + [Intel® C++ Compiler](https://software.intel.com/en-us/c-compilers) 17.0 or + later), and Python 2.7 or later. +- Additionally you require a copy of [Intel® Threading Building + Blocks](https://www.threadingbuildingblocks.org/) (TBB) 2017 or later. + +Depending on your Linux distribution you can install these dependencies +using `yum` or `apt-get`. Some of these packages might already be installed or +might have slightly different names. + +Type the following to install the dependencies using `yum`: + + sudo yum install cmake + sudo yum install tbb-devel + +Type the following to install the dependencies using `apt-get`: + + sudo apt-get install cmake-curses-gui + sudo apt-get install libtbb-dev + +Under macOS these dependencies can be installed using +[MacPorts](http://www.macports.org/): + + sudo port install cmake tbb + +Under Windows please directly use the appropriate installers or packages for +[CMake](https://cmake.org/download/), +[Python](https://www.python.org/downloads/), +and [TBB](https://github.com/01org/tbb/releases). + + +Compiling Open Image Denoise on Linux/macOS +------------------------------------------- + +Assuming the above prerequisites are all fulfilled, building Open Image Denoise +through CMake is easy: + +- Create a build directory, and go into it + + mkdir oidn/build + cd oidn/build + + (We do recommend having separate build directories for different + configurations such as release, debug, etc.). + +- The compiler CMake will use by default will be whatever the `CC` and + `CXX` environment variables point to. Should you want to specify a + different compiler, run cmake manually while specifying the desired + compiler. The default compiler on most Linux machines is `gcc`, but + it can be pointed to `clang` instead by executing the following: + + cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang .. + + CMake will now use Clang instead of GCC. If you are OK with using + the default compiler on your system, then simply skip this step. + Note that the compiler variables cannot be changed after the first + `cmake` or `ccmake` run. + +- Open the CMake configuration dialog + + ccmake .. + +- Make sure to properly set the build mode and enable the components you + need, etc.; then type 'c'onfigure and 'g'enerate. 
When back on the + command prompt, build it using + + make + +- You should now have `libOpenImageDenoise.so` as well as a set of example + applications. + + +Compiling Open Image Denoise on Windows +--------------------------------------- + +On Windows using the CMake GUI (`cmake-gui.exe`) is the most convenient way to +configure Open Image Denoise and to create the Visual Studio solution files: + +- Browse to the Open Image Denoise sources and specify a build directory (if + it does not exist yet CMake will create it). + +- Click "Configure" and select as generator the Visual Studio version you + have (Open Image Denoise needs Visual Studio 14 2015 or newer), for Win64 + (32-bit builds are not supported), e.g., "Visual Studio 15 2017 Win64". + +- If the configuration fails because some dependencies could not be found + then follow the instructions given in the error message, e.g., set the + variable `TBB_ROOT` to the folder where TBB was installed. + +- Optionally change the default build options, and then click "Generate" to + create the solution and project files in the build directory. + +- Open the generated `OpenImageDenoise.sln` in Visual Studio, select the + build configuration and compile the project. + + +Alternatively, Open Image Denoise can also be built without any GUI, entirely on the +console. In the Visual Studio command prompt type: + + cd path\to\oidn + mkdir build + cd build + cmake -G "Visual Studio 15 2017 Win64" [-D VARIABLE=value] .. + cmake --build . --config Release + +Use `-D` to set variables for CMake, e.g., the path to TBB with "`-D +TBB_ROOT=\path\to\tbb`". + + +CMake Configuration +------------------- + +The default CMake configuration in the configuration dialog should be appropriate +for most usages. The following list describes the options that can be configured +in CMake: + +- `CMAKE_BUILD_TYPE`: Can be used to switch between Debug mode + (Debug), Release mode (Release) (default), and Release mode with + enabled assertions and debug symbols (RelWithDebInfo). + +- `OIDN_STATIC_LIB`: Builds Open Image Denoise as a static library (OFF by + default). CMake 3.13.0 or later is required to enable this option. When using + the statically compiled Open Image Denoise library, you either have to use + the generated CMake configuration files (recommended), or you have to + manually define `OIDN_STATIC_LIB` before including the library headers in your + application. + +- `TBB_ROOT`: The path to the TBB installation (autodetected by default). diff --git a/oidn/doc/documentation.md b/oidn/doc/documentation.md new file mode 100644 index 0000000..5f1ea4c --- /dev/null +++ b/oidn/doc/documentation.md @@ -0,0 +1,7 @@ +Documentation +============= + +The following [API documentation][OIDNReadme] of Open Image Denoise can also be +found as a [pdf document][OIDNReadme]. + + diff --git a/oidn/doc/downloads.md b/oidn/doc/downloads.md new file mode 100644 index 0000000..eedbc38 --- /dev/null +++ b/oidn/doc/downloads.md @@ -0,0 +1,31 @@ +Download Precompiled Open Image Denoise Binary Packages +======================================================= + +Prerequisites +------------- + +Your CPU must support at least SSE4.1 to run Open Image Denoise, and you need +a 64-bit operating system as well. The TGZ/ZIP packages contain most needed +3rd party dependencies. 
+ +Packages +-------- + +For Linux we provide Open Image Denoise precompiled for 64-bit as a TGZ file: + +[oidn-.x86_64.linux.tar.gz](https://github.com/OpenImageDenoise/oidn/releases/download/v/oidn-.x86_64.linux.tar.gz) + +For macOS we provide Open Image Denoise as a TGZ file: + +[oidn-.x86_64.macos.tar.gz](https://github.com/OpenImageDenoise/oidn/releases/download/v/oidn-.x86_64.macos.tar.gz) + +For Windows we provide Open Image Denoise binaries precompiled for 64-bit as a ZIP archive: + +[oidn-.x64.vc14.windows.zip](https://github.com/OpenImageDenoise/oidn/releases/download/v/oidn-.x64.vc14.windows.zip) + +The source code of the latest Open Image Denoise version can be downloaded here: + +[oidn-.src.zip](https://github.com/OpenImageDenoise/oidn/releases/download/v/oidn-.src.zip) +[oidn-.src.tar.gz](https://github.com/OpenImageDenoise/oidn/releases/download/v/oidn-.src.tar.gz) + +You can also access [old Open Image Denoise releases](https://github.com/OpenImageDenoise/oidn/releases). diff --git a/oidn/doc/examples.md b/oidn/doc/examples.md new file mode 100644 index 0000000..60c7ddd --- /dev/null +++ b/oidn/doc/examples.md @@ -0,0 +1,19 @@ +Examples +======== + +Denoise +------- + +A minimal working example demonstrating how to use Open Image Denoise can be +found at `examples/denoise.cpp`, which uses the C++11 convenience wrappers of +the C99 API. + +This example is a simple command-line application that denoises the provided +image, which can optionally have auxiliary feature images as well (e.g. albedo +and normal). The images must be stored in the [Portable +FloatMap](http://www.pauldebevec.com/Research/HDR/PFM/) (PFM) format, and the +color values must be encoded in little-endian format. + +Running `./denoise` without any arguments will bring up a list of command line +options. + diff --git a/oidn/doc/filter-latex.py b/oidn/doc/filter-latex.py new file mode 100644 index 0000000..5072f9d --- /dev/null +++ b/oidn/doc/filter-latex.py @@ -0,0 +1,73 @@ +# 1. convert tables to use 'tabu' +# 2. 
always add hypertargets, before headings, to workaround issue #2719 +# Based on Wagner Macedo's filter.py posted at +# https://groups.google.com/forum/#!msg/pandoc-discuss/RUC-tuu_qf0/h-H3RRVt1coJ +import pandocfilters as pf + +def latex(s): + return pf.RawBlock('latex', s) + +def inlatex(s): + return pf.RawInline('latex', s) + +def tbl_caption(s): + return pf.Para([inlatex(r'\caption{')] + s + [inlatex(r'}')]) + +def tbl_alignment(a, w): + aligns = { + "AlignDefault": 'l', + "AlignLeft": 'l', + "AlignCenter": 'c', + "AlignRight": 'r', + } + s = ''; + for i in range(len(a)): + s += 'X[%.3f,' % -w[i] + aligns[a[i]['t']] + ']' + return s; + +def tbl_headers(s): + result = s[0][0]['c'][:] + for i in range(1, len(s)): + result.append(inlatex(' & ')) + result.extend(s[i][0]['c']) + result.append(inlatex(r'\\' '\n')) + return pf.Para(result) + +def tbl_contents(s): + result = [] + for row in s: + para = [] + for col in row: + if col: + para.extend(col[0]['c']) + para.append(inlatex(' & ')) + result.extend(para) + result[-1] = inlatex(r'\\' '\n') + return pf.Para(result) + +def do_filter(k, v, f, m): + if k == "Table": + w = v[2] + if sum(w) == 0: + w = [1 for e in w] + wd = '' + ha = r'\centering' + else: + wd = '*' + ha = r'\raggedright' + return [latex(r'\begin{table'+wd+'}[!h]'), + tbl_caption(v[0]), + latex(ha), + latex(r'\begin{tabu} spread 0pt {' + tbl_alignment(v[1], w) + '}'), + latex(r'\toprule'), + tbl_headers(v[3]), + latex(r'\midrule'), + tbl_contents(v[4]), + latex(r'\bottomrule' '\n' r'\end{tabu}'), + latex(r'\end{table'+wd+'}')] + if k == "Header": + return [latex(r'\hypertarget{' + v[1][0] + r'}{}'), + pf.Header(v[0], v[1], v[2])] + +if __name__ == "__main__": + pf.toJSONFilter(do_filter) diff --git a/oidn/doc/gallery.md b/oidn/doc/gallery.md new file mode 100644 index 0000000..4fdf835 --- /dev/null +++ b/oidn/doc/gallery.md @@ -0,0 +1,96 @@ +Open Image Denoise Gallery +========================== + +This page contains a few sample screenshots of different renderings denoised +with Open Image Denoise, using the color, albedo, and normal buffers as inputs. +The original noisy images are also shown. Hover over an image (or tap on it if +you have a touchscreen) to move the slider between the original and denoised +versions. + +If *you* have created any notable images using Open Image Denoise and would +like to share them on this page, please [send us an +email](mailto:openimagedenoise@googlegroups.com). + +Moana Island Scene +------------------ + +Rendered at 16 spp with [Intel® OSPRay](http://www.ospray.org): + +
+Denoised +
Original
+
+ +*[Publicly available](https://www.technology.disneyanimation.com/islandscene) dataset courtesy of Walt Disney Animation Studios.* + +Amazon Lumberyard Bistro +------------------------ + +Rendered at 64 spp: + +
+Denoised +
Original
+
+ +
+Denoised +
Original
+
+ +*Scene created by Amazon Lumberyard, released publicly in the NVIDIA [Open Research Content Archive +(ORCA)](http://developer.nvidia.com/orca/amazon-lumberyard-bistro) collection, downloaded from +[Morgan McGuire's Computer Graphics Archive](https://casual-effects.com/data).* + +Crytek Sponza +------------- + +Rendered at 16 spp: + +
+Denoised +
Original
+
+ +*Scene courtesy of Frank Meinl, downloaded from [Morgan McGuire's Computer Graphics Archive](https://casual-effects.com/data).* + +Mazda +----- + +Rendered at 64 spp: + +
+Denoised +
Original
+
+ +*Scene by Evermotion.* + +Villa +----- + +Rendered at 64 spp: + +
+Denoised +
Original
+
+ +
+Denoised +
Original
+
+ +*Scene by Evermotion.* + +Art Deco +-------- + +Rendered at 2048 spp: + +
+Denoised +
Original
+
+ +*Scene by Evermotion.* diff --git a/oidn/doc/images.md b/oidn/doc/images.md new file mode 100644 index 0000000..63f1dcf --- /dev/null +++ b/oidn/doc/images.md @@ -0,0 +1,6 @@ +[imgMazdaColor]: mazda_512spp_color.jpg { width=90% } +[imgMazdaDenoised]: mazda_512spp_oidn.jpg { width=90% } +[imgMazdaAlbedoFirstHit]: mazda_512spp_albedo_firsthit.jpg { width=90% } +[imgMazdaAlbedoNonDeltaHit]: mazda_512spp_albedo_nondeltahit.jpg { width=90% } +[imgMazdaNormalFirstHit]: mazda_512spp_normal_firsthit.jpg { width=90% } +[imgMazdaNormalNonDeltaHit]: mazda_512spp_normal_nondeltahit.jpg { width=90% } diff --git a/oidn/doc/legal.md b/oidn/doc/legal.md new file mode 100644 index 0000000..8df4422 --- /dev/null +++ b/oidn/doc/legal.md @@ -0,0 +1,25 @@ +Disclaimer and Legal Information +================================ + +© 2018-2019 Intel Corporation + +[Privacy Notice](https://www.intel.com/privacy) + +Intel, the Intel logo, Xeon, Intel Xeon Phi, and Intel Core are +trademarks of Intel Corporation in the U.S. and/or other countries. +*Other names and brands may be claimed as the property of others. + + +Optimization Notice: Intel's compilers may or may not optimize to the +same degree for non-Intel microprocessors for optimizations that are not +unique to Intel microprocessors. These optimizations include SSE2, SSE3, +and SSSE3 instruction sets and other optimizations. Intel does not +guarantee the availability, functionality, or effectiveness of any +optimization on microprocessors not manufactured by Intel. +Microprocessor-dependent optimizations in this product are intended for +use with Intel microprocessors. Certain optimizations not specific to +Intel microarchitecture are reserved for Intel microprocessors. Please +refer to the applicable product User and Reference Guides for more +information regarding the specific instruction sets covered by this +notice. +Notice Revision #20110804 diff --git a/oidn/doc/links.md b/oidn/doc/links.md new file mode 100644 index 0000000..1821003 --- /dev/null +++ b/oidn/doc/links.md @@ -0,0 +1,5 @@ + +[news/updates]: https://openimagedenoise.github.io/news.html +[getting OIDN]: https://openimagedenoise.github.io/downloads.html +[OIDNReadme]: https://github.com/OpenImageDenoise/oidn/blob/master/readme.pdf "Open Image Denoise Documentation" + diff --git a/oidn/doc/news.md b/oidn/doc/news.md new file mode 100644 index 0000000..0d2301d --- /dev/null +++ b/oidn/doc/news.md @@ -0,0 +1,26 @@ +News, Updates, and Announcements +================================ + +May 9, 2019: Version v0.9.0 now released on GitHub +-------------------------------------------------- + +New release version 0.9.0 is now available on the [Open Image Denoise +GitHub page](https://github.com/OpenImageDenoise/oidn/releases/v0.9.0). + +Mar 25, 2019: Version v0.8.2 now released on GitHub +--------------------------------------------------- + +New release version 0.8.2 is now available on the [Open Image Denoise +GitHub page](https://github.com/OpenImageDenoise/oidn/releases/v0.8.2). + +Feb 3, 2019: Version v0.8.1 now released on GitHub +-------------------------------------------------- + +New release version 0.8.1 is now available on the [Open Image Denoise +GitHub page](https://github.com/OpenImageDenoise/oidn/releases/v0.8.1). + +Jan 29, 2019: Version v0.8.0 now released on GitHub +--------------------------------------------------- + +Initial beta release version 0.8.0 is now available on the [Open Image Denoise +GitHub page](https://github.com/OpenImageDenoise/oidn/releases/v0.8.0). 
diff --git a/oidn/doc/overview.md b/oidn/doc/overview.md
new file mode 100644
index 0000000..abb88f3
--- /dev/null
+++ b/oidn/doc/overview.md
@@ -0,0 +1,59 @@
Open Image Denoise Overview
===========================

Intel® Open Image Denoise is an open source library of high-performance,
high-quality denoising filters for images rendered with ray tracing.
Open Image Denoise is part of the
[Intel Rendering Framework](https://software.intel.com/en-us/rendering-framework)
and is released under the permissive
[Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0).

The purpose of Open Image Denoise is to provide an open, high-quality,
efficient, and easy-to-use denoising library that allows one to significantly
reduce rendering times in ray tracing based rendering applications. It filters
out the Monte Carlo noise inherent to stochastic ray tracing methods like path
tracing, reducing the number of samples per pixel required by even multiple
orders of magnitude (depending on the desired closeness to the ground truth).
A simple but flexible C/C++ API ensures that the library can be easily
integrated into most existing or new rendering solutions.

At the heart of the Open Image Denoise library is an efficient deep learning
based denoising filter, which was trained to handle a wide range of samples
per pixel (spp), from 1 spp to almost fully converged. Thus it is suitable for
both preview and final-frame rendering. The filters can denoise images either
using only the noisy color (*beauty*) buffer, or, to preserve as much detail
as possible, can optionally utilize auxiliary feature buffers as well (e.g.
albedo, normal). Such buffers are supported by most renderers as arbitrary
output variables (AOVs) or can usually be implemented with little effort.

Open Image Denoise supports Intel® 64 architecture based CPUs and compatible
architectures, and runs on anything from laptops, to workstations, to compute
nodes in HPC systems. It is efficient enough to be suitable not only for
offline rendering, but, depending on the hardware used, also for interactive
ray tracing.

Open Image Denoise internally builds on top of
[Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN)](https://github.com/intel/mkl-dnn),
and automatically exploits modern instruction sets like Intel SSE4, AVX2, and
AVX-512 to achieve high denoising performance. A CPU with support for at least
SSE4.1 is required to run Open Image Denoise.


Support and Contact
-------------------

Open Image Denoise is under active development, and though we do our best to
guarantee stable release versions, a certain number of bugs, as-yet-missing
features, inconsistencies, or any other issues are still possible. Should you
find any such issues please report them immediately via the
[Open Image Denoise GitHub Issue Tracker](https://github.com/OpenImageDenoise/oidn/issues)
(or, if you should happen to have a fix for it, you can also send us a pull
request); for missing features please contact us via email at
openimagedenoise@googlegroups.com.

For recent news, updates, and announcements, please see our complete
[news/updates] page.

Join our [mailing list](https://groups.google.com/d/forum/openimagedenoise/) to
receive release announcements and major news regarding Open Image Denoise.
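To make the C/C++ API mentioned above concrete, here is a minimal sketch of
driving the library through its C++11 convenience wrappers, mirroring the
calls used by the `denoise` example application later in this diff. The header
path and the `denoiseImage` helper are illustrative assumptions; allocating
the image buffers and copying pixel data back from the renderer are left to
the application.

```cpp
#include <OpenImageDenoise/oidn.hpp> // C++ wrapper header (assumed path)

// Hypothetical helper: denoise a 3-channel float HDR image stored as
// interleaved RGB in colorPtr, writing the result to outputPtr.
// Both buffers must hold width * height * 3 floats.
void denoiseImage(float* colorPtr, float* outputPtr, int width, int height)
{
  // Create and commit a device (the CPU implementation).
  oidn::DeviceRef device = oidn::newDevice();
  device.commit();

  // Create the generic ray tracing ("RT") filter and attach the images.
  oidn::FilterRef filter = device.newFilter("RT");
  filter.setImage("color",  colorPtr,  oidn::Format::Float3, width, height);
  filter.setImage("output", outputPtr, oidn::Format::Float3, width, height);
  filter.set("hdr", true); // the color image contains HDR values
  filter.commit();

  // Execute the filter: reads the input buffer, writes the denoised output.
  filter.execute();
}
```

Optional albedo and normal feature images can be attached with additional
`setImage("albedo", ...)` and `setImage("normal", ...)` calls before
committing the filter, as shown in the `denoise` example application.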
+ diff --git a/oidn/doc/preamble.tex b/oidn/doc/preamble.tex new file mode 100644 index 0000000..f5bf147 --- /dev/null +++ b/oidn/doc/preamble.tex @@ -0,0 +1,191 @@ +\usepackage{polyglossia} +\setdefaultlanguage{english} + +\usepackage{amssymb,amsmath} +\usepackage{nicefrac} +\usepackage{ifxetex,ifluatex} +\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex + \usepackage[T1]{fontenc} + \usepackage[utf8]{inputenc} +\else % if luatex or xelatex + \ifxetex +% \usepackage{mathspec} + \usepackage[no-sscript]{xltxtra} + \usepackage{xunicode} + \else + \usepackage{fontspec} + \fi + \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase} + \newcommand{\euro}{€} +\fi +% use upquote if available, for straight quotes in verbatim environments +\IfFileExists{upquote.sty}{\usepackage{upquote}}{} +% use microtype if available +\IfFileExists{microtype.sty}{% +\usepackage{microtype} +\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts +}{} +\PassOptionsToPackage{hyphens}{url} % url is loaded by hyperref +\usepackage{tabu,booktabs} +\tabulinesep=3pt + +\usepackage{graphicx} +\usepackage{color} +\usepackage{fancyvrb} +\newcommand{\VerbBar}{|} +\newcommand{\VERB}{\Verb[commandchars=\\\{\}]} +\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}} +% fix issue with linebreaks and letter spacing in non-cpp blocks +\DefineVerbatimEnvironment{verbatim}{Verbatim}{} +% Add ',fontsize=\small' for more characters per line +\newenvironment{Shaded}{}{} +\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{#1}}} +\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{#1}} +\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{#1}} +\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{#1}} +\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{#1}} +\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.53,0.00,0.00}{#1}} +\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{#1}} +\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{#1}} +\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{#1}} +\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{#1}} +\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.73,0.40,0.53}{#1}} +\newcommand{\ImportTok}[1]{#1} +\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{#1}}} +\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.73,0.13,0.13}{\textit{#1}}} +\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{#1}}}} +\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{#1}}}} +\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{#1}} +\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{#1}} +\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.10,0.09,0.49}{#1}} +\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{#1}}} +\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.40,0.40,0.40}{#1}} +\newcommand{\BuiltInTok}[1]{#1} +\newcommand{\ExtensionTok}[1]{#1} +\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.74,0.48,0.00}{#1}} +\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.49,0.56,0.16}{#1}} +\newcommand{\RegionMarkerTok}[1]{#1} +\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{#1}}}} +\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{#1}}}} +\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{#1}}} +\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{#1}}} +\newcommand{\NormalTok}[1]{#1} + 
+\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\makeatletter +\def\maxwidth{\ifdim\Gin@nat@width>\columnwidth\columnwidth\else\Gin@nat@width\fi} +\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi} +\def\fps@figure{htp}% set default figure placement +\makeatother +% Scale images if necessary, so that they will not overflow the page +% margins by default, and it is still possible to overwrite the defaults +% using explicit options in \includegraphics[width, height, ...]{} +\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio} + +\ifxetex + \usepackage[setpagesize=false, % page size defined by xetex + unicode=false, % unicode breaks when used with xetex + xetex]{hyperref} +\else + \usepackage[unicode=true]{hyperref} +\fi + +% read version into \oidnversion +\newread\versionfile +\openin\versionfile=tmp/version +\read\versionfile to\oidnversion +\closein\versionfile + +\hypersetup{breaklinks=true, + bookmarks=true, + pdfauthor={Intel Corporation}, + pdftitle={Open Image Denoise \oidnversion}, + colorlinks=true, + citecolor=blue, + urlcolor=blue, + linkcolor=blue, + pdfborder={0 0 0}} + +\copyrightyears{2018--2019} +\trademarkacknowledgment{% +Intel, the Intel logo, Xeon, Intel Xeon Phi, and Intel Core are +trademarks of Intel Corporation in the U.S. and/or other countries. +} +\ftcoptimizationnotice + +% no hyphenation (e.g. for trademarks) +\hyphenation{Intel Xeon} + + +% fix missing unicode chars in used font +\catcode`\⇐\active +\def⇐{\ensuremath{\Leftarrow}} + +\catcode`\⇒\active +\def⇒{\ensuremath{\Rightarrow}} + +\catcode`\←\active +\def←{\ensuremath{\leftarrow}} + +\catcode`\→\active +\def→{\ensuremath{\rightarrow}} + +\catcode`\∞\active +\def∞{\ensuremath{\infty}} + +\catcode`\½\active +\def½{\nicefrac12} + +\catcode`\⅓\active +\def⅓{\nicefrac13} + +\catcode`\⅔\active +\def⅔{\nicefrac23} + +\catcode`\¼\active +\def¼{\nicefrac14} + +\catcode`\¾\active +\def¾{\nicefrac34} + +\catcode`\∙\active +\def∙{\ensuremath{\cdot}} + +% fix overfull hboxes, somehow required for xelatex +% pdflatex and lualatex is fine without +\emergencystretch=0.5em + +\makeatletter% +\newcommand*{\BreakableChar}{% + \leavevmode% + \nobreak\hskip\z@skip% + \discretionary{}{}{}% + \nobreak\hskip\z@skip% +}% +\makeatother + +% enable (more flexible) linebreaks in \texttt +\renewcommand{\texttt}[1]{% +\begingroup% +\protect\renewcommand{\_}{\textunderscore\BreakableChar}% +\ttfamily% +\fontdimen3\font=0.1em% interword stretch +\fontdimen4\font=0.1em% interword shrink +\hyphenchar\font=`\-% to allow hyphenation +\begingroup\lccode`~=`/\lowercase{\endgroup\def~}{/\BreakableChar}% +\catcode`/=\active% +\begingroup\lccode`~=`*\lowercase{\endgroup\def~}{*\BreakableChar}% +\catcode`*=\active% +\begingroup\lccode`~=`?\lowercase{\endgroup\def~}{?\BreakableChar}% +\catcode`?=\active% +\begingroup\lccode`~=`)\lowercase{\endgroup\def~}{)\BreakableChar}% +\catcode`)=\active% +\begingroup\lccode`~=`.\lowercase{\endgroup\def~}{.\BreakableChar}% +\catcode`.=\active% +\begingroup\lccode`~=`;\lowercase{\endgroup\def~}{;\BreakableChar}% +\catcode`;=\active% +\scantokens{#1\noexpand}% +\endgroup% +} diff --git a/oidn/doc/readme.tex b/oidn/doc/readme.tex new file mode 100644 index 0000000..27479c7 --- /dev/null +++ b/oidn/doc/readme.tex @@ -0,0 +1,36 @@ +\IfFileExists{oidn-doc/intel-spec.cls} +{ + \documentclass[oneside]{oidn-doc/intel-spec} +}{ + \documentclass[oneside]{report} + \newcommand{\copyrightyears}[1] {} + 
\newcommand{\trademarkacknowledgment}[1] {} + \newcommand{\ftcdisclaimer}{} + \newcommand{\ftcoptimizationnotice}{} + \newcommand{\makedisclaimers}{} + \newcommand{\version}[1] { \author{Version ##1} } +} + +\include{preamble} + +\begin{document} +\title{Intel® Open Image Denoise\vskip0.3\baselineskip\LARGE +\noindent High-Performance Denoising Library\\for Ray Tracing} +\version{\oidnversion} + +\maketitle +\tableofcontents + +\input{tmp/overview} +\input{tmp/changelog} +\input{tmp/compilation} +\addtocontents{toc}{\protect\setcounter{tocdepth}{2}} +\hypersetup{bookmarksdepth=2} +\input{tmp/api} +\addtocontents{toc}{\protect\setcounter{tocdepth}{1}} +\hypersetup{bookmarksdepth=1} +\input{tmp/examples} + +\makedisclaimers + +\end{document} diff --git a/oidn/doc/readme_head.md b/oidn/doc/readme_head.md new file mode 100644 index 0000000..3004153 --- /dev/null +++ b/oidn/doc/readme_head.md @@ -0,0 +1,7 @@ +Intel® Open Image Denoise +========================= + +This is release v of Open Image Denoise. For changes and new +features see the [changelog](CHANGELOG.md). Visit +http://www.openimagedenoise.org for more information. + diff --git a/oidn/doc/related_projects.md b/oidn/doc/related_projects.md new file mode 100644 index 0000000..3bc8083 --- /dev/null +++ b/oidn/doc/related_projects.md @@ -0,0 +1,19 @@ +Projects that make use of Open Image Denoise +============================================ + +This page gives a brief (and incomplete) list of other projects that +make use of Open Image Denoise, as well as a set of related links to other +projects and related information. + +If you have a project that makes use of Open Image Denoise and would like this +to be listed here, please let us know. + +- [Intel® OSPRay](http://www.ospray.org), a ray tracing based rendering engine for high-fidelity visualization + + +Projects that are closely related to Open Image Denoise +======================================================= + +- The [Intel® Embree](http://embree.github.io) Ray Tracing Kernel Framework + + diff --git a/oidn/doc/stylesheet.css b/oidn/doc/stylesheet.css new file mode 100644 index 0000000..de87e7a --- /dev/null +++ b/oidn/doc/stylesheet.css @@ -0,0 +1,438 @@ +body { + font-size: 16px; + font-weight: normal; + letter-spacing: normal; + color:#373737; + background: #f2f2f2; + font-family: "Myriad Set Pro", "Helvetica Neue", Helvetica, Arial, sans-serif; + text-rendering: optimizeLegibility; + font-style: normal; + line-height: 1.5; + -webkit-font-smoothing: antialiased; + text-align: justify; + margin: 0; +} + +h1, h2, h3, h4, h5, h6 { + margin: 5px 0; + font-weight: 700; + color: #0071C5; + letter-spacing: normal; + clear: both; +} + +h2 { + background: none; + border-top: 1pt solid #333; + padding-top: 0.5em; +} + +h1 { font-size: 24px; } +h2 { font-size: 20px; } +h3 { font-size: 16px; } +h4 { font-size: 16px; } + +p { + margin: 10px 0 15px 0; +} + +dl dt { font-weight:bold; +} + +code { font-size: 90%; } + +li p { margin: 0 } + +img { + display: block; + margin-left: auto; + margin-right: auto; + padding: 0 2em 1ex 0; +} +figcaption { + color: #666; + text-align: center; +} +div.left { + float: left; + max-width: 250px; + margin: 0; + padding: 0; +} +br { clear:both; } + + +#demo-bullets { font-size: 90%; } + +#footer { + padding-top: 5px; + margin: 0; + text-align: center; + z-index: 10; + background:#212121; + position:fixed; + bottom:0px; + height:22px; + width:100%; + font-size: 12px; + color: #ffffff; +} + +#footer_padding { + margin: 0; + text-align: center; + 
background:#212121; + bottom:0px; + width:100%; + height:100%; + font-size: 12px; + color: #ffffff; +} + +#footer a:hover { + color:#ff0000; +} +#footer a { + color:#ffffff; +} + +#header { + color: #fff; + position: fixed; + width: 100%; + letter-spacing: -1px; + background: #379; + z-index: 20; + display: block; +} + +#content-wrap { + width:100%; + height:100%; + background: #f2f2f2; + /*padding-left: 100px;*/ + padding-top:100px; +} + +#content { + padding-left: 5px; + padding-right: 0px; + position:static; + max-width: 1024px; + margin-left: auto; + margin-right: auto; + background: #f2f2f2; + padding-top:10px; + padding-bottom:10px; +} + +#header-title { + /*display:inline-block;*/ + margin: 0; + color: #fff; + font-size: 42px; + /*background:#212121; */ + font-weight: 700; + padding: 20px 0px 0px 10px; + text-shadow: #111 0px 0px 10px; + /*padding-left: 100px;*/ + padding-top:10px; + letter-spacing: -1px; + max-width: 1024px; + margin-left: auto; + margin-right: auto; +} + +#header-subtitle { + display:inline-block; + color: #fff; + font-size: 20px; + font-weight: 300; + /*background: none;*/ + text-shadow: #111 0px 0px 10px; + /*padding-top: 10px;*/ + /*padding-left: 100px;*/ + padding-left: 10px; + /*padding-bottom: 10px;*/ + /*background:#212121; */ + letter-spacing: -1px; +} + + + +#header-github { + color: #fff; + font-size: 16px; + font-weight: 300; + background: none; +} + + +#forkme-banner { + display: block; + position: absolute; + top: 0; + right: 10px; + z-index: 30; + padding: 10px 50px 10px 10px; color:#fff; + background: url('images/blacktocat.png') #0090ff no-repeat 95% 50%; + font-weight: 700; + box-shadow: 0 0 10px rgba(0,0,0,0.5); + border-bottom-left-radius: 2px; + border-bottom-right-radius: 2px; + text-decoration: none; +} +#forkme-banner:hover, #forkme-banner:focus { + text-decoration: underline; +} + + +#header-spacing { + color: #fff; + font-size: 16px; + font-weight: 300; + height:5px; + background: #f2f2f2; +} +#footer-spacing { + color: #fff; + font-size: 16px; + font-weight: 300; + height:5px; + background: #f2f2f2; +} + +#header-navbar { + font-family: 'Myriad Pro', Calibri, Helvetica, Arial, sans-serif; + color: #fff; + font-size: 14px; + font-weight: normal; + letter-spacing: normal; + background: #444; + padding-top: 0px; + padding-bottom: 4px; + /*padding-left: 100px;*/ + height:18px; +} +#header-navbar ul { + list-style : none; + margin: 0; + padding: 0; + padding-top: 2px; + max-width: 1024px; + margin-left: auto; + margin-right: auto; +} +#header-navbar ul li { + display: inline; +} +#header-navbar ul li a { + display: block; + float: left; + padding-left: 8px; + padding-right: 8px; + color: #DFDFDF; + text-decoration: none; +} +#header-navbar ul li a:hover { + color: #FAFAFA; +} +#header-navbar ul li#selected a { + color: #fff; +} + +.title +{ + border: 0; padding: 0; margin: 0; + margin-top: 40px; + margin-bottom: 60px; +} + +.title h1 { + color: #333; + font-size: 56px; + font-weight: 300; + text-align: center; + border: 0; padding: 0; margin: 0; + border: 0; + } + +.title h2 { + color: #888; + font-size: 24px; + font-weight: 200; + text-align: center; + border: 0; padding: 0; margin: 0; + border: 0; +} + + +div.feature { + display:inline-block; + width: 100%; + margin-left: 5px; + margin-right: 5px; + margin-top: 10px; + margin-bottom: 20px; +} +div.feature a { + color:#373737; +} +div.feature img { + float: left; + display:inline-block; + width: 360px; + height: 260px; + padding:0; + box-shadow: 4px 4px 18px -5px rgba(0,0,0,0.86); +} 
+.feature-alt img +{ + float: right !important; +} + +.feature .container { + margin-left: 20px; + margin-right: 20px; + display:inline-block; + max-width: 544px; + color: #666; + font-size: 18px; + font-weight: 200; +} + +div.feature p { + padding-left: 20px; +} + +div.feature h2 { + border: 0; + color:#555; + font-size: 36px; +} + +/* image compare div */ +.img-compare { + display: block; + margin-left: auto; + margin-right: auto; + margin-bottom: 20px; + margin-top: 0px; + padding: 0; + position: relative; +} +.img-compare img { + position: absolute; + margin-left: auto; + height: 100%; + padding: 0; +} +/* left div */ +.img-compare > div { + position: absolute; + bottom: 0px; + left: 0px; + z-index: 1; + border-right: 2px solid #fff; + width: 50%; + height: 100%; + overflow: hidden; + pointer-events: none; +} +/* left label */ +.img-compare > div > span { + position: absolute; + bottom: 8px; + left: 16px; + font-size: 18px; + white-space: nowrap; + color: #fff; + text-shadow: 0px 0px 5px #111; + pointer-events: none; +} +/* right label */ +.img-compare > span { + position: absolute; + bottom: 8px; + right: 16px; + font-size: 18px; + white-space: nowrap; + color: #fff; + text-shadow: 0px 0px 5px #111; + pointer-events: none; +} + +.teaser-img { + margin-top: 60px; + margin-bottom: 60px; +} + +.teaser-features { + margin-top: 250px; + margin-bottom: 150px; +} + +hr { + width: 60%; + display: block; + height: 1px; + margin-left: auto; + margin-right: auto; + margin-top: 0px; + margin-bottom: 0px; + border: 0; + border-top: 1.5px solid #aaa; + padding: 0; +} + +table { + counter-increment: table; + border-collapse: collapse; + border-spacing: 0; + border: 0px solid #373737; + margin-top: 20px; + margin-bottom: 20px; + text-align: left; +} +table > caption:before { + content: 'Table ' counter(table) ': '; + color: #0071C5; +} + +th { + font-family: 'Lucida Grande', 'Helvetica Neue', Helvetica, Arial, sans-serif; + padding: 10px; + background: #0071C5; + color: #fff; +} + +td { + padding: 10px; + border: 0px solid #373737; + color: #222; + background-color: #fff; + vertical-align:top; +} + +div.figure { + clear: both; + counter-increment: figure; + padding: 0; + border: 0; + font-size: 100%; + font: inherit; + vertical-align: baseline; + margin: 0 0 3em 0; + display: inline-block; + width: 100%; +} +p.caption { + margin-top: 3ex; + font-size: 1.1rem; + line-height: 1.6; + text-align: left; +} +p.caption:before { + content: 'Figure ' counter(figure) ': '; + color: #0071C5; +} diff --git a/oidn/doc/teaser.html b/oidn/doc/teaser.html new file mode 100644 index 0000000..a987467 --- /dev/null +++ b/oidn/doc/teaser.html @@ -0,0 +1,12 @@ +
+

Intel® Open Image Denoise

+

High-Performance Denoising Library for Ray Tracing

+
+
+
+ Denoised +
Original
+
+

Moana Island Scene rendered at 16 spp with [Intel® OSPRay](http://www.ospray.org) and denoised with Intel® Open Image Denoise. Publicly available dataset courtesy of Walt Disney Animation Studios. Hover over the image (or tap on it) to move the slider between the original and denoised versions.

+
+ diff --git a/oidn/doc/webtemplate.html b/oidn/doc/webtemplate.html new file mode 100644 index 0000000..e962afb --- /dev/null +++ b/oidn/doc/webtemplate.html @@ -0,0 +1,57 @@ + + + + + Intel Open Image Denoise$if(select_news)$ News$endif$$if(select_demos)$ Demos$endif$$if(select_documentation)$ Documentation$endif$$if(select_gallery)$ Gallery$endif$$if(select_downloads)$ Download$endif$$if(select_related_projects)$ – RelatedProjects$endif$ + +$if(highlighting-css)$ + +$endif$ + + + + + +
+
+ +$body$ + +
+
+ +$if(select_legal)$ +$else$ + +$endif$ + + diff --git a/oidn/examples/CMakeLists.txt b/oidn/examples/CMakeLists.txt new file mode 100644 index 0000000..c17b567 --- /dev/null +++ b/oidn/examples/CMakeLists.txt @@ -0,0 +1,24 @@ +## ======================================================================== ## +## Copyright 2009-2019 Intel Corporation ## +## ## +## Licensed under the Apache License, Version 2.0 (the "License"); ## +## you may not use this file except in compliance with the License. ## +## You may obtain a copy of the License at ## +## ## +## http://www.apache.org/licenses/LICENSE-2.0 ## +## ## +## Unless required by applicable law or agreed to in writing, software ## +## distributed under the License is distributed on an "AS IS" BASIS, ## +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## +## See the License for the specific language governing permissions and ## +## limitations under the License. ## +## ======================================================================== ## + +macro(add_example EXAMPLE_NAME) + add_executable(${EXAMPLE_NAME} ${EXAMPLE_NAME}.cpp image_io.h image_io.cpp cli.h) + target_link_libraries(${EXAMPLE_NAME} PRIVATE common ${PROJECT_NAME}) + install(TARGETS ${EXAMPLE_NAME} DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT examples) +endmacro() + +add_example(denoise) + diff --git a/oidn/examples/cli.h b/oidn/examples/cli.h new file mode 100644 index 0000000..b7c59c7 --- /dev/null +++ b/oidn/examples/cli.h @@ -0,0 +1,76 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#include +#include +#include + +namespace oidn { + + // Command-line argument parser + class ArgParser + { + private: + int argc; + char** argv; + int pos; + + public: + ArgParser(int argc, char* argv[]) + : argc(argc), argv(argv), + pos(1) + {} + + bool hasNext() const + { + return pos < argc; + } + + std::string getNext() + { + if (pos < argc) + return argv[pos++]; + else + throw std::invalid_argument("argument expected"); + } + + std::string getNextOpt() + { + std::string str = getNext(); + if (str.empty() || str[0] != '-') + throw std::invalid_argument("option expected"); + return str.substr(str.find_first_not_of("-")); + } + + std::string getNextValue() + { + std::string str = getNext(); + if (!str.empty() && str[0] == '-') + throw std::invalid_argument("value expected"); + return str; + } + + int getNextValueInt() + { + std::string str = getNextValue(); + return atoi(str.c_str()); + } + }; + +} // namespace oidn + diff --git a/oidn/examples/denoise.cpp b/oidn/examples/denoise.cpp new file mode 100644 index 0000000..c0408e4 --- /dev/null +++ b/oidn/examples/denoise.cpp @@ -0,0 +1,304 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include +#include +#include +#include + +#ifdef VTUNE +#include +#endif + +#include + +#include "common/timer.h" +#include "image_io.h" +#include "cli.h" + +using namespace oidn; + +void printUsage() +{ + std::cout << "Open Image Denoise Example" << std::endl; + std::cout << "Usage: denoise [-ldr ldr_color.pfm] [-srgb] [-hdr hdr_color.pfm]" << std::endl + << " [-alb albedo.pfm] [-nrm normal.pfm]" << std::endl + << " [-o output.pfm] [-ref reference_output.pfm]" << std::endl + << " [-bench ntimes] [-threads n] [-affinity 0|1] [-maxmem MB] [-verbose 0-3]" << std::endl; +} + +void errorCallback(void* userPtr, oidn::Error error, const char* message) +{ + throw std::runtime_error(message); +} + +volatile bool isCancelled = false; + +void signalHandler(int signal) +{ + isCancelled = true; +} + +bool progressCallback(void* userPtr, double n) +{ + if (isCancelled) + return false; + std::cout << "\rDenoising " << int(n * 100.) 
<< "%" << std::flush; + return true; +} + +int main(int argc, char* argv[]) +{ + std::string colorFilename, albedoFilename, normalFilename; + std::string outputFilename, refFilename; + bool hdr = false; + bool srgb = false; + int numBenchmarkRuns = 0; + int numThreads = -1; + int setAffinity = -1; + int maxMemoryMB = -1; + int verbose = -1; + + // Parse the arguments + if (argc == 1) + { + printUsage(); + return 1; + } + + try + { + ArgParser args(argc, argv); + while (args.hasNext()) + { + std::string opt = args.getNextOpt(); + if (opt == "ldr") + { + colorFilename = args.getNextValue(); + hdr = false; + } + else if (opt == "hdr") + { + colorFilename = args.getNextValue(); + hdr = true; + } + else if (opt == "srgb") + srgb = true; + else if (opt == "alb" || opt == "albedo") + albedoFilename = args.getNextValue(); + else if (opt == "nrm" || opt == "normal") + normalFilename = args.getNextValue(); + else if (opt == "o" || opt == "out" || opt == "output") + outputFilename = args.getNextValue(); + else if (opt == "ref" || opt == "reference") + refFilename = args.getNextValue(); + else if (opt == "bench" || opt == "benchmark") + numBenchmarkRuns = std::max(args.getNextValueInt(), 0); + else if (opt == "threads") + numThreads = args.getNextValueInt(); + else if (opt == "affinity") + setAffinity = args.getNextValueInt(); + else if (opt == "maxmem") + maxMemoryMB = args.getNextValueInt(); + else if (opt == "verbose") + verbose = args.getNextValueInt(); + else if (opt == "h" || opt == "help") + { + printUsage(); + return 1; + } + else + throw std::invalid_argument("invalid argument"); + } + + if (colorFilename.empty()) + throw std::runtime_error("no color image specified"); + + // Load the input image + ImageBuffer color, albedo, normal; + ImageBuffer ref; + + std::cout << "Loading input" << std::endl; + + color = loadImage(colorFilename); + if (color.getChannels() != 3) + throw std::runtime_error("invalid color image"); + + if (!albedoFilename.empty()) + { + albedo = loadImage(albedoFilename); + if (albedo.getChannels() != 3 || albedo.getSize() != color.getSize()) + throw std::runtime_error("invalid albedo image"); + } + + if (!normalFilename.empty()) + { + normal = loadImage(normalFilename); + if (normal.getChannels() != 3 || normal.getSize() != color.getSize()) + throw std::runtime_error("invalid normal image"); + } + + if (!refFilename.empty()) + { + ref = loadImage(refFilename); + if (ref.getChannels() != 3 || ref.getSize() != color.getSize()) + throw std::runtime_error("invalid reference output image"); + } + + const int width = color.getWidth(); + const int height = color.getHeight(); + std::cout << "Resolution: " << width << "x" << height << std::endl; + + // Initialize the output image + ImageBuffer output(width, height, 3); + + // Initialize the denoising filter + std::cout << "Initializing" << std::endl; + Timer timer; + + oidn::DeviceRef device = oidn::newDevice(); + + const char* errorMessage; + if (device.getError(errorMessage) != oidn::Error::None) + throw std::runtime_error(errorMessage); + device.setErrorFunction(errorCallback); + + if (numThreads > 0) + device.set("numThreads", numThreads); + if (setAffinity >= 0) + device.set("setAffinity", bool(setAffinity)); + if (verbose >= 0) + device.set("verbose", verbose); + device.commit(); + + oidn::FilterRef filter = device.newFilter("RT"); + + filter.setImage("color", color.getData(), oidn::Format::Float3, width, height); + if (albedo) + filter.setImage("albedo", albedo.getData(), oidn::Format::Float3, width, height); + if (normal) + 
filter.setImage("normal", normal.getData(), oidn::Format::Float3, width, height); + filter.setImage("output", output.getData(), oidn::Format::Float3, width, height); + + if (hdr) + filter.set("hdr", true); + if (srgb) + filter.set("srgb", true); + + if (maxMemoryMB >= 0) + filter.set("maxMemoryMB", maxMemoryMB); + + filter.setProgressMonitorFunction(progressCallback); + signal(SIGINT, signalHandler); + + filter.commit(); + + const double initTime = timer.query(); + + const int versionMajor = device.get("versionMajor"); + const int versionMinor = device.get("versionMinor"); + const int versionPatch = device.get("versionPatch"); + + std::cout << " version=" << versionMajor << "." << versionMinor << "." << versionPatch + << ", msec=" << (1000. * initTime) << std::endl; + + // Denoise the image + //std::cout << "Denoising"; + timer.reset(); + + filter.execute(); + + const double denoiseTime = timer.query(); + std::cout << std::endl << " msec=" << (1000. * denoiseTime) << std::endl; + + filter.setProgressMonitorFunction(nullptr); + signal(SIGINT, SIG_DFL); + + if (ref) + { + // Verify the output values + std::cout << "Verifying output" << std::endl; + + ImageBuffer diff(width, height, 3); + int nerr = 0; + float maxre = 0; + + for (size_t i = 0; i < output.getDataSize(); ++i) + { + float expect = std::max(ref[i], 0.f); + const float actual = output[i]; + float re; + if (std::abs(expect) < 1e-5 && std::abs(actual) < 1e-5) + re = 0; + else if (expect != 0) + re = std::abs((expect - actual) / expect); + else + re = std::abs(expect - actual); + if (maxre < re) maxre = re; + if (re > 1e-3) + { + //std::cout << "i=" << i << " expect=" << expect << " actual=" << actual << std::endl; + ++nerr; + } + + diff[i] = std::abs(ref[i] - output[i]); + } + std::cout << " nfloats=" << output.getDataSize() << ", nerr=" << nerr << ", maxre=" << maxre << std::endl; + + // Save debug images + std::cout << "Saving debug images" << std::endl; + saveImage("denoise_in.ppm", color); + saveImage("denoise_out.ppm", output); + saveImage("denoise_ref.ppm", ref); + saveImage("denoise_diff.ppm", diff); + } + + if (!outputFilename.empty()) + { + // Save output image + std::cout << "Saving output" << std::endl; + saveImage(outputFilename, output); + } + + if (numBenchmarkRuns > 0) + { + // Benchmark loop + #ifdef VTUNE + __itt_resume(); + #endif + + std::cout << "Benchmarking: " << "ntimes=" << numBenchmarkRuns << std::endl; + timer.reset(); + + for (int i = 0; i < numBenchmarkRuns; ++i) + filter.execute(); + + const double totalTime = timer.query(); + std::cout << " sec=" << totalTime << ", msec/image=" << (1000.*totalTime / numBenchmarkRuns) << std::endl; + + #ifdef VTUNE + __itt_pause(); + #endif + } + } + catch (std::exception& e) + { + std::cout << "Error: " << e.what() << std::endl; + return 1; + } + + return 0; +} diff --git a/oidn/examples/image_io.cpp b/oidn/examples/image_io.cpp new file mode 100644 index 0000000..7f598a8 --- /dev/null +++ b/oidn/examples/image_io.cpp @@ -0,0 +1,169 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. 
// +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#include +#include +#include "image_io.h" + +namespace oidn { + + namespace + { + std::string getExtension(const std::string& filename) + { + const size_t pos = filename.find_last_of('.'); + if (pos == std::string::npos) + return ""; // no extension + else + return filename.substr(pos + 1); + } + + ImageBuffer loadImagePFM(const std::string& filename) + { + // Open the file + std::ifstream file(filename, std::ios::binary); + if (file.fail()) + throw std::runtime_error("cannot open file '" + filename + "'"); + + // Read the header + std::string id; + file >> id; + int C; + if (id == "PF") + C = 3; + else if (id == "Pf") + C = 1; + else + throw std::runtime_error("invalid PFM image"); + + int H, W; + file >> W >> H; + + float scale; + file >> scale; + + file.get(); // skip newline + + if (file.fail()) + throw std::runtime_error("invalid PFM image"); + + if (scale >= 0.f) + throw std::runtime_error("big-endian PFM images are not supported"); + scale = fabs(scale); + + // Read the pixels + ImageBuffer image(W, H, C); + + for (int h = 0; h < H; ++h) + { + for (int w = 0; w < W; ++w) + { + for (int c = 0; c < C; ++c) + { + float x; + file.read((char*)&x, sizeof(float)); + image[((H-1-h)*W + w) * C + c] = x * scale; + } + } + } + + if (file.fail()) + throw std::runtime_error("invalid PFM image"); + + return image; + } + + void saveImagePFM(const std::string& filename, const ImageBuffer& image) + { + const int H = image.getHeight(); + const int W = image.getWidth(); + const int C = image.getChannels(); + + // Open the file + std::ofstream file(filename, std::ios::binary); + if (file.fail()) + throw std::runtime_error("cannot open file: '" + filename + "'"); + + // Write the header + file << "PF" << std::endl; + file << W << " " << H << std::endl; + file << "-1.0" << std::endl; + + // Write the pixels + for (int h = 0; h < H; ++h) + { + for (int w = 0; w < W; ++w) + { + for (int c = 0; c < 3; ++c) + { + const float x = image[((H-1-h)*W + w) * C + c]; + file.write((char*)&x, sizeof(float)); + } + } + } + } + + void saveImagePPM(const std::string& filename, const ImageBuffer& image) + { + if (image.getChannels() != 3) + throw std::invalid_argument("image must have 3 channels"); + const int H = image.getHeight(); + const int W = image.getWidth(); + const int C = image.getChannels(); + + // Open the file + std::ofstream file(filename, std::ios::binary); + if (file.fail()) + throw std::runtime_error("cannot open file: '" + filename + "'"); + + // Write the header + file << "P6" << std::endl; + file << W << " " << H << std::endl; + file << "255" << std::endl; + + // Write the pixels + for (int i = 0; i < W*H; ++i) + { + for (int c = 0; c < 3; ++c) + { + float x = image[i*C+c]; + x = pow(x, 1.f/2.2f); + int ch = std::min(std::max(int(x * 255.f), 0), 255); + file.put(char(ch)); + } + } + } + } + + ImageBuffer loadImage(const std::string& filename) + { + if (getExtension(filename) != "pfm") + throw std::runtime_error("unsupported image file format"); + return 
loadImagePFM(filename); + } + + void saveImage(const std::string& filename, const ImageBuffer& image) + { + const std::string ext = getExtension(filename); + if (ext == "pfm") + saveImagePFM(filename, image); + else if (ext == "ppm") + saveImagePPM(filename, image); + else + throw std::runtime_error("unsupported image file format"); + } + +} // namespace oidn diff --git a/oidn/examples/image_io.h b/oidn/examples/image_io.h new file mode 100644 index 0000000..740239d --- /dev/null +++ b/oidn/examples/image_io.h @@ -0,0 +1,66 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include +#include +#include + +namespace oidn { + + class ImageBuffer + { + private: + std::vector data; + int width; + int height; + int channels; + + public: + ImageBuffer() + : width(0), + height(0), + channels(0) {} + + ImageBuffer(int width, int height, int channels) + : data(width * height * channels), + width(width), + height(height), + channels(channels) {} + + operator bool() const + { + return data.data() != nullptr; + } + + const float& operator [](size_t i) const { return data[i]; } + float& operator [](size_t i) { return data[i]; } + + int getWidth() const { return width; } + int getHeight() const { return height; } + std::array getSize() const { return {width, height}; } + int getChannels() const { return channels; } + + const float* getData() const { return data.data(); } + float* getData() { return data.data(); } + int getDataSize() { return int(data.size()); } + }; + + ImageBuffer loadImage(const std::string& filename); + void saveImage(const std::string& filename, const ImageBuffer& image); + +} // namespace oidn diff --git a/oidn/include/OpenImageDenoise/oidn.h b/oidn/include/OpenImageDenoise/oidn.h new file mode 100644 index 0000000..1219228 --- /dev/null +++ b/oidn/include/OpenImageDenoise/oidn.h @@ -0,0 +1,208 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. 
// +// ======================================================================== // + +#pragma once + +#include +#include +#include + +#include "version.h" + +#if defined(__cplusplus) +extern "C" { +#endif + +#ifndef OIDN_API +#if defined(_WIN32) && !defined(OIDN_STATIC_LIB) +# define OIDN_API __declspec(dllimport) +#else +# define OIDN_API +#endif +#endif + +// ---------------------------------------------------------------------------- +// Device +// ---------------------------------------------------------------------------- + +// Open Image Denoise device types +typedef enum +{ + OIDN_DEVICE_TYPE_DEFAULT = 0, // select device automatically + + OIDN_DEVICE_TYPE_CPU = 1, // CPU device +} OIDNDeviceType; + +// Error codes +typedef enum +{ + OIDN_ERROR_NONE = 0, // no error occurred + OIDN_ERROR_UNKNOWN = 1, // an unknown error occurred + OIDN_ERROR_INVALID_ARGUMENT = 2, // an invalid argument was specified + OIDN_ERROR_INVALID_OPERATION = 3, // the operation is not allowed + OIDN_ERROR_OUT_OF_MEMORY = 4, // not enough memory to execute the operation + OIDN_ERROR_UNSUPPORTED_HARDWARE = 5, // the hardware (e.g. CPU) is not supported + OIDN_ERROR_CANCELLED = 6, // the operation was cancelled by the user +} OIDNError; + +// Error callback function +typedef void (*OIDNErrorFunction)(void* userPtr, OIDNError code, const char* message); + +// Device handle +typedef struct OIDNDeviceImpl* OIDNDevice; + +// Creates a new Open Image Denoise device. +OIDN_API OIDNDevice oidnNewDevice(OIDNDeviceType type); + +// Retains the device (increments the reference count). +OIDN_API void oidnRetainDevice(OIDNDevice device); + +// Releases the device (decrements the reference count). +OIDN_API void oidnReleaseDevice(OIDNDevice device); + +// Sets a boolean parameter of the device. +OIDN_API void oidnSetDevice1b(OIDNDevice device, const char* name, bool value); + +// Sets an integer parameter of the device. +OIDN_API void oidnSetDevice1i(OIDNDevice device, const char* name, int value); + +// Gets a boolean parameter of the device. +OIDN_API bool oidnGetDevice1b(OIDNDevice device, const char* name); + +// Gets an integer parameter of the device (e.g. "version"). +OIDN_API int oidnGetDevice1i(OIDNDevice device, const char* name); + +// Sets the error callback function of the device. +OIDN_API void oidnSetDeviceErrorFunction(OIDNDevice device, OIDNErrorFunction func, void* userPtr); + +// Returns the first unqueried error code stored in the device for the current +// thread, optionally also returning a string message (if not NULL), and clears +// the stored error. Can be called with a NULL device as well to check why a +// device creation failed. +OIDN_API OIDNError oidnGetDeviceError(OIDNDevice device, const char** outMessage); + +// Commits all previous changes to the device. +// Must be called before first using the device (e.g. creating filters). 
+OIDN_API void oidnCommitDevice(OIDNDevice device); + +// ---------------------------------------------------------------------------- +// Buffer +// ---------------------------------------------------------------------------- + +// Formats for images and other data stored in buffers +typedef enum +{ + OIDN_FORMAT_UNDEFINED = 0, + + // 32-bit single-precision floating point scalar and vector formats + OIDN_FORMAT_FLOAT = 1, + OIDN_FORMAT_FLOAT2 = 2, + OIDN_FORMAT_FLOAT3 = 3, + OIDN_FORMAT_FLOAT4 = 4, +} OIDNFormat; + +// Access modes for mapping buffers +typedef enum +{ + OIDN_ACCESS_READ = 0, // read-only access + OIDN_ACCESS_WRITE = 1, // write-only access + OIDN_ACCESS_READ_WRITE = 2, // read and write access + OIDN_ACCESS_WRITE_DISCARD = 3, // write-only access, previous contents discarded +} OIDNAccess; + +// Buffer handle +typedef struct OIDNBufferImpl* OIDNBuffer; + +// Creates a new buffer (data allocated and owned by the device). +OIDN_API OIDNBuffer oidnNewBuffer(OIDNDevice device, size_t byteSize); + +// Creates a new shared buffer (data allocated and owned by the user). +OIDN_API OIDNBuffer oidnNewSharedBuffer(OIDNDevice device, void* ptr, size_t byteSize); + +// Maps a region of the buffer to host memory. +// If byteSize is 0, the maximum available amount of memory will be mapped. +OIDN_API void* oidnMapBuffer(OIDNBuffer buffer, OIDNAccess access, size_t byteOffset, size_t byteSize); + +// Unmaps a region of the buffer. +// mappedPtr must be a pointer returned by a previous call to oidnMapBuffer. +OIDN_API void oidnUnmapBuffer(OIDNBuffer buffer, void* mappedPtr); + +// Retains the buffer (increments the reference count). +OIDN_API void oidnRetainBuffer(OIDNBuffer buffer); + +// Releases the buffer (decrements the reference count). +OIDN_API void oidnReleaseBuffer(OIDNBuffer buffer); + +// ---------------------------------------------------------------------------- +// Filter +// ---------------------------------------------------------------------------- + +// Progress monitor callback function +typedef bool (*OIDNProgressMonitorFunction)(void* userPtr, double n); + +// Filter handle +typedef struct OIDNFilterImpl* OIDNFilter; + +// Creates a new filter of the specified type (e.g. "RT"). +OIDN_API OIDNFilter oidnNewFilter(OIDNDevice device, const char* type); + +// Retains the filter (increments the reference count). +OIDN_API void oidnRetainFilter(OIDNFilter filter); + +// Releases the filter (decrements the reference count). +OIDN_API void oidnReleaseFilter(OIDNFilter filter); + +// Sets an image parameter of the filter (stored in a buffer). +// If bytePixelStride and/or byteRowStride are zero, these will be computed automatically. +OIDN_API void oidnSetFilterImage(OIDNFilter filter, const char* name, + OIDNBuffer buffer, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride); + +// Sets an image parameter of the filter (owned by the user). +// If bytePixelStride and/or byteRowStride are zero, these will be computed automatically. +OIDN_API void oidnSetSharedFilterImage(OIDNFilter filter, const char* name, + void* ptr, OIDNFormat format, + size_t width, size_t height, + size_t byteOffset, + size_t bytePixelStride, size_t byteRowStride); + +// Sets a boolean parameter of the filter. +OIDN_API void oidnSetFilter1b(OIDNFilter filter, const char* name, bool value); + +// Sets an integer parameter of the filter. 
+OIDN_API void oidnSetFilter1i(OIDNFilter filter, const char* name, int value); + +// Gets a boolean parameter of the filter. +OIDN_API bool oidnGetFilter1b(OIDNFilter filter, const char* name); + +// Gets an integer parameter of the filter. +OIDN_API int oidnGetFilter1i(OIDNFilter filter, const char* name); + +// Sets the progress monitor callback function of the filter. +OIDN_API void oidnSetFilterProgressMonitorFunction(OIDNFilter filter, OIDNProgressMonitorFunction func, void* userPtr); + +// Commits all previous changes to the filter. +// Must be called before first executing the filter. +OIDN_API void oidnCommitFilter(OIDNFilter filter); + +// Executes the filter. +OIDN_API void oidnExecuteFilter(OIDNFilter filter); + +#if defined(__cplusplus) +} +#endif diff --git a/oidn/include/OpenImageDenoise/oidn.hpp b/oidn/include/OpenImageDenoise/oidn.hpp new file mode 100644 index 0000000..9968508 --- /dev/null +++ b/oidn/include/OpenImageDenoise/oidn.hpp @@ -0,0 +1,455 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#include +#include "oidn.h" + +namespace oidn { + + // -------------------------------------------------------------------------- + // Buffer + // -------------------------------------------------------------------------- + + // Formats for images and other data stored in buffers + enum class Format + { + Undefined = OIDN_FORMAT_UNDEFINED, + + // 32-bit single-precision floating point scalar and vector formats + Float = OIDN_FORMAT_FLOAT, + Float2 = OIDN_FORMAT_FLOAT2, + Float3 = OIDN_FORMAT_FLOAT3, + Float4 = OIDN_FORMAT_FLOAT4, + }; + + // Access modes for mapping buffers + enum class Access + { + Read = OIDN_ACCESS_READ, // read-only access + Write = OIDN_ACCESS_WRITE, // write-only access + ReadWrite = OIDN_ACCESS_READ_WRITE, // read and write access + WriteDiscard = OIDN_ACCESS_WRITE_DISCARD, // write-only access, previous contents discarded + }; + + // Buffer object with automatic reference counting + class BufferRef + { + private: + OIDNBuffer handle; + + public: + BufferRef() : handle(nullptr) {} + BufferRef(OIDNBuffer handle) : handle(handle) {} + + BufferRef(const BufferRef& other) : handle(other.handle) + { + if (handle) + oidnRetainBuffer(handle); + } + + BufferRef(BufferRef&& other) : handle(other.handle) + { + other.handle = nullptr; + } + + BufferRef& operator =(const BufferRef& other) + { + if (&other != this) + { + if (other.handle) + oidnRetainBuffer(other.handle); + if (handle) + oidnReleaseBuffer(handle); + handle = other.handle; + } + return *this; + } + + BufferRef& operator =(BufferRef&& other) + { + std::swap(handle, other.handle); + return *this; + } + + BufferRef& operator =(OIDNBuffer other) + { + if (other) + oidnRetainBuffer(other); + if (handle) + oidnReleaseBuffer(handle); + handle = 
other; + return *this; + } + + ~BufferRef() + { + if (handle) + oidnReleaseBuffer(handle); + } + + OIDNBuffer getHandle() const + { + return handle; + } + + operator bool() const + { + return handle != nullptr; + } + + // Maps a region of the buffer to host memory. + // If byteSize is 0, the maximum available amount of memory will be mapped. + void* map(Access access = Access::ReadWrite, size_t byteOffset = 0, size_t byteSize = 0) + { + return oidnMapBuffer(handle, (OIDNAccess)access, byteOffset, byteSize); + } + + // Unmaps a region of the buffer. + // mappedPtr must be a pointer returned by a previous call to map. + void unmap(void* mappedPtr) + { + oidnUnmapBuffer(handle, mappedPtr); + } + }; + + // -------------------------------------------------------------------------- + // Filter + // -------------------------------------------------------------------------- + + // Progress monitor callback function + typedef bool (*ProgressMonitorFunction)(void* userPtr, double n); + + // Filter object with automatic reference counting + class FilterRef + { + private: + OIDNFilter handle; + + public: + FilterRef() : handle(nullptr) {} + FilterRef(OIDNFilter handle) : handle(handle) {} + + FilterRef(const FilterRef& other) : handle(other.handle) + { + if (handle) + oidnRetainFilter(handle); + } + + FilterRef(FilterRef&& other) : handle(other.handle) + { + other.handle = nullptr; + } + + FilterRef& operator =(const FilterRef& other) + { + if (&other != this) + { + if (other.handle) + oidnRetainFilter(other.handle); + if (handle) + oidnReleaseFilter(handle); + handle = other.handle; + } + return *this; + } + + FilterRef& operator =(FilterRef&& other) + { + std::swap(handle, other.handle); + return *this; + } + + FilterRef& operator =(OIDNFilter other) + { + if (other) + oidnRetainFilter(other); + if (handle) + oidnReleaseFilter(handle); + handle = other; + return *this; + } + + ~FilterRef() + { + if (handle) + oidnReleaseFilter(handle); + } + + OIDNFilter getHandle() const + { + return handle; + } + + operator bool() const + { + return handle != nullptr; + } + + // Sets an image parameter of the filter (stored in a buffer). + void setImage(const char* name, + const BufferRef& buffer, Format format, + size_t width, size_t height, + size_t byteOffset = 0, + size_t bytePixelStride = 0, size_t byteRowStride = 0) + { + oidnSetFilterImage(handle, name, + buffer.getHandle(), (OIDNFormat)format, + width, height, + byteOffset, + bytePixelStride, byteRowStride); + } + + // Sets an image parameter of the filter (owned by the user). + void setImage(const char* name, + void* ptr, Format format, + size_t width, size_t height, + size_t byteOffset = 0, + size_t bytePixelStride = 0, size_t byteRowStride = 0) + { + oidnSetSharedFilterImage(handle, name, + ptr, (OIDNFormat)format, + width, height, + byteOffset, + bytePixelStride, byteRowStride); + } + + // Sets a boolean parameter of the filter. + void set(const char* name, bool value) + { + oidnSetFilter1b(handle, name, value); + } + + // Sets an integer parameter of the filter. + void set(const char* name, int value) + { + oidnSetFilter1i(handle, name, value); + } + + // Gets a parameter of the filter. + template + T get(const char* name); + + // Sets the progress monitor callback function of the filter. + void setProgressMonitorFunction(ProgressMonitorFunction func, void* userPtr = nullptr) + { + oidnSetFilterProgressMonitorFunction(handle, (OIDNProgressMonitorFunction)func, userPtr); + } + + // Commits all previous changes to the filter. 
+ void commit() + { + oidnCommitFilter(handle); + } + + // Executes the filter. + void execute() + { + oidnExecuteFilter(handle); + } + }; + + // Gets a boolean parameter of the filter. + template<> + inline bool FilterRef::get(const char* name) + { + return oidnGetFilter1b(handle, name); + } + + // Gets an integer parameter of the filter. + template<> + inline int FilterRef::get(const char* name) + { + return oidnGetFilter1i(handle, name); + } + + // -------------------------------------------------------------------------- + // Device + // -------------------------------------------------------------------------- + + // Open Image Denoise device types + enum class DeviceType + { + Default = OIDN_DEVICE_TYPE_DEFAULT, // select device automatically + + CPU = OIDN_DEVICE_TYPE_CPU, // CPU device + }; + + // Error codes + enum class Error + { + None = OIDN_ERROR_NONE, // no error occurred + Unknown = OIDN_ERROR_UNKNOWN, // an unknown error occurred + InvalidArgument = OIDN_ERROR_INVALID_ARGUMENT, // an invalid argument was specified + InvalidOperation = OIDN_ERROR_INVALID_OPERATION, // the operation is not allowed + OutOfMemory = OIDN_ERROR_OUT_OF_MEMORY, // not enough memory to execute the operation + UnsupportedHardware = OIDN_ERROR_UNSUPPORTED_HARDWARE, // the hardware (e.g. CPU) is not supported + Cancelled = OIDN_ERROR_CANCELLED, // the operation was cancelled by the user + }; + + // Error callback function + typedef void (*ErrorFunction)(void* userPtr, Error code, const char* message); + + // Device object with automatic reference counting + class DeviceRef + { + private: + OIDNDevice handle; + + public: + DeviceRef() : handle(nullptr) {} + DeviceRef(OIDNDevice handle) : handle(handle) {} + + DeviceRef(const DeviceRef& other) : handle(other.handle) + { + if (handle) + oidnRetainDevice(handle); + } + + DeviceRef(DeviceRef&& other) : handle(other.handle) + { + other.handle = nullptr; + } + + DeviceRef& operator =(const DeviceRef& other) + { + if (&other != this) + { + if (other.handle) + oidnRetainDevice(other.handle); + if (handle) + oidnReleaseDevice(handle); + handle = other.handle; + } + return *this; + } + + DeviceRef& operator =(DeviceRef&& other) + { + std::swap(handle, other.handle); + return *this; + } + + DeviceRef& operator =(OIDNDevice other) + { + if (other) + oidnRetainDevice(other); + if (handle) + oidnReleaseDevice(handle); + handle = other; + return *this; + } + + ~DeviceRef() + { + if (handle) + oidnReleaseDevice(handle); + } + + OIDNDevice getHandle() const + { + return handle; + } + + operator bool() const + { + return handle != nullptr; + } + + // Sets a boolean parameter of the device. + void set(const char* name, bool value) + { + oidnSetDevice1b(handle, name, value); + } + + // Sets an integer parameter of the device. + void set(const char* name, int value) + { + oidnSetDevice1i(handle, name, value); + } + + // Gets a parameter of the device. + template + T get(const char* name); + + // Sets the error callback function of the device. + void setErrorFunction(ErrorFunction func, void* userPtr = nullptr) + { + oidnSetDeviceErrorFunction(handle, (OIDNErrorFunction)func, userPtr); + } + + // Returns the first unqueried error code and clears the stored error. + // Can be called for a null device as well to check why a device creation failed. + Error getError() + { + return (Error)oidnGetDeviceError(handle, nullptr); + } + + // Returns the first unqueried error code and string message, and clears the stored error. 
+ // Can be called for a null device as well to check why a device creation failed. + Error getError(const char*& outMessage) + { + return (Error)oidnGetDeviceError(handle, &outMessage); + } + + // Commits all previous changes to the device. + // Must be called before first using the device (e.g. creating filters). + void commit() + { + oidnCommitDevice(handle); + } + + // Creates a new buffer (data allocated and owned by the device). + BufferRef newBuffer(size_t byteSize) + { + return oidnNewBuffer(handle, byteSize); + } + + // Creates a new shared buffer (data allocated and owned by the user). + BufferRef newBuffer(void* ptr, size_t byteSize) + { + return oidnNewSharedBuffer(handle, ptr, byteSize); + } + + // Creates a new filter of the specified type (e.g. "RT"). + FilterRef newFilter(const char* type) + { + return oidnNewFilter(handle, type); + } + }; + + // Gets a boolean parameter of the device. + template<> + inline bool DeviceRef::get(const char* name) + { + return oidnGetDevice1b(handle, name); + } + + // Gets an integer parameter of the device (e.g. "version"). + template<> + inline int DeviceRef::get(const char* name) + { + return oidnGetDevice1i(handle, name); + } + + // Creates a new Open Image Denoise device. + inline DeviceRef newDevice(DeviceType type = DeviceType::Default) + { + return DeviceRef(oidnNewDevice((OIDNDeviceType)type)); + } + +} // namespace oidn diff --git a/oidn/include/OpenImageDenoise/version.h.in b/oidn/include/OpenImageDenoise/version.h.in new file mode 100644 index 0000000..69ee61b --- /dev/null +++ b/oidn/include/OpenImageDenoise/version.h.in @@ -0,0 +1,23 @@ +// ======================================================================== // +// Copyright 2009-2019 Intel Corporation // +// // +// Licensed under the Apache License, Version 2.0 (the "License"); // +// you may not use this file except in compliance with the License. // +// You may obtain a copy of the License at // +// // +// http://www.apache.org/licenses/LICENSE-2.0 // +// // +// Unless required by applicable law or agreed to in writing, software // +// distributed under the License is distributed on an "AS IS" BASIS, // +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // +// See the License for the specific language governing permissions and // +// limitations under the License. // +// ======================================================================== // + +#pragma once + +#define OIDN_VERSION_MAJOR @OIDN_VERSION_MAJOR@ +#define OIDN_VERSION_MINOR @OIDN_VERSION_MINOR@ +#define OIDN_VERSION_PATCH @OIDN_VERSION_PATCH@ +#define OIDN_VERSION @OIDN_VERSION_NUMBER@ +#define OIDN_VERSION_STRING "@OIDN_VERSION_MAJOR@.@OIDN_VERSION_MINOR@.@OIDN_VERSION_PATCH@@OIDN_VERSION_NOTE@" diff --git a/oidn/mkl-dnn/.github/issue_template.md b/oidn/mkl-dnn/.github/issue_template.md new file mode 100644 index 0000000..16f7324 --- /dev/null +++ b/oidn/mkl-dnn/.github/issue_template.md @@ -0,0 +1,31 @@ +Here's the place for your question, suggestion, a feature request or brief +description of the problem. If you are submitting a defect report please fill +all the sections below. For everything else feel free to remove everything +below the line. + +----------------------------------------------------------------------------- + +### Environment +Intel MKL-DNN includes hardware-specific optimizations and may behave +differently on depending on the compiler and build environment. 
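
The C++ wrapper declared in `oidn.hpp` above (`DeviceRef`, `FilterRef`, `BufferRef`) is a thin, reference-counted layer over the C API. As a point of reference, here is a minimal sketch of how an "RT" filter is typically driven through this wrapper; the resolution and the `color`/`albedo`/`normal`/`output` buffers below are placeholders rather than anything taken from this repository, and the `"hdr"` flag follows the upstream OIDN documentation rather than this header.

```
#include <iostream>
#include <vector>
#include <OpenImageDenoise/oidn.hpp>

int main()
{
    const int width = 800, height = 600;              // placeholder resolution
    std::vector<float> color(width * height * 3);     // noisy beauty pass (RGB, HDR)
    std::vector<float> albedo(width * height * 3);    // optional auxiliary buffer
    std::vector<float> normal(width * height * 3);    // optional auxiliary buffer
    std::vector<float> output(width * height * 3);    // denoised result

    oidn::DeviceRef device = oidn::newDevice();       // DeviceType::Default
    device.commit();

    oidn::FilterRef filter = device.newFilter("RT");  // ray-tracing denoise filter
    filter.setImage("color",  color.data(),  oidn::Format::Float3, width, height);
    filter.setImage("albedo", albedo.data(), oidn::Format::Float3, width, height);
    filter.setImage("normal", normal.data(), oidn::Format::Float3, width, height);
    filter.setImage("output", output.data(), oidn::Format::Float3, width, height);
    filter.set("hdr", true);                          // input is HDR radiance
    filter.commit();
    filter.execute();

    const char* errorMessage;
    if (device.getError(errorMessage) != oidn::Error::None)
        std::cout << "OIDN error: " << errorMessage << std::endl;
    return 0;
}
```

Because `DeviceRef` and `FilterRef` release their handles in their destructors, no explicit `oidnRelease*` calls are needed in a sketch like this.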
Include +the following information to help reproduce the issue: +* CPU make and model (try `lscpu`; if your `lscpu` does not list CPU flags, + try running `cat /proc/cpuinfo | grep flags | sort -u`) +* OS version (`uname -a`) +* Compiler version (`gcc --version`) +* MKLROOT value (`echo MKLROOT=$MKLROOT`) +* CMake version (`cmake --version`) +* CMake output log +* git hash (`git log -1 --format=%H`) + +### Steps to reproduce +Please check that the issue is reproducible with the latest revision on +master. Include all the steps to reproduce the issue. A short C/C++ program +or modified unit tests demonstrating the issue will greatly help +with the investigation. + +### Actual behavior +Describe the behavior you see. + +### Expected behavior +Describe the behavior you expect. diff --git a/oidn/mkl-dnn/.gitignore b/oidn/mkl-dnn/.gitignore new file mode 100644 index 0000000..646e08e --- /dev/null +++ b/oidn/mkl-dnn/.gitignore @@ -0,0 +1,3 @@ +build +external +.*.sw? diff --git a/oidn/mkl-dnn/CMakeLists.txt b/oidn/mkl-dnn/CMakeLists.txt new file mode 100644 index 0000000..158f0e6 --- /dev/null +++ b/oidn/mkl-dnn/CMakeLists.txt @@ -0,0 +1,93 @@ +#=============================================================================== +# Copyright 2016-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +cmake_minimum_required(VERSION 2.8) + +if(POLICY CMP0022) + cmake_policy(SET CMP0022 NEW) +endif() + +if(POLICY CMP0054) + cmake_policy(SET CMP0054 NEW) +endif() + +# Enable RPATH on MacOS/OSX +if(POLICY CMP0042) + cmake_policy(SET CMP0042 NEW) +endif() + +# Do not export symbols from executables +if(POLICY CMP0065) + cmake_policy(SET CMP0065 NEW) +endif() + +# Pass all flags to try_compile +if(POLICY CMP0056) + cmake_policy(SET CMP0056 NEW) +endif() +if(POLICY CMP0066) + cmake_policy(SET CMP0066 NEW) +endif() + +set(PROJECT_NAME "Intel(R) MKL-DNN") +set(PROJECT_FULL_NAME "Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)") +set(PROJECT_VERSION "0.90.0") + +set(LIB_NAME mkldnn) + +if (CMAKE_VERSION VERSION_LESS 3.0) + project(${PROJECT_NAME} C CXX) +else() + cmake_policy(SET CMP0048 NEW) + project(${PROJECT_NAME} VERSION "${PROJECT_VERSION}" LANGUAGES C CXX) +endif() + +if (NOT CMAKE_SIZEOF_VOID_P EQUAL 8) + message("FATAL_ERROR" "Intel(R) MKL-DNN supports 64 bit platforms only") +endif() + +if("${CMAKE_BUILD_TYPE}" STREQUAL "") + message(STATUS "CMAKE_BUILD_TYPE is unset, defaulting to Release") + set(CMAKE_BUILD_TYPE "Release") +endif() + +set(CMAKE_SRC_CCXX_FLAGS) # SRC specifics +set(CMAKE_EXAMPLE_CCXX_FLAGS) # EXAMPLE specifics +set(CMAKE_TEST_CCXX_FLAGS) # TESTS specifics + +include(GNUInstallDirs) +include(CMakePackageConfigHelpers) + +include("cmake/utils.cmake") +include("cmake/options.cmake") +include("cmake/OpenMP.cmake") +include("cmake/TBB.cmake") +include("cmake/platform.cmake") +include("cmake/SDL.cmake") +#include("cmake/MKL.cmake") +#include("cmake/Doxygen.cmake") +include("cmake/version.cmake") + +enable_testing() + +include_directories(include) + +add_subdirectory(src) +add_subdirectory(examples) +add_subdirectory(tests) + +# Cannot use CMAKE_INSTALL_DOCDIR since it uses PROJECT_NAME and not LIB_NAME +install(FILES LICENSE DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/doc/${LIB_NAME}) diff --git a/oidn/mkl-dnn/LICENSE b/oidn/mkl-dnn/LICENSE new file mode 100644 index 0000000..fde864d --- /dev/null +++ b/oidn/mkl-dnn/LICENSE @@ -0,0 +1,215 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. 
+ + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright {yyyy} {name of copyright owner} + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + ============================================================================ + + Intel MKL-DNN includes components with separate copyright + notices and license terms. + + XByak, 3-clause BSD license + Copyright (c) 2007 MITSUNARI Shigeo + See full copyright notice and license text in src/cpu/xbyak/COPYRIGHT + + gtest, 3-clause BSD license + Copyright 2008, Google Inc. + See full copyright notice and license text in tests/gtests/gtest/LICENSE + \ No newline at end of file diff --git a/oidn/mkl-dnn/README.md b/oidn/mkl-dnn/README.md new file mode 100644 index 0000000..366fa94 --- /dev/null +++ b/oidn/mkl-dnn/README.md @@ -0,0 +1,433 @@ +# Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) +![v0.90 beta](https://img.shields.io/badge/v0.90-beta-orange.svg) + +> NOTE +> +> The master branch is now used to work on the upcoming Intel MKL-DNN v1.0 with +> incompatible changes to the v0.x. 
The changes are described in the following +> [RFC](https://github.com/intel/mkl-dnn/pull/384). +> +> For a limited time the team would maintain +> [0.x branch](https://github.com/intel/mkl-dnn/tree/mnt-v0), +> backporting fixes and some of the features from the mainline. + +Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is +an open-source performance library for deep-learning applications. The library +accelerates deep-learning applications and frameworks on Intel architecture. +Intel MKL-DNN contains vectorized and threaded building blocks that you can +use to implement deep neural networks (DNN) with C and C++ interfaces. + +DNN functionality optimized for Intel architecture is also included in +[Intel Math Kernel Library (Intel MKL)](https://software.intel.com/en-us/mkl/features/deep-neural-networks). +The API in that implementation is not compatible with Intel MKL-DNN and does not +include certain new and experimental features. + +This release contains performance-critical functions that improve performance of +the following deep learning topologies and variations of these: + +| Application | Example topology +|:--- |:--- +| Image recognition | AlexNet, VGG, GoogleNet, ResNet, MobileNet +| Image segmentation | FCN, SegNet, MaskRCNN, U-Net +| Volumetric segmentation | 3D-Unet +| Object detection | SSD, Faster R-CNN, Yolo +| Neural machine translation | GNMT +| Speech recognition | DeepSpeech +| Adversarial networks | DCGAN, 3DGAN +| Reinforcement learning | A3C +| Text-to-speech | WaveNet + +Intel MKL-DNN is used in the following software products: +* [Caffe\* Optimized for Intel Architecture](https://github.com/intel/caffe) +* [Chainer\*](https://chainer.org) +* [DeepBench](https://github.com/baidu-research/DeepBench) +* [PaddlePaddle\*](http://www.paddlepaddle.org) +* [PyTorch\*](https://pytorch.org/) +* [Tensorflow\*](https://www.tensorflow.org) +* [Microsoft\* Cognitive Toolkit (CNTK)](https://docs.microsoft.com/en-us/cognitive-toolkit) +* [Apache\* MXNet](https://mxnet.apache.org) +* [OpenVINO(TM) toolkit](https://01.org/openvinotoolkit) +* [Intel Nervana Graph](https://github.com/NervanaSystems/ngraph) +* [Menoh\*](https://github.com/pfnet-research/menoh) +* [DeepLearning4J\*](https://deeplearning4j.org) +* [BigDL](https://github.com/intel-analytics/BigDL) + +## License +Intel MKL-DNN is licensed under +[Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0). This +software includes the following third-party components: +* [Xbyak](https://github.com/herumi/xbyak) distributed under [3-clause BSD licence](src/cpu/xbyak/COPYRIGHT) +* [gtest](https://github.com/google/googletest) distributed under [3-clause BSD license](tests/gtests/gtest/LICENSE) + +## Documentation +* [Introduction](https://intel.github.io/mkl-dnn) explains the programming model +and basic concepts +* [Reference manual](https://intel.github.io/mkl-dnn/modules.html) provides +detailed functionality description +* [Examples](https://github.com/intel/mkl-dnn/tree/master/examples) +demonstrates use of C and C++ APIs in simple topologies +* [Tutorial](https://software.intel.com/en-us/articles/intel-mkl-dnn-part-1-library-overview-and-installation) +provides step-by-step installation instructions and an example walkthrough + +## Support +Please submit your questions, feature requests, and bug reports on the +[GitHub issues](https://github.com/intel/mkl-dnn/issues) page. 
+ +**WARNING** The following functionality has preview status and might change +without prior notification in future releases: +* Threading Building Blocks (TBB) support + +## How to Contribute +We welcome community contributions to Intel MKL-DNN. If you have an idea on how to improve the library: + +* Share your proposal via + [GitHub issues](https://github.com/intel/mkl-dnn/issues). +* Ensure you can build the product and run all the examples with your patch. +* In the case of a larger feature, create a test. +* Submit a [pull request](https://github.com/intel/mkl-dnn/pulls). + +We will review your contribution and, if any additional fixes or modifications +are necessary, may provide feedback to guide you. When accepted, your pull +request will be merged to the repository. + +## System Requirements +Intel MKL-DNN supports Intel 64 architecture and compatible architectures. +The library is optimized for the systems based on +* Intel Atom(R) processor with Intel SSE4.1 support +* 4th, 5th, 6th, 7th, and 8th generation Intel(R) Core(TM) processor +* Intel(R) Xeon(R) processor E5 v3 family (formerly Haswell) +* Intel Xeon processor E5 v4 family (formerly Broadwell) +* Intel Xeon Platinum processor family (formerly Skylake) +* Intel(R) Xeon Phi(TM) processor x200 product family (formerly Knights Landing) +* Intel Xeon Phi processor x205 product family (formerly Knights Mill) + +and compatible processors. + +The software dependencies are: +* [Cmake](https://cmake.org/download/) 2.8.0 or later +* [Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html#srcbin) 1.8.5 or later +* C++ compiler with C++11 standard support +* Optional dependencies: + * GNU\* OpenMP\*, LLVM OpenMP, or Intel OpenMP + * Threading Building Blocks (TBB) 2017 or later + * Intel MKL 2017 Update 1 or Intel MKL small libraries + +> **Note** +> Building Intel MKL-DNN with optional dependencies may introduce additional +> runtime dependencies for the library. For details, refer to the corresponding +> software system requirements. + +The software was validated on RedHat\* Enterprise Linux 7 with +* GNU Compiler Collection 4.8, 5.4, 6.1, 7.2, and 8.1 +* Clang\* 3.8.0 +* [Intel C/C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) + 17.0, 18.0, and 19.0 + +on Windows Server\* 2012 R2 with +* Microsoft Visual C++ 14.0 (Visual Studio 2015 Update 3) +* [Intel C/C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) + 17.0 and 19.0 + +on macOS\* 10.13 (High Sierra) with +* Apple LLVM version 9.2 (XCode 9.2) +* [Intel C/C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) + 18.0 and 19.0 + +The implementation uses OpenMP 4.0 SIMD extensions. We recommend using the +Intel C++ Compiler for the best performance results. + +## Installation + +### Build from source + +#### Download source code +Download [Intel MKL-DNN source code](https://github.com/intel/mkl-dnn/archive/master.zip) +or clone [the repository](https://github.com/intel/mkl-dnn.git) to your system. + +``` +git clone https://github.com/intel/mkl-dnn.git +``` + +#### Configure build +Intel MKL-DNN uses a CMake-based build system. You can use CMake options to control the build. 
+Along with the standard CMake options such as `CMAKE_INSTALL_PREFIX` and `CMAKE_BUILD_TYPE`, +you can pass Intel MKL-DNN specific options: + +|Option | Possible Values (defaults in bold) | Description +|:--- |:--- | :--- +|MKLDNN_LIBRARY_TYPE | **SHARED**, STATIC | Defines the resulting library type +|MKLDNN_THREADING | **OMP**, OMP:INTEL, OMP:COMP, TBB | Defines the threading type +|MKLDNN_BUILD_EXAMPLES | **ON**, OFF | Controls building the examples +|MKLDNN_BUILD_TESTS | **ON**, OFF | Controls building the tests +|MKLDNN_ARCH_OPT_FLAGS | *compiler flags* | Specifies compiler optimization flags (see warning note below) +|VTUNEROOT | *path* | Enables integration with Intel(R) VTune(TM) Amplifier + +> **WARNING** +> +> By default, Intel MKL-DNN is built specifically for the processor type of the +> compiling machine (for example, `-march=native` in the case of GCC). While this option +> gives better performance, the resulting library can be run only on systems +> that are instruction-set compatible with the compiling machine. +> +> Therefore, if Intel MKL-DNN is to be shipped to other platforms (for example, built by +> Linux distribution maintainers), consider setting `MKLDNN_ARCH_OPT_FLAGS` to `""`. + +For more options and details, check [cmake/options.cmake](cmake/options.cmake). + +##### Using Intel MKL (optional) +Intel MKL-DNN includes an optimized matrix-matrix multiplication (GEMM) implementation for modern platforms. +The library can also take advantage of GEMM functions from Intel MKL to improve performance with older +versions of compilers or on older platforms. This behavior is controlled by the `MKLDNN_USE_MKL` option. + +|Option | Possible Values (defaults in bold) | Description +|:--- |:--- | :--- +|MKLDNN_USE_MKL | **DEF**, NONE, ML, FULL, FULL:STATIC | Defines the binary dependency on Intel MKL + +The dynamic library with this functionality is included in the repository. +If you choose to build Intel MKL-DNN with the binary dependency, download the Intel MKL small +libraries using the provided script: + +*Linux/macOS* +``` +cd scripts && ./prepare_mkl.sh && cd .. +``` + +*Windows\** +``` +cd scripts && call prepare_mkl.bat && cd .. +``` + +or manually from [GitHub release section](https://github.com/intel/mkl-dnn/releases), +and unpack it to the `external` directory in the repository root. Intel MKL-DNN +can also be built with full Intel MKL if the latter is installed on the system. +You might need to set the `MKLROOT` environment variable to the path where the full +Intel MKL is installed to help `cmake` locate the library. + +> **Note** +> +> Using Intel MKL small libraries currently works only for Intel MKL-DNN built with +> OpenMP. Building with Intel TBB requires either the full Intel MKL library +> or a standalone build. +> +> Using Intel MKL or Intel MKL small libraries will introduce additional +> runtime dependencies. For additional information, refer to Intel MKL +> [system requirements](https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2019-system-requirements). + +##### Threading +Intel MKL-DNN is parallelized and can use the OpenMP or TBB threading runtime. OpenMP threading is the default build mode +and is recommended for the best performance. TBB support is experimental. This behavior is controlled by the `MKLDNN_THREADING` option. 
+ +|Option | Possible Values (defaults in bold) | Description +|:--- |:--- | :--- +|MKLDNN_THREADING | **OMP**, OMP:INTEL, OMP:COMP, TBB | Defines the threading type + +##### OpenMP +Intel MKL-DNN can use Intel, GNU or CLANG OpenMP runtime. Because different OpenMP runtimes may not be binary compatible, +it's important to ensure that only one OpenMP runtime is used throughout the +application. Having more than one OpenMP runtime initialized may lead to +undefined behavior including incorrect results or crashes. + +Intel MKL-DNN library built with the binary dependency will link against the Intel OpenMP +runtime included with the Intel MKL small libraries package. The Intel OpenMP runtime +is binary compatible with the GNU OpenMP and Clang OpenMP runtimes and is +recommended for the best performance results. + +Intel MKL-DNN library built standalone will use the OpenMP runtime supplied by +the compiler, so as long as both the library and the application use the +same compiler, the correct OpenMP runtime will be used. + +##### TBB +TBB support is experimental. Intel MKL-DNN has limited optimizations done for Intel TBB and has some functional +limitations if built with Intel TBB. + +Functional limitations: +* Convolution with Winograd algorithm is not supported + +Performance limitations (mostly less parallelism than in case of OpenMP): +* Batch normalization +* Convolution backward by weights +* mkldnn_sgemm + +> **WARNING** +> +> If the library is built with the full Intel MKL, the user is expected to set +> the `MKL_THREADING_LAYER` environment variable to either `tbb` or `sequential` in order +> to force Intel MKL to use Intel TBB for parallelization or to be sequential, +> respectively. Without this setting, Intel MKL (RT library) tries +> to use OpenMP for parallelization by default. + +#### Build on Linux/macOS +Ensure that all software dependencies are in place and have at least the minimal +supported version. + +Configure CMake and create a makefile: + +``` +mkdir -p build && cd build && cmake $CMAKE_OPTIONS .. +``` + +Build the application: + +``` +make +``` + +The build can be validated with the unit-test suite: + +``` +ctest +``` + +The reference manual is provided inline and can also be generated in HTML format with Doxygen: + +``` +make doc +``` + +Documentation will reside in the `build/reference/html` folder. + +Finally: + +``` +make install +``` + +will place the header files, libraries, and documentation in `/usr/local`. To change +the installation path, use the option `-DCMAKE_INSTALL_PREFIX=` when invoking CMake. + +#### Build on Windows +Ensure that all software dependencies are in place and have at least the minimal +supported version. + +> **NOTE** +> +> Building Intel MKL-DNN from a terminal requires using either the Intel Parallel Studio command prompt +> or the Microsoft\* Visual Studio\* developer command prompt instead of the default Windows command prompt. +> +> The Intel(R) Parallel Studio command prompt is an item in the **Start** menu in the **Intel Parallel Studio +> \** folder that has a Windows Command Prompt icon and a name like **Compiler 18.0 Update 5…**. +> +> The default for building the project for the Intel C++ Compiler is to use the Intel +> Parallel Studio developer command prompt. + +Configure CMake and create a Microsoft Visual Studio solution: + +``` +mkdir build & cd build && cmake -G "Visual Studio 15 2017 Win64" .. +``` + +For the solution to use Intel C++ Compiler: + +``` +cmake -G "Visual Studio 15 2017 Win64" -T "Intel C++ Compiler 18.0" .. 
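:: Illustrative only: the Intel MKL-DNN-specific options from the tables above
:: (e.g. MKLDNN_BUILD_EXAMPLES, MKLDNN_BUILD_TESTS) can be appended to either
:: generator line in the usual -D form:
cmake -G "Visual Studio 15 2017 Win64" -T "Intel C++ Compiler 18.0" -DMKLDNN_BUILD_EXAMPLES=OFF -DMKLDNN_BUILD_TESTS=OFF ..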
+``` + +After you have built the initial project using CMake, you can then open the project with +Microsoft Visual Studio and build from there. You can also use msbuild command-line tool +to build from the command line: + +``` +msbuild "Intel(R) MKL-DNN.sln" /p:Configuration=Release [/t:rebuild] /m +``` +where the optional argument `/t:rebuild` rebuilds the project. + +The build can be validated with the unit-test suite: + +``` +ctest +``` + +## Linking Your Application + +### Linux/macOS +Intel MKL-DNN includes several header files providing C and C++ APIs for +the functionality and one or several dynamic libraries depending on how +Intel MKL-DNN was built. + +**Linux** + +|File | Description +|:--- |:--- +|include/mkldnn.h | C header +|include/mkldnn.hpp | C++ header +|include/mkldnn_types.h | Auxiliary C header +|lib/libmkldnn.so | Intel MKL-DNN dynamic library +|lib/libmkldnn.a | Intel MKL-DNN static library (if built with `MKLDNN_LIBRARY_TYPE=STATIC`) +|lib/libiomp5.so | Intel OpenMP\* runtime library (if built with `MKLDNN_USE_MKL=ML`) +|lib/libmklml_gnu.so | Intel MKL small library for GNU OpenMP runtime (if built with `MKLDNN_USE_MKL=ML`) +|lib/libmklml_intel.so | Intel MKL small library for Intel OpenMP runtime (if built with `MKLDNN_USE_MKL=ML`) + +**macOS** + +|File | Description +|:--- |:--- +|include/mkldnn.h | C header +|include/mkldnn.hpp | C++ header +|include/mkldnn_types.h | Auxiliary C header +|lib/libmkldnn.dylib | Intel MKL-DNN dynamic library +|lib/libmkldnn.a | Intel MKL-DNN static library (if built with `MKLDNN_LIBRARY_TYPE=STATIC`) +|lib/libiomp5.dylib | Intel OpenMP\* runtime library (if built with `MKLDNN_USE_MKL=ML`) +|lib/libmklml_gnu.dylib | Intel MKL small library for GNU OpenMP runtime (if built with `MKLDNN_USE_MKL=ML`) +|lib/libmklml_intel.dylib | Intel MKL small library for Intel OpenMP runtime (if built with `MKLDNN_USE_MKL=ML`) + +Linkline examples below assume that Intel MKL-DNN is installed in the directory +defined in the MKLDNNROOT environment variable. + +``` +g++ -std=c++11 -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn +clang -std=c++11 -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn +icpc -std=c++11 -I${MKLDNNROOT}/include -L${MKLDNNROOT}/lib simple_net.cpp -lmkldnn +``` + +> **WARNING** +> +> Using the GNU compiler with the `-fopenmp` and `-liomp5` options will link the +> application with both the Intel and GNU OpenMP runtime libraries. This will lead +> to undefined behavior in the application. + +> **NOTE** +> +> Applications linked dynamically will resolve the dependencies at runtime. +> Make sure that the dependencies are available in the standard locations +> defined by the operating system, in the locatons listed in `LD_LIBRARY_PATH` (Linux), +> `DYLD_LIBRARY_PATH` (macOS) environment variables, or `rpath` mechanism. + +### Windows +Intel MKL-DNN includes several header files providing C and C++ APIs for +the functionality and one or several dynamic libraries depending on how +Intel MKL-DNN was built. 
+ +|File | Description +|:--- |:--- +|bin\libmkldnn.dll | Intel MKL-DNN dynamic library +|bin\libiomp5.dll | Intel OpenMP\* runtime library (if built with `MKLDNN_USE_MKL=ML`) +|bin\libmklml.dll | Intel MKL small library (if built with `MKLDNN_USE_MKL=ML`) +|include\mkldnn.h | C header +|include\mkldnn.hpp | C++ header +|include\mkldnn_types.h | Auxiliary C header +|lib\libmkldnn.lib | Intel MKL-DNN import library +|lib\libiomp5.lib | Intel OpenMP\* runtime import library (if built with `MKLDNN_USE_MKL=ML`) +|lib\libmklml.lib | Intel MKL small library import library (if built with `MKLDNN_USE_MKL=ML`) + +To link the application from the command line, set up the `LIB` and `INCLUDE` environment variables to point to the locations of +the Intel MKL-DNN headers and libraries. The Linkline examples below assume that Intel MKL-DNN is installed in the directory +defined in the MKLDNNROOT environment variable. + +``` +set INCLUDE=%MKLDNNROOT%\include;%INCLUDE% +set LIB=%MKLDNNROOT%\lib;%LIB% +icl /Qstd=c++11 /qopenmp simple_net.cpp mkldnn.lib +cl simple_net.cpp mkldnn.lib +``` + +Refer to [Microsoft Visual Studio documentation](https://docs.microsoft.com/en-us/cpp/build/walkthrough-creating-and-using-a-dynamic-link-library-cpp?view=vs-2017) +on linking the application using MSVS solutions. + +> **NOTE** +> Applications linked dynamically will resolve the dependencies at runtime. +> Make sure that the dependencies are available in the standard locations +> defined by the operating system or in the locatons listed in the `PATH` environment variable. + +-------- + +[Legal Information](doc/legal_information.md) diff --git a/oidn/mkl-dnn/_clang-format b/oidn/mkl-dnn/_clang-format new file mode 100644 index 0000000..5b4bf54 --- /dev/null +++ b/oidn/mkl-dnn/_clang-format @@ -0,0 +1,97 @@ +#=============================================================================== +# Copyright 2016-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +# vim:ft=conf + +ColumnLimit: 80 + +Language: 'Cpp' +Standard: 'Cpp11' +DisableFormat: false + +ContinuationIndentWidth: 8 +IndentWidth: 4 +TabWidth: 4 +AccessModifierOffset: -4 +UseTab: 'Never' + +AlignAfterOpenBracket: 'DontAlign' +AlignConsecutiveAssignments: 'false' +AlignConsecutiveDeclarations: 'false' +AlignEscapedNewlinesLeft: 'true' +AlignOperands: 'false' +AlignTrailingComments: 'false' + +AllowAllParametersOfDeclarationOnNextLine: 'true' +AllowShortBlocksOnASingleLine: 'true' +AllowShortCaseLabelsOnASingleLine: 'true' +AllowShortFunctionsOnASingleLine: 'Inline' +AllowShortIfStatementsOnASingleLine: 'false' +AllowShortLoopsOnASingleLine: 'false' + +AlwaysBreakAfterDefinitionReturnType: 'None' +AlwaysBreakAfterReturnType: 'None' +AlwaysBreakBeforeMultilineStrings: 'true' +AlwaysBreakTemplateDeclarations: 'true' + +BinPackArguments: 'true' +BinPackParameters: 'true' + +BreakBeforeBraces: 'Custom' +BraceWrapping: { + AfterClass: 'true' + AfterControlStatement: 'false' + AfterEnum : 'false' + AfterFunction : 'false' + AfterNamespace : 'false' + AfterStruct : 'false' + AfterUnion : 'false' + BeforeCatch : 'false' + BeforeElse : 'false' + IndentBraces : 'false' +} + +BreakBeforeBinaryOperators: 'All' +BreakBeforeTernaryOperators: 'true' + +BreakConstructorInitializersBeforeComma: 'true' +ConstructorInitializerAllOnOneLineOrOnePerLine: 'true' +ConstructorInitializerIndentWidth: 4 + +Cpp11BracedListStyle: 'false' + +DerivePointerAlignment: 'false' +PointerAlignment: 'Right' + +IndentCaseLabels: 'false' +IndentWrappedFunctionNames: 'false' + +KeepEmptyLinesAtTheStartOfBlocks: 'true' +MaxEmptyLinesToKeep: 1 + +NamespaceIndentation: 'None' + +SpaceAfterCStyleCast: 'false' +SpaceBeforeAssignmentOperators: 'true' +SpaceBeforeParens: 'ControlStatements' +SpaceInEmptyParentheses: 'false' +SpacesBeforeTrailingComments: 1 +SpacesInAngles: 'false' +SpacesInCStyleCastParentheses: 'false' +SpacesInContainerLiterals: 'false' +SpacesInParentheses: 'false' +SpacesInSquareBrackets: 'false' + diff --git a/oidn/mkl-dnn/cmake/Doxygen.cmake b/oidn/mkl-dnn/cmake/Doxygen.cmake new file mode 100644 index 0000000..d23c617 --- /dev/null +++ b/oidn/mkl-dnn/cmake/Doxygen.cmake @@ -0,0 +1,57 @@ +#=============================================================================== +# Copyright 2016-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +# Locates Doxygen and configures documentation generation +#=============================================================================== + +if(Doxygen_cmake_included) + return() +endif() +set(Doxygen_cmake_included true) + +find_package(Doxygen) +if(DOXYGEN_FOUND) + set(DOXYGEN_OUTPUT_DIR ${CMAKE_CURRENT_BINARY_DIR}/reference) + set(DOXYGEN_STAMP_FILE ${CMAKE_CURRENT_BINARY_DIR}/doc.stamp) + configure_file( + ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile.in + ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile + @ONLY) + configure_file( + ${CMAKE_CURRENT_SOURCE_DIR}/doc/header.html.in + ${CMAKE_CURRENT_BINARY_DIR}/header.html + @ONLY) + file(GLOB_RECURSE HEADERS + ${PROJECT_SOURCE_DIR}/include/*.h + ${PROJECT_SOURCE_DIR}/include/*.hpp + ) + file(GLOB_RECURSE DOX + ${PROJECT_SOURCE_DIR}/doc/* + ) + add_custom_command( + OUTPUT ${DOXYGEN_STAMP_FILE} + DEPENDS ${HEADERS} ${DOX} + COMMAND ${DOXYGEN_EXECUTABLE} Doxyfile + COMMAND cmake -E touch ${DOXYGEN_STAMP_FILE} + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} + COMMENT "Generating API documentation with Doxygen" VERBATIM) + add_custom_target(doc DEPENDS ${DOXYGEN_STAMP_FILE}) + install( + DIRECTORY ${DOXYGEN_OUTPUT_DIR} + DESTINATION share/doc/${LIB_NAME} OPTIONAL) +endif(DOXYGEN_FOUND) + + diff --git a/oidn/mkl-dnn/cmake/MKL.cmake b/oidn/mkl-dnn/cmake/MKL.cmake new file mode 100644 index 0000000..5a716bf --- /dev/null +++ b/oidn/mkl-dnn/cmake/MKL.cmake @@ -0,0 +1,277 @@ +#=============================================================================== +# Copyright 2016-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#=============================================================================== + +# Locate Intel(R) MKL installation using MKLROOT or look in +# ${CMAKE_CURRENT_SOURCE_DIR}/external +#=============================================================================== + +if(MKL_cmake_included) + return() +endif() +set(MKL_cmake_included true) +include("cmake/utils.cmake") +include("cmake/options.cmake") + +# set SKIP_THIS_MKL to true if given configuration is not supported +function(maybe_skip_this_mkl LIBNAME) + # Optimism... + set(SKIP_THIS_MKL False PARENT_SCOPE) + + # Both mklml_intel and mklml_gnu are OpenMP based. + # So in case of TBB link with Intel MKL (RT library) and either set: + # MKL_THREADING_LAYER=tbb + # to make Intel MKL use TBB threading as well, or + # MKL_THREADING_LAYER=sequential + # to make Intel MKL be sequential. 
+ if (MKLDNN_THREADING STREQUAL "TBB" AND LIBNAME MATCHES "mklml") + set(SKIP_THIS_MKL True PARENT_SCOPE) + endif() + + # user doesn't want Intel MKL at all + if (MKLDNN_USE_MKL STREQUAL "NONE") + set(SKIP_THIS_MKL True PARENT_SCOPE) + endif() + + # user specifies Intel MKL-ML should be used + if (MKLDNN_USE_MKL STREQUAL "ML") + if (LIBNAME STREQUAL "mkl_rt") + set(SKIP_THIS_MKL True PARENT_SCOPE) + endif() + endif() + + # user specifies full Intel MKL should be used + if (MKLDNN_USE_MKL MATCHES "FULL") + if (LIBNAME MATCHES "mklml") + set(SKIP_THIS_MKL True PARENT_SCOPE) + endif() + endif() + + # avoid using Intel MKL-ML that is not compatible with compiler's OpenMP RT + if (MKLDNN_THREADING STREQUAL "OMP:COMP") + if ((LIBNAME STREQUAL "mklml_intel" OR LIBNAME STREQUAL "mklml") + AND (NOT CMAKE_CXX_COMPILER_ID STREQUAL "Intel")) + set(SKIP_THIS_MKL True PARENT_SCOPE) + elseif (LIBNAME STREQUAL "mklml_gnu" + AND (NOT CMAKE_CXX_COMPILER_ID STREQUAL "GNU")) + set(SKIP_THIS_MKL True PARENT_SCOPE) + endif() + elseif (MKLDNN_THREADING STREQUAL "OMP:INTEL") + if (LIBNAME STREQUAL "mklml_gnu") + set(SKIP_THIS_MKL True PARENT_SCOPE) + endif() + endif() +endfunction() + +function(detect_mkl LIBNAME) + if(HAVE_MKL) + return() + endif() + + maybe_skip_this_mkl(${LIBNAME}) + set_if(SKIP_THIS_MKL MAYBE_SKIP_MSG "... skipped") + message(STATUS "Detecting Intel(R) MKL: trying ${LIBNAME}${MAYBE_SKIP_MSG}") + + if (SKIP_THIS_MKL) + return() + endif() + + find_path(MKLINC mkl_cblas.h + HINTS ${MKLROOT}/include $ENV{MKLROOT}/include) + + # skip full Intel MKL while looking for Intel MKL-ML + if (MKLINC AND LIBNAME MATCHES "mklml") + get_filename_component(__mklinc_root "${MKLINC}" PATH) + find_library(tmp_MKLLIB NAMES "mkl_rt" + HINTS ${__mklinc_root}/lib/intel64 + NO_DEFAULT_PATH) + set_if(tmp_MKLLIB MKLINC "") + unset(tmp_MKLLIB CACHE) + endif() + + if(NOT MKLINC) + file(GLOB_RECURSE MKLINC + ${CMAKE_CURRENT_SOURCE_DIR}/external/*/mkl_cblas.h) + if(MKLINC) + # if user has multiple version under external/ then guess last + # one alphabetically is "latest" and warn + list(LENGTH MKLINC MKLINCLEN) + if(MKLINCLEN GREATER 1) + list(SORT MKLINC) + list(REVERSE MKLINC) + list(GET MKLINC 0 MKLINCLST) + set(MKLINC "${MKLINCLST}") + endif() + get_filename_component(MKLINC ${MKLINC} PATH) + endif() + endif() + if(NOT MKLINC) + return() + endif() + + get_filename_component(__mklinc_root "${MKLINC}" PATH) + + unset(MKLLIB CACHE) # make find_library to redo the search + # At first, try to locate Intel MKL in the path where the header was found + find_library(MKLLIB NAMES ${LIBNAME} + PATHS ${__mklinc_root}/lib ${__mklinc_root}/lib/intel64 + NO_DEFAULT_PATH) + # On failure, check the system paths + find_library(MKLLIB NAMES ${LIBNAME}) + if(NOT MKLLIB) + return() + endif() + + if(WIN32) + set(MKLREDIST ${MKLINC}/../../redist/) + find_file(MKLDLL NAMES ${LIBNAME}.dll + HINTS + ${MKLREDIST}/mkl + ${MKLREDIST}/intel64/mkl + ${__mklinc_root}/lib) + if(NOT MKLDLL) + return() + endif() + endif() + + if(NOT CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + get_filename_component(MKLLIBPATH ${MKLLIB} PATH) + find_library(MKLIOMP5LIB + NAMES "iomp5" "iomp5md" "libiomp5" "libiomp5md" + HINTS ${MKLLIBPATH} + ${MKLLIBPATH}/../../lib + ${MKLLIBPATH}/../../../lib/intel64 + ${MKLLIBPATH}/../../compiler/lib + ${MKLLIBPATH}/../../../compiler/lib/intel64) + if(NOT MKLIOMP5LIB) + return() + endif() + if(WIN32) + find_file(MKLIOMP5DLL + NAMES "libiomp5.dll" "libiomp5md.dll" + HINTS ${MKLREDIST}/../compiler ${__mklinc_root}/lib) + if(NOT 
MKLIOMP5DLL) + return() + endif() + endif() + else() + set(MKLIOMP5LIB) + set(MKLIOMP5DLL) + endif() + + get_filename_component(MKLLIBPATH "${MKLLIB}" PATH) + string(FIND "${MKLLIBPATH}" ${CMAKE_CURRENT_SOURCE_DIR}/external __idx) + if(${__idx} EQUAL 0) + if(WIN32) + install(PROGRAMS ${MKLDLL} ${MKLIOMP5DLL} + DESTINATION ${CMAKE_INSTALL_BINDIR}) + else() + install(PROGRAMS ${MKLLIB} ${MKLIOMP5LIB} + DESTINATION ${CMAKE_INSTALL_LIBDIR}) + endif() + endif() + + if(WIN32) + # Add paths to DLL to %PATH% on Windows + get_filename_component(MKLDLLPATH "${MKLDLL}" PATH) + append_to_windows_path_list(CTESTCONFIG_PATH "${MKLDLLPATH}") + set(CTESTCONFIG_PATH "${CTESTCONFIG_PATH}" PARENT_SCOPE) + endif() + + # TODO: cache the value + set(HAVE_MKL TRUE PARENT_SCOPE) + set(MKLINC ${MKLINC} PARENT_SCOPE) + set(MKLLIB "${MKLLIB}" PARENT_SCOPE) + set(MKLDLL "${MKLDLL}" PARENT_SCOPE) + if(LIBNAME MATCHES "mklml") + set(MKLDNN_USES_MKL "MKLML:SHARED" PARENT_SCOPE) + else() + set(MKLDNN_USES_MKL "FULL:SHARED" PARENT_SCOPE) + endif() + + set(MKLIOMP5LIB "${MKLIOMP5LIB}" PARENT_SCOPE) + set(MKLIOMP5DLL "${MKLIOMP5DLL}" PARENT_SCOPE) +endfunction() + +function(set_static_mkl_libs libpath) + set_ternary(lib WIN32 "" "lib") + set_ternary(a WIN32 ".lib" ".a") + + if (MKLDNN_THREADING STREQUAL "TBB") + set(thr_name "tbb_thread") + elseif (MKLDNN_THREADING STREQUAL "OMP:COMP" AND CMAKE_CXX_COMPILER_ID STREQUAL "GNU") + set(thr_name "gnu_thread") + else() + set(thr_name "intel_thread") + endif() + + find_library(mkl_iface NAMES "${lib}mkl_intel_lp64${a}" HINTS ${libpath}) + find_library(mkl_thr NAMES "${lib}mkl_${thr_name}${a}" HINTS ${libpath}) + find_library(mkl_core NAMES "${lib}mkl_core${a}" HINTS ${libpath}) + + set(MKLLIB "${mkl_iface};${mkl_thr};${mkl_core}") + if (UNIX AND NOT APPLE) + list(APPEND MKLLIB "${mkl_iface};${mkl_thr};${mkl_core}") + endif() + set_if(UNIX MKLLIB "${MKLLIB};m;dl") + set(MKLLIB "${MKLLIB}" PARENT_SCOPE) +endfunction() + +set(MKLDNN_USES_MKL "") +detect_mkl("mklml_intel") +detect_mkl("mklml_gnu") +detect_mkl("mklml") +detect_mkl("mkl_rt") + +if(HAVE_MKL) + if (MKLDNN_USE_MKL STREQUAL "FULL:STATIC") + set(MKLDLL "") + get_filename_component(MKLLIBPATH "${MKLLIB}" PATH) + set_static_mkl_libs(${MKLLIBPATH}) + list(APPEND EXTRA_STATIC_LIBS ${MKLLIB}) + set(MKLDNN_USES_MKL "FULL:STATIC") + else() + list(APPEND EXTRA_SHARED_LIBS ${MKLLIB}) + endif() + + add_definitions(-DUSE_MKL -DUSE_CBLAS) + include_directories(AFTER ${MKLINC}) + + set(MSG "Intel(R) MKL:") + message(STATUS "${MSG} include ${MKLINC}") + message(STATUS "${MSG} lib ${MKLLIB}") + if(WIN32 AND MKLDLL) + message(STATUS "${MSG} dll ${MKLDLL}") + endif() +else() + if (MKLDNN_USE_MKL STREQUAL "NONE") + return() + endif() + + if (NOT MKLDNN_USE_MKL STREQUAL "DEF") + set(FAIL_WITHOUT_MKL True) + endif() + + if(DEFINED ENV{FAIL_WITHOUT_MKL} OR DEFINED FAIL_WITHOUT_MKL) + set(SEVERITY "FATAL_ERROR") + else() + set(SEVERITY "WARNING") + endif() + message(${SEVERITY} + "Intel(R) MKL not found. Some performance features may not be " + "available. 
Please run scripts/prepare_mkl.sh to download a minimal " + "set of libraries or get a full version from " + "https://software.intel.com/en-us/intel-mkl") +endif() diff --git a/oidn/mkl-dnn/cmake/OpenMP.cmake b/oidn/mkl-dnn/cmake/OpenMP.cmake new file mode 100644 index 0000000..84303f8 --- /dev/null +++ b/oidn/mkl-dnn/cmake/OpenMP.cmake @@ -0,0 +1,160 @@ +#=============================================================================== +# Copyright 2017-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#=============================================================================== + +# Manage OpenMP-related compiler flags +#=============================================================================== + +if(OpenMP_cmake_included) + return() +endif() +set(OpenMP_cmake_included true) +include("cmake/Threading.cmake") +#include("cmake/MKL.cmake") + +if (NOT MKLDNN_THREADING MATCHES "OMP") + # Enable OpenMP SIMD only + if(WIN32) + if(CMAKE_CXX_COMPILER_ID STREQUAL MSVC) + add_definitions(/Qpar) + elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /Qopenmp-simd") + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Qopenmp-simd") + endif() + elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -qopenmp-simd") + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -qopenmp-simd") + else() + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fopenmp-simd") + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fopenmp-simd") + endif() + return() +endif() + +set(MKLDNN_USES_INTEL_OPENMP FALSE) + +if (APPLE AND CMAKE_CXX_COMPILER_ID STREQUAL "Clang") + # OSX Clang doesn't have OpenMP by default. + # But we still want to build the library. + set(_omp_severity "WARNING") +else() + set(_omp_severity "FATAL_ERROR") +endif() + +macro(forbid_link_compiler_omp_rt) + if (NOT WIN32) + set_if(OpenMP_C_FOUND + CMAKE_C_CREATE_SHARED_LIBRARY_FORBIDDEN_FLAGS + "${OpenMP_C_FLAGS}") + set_if(OpenMP_CXX_FOUND + CMAKE_CXX_CREATE_SHARED_LIBRARY_FORBIDDEN_FLAGS + "${OpenMP_CXX_FLAGS}") + if (NOT APPLE) + append(CMAKE_SHARED_LINKER_FLAGS "-Wl,--as-needed") + endif() + endif() +endmacro() + +macro(use_intel_omp_rt) + # fast return + if (CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + set(MKLDNN_USES_INTEL_OPENMP TRUE) + return() + endif() + + # Do not link with compiler-native OpenMP library if Intel MKL is present. + # Rationale: Intel MKL comes with Intel OpenMP library which is compatible + # with all libraries shipped with compilers that Intel MKL-DNN supports. 
+ get_filename_component(MKLIOMP5LIB "${MKLIOMP5LIB}" PATH) + find_library(IOMP5LIB + NAMES "iomp5" "iomp5md" "libiomp5" "libiomp5md" + HINTS ${MKLIOMP5LIB} ) + if(IOMP5LIB) + forbid_link_compiler_omp_rt() + if (WIN32) + get_filename_component(MKLIOMP5DLL "${MKLIOMP5DLL}" PATH) + find_file(IOMP5DLL + NAMES "libiomp5.dll" "libiomp5md.dll" + HINTS ${MKLIOMP5DLL}) + endif() + list(APPEND EXTRA_SHARED_LIBS ${IOMP5LIB}) + else() + if (MKLDNN_THREADING STREQUAL "OMP:INTEL") + message(${_omp_severity} "Intel OpenMP runtime could not be found. " + "Please either use OpenMP runtime that comes with the compiler " + "(via -DMKLDNN_THREADING={OMP,OMP:COMP}), or " + "explicitely provide the path to libiomp with the " + "-DCMAKE_LIBRARY_PATH option") + endif() + endif() +endmacro() + +if(WIN32 AND ${CMAKE_CXX_COMPILER_ID} STREQUAL MSVC) + add_definitions(/Qpar) + add_definitions(/openmp) + set(OpenMP_CXX_FOUND true) +elseif(MSVC AND CMAKE_CXX_COMPILER_ID STREQUAL "Clang") + append(CMAKE_C_FLAGS "-Xclang -fopenmp") + append(CMAKE_CXX_FLAGS "-Xclang -fopenmp") + set(OpenMP_CXX_FOUND true) + list(APPEND EXTRA_SHARED_LIBS ${IOMP5LIB}) +else() + find_package(OpenMP) + #newer version for findOpenMP (>= v. 3.9) + if(CMAKE_VERSION VERSION_LESS "3.9" AND OPENMP_FOUND) + if(${CMAKE_MAJOR_VERSION} VERSION_LESS "3" AND ${CMAKE_CXX_COMPILER_ID} STREQUAL "Intel") + # Override FindOpenMP flags for Intel Compiler (otherwise deprecated) + set(OpenMP_CXX_FLAGS "-fopenmp") + set(OpenMP_C_FLAGS "-fopenmp") + endif() + set(OpenMP_C_FOUND true) + set(OpenMP_CXX_FOUND true) + endif() + append_if(OpenMP_C_FOUND CMAKE_SRC_CCXX_FLAGS "${OpenMP_C_FLAGS}") +endif() + +if (MKLDNN_THREADING MATCHES "OMP") + if (OpenMP_CXX_FOUND) + set_threading("OMP") + append(CMAKE_TEST_CCXX_FLAGS "${OpenMP_CXX_FLAGS}") + append(CMAKE_EXAMPLE_CCXX_FLAGS "${OpenMP_CXX_FLAGS}") + else() + message(${_omp_severity} "OpenMP library could not be found. " + "Proceeding might lead to highly sub-optimal performance.") + endif() + + if (MKLDNN_THREADING STREQUAL "OMP:COMP") + set(IOMP5LIB "") + set(IOMP5DLL "") + else() + use_intel_omp_rt() + endif() + + if(MKLIOMP5LIB) + set(MKLDNN_USES_INTEL_OPENMP TRUE) + endif() +else() + # Compilation happens with OpenMP to enable `#pragma omp simd` + # but during linkage OpenMP dependency should be avoided + forbid_link_compiler_omp_rt() + return() +endif() + +set_ternary(_omp_lib_msg IOMP5LIB "${IOMP5LIB}" "provided by compiler") +message(STATUS "OpenMP lib: ${_omp_lib_msg}") +if(WIN32) + set_ternary(_omp_dll_msg IOMP5DLL "${IOMP5LIB}" "provided by compiler") + message(STATUS "OpenMP dll: ${_omp_dll_msg}") +endif() diff --git a/oidn/mkl-dnn/cmake/SDL.cmake b/oidn/mkl-dnn/cmake/SDL.cmake new file mode 100644 index 0000000..73c9953 --- /dev/null +++ b/oidn/mkl-dnn/cmake/SDL.cmake @@ -0,0 +1,61 @@ +#=============================================================================== +# Copyright 2017-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +# Manage secure Development Lifecycle-related compiler flags +#=============================================================================== + +if(SDL_cmake_included) + return() +endif() +set(SDL_cmake_included true) +#include("cmake/utils.cmake") + +if(UNIX) + set(CMAKE_CCXX_FLAGS "-fPIC -Wformat -Wformat-security") + append(CMAKE_CXX_FLAGS_RELEASE "-D_FORTIFY_SOURCE=2") + append(CMAKE_C_FLAGS_RELEASE "-D_FORTIFY_SOURCE=2") + if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU") + if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS 4.9) + append(CMAKE_CCXX_FLAGS "-fstack-protector-all") + else() + append(CMAKE_CCXX_FLAGS "-fstack-protector-strong") + endif() + + # GCC might be very paranoid for partial structure initialization, e.g. + # struct { int a, b; } s = { 0, }; + # However the behavior is triggered by `Wmissing-field-initializers` + # only. To prevent warnings on users' side who use the library and turn + # this warning on, let's use it too. Applicable for the library sources + # and interfaces only (tests currently rely on that fact heavily) + append(CMAKE_SRC_CCXX_FLAGS "-Wmissing-field-initializers") + append(CMAKE_EXAMPLE_CCXX_FLAGS "-Wmissing-field-initializers") + elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang") + append(CMAKE_CCXX_FLAGS "-fstack-protector-all") + elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel") + append(CMAKE_CXX_FLAGS "-fstack-protector") + endif() + if(APPLE) + append(CMAKE_SHARED_LINKER_FLAGS "-Wl,-bind_at_load") + append(CMAKE_EXE_LINKER_FLAGS "-Wl,-bind_at_load") + else() + append(CMAKE_EXE_LINKER_FLAGS "-pie") + append(CMAKE_SHARED_LINKER_FLAGS "-Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now") + append(CMAKE_EXE_LINKER_FLAGS "-Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now") + endif() +endif() + +set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_CCXX_FLAGS}") +set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CMAKE_CCXX_FLAGS}") diff --git a/oidn/mkl-dnn/cmake/TBB.cmake b/oidn/mkl-dnn/cmake/TBB.cmake new file mode 100644 index 0000000..c3be583 --- /dev/null +++ b/oidn/mkl-dnn/cmake/TBB.cmake @@ -0,0 +1,199 @@ +#=============================================================================== +# Copyright 2009-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +if(TBB_cmake_included) + return() +endif() +set(TBB_cmake_included true) +#include("cmake/Threading.cmake") + +if(NOT MKLDNN_THREADING STREQUAL "TBB") + return() +endif() + +if(NOT TBB_ROOT) + set(TBB_ROOT $ENV{TBB_ROOT}) +endif() +if(NOT TBB_ROOT) + set(TBB_ROOT $ENV{TBBROOT}) +endif() + +if(WIN32) + # workaround for parentheses in variable name / CMP0053 + set(PROGRAMFILESx86 "PROGRAMFILES(x86)") + set(PROGRAMFILES32 "$ENV{${PROGRAMFILESx86}}") + if(NOT PROGRAMFILES32) + set(PROGRAMFILES32 "$ENV{PROGRAMFILES}") + endif() + if(NOT PROGRAMFILES32) + set(PROGRAMFILES32 "C:/Program Files (x86)") + endif() + find_path(TBB_ROOT include/tbb/tbb.h + DOC "Root of TBB installation" + PATHS ${PROJECT_SOURCE_DIR}/tbb + NO_DEFAULT_PATH + ) + find_path(TBB_ROOT include/tbb/tbb.h + HINTS ${TBB_ROOT} + PATHS + ${PROJECT_SOURCE_DIR}/../tbb + "${PROGRAMFILES32}/IntelSWTools/compilers_and_libraries/windows/tbb" + "${PROGRAMFILES32}/Intel/Composer XE/tbb" + "${PROGRAMFILES32}/Intel/compilers_and_libraries/windows/tbb" + ) + + if(CMAKE_SIZEOF_VOID_P EQUAL 8) + set(TBB_ARCH intel64) + else() + set(TBB_ARCH ia32) + endif() + + if(MSVC10) + set(TBB_VCVER vc10) + elseif(MSVC11) + set(TBB_VCVER vc11) + elseif(MSVC12) + set(TBB_VCVER vc12) + else() + set(TBB_VCVER vc14) + endif() + + if(TBB_ROOT STREQUAL "") + find_path(TBB_INCLUDE_DIR tbb/task_scheduler_init.h) + find_path(TBB_BIN_DIR tbb.dll) + find_library(TBB_LIBRARY tbb) + find_library(TBB_LIBRARY_MALLOC tbbmalloc) + else() + set(TBB_INCLUDE_DIR TBB_INCLUDE_DIR-NOTFOUND) + set(TBB_BIN_DIR TBB_BIN_DIR-NOTFOUND) + set(TBB_LIBRARY TBB_LIBRARY-NOTFOUND) + set(TBB_LIBRARY_MALLOC TBB_LIBRARY_MALLOC-NOTFOUND) + find_path(TBB_INCLUDE_DIR tbb/task_scheduler_init.h PATHS ${TBB_ROOT}/include NO_DEFAULT_PATH) + find_path(TBB_BIN_DIR tbb.dll + HINTS + ${TBB_ROOT}/bin/${TBB_ARCH}/${TBB_VCVER} + ${TBB_ROOT}/bin + ${TBB_ROOT}/../redist/${TBB_ARCH}/tbb/${TBB_VCVER} + ${TBB_ROOT}/../redist/${TBB_ARCH}_win/tbb/${TBB_VCVER} + NO_DEFAULT_PATH + ) + set(TBB_LIB_DIR ${TBB_ROOT}/lib/${TBB_ARCH}/${TBB_VCVER}) + find_library(TBB_LIBRARY tbb PATHS ${TBB_LIB_DIR} ${TBB_ROOT}/lib NO_DEFAULT_PATH) + find_library(TBB_LIBRARY_MALLOC tbbmalloc PATHS ${TBB_LIB_DIR} ${TBB_ROOT}/lib NO_DEFAULT_PATH) + endif() + +else() + + find_path(TBB_ROOT include/tbb/tbb.h + DOC "Root of TBB installation" + PATHS ${PROJECT_SOURCE_DIR}/tbb + NO_DEFAULT_PATH + ) + find_path(TBB_ROOT include/tbb/tbb.h + DOC "Root of TBB installation" + HINTS ${TBB_ROOT} + PATHS + ${PROJECT_SOURCE_DIR}/tbb + /opt/intel/composerxe/tbb + /opt/intel/compilers_and_libraries/tbb + /opt/intel/tbb + ) + + if(TBB_ROOT STREQUAL "") + find_path(TBB_INCLUDE_DIR tbb/task_scheduler_init.h) + find_library(TBB_LIBRARY tbb) + find_library(TBB_LIBRARY_MALLOC tbbmalloc) + + elseif(EXISTS ${TBB_ROOT}/cmake/TBBBuild.cmake AND EXISTS ${TBB_ROOT}/src/tbb/tbb_version.h) + option(TBB_STATIC_LIB "Build TBB as a static library (building TBB as a static library is NOT recommended)") + if(TBB_STATIC_LIB) + include(${TBB_ROOT}/cmake/TBBBuild.cmake) + tbb_build(TBB_ROOT ${TBB_ROOT} CONFIG_DIR TBB_DIR MAKE_ARGS extra_inc=big_iron.inc) + set(TBB_INCLUDE_DIR ${TBB_ROOT}/include) + set(TBB_LIBRARY ${PROJECT_BINARY_DIR}/tbb_cmake_build/tbb_cmake_build_subdir_release/libtbb.a) + set(TBB_LIBRARY_MALLOC ${PROJECT_BINARY_DIR}/tbb_cmake_build/tbb_cmake_build_subdir_release/libtbbmalloc.a) + else() + include(${TBB_ROOT}/cmake/TBBBuild.cmake) + tbb_build(TBB_ROOT ${TBB_ROOT} CONFIG_DIR 
TBB_DIR) + set(TBB_INCLUDE_DIR ${TBB_ROOT}/include) + set(TBB_LIBRARY ${PROJECT_BINARY_DIR}/tbb_cmake_build/tbb_cmake_build_subdir_release/libtbb.so.2) + set(TBB_LIBRARY_MALLOC ${PROJECT_BINARY_DIR}/tbb_cmake_build/tbb_cmake_build_subdir_release/libtbbmalloc.so.2) + endif() + + else() + set(TBB_INCLUDE_DIR TBB_INCLUDE_DIR-NOTFOUND) + set(TBB_LIBRARY TBB_LIBRARY-NOTFOUND) + set(TBB_LIBRARY_MALLOC TBB_LIBRARY_MALLOC-NOTFOUND) + if(APPLE) + find_path(TBB_INCLUDE_DIR tbb/task_scheduler_init.h PATHS ${TBB_ROOT}/include NO_DEFAULT_PATH) + find_library(TBB_LIBRARY tbb PATHS ${TBB_ROOT}/lib NO_DEFAULT_PATH) + find_library(TBB_LIBRARY_MALLOC tbbmalloc PATHS ${TBB_ROOT}/lib NO_DEFAULT_PATH) + else() + find_path(TBB_INCLUDE_DIR tbb/task_scheduler_init.h PATHS ${TBB_ROOT}/include NO_DEFAULT_PATH) + set(TBB_HINTS HINTS ${TBB_ROOT}/lib/intel64/gcc4.4 ${TBB_ROOT}/lib ${TBB_ROOT}/lib64 PATHS /usr/libx86_64-linux-gnu/) + find_library(TBB_LIBRARY tbb ${TBB_HINTS}) + find_library(TBB_LIBRARY_MALLOC tbbmalloc ${TBB_HINTS}) + endif() + endif() + +endif() + +include(FindPackageHandleStandardArgs) +FIND_PACKAGE_HANDLE_STANDARD_ARGS(TBB DEFAULT_MSG TBB_INCLUDE_DIR TBB_LIBRARY TBB_LIBRARY_MALLOC) + +if(TBB_FOUND) + add_library(TBB::tbb SHARED IMPORTED) + set_target_properties(TBB::tbb PROPERTIES + INTERFACE_INCLUDE_DIRECTORIES ${TBB_INCLUDE_DIR} + INTERFACE_COMPILE_DEFINITIONS "__TBB_NO_IMPLICIT_LINKAGE=1" + ) + + add_library(TBB::tbbmalloc SHARED IMPORTED) + set_target_properties(TBB::tbbmalloc PROPERTIES + INTERFACE_COMPILE_DEFINITIONS "__TBBMALLOC_NO_IMPLICIT_LINKAGE=1" + ) + + if(WIN32) + set_target_properties(TBB::tbb PROPERTIES + IMPORTED_IMPLIB ${TBB_LIBRARY} + ) + + set_target_properties(TBB::tbbmalloc PROPERTIES + IMPORTED_IMPLIB ${TBB_LIBRARY_MALLOC} + ) + else() + set_target_properties(TBB::tbb PROPERTIES + IMPORTED_LOCATION ${TBB_LIBRARY} + IMPORTED_NO_SONAME TRUE + ) + + set_target_properties(TBB::tbbmalloc PROPERTIES + IMPORTED_LOCATION ${TBB_LIBRARY_MALLOC} + IMPORTED_NO_SONAME TRUE + ) + endif() + + + set(TBB_LIBRARIES TBB::tbb TBB::tbbmalloc) + + set_threading("TBB") + list(APPEND EXTRA_SHARED_LIBS ${TBB_LIBRARIES}) +endif() + +mark_as_advanced(TBB_INCLUDE_DIR) +mark_as_advanced(TBB_LIBRARY) +mark_as_advanced(TBB_LIBRARY_MALLOC) + diff --git a/oidn/mkl-dnn/cmake/Threading.cmake b/oidn/mkl-dnn/cmake/Threading.cmake new file mode 100644 index 0000000..f509c79 --- /dev/null +++ b/oidn/mkl-dnn/cmake/Threading.cmake @@ -0,0 +1,39 @@ +#=============================================================================== +# Copyright 2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +# Utils for managing threading-related configuration +#=============================================================================== + +if(Threading_cmake_included) + return() +endif() +set(Threading_cmake_included true) + +# Replace existing define for threading (if any) with a new one +macro(set_threading threading) + if(MKLDNN_THR_CURRENT) + remove_definitions(-DMKLDNN_THR=${MKLDNN_THR_CURRENT}) + endif() + set(MKLDNN_THR_CURRENT MKLDNN_THR_${threading}) + add_definitions(-DMKLDNN_THR=${MKLDNN_THR_CURRENT}) +endmacro() + +# While MKL-DNN defaults to OpenMP (if _OPENMP is defined) without CMake, here +# we default to sequential threading and let OpenMP.cmake and TBB.cmake to +# figure things out. This is especially important because OpenMP is used both +# for threading and vectorization via #pragma omp simd +set_threading("SEQ") + diff --git a/oidn/mkl-dnn/cmake/config.cmake.in b/oidn/mkl-dnn/cmake/config.cmake.in new file mode 100644 index 0000000..53b7032 --- /dev/null +++ b/oidn/mkl-dnn/cmake/config.cmake.in @@ -0,0 +1,6 @@ +@PACKAGE_INIT@ +include("${CMAKE_CURRENT_LIST_DIR}/@LIB_EXPORT_NAME@.cmake") +set(MKLDNN_THREADING "@MKLDNN_THREADING@") +set(MKLDNN_USES_INTEL_OPENMP @MKLDNN_USES_INTEL_OPENMP@) +set(MKLDNN_USES_MKL "@MKLDNN_USES_MKL@") +check_required_components("@LIB_NAME@") diff --git a/oidn/mkl-dnn/cmake/options.cmake b/oidn/mkl-dnn/cmake/options.cmake new file mode 100644 index 0000000..ef5d83a --- /dev/null +++ b/oidn/mkl-dnn/cmake/options.cmake @@ -0,0 +1,128 @@ +#=============================================================================== +# Copyright 2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#=============================================================================== + +# Manage different library options +#=============================================================================== + +if(options_cmake_included) + return() +endif() +set(options_cmake_included true) + +# ======== +# Features +# ======== + +option(MKLDNN_VERBOSE + "allows Intel(R) MKL-DNN be verbose whenever MKLDNN_VERBOSE + environment variable set to 1" ON) # enabled by default + +option(MKLDNN_ENABLE_CONCURRENT_EXEC + "disables sharing a common scratchpad between primitives. + This option must be turned on if there is a possibility of concurrent + execution of primitives that were created in the same thread. + CAUTION: enabling this option increases memory consumption" + OFF) # disabled by default + +# ============================= +# Building properties and scope +# ============================= + +set(MKLDNN_LIBRARY_TYPE "SHARED" CACHE STRING + "specifies whether Intel(R) MKL-DNN library should be SHARED or STATIC") +option(MKLDNN_BUILD_EXAMPLES "builds examples" ON) +option(MKLDNN_BUILD_TESTS "builds tests" ON) + +set(MKLDNN_THREADING "OMP" CACHE STRING + "specifies threading type; supports OMP (default), OMP:COMP, OMP:INTEL, or TBB. 
+ + When OpenMP is used a user can choose what runtime to use: + - native OpenMP runtime that comes with the compiler (OMP:COMP), or + - Intel OpenMP runtime that is compatible with all the compilers that + Intel MKL-DNN supports (OMP:INTEL). This option requires Intel MKL + be installed or Intel MKL-ML library be downloaded. This option doesn't + work with MSVC (w/o Intel Compiler). + The default option is OMP, which gives a preference to OMP:INTEL, but if + neither Intel MKL is installed nor Intel MKL-ML is available then fallback + to OMP:COMP. + + To use Intel(R) Threading Building Blocks (Intel(R) TBB) one should also + set TBBROOT (either environment variable or CMake option) to the library + location") + +set(MKLDNN_USE_MKL "DEF" CACHE STRING + "specifies what Intel MKL library to use. + Supports DEF (default), NONE, ML, FULL, FULL:STATIC. + + By default (DEF) cmakes tries to find Intel MKL-ML library, then full + Intel MKL library, or just builds Intel MKL-DNN w/o any binary dependency. + + To build Intel MKL-DNN w/o any dependencies on Intel MKL / Intel MKL-ML + use NONE. Note that building system would not be able to use Intel OpenMP + runtime that comes with Intel MKL or Intel MKL-ML, and would be available + only if Intel Compiler is used. + + To force Intel MKL-DNN to use Intel MKL-ML use ML. Depending on the + threading the build system would choose between libmklml_intel or + libmklml_gnu. + + To force Intel MKL-DNN to use the full Intel MKL pass FULL or FULL:STATIC + to cmake. The former option would make Intel MKL-DNN link against + Intel MKL RT (libmkl_rt). The latter one would link against static + Intel MKL. Use static linking to reduce the size of the resulting library + (including its dependencies). + Caution: Intel MKL RT allows setting the threading layer using environment + variable MKL_THREADING_LAYER. By default Intel MKL would use + OpenMP. If Intel MKL-DNN is built with TBB it is recommended to + set MKL_THREADING_LAYER to `tbb` or `sequential`, to avoid + conflict between OpenMP and TBB thread pools.") + +# ====================== +# Profiling capabilities +# ====================== + +option(MKLDNN_ENABLE_JIT_PROFILING + "Enable registration of Intel(R) MKL-DNN kernels that are generated at + runtime with Intel VTune Amplifier (on by default). Without the + registrations, Intel VTune Amplifier would report data collected inside + the kernels as `outside any known module`." + ON) + +# ============= +# Miscellaneous +# ============= + +option(BENCHDNN_USE_RDPMC + "enables rdpms counter to report precise cpu frequency in benchdnn. + CAUTION: may not work on all cpus (hence disabled by default)" + OFF) # disabled by default + +# ============= +# Developer flags +# ============= + +set(MKLDNN_USE_CLANG_SANITIZER "" CACHE STRING + "instructs build system to use a Clang sanitizer. Possible values: + Address: enables AddressSanitizer + Memory: enables MemorySanitizer + MemoryWithOrigin: enables MemorySanitizer with origin tracking + Undefined: enables UndefinedBehaviourSanitizer + This feature is experimental and is only available on Linux.") + +option(MKLDNN_PRODUCT_BUILD_MODE + "Enables/disables product build mode. 
For example, + setting MKLDNN_PRODUCT_BUILD_MODE=OFF makes warnings non-fatal" + ON) diff --git a/oidn/mkl-dnn/cmake/platform.cmake b/oidn/mkl-dnn/cmake/platform.cmake new file mode 100644 index 0000000..6bf9648 --- /dev/null +++ b/oidn/mkl-dnn/cmake/platform.cmake @@ -0,0 +1,178 @@ +#=============================================================================== +# Copyright 2016-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#=============================================================================== + +# Manage platform-specific quirks +#=============================================================================== + +if(platform_cmake_included) + return() +endif() +set(platform_cmake_included true) + +#include("cmake/utils.cmake") + +if(MKLDNN_LIBRARY_TYPE STREQUAL "SHARED") + add_definitions(-DMKLDNN_DLL -DMKLDNN_DLL_EXPORTS) +endif() + +# UNIT8_MAX-like macros are a part of the C99 standard and not a part of the +# C++ standard (see C99 standard 7.18.2 and 7.18.4) +add_definitions(-D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS) + +set(CMAKE_CCXX_FLAGS) +set(CMAKE_CCXX_NOWARN_FLAGS) +set(ISA_FLAGS_SSE41) + +if(MSVC) + set(USERCONFIG_PLATFORM "x64") + # enable intrinsic functions + append(CMAKE_CXX_FLAGS "/Oi") + # enable full optimizations + append(CMAKE_CXX_FLAGS_RELEASE "/Ox") + append(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/Ox") + # package individual functions + append(CMAKE_CXX_FLAGS_RELEASE "/Gy") + append(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/Gy") + # compiler specific settings + if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC") + append(CMAKE_CCXX_FLAGS "/MP") + # int -> bool + append(CMAKE_CCXX_NOWARN_FLAGS "/wd4800") + # unknown pragma + append(CMAKE_CCXX_NOWARN_FLAGS "/wd4068") + # double -> float + append(CMAKE_CCXX_NOWARN_FLAGS "/wd4305") + # UNUSED(func) + append(CMAKE_CCXX_NOWARN_FLAGS "/wd4551") + # int64_t -> int (tent) + append(CMAKE_CCXX_NOWARN_FLAGS "/wd4244") + elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + append(CMAKE_CCXX_FLAGS "/MP") + set(ISA_FLAGS_SSE41 "-Qxsse4.1") + # disable: loop was not vectorized with "simd" + append(CMAKE_CCXX_NOWARN_FLAGS "-Qdiag-disable:15552") + append(CMAKE_CCXX_NOWARN_FLAGS "-Qdiag-disable:15335") + # disable: unknown pragma + append(CMAKE_CCXX_NOWARN_FLAGS "-Qdiag-disable:3180") + elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang") + set(ISA_FLAGS_SSE41 "-msse4.1") + # Clang cannot vectorize some loops with #pragma omp simd and gets + # very upset. Tell it that it's okay and that we love it + # unconditionally. 
+ append(CMAKE_CCXX_FLAGS "-Wno-pass-failed") + endif() + # disable secure warnings + add_definitions(-D_CRT_SECURE_NO_WARNINGS) +elseif(UNIX OR MINGW) + append(CMAKE_CCXX_FLAGS "-Wall -Wno-unknown-pragmas") + append_if_product(CMAKE_CCXX_FLAGS "-Werror") + append(CMAKE_CCXX_FLAGS "-fvisibility=internal") + append(CMAKE_C_FLAGS "-std=c99") + append(CMAKE_CXX_FLAGS "-std=c++11 -fvisibility-inlines-hidden") + # compiler specific settings + if(CMAKE_CXX_COMPILER_ID MATCHES "Clang") + set(ISA_FLAGS_SSE41 "-msse4.1") + # Clang cannot vectorize some loops with #pragma omp simd and gets + # very upset. Tell it that it's okay and that we love it + # unconditionally. + append(CMAKE_CCXX_NOWARN_FLAGS "-Wno-pass-failed") + if(MKLDNN_USE_CLANG_SANITIZER MATCHES "Memory(WithOrigin)?") + if(NOT MKLDNN_THREADING STREQUAL "SEQ") + message(WARNING "Clang OpenMP is not compatible with MSan! " + "Expect a lot of false positives!") + endif() + append(CMAKE_CCXX_SANITIZER_FLAGS "-fsanitize=memory") + if(MKLDNN_USE_CLANG_SANITIZER STREQUAL "MemoryWithOrigin") + append(CMAKE_CCXX_SANITIZER_FLAGS + "-fsanitize-memory-track-origins=2") + append(CMAKE_CCXX_SANITIZER_FLAGS + "-fno-omit-frame-pointer") + endif() + set(MKLDNN_ENABLED_CLANG_SANITIZER "${MKLDNN_USE_CLANG_SANITIZER}") + elseif(MKLDNN_USE_CLANG_SANITIZER STREQUAL "Undefined") + append(CMAKE_CCXX_SANITIZER_FLAGS "-fsanitize=undefined") + append(CMAKE_CCXX_SANITIZER_FLAGS + "-fno-sanitize=function,vptr") # work around linking problems + append(CMAKE_CCXX_SANITIZER_FLAGS "-fno-omit-frame-pointer") + set(MKLDNN_ENABLED_CLANG_SANITIZER "${MKLDNN_USE_CLANG_SANITIZER}") + elseif(MKLDNN_USE_CLANG_SANITIZER STREQUAL "Address") + append(CMAKE_CCXX_SANITIZER_FLAGS "-fsanitize=address") + set(MKLDNN_ENABLED_CLANG_SANITIZER "${MKLDNN_USE_CLANG_SANITIZER}") + elseif(MKLDNN_USE_CLANG_SANITIZER STREQUAL "Thread") + append(CMAKE_CCXX_SANITIZER_FLAGS "-fsanitize=thread") + set(MKLDNN_ENABLED_CLANG_SANITIZER "${MKLDNN_USE_CLANG_SANITIZER}") + elseif(MKLDNN_USE_CLANG_SANITIZER STREQUAL "Leak") + append(CMAKE_CCXX_SANITIZER_FLAGS "-fsanitize=leak") + set(MKLDNN_ENABLED_CLANG_SANITIZER "${MKLDNN_USE_CLANG_SANITIZER}") + elseif(NOT MKLDNN_USE_CLANG_SANITIZER STREQUAL "") + message(FATAL_ERROR + "Unsupported Clang sanitizer '${MKLDNN_USE_CLANG_SANITIZER}'") + endif() + if(MKLDNN_ENABLED_CLANG_SANITIZER) + message(STATUS + "Using Clang ${MKLDNN_ENABLED_CLANG_SANITIZER} " + "sanitizer (experimental!)") + append(CMAKE_CCXX_SANITIZER_FLAGS "-g -fno-omit-frame-pointer") + endif() + elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU") + if(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5.0) + set(ISA_FLAGS_SSE41 "-msse4.1") + endif() + # suppress warning on assumptions made regarding overflow (#146) + append(CMAKE_CCXX_NOWARN_FLAGS "-Wno-strict-overflow") + elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + set(ISA_FLAGS_SSE41 "-xsse4.1") + # workaround for Intel Compiler 16.0 that produces error caused + # by pragma omp simd collapse(..) 
+ if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "17.0") + append(CMAKE_CCXX_NOWARN_FLAGS "-diag-disable:13379") + endif() + append(CMAKE_CCXX_NOWARN_FLAGS "-diag-disable:15552") + # disable `was not vectorized: vectorization seems inefficient` remark + append(CMAKE_CCXX_NOWARN_FLAGS "-diag-disable:15335") + # disable optimizations in debug mode + append(CMAKE_CXX_FLAGS_DEBUG "-O0") + endif() + # disable assertions + set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DNDEBUG") +endif() + +if(WIN32) + string(REPLACE ";" "\;" ENV_PATH "$ENV{PATH}") + set(CTESTCONFIG_PATH "${CTESTCONFIG_PATH}\;${MKLDLLPATH}\;${ENV_PATH}") + if(CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + # Link Intel and MS libraries statically for release builds + string(REPLACE "/MD" "/MT" CMAKE_CXX_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE}) + string(REPLACE "/MD" "/MT" CMAKE_CXX_FLAGS_RELWITHDEBINFO ${CMAKE_CXX_FLAGS_RELWITHDEBINFO}) + endif() +endif() + +if(UNIX OR MINGW) + if(CMAKE_CXX_COMPILER_ID STREQUAL "Intel") + # Link Intel libraries statically (except for iomp5) + if(MKLDNN_THREADING MATCHES "OMP") + append(CMAKE_SHARED_LINKER_FLAGS "-liomp5") + endif() + append(CMAKE_SHARED_LINKER_FLAGS "-static-intel") + # Tell linker to not complain about missing static libraries + append(CMAKE_SHARED_LINKER_FLAGS "-diag-disable:10237") + endif() +endif() + +if(APPLE) + append(CMAKE_CXX_FLAGS "-mmacosx-version-min=10.7") # makes sure code runs on older macOS versions + append(CMAKE_CXX_FLAGS "-stdlib=libc++") # link against libc++ which supports C++11 features +endif() diff --git a/oidn/mkl-dnn/cmake/utils.cmake b/oidn/mkl-dnn/cmake/utils.cmake new file mode 100644 index 0000000..46500be --- /dev/null +++ b/oidn/mkl-dnn/cmake/utils.cmake @@ -0,0 +1,123 @@ +#=============================================================================== +# Copyright 2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#=============================================================================== + +# Auxiliary build functions +#=============================================================================== + +if(utils_cmake_included) + return() +endif() +set(utils_cmake_included true) +include("cmake/options.cmake") + +# Common configuration for tests / test cases on Windows +function(maybe_configure_windows_test name kind) + if(WIN32 OR MINGW) + string(REPLACE ";" "\;" PATH "${CTESTCONFIG_PATH};$ENV{PATH}") + set_property(${kind} ${name} PROPERTY ENVIRONMENT "PATH=${PATH}") + configure_file(${PROJECT_SOURCE_DIR}/cmake/template.vcxproj.user + ${name}.vcxproj.user @ONLY) + endif() +endfunction() + +# Register new executable/test +# name -- name of the executable +# srcs -- list of source, if many must be enclosed with "" +# test -- "test" to mark executable as a test, "" otherwise +# arg4 -- (optional) list of extra library dependencies +function(register_exe name srcs test) + add_executable(${name} ${srcs}) + target_link_libraries(${name} ${LIB_NAME} ${EXTRA_SHARED_LIBS} ${ARGV3}) + if("${test}" STREQUAL "test") + add_test(${name} ${name}) + maybe_configure_windows_test(${name} TEST) + endif() +endfunction() + +# Append to a variable +# var = var + value +macro(append var value) + set(${var} "${${var}} ${value}") +endmacro() + +# Append to a variable if building a product build (as opposed to a developer +# build that is detected via the MKLDNN_PRODUCT_BUILD_MODE option) +macro(append_if_product var value) + if(MKLDNN_PRODUCT_BUILD_MODE) + append(${var} "${value}") + endif() +endmacro() + +if(MKLDNN_PRODUCT_BUILD_MODE) + message(STATUS "This is a product build") +else() + message(WARNING "This is a developer build") +endif() + +# Set variable depending on condition: +# var = cond ? 
val_if_true : val_if_false +macro(set_ternary var condition val_if_true val_if_false) + if (${condition}) + set(${var} "${val_if_true}") + else() + set(${var} "${val_if_false}") + endif() +endmacro() + +# Conditionally set a variable +# if (cond) var = value +macro(set_if condition var value) + if (${condition}) + set(${var} "${value}") + endif() +endmacro() + +# Conditionally append +# if (cond) var = var + value +macro(append_if condition var value) + if (${condition}) + append(${var} "${value}") + endif() +endmacro() + +# Append a path to path_list variable (Windows-only version) +macro(append_to_windows_path_list path_list path) + file(TO_NATIVE_PATH "${path}" append_to_windows_path_list_tmp__) + if(${path_list}) + set(${path_list} + "${${path_list}};${append_to_windows_path_list_tmp__}") + else() + set(${path_list} + "${append_to_windows_path_list_tmp__}") + endif() +endmacro() + +function(target_link_libraries_build target list) + # Foreach is required for compatibility with 2.8.11 ways + foreach(lib ${list}) + target_link_libraries(${target} LINK_PUBLIC + "$") + endforeach(lib) +endfunction() + +function(target_link_libraries_install target list) + # Foreach is required for compatibility with 2.8.11 ways + foreach(lib ${list}) + get_filename_component(base "${lib}" NAME) + target_link_libraries(${target} LINK_PUBLIC + "$") + endforeach(lib) +endfunction() diff --git a/oidn/mkl-dnn/cmake/version.cmake b/oidn/mkl-dnn/cmake/version.cmake new file mode 100644 index 0000000..4591880 --- /dev/null +++ b/oidn/mkl-dnn/cmake/version.cmake @@ -0,0 +1,46 @@ +#=============================================================================== +# Copyright 2019 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#=============================================================================== + +# Control generating version file +#=============================================================================== + +if(version_cmake_included) + return() +endif() +set(version_cmake_included true) + +string(REPLACE "." 
";" VERSION_LIST ${PROJECT_VERSION}) +list(GET VERSION_LIST 0 MKLDNN_VERSION_MAJOR) +list(GET VERSION_LIST 1 MKLDNN_VERSION_MINOR) +list(GET VERSION_LIST 2 MKLDNN_VERSION_PATCH) + +find_package(Git) +if (GIT_FOUND) + execute_process(COMMAND ${GIT_EXECUTABLE} log -1 --format=%H + WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} + RESULT_VARIABLE RESULT + OUTPUT_VARIABLE MKLDNN_VERSION_HASH + OUTPUT_STRIP_TRAILING_WHITESPACE) +endif() + +if(NOT GIT_FOUND OR RESULT) + set(MKLDNN_VERSION_HASH "N/A") +endif() + +configure_file( + "${PROJECT_SOURCE_DIR}/include/mkldnn_version.h.in" + "${PROJECT_BINARY_DIR}/include/mkldnn_version.h" +) diff --git a/oidn/mkl-dnn/doc/Doxyfile.in b/oidn/mkl-dnn/doc/Doxyfile.in new file mode 100644 index 0000000..8c38fd9 --- /dev/null +++ b/oidn/mkl-dnn/doc/Doxyfile.in @@ -0,0 +1,2287 @@ +#=============================================================================== +# Copyright 2016-2018 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#=============================================================================== + +# Doxyfile 1.8.5 + +# This file describes the settings to be used by the documentation system +# doxygen (www.doxygen.org) for a project. +# +# All text after a double hash (##) is considered a comment and is placed in +# front of the TAG it is preceding. +# +# All text after a single hash (#) is considered a comment and will be ignored. +# The format is: +# TAG = value [value, ...] +# For lists, items can also be appended using: +# TAG += value [value, ...] +# Values that contain spaces should be placed between quotes (\" \"). + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- + +# This tag specifies the encoding used for all characters in the config file +# that follow. The default is UTF-8 which is also the encoding used for all text +# before the first occurrence of this tag. Doxygen uses libiconv (or the iconv +# built into libc) for the transcoding. See http://www.gnu.org/software/libiconv +# for the list of possible encodings. +# The default value is: UTF-8. + +DOXYFILE_ENCODING = UTF-8 + +# The PROJECT_NAME tag is a single word (or a sequence of words surrounded by +# double-quotes, unless you are using Doxywizard) that should identify the +# project for which the documentation is generated. This name is used in the +# title of most generated pages and in a few other places. +# The default value is: My Project. + +PROJECT_NAME = "@PROJECT_NAME@" + +# The PROJECT_NUMBER tag can be used to enter a project or revision number. This +# could be handy for archiving the generated documentation or if some version +# control system is used. + +PROJECT_NUMBER = "@PROJECT_VERSION@" + +# Using the PROJECT_BRIEF tag one can provide an optional one line description +# for a project that appears at the top of each page and should give viewer a +# quick idea about the purpose of the project. 
Keep the description short. + +PROJECT_BRIEF = "Performance library for Deep Learning" + +# With the PROJECT_LOGO tag one can specify an logo or icon that is included in +# the documentation. The maximum height of the logo should not exceed 55 pixels +# and the maximum width should not exceed 200 pixels. Doxygen will copy the logo +# to the output directory. + +PROJECT_LOGO = + +# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path +# into which the generated documentation will be written. If a relative path is +# entered, it will be relative to the location where doxygen was started. If +# left blank the current directory will be used. + +OUTPUT_DIRECTORY = @DOXYGEN_OUTPUT_DIR@ + +# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create 4096 sub- +# directories (in 2 levels) under the output directory of each output format and +# will distribute the generated files over these directories. Enabling this +# option can be useful when feeding doxygen a huge amount of source files, where +# putting all generated files in the same directory would otherwise causes +# performance problems for the file system. +# The default value is: NO. + +CREATE_SUBDIRS = NO + +# The OUTPUT_LANGUAGE tag is used to specify the language in which all +# documentation generated by doxygen is written. Doxygen will use this +# information to generate all constant output in the proper language. +# Possible values are: Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese- +# Traditional, Croatian, Czech, Danish, Dutch, English, Esperanto, Farsi, +# Finnish, French, German, Greek, Hungarian, Italian, Japanese, Japanese-en, +# Korean, Korean-en, Latvian, Norwegian, Macedonian, Persian, Polish, +# Portuguese, Romanian, Russian, Serbian, Slovak, Slovene, Spanish, Swedish, +# Turkish, Ukrainian and Vietnamese. +# The default value is: English. + +OUTPUT_LANGUAGE = English + +# If the BRIEF_MEMBER_DESC tag is set to YES doxygen will include brief member +# descriptions after the members that are listed in the file and class +# documentation (similar to Javadoc). Set to NO to disable this. +# The default value is: YES. + +BRIEF_MEMBER_DESC = YES + +# If the REPEAT_BRIEF tag is set to YES doxygen will prepend the brief +# description of a member or function before the detailed description +# +# Note: If both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the +# brief descriptions will be completely suppressed. +# The default value is: YES. + +REPEAT_BRIEF = YES + +# This tag implements a quasi-intelligent brief description abbreviator that is +# used to form the text in various listings. Each string in this list, if found +# as the leading text of the brief description, will be stripped from the text +# and the result, after processing the whole list, is used as the annotated +# text. Otherwise, the brief description is used as-is. If left blank, the +# following values are used ($name is automatically replaced with the name of +# the entity):The $name class, The $name widget, The $name file, is, provides, +# specifies, contains, represents, a, an and the. + +ABBREVIATE_BRIEF = + +# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then +# doxygen will generate a detailed section even if there is only a brief +# description. +# The default value is: NO. 
+ +ALWAYS_DETAILED_SEC = NO + +# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all +# inherited members of a class in the documentation of that class as if those +# members were ordinary class members. Constructors, destructors and assignment +# operators of the base classes will not be shown. +# The default value is: NO. + +INLINE_INHERITED_MEMB = NO + +# If the FULL_PATH_NAMES tag is set to YES doxygen will prepend the full path +# before files name in the file list and in the header files. If set to NO the +# shortest path that makes the file name unique will be used +# The default value is: YES. + +FULL_PATH_NAMES = YES + +# The STRIP_FROM_PATH tag can be used to strip a user-defined part of the path. +# Stripping is only done if one of the specified strings matches the left-hand +# part of the path. The tag can be used to show relative paths in the file list. +# If left blank the directory from which doxygen is run is used as the path to +# strip. +# +# Note that you can specify absolute paths here, but also relative paths, which +# will be relative from the directory where doxygen is started. +# This tag requires that the tag FULL_PATH_NAMES is set to YES. + +STRIP_FROM_PATH = @PROJECT_SOURCE_DIR@ + +# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the +# path mentioned in the documentation of a class, which tells the reader which +# header file to include in order to use a class. If left blank only the name of +# the header file containing the class definition is used. Otherwise one should +# specify the list of include paths that are normally passed to the compiler +# using the -I flag. + +STRIP_FROM_INC_PATH = + +# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but +# less readable) file names. This can be useful is your file systems doesn't +# support long names like on DOS, Mac, or CD-ROM. +# The default value is: NO. + +SHORT_NAMES = NO + +# If the JAVADOC_AUTOBRIEF tag is set to YES then doxygen will interpret the +# first line (until the first dot) of a Javadoc-style comment as the brief +# description. If set to NO, the Javadoc-style will behave just like regular Qt- +# style comments (thus requiring an explicit @brief command for a brief +# description.) +# The default value is: NO. + +JAVADOC_AUTOBRIEF = YES + +# If the QT_AUTOBRIEF tag is set to YES then doxygen will interpret the first +# line (until the first dot) of a Qt-style comment as the brief description. If +# set to NO, the Qt-style will behave just like regular Qt-style comments (thus +# requiring an explicit \brief command for a brief description.) +# The default value is: NO. + +QT_AUTOBRIEF = NO + +# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make doxygen treat a +# multi-line C++ special comment block (i.e. a block of //! or /// comments) as +# a brief description. This used to be the default behavior. The new default is +# to treat a multi-line C++ comment block as a detailed description. Set this +# tag to YES if you prefer the old behavior instead. +# +# Note that setting this tag to YES also means that rational rose comments are +# not recognized any more. +# The default value is: NO. + +MULTILINE_CPP_IS_BRIEF = YES + +# If the INHERIT_DOCS tag is set to YES then an undocumented member inherits the +# documentation from any documented member that it re-implements. +# The default value is: YES. + +INHERIT_DOCS = YES + +# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce a +# new page for each member. 
If set to NO, the documentation of a member will be +# part of the file/class/namespace that contains it. +# The default value is: NO. + +SEPARATE_MEMBER_PAGES = NO + +# The TAB_SIZE tag can be used to set the number of spaces in a tab. Doxygen +# uses this value to replace tabs by spaces in code fragments. +# Minimum value: 1, maximum value: 16, default value: 4. + +TAB_SIZE = 4 + +# This tag can be used to specify a number of aliases that act as commands in +# the documentation. An alias has the form: +# name=value +# For example adding +# "sideeffect=@par Side Effects:\n" +# will allow you to put the command \sideeffect (or @sideeffect) in the +# documentation, which will result in a user-defined paragraph with heading +# "Side Effects:". You can put \n's in the value part of an alias to insert +# newlines. + +ALIASES = + +# This tag can be used to specify a number of word-keyword mappings (TCL only). +# A mapping has the form "name=value". For example adding "class=itcl::class" +# will allow you to use the command class in the itcl::class meaning. + +TCL_SUBST = + +# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources +# only. Doxygen will then generate output that is more tailored for C. For +# instance, some of the names that are used will be different. The list of all +# members will be omitted, etc. +# The default value is: NO. + +OPTIMIZE_OUTPUT_FOR_C = NO + +# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or +# Python sources only. Doxygen will then generate output that is more tailored +# for that language. For instance, namespaces will be presented as packages, +# qualified scopes will look different, etc. +# The default value is: NO. + +OPTIMIZE_OUTPUT_JAVA = NO + +# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran +# sources. Doxygen will then generate output that is tailored for Fortran. +# The default value is: NO. + +OPTIMIZE_FOR_FORTRAN = NO + +# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL +# sources. Doxygen will then generate output that is tailored for VHDL. +# The default value is: NO. + +OPTIMIZE_OUTPUT_VHDL = NO + +# Doxygen selects the parser to use depending on the extension of the files it +# parses. With this tag you can assign which parser to use for a given +# extension. Doxygen has a built-in mapping, but you can override or extend it +# using this tag. The format is ext=language, where ext is a file extension, and +# language is one of the parsers supported by doxygen: IDL, Java, Javascript, +# C#, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL. For instance to make +# doxygen treat .inc files as Fortran files (default is PHP), and .f files as C +# (default is Fortran), use: inc=Fortran f=C. +# +# Note For files without extension you can use no_extension as a placeholder. +# +# Note that for custom extensions you also need to set FILE_PATTERNS otherwise +# the files are not read by doxygen. + +EXTENSION_MAPPING = + +# If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments +# according to the Markdown format, which allows for more readable +# documentation. See http://daringfireball.net/projects/markdown/ for details. +# The output of markdown processing is further processed by doxygen, so you can +# mix doxygen, HTML, and XML commands with Markdown formatting. Disable only in +# case of backward compatibilities issues. +# The default value is: YES. 
+ +MARKDOWN_SUPPORT = YES + +# When enabled doxygen tries to link words that correspond to documented +# classes, or namespaces to their corresponding documentation. Such a link can +# be prevented in individual cases by by putting a % sign in front of the word +# or globally by setting AUTOLINK_SUPPORT to NO. +# The default value is: YES. + +AUTOLINK_SUPPORT = YES + +# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want +# to include (a tag file for) the STL sources as input, then you should set this +# tag to YES in order to let doxygen match functions declarations and +# definitions whose arguments contain STL classes (e.g. func(std::string); +# versus func(std::string) {}). This also make the inheritance and collaboration +# diagrams that involve STL classes more complete and accurate. +# The default value is: NO. + +BUILTIN_STL_SUPPORT = NO + +# If you use Microsoft's C++/CLI language, you should set this option to YES to +# enable parsing support. +# The default value is: NO. + +CPP_CLI_SUPPORT = NO + +# Set the SIP_SUPPORT tag to YES if your project consists of sip (see: +# http://www.riverbankcomputing.co.uk/software/sip/intro) sources only. Doxygen +# will parse them like normal C++ but will assume all classes use public instead +# of private inheritance when no explicit protection keyword is present. +# The default value is: NO. + +SIP_SUPPORT = NO + +# For Microsoft's IDL there are propget and propput attributes to indicate +# getter and setter methods for a property. Setting this option to YES will make +# doxygen to replace the get and set methods by a property in the documentation. +# This will only work if the methods are indeed getting or setting a simple +# type. If this is not the case, or you want to show the methods anyway, you +# should set this option to NO. +# The default value is: YES. + +IDL_PROPERTY_SUPPORT = YES + +# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC +# tag is set to YES, then doxygen will reuse the documentation of the first +# member in the group (if any) for the other members of the group. By default +# all members of a group must be documented explicitly. +# The default value is: NO. + +DISTRIBUTE_GROUP_DOC = NO + +# Set the SUBGROUPING tag to YES to allow class member groups of the same type +# (for instance a group of public functions) to be put as a subgroup of that +# type (e.g. under the Public Functions section). Set it to NO to prevent +# subgrouping. Alternatively, this can be done per class using the +# \nosubgrouping command. +# The default value is: YES. + +SUBGROUPING = YES + +# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and unions +# are shown inside the group in which they are included (e.g. using \ingroup) +# instead of on a separate page (for HTML and Man pages) or section (for LaTeX +# and RTF). +# +# Note that this feature does not work in combination with +# SEPARATE_MEMBER_PAGES. +# The default value is: NO. + +INLINE_GROUPED_CLASSES = NO + +# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and unions +# with only public data fields or simple typedef fields will be shown inline in +# the documentation of the scope in which they are defined (i.e. file, +# namespace, or group documentation), provided this scope is documented. If set +# to NO, structs, classes, and unions are shown on a separate page (for HTML and +# Man pages) or section (for LaTeX and RTF). +# The default value is: NO. 
+ +INLINE_SIMPLE_STRUCTS = NO + +# When TYPEDEF_HIDES_STRUCT tag is enabled, a typedef of a struct, union, or +# enum is documented as struct, union, or enum with the name of the typedef. So +# typedef struct TypeS {} TypeT, will appear in the documentation as a struct +# with name TypeT. When disabled the typedef will appear as a member of a file, +# namespace, or class. And the struct will be named TypeS. This can typically be +# useful for C code in case the coding convention dictates that all compound +# types are typedef'ed and only the typedef is referenced, never the tag name. +# The default value is: NO. + +TYPEDEF_HIDES_STRUCT = NO + +# The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This +# cache is used to resolve symbols given their name and scope. Since this can be +# an expensive process and often the same symbol appears multiple times in the +# code, doxygen keeps a cache of pre-resolved symbols. If the cache is too small +# doxygen will become slower. If the cache is too large, memory is wasted. The +# cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range +# is 0..9, the default is 0, corresponding to a cache size of 2^16=65536 +# symbols. At the end of a run doxygen will report the cache usage and suggest +# the optimal cache size from a speed point of view. +# Minimum value: 0, maximum value: 9, default value: 0. + +LOOKUP_CACHE_SIZE = 0 + +#--------------------------------------------------------------------------- +# Build related configuration options +#--------------------------------------------------------------------------- + +# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in +# documentation are documented, even if no documentation was available. Private +# class members and static file members will be hidden unless the +# EXTRACT_PRIVATE respectively EXTRACT_STATIC tags are set to YES. +# Note: This will also disable the warnings about undocumented members that are +# normally produced when WARNINGS is set to YES. +# The default value is: NO. + +EXTRACT_ALL = YES + +# If the EXTRACT_PRIVATE tag is set to YES all private members of a class will +# be included in the documentation. +# The default value is: NO. + +EXTRACT_PRIVATE = NO + +# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal +# scope will be included in the documentation. +# The default value is: NO. + +EXTRACT_PACKAGE = NO + +# If the EXTRACT_STATIC tag is set to YES all static members of a file will be +# included in the documentation. +# The default value is: NO. + +EXTRACT_STATIC = NO + +# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) defined +# locally in source files will be included in the documentation. If set to NO +# only classes defined in header files are included. Does not have any effect +# for Java sources. +# The default value is: YES. + +EXTRACT_LOCAL_CLASSES = YES + +# This flag is only useful for Objective-C code. When set to YES local methods, +# which are defined in the implementation section but not in the interface are +# included in the documentation. If set to NO only methods in the interface are +# included. +# The default value is: NO. + +EXTRACT_LOCAL_METHODS = NO + +# If this flag is set to YES, the members of anonymous namespaces will be +# extracted and appear in the documentation as a namespace called +# 'anonymous_namespace{file}', where file will be replaced with the base name of +# the file that contains the anonymous namespace. 
By default anonymous namespace +# are hidden. +# The default value is: NO. + +EXTRACT_ANON_NSPACES = NO + +# If the HIDE_UNDOC_MEMBERS tag is set to YES, doxygen will hide all +# undocumented members inside documented classes or files. If set to NO these +# members will be included in the various overviews, but no documentation +# section is generated. This option has no effect if EXTRACT_ALL is enabled. +# The default value is: NO. + +HIDE_UNDOC_MEMBERS = NO + +# If the HIDE_UNDOC_CLASSES tag is set to YES, doxygen will hide all +# undocumented classes that are normally visible in the class hierarchy. If set +# to NO these classes will be included in the various overviews. This option has +# no effect if EXTRACT_ALL is enabled. +# The default value is: NO. + +HIDE_UNDOC_CLASSES = NO + +# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend +# (class|struct|union) declarations. If set to NO these declarations will be +# included in the documentation. +# The default value is: NO. + +HIDE_FRIEND_COMPOUNDS = NO + +# If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any +# documentation blocks found inside the body of a function. If set to NO these +# blocks will be appended to the function's detailed documentation block. +# The default value is: NO. + +HIDE_IN_BODY_DOCS = NO + +# The INTERNAL_DOCS tag determines if documentation that is typed after a +# \internal command is included. If the tag is set to NO then the documentation +# will be excluded. Set it to YES to include the internal documentation. +# The default value is: NO. + +INTERNAL_DOCS = NO + +# If the CASE_SENSE_NAMES tag is set to NO then doxygen will only generate file +# names in lower-case letters. If set to YES upper-case letters are also +# allowed. This is useful if you have classes or files whose names only differ +# in case and if your file system supports case sensitive file names. Windows +# and Mac users are advised to set this option to NO. +# The default value is: system dependent. + +CASE_SENSE_NAMES = YES + +# If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with +# their full class and namespace scopes in the documentation. If set to YES the +# scope will be hidden. +# The default value is: NO. + +HIDE_SCOPE_NAMES = NO + +# If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of +# the files that are included by a file in the documentation of that file. +# The default value is: YES. + +SHOW_INCLUDE_FILES = YES + +# If the FORCE_LOCAL_INCLUDES tag is set to YES then doxygen will list include +# files with double quotes in the documentation rather than with sharp brackets. +# The default value is: NO. + +FORCE_LOCAL_INCLUDES = NO + +# If the INLINE_INFO tag is set to YES then a tag [inline] is inserted in the +# documentation for inline members. +# The default value is: YES. + +INLINE_INFO = YES + +# If the SORT_MEMBER_DOCS tag is set to YES then doxygen will sort the +# (detailed) documentation of file and class members alphabetically by member +# name. If set to NO the members will appear in declaration order. +# The default value is: YES. + +SORT_MEMBER_DOCS = NO + +# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the brief +# descriptions of file, namespace and class members alphabetically by member +# name. If set to NO the members will appear in declaration order. +# The default value is: NO. 
+ +SORT_BRIEF_DOCS = NO + +# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen will sort the +# (brief and detailed) documentation of class members so that constructors and +# destructors are listed first. If set to NO the constructors will appear in the +# respective orders defined by SORT_BRIEF_DOCS and SORT_MEMBER_DOCS. +# Note: If SORT_BRIEF_DOCS is set to NO this option is ignored for sorting brief +# member documentation. +# Note: If SORT_MEMBER_DOCS is set to NO this option is ignored for sorting +# detailed member documentation. +# The default value is: NO. + +SORT_MEMBERS_CTORS_1ST = NO + +# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the hierarchy +# of group names into alphabetical order. If set to NO the group names will +# appear in their defined order. +# The default value is: NO. + +SORT_GROUP_NAMES = NO + +# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be sorted by +# fully-qualified names, including namespaces. If set to NO, the class list will +# be sorted only by class name, not including the namespace part. +# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. +# Note: This option applies only to the class list, not to the alphabetical +# list. +# The default value is: NO. + +SORT_BY_SCOPE_NAME = NO + +# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to do proper +# type resolution of all parameters of a function it will reject a match between +# the prototype and the implementation of a member function even if there is +# only one candidate or it is obvious which candidate to choose by doing a +# simple string match. By disabling STRICT_PROTO_MATCHING doxygen will still +# accept a match between prototype and implementation in such cases. +# The default value is: NO. + +STRICT_PROTO_MATCHING = NO + +# The GENERATE_TODOLIST tag can be used to enable ( YES) or disable ( NO) the +# todo list. This list is created by putting \todo commands in the +# documentation. +# The default value is: YES. + +GENERATE_TODOLIST = YES + +# The GENERATE_TESTLIST tag can be used to enable ( YES) or disable ( NO) the +# test list. This list is created by putting \test commands in the +# documentation. +# The default value is: YES. + +GENERATE_TESTLIST = YES + +# The GENERATE_BUGLIST tag can be used to enable ( YES) or disable ( NO) the bug +# list. This list is created by putting \bug commands in the documentation. +# The default value is: YES. + +GENERATE_BUGLIST = YES + +# The GENERATE_DEPRECATEDLIST tag can be used to enable ( YES) or disable ( NO) +# the deprecated list. This list is created by putting \deprecated commands in +# the documentation. +# The default value is: YES. + +GENERATE_DEPRECATEDLIST= YES + +# The ENABLED_SECTIONS tag can be used to enable conditional documentation +# sections, marked by \if ... \endif and \cond +# ... \endcond blocks. + +ENABLED_SECTIONS = + +# The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the +# initial value of a variable or macro / define can have for it to appear in the +# documentation. If the initializer consists of more lines than specified here +# it will be hidden. Use a value of 0 to hide initializers completely. The +# appearance of the value of individual variables and macros / defines can be +# controlled using \showinitializer or \hideinitializer command in the +# documentation regardless of this setting. +# Minimum value: 0, maximum value: 10000, default value: 30. 
+ +MAX_INITIALIZER_LINES = 30 + +# Set the SHOW_USED_FILES tag to NO to disable the list of files generated at +# the bottom of the documentation of classes and structs. If set to YES the list +# will mention the files that were used to generate the documentation. +# The default value is: YES. + +SHOW_USED_FILES = YES + +# Set the SHOW_FILES tag to NO to disable the generation of the Files page. This +# will remove the Files entry from the Quick Index and from the Folder Tree View +# (if specified). +# The default value is: YES. + +SHOW_FILES = YES + +# Set the SHOW_NAMESPACES tag to NO to disable the generation of the Namespaces +# page. This will remove the Namespaces entry from the Quick Index and from the +# Folder Tree View (if specified). +# The default value is: YES. + +SHOW_NAMESPACES = NO + +# The FILE_VERSION_FILTER tag can be used to specify a program or script that +# doxygen should invoke to get the current version for each file (typically from +# the version control system). Doxygen will invoke the program by executing (via +# popen()) the command command input-file, where command is the value of the +# FILE_VERSION_FILTER tag, and input-file is the name of an input file provided +# by doxygen. Whatever the program writes to standard output is used as the file +# version. For an example see the documentation. + +FILE_VERSION_FILTER = + +# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed +# by doxygen. The layout file controls the global structure of the generated +# output files in an output format independent way. To create the layout file +# that represents doxygen's defaults, run doxygen with the -l option. You can +# optionally specify a file name after the option, if omitted DoxygenLayout.xml +# will be used as the name of the layout file. +# +# Note that if you run doxygen from a directory containing a file called +# DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE +# tag is left empty. + +LAYOUT_FILE = @CMAKE_CURRENT_SOURCE_DIR@/doc/DoxygenLayout.xml + +# The CITE_BIB_FILES tag can be used to specify one or more bib files containing +# the reference definitions. This must be a list of .bib files. The .bib +# extension is automatically appended if omitted. This requires the bibtex tool +# to be installed. See also http://en.wikipedia.org/wiki/BibTeX for more info. +# For LaTeX the style of the bibliography can be controlled using +# LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the +# search path. Do not use file names with spaces, bibtex cannot handle them. See +# also \cite for info how to create references. + +CITE_BIB_FILES = + +#--------------------------------------------------------------------------- +# Configuration options related to warning and progress messages +#--------------------------------------------------------------------------- + +# The QUIET tag can be used to turn on/off the messages that are generated to +# standard output by doxygen. If QUIET is set to YES this implies that the +# messages are off. +# The default value is: NO. + +QUIET = YES + +# The WARNINGS tag can be used to turn on/off the warning messages that are +# generated to standard error ( stderr) by doxygen. If WARNINGS is set to YES +# this implies that the warnings are on. +# +# Tip: Turn warnings on while writing the documentation. +# The default value is: YES. 
+ +WARNINGS = YES + +# If the WARN_IF_UNDOCUMENTED tag is set to YES, then doxygen will generate +# warnings for undocumented members. If EXTRACT_ALL is set to YES then this flag +# will automatically be disabled. +# The default value is: YES. + +WARN_IF_UNDOCUMENTED = YES + +# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for +# potential errors in the documentation, such as not documenting some parameters +# in a documented function, or documenting parameters that don't exist or using +# markup commands wrongly. +# The default value is: YES. + +WARN_IF_DOC_ERROR = YES + +# This WARN_NO_PARAMDOC option can be enabled to get warnings for functions that +# are documented, but have no documentation for their parameters or return +# value. If set to NO doxygen will only warn about wrong or incomplete parameter +# documentation, but not about the absence of documentation. +# The default value is: NO. + +WARN_NO_PARAMDOC = NO + +# The WARN_FORMAT tag determines the format of the warning messages that doxygen +# can produce. The string should contain the $file, $line, and $text tags, which +# will be replaced by the file and line number from which the warning originated +# and the warning text. Optionally the format may contain $version, which will +# be replaced by the version of the file (if it could be obtained via +# FILE_VERSION_FILTER) +# The default value is: $file:$line: $text. + +WARN_FORMAT = "$file:$line: $text" + +# The WARN_LOGFILE tag can be used to specify a file to which warning and error +# messages should be written. If left blank the output is written to standard +# error (stderr). + +WARN_LOGFILE = + +#--------------------------------------------------------------------------- +# Configuration options related to the input files +#--------------------------------------------------------------------------- + +# The INPUT tag is used to specify the files and/or directories that contain +# documented source files. You may enter file names like myfile.cpp or +# directories like /usr/src/myproject. Separate the files or directories with +# spaces. +# Note: If this tag is empty the current directory is searched. + +INPUT = @CMAKE_CURRENT_SOURCE_DIR@/include \ + @CMAKE_CURRENT_SOURCE_DIR@/doc + +# This tag can be used to specify the character encoding of the source files +# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses +# libiconv (or the iconv built into libc) for the transcoding. See the libiconv +# documentation (see: http://www.gnu.org/software/libiconv) for the list of +# possible encodings. +# The default value is: UTF-8. + +INPUT_ENCODING = UTF-8 + +# If the value of the INPUT tag contains directories, you can use the +# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and +# *.h) to filter out the source-files in the directories. If left blank the +# following patterns are tested:*.c, *.cc, *.cxx, *.cpp, *.c++, *.java, *.ii, +# *.ixx, *.ipp, *.i++, *.inl, *.idl, *.ddl, *.odl, *.h, *.hh, *.hxx, *.hpp, +# *.h++, *.cs, *.d, *.php, *.php4, *.php5, *.phtml, *.inc, *.m, *.markdown, +# *.md, *.mm, *.dox, *.py, *.f90, *.f, *.for, *.tcl, *.vhd, *.vhdl, *.ucf, +# *.qsf, *.as and *.js. + +FILE_PATTERNS = *.h \ + *.hpp \ + *.md + +# The RECURSIVE tag can be used to specify whether or not subdirectories should +# be searched for input files as well. +# The default value is: NO. 
+ +RECURSIVE = YES + +# The EXCLUDE tag can be used to specify files and/or directories that should be +# excluded from the INPUT source files. This way you can easily exclude a +# subdirectory from a directory tree whose root is specified with the INPUT tag. +# +# Note that relative paths are relative to the directory from which doxygen is +# run. + +EXCLUDE = + +# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or +# directories that are symbolic links (a Unix file system feature) are excluded +# from the input. +# The default value is: NO. + +EXCLUDE_SYMLINKS = NO + +# If the value of the INPUT tag contains directories, you can use the +# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude +# certain files from those directories. +# +# Note that the wildcards are matched against the file with absolute path, so to +# exclude all test directories for example use the pattern */test/* + +EXCLUDE_PATTERNS = + +# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names +# (namespaces, classes, functions, etc.) that should be excluded from the +# output. The symbol name can be a fully qualified name, a word, or if the +# wildcard * is used, a substring. Examples: ANamespace, AClass, +# AClass::ANamespace, ANamespace::*Test +# +# Note that the wildcards are matched against the file with absolute path, so to +# exclude all test directories use the pattern */test/* + +EXCLUDE_SYMBOLS = + +# The EXAMPLE_PATH tag can be used to specify one or more files or directories +# that contain example code fragments that are included (see the \include +# command). + +EXAMPLE_PATH = + +# If the value of the EXAMPLE_PATH tag contains directories, you can use the +# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and +# *.h) to filter out the source-files in the directories. If left blank all +# files are included. + +EXAMPLE_PATTERNS = + +# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be +# searched for input files to be used with the \include or \dontinclude commands +# irrespective of the value of the RECURSIVE tag. +# The default value is: NO. + +EXAMPLE_RECURSIVE = NO + +# The IMAGE_PATH tag can be used to specify one or more files or directories +# that contain images that are to be included in the documentation (see the +# \image command). + +IMAGE_PATH = @CMAKE_CURRENT_SOURCE_DIR@/doc + +# The INPUT_FILTER tag can be used to specify a program that doxygen should +# invoke to filter for each input file. Doxygen will invoke the filter program +# by executing (via popen()) the command: +# +# +# +# where is the value of the INPUT_FILTER tag, and is the +# name of an input file. Doxygen will then use the output that the filter +# program writes to standard output. If FILTER_PATTERNS is specified, this tag +# will be ignored. +# +# Note that the filter must not add or remove lines; it is applied before the +# code is scanned, but not when the output code is generated. If lines are added +# or removed, the anchors will not be placed correctly. + +INPUT_FILTER = + +# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern +# basis. Doxygen will compare the file name with each pattern and apply the +# filter if there is a match. The filters are a list of the form: pattern=filter +# (like *.cpp=my_cpp_filter). See INPUT_FILTER for further information on how +# filters are used. If the FILTER_PATTERNS tag is empty or if none of the +# patterns match the file name, INPUT_FILTER is applied. 
+ +FILTER_PATTERNS = + +# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using +# INPUT_FILTER ) will also be used to filter the input files that are used for +# producing the source files to browse (i.e. when SOURCE_BROWSER is set to YES). +# The default value is: NO. + +FILTER_SOURCE_FILES = NO + +# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file +# pattern. A pattern will override the setting for FILTER_PATTERN (if any) and +# it is also possible to disable source filtering for a specific pattern using +# *.ext= (so without naming a filter). +# This tag requires that the tag FILTER_SOURCE_FILES is set to YES. + +FILTER_SOURCE_PATTERNS = + +# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that +# is part of the input, its contents will be placed on the main page +# (index.html). This can be useful if you have a project on for instance GitHub +# and want to reuse the introduction page also for the doxygen output. + +USE_MDFILE_AS_MAINPAGE = mainpage.md + +#--------------------------------------------------------------------------- +# Configuration options related to source browsing +#--------------------------------------------------------------------------- + +# If the SOURCE_BROWSER tag is set to YES then a list of source files will be +# generated. Documented entities will be cross-referenced with these sources. +# +# Note: To get rid of all source code in the generated output, make sure that +# also VERBATIM_HEADERS is set to NO. +# The default value is: NO. + +SOURCE_BROWSER = NO + +# Setting the INLINE_SOURCES tag to YES will include the body of functions, +# classes and enums directly into the documentation. +# The default value is: NO. + +INLINE_SOURCES = NO + +# Setting the STRIP_CODE_COMMENTS tag to YES will instruct doxygen to hide any +# special comment blocks from generated source code fragments. Normal C, C++ and +# Fortran comments will always remain visible. +# The default value is: YES. + +STRIP_CODE_COMMENTS = YES + +# If the REFERENCED_BY_RELATION tag is set to YES then for each documented +# function all documented functions referencing it will be listed. +# The default value is: NO. + +REFERENCED_BY_RELATION = NO + +# If the REFERENCES_RELATION tag is set to YES then for each documented function +# all documented entities called/used by that function will be listed. +# The default value is: NO. + +REFERENCES_RELATION = NO + +# If the REFERENCES_LINK_SOURCE tag is set to YES and SOURCE_BROWSER tag is set +# to YES, then the hyperlinks from functions in REFERENCES_RELATION and +# REFERENCED_BY_RELATION lists will link to the source code. Otherwise they will +# link to the documentation. +# The default value is: YES. + +REFERENCES_LINK_SOURCE = YES + +# If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the +# source code will show a tooltip with additional information such as prototype, +# brief description and links to the definition and documentation. Since this +# will make the HTML file larger and loading of large files a bit slower, you +# can opt to disable this feature. +# The default value is: YES. +# This tag requires that the tag SOURCE_BROWSER is set to YES. + +SOURCE_TOOLTIPS = YES + +# If the USE_HTAGS tag is set to YES then the references to source code will +# point to the HTML generated by the htags(1) tool instead of doxygen built-in +# source browser. 
The htags tool is part of GNU's global source tagging system +# (see http://www.gnu.org/software/global/global.html). You will need version +# 4.8.6 or higher. +# +# To use it do the following: +# - Install the latest version of global +# - Enable SOURCE_BROWSER and USE_HTAGS in the config file +# - Make sure the INPUT points to the root of the source tree +# - Run doxygen as normal +# +# Doxygen will invoke htags (and that will in turn invoke gtags), so these +# tools must be available from the command line (i.e. in the search path). +# +# The result: instead of the source browser generated by doxygen, the links to +# source code will now point to the output of htags. +# The default value is: NO. +# This tag requires that the tag SOURCE_BROWSER is set to YES. + +USE_HTAGS = NO + +# If the VERBATIM_HEADERS tag is set the YES then doxygen will generate a +# verbatim copy of the header file for each class for which an include is +# specified. Set to NO to disable this. +# See also: Section \class. +# The default value is: YES. + +VERBATIM_HEADERS = YES + +#--------------------------------------------------------------------------- +# Configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- + +# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index of all +# compounds will be generated. Enable this if the project contains a lot of +# classes, structs, unions or interfaces. +# The default value is: YES. + +ALPHABETICAL_INDEX = YES + +# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in +# which the alphabetical index list will be split. +# Minimum value: 1, maximum value: 20, default value: 5. +# This tag requires that the tag ALPHABETICAL_INDEX is set to YES. + +COLS_IN_ALPHA_INDEX = 5 + +# In case all classes in a project start with a common prefix, all classes will +# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag +# can be used to specify a prefix (or a list of prefixes) that should be ignored +# while generating the index headers. +# This tag requires that the tag ALPHABETICAL_INDEX is set to YES. + +IGNORE_PREFIX = + +#--------------------------------------------------------------------------- +# Configuration options related to the HTML output +#--------------------------------------------------------------------------- + +# If the GENERATE_HTML tag is set to YES doxygen will generate HTML output +# The default value is: YES. + +GENERATE_HTML = YES + +# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. If a +# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of +# it. +# The default directory is: html. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_OUTPUT = html + +# The HTML_FILE_EXTENSION tag can be used to specify the file extension for each +# generated HTML page (for example: .htm, .php, .asp). +# The default value is: .html. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_FILE_EXTENSION = .html + +# The HTML_HEADER tag can be used to specify a user-defined HTML header file for +# each generated HTML page. If the tag is left blank doxygen will generate a +# standard header. +# +# To get valid HTML the header file that includes any scripts and style sheets +# that doxygen needs, which is dependent on the configuration options used (e.g. +# the setting GENERATE_TREEVIEW). 
It is highly recommended to start with a +# default header using +# doxygen -w html new_header.html new_footer.html new_stylesheet.css +# YourConfigFile +# and then modify the file new_header.html. See also section "Doxygen usage" +# for information on how to generate the default header that doxygen normally +# uses. +# Note: The header is subject to change so you typically have to regenerate the +# default header when upgrading to a newer version of doxygen. For a description +# of the possible markers and block names see the documentation. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_HEADER = header.html + +# The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each +# generated HTML page. If the tag is left blank doxygen will generate a standard +# footer. See HTML_HEADER for more information on how to generate a default +# footer and what special commands can be used inside the footer. See also +# section "Doxygen usage" for information on how to generate the default footer +# that doxygen normally uses. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_FOOTER = + +# The HTML_STYLESHEET tag can be used to specify a user-defined cascading style +# sheet that is used by each HTML page. It can be used to fine-tune the look of +# the HTML output. If left blank doxygen will generate a default style sheet. +# See also section "Doxygen usage" for information on how to generate the style +# sheet that doxygen normally uses. +# Note: It is recommended to use HTML_EXTRA_STYLESHEET instead of this tag, as +# it is more robust and this tag (HTML_STYLESHEET) will in the future become +# obsolete. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_STYLESHEET = + +# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional user- +# defined cascading style sheet that is included after the standard style sheets +# created by doxygen. Using this option one can overrule certain style aspects. +# This is preferred over using HTML_STYLESHEET since it does not replace the +# standard style sheet and is therefor more robust against future updates. +# Doxygen will copy the style sheet file to the output directory. For an example +# see the documentation. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_EXTRA_STYLESHEET = + +# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or +# other source files which should be copied to the HTML output directory. Note +# that these files will be copied to the base HTML output directory. Use the +# $relpath^ marker in the HTML_HEADER and/or HTML_FOOTER files to load these +# files. In the HTML_STYLESHEET file, use the file name only. Also note that the +# files will be copied as-is; there are no commands or markers available. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_EXTRA_FILES = + +# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen +# will adjust the colors in the stylesheet and background images according to +# this color. Hue is specified as an angle on a colorwheel, see +# http://en.wikipedia.org/wiki/Hue for more information. For instance the value +# 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300 +# purple, and 360 is red again. +# Minimum value: 0, maximum value: 359, default value: 220. +# This tag requires that the tag GENERATE_HTML is set to YES. 
+ +HTML_COLORSTYLE_HUE = 220 + +# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of the colors +# in the HTML output. For a value of 0 the output will use grayscales only. A +# value of 255 will produce the most vivid colors. +# Minimum value: 0, maximum value: 255, default value: 100. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_SAT = 100 + +# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to the +# luminance component of the colors in the HTML output. Values below 100 +# gradually make the output lighter, whereas values above 100 make the output +# darker. The value divided by 100 is the actual gamma applied, so 80 represents +# a gamma of 0.8, The value 220 represents a gamma of 2.2, and 100 does not +# change the gamma. +# Minimum value: 40, maximum value: 240, default value: 80. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_GAMMA = 80 + +# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML +# page will contain the date and time when the page was generated. Setting this +# to NO can help when comparing the output of multiple runs. +# The default value is: YES. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_TIMESTAMP = NO + +# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML +# documentation will contain sections that can be hidden and shown after the +# page has loaded. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_DYNAMIC_SECTIONS = NO + +# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries +# shown in the various tree structured indices initially; the user can expand +# and collapse entries dynamically later on. Doxygen will expand the tree to +# such a level that at most the specified number of entries are visible (unless +# a fully collapsed tree already exceeds this amount). So setting the number of +# entries 1 will produce a full collapsed tree by default. 0 is a special value +# representing an infinite number of entries and will result in a full expanded +# tree by default. +# Minimum value: 0, maximum value: 9999, default value: 100. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_INDEX_NUM_ENTRIES = 100 + +# If the GENERATE_DOCSET tag is set to YES, additional index files will be +# generated that can be used as input for Apple's Xcode 3 integrated development +# environment (see: http://developer.apple.com/tools/xcode/), introduced with +# OSX 10.5 (Leopard). To create a documentation set, doxygen will generate a +# Makefile in the HTML output directory. Running make will produce the docset in +# that directory and running make install will install the docset in +# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at +# startup. See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html +# for more information. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_DOCSET = NO + +# This tag determines the name of the docset feed. A documentation feed provides +# an umbrella under which multiple documentation sets from a single provider +# (such as a company or product suite) can be grouped. +# The default value is: Doxygen generated docs. +# This tag requires that the tag GENERATE_DOCSET is set to YES. 
+ +DOCSET_FEEDNAME = "Doxygen generated docs" + +# This tag specifies a string that should uniquely identify the documentation +# set bundle. This should be a reverse domain-name style string, e.g. +# com.mycompany.MyDocSet. Doxygen will append .docset to the name. +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_BUNDLE_ID = org.doxygen.Project + +# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely identify +# the documentation publisher. This should be a reverse domain-name style +# string, e.g. com.mycompany.MyDocSet.documentation. +# The default value is: org.doxygen.Publisher. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_PUBLISHER_ID = org.doxygen.Publisher + +# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher. +# The default value is: Publisher. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_PUBLISHER_NAME = Publisher + +# If the GENERATE_HTMLHELP tag is set to YES then doxygen generates three +# additional HTML index files: index.hhp, index.hhc, and index.hhk. The +# index.hhp is a project file that can be read by Microsoft's HTML Help Workshop +# (see: http://www.microsoft.com/en-us/download/details.aspx?id=21138) on +# Windows. +# +# The HTML Help Workshop contains a compiler that can convert all HTML output +# generated by doxygen into a single compiled HTML file (.chm). Compiled HTML +# files are now used as the Windows 98 help format, and will replace the old +# Windows help format (.hlp) on all Windows platforms in the future. Compressed +# HTML files also contain an index, a table of contents, and you can search for +# words in the documentation. The HTML workshop also contains a viewer for +# compressed HTML files. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_HTMLHELP = NO + +# The CHM_FILE tag can be used to specify the file name of the resulting .chm +# file. You can add a path in front of the file if the result should not be +# written to the html output directory. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +CHM_FILE = + +# The HHC_LOCATION tag can be used to specify the location (absolute path +# including file name) of the HTML help compiler ( hhc.exe). If non-empty +# doxygen will try to run the HTML help compiler on the generated index.hhp. +# The file has to be specified with full path. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +HHC_LOCATION = + +# The GENERATE_CHI flag controls if a separate .chi index file is generated ( +# YES) or that it should be included in the master .chm file ( NO). +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +GENERATE_CHI = NO + +# The CHM_INDEX_ENCODING is used to encode HtmlHelp index ( hhk), content ( hhc) +# and project file content. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +CHM_INDEX_ENCODING = + +# The BINARY_TOC flag controls whether a binary table of contents is generated ( +# YES) or a normal table of contents ( NO) in the .chm file. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +BINARY_TOC = NO + +# The TOC_EXPAND flag can be set to YES to add extra items for group members to +# the table of contents of the HTML help documentation and to the tree view. +# The default value is: NO. 
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +TOC_EXPAND = NO + +# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and +# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated that +# can be used as input for Qt's qhelpgenerator to generate a Qt Compressed Help +# (.qch) of the generated HTML documentation. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_QHP = NO + +# If the QHG_LOCATION tag is specified, the QCH_FILE tag can be used to specify +# the file name of the resulting .qch file. The path specified is relative to +# the HTML output folder. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QCH_FILE = + +# The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help +# Project output. For more information please see Qt Help Project / Namespace +# (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#namespace). +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_NAMESPACE = org.doxygen.Project + +# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating Qt +# Help Project output. For more information please see Qt Help Project / Virtual +# Folders (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#virtual- +# folders). +# The default value is: doc. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_VIRTUAL_FOLDER = doc + +# If the QHP_CUST_FILTER_NAME tag is set, it specifies the name of a custom +# filter to add. For more information please see Qt Help Project / Custom +# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom- +# filters). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_CUST_FILTER_NAME = + +# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the +# custom filter to add. For more information please see Qt Help Project / Custom +# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom- +# filters). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_CUST_FILTER_ATTRS = + +# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this +# project's filter section matches. Qt Help Project / Filter Attributes (see: +# http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_SECT_FILTER_ATTRS = + +# The QHG_LOCATION tag can be used to specify the location of Qt's +# qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the +# generated .qhp file. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHG_LOCATION = + +# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be +# generated, together with the HTML files, they form an Eclipse help plugin. To +# install this plugin and make it available under the help contents menu in +# Eclipse, the contents of the directory containing the HTML and XML files needs +# to be copied into the plugins directory of eclipse. The name of the directory +# within the plugins directory should be the same as the ECLIPSE_DOC_ID value. +# After copying Eclipse needs to be restarted before the help appears. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_ECLIPSEHELP = NO + +# A unique identifier for the Eclipse help plugin. 
When installing the plugin +# the directory name containing the HTML and XML files should also have this +# name. Each documentation set should have its own identifier. +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_ECLIPSEHELP is set to YES. + +ECLIPSE_DOC_ID = org.doxygen.Project + +# If you want full control over the layout of the generated HTML pages it might +# be necessary to disable the index and replace it with your own. The +# DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) at top +# of each HTML page. A value of NO enables the index and the value YES disables +# it. Since the tabs in the index contain the same information as the navigation +# tree, you can set this option to YES if you also set GENERATE_TREEVIEW to YES. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +DISABLE_INDEX = NO + +# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index +# structure should be generated to display hierarchical information. If the tag +# value is set to YES, a side panel will be generated containing a tree-like +# index structure (just like the one that is generated for HTML Help). For this +# to work a browser that supports JavaScript, DHTML, CSS and frames is required +# (i.e. any modern browser). Windows users are probably better off using the +# HTML help feature. Via custom stylesheets (see HTML_EXTRA_STYLESHEET) one can +# further fine-tune the look of the index. As an example, the default style +# sheet generated by doxygen has an example that shows how to put an image at +# the root of the tree instead of the PROJECT_NAME. Since the tree basically has +# the same information as the tab index, you could consider setting +# DISABLE_INDEX to YES when enabling this option. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_TREEVIEW = NO + +# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that +# doxygen will group on one line in the generated HTML documentation. +# +# Note that a value of 0 will completely suppress the enum values from appearing +# in the overview section. +# Minimum value: 0, maximum value: 20, default value: 4. +# This tag requires that the tag GENERATE_HTML is set to YES. + +ENUM_VALUES_PER_LINE = 4 + +# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be used +# to set the initial width (in pixels) of the frame in which the tree is shown. +# Minimum value: 0, maximum value: 1500, default value: 250. +# This tag requires that the tag GENERATE_HTML is set to YES. + +TREEVIEW_WIDTH = 250 + +# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open links to +# external symbols imported via tag files in a separate window. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +EXT_LINKS_IN_WINDOW = NO + +# Use this tag to change the font size of LaTeX formulas included as images in +# the HTML documentation. When you change the font size after a successful +# doxygen run you need to manually remove any form_*.png images from the HTML +# output directory to force them to be regenerated. +# Minimum value: 8, maximum value: 50, default value: 10. +# This tag requires that the tag GENERATE_HTML is set to YES. + +FORMULA_FONTSIZE = 10 + +# Use the FORMULA_TRANPARENT tag to determine whether or not the images +# generated for formulas are transparent PNGs. 
Transparent PNGs are not +# supported properly for IE 6.0, but are supported on all modern browsers. +# +# Note that when changing this option you need to delete any form_*.png files in +# the HTML output directory before the changes have effect. +# The default value is: YES. +# This tag requires that the tag GENERATE_HTML is set to YES. + +FORMULA_TRANSPARENT = YES + +# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see +# http://www.mathjax.org) which uses client side Javascript for the rendering +# instead of using prerendered bitmaps. Use this if you do not have LaTeX +# installed or if you want to formulas look prettier in the HTML output. When +# enabled you may also need to install MathJax separately and configure the path +# to it using the MATHJAX_RELPATH option. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +USE_MATHJAX = NO + +# When MathJax is enabled you can set the default output format to be used for +# the MathJax output. See the MathJax site (see: +# http://docs.mathjax.org/en/latest/output.html) for more details. +# Possible values are: HTML-CSS (which is slower, but has the best +# compatibility), NativeMML (i.e. MathML) and SVG. +# The default value is: HTML-CSS. +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_FORMAT = HTML-CSS + +# When MathJax is enabled you need to specify the location relative to the HTML +# output directory using the MATHJAX_RELPATH option. The destination directory +# should contain the MathJax.js script. For instance, if the mathjax directory +# is located at the same level as the HTML output directory, then +# MATHJAX_RELPATH should be ../mathjax. The default value points to the MathJax +# Content Delivery Network so you can quickly see the result without installing +# MathJax. However, it is strongly recommended to install a local copy of +# MathJax from http://www.mathjax.org before deployment. +# The default value is: http://cdn.mathjax.org/mathjax/latest. +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest + +# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax +# extension names that should be enabled during MathJax rendering. For example +# MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_EXTENSIONS = + +# The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces +# of code that will be used on startup of the MathJax code. See the MathJax site +# (see: http://docs.mathjax.org/en/latest/output.html) for more details. For an +# example see the documentation. +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_CODEFILE = + +# When the SEARCHENGINE tag is enabled doxygen will generate a search box for +# the HTML output. The underlying search engine uses javascript and DHTML and +# should work on any modern browser. Note that when using HTML help +# (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET) +# there is already a search function so this one should typically be disabled. +# For large projects the javascript based search engine can be slow, then +# enabling SERVER_BASED_SEARCH may provide a better solution. It is possible to +# search using the keyboard; to jump to the search box use + S +# (what the is depends on the OS and browser, but it is typically +# , /