Skip to content

Commit

Permalink
Merge pull request lammps#4413 from stanmoore1/kk_update_4.5.0
Browse files Browse the repository at this point in the history
Update Kokkos library in LAMMPS to v4.5.1
  • Loading branch information
akohlmey authored Jan 9, 2025
2 parents 0abb371 + 8595d8f commit 52d932d
Show file tree
Hide file tree
Showing 622 changed files with 21,832 additions and 17,438 deletions.
21 changes: 4 additions & 17 deletions cmake/Modules/Packages/KOKKOS.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -7,26 +7,13 @@ endif()

########################################################################
# consistency checks and Kokkos options/settings required by LAMMPS
if(Kokkos_ENABLE_CUDA)
option(Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC "CUDA asynchronous malloc support" OFF)
mark_as_advanced(Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC)
if(Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC)
message(STATUS "KOKKOS: CUDA malloc async support enabled")
else()
message(STATUS "KOKKOS: CUDA malloc async support disabled")
endif()
endif()
if(Kokkos_ENABLE_HIP)
option(Kokkos_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS "Enable multiple kernel instantiations with HIP" ON)
mark_as_advanced(Kokkos_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS)
option(Kokkos_ENABLE_ROCTHRUST "Use RoCThrust library" ON)
mark_as_advanced(Kokkos_ENABLE_ROCTHRUST)

if(Kokkos_ARCH_AMD_GFX942 OR Kokkos_ARCH_AMD_GFX940)
option(Kokkos_ENABLE_IMPL_HIP_UNIFIED_MEMORY "Enable unified memory with HIP" ON)
mark_as_advanced(Kokkos_ENABLE_IMPL_HIP_UNIFIED_MEMORY)
endif()
endif()

# Adding OpenMP compiler flags without the checks done for
# BUILD_OMP can result in compile failures. Enforce consistency.
if(Kokkos_ENABLE_OPENMP)
Expand Down Expand Up @@ -70,8 +57,8 @@ if(DOWNLOAD_KOKKOS)
list(APPEND KOKKOS_LIB_BUILD_ARGS "-DCMAKE_CXX_EXTENSIONS=${CMAKE_CXX_EXTENSIONS}")
list(APPEND KOKKOS_LIB_BUILD_ARGS "-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE}")
include(ExternalProject)
set(KOKKOS_URL "https://github.com/kokkos/kokkos/archive/4.4.01.tar.gz" CACHE STRING "URL for KOKKOS tarball")
set(KOKKOS_MD5 "de6ee80d00b6212b02bfb7f1e71a8392" CACHE STRING "MD5 checksum of KOKKOS tarball")
set(KOKKOS_URL "https://github.com/kokkos/kokkos/archive/4.5.01.tar.gz" CACHE STRING "URL for KOKKOS tarball")
set(KOKKOS_MD5 "4d832aa0284169d9e3fbae3165286bc6" CACHE STRING "MD5 checksum of KOKKOS tarball")
mark_as_advanced(KOKKOS_URL)
mark_as_advanced(KOKKOS_MD5)
GetFallbackURL(KOKKOS_URL KOKKOS_FALLBACK)
Expand All @@ -96,7 +83,7 @@ if(DOWNLOAD_KOKKOS)
add_dependencies(LAMMPS::KOKKOSCORE kokkos_build)
add_dependencies(LAMMPS::KOKKOSCONTAINERS kokkos_build)
elseif(EXTERNAL_KOKKOS)
find_package(Kokkos 4.4.01 REQUIRED CONFIG)
find_package(Kokkos 4.5.01 REQUIRED CONFIG)
target_link_libraries(lammps PRIVATE Kokkos::kokkos)
else()
set(LAMMPS_LIB_KOKKOS_SRC_DIR ${LAMMPS_LIB_SOURCE_DIR}/kokkos)
Expand Down
102 changes: 57 additions & 45 deletions doc/src/Build_extras.rst
Original file line number Diff line number Diff line change
Expand Up @@ -547,16 +547,7 @@ They must be specified in uppercase.
- Local machine
* - AMDAVX
- HOST
- AMD 64-bit x86 CPU (AVX 1)
* - ZEN
- HOST
- AMD Zen class CPU (AVX 2)
* - ZEN2
- HOST
- AMD Zen2 class CPU (AVX 2)
* - ZEN3
- HOST
- AMD Zen3 class CPU (AVX 2)
- AMD chip
* - ARMV80
- HOST
- ARMv8.0 Compatible CPU
Expand All @@ -572,105 +563,126 @@ They must be specified in uppercase.
* - A64FX
- HOST
- ARMv8.2 with SVE Support
* - ARMV9_GRACE
- HOST
- ARMv9 NVIDIA Grace CPU
* - SNB
- HOST
- Intel Sandy/Ivy Bridge CPU (AVX 1)
- Intel Sandy/Ivy Bridge CPUs
* - HSW
- HOST
- Intel Haswell CPU (AVX 2)
- Intel Haswell CPUs
* - BDW
- HOST
- Intel Broadwell Xeon E-class CPU (AVX 2 + transactional mem)
* - SKL
- HOST
- Intel Skylake Client CPU
* - SKX
- HOST
- Intel Skylake Xeon Server CPU (AVX512)
- Intel Broadwell Xeon E-class CPUs
* - ICL
- HOST
- Intel Ice Lake Client CPU (AVX512)
- Intel Ice Lake Client CPUs (AVX512)
* - ICX
- HOST
- Intel Ice Lake Xeon Server CPU (AVX512)
* - SPR
- Intel Ice Lake Xeon Server CPUs (AVX512)
* - SKL
- HOST
- Intel Sapphire Rapids Xeon Server CPU (AVX512)
- Intel Skylake Client CPUs
* - SKX
- HOST
- Intel Skylake Xeon Server CPUs (AVX512)
* - KNC
- HOST
- Intel Knights Corner Xeon Phi
* - KNL
- HOST
- Intel Knights Landing Xeon Phi
* - SPR
- HOST
- Intel Sapphire Rapids Xeon Server CPUs (AVX512)
* - POWER8
- HOST
- IBM POWER8 CPU
- IBM POWER8 CPUs
* - POWER9
- HOST
- IBM POWER9 CPU
- IBM POWER9 CPUs
* - ZEN
- HOST
- AMD Zen architecture
* - ZEN2
- HOST
- AMD Zen2 architecture
* - ZEN3
- HOST
- AMD Zen3 architecture
* - RISCV_SG2042
- HOST
- SG2042 (RISC-V) CPU
- SG2042 (RISC-V) CPUs
* - RISCV_RVA22V
- HOST
- RVA22V (RISC-V) CPUs
* - KEPLER30
- GPU
- NVIDIA Kepler generation CC 3.0 GPU
- NVIDIA Kepler generation CC 3.0
* - KEPLER32
- GPU
- NVIDIA Kepler generation CC 3.2 GPU
- NVIDIA Kepler generation CC 3.2
* - KEPLER35
- GPU
- NVIDIA Kepler generation CC 3.5 GPU
- NVIDIA Kepler generation CC 3.5
* - KEPLER37
- GPU
- NVIDIA Kepler generation CC 3.7 GPU
- NVIDIA Kepler generation CC 3.7
* - MAXWELL50
- GPU
- NVIDIA Maxwell generation CC 5.0 GPU
- NVIDIA Maxwell generation CC 5.0
* - MAXWELL52
- GPU
- NVIDIA Maxwell generation CC 5.2 GPU
- NVIDIA Maxwell generation CC 5.2
* - MAXWELL53
- GPU
- NVIDIA Maxwell generation CC 5.3 GPU
- NVIDIA Maxwell generation CC 5.3
* - PASCAL60
- GPU
- NVIDIA Pascal generation CC 6.0 GPU
- NVIDIA Pascal generation CC 6.0
* - PASCAL61
- GPU
- NVIDIA Pascal generation CC 6.1 GPU
- NVIDIA Pascal generation CC 6.1
* - VOLTA70
- GPU
- NVIDIA Volta generation CC 7.0 GPU
- NVIDIA Volta generation CC 7.0
* - VOLTA72
- GPU
- NVIDIA Volta generation CC 7.2 GPU
- NVIDIA Volta generation CC 7.2
* - TURING75
- GPU
- NVIDIA Turing generation CC 7.5 GPU
- NVIDIA Turing generation CC 7.5
* - AMPERE80
- GPU
- NVIDIA Ampere generation CC 8.0 GPU
- NVIDIA Ampere generation CC 8.0
* - AMPERE86
- GPU
- NVIDIA Ampere generation CC 8.6 GPU
- NVIDIA Ampere generation CC 8.6
* - ADA89
- GPU
- NVIDIA Ada Lovelace generation CC 8.9 GPU
- NVIDIA Ada generation CC 8.9
* - HOPPER90
- GPU
- NVIDIA Hopper generation CC 9.0 GPU
- NVIDIA Hopper generation CC 9.0
* - AMD_GFX906
- GPU
- AMD GPU MI50/MI60
- AMD GPU MI50/60
* - AMD_GFX908
- GPU
- AMD GPU MI100
* - AMD_GFX90A
- GPU
- AMD GPU MI200
* - AMD_GFX940
- GPU
- AMD GPU MI300
* - AMD_GFX942
- GPU
- AMD GPU MI300
* - AMD_GFX942_APU
- GPU
- AMD APU MI300A
* - AMD_GFX1030
- GPU
- AMD GPU V620/W6800
Expand All @@ -679,7 +691,7 @@ They must be specified in uppercase.
- AMD GPU RX7900XTX
* - AMD_GFX1103
- GPU
- AMD Phoenix APU with Radeon 740M/760M/780M/880M/890M
- AMD APU Phoenix
* - INTEL_GEN
- GPU
- SPIR64-based devices, e.g. Intel GPUs, using JIT
Expand All @@ -702,7 +714,7 @@ They must be specified in uppercase.
- GPU
- Intel GPU Ponte Vecchio

This list was last updated for version 4.3.0 of the Kokkos library.
This list was last updated for version 4.5.1 of the Kokkos library.

.. tabs::

Expand Down
112 changes: 109 additions & 3 deletions lib/kokkos/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,112 @@
# CHANGELOG

## 4.5.01

[Full Changelog](https://github.com/kokkos/kokkos/compare/4.5.00...4.5.01)

### Bug Fixes

* Fix re-builds after cleaning the binary tree when doing `add_subdirectory` on the Kokkos source [\#7557](https://github.com/kokkos/kokkos/pull/7557)
* Update mdspan to include fix for submdspan and bracket operator with clang 15&16 [\#7559](https://github.com/kokkos/kokkos/pull/7559)
* Fix DynRankView performance regression by re-introducing shortcut operator() impls [\#7606](https://github.com/kokkos/kokkos/pull/7606)
* Add missing MI300A (`GFX942_APU`) option to Makefile build-system

## 4.5.00

[Full Changelog](https://github.com/kokkos/kokkos/compare/4.4.01...4.5.00)

### Features

* SYCL backend graduated to production ready
* Introduce new `SequentialHostInit` view allocation property [\#7229](https://github.com/kokkos/kokkos/pull/7229) (backported in 4.4.01)
* Support building with Run-Time Type Information (RTTI) disabled
* Add new `KOKKOS_RELOCATABLE_FUNCTION` function annotation macro [\#5993](https://github.com/kokkos/kokkos/pull/5993)

### Backend and Architecture Enhancements

#### CUDA

* Adding occupancy tuning for CUDA architectures [\#6788](https://github.com/kokkos/kokkos/pull/6788)
* By default disable `cudaMallocAsync` (i.e., revert the change made in version 4.2) [\#7353](https://github.com/kokkos/kokkos/pull/7353)

#### HIP

* Add support for AMD Phoenix APUs with Radeon 740M/760M/780M/880M/890M [\#7162](https://github.com/kokkos/kokkos/pull/7162)
* Update maximum waves per CU values for consumer card [\#7347](https://github.com/kokkos/kokkos/pull/7347)
* Check that Kokkos is running on the architecture it was compiled for [\#7379](https://github.com/kokkos/kokkos/pull/7379)
* Add opt-in option to use `hipMallocAsync` instead of `hipMalloc` [\#7324](https://github.com/kokkos/kokkos/pull/7324)
* Introduce new architecture option `AMD_GFX942_APU` for MI300A [\#7462](https://github.com/kokkos/kokkos/pull/7462)

#### SYCL

* Move the `SYCL` backend out of the `Experimental` namespace [\#7171](https://github.com/kokkos/kokkos/pull/7171)
* Introduce `KOKKOS_ENABLE_SYCL_RELOCATABLE_DEVICE_CODE` as CMake option [\#5993](https://github.com/kokkos/kokkos/pull/5993)

#### OpenACC

* Add support for building with the Clacc compiler [\#7198](https://github.com/kokkos/kokkos/pull/7198)
* Workaround NVHPC collapse clause bug for `MDRangePolicy` [\#7425](https://github.com/kokkos/kokkos/pull/7425)

#### HPX

* Implement `Experimental::partition_space` to produce truly independent execution spaces [\#7287](https://github.com/kokkos/kokkos/pull/7287)

#### Threads

* Fix compilation for `parallel_reduce` `MDRange` with `Dynamic` scheduling [\#7478](https://github.com/kokkos/kokkos/pull/7478)
* Fix race conditions on ARM architectures [\#7498](https://github.com/kokkos/kokkos/pull/7498)

#### OpenMP

* Fix run time behavior when compiling with `-fvisibility-hidden` [\#7284](https://github.com/kokkos/kokkos/pull/7284) (backported in 4.4.01)
* Fix linking with Cray Clang compiler [\#7341](https://github.com/kokkos/kokkos/pull/7341)

#### Serial

* Allow `Kokkos_ENABLE_ATOMICS_BYPASS` to skip mutexes to remediate performance regression in 4.4 [\#7369](https://github.com/kokkos/kokkos/pull/7369)

### General Enhancements

* Improve `View` initialization/destruction for non-scalar trivial and trivially-destructible types [\#7219](https://github.com/kokkos/kokkos/pull/7219) [\#7225](https://github.com/kokkos/kokkos/pull/7225)
* Add getters for default tile sizes used in `MDRangePolicy` [\#6839](https://github.com/kokkos/kokkos/pull/6839)
* Improve performance of `Kokkos::sort` when `std::sort` is used [\#7264](https://github.com/kokkos/kokkos/pull/7264)
* Add range-based for loop support for `Array<T, N>` [\#7293](https://github.com/kokkos/kokkos/pull/7293)
* Allow functors as reducers for nested team parallel reduce [\#6921](https://github.com/kokkos/kokkos/pull/6921)
* Avoid making copies of string rvalue reference arguments to `view_alloc()` [\#7364](https://github.com/kokkos/kokkos/pull/7364)
* Add `atomic_{mod,xor,nand,lshift,rshift}` [\#7458](https://github.com/kokkos/kokkos/pull/7458)
* Allow using `SequentialHostInit` with `Kokkos::DualView` [\#7456](https://github.com/kokkos/kokkos/pull/7456)
* Add `Graph::instantiate()` [\#7240](https://github.com/kokkos/kokkos/pull/7240)
* Allow an arbitrary execution space instance to be used in `Kokkos::Graph::submit()` [\#7249](https://github.com/kokkos/kokkos/pull/7249)
* Enable compile-time diagnostic of illegal reduction target for graphs [\#7460](https://github.com/kokkos/kokkos/pull/7460)

### Build System Changes

* Make sure backend-specific options such as `IMPL_CUDA_MALLOC_ASYNC` only show when that backend is actually enabled [\#7228](https://github.com/kokkos/kokkos/pull/7228)
* Major refactoring removing `TriBITS` paths [\#6164](https://github.com/kokkos/kokkos/pull/6164)
* Add support for SpacemiT K60 (RISC-V) [\#7160](https://github.com/kokkos/kokkos/pull/7160)

### Deprecations

* Deprecate Tasking interface [\#7393](https://github.com/kokkos/kokkos/pull/7393)
* Deprecate `atomic_query_version`, `atomic_assign`, `atomic_compare_exchange_strong`, `atomic_{inc, dec}rement` [\#7458](https://github.com/kokkos/kokkos/pull/7458)
* Deprecate `{OpenMP,HPX}::is_asynchronous()` [\#7322](https://github.com/kokkos/kokkos/pull/7322)

### Bug Fixes

* Fix undefined behavior in `BinSort` when sorting within bins on host [\#7223](https://github.com/kokkos/kokkos/pull/7223)
* Using CUDA limits to set extents for blocks, grids [\#7235](https://github.com/kokkos/kokkos/pull/7235)
* Fix `deep_copy (serial_exec, dst, src)` with multiple host backends [\#7245](https://github.com/kokkos/kokkos/pull/7245)
* Skip `RangePolicy` bounds conversion checks if roundtrip convertibility is not provided [\#7172](https://github.com/kokkos/kokkos/pull/7172)
* Allow extracting host and device views from `DualView` with `const` value type [\#7242](https://github.com/kokkos/kokkos/pull/7242)
* Fix `TeamPolicy` array reduction for CUDA and HIP [\#6296](https://github.com/kokkos/kokkos/pull/6296)
* Fix implicit copy assignment operators in few AVX2 masks being deleted [\#7296](https://github.com/kokkos/kokkos/pull/7296)
* Fix configuring without architecture flags for SYCL [\#7303](https://github.com/kokkos/kokkos/pull/7303)
* Set an initial value index during join of `MinLoc`, `MaxLoc` or `MinMaxLoc` [\#7330](https://github.com/kokkos/kokkos/pull/7330)
* Fix storage lifetime of driver for global launch of graph nodes for CUDA and HIP [\#7365](https://github.com/kokkos/kokkos/pull/7365)
* Make `value_type` for `RandomAccessIterator` non-`const` [\#7485](https://github.com/kokkos/kokkos/pull/7485)

## [4.4.01](https://github.com/kokkos/kokkos/tree/4.4.01)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.0.00...4.4.01)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.4.00...4.4.01)

### Features:
* Introduce new SequentialHostInit view allocation property [\#7229](https://github.com/kokkos/kokkos/pull/7229)
Expand All @@ -13,7 +118,7 @@

### Bug Fixes
* OpenMP: Fix issue related to the visibility of an internal symbol with shared libraries that affected `ScatterView` in particular [\#7284](https://github.com/kokkos/kokkos/pull/7284)
* Fix implicit copy assignment operators in few AVX2 masks being deleted [#7296](https://github.com/kokkos/kokkos/pull/7296)
* Fix implicit copy assignment operators in few AVX2 masks being deleted [\#7296](https://github.com/kokkos/kokkos/pull/7296)

## [4.4.00](https://github.com/kokkos/kokkos/tree/4.4.00)
[Full Changelog](https://github.com/kokkos/kokkos/compare/4.3.01...4.4.00)
Expand Down Expand Up @@ -57,6 +162,7 @@
* SIMD: Allow flexible vector width for 32 bit types [\#6802](https://github.com/kokkos/kokkos/pull/6802)
* Updates for `Kokkos::Array`: add `kokkos_swap(Array<T, N>)` specialization [\#6943](https://github.com/kokkos/kokkos/pull/6943), add `Kokkos::to_array` [\#6375](https://github.com/kokkos/kokkos/pull/6375), make `Kokkos::Array` equality-comparable [\#7148](https://github.com/kokkos/kokkos/pull/7148)
* Structured binding support for `Kokkos::complex` [\#7040](https://github.com/kokkos/kokkos/pull/7040)
* Introduce `KOKKOS_DEDUCTION_GUIDE` macro to allow for portable user-defined deduction guides [\#6954](https://github.com/kokkos/kokkos/pull/6954)

### Build System Changes
* Do not require OpenMP support for languages other than CXX [\#6965](https://github.com/kokkos/kokkos/pull/6965)
Expand Down Expand Up @@ -1388,7 +1494,7 @@
**Closed issues:**

- Silent error (Validate storage level arg to set_scratch_size) [\#3097](https://github.com/kokkos/kokkos/issues/3097)
- Remove KOKKKOS\_ENABLE\_PROFILING Option [\#3095](https://github.com/kokkos/kokkos/issues/3095)
- Remove KOKKOS\_ENABLE\_PROFILING Option [\#3095](https://github.com/kokkos/kokkos/issues/3095)
- Cuda 11 -\> allow C++17 [\#3083](https://github.com/kokkos/kokkos/issues/3083)
- In source build failure not explained [\#3081](https://github.com/kokkos/kokkos/issues/3081)
- Allow naming of Views for initialization kernel [\#3070](https://github.com/kokkos/kokkos/issues/3070)
Expand Down
Loading

0 comments on commit 52d932d

Please sign in to comment.