-
CFD
-
What are possible methods to solve the compressible Euler equations?
-
CFD Julia: A Learning Module Structuring an Introductory Course on Computational Fluid Dynamics
-
Lattice Boltzmann codes
-
Nektar++
-
Simple (and not-so-simple) CFD solvers written in Fortran with Python plotting routines
-
Exasim - Generating Discontinuous Galerkin Codes For Extreme Scalable Simulations
-
Chebyshev Pseudo-Spectral Method (PSM)
-
Chebyshev Polynomials, J.C. Mason and D.C. Handscomb
- Chapter 1. Definitions
- Chapter 2. Basic Properties and Formulae
- Chapter 3. The Minimax Property and Its Applications
- Chapter 4. Orthogonality and Least-Squares Approximation
- Chapter 5. Chebyshev Series
- Chapter 6. Chebyshev Interpolation
- Chapter 7. Near-Best L∞, L1 and Lp Approximations
- Chapter 8. Integration Using Chebyshev Polynomials
- Chapter 9. Solution of Integral Equations
- Chapter 10. Solution of Ordinary Differential Equations
- Chapter 11. Chebyshev and Spectral Methods for Partial Differential Equations
- Chapter 12. Conclusion
- Appendices:
- Summary of Notations, Definitions and Important Properties
- Tables of Coefficients
- FFTW Discrete Cosine Transform Derivative
- Fast Algorithms for Discrete Polynomial Transforms
- A New Method for Chebyshev Polynomial Interpolation Based on Cosine Transforms
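Below is a minimal sketch of the collocation (differentiation-matrix) form of the Chebyshev pseudo-spectral method. It is written in pure Python so it runs without extra packages. It builds the standard differentiation matrix on the Chebyshev-Gauss-Lobatto points, following the construction of `cheb.m` in Trefethen's *Spectral Methods in MATLAB*. The names `cheb` and `matvec` are illustrative and do not come from any library listed here. The dense O(N²) product is used only for clarity; a DCT/FFT-based derivative (as in the FFTW item) computes the same result in O(N log N).

```python
import math


def cheb(n):
    """Chebyshev differentiation matrix on n+1 Gauss-Lobatto points.

    Returns (D, x) with x_j = cos(pi*j/n), j = 0..n, ordered from +1 to -1,
    so that matvec(D, f) approximates f'(x_i) for f sampled at x.
    """
    if n == 0:
        return [[0.0]], [1.0]
    x = [math.cos(math.pi * j / n) for j in range(n + 1)]
    # c_j = 2 at the two endpoints and 1 inside, times the sign (-1)^j
    c = [(2.0 if j in (0, n) else 1.0) * (-1) ** j for j in range(n + 1)]
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i != j:
                D[i][j] = (c[i] / c[j]) / (x[i] - x[j])
    # Diagonal from the "negative sum trick": every row must map constants to 0
    for i in range(n + 1):
        D[i][i] = -sum(D[i][j] for j in range(n + 1) if j != i)
    return D, x


def matvec(D, f):
    """Dense matrix-vector product D f (O(N^2) work)."""
    return [sum(dij * fj for dij, fj in zip(row, f)) for row in D]


if __name__ == "__main__":
    D, x = cheb(16)
    df = matvec(D, [math.exp(xi) for xi in x])
    err = max(abs(d - math.exp(xi)) for d, xi in zip(df, x))
    print(f"max error in d/dx exp(x) with N = 16: {err:.2e}")
```

For a smooth function such as exp(x), the error drops roughly exponentially with N until rounding dominates. Polynomials of degree at most N are differentiated exactly, up to rounding.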
-
A brief introduction to pseudo-spectral methods: application to diffusion problems
-
An Introduction to Domain Decomposition Methods: Algorithms, Theory, and Parallel Implementation
-
Chebyshev-Legendre Spectral Domain Decomposition Method for Two-Dimensional Vorticity Equations
-
An efficient domain-decomposition pseudo-spectral method for solving elliptic differential equations
-
A Pseudospectral Multi-Domain Method for the Incompressible Navier-Stokes Equations
-
Multiple-interval pseudospectral methods to solve optimal control problems
-
FDBB (Fluid Dynamics Building Blocks) is a C++ expression template library for fluid dynamics
-
-
Finite Element Methods (FEM) and Spectral Element Methods (SEM)
- deal.II — an open source finite element library
- Feel++ finite element embedded library in C++
- Veamy: an extensible object-oriented C++ library for the virtual element method
- Two-dimensional high-order spectral element method fluid dynamics solver
- github: ITHACA-SEM - In real Time Highly Advanced Computational Applications for Spectral Element Methods
- AxiSEM is a parallel spectral-element method to solve 3D wave propagation in a sphere with axisymmetric or spherically symmetric visco-elastic, acoustic, anisotropic structures
- HDGlab: An open-source implementation of the hybridisable discontinuous Galerkin method in MATLAB
- Euler Equations for Ideal Gases
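For reference, here are the one-dimensional compressible Euler equations for an ideal gas in conservation form. This is the standard textbook statement, not taken from the linked pages. ρ is density, u velocity, E total energy per unit volume, p pressure and γ the ratio of specific heats:

```latex
\frac{\partial \mathbf{U}}{\partial t} + \frac{\partial \mathbf{F}(\mathbf{U})}{\partial x} = 0,
\qquad
\mathbf{U} = \begin{pmatrix} \rho \\ \rho u \\ E \end{pmatrix},
\qquad
\mathbf{F}(\mathbf{U}) = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ u\,(E + p) \end{pmatrix},
\qquad
p = (\gamma - 1)\left(E - \tfrac{1}{2}\rho u^2\right).
```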
-
Siemens
-
Maxeler
-
facilities to experiment with Discontinuous Petrov Galerkin (DPG) methods
-
- Code_Saturne
- Large-Scale CFD Parallel Computing Dealing with Massive Mesh
-
Open source tools in technical photorealistic large-scale visualisation
-
3D, block structured, explicit/implicit, Navier-Stokes solver
-
CFD + GPU
- Recent progress and challenges in exploiting graphics processors in computational fluid dynamics: slightly outdated but interesting
- PyFR
- PyFR is an open-source Python based framework for solving advection-diffusion type problems on streaming architectures using the Flux Reconstruction approach of Huynh
- New PyFR Paper “Heterogeneous Computing on Mixed Unstructured Grids with PyFR”
- PyFR: An open source framework for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach
- High Performance Parallelism Pearls Volume Two: Multicore and Many-core Programming Approaches
-
Camellia
-
Co-design at Lawrence Livermore National Lab
-
Modern Fortran
- NNSA, national labs team with Nvidia to develop open-source Fortran compiler technology
- Flang is a ground-up implementation of a Fortran front end written in modern C++. It started off as the f18 project
- F18 is a front-end for Fortran intended to replace the existing front-end in the Flang compiler
tl;dr 301 Moved: the code from this repository can now be found at flang - Flang and F18
- Installing LLVM Flang Fortran compiler
tl;dr
git clone https://github.com/llvm/llvm-project && mkdir -p llvm-project/build && cd llvm-project/build && cmake ../llvm -DLLVM_ENABLE_PROJECTS=flang
+ [Unknown CMake command “tablegen”](https://stackoverflow.com/questions/59691069/unknown-cmake-command-tablegen)
-
libCEED: the CEED Library: Code for Efficient Extensible Discretization
-
AMG
- AMG intro
- Iteration methods
- Algebraic multigrid method by smoothed agglomeration for a Stokes problem
- Convergence of Algebraic Multigrid Based on Smoothed Aggregation II: Extension to a Petrov-Galerkin Method
- An Algebraic Multigrid Tutorial, Robert D. Falgout, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
- An Introduction to Algebraic Multigrid
- An Algebraic Multigrid Tutorial, IMA Tutorial: Fast Solution Techniques, November 28-29, 2010
- Multigrid Methods: From Geometrical to Algebraic Versions, Gundolf Haase
- A root-node based algebraic multigrid method
- Iterative methods for linear, non-linear and eigenvalue problems
- A Multigrid Tutorial by William L. Briggs
- Algebraic Multigrid Code
- Performance of Preconditioners for Large-Scale Simulations Using Nek5000
- Reducing Complexity in Parallel Algebraic Multigrid Preconditioners, Hans de Sterck, Ulrike Meier Yang and Jeffrey J. Heys
- 3.2.5. Block Compressed Sparse Row Format (BSR)
- AMGX
- AMGX in Julia
- pyamgx: Python interface to NVIDIA's AMGX library
- AmgXWrapper
- An example and benchmark of AmgX and PETSc with Poisson system
- PetIBM - toolbox and applications of the immersed-boundary method on distributed-memory architectures
- geoclaw-landspill
- High-productivity, high-performance workflow for virus-scale electrostatic simulations with Bempp-Exafmm
- Alexa: Simulating Shock Hydrodynamics on the GPU using Kokkos
- GPGPU acceleration a case study of algebraic multigrid preconditioned GMRES
- AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods
- rocALUTION is a sparse linear algebra library with focus on exploring fine-grained parallelism
- amgcl
- C++ library for solving large sparse linear systems with the algebraic multigrid method
- Triggering C++11 support in NVCC with CMake
tl;dr
```diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 6ca3264..b63e326 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -161,9 +161,9 @@ if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR CMAKE_CXX_COMPILER_ID MATCHES "MSVC")
  if (CMAKE_CXX_COMPILER_ID MATCHES "GNU")
    list(APPEND CUDA_NVCC_FLAGS
-     ${CUDA_ARCH_FLAGS} -std=c++11 -Wno-deprecated-gpu-targets)
+     ${CUDA_ARCH_FLAGS} -std=c++17 -Wno-deprecated-gpu-targets)
-   list(APPEND CUDA_NVCC_FLAGS -Xcompiler -std=c++11 -Xcompiler -fPIC -Xcompiler -Wno-vla)
+   list(APPEND CUDA_NVCC_FLAGS -Xcompiler -std=c++17 -Xcompiler -fPIC -Xcompiler -Wno-vla)
  endif()
 add_library(cuda_target INTERFACE)
```
- SPARSH-AMG
- Ginkgo is a high-performance linear algebra library for manycore systems, with a focus on sparse solution of linear systems. It is implemented using modern C++ (you will need at least a C++14-compliant compiler to build it), with GPU kernels implemented in CUDA and HIP. It also has support for AMG.
- BootCMatchG
- multigrid solver for solving elliptic PDEs using finite differences on a rectangular grid
- Multigrid HowTo (Part I): A simple Multigrid solver in C++ in less than 200 lines of code
- Multigrid HowTo (Part II): An Open Source Algebraic Multigrid Solver in C++
- ExaStencils: Advanced Multigrid Solver Generation
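The links above mix algebraic and geometric multigrid. The sketch below shows the V-cycle structure they share: smooth, restrict the residual, correct from the coarse grid, smooth again. It is a *geometric* multigrid solver in pure Python for the 1D model problem −u″ = f on [0, 1] with zero Dirichlet boundary values. This is a swapped-in simpler technique, not AMG. The following choices are assumptions for illustration:

- weighted Jacobi smoothing (ω = 2/3),
- full-weighting restriction,
- linear-interpolation prolongation,
- a grid with 2^k intervals.

An AMG code such as AMGX or amgcl instead builds the coarse levels and transfer operators from the matrix entries alone.

```python
import math


def residual(u, f, h):
    """r = f - A u, where (A u)_i = (-u[i-1] + 2 u[i] - u[i+1]) / h^2 at interior points."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h)
    return r


def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing; boundary values u[0] and u[-1] are left unchanged."""
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = (1.0 - omega) * u[i] + omega * 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        u = new
    return u


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle for A u = f on a grid with len(u) = 2^k + 1 points."""
    n = len(u)
    if n <= 3:
        # Coarsest grid: at most one interior unknown, solve it exactly
        u = u[:]
        if n == 3:
            u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = jacobi(u, f, h, pre)                          # pre-smoothing
    r = residual(u, f, h)
    nc = (n - 1) // 2 + 1
    rc = [0.0] * nc                                   # full-weighting restriction
    for j in range(1, nc - 1):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h, pre, post)  # coarse-grid error equation
    u = u[:]
    for j in range(nc - 1):                           # linear-interpolation prolongation
        u[2 * j] += ec[j]
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    return jacobi(u, f, h, post)                      # post-smoothing


if __name__ == "__main__":
    n = 64
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]  # exact u = sin(pi x)
    u = [0.0] * (n + 1)
    for k in range(8):
        u = v_cycle(u, f, h)
        print(k, max(abs(v) for v in residual(u, f, h)))
```

The residual should fall by a roughly constant factor per cycle, independent of the grid size. The solution then stalls at the O(h²) discretization error of the finite-difference operator.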
-
Sparse Linear System Solvers on GPUs
-
Freud, a tool to create Performance Annotations for C/C++ programs
-
RAPIDS - Open GPU Data Science
- RAFT: Reusable Accelerated Functions and Tools
- cuDF - GPU DataFrames
tl;dr
cd cpp && mkdir -p build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DOPENSSL_INCLUDE_DIR=/usr/include/openssl -DOPENSSL_CRYPTO_LIBRARY=/usr/lib/libcrypto.so -DOPENSSL_SSL_LIBRARY=/usr/lib/libssl.so
- cuSpatial - GPU-Accelerated Spatial and Trajectory Data Management and Analytics Library
-
CUDA rehab & NVidia docs
-
Course on CUDA Programming on NVIDIA GPUs, July 22-26, 2019
- Lecture 3: control flow and synchronisation: Warp divergence
- Lecture 5: libraries and tools
- Maximizing Unified Memory Performance in CUDA
- CUDA OPTIMIZATION TIPS, TRICKS AND TECHNIQUES Stephen Jones, GTC 2017
- HIGH THROUGHPUT WITH GPUS
- Small tips of optimizing CUDA programs
- Error using `__ldg` in CUDA kernel at compile time tl;dr
nvcc -arch=sm_35 ...
-
Matrix multiplication in cuSparse (cusparseDcsrgemm) outputs wrong results
-
Parallel Computing Using the MPI, OpenMP and OpenACC Standards
-
Memory Model
-
LPC2018 - Open Source GPU compute stack - Not dancing the CUDA dance
-
OpenCL
- OpenCL 3.0 Specification Released With New Khronos Open-Source OpenCL SDK
- The State of OpenCL for Scientific Computing in 2018
- OpenCL: History & Future
- Tuned OpenCL BLAS
- OpenCL vloadn
- Could not find a package configuration file provided by "OpenCLHeaders"
- Using OpenCL on Adreno & Mali GPUs is slower than CPU
- Zero-copy buffer allocation on Arm Mali Midgard GPUs?
- SYCL - C++ Single-source Heterogeneous Programming for OpenCL
-
OneAPI
-
Kompute
-
HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
- Why did AMD open source ROCm’s OpenCL driver-stack?
- wiki for HCC
- github HCC repository
- Portable Computing Language
- A collection of Arch Linux PKGBUILDs for the ROCm platform tl;dr
yay -S rocm-opencl-runtime
-
- clinfo ERROR: clBuildProgram(-11)
- rock-dkms kernel vs mainline clarification
- Error during installation of rock-dkms 4.0 on 5.4 kernel
- DKMS build on unsupported and supported kernels fails with errors
- ROCm support in upstream Linux kernels
- Information for rock-dkms
- Radeon ROCm 4.1 Released - Still Without RDNA GPU Support
- ROCm 4.1 - Vega 20 (Radeon VII) with upstream amdgpu
- AMD dkms fails
```
dkms install --no-depmod -m amdgpu-4.0 -v 23 -k 5.11.16-arch1-1
Error! Bad return status for module build on kernel: 5.11.16-arch1-1 (x86_64)
Consult /var/lib/dkms/amdgpu-4.0/23/build/make.log for more information.
==> Warning, `dkms install --no-depmod -m amdgpu-4.0 -v 23 -k 5.11.16-arch1-1' returned 10
pacman -Qo /usr/src/amdgpu-4.0-23
/usr/src/amdgpu-4.0-23/ is owned by rock-dkms-bin 4.0-3
/usr/src/amdgpu-4.0-23/ is owned by rock-dkms-firmware-bin 4.0-3
```
-
OpenCL => Vulkan
-
OpenMP
-
OpenACC
-
MATOG - GPU Access Auto Tuning
-
LCSE - Linked Cluster Series Expansions - a framework for high-temperature series expansions
-
Apache Arrow
-
Sandia
- Trilinos is a collection of open-source software libraries, called packages, intended to be used as building blocks for the development of scientific applications.
- GitHub repo for Trilinos
tl;dr
```
$ yay -s trilinos
3 aur/trilinos 12.14.1-2 (+0 0.00%)
    algorithms for the solution of large-scale scientific problems
2 aur/mingw-w64-trilinos 12.12.1-1 (+0 0.00%)
    Framework for the solution of large-scale, complex multi-physics engineering and scientific problems (mingw-w64)
1 aur/trilinos-git 12.12.0.gd3b096f4f1-1 (+1 0.00%) (Out-of-date 2019-06-21)
    An effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems.
```
-
ARM
-
CPU, GPU & DRAM Architecture Simulators
- A Survey of CPU-GPU Heterogeneous Computing Techniques
- Hybrid Implementation of the MST Algorithm Using CPU and GPU
- Understanding Shared Memory Bank Conflicts in NVIDIA CUDA
- Vulkan: The next Khronos graphics API… that is not OpenGL
- AMD supported project: HIP : Convert CUDA to Portable C++ Code
-
CARP: Correct and Efficient Accelerator Programming
- CARP dissemination
- PENCIL: a C99-based intermediate language for compute & optimization
- see also PPCG (below)
-
Framework for performance-portable parallel computations on unstructured meshes
- OP2: Developing an open-source framework for the execution of unstructured grid applications
- Optimising Unstructured Mesh Computational Fluid Dynamics Applications on Multicores via Machine Learning and Code Transformation
- Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs
-
ROSE compiler + Mint for C-to-CUDA code generation
-
Nested Data Parallelism, Haskell, and friends
- Nested Data Parallelism on GPU
- Compiling a high-level language for GPUs: (via language support for architectures and compilers)
- NOVA: A Functional Language for Data Parallelism
- CuNesl: Compiling Nested Data-Parallel Languages for ...
- A Haskell EDSL for Nested Data-parallel Design-space ...
- Functional programming for nested data parallelism on GPUs
- Platform-Specific Optimization and Mapping of Stencil Codes through Refinement
- High-Performance Domain-Specific Languages for GPU Computing
- Monoids and their efficiency in practice
-
CUDA kernels generation using C++ expression templates technique
- CU++ -- an interesting approach
- VexCL is a C++ vector expression template library for OpenCL/CUDA
-
AnyDSL - A Framework for Rapid Development of Domain-Specific Libraries; thorin (The Higher-ORder INtermediate representation) / impala (An imperative and functional programming language)
-
A fast, ergonomic and portable tensor library with a deep learning focus
-
A curated list of awesome Nim frameworks, libraries and software
-
tl;dr
nimble refresh
nimble install neo
nimble install Arraymancer
-
Overview of the Efficient Programming Languages (v.3) 2018.4
-
Intel Level Zero
-
Code Generation for High Performance PDE Solvers on Modern Architectures
-
GPU roofline model
- Elias Konstantinidis publications
- Analysis-Driven Optimization: Preparing for Analysis with NVIDIA Nsight Compute, Part 1
- GPU Performance Analysis
- Roofline and NVIDIA Ampere GPU Architecture Analysis
- Nsight Compute Feature Spotlight: Roofline Analysis, Asynchronous Copy, Sparse Data Compression
- Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems
- Roofline Hackathon 2020 part 1 and 2
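The roofline model covered in these links bounds a kernel's attainable throughput by min(peak compute, arithmetic intensity × peak memory bandwidth). Below is a minimal sketch in Python. The hardware figures in the example (10 TFLOP/s, 1 TB/s) are hypothetical round numbers, not the specs of any particular GPU. The DAXPY byte count assumes double precision and no cache reuse.

```python
def attainable_flops(peak_flops, peak_bandwidth, intensity):
    """Roofline bound in FLOP/s.

    peak_flops     -- peak compute throughput, FLOP/s
    peak_bandwidth -- peak DRAM bandwidth, bytes/s
    intensity      -- arithmetic intensity of the kernel, FLOP/byte
    """
    return min(peak_flops, intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Intensity (FLOP/byte) where the memory roof meets the compute roof."""
    return peak_flops / peak_bandwidth


if __name__ == "__main__":
    PEAK = 10e12  # hypothetical 10 TFLOP/s FP64 peak
    BW = 1e12     # hypothetical 1 TB/s memory bandwidth
    # DAXPY y = a*x + y: 2 FLOP per element; reads x and y, writes y -> 3 * 8 bytes
    daxpy_ai = 2 / 24
    print(f"ridge point: {ridge_point(PEAK, BW):.1f} FLOP/byte")
    print(f"DAXPY bound: {attainable_flops(PEAK, BW, daxpy_ai) / 1e9:.1f} GFLOP/s")
```

Any kernel whose intensity lies left of the ridge point, like DAXPY here, is memory bound. Optimizing it means reducing bytes moved rather than FLOPs.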
-
YouTube videos on GPU embedded profiling/optimization
-
RICOS Co. Ltd. (Research Institute for Computational Science Co. Ltd.)