Skip to content

Commit

Permalink
base: enable ETC__CPU_ARCH__X86_64_V2 on MSVC
Browse files Browse the repository at this point in the history
If I understand the #148
discussion correctly, a user overrode WUFFS_BASE__CPU_ARCH__X86_FAMILY
(without also providing /arch:AVX) to enable SSE4.2 code (even though,
technically, x86_64 doesn't guarantee SSE4.2, only SSE2). This seemed to
work fine in practice, but was unsupported by Wuffs and 'broke' by
fixing #145, now looking for WUFFS_BASE__CPU_ARCH__X86_64 instead of
WUFFS_BASE__CPU_ARCH__X86_FAMILY.

WUFFS_BASE__CPU_ARCH__X86_FAMILY has since been renamed to
WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_FAMILY and then removed entirely.

As SSE4.2 (roughly equivalent to x86-64-v2) seems to work fine at
compile time (with the existing cpuid detection at runtime), enable it
by default for MSVC, without need /arch.

Enabling x86-64-v3 too, by default, is held back for now, pending
further confirmation from MSVC users. The #148 user was exercising
Wuffs' PNG codec, which uses SSE4.2 but not AVX or AVX2. Currently, only
Wuffs' JPEG codec (and YCbCr conversion) uses AVX or AVX2.

GCC and Clang don't need this fiddliness, because they support
"__attribute__((target(arg)))".

Updates #148
  • Loading branch information
nigeltao committed Jul 9, 2024
1 parent 7a57e2b commit b64a761
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 10 deletions.
10 changes: 5 additions & 5 deletions internal/cgen/base/fundamental-public.h
Original file line number Diff line number Diff line change
Expand Up @@ -119,11 +119,13 @@
#elif defined(_MSC_VER) // (#if-chain ref AVOID_CPU_ARCH_1)

#if defined(_M_X64)
#if defined(__AVX2__) || defined(__clang__)

// We need <intrin.h> for the __cpuid function.
#include <intrin.h>
// That's not enough for X64 SIMD, with clang-cl, if we want to use
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
#if defined(__AVX2__) || defined(__clang__)

// intrin.h isn't enough for X64 SIMD, with clang-cl, if we want to use
// "__attribute__((target(arg)))" without e.g. "/arch:AVX".
//
// Some web pages suggest that <immintrin.h> is all you need, as it pulls in
Expand All @@ -132,8 +134,6 @@
#include <immintrin.h> // AVX, AVX2, FMA, POPCNT
#include <nmmintrin.h> // SSE4.2
#include <wmmintrin.h> // AES, PCLMUL
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V3

#else // defined(__AVX2__) || defined(__clang__)
Expand Down
10 changes: 5 additions & 5 deletions release/c/wuffs-unsupported-snapshot.c
Original file line number Diff line number Diff line change
Expand Up @@ -175,11 +175,13 @@ extern "C" {
#elif defined(_MSC_VER) // (#if-chain ref AVOID_CPU_ARCH_1)

#if defined(_M_X64)
#if defined(__AVX2__) || defined(__clang__)

// We need <intrin.h> for the __cpuid function.
#include <intrin.h>
// That's not enough for X64 SIMD, with clang-cl, if we want to use
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
#if defined(__AVX2__) || defined(__clang__)

// intrin.h isn't enough for X64 SIMD, with clang-cl, if we want to use
// "__attribute__((target(arg)))" without e.g. "/arch:AVX".
//
// Some web pages suggest that <immintrin.h> is all you need, as it pulls in
Expand All @@ -188,8 +190,6 @@ extern "C" {
#include <immintrin.h> // AVX, AVX2, FMA, POPCNT
#include <nmmintrin.h> // SSE4.2
#include <wmmintrin.h> // AES, PCLMUL
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V3

#else // defined(__AVX2__) || defined(__clang__)
Expand Down

0 comments on commit b64a761

Please sign in to comment.