Fix AVX2 int4pack_mm_kernel crash if weights are unaligned (pytorch#124433)

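Follow-up to pytorch#124128.
`s/_mm_load_si128/_mm_loadu_si128/`

`_mm_load_si128` requires a 16-byte-aligned address and may fault when `B + k * ldb` is not aligned; `_mm_loadu_si128` has no alignment requirement.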

Pull Request resolved: pytorch#124433
Approved by: https://github.com/desertfire
malfet authored and pytorchmergebot committed Apr 19, 2024
1 parent a6f044a commit b2f6cfd
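As a worked example of the alignment condition behind the crash: each row of packed weights starts at `B + k * ldb`, and `_mm_load_si128` is only valid when that address is a multiple of 16. The constexpr helper `row_misalignment` below is hypothetical and not part of the PyTorch source. It assumes `ldb` is a row stride in bytes, and the addresses in the checks are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: byte offset of row k past the previous 16-byte
// boundary, assuming B_addr is the address of the packed int4 weights and
// ldb is the row stride in bytes. _mm_load_si128 is only valid when this
// is 0; _mm_loadu_si128 accepts any offset.
constexpr std::uintptr_t row_misalignment(std::uintptr_t B_addr, int k, int ldb) {
  return (B_addr + static_cast<std::uintptr_t>(k) * static_cast<std::uintptr_t>(ldb)) % 16;
}

// Made-up addresses: with an aligned base and a 16-byte stride every row is
// aligned, but a base that is 4 bytes off keeps every row 4 bytes off.
static_assert(row_misalignment(0x1000, 0, 16) == 0, "aligned base, row 0");
static_assert(row_misalignment(0x1000, 5, 16) == 0, "aligned base, row 5");
static_assert(row_misalignment(0x1004, 0, 16) == 4, "misaligned base, row 0");
static_assert(row_misalignment(0x1004, 5, 16) == 4, "misaligned base, row 5");
```

In other words, once the weight buffer itself starts off a 16-byte boundary, no row in the loop can satisfy the aligned load's precondition.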
Showing 1 changed file with 1 addition and 1 deletion.
aten/src/ATen/native/cpu/int4mm_kernel.cpp
```diff
@@ -280,7 +280,7 @@ inline void tinygemm_kernel(
     // when BLOCK_N = 32, handle each row at a time
     if constexpr (col == 0) {
       __m256i mask = _mm256_set1_epi32(0xF);
-      __m128i b4 = _mm_load_si128((__m128i*)(B + k * ldb));
+      __m128i b4 = _mm_loadu_si128((__m128i*)(B + k * ldb));
       if (k + PREFETCH_SIZE_K < K) {
         _mm_prefetch(B + (k + PREFETCH_SIZE_K) * ldb, _MM_HINT_T0);
       }
```
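A minimal sketch of what the one-line change buys, assuming an x86-64 target with SSE2, that `B` is a byte pointer to packed int4 weights, and that `ldb` is a row stride in bytes (16 bytes for one 32-wide row). The names `is_aligned`, `load_row`, `to_bytes`, `kRows`, `kLdb`, and the two check functions are hypothetical and do not come from the PyTorch source. The first check reads rows from a buffer deliberately offset one byte past a 16-byte boundary (the case in which `_mm_load_si128` may fault) and compares `_mm_loadu_si128` against `memcpy`; the second confirms that on aligned rows the old and new loads return the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics; baseline on x86-64

// True when p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper mirroring the patched line: read the 16 bytes that hold
// one row of 32 packed int4 weights. Unlike _mm_load_si128, _mm_loadu_si128
// places no alignment requirement on its address.
inline __m128i load_row(const std::uint8_t* B, int k, int ldb) {
  return _mm_loadu_si128(reinterpret_cast<const __m128i*>(B + k * ldb));
}

// Copies a vector back to memory so its bytes can be compared.
inline void to_bytes(__m128i v, std::uint8_t out[16]) {
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}

constexpr int kRows = 4;
constexpr int kLdb = 16;  // assumed row stride in bytes (32 int4 values)

// Rows start one byte past a 16-byte boundary; each unaligned load must
// still return the same bytes that memcpy reads from that address.
inline bool unaligned_rows_load_correctly() {
  alignas(16) std::uint8_t storage[kRows * kLdb + 1];
  for (int i = 0; i < kRows * kLdb + 1; ++i) {
    storage[i] = static_cast<std::uint8_t>(i * 7);
  }
  const std::uint8_t* B = storage + 1;  // deliberately misaligned base
  for (int k = 0; k < kRows; ++k) {
    assert(!is_aligned(B + k * kLdb, 16));  // every row start is misaligned
    std::uint8_t got[16], want[16];
    to_bytes(load_row(B, k, kLdb), got);
    std::memcpy(want, B + k * kLdb, sizeof want);
    if (std::memcmp(got, want, sizeof got) != 0) return false;
  }
  return true;
}

// On 16-byte-aligned rows the aligned and unaligned loads return identical
// data, so swapping them does not change results for inputs that already worked.
inline bool aligned_rows_agree() {
  alignas(16) std::uint8_t storage[kRows * kLdb];
  for (int i = 0; i < kRows * kLdb; ++i) {
    storage[i] = static_cast<std::uint8_t>(255 - i);
  }
  for (int k = 0; k < kRows; ++k) {
    const std::uint8_t* row = storage + k * kLdb;
    std::uint8_t a[16], u[16];
    to_bytes(_mm_load_si128(reinterpret_cast<const __m128i*>(row)), a);
    to_bytes(load_row(storage, k, kLdb), u);
    if (std::memcmp(a, u, sizeof a) != 0) return false;
  }
  return true;
}
```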
