ChangeLog
commit 700745cdbb34e964e1abda86183809fd8dd95796
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 08:00:45 2018 -0400
Bump FFTW_MINOR_VERSION for fftw-3.3.8
commit 902d0982522cdf6f0acd60f01f59203824e8e6f3
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 07:43:02 2018 -0400
update NEWS
commit 41b0d9eff394891ba3327b9062811d48677bb411
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 07:35:36 2018 -0400
CFLAGS: don't use -ffast-math
-ffast-math is a relic from 1999 when it was kind of necessary for
full use of FMA on powerpc. Nowadays it is just a liability. For
example, 'gcc-8 -ffast-math' ignores the distinction between +0 and
-0, thus breaking the avx and avx2 implementations in fftw-3.3.7.
commit 19eeeca592f63413698f23dd02b9961f22581803
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 07:29:00 2018 -0400
Fixes for gcc-8
It looks like 'gcc-8 -ffast-math' does not honor the distinction between
+0.0 and -0.0 in floating-point constants. I suppose that technically
-ffast-math has the right to do so.
For good measure, this patch encodes such constants as their explicit
binary representation. A separate patch will disable -ffast-math.
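As a rough illustration of what "explicit binary representation" can mean
(a sketch only, not the actual FFTW patch; the helper names
neg_zero_from_bits and sign_bit_set are invented for this example, and
IEEE 754 binary64 doubles are assumed), a -0.0 constant can be built from
its bit pattern instead of being written as a floating-point literal:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build -0.0 from its IEEE 754 binary64 bit pattern
   (only the sign bit set), so the compiler never sees a "-0.0" literal
   that it might fold to +0.0. Assumes double is IEEE 754 binary64. */
static double neg_zero_from_bits(void)
{
    const uint64_t bits = UINT64_C(0x8000000000000000);
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);   /* copy the raw bits into the double */
    return d;
}

/* Hypothetical helper: report whether the sign bit of x is set, by
   inspecting the bits rather than comparing against 0.0. */
static int sign_bit_set(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 63);
}
```

The sign is read back from the bits because -0.0 and +0.0 compare equal,
so an ordinary comparison cannot tell them apart.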
commit bf478afbf2367df0f38c77f31d1f912aeeb82585
Author: Miklos Espak <[email protected]>
Date: Thu Apr 26 18:31:57 2018 +0100
Define include directory for installed targets (#141)
commit ab888adf510338c03ea8ac49b4aab91fb57f1479
Author: Steven G. Johnson <[email protected]>
Date: Sat Apr 14 11:40:39 2018 -0400
don't need both identifier and name fields
commit 2b999c600c58c78b8acb78c3352b02d9df6f6e60
Author: Steven G. Johnson <[email protected]>
Date: Fri Apr 13 08:43:35 2018 -0400
JSON doesn't like trailing commas
commit 92eee8bbc4252c871aa870d2dce88eb98d0c7d18
Author: Steven G. Johnson <[email protected]>
Date: Fri Apr 13 08:38:50 2018 -0400
list both C and OCaml (as explained in codemeta/codemeta#181)
commit 35e5609f17e212bf1c40da9b2ebe66784ad37052
Author: Steven G. Johnson <[email protected]>
Date: Thu Apr 12 12:01:15 2018 -0400
add codemeta file
commit eba07c46b5d2f7824d293ab59aa5c29a25034963
Author: Matteo Frigo <[email protected]>
Date: Mon Feb 19 09:30:29 2018 -0500
Call _mm256_zeroupper() when leaving avx512 code
Carsten Steger says:
simd-avx512.h defines VLEAVE as nothing in FFTW 3.3.7. However, the
current Intel® 64 and IA-32 Architectures Optimization Reference Manual,
chapter 15.18, recommends the following:
- When you have to mix group B instructions with Intel SSE instructions,
or you suspect that such a mixture might occur, use the VZEROUPPER
instruction whenever a transition is expected.
- Add VZEROUPPER after group B instructions were executed and before any
function call that might lead to Intel SSE instruction execution.
- Add VZEROUPPER at the end of any function that uses group B instructions.
- Add VZEROUPPER before thread creation if not already in a clean state
so that the thread does not inherit Dirty Upper State.
(Group B are instruction types that modify bits 128-511 of vector
registers 0-15.)
Therefore, I believe it would be prudent to define VLEAVE as
_mm256_zeroupper in simd-avx512.h (see the attached patch).
At https://software.intel.com/en-us/forums/intel-isa-extensions/topic/704023
Mark Charney says:
To be clear, we very much still recommend using VZEROUPPER on
Skylake. Even though it does not have the same penalties as earlier
designs in that family for mixing AVX and SSE code, we definitely
recommend using VZEROUPPER on Skylake.
Yes it would obviously be better if there were one solution. For
code that has to run on both families, the "common code" solution
is to use the Xeon guidelines.
If Mark Charney recommends VZEROUPPER, that's good enough for me.
commit b267008613d082975b108252ed596ba0916ffa31
Author: Matteo Frigo <[email protected]>
Date: Wed Nov 22 12:54:18 2017 -0500
fftw3-mpi.f03 should be regenerated when Makefile changes
commit 708b202fd593cf1002cf97dce0863e2a438e3720
Merge: 2e0cfdda 8ba34c40
Author: Matteo Frigo <[email protected]>
Date: Mon Nov 20 09:37:17 2017 -0500
Merge pull request #113 from xantares/mingw
CMake enhancements
commit 2e0cfddacacccc8a1e6e679c5e3fa81fb0219bda
Author: Matteo Frigo <[email protected]>
Date: Mon Nov 20 07:07:30 2017 -0500
Attempt to strengthen language in README.md
commit 8ba34c40fef38f661c9c413781990a7c021ba22b
Author: Michel Zou <[email protected]>
Date: Thu Nov 9 22:33:51 2017 +0100
Preliminary Fortran support
commit bd753a7679ecca2799640e7c8ced6f1f784f1b51
Author: Michel Zou <[email protected]>
Date: Mon Nov 6 23:00:29 2017 +0100
CMake MinGW fixes
Mostly fixes the SSE2 macro in config.h, otherwise minor detection fixes
commit da5372a175bcb09578359960869c76da74c9fda3
Author: Matteo Frigo <[email protected]>
Date: Tue Oct 31 20:21:17 2017 -0400
EXTRA_DIST += README-perfcnt.md
commit 1b64d9269254e9d0a0f0b088e5eceb0db92d531f
Merge: b5ccc557 2be183c3
Author: Matteo Frigo <[email protected]>
Date: Tue Oct 31 20:19:13 2017 -0400
Merge pull request #112 from alexeicolin/PR--armv7-pmccntr-counter-and-docs
Pr armv7 pmccntr counter and docs
commit 2be183c3a44d58aaa11909ba8882310fb44d598c
Author: Alexei Colin <[email protected]>
Date: Tue Oct 31 23:34:38 2017 +0000
perf counters: name ARMv8 PMCCNTR_EL0 explicitly
For consistency with the rest.
commit 504ece7f8ffc60c2a03b28d977e9825230052d48
Author: Alexei Colin <[email protected]>
Date: Tue Oct 31 23:28:48 2017 +0000
perf counters: add PMCCNTR for ARMv7 and add docs
The existing armv7 counter (CNTVCT) does need enabling from kernel mode (so
updated the configure help), and the enable bit is different from the PMU
enable bit (described in the new docs).
Tested on XU4: printed the returned counter values and they look reasonable.
commit b5ccc557fd2e57bfc955f0db9b5182e92f9cb55c
Author: Matteo Frigo <[email protected]>
Date: Sun Oct 29 08:13:04 2017 -0400
fftw-mpi.h should include <fftw3.h>, not "fftw3.h"
commit 9e3f8da20e65f1e34e677768e550086b06d77f16
Author: Matteo Frigo <[email protected]>
Date: Sun Oct 29 08:09:35 2017 -0400
NEWS: warn that cmake support is experimental and not well tested
commit 9616fb9ff1c2694f5cfa2c4a59efa96094ae6812
Author: Matteo Frigo <[email protected]>
Date: Sun Oct 29 07:48:43 2017 -0400
Update NEWS for upcoming fftw-3.3.7
commit 62edb203fc09c8c8ac2c2d5ac3299ea8d4dc7838
Author: Matteo Frigo <[email protected]>
Date: Tue Oct 10 18:58:37 2017 -0400
Ditch --enable-debug-malloc and --enable-debug-alignment
We wrote DEBUG_MALLOC in 1997 to debug memory leaks. Nowadays
DEBUG_MALLOC is just confusing. Better tools are available, and
DEBUG_MALLOC is not thread-safe and it does not respect SIMD
alignment. It confused at least one user.
In the gcc-2.SOMETHING days, gcc would allocate doubles on the stack
at 4-byte boundary (vs. 8) reducing performance by a factor of 3.
That's when we introduced --enable-debug-alignment, which is totally
obsolete by now.
commit 6ed4297e85e5ef24a18ce428b18e020d8e48413a
Author: Matteo Frigo <[email protected]>
Date: Fri Sep 29 19:27:43 2017 -0400
Use armv7a cycle counter unconditionally if HAVE_ARMV7A_CNTVCT
It looks like __ARM_ARCH_7A__ is not always defined. If the
user says HAVE_ARMV7A_CNTVCT, trust the user.
commit 2dd77382319ceb99c32b38418716783eec8adad4
Merge: 04590cb1 e09ab8ca
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 22:42:38 2017 -0400
Merge pull request #110 from junghans/cmake
Minor cmake fixes
commit e09ab8cac98c0f206968bbd962a6f76cf26e7437
Merge: 890dac59 76427f30
Author: Christoph Junghans <[email protected]>
Date: Thu Sep 21 16:13:43 2017 -0600
Merge commit 'refs/pull/109/head' of github.com:FFTW/fftw3 into cmake
commit 04590cb11baa11bbfdebe101fa90186bbf48423c
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 18:00:58 2017 -0400
simd-vsx.h: don't use vpermxor
It seems like gcc-6 generates incorrect code when using vpermxor
(tested with qemu emulator, so there is a chance that gcc is right and
qemu is wrong). Disable the use of vpermxor and do the simple thing
(one multiplication + one permutation).
commit 76427f30080e2cab3ca5047193ce8ffe6110f047
Author: Michel Zou <[email protected]>
Date: Thu Sep 21 23:44:15 2017 +0200
No need to list includes
commit e47e9a81c41454e5e128cd68505b38152ad60500
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 17:13:14 2017 -0400
Remove AC_FUNC_{MALLOC,REALLOC,MMAP}
They don't do what I thought. E.g., AC_FUNC_MALLOC checks that
malloc(0) returns non-NULL, and defines malloc to be rpl_malloc otherwise.
We don't support rpl_malloc() and we don't care about malloc(0).
commit 5aebc02ff30af12d2dc3be6c762e821a38f56595
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 10:09:02 2017 -0400
Dead-Code Police
commit d97394a17250d71d6a722ae64dcc3123130cf08f
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 09:54:36 2017 -0400
Fixup fftw3-mpi.h
fftw3-mpi.h must include "fftw3.h", not "api/fftw3.h", because both
fftw3-mpi.h and fftw3.h will ultimately be installed in /usr/include.
Thus, as a special exception, mpi/Makefile.am must specify the include
path -I $(top_srcdir)/api.
commit 890dac59aca4c153e7e22add0a8de00766227670
Merge: 4ebda892 106582aa
Author: Christoph Junghans <[email protected]>
Date: Wed Sep 20 14:44:04 2017 -0600
Merge commit 'refs/pull/109/head' of github.com:FFTW/fftw3 into cmake
commit 4ebda89297b6b38632c3d91bd5a673a1bee4ffff
Author: Christoph Junghans <[email protected]>
Date: Wed Sep 20 14:05:13 2017 -0600
autotools: fix install of FFTW3ConfigVersion.cmake
commit e9a66d5f748037f9cb9c0f5b8d824d73c0425042
Author: Christoph Junghans <[email protected]>
Date: Wed Sep 20 13:29:29 2017 -0600
cmake: use GNUInstallDirs
commit 4fbb72ad294e2070d64a83b24f89a601d4f624c6
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 13:11:55 2017 -0400
Generate codlist.c only when MAINTAINER_MODE
The user is not supposed to regenerate .c files. In addition, the
generation rule is subtly nonportable (it depends on whether or not
'#' can be escaped in Makefiles, an issue that does not appear
settled.)
commit f243f8ce48be61952527d43da222096296fdd2f9
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 11:54:13 2017 -0400
Generate {dft,rdft}/simd/{sse,sse2,avx,...}/*.c only when MAINTAINER_MODE
Users are not supposed to generate them. Apart from that, the
generation rule uses '$*' in an explicit make rule, which is
technically a GNU extension. (Works with {open,free}bsd, but breaks
Solaris.)
commit 106582aa8f97257f53730cbac81f98e8659b084c
Author: Michel Zou <[email protected]>
Date: Wed Sep 20 15:46:51 2017 +0200
Fix includes, export target
commit 1a24e67165ba56447f814bcdc12b9d6e083f1670
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 07:24:58 2017 -0400
Restore the ability to build out of tree.
Before 1f3704b9, we had "-I $(top_srcdir)/foo -I $(top_srcdir)/bar".
After 1f3704b9, we had no -I specification at all, but automake wants
an explicit -I $(top_srcdir) in order to build out of tree.
commit 919b795940d1e86a948a4430193dbd0853f47272
Merge: 6076339a f7a64365
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 06:41:50 2017 -0400
Merge pull request #107 from xantares/config-mode
Config mode
commit f7a6436509d324297783eb77df54010320b062f8
Author: Michel Zou <[email protected]>
Date: Wed Sep 20 11:46:05 2017 +0200
Build bench according to BUILD_TESTS
commit 82cec28b7e14280ad11878978e23a3680bb0e983
Author: Michel Zou <[email protected]>
Date: Wed Sep 20 11:41:20 2017 +0200
Use cmake config mode
Installs FFTW3Config.cmake instead of a FindFFTW3.cmake
Also configures the pkgconfig file from cmake
commit 6076339a342b12b0d0cfd9f6d967bfa9fbf6b1b2
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 23:38:27 2017 -0400
Fix performance regression with gcc-3.3
commit f4c37657cb32b2552c5e86f0540c0308d4f451ef
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 23:24:08 2017 -0400
get rid of the sse2-nonportable.c hack
It was necessary to support some broken compiler 15 years ago.
Remove it and see if anybody complains.
commit 362ae5c7b8a9df76b5ec0de4433131db33bae0ae
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 21:44:13 2017 -0400
configure.ac Police
Remove some obsolete AC_CHECK_HEADERS, add new checks suggested by
autoscan.
commit a56b5b4b149e56fce43778172a56f77d30352833
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 21:43:45 2017 -0400
Include Police
fftw-wisdom.c was including <fftw3.h> instead of "api/fftw3.h"
commit 1f3704b9eff4b7e80ef7d775fb13f5bb8de0a5f1
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 21:12:22 2017 -0400
Do not set include path ("-I") in Makefile.am
.[ch] files should specify their own paths explicitly. Setting paths
in the Makefile was always a bad idea, but it is totally untenable if
we are supporting cmake.
commit 6e0ae04bad14a7dd9b4928f22d7a01e887dfdc03
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 19:31:55 2017 -0400
Fix OpenBSD build
Using $< in a non-suffix rule context is a GNUmake idiom and OpenBSD
doesn't like it.
commit 31a53789197f90d6bf349dd230ab86023e5fb83c
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 19:24:34 2017 -0400
EXTRA_DIST += FindFFTW3.cmake.in
commit ae1a764ce88166e8e1f05a25888f105ec8f1939d
Merge: 5fdca1d9 97b273d8
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 17:13:58 2017 -0400
Merge pull request #69 from junghans/cmake
Build and install cmake module
commit 5fdca1d9b0a0b2e6491c98f63873dcf600355e09
Merge: b521e530 66506470
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:57:59 2017 -0400
Merge pull request #92 from tklauser/armv7a-cycle-counter
Fix ARMV7-A cycle counter detection
commit b521e5305a7317c1c0f1d454beb6580eaf4de1db
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:51:03 2017 -0400
cmake: don't check for dlfcn.h
We don't use it
commit fc852fcdfa80fab30eac2284249686853efa2e4b
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:43:02 2017 -0400
Remove ancient paranoia
In the '90s we used to run autoconf three times, just in case
(because it really didn't work the first time). "Three" was modeled
after the "sync; sync; sync; reboot" incantation of the '80s.
Hopefully we are past this by now.
commit 34738e7f669882c6abc12c2744c8acc347c91719
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:32:39 2017 -0400
Flip boolean in a way that makes more sense to me
commit a2bfd859d9ad08490d02252d8a80c5994dd82747
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:28:56 2017 -0400
Various CMakeLists.txt fixes
* AVX2 codelets require -mfma
* --enable-avx2 automatically enables the 128-bit avx2 codelets in
*dft/simd/avx2-128
* bump FFTW_VERSION to 3.3.7, SOVERSION to 3.5.7
* build bench always, irrespective of Threads_FOUND
commit 93ac6e1075e73c0275a9e0006fe9161c3b6fae38
Merge: a71f3dd3 d3a8d13f
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 14:31:03 2017 -0400
Merge pull request #103 from xantares/cmake
Add user cmake support
Still needs work, but let's move forward and move this contribution into the official repository
commit d3a8d13f74361a7ffc4c48c229181a86b35e9a7d
Author: Michel Zou <[email protected]>
Date: Tue Jul 18 12:16:43 2017 +0200
Add user cmake infrastructure
commit a71f3dd355f802dc362a52674a977ff81daadf9d
Author: Matteo Frigo <[email protected]>
Date: Wed Jul 5 06:33:40 2017 -0400
Disable ISA_EXTENSION_PREFERS_FMA for now
I still don't understand whether or not avx2 should use FMA codelets.
Ryzen is faster with the non-FMA version. Haswell prefers the FMA
version.
However, I suspect that Haswell prefers FMA because of a quirk of the
micro-architecture. Haswell has two floating-point "ports". You can
issue an addition only through one "port", but you can issue two FMA
in parallel on both ports, so FMA appears to be faster. Skylake
apparently restores balance (but I haven't tried yet). Suspend
judgment for now until I gather more data.
commit f82b8c94596868897987b71a648eaa664590602a
Author: Matteo Frigo <[email protected]>
Date: Tue Jul 4 20:06:57 2017 -0400
Rationalize HAVE_FMA
Distinguish ARCH_PREFERS_FMA, for architectures that "naturally"
prefer FMA (e.g., powerpc), from ISA_EXTENSION_PREFERS_FMA, for
instruction-set extensions that favor FMA where the base architecture
does not (e.g., avx2 on x86).
Previously, --enable-avx2 would use FMA code for scalar and avx
codelets, which is wrong.
This change improves performance by a few percent on Ryzen (where FMA
doesn't really do anything), and is a wash on Haswell.
commit 0869f4e51b8b0aeb7da1b21b2683c30cd4e10a5e
Author: Steven G. Johnson <[email protected]>
Date: Tue May 9 09:14:37 2017 -0400
document that howmany ≥ 0 (closes #95)
commit 665064700b26c01c0836e4c12a5ee0eab3923858
Author: Tobias Klauser <[email protected]>
Date: Wed Mar 29 16:15:45 2017 +0200
Fix ARMV7-A cycle counter detection
Check for the correct pre-processor define HAVE_ARMV7A_CNTVCT from
config.h (instead of ARMV7A_HAS_CNTVCT) to fix the detection of the
cycle counter for ARMv7-A in the configure script (and actually use it
in the built library).
Without this fix, even the following ./configure call:
./configure --enable-neon --enable-single --enable-armv7a-cntvct \
--host=arm-linux-gnueabihf --disable-fortran \
CC="arm-linux-gnueabihf-gcc -march=armv7-a"
will emit the warning:
checking whether a cycle counter is available... no
***************************************************************
WARNING: No cycle counter found. FFTW will use ESTIMATE mode
for all plans. See the manual for more information.
***************************************************************
With this fix applied, ./configure will correctly detect the cycle
counter register:
...
checking whether a cycle counter is available... yes
...
commit cc5fc8ce7ffd77f467740554f649aab4d3f71344
Merge: 102f2fd0 950b1539
Author: Matteo Frigo <[email protected]>
Date: Tue Mar 14 07:21:45 2017 -0400
Merge pull request #91 from fornwall/android-clock-gettime
Avoid trying to use CLOCK_SGI_CYCLE on Android
commit 950b153910f7f0dde9cc20cddeee5dc9048d25b7
Author: Fredrik Fornwall <[email protected]>
Date: Mon Mar 13 23:41:35 2017 +0100
Avoid trying to use CLOCK_SGI_CYCLE on Android
The Android headers define CLOCK_SGI_CYCLE but the call fails at
runtime as it's not implemented. Combined with getticks() not
checking the return value of clock_gettime() this causes bogus
values to be returned from getticks().
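The failure mode described here can be guarded against by checking the
return value of clock_gettime(). The following is a minimal sketch, not
FFTW's getticks(); the wrapper name read_clock_ns and the nanosecond
return convention are assumptions made for the example:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical wrapper: read a clock and report failure instead of
   returning whatever happens to be left in the timespec. */
static int read_clock_ns(clockid_t clk, long long *out_ns)
{
    struct timespec ts;
    assert(out_ns != NULL);
    if (clock_gettime(clk, &ts) != 0)
        return -1;                 /* clock is defined but not usable */
    *out_ns = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}
```

A caller would fall back to another timer when read_clock_ns() returns -1
rather than using the output value.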
commit 102f2fd0249dca301d195b4df1b94e7b339b8c60
Author: Matteo Frigo <[email protected]>
Date: Wed Feb 22 14:59:30 2017 -0500
Compute mflops() in 64 bit precision
Old code was overflowing for N>2^32
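The general hazard can be sketched with a size computation. This is an
illustration only, not the benchmark's mflops() code; both helper names
are invented, and the example merely shows a 32-bit product wrapping
where a 64-bit one does not:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total number of points of a multi-dimensional
   transform, accumulated in 32 bits. */
static uint32_t total_points_32(const uint32_t *dims, int rank)
{
    uint32_t n = 1;
    int i;
    assert(rank >= 0);
    for (i = 0; i < rank; ++i)
        n *= dims[i];              /* wraps modulo 2^32 */
    return n;
}

/* Same computation accumulated in 64 bits. */
static uint64_t total_points_64(const uint32_t *dims, int rank)
{
    uint64_t n = 1;
    int i;
    assert(rank >= 0);
    for (i = 0; i < rank; ++i)
        n *= dims[i];              /* exact while the product is < 2^64 */
    return n;
}
```

For dimensions 65536 x 65536 x 2 the 32-bit version wraps to 0, while the
64-bit version yields 8589934592.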
commit 2b63fc2eaae645a5c2ef4a97c384beb2adefd58d
Author: Matteo Frigo <[email protected]>
Date: Fri Jan 27 16:06:27 2017 -0500
Update NEWS for 3.3.6-pl2
commit d2ca54234956ad8be82ba050305ccf979fd631a7
Author: Matteo Frigo <[email protected]>
Date: Fri Jan 27 16:01:42 2017 -0500
Get ready for fftw-3.3.6-pl2
commit 83092f8efbf872aefe7cfc6ee8fa43412f8e167a
Author: Matteo Frigo <[email protected]>
Date: Fri Jan 27 15:52:18 2017 -0500
Fix scripts that generate the MPI F03 interface
It turns out that the scripts were using fftw3.h from /usr/include,
not ../api, and were failing silently if fftw3.h was not installed.
This bug led to a fftw-3.3.6-pl1 release with incomplete mpi/f03 header
files.
commit ab402b00f9a003daa10863b9bcdbe0810b26f541
Author: Steven G. Johnson <[email protected]>
Date: Wed Jan 25 13:03:15 2017 -0500
mention mkdist.sh and summarize the build process in README.md (closes #85)
commit fa9f00b3831177f0a9582092f21efb14e3d4601f
Author: Matteo Frigo <[email protected]>
Date: Sun Jan 22 14:51:44 2017 -0500
add __cdecl decorators to fftw3.h functions on Windows
This patch re-does 1f19d597 in a more disciplined way.
Also, Whitespace Police.
commit 42c0036e839b78a7af651d5504add62ed57f9961
Author: Matteo Frigo <[email protected]>
Date: Sun Jan 22 14:32:32 2017 -0500
Revert "add __cdecl decorators to fftw3.h functions on Windows, in case someone compiles with a non-default calling convention, as discussed in #80"
This reverts commit 1f19d59793eb629dd8228e8a41f4f8618c20a246.
The chosen syntax
FFTW_EXTRN(T) X(name)
is improper because __cdecl appertains to the declarator
and not to the return type. (As is clear, e.g., in
void (__cdecl *foo)(void)).
This forces monstrosities such as
FFTW_EXTRN(R *) X(name)
that contradict the C declaration syntax.
I'll redo the patch in a way that looks like C:
FFTW_EXTERN R *FFTW_CDECL X(name)
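The declarator placement argued for above can be sketched as follows.
EX_CDECL, EX_EXTERN, ex_first and ex_first_fn are stand-ins invented for
this example, not the definitions in fftw3.h, and the macro expands to
nothing outside Windows:

```c
#include <assert.h>

/* Illustrative macros only: the calling convention is spelled out on
   Windows targets and omitted everywhere else. */
#if defined(_WIN32)
#  define EX_CDECL __cdecl
#else
#  define EX_CDECL
#endif
#define EX_EXTERN extern

/* The convention sits in the declarator, after the return type ... */
EX_EXTERN int *EX_CDECL ex_first(int *v);

/* ... which is the same place it occupies in a function pointer. */
typedef int *(EX_CDECL *ex_first_fn)(int *v);

int *EX_CDECL ex_first(int *v)
{
    assert(v != 0);
    return v;
}
```

With this form a pointer return type needs no special macro argument, and
the declaration reads like ordinary C.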
commit 1f19d59793eb629dd8228e8a41f4f8618c20a246
Author: Steven G. Johnson <[email protected]>
Date: Thu Jan 19 23:09:23 2017 -0500
add __cdecl decorators to fftw3.h functions on Windows, in case someone compiles with a non-default calling convention, as discussed in #80
commit 596b924b86340456771fb75559016ec2cc1b44c4
Author: Matteo Frigo <[email protected]>
Date: Mon Jan 16 10:25:37 2017 -0500
Assert that CURRENT-AGE=3
This is an attempt to prevent the 3.3.6 version screwup from occurring
again.
In any reasonable universe, libraries would have a version H and they
would specify a L such that the library is compatible with all
versions in [L..H]. Any sensible programmer would never change L, as
this breaks backward compatibility and screws users. A new version
would increase H and be done. Instead, libtool wants CURRENT=H and
AGE=H-L (a new version changes two variables). Furthermore, the name
of the library in the file system is a combination of L and H-L. The
two changes of basis aren't even orthogonal. Pure madness.
This change attempts to impose sanity by asserting that the
implied L is 3, since we never intend to break backward compatibility
with fftw-3.3, which was version L=3.
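The arithmetic behind the assertion is simply L = CURRENT - AGE. A small
sketch, with an invented helper name and CURRENT/AGE values chosen purely
for illustration:

```c
#include <assert.h>

/* Hypothetical helper: the oldest interface a libtool library still
   supports (the "L" of the text), given libtool's CURRENT and AGE. */
static int oldest_supported(int current, int age)
{
    assert(age >= 0 && age <= current);
    return current - age;
}
```

Bumping CURRENT and AGE together, say from (8, 5) to (9, 6), keeps the
implied L at 3; bumping only CURRENT would raise L and drop backward
compatibility.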
commit 6fb9cd7b6359f29ce488a5802793139971d59c6c
Author: Matteo Frigo <[email protected]>
Date: Mon Jan 16 09:06:06 2017 -0500
Release 3.3.6-pl1
commit 18b7e53c54727303703db29373e61a35fb8d5db8
Author: Matteo Frigo <[email protected]>
Date: Mon Jan 16 08:56:53 2017 -0500
Fix #82: FFTW3 3.3.6 shared version rollback
commit 64a5a288e56c6ff4462b69531f4f34d740fdc12c
Author: Matteo Frigo <[email protected]>
Date: Mon Jan 16 08:42:01 2017 -0500
Improve documentation of fftw_make_planner_thread_safe
Specifically, tell people not to use it unless they must.
commit 811a672bdaedec4363272d9f7ed5fae56086aeb1
Author: Matteo Frigo <[email protected]>
Date: Sun Jan 15 17:40:37 2017 -0500
rm obsolete simd/ directory
We switched to simd-support/ many years ago, not sure why
it is still in git.
This was not a problem when the repository was private, but
the directory probably confuses people on github.
commit 5c9bead1ea35b3a21fb33f17011d6802722ba44b
Author: Matteo Frigo <[email protected]>
Date: Sun Jan 15 07:25:40 2017 -0500
Warnings Police
* suppress dead code in genfft/simd.ml
* fix one size_t/int confusion
* fix one float*/double* confusion (should have been void* because
we only check the alignment of the pointer, not its type).
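The void* point can be sketched with an alignment test that looks only at
the address. The name is_aligned and its signature are assumptions for
the example, not FFTW's actual routine:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is aligned to `alignment` bytes.
   It takes const void * because only the address is inspected, so
   float * and double * callers need no cast. `alignment` must be a
   power of two. */
static int is_aligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Both is_aligned(float_ptr, 16) and is_aligned(double_ptr, 16) compile
without a pointer-type warning.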
commit 41b191ee128fefe28a228ab706dfdfb65d32c2e1
Author: Matteo Frigo <[email protected]>
Date: Sun Jan 15 07:02:40 2017 -0500
Update configure.ac, NEWS for 3.3.6
commit fc3ada6e6bd790341fb5d91c6775b8afd686bad7
Author: Matteo Frigo <[email protected]>
Date: Sun Jan 15 06:40:23 2017 -0500
Ansi C Police
fftw is supposed to compile with c89/c90. Restore this property
so that I can test with gcc -ansi.
This change may seem needlessly reactionary, but in the last release I
accidentally inserted an assertion before a declaration and I broke
the Visual Studio build, so we must be careful not to use C99
constructs.
There are a few non-ANSI function calls in tests, e.g. isnan(),
drand48(), snprintf(). Since nobody has complained about those in
years, I am leaving them alone.
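For reference, the ordering that C89/C90 requires looks like this (a
generic sketch with an invented function, not code from FFTW):

```c
#include <assert.h>

/* Hypothetical function showing the C89/C90-friendly ordering: every
   declaration in a block comes before the first statement. */
static int sum_first(const int *v, int n)
{
    int i;
    int acc = 0;       /* declarations first ... */

    assert(n >= 0);    /* ... then statements, including assertions */
    for (i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}
```

Moving the assert() above "int acc = 0;" would be valid C99 but is
rejected by strict C89 compilers.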
commit 50dacdaba79694c873965ab23d11c8ca3b94d436
Author: Matteo Frigo <[email protected]>
Date: Sat Jan 7 09:01:47 2017 -0500
Revert simd-avx.h changes from b606e3191
They didn't improve performance at all as far as I can tell,
and they ended up breaking the PGI compiler.
It is always tempting to use the fancy addsub instructions in FFTW to
do complex multiplications, but the reality is that FFTW is designed
to avoid complex multiplications in most cases (we started in the SSE
days), and thus they don't make any difference. We are better off
using the minimal possible set of AVX instructions to minimize the
chance of triggering compiler bugs.
The same statement holds for _mm256_shuffle_pd() versus
_mm256_permute_pd(): in theory the latter is better, in practice
either one is rarely used. However, SHUFFLE is older (since the SSE
days) and has a higher chance of working.
commit 5fa55dc130e18cc4b3f4d88b8a159307eecf51d0
Merge: 1637e8aa aa00ba84
Author: Matteo Frigo <[email protected]>
Date: Sun Nov 13 05:49:09 2016 -0500
Merge pull request #77 from rolandschulz/master
Fix AVX512 load+store
commit aa00ba84079a272637666c9ae941821087f712b8
Author: Roland Schulz <[email protected]>
Date: Sat Nov 12 20:52:49 2016 -0800
Fix AVX512 load+store
FFTW alignment is only 16 bytes. AVX512 requires 64 bytes.
Thus unaligned load/store is required. AVX256 does the same.
commit 1637e8aace6e91d67837901b5a4cbbc87c42aca9
Merge: 3e7ee221 a538bf2c
Author: Matteo Frigo <[email protected]>
Date: Thu Nov 3 11:24:44 2016 -0400
Merge pull request #76 from forandom/patch-2
Update simd-vsx.h to support building with IBM XLC
commit a538bf2c4a17ec509f2cec37bffe48874702c671
Author: forandom <[email protected]>
Date: Thu Nov 3 23:06:17 2016 +0800
Update simd-vsx.h to support building with IBM XLC
defined(__POWER8_VECTOR__) && defined(__GNUC__) && defined(__LITTLE_ENDIAN__) is true for IBM XLC compiler for which we should use the intrinsic __vpermxor instead of __builtin_crypto_vpermxor.
commit 3e7ee2211ae1bd5e76901bbe1bcca67b31f84ccb
Author: Matteo Frigo <[email protected]>
Date: Sat Sep 24 06:39:01 2016 -0400
Do not run programs at configure time, ever.
configure was running a program to detect the ARM cycle counter,
thus preventing cross-compiling. Sorry about that.
commit fee0f966b2d3fae18019dd03a9bae338b4108d42
Merge: 3a3173b0 cca0c6e5
Author: Matteo Frigo <[email protected]>
Date: Fri Sep 9 06:49:23 2016 -0400
Merge pull request #72 from tkelman/patch-1
#include <intrin.h> in threads.c for windows build
commit cca0c6e5a8c717df10f380411709f3360ceea6e9
Author: Tony Kelman <[email protected]>
Date: Fri Sep 9 03:24:30 2016 -0700
#include <intrin.h> in threads.c for windows build
otherwise an i686-w64-mingw32 cross compile is giving
```
libtool: link: i686-w64-mingw32-gcc -march=pentium4 -m32 -std=gnu99 -shared -Wl,--whole-archive kernel/.libs/libkernel.a dft/.libs/libdft.a dft/scalar/.libs/libdft_scalar.a dft/scalar/codelets/.libs/libdft_scalar_codelets.a rdft/.libs/librdft.a rdft/scalar/.libs/librdft_scalar.a rdft/scalar/r2cf/.libs/librdft_scalar_r2cf.a rdft/scalar/r2cb/.libs/librdft_scalar_r2cb.a rdft/scalar/r2r/.libs/librdft_scalar_r2r.a reodft/.libs/libreodft.a api/.libs/libapi.a simd-support/.libs/libsimd_support.a simd-support/.libs/libsimd_sse2_nonportable.a dft/simd/avx/.libs/libdft_avx_codelets.a rdft/simd/avx/.libs/librdft_avx_codelets.a threads/.libs/libfftw3f_threads.a -Wl,--no-whole-archive -march=pentium4 -m32 -O3 -mtune=native -malign-double -Wl,--stack -Wl,8388608 -o .libs/libfftw3f-3.dll -Wl,--enable-auto-image-base -Xlinker --out-implib -Xlinker .libs/libfftw3f.dll.a
libtool: link: i686-w64-mingw32-gcc -march=pentium4 -m32 -std=gnu99 -shared -Wl,--whole-archive kernel/.libs/libkernel.a dft/.libs/libdft.a dft/scalar/.libs/libdft_scalar.a dft/scalar/codelets/.libs/libdft_scalar_codelets.a rdft/.libs/librdft.a rdft/scalar/.libs/librdft_scalar.a rdft/scalar/r2cf/.libs/librdft_scalar_r2cf.a rdft/scalar/r2cb/.libs/librdft_scalar_r2cb.a rdft/scalar/r2r/.libs/librdft_scalar_r2r.a reodft/.libs/libreodft.a api/.libs/libapi.a simd-support/.libs/libsimd_support.a simd-support/.libs/libsimd_sse2_nonportable.a dft/simd/avx/.libs/libdft_avx_codelets.a rdft/simd/avx/.libs/librdft_avx_codelets.a threads/.libs/libfftw3_threads.a -Wl,--no-whole-archive -march=pentium4 -m32 -O3 -mtune=native -malign-double -Wl,--stack -Wl,8388608 -o .libs/libfftw3-3.dll -Wl,--enable-auto-image-base -Xlinker --out-implib -Xlinker .libs/libfftw3.dll.a
threads/.libs/libfftw3_threads.a(libfftw3_threads_la-threads.o):threads.c:(.text+0x121): undefined reference to `_mm_pause'
threads/.libs/libfftw3_threads.a(libfftw3_threads_la-threads.o):threads.c:(.text+0x581): undefined reference to `_mm_pause'
collect2: error: ld returned 1 exit status
threads/.libs/libfftw3f_threads.a(libfftw3f_threads_la-threads.o):threads.c:(.text+0x121): undefined reference to `_mm_pause'
threads/.libs/libfftw3f_threads.a(libfftw3f_threads_la-threads.o):threads.c:(.text+0x581): undefined reference to `_mm_pause'
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:627: libfftw3f.la] Error 1
make[4]: *** [Makefile:627: libfftw3.la] Error 1
make[3]: *** [Makefile:672: all-recursive] Error 1
make[2]: *** [Makefile:536: all] Error 2
make[3]: *** [Makefile:672: all-recursive] Error 1
make[1]: *** [/home/Tony/julia32/deps/fftw.mk:46: scratch/fftw-3.3.5-single/build-compiled] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: *** [Makefile:536: all] Error 2
make[1]: *** [/home/Tony/julia32/deps/fftw.mk:46: scratch/fftw-3.3.5-double/build-compiled] Error 2
make: *** [Makefile:81: julia-deps] Error 2
```
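For illustration only, a minimal sketch of why the header matters: `_mm_pause` must be declared before use, and on this Windows toolchain the declaration comes from `<intrin.h>`. The `cpu_relax` macro and the `spin_until_nonzero` helper are hypothetical names, not taken from threads.c, and the non-Windows branches are assumptions added so the sketch compiles elsewhere.

```c
/* Hypothetical sketch, not the actual threads.c code. */
#if defined(_WIN32) && (defined(_M_IX86) || defined(_M_X64) || \
                        defined(__i386__) || defined(__x86_64__))
#  include <intrin.h>                  /* declares _mm_pause on Windows */
#  define cpu_relax() _mm_pause()
#elif defined(__i386__) || defined(__x86_64__)
   /* assumption: GCC/Clang on x86, emit the PAUSE instruction directly */
#  define cpu_relax() __asm__ __volatile__("pause")
#else
#  define cpu_relax() ((void) 0)       /* no spin hint on other targets */
#endif

/* Spin for at most max_spins iterations until *flag becomes nonzero.
   Returns 1 if the flag was seen set, 0 otherwise. */
static int spin_until_nonzero(volatile int *flag, int max_spins)
{
    int i;

    for (i = 0; i < max_spins; ++i) {
        if (*flag)
            return 1;
        cpu_relax();
    }
    return *flag != 0;
}
```

Without a declaration in scope, a C compiler may treat `_mm_pause` as an ordinary external function, which would be consistent with the undefined reference reported at link time in the log above.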
commit 97b273d87dcc797e688709e207f119dd4dfca015
Author: Christoph Junghans <[email protected]>
Date: Wed Aug 31 14:24:05 2016 -0600
Build and install cmake module
commit 3a3173b018f30d03df5f3166d459888f2669fe25
Author: Matteo Frigo <[email protected]>
Date: Wed Aug 31 06:14:51 2016 -0400
C++ compatibility
Although FFTW is a C program, we try to make it compilable by a C++
compiler as well. The implicit conversion void * ==> double * is not
allowed in C++.
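A minimal sketch of the kind of change this implies, assuming the `void *` comes from `malloc`; the helper name `alloc_doubles` is hypothetical and does not appear in FFTW.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate room for n doubles.
   In C the result of malloc converts to double * implicitly;
   C++ rejects that conversion, so the cast is spelled out and
   the same source compiles under both languages. */
static double *alloc_doubles(size_t n)
{
    return (double *) malloc(n * sizeof(double));
}
```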
commit 5fd9609eaed60360ce84d98add5d9548093e0bdc
Author: Matteo Frigo <[email protected]>
Date: Fri Aug 12 04:24:52 2016 -0400
Updated NEWS
commit 402d2508fe970770d9316d9c83f21d6fc268ba12
Author: Matteo Frigo <[email protected]>
Date: Fri Aug 12 04:21:33 2016 -0400
Fix race condition when destroying a plan.
More generally, this patch calls the planner hooks when destroying a
plan. The intended usage is that the hooks do in fact acquire a lock.
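The idea can be sketched as follows, under loud assumptions: every name below (`hook_fn`, `set_hooks`, `toy_plan`, and so on) is hypothetical, and a C11 `atomic_flag` spinlock stands in for whatever lock the real hooks would take. It shows only the pattern of bracketing plan destruction with the same before/after hooks used for plan creation.

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef void (*hook_fn)(void);           /* hypothetical hook type */

static hook_fn before_hook = 0;
static hook_fn after_hook = 0;

/* Stand-in lock: a simple spinlock built on a C11 atomic flag. */
static atomic_flag planner_lock = ATOMIC_FLAG_INIT;
static void lock_planner(void)   { while (atomic_flag_test_and_set(&planner_lock)) { } }
static void unlock_planner(void) { atomic_flag_clear(&planner_lock); }

static int live_plans = 0;               /* shared state the lock protects */

typedef struct { int n; } toy_plan;      /* hypothetical plan object */

static void set_hooks(hook_fn before, hook_fn after)
{
    before_hook = before;
    after_hook = after;
}

static toy_plan *toy_plan_create(int n)
{
    toy_plan *p;

    if (before_hook) before_hook();
    p = (toy_plan *) malloc(sizeof(toy_plan));
    if (p) { p->n = n; ++live_plans; }
    if (after_hook) after_hook();
    return p;
}

/* Destruction also touches shared state, so it runs inside the hooks too. */
static void toy_plan_destroy(toy_plan *p)
{
    if (before_hook) before_hook();
    if (p) { free(p); --live_plans; }
    if (after_hook) after_hook();
}
```

With `set_hooks(lock_planner, unlock_planner)` installed, creation and destruction both update `live_plans` under the lock.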
commit 432835f2cd37d2cb8b9528ac8ef983b3b38738f2
Author: Matteo Frigo <[email protected]>
Date: Tue Aug 9 05:29:39 2016 -0400
MSVC fixes by Carsten Steger
* don't mix declarations and statements, stick to ANSI C
* suppress some warnings with Intel cc
* undefined variable in x86-cpuid.h when
(_MSC_VER > 1500) || (_MSC_VER == 1500 && _MSC_FULL_VER >= 150030729)
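The first bullet above, shown as a minimal sketch: older C compilers that follow ANSI C (C89) require all declarations in a block to precede the first statement. The function `sum_of_squares` is a hypothetical example, not FFTW code.

```c
/* ANSI C style: every declaration comes before the first statement
   of the block, so there is no `for (int i = 0; ...)` and no
   declaration after an assignment. */
static int sum_of_squares(const int *v, int n)
{
    int i;
    int acc = 0;

    for (i = 0; i < n; ++i)
        acc += v[i] * v[i];
    return acc;
}
```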
commit c018cbe430fd6b2af31d594c27a0aaf711292567
Author: Matteo Frigo <[email protected]>
Date: Thu Aug 4 06:36:29 2016 -0400
Fix SIMD autodetection on amd64 when (_MSC_VER > 1500)
commit d5055c9ae2e60f191f6cc2e8b5200fd06dbdb6be
Author: Matteo Frigo <[email protected]>
Date: Sun Jul 31 13:42:00 2016 -0400
revise README.md language
commit 0af8d8b9eea0750add8be0e6dec18841ee61424e
Author: Matteo Frigo <[email protected]>
Date: Sun Jul 31 13:39:49 2016 -0400
revise README.md language
commit 0d026e09f9b514cb86bbc7977ad0a03b664b95de
Author: Matteo Frigo <[email protected]>
Date: Sun Jul 31 13:37:09 2016 -0400
Attempt to tell users to download official tarballs from fftw.org instead of github
commit b405994456f9a87f2170ba19536d4c4d8278682f
Author: Matteo Frigo <[email protected]>
Date: Sat Jul 30 16:33:22 2016 -0400
update AUTHORS
commit 4d0c1894fb37c61b0f0a42b50afd435d226f6b9e
Author: Matteo Frigo <[email protected]>
Date: Sat Jul 30 15:18:06 2016 -0400
Fixes for Windows cross-compilation
These days mingw by default produces binaries that depend on
libgcc-sjlj-1.dll, which defeats the whole historical point of mingw
(produce vanilla win32 binaries with no GNU stuff).
Add a hack to link with -static-libgcc, which avoids the problem.
commit a17d44eeb3100780ba106a22f497d47a43be7642
Author: Matteo Frigo <[email protected]>
Date: Sat Jul 30 11:39:09 2016 -0400
Misc fixes.
* sed s/avx[_- ]128[-_ ]fma/avx-128-fma
* avoid some signed/unsigned casts
commit f3688be112ed0099b4c57970db74c08373f3604d
Author: Matteo Frigo <[email protected]>
Date: Sat Jul 30 10:52:53 2016 -0400
Fix SIMD autodetection
* AVX was not testing for OSXSAVE support
* AVX2 was broken: it issued XGETBV without checking for its presence,
which fails on Atom
* AVX512 was broken in the same way as AVX2. I have guessed a fix, but
I have no way to test it.
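The ordering of checks described above can be sketched as below. This is not the code in x86-cpuid.h: the function and macro names are hypothetical, and the XCR0 reader is passed in as a callback (with fake readers standing in for XGETBV) so the logic can be exercised on any machine. The bit positions are the architectural ones: CPUID leaf 1 ECX bit 27 is OSXSAVE and bit 28 is AVX; XCR0 bit 1 is SSE state and bit 2 is AVX state.

```c
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE           (1u << 1)   /* OS saves XMM state */
#define XCR0_AVX           (1u << 2)   /* OS saves YMM state */

typedef unsigned (*read_xcr0_fn)(void);  /* hypothetical XGETBV wrapper */

/* Returns 1 only if the CPU has AVX, XGETBV may be executed, and the
   OS has enabled both SSE and AVX state. XCR0 is read last, and only
   after OSXSAVE has been confirmed. */
static int avx_usable(unsigned cpuid1_ecx, read_xcr0_fn read_xcr0)
{
    unsigned xcr0;

    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return 0;
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return 0;                /* XGETBV would fault here */
    xcr0 = read_xcr0();
    return (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}

/* Fake readers for illustration; they count how often XCR0 is read. */
static int xcr0_reads = 0;
static unsigned fake_xcr0_avx_enabled(void)  { ++xcr0_reads; return 0x7u; }
static unsigned fake_xcr0_no_avx_state(void) { ++xcr0_reads; return 0x3u; }
```

Passing the reader as a callback makes the property from the commit message visible: when OSXSAVE is clear, the reader is never called.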
commit 7fce2ae37f8338bd7e021b1a406c75b213c31c77
Author: Matteo Frigo <[email protected]>
Date: Fri Jul 29 07:48:10 2016 -0400
document fftw_make_planner_thread_safe()
commit 6167b92e3362f2d116274daa561c0d788fb670d4
Author: Matteo Frigo <[email protected]>
Date: Fri Jul 29 07:28:03 2016 -0400
rm README-bench
It appears in tests/README
commit cc9640cbbaa70e6645a0ea46be0508268905c2ba
Author: Matteo Frigo <[email protected]>
Date: Fri Jul 29 07:27:25 2016 -0400
Add README-bench
commit d82fe4f3e06bdbf92b09324e36f4d477bc5fe376
Author: Matteo Frigo <[email protected]>
Date: Fri Jul 29 07:25:00 2016 -0400
Do not enable avx128-fma unless the user asks for it.
Adding SIMD instruction sets automatically is user-hostile behavior.
Also, update the manual to reflect the new SIMD support
commit dc32329871d304de8d95ad290973844dfbc6101f
Author: Matteo Frigo <[email protected]>
Date: Fri Jul 29 07:00:55 2016 -0400
Update NEWS for 3.3.5