<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.7.3">Jekyll</generator><link href="https://pytorch.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://pytorch.org/" rel="alternate" type="text/html" /><updated>2018-10-30T07:52:48-07:00</updated><id>https://pytorch.org/</id><title type="html">PyTorch Website</title><subtitle>Scientific Computing...</subtitle><author><name>Facebook</name></author><entry><title type="html">The road to 1.0: production ready PyTorch</title><link href="https://pytorch.org/blog/the-road-to-1_0/" rel="alternate" type="text/html" title="The road to 1.0: production ready PyTorch" /><published>2018-05-02T00:00:00-07:00</published><updated>2018-05-02T00:00:00-07:00</updated><id>https://pytorch.org/blog/the-road-to-1_0</id><content type="html" xml:base="https://pytorch.org/blog/the-road-to-1_0/"><p>We would like to give you a preview of the roadmap for PyTorch 1.0, the next release of PyTorch. Over the last year, we’ve had 0.2, 0.3 and 0.4 transform PyTorch from a [Torch+Chainer]-like interface into something cleaner, adding double-backwards, numpy-like functions, advanced indexing and removing Variable boilerplate. At this time, we’re confident that the API is in a reasonable and stable state to release 1.0.</p>
<p>However, 1.0 isn’t just about stability of the interface.</p>
<p>One of PyTorch’s biggest strengths is its first-class Python integration, imperative style, and the simplicity of its API and options. These aspects make PyTorch good for research and hackability.</p>
<p>One of its biggest downsides has been production support. By production support we mean the countless things one has to do to models to run them efficiently at massive scale:</p>
<ul>
<li>exporting to C++-only runtimes for use in larger projects</li>
<li>optimizing for mobile systems on iPhone, Android, Qualcomm and other platforms</li>
<li>using more efficient data layouts and performing kernel fusion to do faster inference (saving 10% of speed or memory at scale is a big win)</li>
<li>quantized inference (such as 8-bit inference)</li>
</ul>
<p>Startups, large companies and anyone who wants to build a product around PyTorch have asked for production support. At Facebook (the largest stakeholder for PyTorch) we have Caffe2, which has been the production-ready platform, running in our datacenters and shipping to more than 1 billion phones spanning eight generations of iPhones and six generations of Android CPU architectures. It has server-optimized inference on Intel / ARM, TensorRT support, and all the necessary bits for production. Considering all this value locked into a platform that the PyTorch team works quite closely with, <strong>we decided to marry PyTorch and Caffe2, which gives production-level readiness to PyTorch</strong>.</p>
<p>Supporting production features without adding usability issues for our researchers and end-users requires creative solutions.</p>
<h2 id="production--pain-for-researchers">Production != Pain for researchers</h2>
<p>Adding production capabilities involves increasing the API complexity and number of configurable options for models. One configures memory layouts (NCHW vs NHWC vs N,C/32,H,W,32, each providing different performance characteristics), quantization (8-bit? 3-bit?), fusion of low-level kernels (you used a Conv + BatchNorm + ReLU, let’s fuse them into a single kernel), separate backend options (MKLDNN backend for a few layers and NNPACK backend for other layers), etc.</p>
<p>PyTorch’s central goal is to provide a great platform for research and hackability. So, while we add all these optimizations, we’ve been working with a hard design constraint to never trade these off against usability.</p>
<p>To pull this off, we are introducing <code class="highlighter-rouge">torch.jit</code>, a just-in-time (JIT) compiler that at runtime takes your PyTorch models and rewrites them to run at production-efficiency. The JIT compiler can also export your model to run in a C++-only runtime based on Caffe2 bits.</p>
<blockquote>
<p>In 1.0, your code continues to work as-is, we’re not making any big changes to the existing API.</p>
</blockquote>
<p>Making your model production-ready is an opt-in annotation, which uses the <code class="highlighter-rouge">torch.jit</code> compiler to export your model to a Python-less environment and improve its performance. Let’s walk through the JIT compiler in detail.</p>
<h2 id="torchjit-a-jit-compiler-for-your-models"><code class="highlighter-rouge">torch.jit</code>: A JIT-compiler for your models</h2>
<p>We strongly believe that it’s hard to match the productivity you get from specifying your models directly as idiomatic Python code. This is what makes PyTorch so flexible, but it also means that PyTorch pretty much never knows the operation you’ll run next. This, however, is a big blocker for export/productionization and heavyweight automatic performance optimizations, because they need full upfront knowledge of how the computation will look before it even gets executed.</p>
<p>We provide two opt-in ways of recovering this information from your code: one based on tracing native Python code, and one based on compiling an annotated subset of the Python language into a Python-free intermediate representation. After thorough discussions we concluded that they’re both going to be useful in different contexts, and as such you will be able to mix and match them freely.</p>
<h2 id="tracing-mode">Tracing Mode</h2>
<p>The PyTorch tracer, <code class="highlighter-rouge">torch.jit.trace</code>, is a function that records all the native PyTorch operations performed in a code region, along with the data dependencies between them. In fact, PyTorch has had a tracer since 0.3, which has been used for exporting models through ONNX. What changes now is that you no longer necessarily need to take the trace and run it elsewhere - PyTorch can re-execute it for you, using a carefully designed high-performance C++ runtime. As we develop PyTorch 1.0 this runtime will integrate all the optimizations and hardware integrations that Caffe2 provides.</p>
<p>The biggest benefit of this approach is that it doesn’t really care how your Python code is structured — you can trace through generators or coroutines, modules or pure functions. Since we only record native PyTorch operators, these details have no effect on the trace recorded. This behavior, however, is a double-edged sword. For example, if you have a loop in your model, it will get unrolled in the trace, inserting a copy of the loop body for as many times as the loop ran. This opens up opportunities for zero-cost abstraction (e.g. you can loop over modules, and the actual trace will be loop-overhead free!), but on the other hand this will also affect data dependent loops (think of e.g. processing sequences of varying lengths), effectively hard-coding a single length into the trace.</p>
<p>For networks that do not contain loops or if statements, tracing is non-invasive and is robust enough to handle a wide variety of coding styles. This code example illustrates what tracing looks like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># This will run your nn.Module or regular Python function with the example</span>
<span class="c"># input that you provided. The returned callable can be used to re-execute</span>
<span class="c"># all operations that happened during the example run, but it will no longer</span>
<span class="c"># use the Python interpreter.</span>
<span class="kn">from</span> <span class="nn">torch.jit</span> <span class="kn">import</span> <span class="n">trace</span>
<span class="n">traced_model</span> <span class="o">=</span> <span class="n">trace</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">example_input</span><span class="o">=</span><span class="nb">input</span><span class="p">)</span>
<span class="n">traced_fn</span> <span class="o">=</span> <span class="n">trace</span><span class="p">(</span><span class="n">fn</span><span class="p">,</span> <span class="n">example_input</span><span class="o">=</span><span class="nb">input</span><span class="p">)</span>
<span class="c"># The training loop doesn't change. Traced model behaves exactly like an</span>
<span class="c"># nn.Module, except that you can't edit what it does or change its attributes.</span>
<span class="c"># Think of it as a "frozen module".</span>
<span class="k">for</span> <span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">data_loader</span><span class="p">:</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">loss_fn</span><span class="p">(</span><span class="n">traced_model</span><span class="p">(</span><span class="nb">input</span><span class="p">),</span> <span class="n">target</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="script-mode">Script Mode</h2>
<p>Tracing mode is a great way to minimize the impact on your code, but we’re also very excited about the models that fundamentally make use of control flow such as RNNs. Our solution to this is a scripting mode.</p>
<p>In this case you write out a regular Python function, except that you can no longer use certain more complicated language features. Once you have isolated the desired functionality, you let us know that you’d like the function to get compiled by decorating it with an <code class="highlighter-rouge">@script</code> decorator. This annotation will transform your Python function directly into our high-performance C++ runtime. This lets us recover all the PyTorch operations along with loops and conditionals. They will be embedded into our internal representation of this function, and will be accounted for every time this function is run.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">torch.jit</span> <span class="kn">import</span> <span class="n">script</span>
<span class="nd">@script</span>
<span class="k">def</span> <span class="nf">rnn_loop</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">hidden</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">for</span> <span class="n">x_t</span> <span class="ow">in</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="mi">1</span><span class="p">):</span>
<span class="n">x</span><span class="p">,</span> <span class="n">hidden</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">hidden</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<h2 id="optimization-and-export">Optimization and Export</h2>
<p>Regardless of whether you use tracing or <code class="highlighter-rouge">@script</code>, the result is a python-free representation of your model, which can be used to optimize the model or to export the model from python for use in production environments.</p>
<p>Extracting bigger segments of the model into an intermediate representation makes it possible to do sophisticated whole-program optimizations and to offload computation to specialized AI accelerators which operate on graphs of computation. We have already been developing the beginnings of these optimizations, including passes that fuse GPU operations together to improve the performance of smaller RNN models.</p>
<p>It also allows us to use existing high-performance backends available in Caffe2 today to run the model efficiently. Additionally, @script functions (and modules!) can be fully exported to ONNX in a way that retains their dynamic nature, such that you can easily run them in a Python-free environment using the model executors from Caffe2 or by transferring the model to any other framework supporting ONNX.</p>
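<p>As an illustration, here is a minimal sketch of what such an export could look like, not the definitive API: it assumes the <code class="highlighter-rouge">traced_model</code> and <code class="highlighter-rouge">example_input</code> from the tracing example above, and the file name is arbitrary.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: serialize the recorded computation to an ONNX file that Caffe2's
# model executors (or any ONNX-compatible runtime) can load without Python.
# `traced_model` and `example_input` come from the tracing example above.
import torch.onnx

torch.onnx.export(traced_model, example_input, "model.onnx")
</code></pre></div></div>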
<h2 id="usability">Usability</h2>
<p><strong>We care deeply about maintaining our current level of usability. We know that executing code outside of Python makes debugging harder, but this is something we think about a lot, and we’re making sure that you don’t get locked into a completely different programming language.</strong></p>
<p>First, we follow the principle of pay for what you use — if you don’t need to optimize or export your model, you do not have to use these new features and won’t see any downsides. Furthermore, use of traced or @script modules/functions can be done incrementally. For instance, all of these behaviors are allowed: You can trace part of your model and use the trace in a larger non-traced model. You can use tracing for 90% of your model, and use @script for the one sub-module that actually has some control flow in it. You can write a function using @script and have it call a native python function. If something appears incorrect in an @script function, you can remove the annotation and the code will execute in native python where it is easy to debug using your favorite tools and methods. Think of tracing and @script like type annotations using MyPy or TypeScript — each additional annotation can be tested incrementally, and none are required until you want to optimize or productionize.</p>
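<p>To make the incremental story concrete, here is a small sketch of mixing the two modes; the <code class="highlighter-rouge">encoder</code> module and the <code class="highlighter-rouge">input</code> and <code class="highlighter-rouge">bias</code> tensors here are hypothetical names, and the <code class="highlighter-rouge">trace</code>/<code class="highlighter-rouge">script</code> APIs are the ones shown above.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from torch.jit import script, trace

@script
def gated_sum(x, y):
    # Real control flow: @script compiles the conditional instead of
    # hard-coding one branch the way a trace would.
    if bool(x.sum() &gt; 0):
        return x + y
    return x - y

# Trace the part of the model that has no control flow...
traced_encoder = trace(encoder, example_input=input)
# ...and call the @script function from ordinary Python code.
out = gated_sum(traced_encoder(input), bias)
</code></pre></div></div>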
<p>Most importantly, these modes will be built into the core of PyTorch so that mixing and matching them with your existing code can happen seamlessly.</p>
<p><em>Note: The name JIT for these components is a bit of a misnomer and comes from historical reasons. The tracing/function execution in PyTorch started out as an optimizing JIT compiler that generated fused CUDA kernels but then grew to encompass optimization, @script, and export. When it is ready for release we will likely rename this functionality to the hybrid frontend, but we wanted to present it here as it is named in the code so that you can follow along as we develop it.</em></p>
<h2 id="other-changes-and-improvements">Other changes and improvements</h2>
<p>Production support is the big feature for 1.0, but we will continue optimizing and fixing other parts of PyTorch as part of the standard release process.</p>
<p>On the backend side of things, PyTorch will see some changes, which might affect user-written C and C++ extensions. We are replacing (or refactoring) the backend ATen library to incorporate features and optimizations from Caffe2.</p>
<h2 id="last-words">Last Words</h2>
<p>We aim to release 1.0 some time during the summer. You can follow along with our progress on the <a href="https://github.com/pytorch/pytorch/pulls">Pull Requests</a> page.</p>
<p>You can read this from the perspective of the Caffe2 project at: <a href="https://caffe2.ai/blog/2018/05/02/Caffe2_PyTorch_1_0.html">https://caffe2.ai/blog/2018/05/02/Caffe2_PyTorch_1_0.html</a></p></content><author><name>The PyTorch Team</name></author><summary type="html">We would like to give you a preview of the roadmap for PyTorch 1.0, the next release of PyTorch. Over the last year, we’ve had 0.2, 0.3 and 0.4 transform PyTorch from a [Torch+Chainer]-like interface into something cleaner, adding double-backwards, numpy-like functions, advanced indexing and removing Variable boilerplate. At this time, we’re confident that the API is in a reasonable and stable state to release 1.0.</summary></entry><entry><title type="html">PyTorch 0.4.0 Migration Guide</title><link href="https://pytorch.org/blog/pytorch-0_4_0-migration-guide/" rel="alternate" type="text/html" title="PyTorch 0.4.0 Migration Guide" /><published>2018-04-22T00:00:00-07:00</published><updated>2018-04-22T00:00:00-07:00</updated><id>https://pytorch.org/blog/pytorch-0_4_0-migration-guide</id><content type="html" xml:base="https://pytorch.org/blog/pytorch-0_4_0-migration-guide/"><p>Welcome to the migration guide for PyTorch 0.4.0. In this release we introduced <a href="https://github.com/pytorch/pytorch/releases/tag/v0.4.0">many exciting new features and critical bug fixes</a>, with the goal of providing users a better and cleaner interface. In this guide, we will cover the most important changes in migrating existing code from previous versions:</p>
<ul>
<li><code class="highlighter-rouge">Tensors</code> and <code class="highlighter-rouge">Variables</code> have merged</li>
<li>Support for 0-dimensional (scalar) <code class="highlighter-rouge">Tensors</code></li>
<li>Deprecation of the <code class="highlighter-rouge">volatile</code> flag</li>
<li><code class="highlighter-rouge">dtypes</code>, <code class="highlighter-rouge">devices</code>, and Numpy-style <code class="highlighter-rouge">Tensor</code> creation functions</li>
<li>Writing device-agnostic code</li>
<li>New edge-case constraints on names of submodules, parameters, and buffers in <code class="highlighter-rouge">nn.Module</code></li>
</ul>
<h2 id="merging-tensor-and-variable-and-classes">Merging <a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">Tensor</code></a> and <code class="highlighter-rouge">Variable</code> and classes</h2>
<p><a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">torch.Tensor</code></a> and <code class="highlighter-rouge">torch.autograd.Variable</code> are now the same class. More precisely, <a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">torch.Tensor</code></a> is capable of tracking history and behaves like the old <code class="highlighter-rouge">Variable</code>; <code class="highlighter-rouge">Variable</code> wrapping continues to work as before but returns an object of type <a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">torch.Tensor</code></a>. This means that you don’t need the <code class="highlighter-rouge">Variable</code> wrapper everywhere in your code anymore.</p>
<h3 id="the-type-of-a-tensor-has-changed">The <code class="highlighter-rouge">type()</code> of a <a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">Tensor</code></a> has changed</h3>
<p>Note also that the <code class="highlighter-rouge">type()</code> of a Tensor no longer reflects the data type. Use <code class="highlighter-rouge">isinstance()</code> or <code class="highlighter-rouge">x.type()</code> instead:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">DoubleTensor</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="c"># was torch.DoubleTensor</span>
<span class="s">"&lt;class 'torch.Tensor'&gt;"</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="nb">type</span><span class="p">())</span> <span class="c"># OK: 'torch.DoubleTensor'</span>
<span class="s">'torch.DoubleTensor'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">DoubleTensor</span><span class="p">))</span> <span class="c"># OK: True</span>
<span class="bp">True</span>
</code></pre></div></div>
<h3 id="when-does-autograd-start-tracking-history-now">When does <a href="http://pytorch.org/docs/0.4.0/autograd.html"><code class="highlighter-rouge">autograd</code></a> start tracking history now?</h3>
<p><code class="highlighter-rouge">requires_grad</code>, the central flag for <a href="http://pytorch.org/docs/0.4.0/autograd.html"><code class="highlighter-rouge">autograd</code></a>, is now an attribute on <code class="highlighter-rouge">Tensors</code>. The same rules previously used for <code class="highlighter-rouge">Variables</code> applies to <code class="highlighter-rouge">Tensors</code>; <a href="http://pytorch.org/docs/0.4.0/autograd.html"><code class="highlighter-rouge">autograd</code></a> starts tracking history when any input <code class="highlighter-rouge">Tensor</code> of an operation has <code class="highlighter-rouge">requires_grad=True</code>. For example,</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c"># create a tensor with requires_grad=False (default)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c"># another tensor with requires_grad=False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">z</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># both inputs have requires_grad=False. so does the output</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">z</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># then autograd won't track this computation. let's verify!</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">z</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
<span class="nb">RuntimeError</span><span class="p">:</span> <span class="n">element</span> <span class="mi">0</span> <span class="n">of</span> <span class="n">tensors</span> <span class="n">does</span> <span class="ow">not</span> <span class="n">require</span> <span class="n">grad</span> <span class="ow">and</span> <span class="n">does</span> <span class="ow">not</span> <span class="n">have</span> <span class="n">a</span> <span class="n">grad_fn</span>
<span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># now create a tensor with requires_grad=True</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">w</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">w</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">True</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># add to the previous result that has require_grad=False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">total</span> <span class="o">=</span> <span class="n">w</span> <span class="o">+</span> <span class="n">z</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># the total sum now requires grad!</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">total</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">True</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># autograd can compute the gradients as well</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">total</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">w</span><span class="o">.</span><span class="n">grad</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">1.</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="c"># and no computation is wasted to compute gradients for x, y and z, which don't require grad</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">z</span><span class="o">.</span><span class="n">grad</span> <span class="o">==</span> <span class="n">x</span><span class="o">.</span><span class="n">grad</span> <span class="o">==</span> <span class="n">y</span><span class="o">.</span><span class="n">grad</span> <span class="o">==</span> <span class="bp">None</span>
<span class="bp">True</span>
</code></pre></div></div>
<h4 id="manipulating-requires_grad-flag">Manipulating <code class="highlighter-rouge">requires_grad</code> flag</h4>
<p>Other than directly setting the attribute, you can change this flag <code class="highlighter-rouge">in-place</code> using <a href="http://pytorch.org/docs/0.4.0/tensors.html#torch.Tensor.requires_grad_"><code class="highlighter-rouge">my_tensor.requires_grad_()</code></a>, or, as in the above example, at creation time by passing it in as an argument (default is <code class="highlighter-rouge">False</code>), e.g.,</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">existing_tensor</span><span class="o">.</span><span class="n">requires_grad_</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">existing_tensor</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">True</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_tensor</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_tensor</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">True</span>
</code></pre></div></div>
<h3 id="what-about-data">What about <code class="highlighter-rouge">.data?</code></h3>
<p><code class="highlighter-rouge">.data</code> was the primary way to get the underlying <code class="highlighter-rouge">Tensor</code> from a <code class="highlighter-rouge">Variable</code>. After this merge, calling <code class="highlighter-rouge">y = x.data</code> still has similar semantics. So <code class="highlighter-rouge">y</code> will be a <code class="highlighter-rouge">Tensor</code> that shares the same data with <code class="highlighter-rouge">x</code>, is unrelated with the computation history of <code class="highlighter-rouge">x</code>, and has <code class="highlighter-rouge">requires_grad=False</code>.</p>
<p>However, <code class="highlighter-rouge">.data</code> can be unsafe in some cases. Any changes on <code class="highlighter-rouge">x.data</code> wouldn’t be tracked by <code class="highlighter-rouge">autograd</code>, and the computed gradients would be incorrect if <code class="highlighter-rouge">x</code> is needed in a backward pass. A safer alternative is to use <a href="http://pytorch.org/docs/master/autograd.html#torch.Tensor.detach"><code class="highlighter-rouge">x.detach()</code></a>, which also returns a <code class="highlighter-rouge">Tensor</code> that shares the same data and has <code class="highlighter-rouge">requires_grad=False</code>, but will have its in-place changes reported by <code class="highlighter-rouge">autograd</code> if <code class="highlighter-rouge">x</code> is needed in backward.</p>
<p>Here is an example of the difference between <code class="highlighter-rouge">.data</code> and <code class="highlighter-rouge">x.detach()</code> (and why we recommend using <code class="highlighter-rouge">detach</code> in general).</p>
<p>If you use <code class="highlighter-rouge">Tensor.detach()</code>, the gradient computation is guaranteed to be correct.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">a</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mf">3.</span><span class="p">],</span> <span class="n">requires_grad</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">out</span> <span class="o">=</span> <span class="n">a</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">c</span> <span class="o">=</span> <span class="n">out</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">c</span><span class="o">.</span><span class="n">zero_</span><span class="p">()</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">out</span> <span class="c"># modified by c.zero_() !!</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">out</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span> <span class="c"># Requires the original value of out, but that was overwritten by c.zero_()</span>
<span class="nb">RuntimeError</span><span class="p">:</span> <span class="n">one</span> <span class="n">of</span> <span class="n">the</span> <span class="n">variables</span> <span class="n">needed</span> <span class="k">for</span> <span class="n">gradient</span> <span class="n">computation</span> <span class="n">has</span> <span class="n">been</span> <span class="n">modified</span> <span class="n">by</span> <span class="n">an</span>
</code></pre></div></div>
<p>However, using <code class="highlighter-rouge">Tensor.data</code> can be unsafe and can easily result in incorrect gradients when a tensor is required for gradient computation but modified in-place.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">a</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mf">3.</span><span class="p">],</span> <span class="n">requires_grad</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">out</span> <span class="o">=</span> <span class="n">a</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">c</span> <span class="o">=</span> <span class="n">out</span><span class="o">.</span><span class="n">data</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">c</span><span class="o">.</span><span class="n">zero_</span><span class="p">()</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">out</span> <span class="c"># out was modified by c.zero_()</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">out</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">a</span><span class="o">.</span><span class="n">grad</span> <span class="c"># The result is very, very wrong because `out` changed!</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">])</span>
</code></pre></div></div>
<h2 id="support-for-0-dimensional-scalar-tensors">Support for 0-dimensional (scalar) Tensors</h2>
<p>Previously, indexing into a <code class="highlighter-rouge">Tensor</code> vector (1-dimensional tensor) gave a Python number but indexing into a <code class="highlighter-rouge">Variable</code> vector gave (inconsistently!) a vector of size <code class="highlighter-rouge">(1,)</code>! Similar behavior existed with reduction functions, e.g. <code class="highlighter-rouge">tensor.sum()</code> would return a Python number, but <code class="highlighter-rouge">variable.sum()</code> would return a vector of size <code class="highlighter-rouge">(1,)</code>.</p>
<p>Fortunately, this release introduces proper scalar (0-dimensional tensor) support in PyTorch! Scalars can be created using the new <code class="highlighter-rouge">torch.tensor</code> function (which will be explained in more detail later; for now just think of it as the PyTorch equivalent of <code class="highlighter-rouge">numpy.array</code>). Now you can do things like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="mf">3.1416</span><span class="p">)</span> <span class="c"># create a scalar directly</span>
<span class="n">tensor</span><span class="p">(</span><span class="mf">3.1416</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="mf">3.1416</span><span class="p">)</span><span class="o">.</span><span class="n">size</span><span class="p">()</span> <span class="c"># scalar is 0-dimensional</span>
<span class="n">torch</span><span class="o">.</span><span class="n">Size</span><span class="p">([])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="mi">3</span><span class="p">])</span><span class="o">.</span><span class="n">size</span><span class="p">()</span> <span class="c"># compare to a vector of size 1</span>
<span class="n">torch</span><span class="o">.</span><span class="n">Size</span><span class="p">([</span><span class="mi">1</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">vector</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span> <span class="c"># this is a vector</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">vector</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">2.</span><span class="p">,</span> <span class="mf">3.</span><span class="p">,</span> <span class="mf">4.</span><span class="p">,</span> <span class="mf">5.</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">vector</span><span class="o">.</span><span class="n">size</span><span class="p">()</span>
<span class="n">torch</span><span class="o">.</span><span class="n">Size</span><span class="p">([</span><span class="mi">4</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">vector</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="c"># indexing into a vector gives a scalar</span>
<span class="n">tensor</span><span class="p">(</span><span class="mf">5.</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">vector</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span><span class="o">.</span><span class="n">item</span><span class="p">()</span> <span class="c"># .item() gives the value as a Python number</span>
<span class="mf">5.0</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">mysum</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">mysum</span>
<span class="n">tensor</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">mysum</span><span class="o">.</span><span class="n">size</span><span class="p">()</span>
<span class="n">torch</span><span class="o">.</span><span class="n">Size</span><span class="p">([])</span>
</code></pre></div></div>
<h3 id="accumulating-losses">Accumulating losses</h3>
<p>Consider the widely used pattern <code class="highlighter-rouge">total_loss += loss.data[0]</code>. Before 0.4.0, <code class="highlighter-rouge">loss</code> was a <code class="highlighter-rouge">Variable</code> wrapping a tensor of size <code class="highlighter-rouge">(1,)</code>, but in 0.4.0 <code class="highlighter-rouge">loss</code> is now a scalar and has <code class="highlighter-rouge">0</code> dimensions. Indexing into a scalar doesn’t make sense (it gives a warning now, but will be a hard error in 0.5.0). Use <code class="highlighter-rouge">loss.item()</code> to get the Python number from a scalar.</p>
<p>Note that if you don’t convert to a Python number when accumulating losses, you may find increased memory usage in your program. This is because the right-hand-side of the above expression used to be a Python float, while it is now a zero-dim Tensor. The total loss is thus accumulating Tensors and their gradient history, which may keep around large autograd graphs for much longer than necessary.</p>
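<p>Putting the two points together, a corrected accumulation loop might look like this minimal sketch (assuming the usual <code class="highlighter-rouge">model</code>, <code class="highlighter-rouge">loss_fn</code> and <code class="highlighter-rouge">data_loader</code> names):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>total_loss = 0.0
for input, target in data_loader:
    loss = loss_fn(model(input), target)
    loss.backward()
    # loss.item() yields a plain Python float, so total_loss does not
    # retain the autograd graph of every iteration.
    total_loss += loss.item()
</code></pre></div></div>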
<h2 id="deprecation-of-volatile-flag">Deprecation of volatile flag</h2>
<p>The <code class="highlighter-rouge">volatile</code> flag is now deprecated and has no effect. Previously, any computation that involves a <code class="highlighter-rouge">Variable</code> with <code class="highlighter-rouge">volatile=True</code> wouldn’t be tracked by <code class="highlighter-rouge">autograd</code>. This has now been replaced by a <a href="http://pytorch.org/docs/0.4.0/torch.html#locally-disabling-gradient-computation">set of more flexible context managers</a> including <code class="highlighter-rouge">torch.no_grad()</code>, <code class="highlighter-rouge">torch.set_grad_enabled(grad_mode)</code>, and others.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="o">...</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">False</span>
<span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">is_train</span> <span class="o">=</span> <span class="bp">False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">set_grad_enabled</span><span class="p">(</span><span class="n">is_train</span><span class="p">):</span>
<span class="o">...</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">set_grad_enabled</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span> <span class="c"># this can also be used as a function</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">True</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">set_grad_enabled</span><span class="p">(</span><span class="bp">False</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">y</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">False</span>
</code></pre></div></div>
<h2 id="dtypes-devices-and-numpy-style-creation-functions"><a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype"><code class="highlighter-rouge">dtypes</code></a>, <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device"><code class="highlighter-rouge">devices</code></a> and NumPy-style creation functions</h2>
<p>In previous versions of PyTorch, we used to specify data type (e.g. float vs double), device type (cpu vs cuda) and layout (dense vs sparse) together as a “tensor type”. For example, <code class="highlighter-rouge">torch.cuda.sparse.DoubleTensor</code> was the <code class="highlighter-rouge">Tensor</code> type representing the <code class="highlighter-rouge">double</code> data type, living on CUDA devices, and with <a href="https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)">COO sparse tensor</a> layout.</p>
<p>In this release, we introduce <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype"><code class="highlighter-rouge">torch.dtype</code></a>, <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device"><code class="highlighter-rouge">torch.device</code></a> and <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.layout"><code class="highlighter-rouge">torch.layout</code></a> classes to allow better management of these properties via NumPy-style creation functions.</p>
<h3 id="torchdtype"><a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype"><code class="highlighter-rouge">torch.dtype</code></a></h3>
<p>Below is a complete list of available <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype"><code class="highlighter-rouge">torch.dtype</code></a>s (data types) and their corresponding tensor types.</p>
<table>
<thead>
<tr>
<th>Data type</th>
<th><code class="highlighter-rouge">torch.dtype</code></th>
<th>Tensor types</th>
</tr>
</thead>
<tbody>
<tr>
<td>32-bit floating point</td>
<td><code class="highlighter-rouge">torch.float32</code> or <code class="highlighter-rouge">torch.float</code></td>
<td><code class="highlighter-rouge">torch.*.FloatTensor</code></td>
</tr>
<tr>
<td>64-bit floating point</td>
<td><code class="highlighter-rouge">torch.float64</code> or <code class="highlighter-rouge">torch.double</code></td>
<td><code class="highlighter-rouge">torch.*.DoubleTensor</code></td>
</tr>
<tr>
<td>16-bit floating point</td>
<td><code class="highlighter-rouge">torch.float16</code> or <code class="highlighter-rouge">torch.half</code></td>
<td><code class="highlighter-rouge">torch.*.HalfTensor</code></td>
</tr>
<tr>
<td>8-bit integer (unsigned)</td>
<td><code class="highlighter-rouge">torch.uint8</code></td>
<td><code class="highlighter-rouge">torch.*.ByteTensor</code></td>
</tr>
<tr>
<td>8-bit integer (signed)</td>
<td><code class="highlighter-rouge">torch.int8</code></td>
<td><code class="highlighter-rouge">torch.*.CharTensor</code></td>
</tr>
<tr>
<td>16-bit integer (signed)</td>
<td><code class="highlighter-rouge">torch.int16</code> or <code class="highlighter-rouge">torch.short</code></td>
<td><code class="highlighter-rouge">torch.*.ShortTensor</code></td>
</tr>
<tr>
<td>32-bit integer (signed)</td>
<td><code class="highlighter-rouge">torch.int32</code> or <code class="highlighter-rouge">torch.int</code></td>
<td><code class="highlighter-rouge">torch.*.IntTensor</code></td>
</tr>
<tr>
<td>64-bit integer (signed)</td>
<td><code class="highlighter-rouge">torch.int64</code> or <code class="highlighter-rouge">torch.long</code></td>
<td><code class="highlighter-rouge">torch.*.LongTensor</code></td>
</tr>
</tbody>
</table>
<p>The dtype of a tensor can be accessed via its <code class="highlighter-rouge">dtype</code> attribute.</p>
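<p>For example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; torch.zeros(3).dtype  # the default dtype is torch.float32
torch.float32
</code></pre></div></div>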
<h3 id="torchdevice"><a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device"><code class="highlighter-rouge">torch.device</code></a></h3>
<p>A <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device"><code class="highlighter-rouge">torch.device</code></a> contains a device type (<code class="highlighter-rouge">'cpu'</code> or <code class="highlighter-rouge">'cuda'</code>) and optional device ordinal (id) for the device type. It can be initialized with <code class="highlighter-rouge">torch.device('{device_type}')</code> or <code class="highlighter-rouge">torch.device('{device_type}:{device_ordinal}')</code>.</p>
<p>If the device ordinal is not present, this represents the current device for the device type; e.g., <code class="highlighter-rouge">torch.device('cuda')</code> is equivalent to <code class="highlighter-rouge">torch.device('cuda:X')</code> where <code class="highlighter-rouge">X</code> is the result of <code class="highlighter-rouge">torch.cuda.current_device()</code>.</p>
<p>The device of a tensor can be accessed via its <code class="highlighter-rouge">device</code> attribute.</p>
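<p>For example (this sketch assumes a machine with at least two CUDA devices):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; cuda1 = torch.device('cuda:1')
&gt;&gt;&gt; x = torch.zeros(3, device=cuda1)
&gt;&gt;&gt; x.device
device(type='cuda', index=1)
&gt;&gt;&gt; torch.zeros(3).device  # CPU is the default
device(type='cpu')
</code></pre></div></div>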
<h3 id="torchlayout"><a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.layout"><code class="highlighter-rouge">torch.layout</code></a></h3>
<p><a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.layout"><code class="highlighter-rouge">torch.layout</code></a> represents the data layout of a <a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">Tensor</code></a>. Currently <code class="highlighter-rouge">torch.strided</code> (dense tensors, the default) and <code class="highlighter-rouge">torch.sparse_coo</code> (sparse tensors with COO format) are supported.</p>
<p>The layout of a tensor can be accessed via its <code class="highlighter-rouge">layout</code> attribute.</p>
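<p>For example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; torch.zeros(2, 2).layout  # dense tensors use the strided layout
torch.strided
</code></pre></div></div>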
<h3 id="creating-tensors">Creating Tensors</h3>
<p><a href="http://pytorch.org/docs/0.4.0/torch.html#creation-ops">Methods that create a</a> <a href="http://pytorch.org/docs/0.4.0/tensors.html"><code class="highlighter-rouge">Tensor</code></a> now also take in <code class="highlighter-rouge">dtype</code>, <code class="highlighter-rouge">device</code>, <code class="highlighter-rouge">layout</code>, and <code class="highlighter-rouge">requires_grad</code> options to specify the desired attributes on the returned <code class="highlighter-rouge">Tensor</code>. For example,</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s">"cuda:1"</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">([[</span><span class="o">-</span><span class="mf">0.6344</span><span class="p">,</span> <span class="mf">0.8562</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.2758</span><span class="p">],</span>
<span class="p">[</span> <span class="mf">0.8414</span><span class="p">,</span> <span class="mf">1.7962</span><span class="p">,</span> <span class="mf">1.0589</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">0.1369</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.0462</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.4373</span><span class="p">]],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s">'cuda:1'</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span><span class="o">.</span><span class="n">requires_grad</span> <span class="c"># default is False</span>
<span class="bp">False</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span><span class="o">.</span><span class="n">requires_grad</span>
<span class="bp">True</span>
</code></pre></div></div>
<h5 id="torchtensordata-"><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.tensor"><code class="highlighter-rouge">torch.tensor(data, ...)</code></a></h5>
<p><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.tensor"><code class="highlighter-rouge">torch.tensor</code></a> is one of the newly added <a href="http://pytorch.org/docs/0.4.0/torch.html#creation-ops">tensor creation methods</a>. It takes in array-like data of all kinds and copies the contained values into a new <code class="highlighter-rouge">Tensor</code>. As mentioned earlier, <a href="http://pytorch.org/docs/0.4.0/torch.html#torch.tensor"><code class="highlighter-rouge">torch.tensor</code></a> is the PyTorch equivalent of NumPy’s <code class="highlighter-rouge">numpy.array</code>constructor. Unlike the <code class="highlighter-rouge">torch.*Tensor</code> methods, you can also create zero-dimensional <code class="highlighter-rouge">Tensor</code>s (aka scalars) this way (a single python number is treated as a Size in the <code class="highlighter-rouge">torch.*Tensor methods</code>). Moreover, if a <code class="highlighter-rouge">dtype</code> argument isn’t given, it will infer the suitable <code class="highlighter-rouge">dtype</code> given the data. It is the recommended way to create a tensor from existing data like a Python list. For example,</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">cuda</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s">"cuda"</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">]],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">half</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">cuda</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">([[</span> <span class="mi">1</span><span class="p">],</span>
<span class="p">[</span> <span class="mi">2</span><span class="p">],</span>
<span class="p">[</span> <span class="mi">3</span><span class="p">]],</span> <span class="n">device</span><span class="o">=</span><span class="s">'cuda:0'</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c"># scalar</span>
<span class="n">tensor</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mf">2.3</span><span class="p">])</span><span class="o">.</span><span class="n">dtype</span> <span class="c"># type inferece</span>
<span class="n">torch</span><span class="o">.</span><span class="n">float32</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span><span class="o">.</span><span class="n">dtype</span> <span class="c"># type inferece</span>
<span class="n">torch</span><span class="o">.</span><span class="n">int64</span>
</code></pre></div></div>
<p>We’ve also added more tensor creation methods. Some of them have <code class="highlighter-rouge">torch.*_like</code> and/or <code class="highlighter-rouge">tensor.new_*</code> variants.</p>
<ul>
<li>
<p><code class="highlighter-rouge">torch.*_like</code> takes in an input <code class="highlighter-rouge">Tensor</code> instead of a shape. It returns a <code class="highlighter-rouge">Tensor</code> with same attributes as the input <code class="highlighter-rouge">Tensor</code> by default unless otherwise specified:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="nb">int</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span>
</code></pre></div> </div>
</li>
<li>
<p><code class="highlighter-rouge">tensor.new_*</code> can also create <code class="highlighter-rouge">Tensors</code> with same attributes as <code class="highlighter-rouge">tensor</code>, but it always takes in a shape argument:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span><span class="o">.</span><span class="n">new_ones</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span><span class="o">.</span><span class="n">new_ones</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="nb">int</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">([</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span>
</code></pre></div> </div>
</li>
</ul>
<p>To specify the desired shape, you can either use a tuple (e.g., <code class="highlighter-rouge">torch.zeros((2, 3))</code>) or variable arguments (e.g., <code class="highlighter-rouge">torch.zeros(2, 3)</code>) in most cases.</p>
<table>
<thead>
<tr>
<th>Name</th>
<th>Returned <code class="highlighter-rouge">Tensor</code></th>
<th><code class="highlighter-rouge">torch.*_like</code> variant</th>
<th><code class="highlighter-rouge">tensor.new_*</code> variant</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.empty"><code class="highlighter-rouge">torch.empty</code></a></td>
<td>uninitialized memory</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.zeros"><code class="highlighter-rouge">torch.zeros</code></a></td>
<td>all zeros</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.ones"><code class="highlighter-rouge">torch.ones</code></a></td>
<td>all ones</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.full"><code class="highlighter-rouge">torch.full</code></a></td>
<td>filled with a given value</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.rand"><code class="highlighter-rouge">torch.rand</code></a></td>
<td>i.i.d. continuous Uniform[0, 1)</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.randn"><code class="highlighter-rouge">torch.randn</code></a></td>
<td>i.i.d. <code class="highlighter-rouge">Normal(0, 1)</code></td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.randint"><code class="highlighter-rouge">torch.randint</code></a></td>
<td>i.i.d. discrete Uniform in given range</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.randperm"><code class="highlighter-rouge">torch.randperm</code></a></td>
<td>random permutation of <code class="highlighter-rouge">{0, 1, ..., n - 1}</code></td>
<td> </td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.tensor"><code class="highlighter-rouge">torch.tensor</code></a></td>
<td>copied from existing data (list, NumPy ndarray, etc.)</td>
<td> </td>
<td>✔</td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.from_numpy"><code class="highlighter-rouge">torch.from_numpy</code>*</a></td>
<td>from NumPy <code class="highlighter-rouge">ndarray</code> (sharing storage without copying)</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.arange"><code class="highlighter-rouge">torch.arange</code></a>, <a href="http://pytorch.org/docs/0.4.0/torch.html#torch.range"><code class="highlighter-rouge">torch.range</code></a>, and <a href="http://pytorch.org/docs/0.4.0/torch.html#torch.linspace"><code class="highlighter-rouge">torch.linspace</code></a></td>
<td>uniformly spaced values in a given range</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.logspace"><code class="highlighter-rouge">torch.logspace</code></a></td>
<td>logarithmically spaced values in a given range</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td><a href="http://pytorch.org/docs/0.4.0/torch.html#torch.eye"><code class="highlighter-rouge">torch.eye</code></a></td>
<td>identity matrix</td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
<p>*: <a href="http://pytorch.org/docs/0.4.0/torch.html#torch.from_numpy"><code class="highlighter-rouge">torch.from_numpy</code></a> only takes in a NumPy <code class="highlighter-rouge">ndarray</code> as its input argument.</p>
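<p>Because <a href="http://pytorch.org/docs/0.4.0/torch.html#torch.from_numpy"><code class="highlighter-rouge">torch.from_numpy</code></a> shares storage with its source array, an in-place write through either object is visible through the other. A minimal sketch:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; import numpy as np
&gt;&gt;&gt; a = np.ones(3)
&gt;&gt;&gt; t = torch.from_numpy(a)  # no copy: t and a share memory
&gt;&gt;&gt; t.mul_(2)                # an in-place multiply on the Tensor ...
tensor([ 2.,  2.,  2.], dtype=torch.float64)
&gt;&gt;&gt; a                        # ... is visible in the ndarray too
array([ 2.,  2.,  2.])
</code></pre></div></div>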
<h2 id="writing-device-agnostic-code">Writing device-agnostic code</h2>
<p>Previous versions of PyTorch made it difficult to write code that was device agnostic (i.e. that could run on both CUDA-enabled and CPU-only machines without modification).</p>
<p>PyTorch 0.4.0 makes this easier in two ways:</p>
<ul>
<li>The <code class="highlighter-rouge">device</code> attribute of a Tensor gives the <a href="http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device">torch.device</a> for all Tensors (<code class="highlighter-rouge">get_device</code> only works for CUDA tensors)</li>
<li>The <code class="highlighter-rouge">to</code> method of <code class="highlighter-rouge">Tensors</code> and <code class="highlighter-rouge">Modules</code> can be used to easily move objects to different devices (instead of having to call <code class="highlighter-rouge">cpu()</code> or <code class="highlighter-rouge">cuda()</code> based on the context)</li>
</ul>
<p>We recommend the following pattern:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># at beginning of the script</span>
<span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s">"cuda:0"</span> <span class="k">if</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">()</span> <span class="k">else</span> <span class="s">"cpu"</span><span class="p">)</span>
<span class="o">...</span>
<span class="c"># then whenever you get a new Tensor or Module</span>
<span class="c"># this won't copy if they are already on the desired device</span>
<span class="nb">input</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">MyModule</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="new-edge-case-constraints-on-names-of-submodules-parameters-and-buffers-in-nnmodule">New edge-case constraints on names of submodules, parameters, and buffers in <code class="highlighter-rouge">nn.Module</code></h2>
<p><code class="highlighter-rouge">name</code> that is an empty string or contains <code class="highlighter-rouge">"."</code> is no longer permitted in <code class="highlighter-rouge">module.add_module(name, value)</code>, <code class="highlighter-rouge">module.add_parameter(name, value)</code> or <code class="highlighter-rouge">module.add_buffer(name, value)</code> because such names may cause lost data in the <code class="highlighter-rouge">state_dict</code>. If you are loading a checkpoint for modules containing such names, please update the module definition and patch the <code class="highlighter-rouge">state_dict</code> before loading it.</p>
<h2 id="code-samples-putting-it-all-together">Code Samples (Putting it all together)</h2>
<p>To get a flavor of the overall recommended changes in 0.4.0, let’s look at a quick example for a common code pattern in both 0.3.1 and 0.4.0:</p>
<ul>
<li>0.3.1 (old):
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">MyRNN</span><span class="p">()</span>
<span class="k">if</span> <span class="n">use_cuda</span><span class="p">:</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span>
<span class="c"># train</span>
<span class="n">total_loss</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">train_loader</span><span class="p">:</span>
<span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="nb">input</span><span class="p">),</span> <span class="n">Variable</span><span class="p">(</span><span class="n">target</span><span class="p">)</span>
<span class="n">hidden</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="o">*</span><span class="n">h_shape</span><span class="p">))</span> <span class="c"># init hidden</span>
<span class="k">if</span> <span class="n">use_cuda</span><span class="p">:</span>
<span class="nb">input</span><span class="p">,</span> <span class="n">target</span><span class="p">,</span> <span class="n">hidden</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span> <span class="n">target</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span> <span class="n">hidden</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span>
<span class="o">...</span> <span class="c"># get loss and optimize</span>
<span class="n">total_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="o">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="c"># evaluate</span>
<span class="k">for</span> <span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">test_loader</span><span class="p">:</span>
<span class="nb">input</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">volatile</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">if</span> <span class="n">use_cuda</span><span class="p">:</span>
<span class="o">...</span>
<span class="o">...</span>
</code></pre></div> </div>
</li>
<li>0.4.0 (new):
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># torch.device object used throughout this script</span>
<span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s">"cuda"</span> <span class="k">if</span> <span class="n">use_cuda</span> <span class="k">else</span> <span class="s">"cpu"</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">MyRNN</span><span class="p">()</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
<span class="c"># train</span>
<span class="n">total_loss</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">train_loader</span><span class="p">:</span>
<span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">),</span> <span class="n">target</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
<span class="n">hidden</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="o">*</span><span class="n">h_shape</span><span class="p">)</span> <span class="c"># has the same device &amp; dtype as `input`</span>
<span class="o">...</span> <span class="c"># get loss and optimize</span>
<span class="n">total_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">()</span> <span class="c"># get Python number from 1-element Tensor</span>
<span class="c"># evaluate</span>
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="c"># operations inside don't track history</span>
<span class="k">for</span> <span class="nb">input</span><span class="p">,</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">test_loader</span><span class="p">:</span>
<span class="o">...</span>
</code></pre></div> </div>
</li>
</ul>
<p>Thank you for reading! Please refer to our <a href="http://pytorch.org/docs/0.4.0/index.html">documentation</a> and <a href="https://github.com/pytorch/pytorch/releases/tag/v0.4.0">release notes</a> for more details.</p>
<p>Happy PyTorch-ing!</p></content><author><name>Facebook</name></author><summary type="html">Welcome to the migration guide for PyTorch 0.4.0. In this release we introduced many exciting new features and critical bug fixes, with the goal of providing users a better and cleaner interface. In this guide, we will cover the most important changes in migrating existing code from previous versions:</summary></entry><entry><title type="html">Tensor Comprehensions in PyTorch</title><link href="https://pytorch.org/blog/tensor-comprehensions/" rel="alternate" type="text/html" title="Tensor Comprehensions in PyTorch" /><published>2018-03-05T00:00:00-08:00</published><updated>2018-03-05T00:00:00-08:00</updated><id>https://pytorch.org/blog/tensor-comprehensions</id><content type="html" xml:base="https://pytorch.org/blog/tensor-comprehensions/"><p>Tensor Comprehensions (TC) is a tool that lowers the barrier for writing high-performance code. It generates GPU code from a simple high-level language and autotunes the code for specific input sizes.</p>
<p><strong>We highly recommend reading the <a href="https://research.fb.com/announcing-tensor-comprehensions/">Tensor Comprehensions blogpost</a> first.</strong></p>
<p>If you ran into any of the following scenarios, TC is a useful tool for you.</p>
<ul>
<li>
<p>Your PyTorch layer is large and slow, and you contemplated writing dedicated C++ or CUDA code for it. But you don’t know how to program in CUDA or write low-level code.</p>
</li>
<li>
<p>You wrote a CUDA layer, but it took a week to write, debug, and optimize for speed. You wished you could do this in an hour.</p>
</li>
<li>
<p>You want to fuse multiple layers like Conv-ReLU-BatchNorm or Linear-ReLU-Linear-ReLU in your network for speed, but writing the fused kernel by hand was quite difficult.</p>
</li>
<li>
<p>Your research involves weird Tensor shapes that CuDNN and MKL are not optimized for. For example, you do convolutions of 13 x 24 with an input image of 143 x 55. You tried running it with CuDNN and it was slower than you wished.</p>
</li>
<li>
<p>Your code is slowed down by transposing Tensors constantly to fit a particular memory layout. You wish it was easy to write custom code that operates efficiently on your input layout.</p>
</li>
</ul>
<p>Tensor Comprehensions are seamless to use in PyTorch, interoperating with PyTorch Tensors and <code class="highlighter-rouge">nn</code> Variables.</p>
<p>Let us run through using TC with PyTorch.</p>
<h4 id="1-install-the-package">1. Install the package</h4>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>conda install <span class="nt">-c</span> pytorch <span class="nt">-c</span> tensorcomp tensor_comprehensions
</code></pre></div></div>
<p>At this time we only provide Linux-64 binaries, which have been tested on Ubuntu 16.04 and CentOS 7.</p>
<p>TC depends on heavyweight C++ projects such as <a href="http://halide-lang.org/">Halide</a>, <a href="https://github.com/wsmoses/Tapir-LLVM">Tapir-LLVM</a> and <a href="http://isl.gforge.inria.fr/">ISL</a>. Hence, we rely on Anaconda to distribute these dependencies reliably. For the same reason, TC is not available via PyPI.</p>
<h4 id="2-import-the-python-package">2. Import the python package</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tensor_comprehensions</span> <span class="k">as</span> <span class="n">tc</span>
</code></pre></div></div>
<h4 id="3-define-the-tc-expression-and-create-a-python-function">3. Define the TC expression and create a python function</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">lang</span> <span class="o">=</span> <span class="s">"""
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -&gt; (O1) {
O1(b, n) +=! I(b, m) * W1(n, m)
O1(b, n) = O1(b, n) + B1(n)
O1(b, n) = fmax(O1(b, n), 0)
}
"""</span>
<span class="n">fcrelu</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">lang</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"fcrelu"</span><span class="p">)</span>
</code></pre></div></div>
<p>This <code class="highlighter-rouge">fcrelu</code> function takes PyTorch Tensors as input and returns a PyTorch Tensor. It takes input <code class="highlighter-rouge">I</code>, weight <code class="highlighter-rouge">W1</code>, bias <code class="highlighter-rouge">B1</code> and returns output <code class="highlighter-rouge">O1</code>.</p>
<h4 id="4-lets-create-some-dummy-input-tensors">4. Let’s create some dummy input tensors</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">B</span><span class="p">,</span> <span class="n">M</span><span class="p">,</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="mi">100</span>
<span class="n">I</span><span class="p">,</span> <span class="n">W1</span><span class="p">,</span> <span class="n">B1</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">B</span><span class="p">,</span> <span class="n">M</span><span class="p">)</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">M</span><span class="p">)</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">N</span><span class="p">)</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span>
</code></pre></div></div>
<h4 id="5-now-autotune-the-function-for-your-input-sizes">5. Now autotune the function for your input sizes</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fcrelu</span><span class="o">.</span><span class="n">autotune</span><span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">W1</span><span class="p">,</span> <span class="n">B1</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="s">"fcrelu_100_128_100.tc"</span><span class="p">)</span>
</code></pre></div></div>
<p>The autotuner is your biggest friend. You generally do not want to use a <code class="highlighter-rouge">tc</code> function without autotuning it first.</p>
<p>When the autotuning is running, the current best performance is displayed. If you are satisfied with the current result or you are out of time, stop the tuning procedure by pressing <code class="highlighter-rouge">Ctrl+C</code>.</p>
<p><img src="https://pytorch.org/static/img/tc_autotuner.gif" alt="tc-autotuner" /></p>
<p><code class="highlighter-rouge">cache</code> saves the results of the autotuned kernel search and saves it to the file <code class="highlighter-rouge">fcrelu_100_128_100.tc</code>. The next time you call the same line of code, it loads the results of the autotuning without recomputing it.</p>
<p>The autotuner has a few hyperparameters (just like your ConvNet has learning rate, number of layers, etc.). We pick reasonable defaults, but you can read about using advanced options <a href="https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html#specifying-mapping-options">here</a>.</p>
<h4 id="6-call-the-function-with-the-inputs-to-get-your-result">6. Call the function with the inputs, to get your result</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span> <span class="o">=</span> <span class="n">fcrelu</span><span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">W1</span><span class="p">,</span> <span class="n">B1</span><span class="p">)</span>
</code></pre></div></div>
<p>Now, let’s look at how to write TC expressions.</p>
<h2 id="a-quick-primer-on-the-tc-language">A quick primer on the TC language</h2>
<p>The TC notation focuses on the mathematical nature of the layer, leaving performance considerations to its backend code, which uses Halide and polyhedral compilation techniques that accumulate decades of cutting-edge Loop Nest Optimization (LNO) research.</p>
<p>TC is close to <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html">np.einsum</a>. We shall quickly learn TC by example.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">lang</span> <span class="o">=</span> <span class="s">"""
def matmul(float(M,N) A, float(N,K) B) -&gt; (output) {
output(i, j) +=! A(i, kk) * B(kk, j)
}
"""</span>
</code></pre></div></div>
<p>In this example, we define a function <code class="highlighter-rouge">matmul</code> which takes two inputs, <code class="highlighter-rouge">A</code> and <code class="highlighter-rouge">B</code>, of shapes <code class="highlighter-rouge">M x N</code> and <code class="highlighter-rouge">N x K</code>, and returns a single <code class="highlighter-rouge">output</code>. The shape of <code class="highlighter-rouge">output</code> is automatically inferred by the TC language (discussed below).</p>
<p>Let’s look at this line:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">output</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">+=</span><span class="err">!</span> <span class="n">A</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">kk</span><span class="p">)</span> <span class="o">*</span> <span class="n">B</span><span class="p">(</span><span class="n">kk</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>
</code></pre></div></div>
<p>It says:</p>
<ul>
<li><code class="highlighter-rouge">output(i, j)</code> means output is 2D.</li>
<li>for each location <code class="highlighter-rouge">output(i, j)</code>, we add (<code class="highlighter-rouge">+=</code>) <code class="highlighter-rouge">A(i, kk) * B(kk, j)</code>.</li>
<li><code class="highlighter-rouge">i</code> is well-defined as all locations in <code class="highlighter-rouge">A</code> dim=0, i.e. <code class="highlighter-rouge">i in range(0, M)</code></li>
<li><code class="highlighter-rouge">j</code> is well-defined as all locations in <code class="highlighter-rouge">B</code> dim=1, i.e. <code class="highlighter-rouge">j in range(0, K)</code></li>
<li><code class="highlighter-rouge">kk</code> is inferred as all locations from <code class="highlighter-rouge">0</code> to <code class="highlighter-rouge">N</code></li>
</ul>
<p>The shape of output is inferred from the maximum values <code class="highlighter-rouge">i</code> and <code class="highlighter-rouge">j</code> can take, which is <code class="highlighter-rouge">M</code> and <code class="highlighter-rouge">K</code>, so output is of size <code class="highlighter-rouge">M x K</code>.</p>
<p>The <code class="highlighter-rouge">!</code> symbol initializes output with <code class="highlighter-rouge">0.0</code>. It is equivalent to:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">output</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">output</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">+=</span> <span class="n">A</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">kk</span><span class="p">)</span> <span class="o">*</span> <span class="n">B</span><span class="p">(</span><span class="n">kk</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Scalar inputs and range constraints: implement AvgPool2d</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">"""
def avgpool(float(B, C, H, W) input) -&gt; (output) {{
output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}}
"""</span>
<span class="n">avgpool</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">LANG</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"avgpool"</span><span class="p">,</span> <span class="n">constants</span><span class="o">=</span><span class="p">{</span><span class="s">"sH"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="s">"sW"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="s">"kH"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span> <span class="s">"kW"</span><span class="p">:</span><span class="mi">2</span><span class="p">})</span>
</code></pre></div></div>
<p>Here the <code class="highlighter-rouge">where</code> keyword can take ranges of values to operate on. <code class="highlighter-rouge">0:{kH}</code> is equivalent to <code class="highlighter-rouge">range(kH)</code> in Python.</p>
<p>Note: the syntax for passing in scalars is subject to change in the next release.</p>
<h2 id="torchnn-layers">torch.nn layers</h2>
<p>We added some sugar-coating around the basic PyTorch integration of TC to make it easy to integrate TC into larger <code class="highlighter-rouge">torch.nn</code> models by defining the forward and backward TC expressions and taking <code class="highlighter-rouge">Variable</code> inputs / outputs. Here is an <a href="https://github.com/facebookresearch/TensorComprehensions/blob/master/test_python/layers/test_convolution_train.py">example</a> of defining a convolution layer with TC.</p>
<h2 id="some-essentials-that-you-will-miss-were-working-on-them">Some essentials that you will miss (we’re working on them)</h2>
<h3 id="autotuning-for-variable-length-sequences">Autotuning for variable-length sequences</h3>
<p>The TC autotuner requires all input sizes to be specified beforehand. For example, if you have an input <code class="highlighter-rouge">I1</code> which is an image batch, the autotuner wants to know the exact shape of <code class="highlighter-rouge">I1</code> to generate an optimized kernel. You cannot specify: <code class="highlighter-rouge">image with height between 200 and 300</code>. This is especially limiting for sequence data such as NLP, where each sentence can have a different length.</p>
<p>The autotuner is non-parametric because auto-tuning parametric constraints is much harder; this is an area of active research. Hence, for the first release, we made a conscious decision to give you the tool in a form where we know it works well.</p>
<p>As a workaround, if you know that you have a few specific shapes of interest, you can run the autotuner with these multiple shapes.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">relu</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">LANG</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"relu"</span><span class="p">)</span>
<span class="n">batch</span><span class="p">,</span> <span class="n">channels</span> <span class="o">=</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">3</span>
<span class="n">tc</span><span class="o">.</span><span class="n">autotune</span><span class="p">((</span><span class="n">batch</span><span class="p">,</span> <span class="n">channels</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">))</span> <span class="c"># image of size 32 x 32</span>
<span class="n">tc</span><span class="o">.</span><span class="n">autotune</span><span class="p">((</span><span class="n">batch</span><span class="p">,</span> <span class="n">channels</span><span class="p">,</span> <span class="mi">48</span><span class="p">,</span> <span class="mi">48</span><span class="p">))</span> <span class="c"># image of size 48 x 48</span>
<span class="n">tc</span><span class="o">.</span><span class="n">autotune</span><span class="p">((</span><span class="n">batch</span><span class="p">,</span> <span class="n">channels</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="mi">64</span><span class="p">))</span> <span class="c"># image of size 64 x 64</span>
</code></pre></div></div>
<p>Now the autotuner is tuned for these three specific image sizes <code class="highlighter-rouge">32x32</code>, <code class="highlighter-rouge">48x48</code> and <code class="highlighter-rouge">64x64</code>.</p>
<h3 id="lack-of-loops">Lack of loops</h3>
<p>If you want to write an RNN, it’s easy to see it as a <code class="highlighter-rouge">for</code> loop over time. However, the TC language does not have loops yet. If you really want to write RNNs, you can write unrolled loops.</p>
<h3 id="strided-tensors">Strided-Tensors</h3>
<p>The TC backend does not support non-contiguous Tensors yet. If the inputs you give are not contiguous, they are made contiguous before being passed to the TC backend.</p>
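<p>To see what contiguity means here: a transposed Tensor is a view with swapped strides over the same storage, and <code class="highlighter-rouge">contiguous()</code> is what produces the compact copy handed to the backend. A quick check:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; x = torch.randn(4, 6)
&gt;&gt;&gt; y = x.t()                        # a view: same storage, strides swapped
&gt;&gt;&gt; y.is_contiguous()
False
&gt;&gt;&gt; y.contiguous().is_contiguous()   # copies into a compact layout
True
</code></pre></div></div>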
<h3 id="reshaping-tensors-within-a-tc-expression">Reshaping Tensors within a TC expression</h3>
<p>You cannot write this operation in TC: <code class="highlighter-rouge">torch.matmul(...).view(...).mean(...)</code>. Whenever you need a <code class="highlighter-rouge">view</code> to change the shape of an input, you have to get the output and <code class="highlighter-rouge">view</code> it at the PyTorch level.</p>
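<p>Assuming the <code class="highlighter-rouge">matmul</code> function defined earlier, the workaround looks like this sketch:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A, B = torch.randn(100, 400).cuda(), torch.randn(400, 500).cuda()
C = matmul(A, B)            # run the TC kernel first ...
C = C.view(100, 10, 50)     # ... then reshape at the PyTorch level
m = C.mean(dim=2)           # and continue with regular torch ops
</code></pre></div></div>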
<h2 id="getting-started">Getting Started</h2>
<ul>
<li><a href="https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html">Walk through Tutorial</a> to quickly get started with understanding and using Tensor Comprehensions PyTorch package.</li>
<li><a href="https://github.com/facebookresearch/TensorComprehensions/tree/master/test_python/layers">Over 20 examples</a> of various ML layers with TC, including <code class="highlighter-rouge">avgpool</code>, <code class="highlighter-rouge">maxpool</code>, <code class="highlighter-rouge">matmul</code>, matmul - give output buffers and <code class="highlighter-rouge">batch-matmul</code>, <code class="highlighter-rouge">convolution</code>, <code class="highlighter-rouge">strided-convolution</code>, <code class="highlighter-rouge">batchnorm</code>, <code class="highlighter-rouge">copy</code>, <code class="highlighter-rouge">cosine similarity</code>, <code class="highlighter-rouge">Linear</code>, <code class="highlighter-rouge">Linear + ReLU</code>, <code class="highlighter-rouge">group-convolutions</code>, strided <code class="highlighter-rouge">group-convolutions</code>, <code class="highlighter-rouge">indexing</code>, <code class="highlighter-rouge">Embedding</code> (lookup table), small-mobilenet, <code class="highlighter-rouge">softmax</code>, <code class="highlighter-rouge">tensordot</code>, <code class="highlighter-rouge">transpose</code></li>
<li><a href="https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html">Detailed docs</a> on Tensor Comprehensions and integration with PyTorch.</li>
</ul>
<h2 id="communication">Communication</h2>
<ul>
<li><a href="https://tensorcomprehensions.herokuapp.com/">Slack</a>: For discussion around framework integration, build support, collaboration, etc. join our slack channel.</li>
<li>Email: [email protected]</li>
<li><a href="https://github.com/facebookresearch/TensorComprehensions">GitHub</a>: bug reports, feature requests, install issues, RFCs, thoughts, etc.</li>
</ul>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>We would like to thank Soumith Chintala, <a href="https://github.com/ezyang">Edward Yang</a> and <a href="https://github.com/colesbury">Sam Gross</a> for their immense guidance and help in making the integration API nice and smooth. We would also like to thank the rest of the PyTorch team and our pre-release users for their helpful feedback that guided us in making the integration better.</p></content><author><name>Priya Goyal (FAIR), Nicolas Vasilache (FAIR), Oleksandr Zinenko (Inria & DI ENS), Theodoros Theodoridis (ETH Zürich), Zachary DeVito (FAIR), William S. Moses (MIT CSAIL), Sven Verdoolaege (FAIR), Andrew Adams (FAIR), Albert Cohen (Inria & DI ENS & FAIR)</name></author><summary type="html">Tensor Comprehensions (TC) is a tool that lowers the barrier for writing high-performance code. It generates GPU code from a simple high-level language and autotunes the code for specific input sizes.</summary></entry><entry><title type="html">PyTorch, a year in….</title><link href="https://pytorch.org/blog/a-year-in/" rel="alternate" type="text/html" title="PyTorch, a year in...." /><published>2018-01-19T09:00:00-08:00</published><updated>2018-01-19T09:00:00-08:00</updated><id>https://pytorch.org/blog/a-year-in</id><content type="html" xml:base="https://pytorch.org/blog/a-year-in/"><p>Today marks 1 year since PyTorch was released publicly. It’s been a wild ride — our quest to build a flexible deep learning research platform. Over the last year, we’ve seen an amazing community of people using, contributing to and evangelizing PyTorch — thank you for the love.</p>
<p>Looking back, we wanted to summarize PyTorch over the past year: the progress, the news and highlights from the community.</p>
<h2 id="community">Community</h2>
<p>We’ve been blessed with a strong organic community of researchers and engineers who fell in love with PyTorch. The core team has engineers and researchers from multiple countries, companies and universities, and we couldn’t have made PyTorch what it is without each contribution.</p>
<h3 id="research-papers-packages-and-github">Research papers, packages and Github</h3>
<p>Within days of release, users from the community started to implement their favorite research papers in PyTorch and release the code on Github. Open-source code is a primary and essential tool for researchers today.</p>
<p>Folks came together to create <a href="https://github.com/pytorch/text">torchtext</a>, <a href="https://github.com/pytorch/vision">torchvision</a> and <a href="https://github.com/pytorch/audio">torchaudio</a> packages to help facilitate and democratize research in different domains.</p>
<p>The first community package based on PyTorch came from Brandon Amos, <a href="https://twitter.com/brandondamos/status/828652480573607937">titled Block</a>, and helped with easier manipulation of block matrices. The Locus Lab at <strong>CMU</strong> subsequently went on to <a href="https://github.com/locuslab">publish PyTorch packages</a> and implementations for most of their research. The first research paper code came from Sergey Zagoruyko titled <a href="https://twitter.com/PyTorch/status/822561885744726016">Paying more attention to attention</a>.</p>
<p>Jun-Yan Zhu, Taesung Park, Phillip Isola, Alyosha Efros and team from <strong>U.C. Berkeley</strong> released the hugely popular <a href="https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix">Cycle-GAN and pix2pix</a> which does image-to-image transforms.</p>
<div class="text-center">
<img src="https://pytorch.org/assets/images/horse2zebra.gif" />
</div>
<p>The researchers at <strong>HarvardNLP</strong> and <strong>Systran</strong> started developing and improving <a href="https://github.com/OpenNMT/OpenNMT-py">OpenNMT in PyTorch</a>, seeded by an initial reimplementation of the [Lua]Torch code from Adam Lerer at Facebook.</p>
<p>The MagicPony team at <strong>Twitter</strong> contributed implementations of their <a href="https://twitter.com/Rob_Bishop/status/821793080877588480">Super-resolution work early on into PyTorch’s examples</a>.</p>
<p><strong>Salesforce Research</strong> released several packages, including their highlight release of <a href="https://twitter.com/Smerity/status/917472260851560448">PyTorch-QRNN</a>, a type of RNN that is 2x to 17x faster than standard LSTMs optimized by CuDNN. James Bradbury and team form one of the most active and engaging forces in the PyTorch community.</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">We&#39;re releasing <a href="https://twitter.com/PyTorch?ref_src=twsrc%5Etfw">@PyTorch</a>-QRNN, 2-17x faster than NVIDIA&#39;s cuDNN LSTM.<br />Speed thanks to 50 lines of CUDA via CuPy.<a href="https://t.co/KaWhN4yDZd">https://t.co/KaWhN4yDZd</a> <a href="https://t.co/yoLYj3pMI0">pic.twitter.com/yoLYj3pMI0</a></p>&mdash; Smerity (@Smerity) <a href="https://twitter.com/Smerity/status/917472260851560448?ref_src=twsrc%5Etfw">October 9, 2017</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Researchers from <strong>Uber</strong>, <strong>Northeastern</strong> and <strong>Stanford</strong> came together to form an active probabilistic programming community around their packages <a href="http://pyro.ai/">Pyro</a> and <a href="https://github.com/probtorch/probtorch">ProbTorch</a>. They are actively developing the torch.distributions core package. This community is so active and fast-moving that we had our first pytorch-probabilistic-programming meetup at NIPS 2017 with Fritz Obermeyer, Noah Goodman, Jan-Willem van de Meent, Brooks Paige, Dustin Tran and 22 additional attendees discussing how to make the world Bayesian.</p>
<div class="text-center">
<img src="https://pytorch.org/assets/images/probpackages.png" width="40%" />
</div>
<p><strong>NVIDIA</strong> Researchers released three high-quality repositories that implemented <a href="https://github.com/NVIDIA/pix2pixHD">pix2pix-HD</a>, <a href="https://github.com/NVIDIA/sentiment-discovery">Sentiment Neuron</a> and <a href="https://github.com/NVIDIA/flownet2-pytorch">FlowNet2</a> papers. Their analysis of scalability of different <a href="https://github.com/NVIDIA/sentiment-discovery/blob/master/analysis/scale.md">Data Parallel models in PyTorch</a> was helpful to the community.</p>
<div class="text-center">
<img src="https://pytorch.org/assets/images/sentiment.png" width="40%" />
</div>
<p>The Allen Institute for AI released <a href="http://allennlp.org/">AllenNLP</a> which includes several state-of-the-art models in NLP — reference implementations and easy to use <a href="http://demo.allennlp.org/machine-comprehension">web demos</a> for standard NLP tasks.</p>
<div class="text-center">
<img src="https://pytorch.org/assets/images/allennlp.png" width="40%" />
</div>
<p>We also had our first Kaggle winning team, grt123, in July. They won the Data Science Bowl 2017 on lung cancer detection and <a href="https://twitter.com/PyTorch/status/881573658166267904">subsequently released their PyTorch implementations</a>.</p>
<p>On the visualization front, Tzu-Wei Huang implemented a <a href="https://github.com/lanpa/tensorboard-pytorch">TensorBoard-PyTorch plugin</a> and Facebook AI Research released PyTorch compatibility for their <a href="https://github.com/facebookresearch/visdom">visdom</a> visualization package.</p>
<div class="text-center">
<img src="https://pytorch.org/assets/images/tensorboard_model.png" width="40%" />
<img src="https://pytorch.org/assets/images/visdom.png" width="40%" />
</div>
<p>Lastly, <strong>Facebook AI Research</strong> released several projects such as <a href="https://github.com/facebookresearch/">ParlAI, fairseq-py, VoiceLoop and FaderNetworks</a> that implemented cutting-edge models and interfaced datasets in multiple domains.</p>
<p>There are countless good projects that we haven’t highlighted for lack of space; you can find a curated list <a href="https://github.com/soumith?tab=stars">here</a>.</p>
<p>We would also like to give a huge shout-out to folks who actively help others out on the Forums, especially <a href="https://discuss.pytorch.org/u/ptrblck/summary">ptrblck</a>, <a href="https://discuss.pytorch.org/u/jpeg729/summary">jpeg729</a>, <a href="https://discuss.pytorch.org/u/quantscientist/summary">QuantScientist</a>, <a href="https://discuss.pytorch.org/u/alband/summary">albanD</a>, <a href="https://discuss.pytorch.org/u/tom/summary">Thomas Viehmann</a> and <a href="https://discuss.pytorch.org/u/chenyuntc/summary">chenyuntc</a>. You are providing an invaluable service, thank you so much!</p>
<h2 id="metrics">Metrics</h2>
<p>In terms of sheer numbers,</p>
<ul>
<li>87,769 lines of Python code on GitHub that <a href="https://github.com/search?l=Python&amp;q=import+torch&amp;type=Code">import torch</a></li>
<li><a href="https://github.com/search?q=pytorch&amp;type=Repositories">3,983 repositories on Github that mention PyTorch in their name or description</a></li>
<li>More than half a million downloads of PyTorch binaries. 651,916 to be precise.</li>
<li><strong>5,400 users</strong> wrote <strong>21,500 posts</strong> discussing 5,200 topics on our forums, <a href="https://discuss.pytorch.org/">discuss.pytorch.org</a></li>
<li>131 mentions of PyTorch on Reddit’s /r/machinelearning since the day of release. In the same period, TensorFlow was mentioned 255 times.</li>
</ul>
<h3 id="research-metrics">Research Metrics</h3>
<p>PyTorch is a research-focused framework. So one metric of interest is the usage of PyTorch in machine learning research papers.</p>
<ul>
<li>
<p>In the recent ICLR2018 conference submissions, PyTorch was mentioned in <strong>87 papers</strong>, compared to TensorFlow at 228 papers, Keras at 42 papers, Theano and Matlab at 32 papers.</p>
</li>
<li>
<p><a href="https://twitter.com/fchollet/status/951828914103402497">Monthly arxiv.org mentions for frameworks</a> had PyTorch at 72 mentions, with TensorFlow at 273 mentions, Keras at 100 mentions, Caffe at 94 mentions and Theano at 53 mentions.</p>
</li>
</ul>
<h2 id="courses-tutorials-and-books">Courses, Tutorials and Books</h2>
<p>When we released PyTorch, we had good API documentation, but our tutorials were limited to a few IPython notebooks — helpful, but not good enough.</p>
<p><a href="https://github.com/chsasank">Sasank Chilamkurthy</a> took it upon himself to revamp the tutorials into the <a href="http://pytorch.org/tutorials/">beautiful website</a> that it is today.</p>
<div class="text-center">
<img src="https://pytorch.org/assets/images/blog_combined_tutorials.png" width="40%" />
</div>
<p><a href="https://github.com/spro/practical-pytorch">Sean Robertson</a> and <a href="https://github.com/jcjohnson/pytorch-examples">Justin Johnson</a> wrote great new tutorials — in NLP, and to learn by example. <a href="https://github.com/yunjey/pytorch-tutorial">Yunjey Choi</a> wrote a beautiful tutorial where most models were implemented in 30 lines or less.
Each new tutorial helped users find their way faster, with different approaches to learning.</p>
<p><a href="https://twitter.com/PyTorch/status/888500355943641088">Goku Mohandas and Delip Rao</a> switched the code content of their book-in-progress to use PyTorch.</p>
<p>We’ve seen quite a few university machine learning courses being taught with PyTorch as the primary tool, such as Harvard’s <a href="https://harvard-ml-courses.github.io/cs287-web/">CS287</a>. Taking it one step further and democratizing learning, we had three online courses pop up that teach using PyTorch.</p>
<ul>
<li><strong>Fast.ai’s</strong> “Deep Learning for Coders” is a popular online course. In September, Jeremy and Rachel <a href="http://www.fast.ai/2017/09/08/introducing-pytorch-for-fastai/">announced that the next fast.ai courses will be nearly entirely based on PyTorch</a>.</li>
<li>Ritchie Ng, a researcher with ties to NUS Singapore and Tsinghua, released <a href="https://www.udemy.com/practical-deep-learning-with-pytorch/">a Udemy course</a> titled Practical Deep Learning with PyTorch.</li>
<li>Sung Kim from HKUST released an <a href="https://www.youtube.com/playlist?list=PLlMkM4tgfjnJ3I-dbhO9JTw7gNty6o_2m">online course on Youtube</a> that was aimed towards a general audience, titled: “PyTorch Zero to All”.</li>
</ul>
<h2 id="engineering">Engineering</h2>
<p>Over the last year we implemented multiple features, improved performance across the board and fixed lots of bugs. A full list of the work we’ve done is found in our <a href="https://github.com/pytorch/pytorch/releases">release notes</a>.
Here are highlights from our work over the last year:</p>
<h2 id="higher-order-gradients">Higher-order gradients</h2>
<p>With the release of several papers implementing penalties of gradients, and with ongoing research in second-order gradient methods, this was an essential and sought-after feature. In August, we implemented a generalized interface that can take n-th order derivatives and increased the coverage of functions that support higher-order gradients over time; at the moment of writing, almost all ops support this.</p>
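<p>For example, a gradient-penalty term (in the style of WGAN-GP) can be built with <code class="highlighter-rouge">torch.autograd.grad(..., create_graph=True)</code>, which keeps the graph of the derivative so the penalty itself can be backpropagated. A minimal sketch using the Variable API of this era:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
from torch.autograd import Variable, grad

x = Variable(torch.randn(8, 4), requires_grad=True)
y = (x * x).sum()

# create_graph=True records the backward pass itself, so the
# resulting gradient g is differentiable a second time
g, = grad(y, x, create_graph=True)
penalty = ((g.norm(2, dim=1) - 1) ** 2).mean()
penalty.backward()  # second-order gradients accumulate into x.grad
</code></pre></div></div>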
<h2 id="distributed-pytorch">Distributed PyTorch</h2>
<p>In August, we released a small distributed package that followed the highly popular MPI-collective approach. The package has multiple backends such as TCP, MPI, Gloo and NCCL2 to support various types of CPU/GPU collective operations and use-cases, and integrates distributed technologies such as Infiniband and RoCE. Distributed is hard, and we had bugs in the initial iteration. Over subsequent releases, we made the package more stable and improved performance.</p>
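<p>A minimal all-reduce sketch (the Gloo backend and the <code class="highlighter-rouge">env://</code> init method are just one possible setup; it assumes <code class="highlighter-rouge">RANK</code>, <code class="highlighter-rouge">WORLD_SIZE</code> and the master address are set in the environment, with one process launched per rank):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://")

t = torch.ones(4) * dist.get_rank()
dist.all_reduce(t, op=dist.reduce_op.SUM)  # in-place sum across all ranks
print(t)  # every rank now holds the same summed tensor
</code></pre></div></div>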
<h2 id="closer-to-numpy">Closer to NumPy</h2>
<p>One of the biggest demands from users was the NumPy features they were familiar with. Features such as broadcasting and advanced indexing are convenient and save users a lot of verbosity. We implemented these features and started to align our API to be closer to NumPy. Over time, we expect to get closer and closer to NumPy’s API where appropriate.</p>
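<p>For instance, broadcasting and advanced indexing now behave much as they do in NumPy:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch

x = torch.randn(4, 1)
y = torch.randn(1, 5)
z = x + y                        # broadcasts to shape (4, 5), NumPy-style

idx = torch.LongTensor([0, 2])
rows = z[idx]                    # integer-array indexing: rows 0 and 2
pos = z[z &gt; 0]                   # boolean-mask indexing: a 1-D result
</code></pre></div></div>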
<h2 id="sparse-tensors">Sparse Tensors</h2>
<p>In March, we released a small package supporting sparse Tensors, and in May we released CUDA support for the sparse package. The package is small and limited in functionality, and is used for implementing sparse embeddings and commonly used sparse paradigms in deep learning. This package is still small in scope and there’s demand to expand it — if you are interested in working on expanding the sparse package, reach out to us on our <a href="https://discuss.pytorch.org/">Discussion Boards</a>.</p>
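<p>A small sketch of the COO-style construction the package exposes (the indices and values here are made up for illustration):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch

# a 2 x 3 matrix with two non-zero entries, in coordinate (COO) format
i = torch.LongTensor([[0, 1],    # row indices of the non-zeros
                      [2, 0]])   # column indices of the non-zeros
v = torch.FloatTensor([3.0, 4.0])
s = torch.sparse.FloatTensor(i, v, torch.Size([2, 3]))

d = s.to_dense()  # [[0., 0., 3.], [4., 0., 0.]]
</code></pre></div></div>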
<h2 id="performance">Performance</h2>
<p>Performance is always an ongoing battle, especially for PyTorch, which is a dynamic framework that wants to maximize flexibility. Over the last year, we’ve improved performance across the board, from our core Tensor library to the neural network operators, writing faster micro-optimized code throughout.</p>
<ul>
<li>We’ve added specialized AVX and AVX2 intrinsics for Tensor operations</li>
<li>Wrote faster GPU kernels for frequent workloads like concatenation and Softmax (among many other things)</li>
<li>Rewrote the code for several neural network operators (too many to list), most notably nn.Embedding and group convolutions.</li>
</ul>
<p><strong>Reducing framework overhead by 10x across the board</strong></p>
<p>Since PyTorch is a dynamic graph framework, we create a new graph on the fly at every iteration of a training loop. Hence, the framework overhead has to be low, or the workload has to be large enough that the framework overhead is hidden. In August, the authors of DyNet (Graham Neubig and team) showcased that DyNet is much faster than PyTorch on small NLP models. This was an interesting challenge; we didn’t realize that models of those sizes were being trained. In a multi-month (and ongoing) effort, we embarked upon a significant rewrite of PyTorch internals that reduced the framework overhead from more than 10 microseconds per operator execution to as little as 1 microsecond.</p>
<p><strong>ATen</strong></p>
<p>As we embarked upon a redesign of the PyTorch internals, we built the <a href="https://github.com/pytorch/pytorch/tree/master/aten">ATen C++11</a> library that now powers all of the PyTorch backend. ATen has an API that mirrors PyTorch’s Python API, which makes it a convenient C++ library for Tensor computation. ATen can be built and used independently of PyTorch.</p>
<h2 id="exporting-models-to-production--onnx-support-and-the-jit-compiler">Exporting models to production — ONNX Support and the JIT compiler</h2>
<p>One of the common requests we’ve received was to export PyTorch models to other frameworks: users engaged in a rapid research cycle in PyTorch, and when they were done, they wanted to ship the result to larger projects with C++-only requirements.</p>
<p>With this in mind, we built a tracer for PyTorch, which can export PyTorch models into an intermediate representation.
The resulting trace can either be used to run the current PyTorch model more efficiently (by running optimization passes on it), or be converted to the <a href="http://onnx.ai/">ONNX</a> format and shipped to other frameworks such as Caffe2, MXNet and TensorFlow, or handed directly to hardware-accelerated libraries like CoreML or TensorRT. Over the next year, you will hear more about the JIT compiler for performance improvements.</p>
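<p>As a minimal sketch (the model and file name are placeholders, and <code class="highlighter-rouge">torchvision</code> is assumed to be installed), exporting a traced model to ONNX looks roughly like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
import torchvision

# Tracing runs the model once on example input and records the
# executed operators into a graph that the exporter serializes.
model = torchvision.models.alexnet(weights=None)
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, 'alexnet.onnx')
</code></pre></div></div>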
<h2 id="users-being-funny-">Users being funny :)</h2>
<p>Our users expressed their support in funny ways that made us laugh. Thank you for that :)</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">I&#39;ve been using PyTorch a few months now and I&#39;ve never felt better. I have more energy. My skin is clearer. My eye sight has improved.</p>&mdash; Andrej Karpathy (@karpathy) <a href="https://twitter.com/karpathy/status/868178954032513024?ref_src=twsrc%5Etfw">May 26, 2017</a></blockquote>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Talk to your doctor to find out if PyTorch is right for you.</p>&mdash; Sean Robertson (@sprobertson) <a href="https://twitter.com/sprobertson/status/868180795000750080?ref_src=twsrc%5Etfw">May 26, 2017</a></blockquote>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">PyTorch gave me so much life that my skin got cleared, my grades are up, my bills are paid and my crops are watered.</p>&mdash; Adam Will (@adam_will_do_it) <a href="https://twitter.com/adam_will_do_it/status/868179679483764736?ref_src=twsrc%5Etfw">May 26, 2017</a></blockquote>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">So have I! But my hair is also shiner and I&#39;ve lost weight. <a href="https://twitter.com/PyTorch?ref_src=twsrc%5Etfw">@PyTorch</a> for the win. <a href="https://t.co/qgU4oIOB4K">https://t.co/qgU4oIOB4K</a></p>&mdash; Mariya (@thinkmariya) <a href="https://twitter.com/thinkmariya/status/868181991212044288?ref_src=twsrc%5Etfw">May 26, 2017</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></content><author><name>The PyTorch Team</name></author><summary type="html">Today marks 1 year since PyTorch was released publicly. It’s been a wild ride — our quest to build a flexible deep learning research platform. Over the last year, we’ve seen an amazing community of people using, contributing to and evangelizing PyTorch — thank you for the love.</summary></entry><entry><title type="html">PyTorch Internals Part II - The Build System</title><link href="https://pytorch.org/blog/a-tour-of-pytorch-internals-2/" rel="alternate" type="text/html" title="PyTorch Internals Part II - The Build System" /><published>2017-06-27T10:00:00-07:00</published><updated>2017-06-27T10:00:00-07:00</updated><id>https://pytorch.org/blog/a-tour-of-pytorch-internals-2</id><content type="html" xml:base="https://pytorch.org/blog/a-tour-of-pytorch-internals-2/"><p>In the first <a href="/blog/a-tour-of-pytorch-internals-1/">post</a> I explained how we generate a <code class="highlighter-rouge">torch.Tensor</code> object that you can use in your Python interpreter. Next, I will explore the build system for PyTorch. The PyTorch codebase has a variety of components:</p>
<ul>
<li>The core Torch libraries: TH, THC, THNN, THCUNN</li>
<li>Vendor libraries: cuDNN, NCCL</li>
<li>Python Extension libraries</li>
<li>Additional third-party libraries: NumPy, MKL, LAPACK</li>
</ul>
<p>How does a simple invocation of <code class="highlighter-rouge">python setup.py install</code> do the work that allows you to call <code class="highlighter-rouge">import torch</code> and use the PyTorch library in your code?</p>
<p>The first part of this document will explain the build process from an end-user point of view: how we take the components above and build the library. The second part is aimed at PyTorch developers: it documents ways to improve your iteration speed by building only the subset of the code that you are working on.</p>
<h3 id="setuptools-and-pytorchs-setup--function">Setuptools and PyTorch’s setup() function</h3>
<p>Python uses <a href="https://setuptools.readthedocs.io/en/latest/index.html">Setuptools</a> to build the library. Setuptools is an extension to the original distutils system from the core Python library. The heart of a Setuptools-based build is the <code class="highlighter-rouge">setup.py</code> file, which contains all the information needed to build the project. The most important piece is the <code class="highlighter-rouge">setup()</code> function, which serves as the main entry point. Let’s take a look at the one in PyTorch:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">setup</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">"torch"</span><span class="p">,</span> <span class="n">version</span><span class="o">=</span><span class="n">version</span><span class="p">,</span>
<span class="n">description</span><span class="o">=</span><span class="s">"Tensors and Dynamic neural networks in Python with strong GPU acceleration"</span><span class="p">,</span>
<span class="n">ext_modules</span><span class="o">=</span><span class="n">extensions</span><span class="p">,</span>
<span class="n">cmdclass</span><span class="o">=</span><span class="p">{</span>
<span class="s">'build'</span><span class="p">:</span> <span class="n">build</span><span class="p">,</span>
<span class="s">'build_py'</span><span class="p">:</span> <span class="n">build_py</span><span class="p">,</span>
<span class="s">'build_ext'</span><span class="p">:</span> <span class="n">build_ext</span><span class="p">,</span>
<span class="s">'build_deps'</span><span class="p">:</span> <span class="n">build_deps</span><span class="p">,</span>
<span class="s">'build_module'</span><span class="p">:</span> <span class="n">build_module</span><span class="p">,</span>
<span class="s">'develop'</span><span class="p">:</span> <span class="n">develop</span><span class="p">,</span>
<span class="s">'install'</span><span class="p">:</span> <span class="n">install</span><span class="p">,</span>
<span class="s">'clean'</span><span class="p">:</span> <span class="n">clean</span><span class="p">,</span>
<span class="p">},</span>
<span class="n">packages</span><span class="o">=</span><span class="n">packages</span><span class="p">,</span>
<span class="n">package_data</span><span class="o">=</span><span class="p">{</span><span class="s">'torch'</span><span class="p">:</span> <span class="p">[</span>
<span class="s">'lib/*.so*'</span><span class="p">,</span> <span class="s">'lib/*.dylib*'</span><span class="p">,</span>
<span class="s">'lib/torch_shm_manager'</span><span class="p">,</span>
<span class="s">'lib/*.h'</span><span class="p">,</span>
<span class="s">'lib/include/TH/*.h'</span><span class="p">,</span> <span class="s">'lib/include/TH/generic/*.h'</span><span class="p">,</span>
<span class="s">'lib/include/THC/*.h'</span><span class="p">,</span> <span class="s">'lib/include/THC/generic/*.h'</span><span class="p">]},</span>
<span class="n">install_requires</span><span class="o">=</span><span class="p">[</span><span class="s">'pyyaml'</span><span class="p">],</span>
<span class="p">)</span>
</code></pre></div></div>
<p>The function is composed entirely of keyword arguments, which serve two purposes:</p>
<ul>
<li>Metadata (e.g. name, description, version)</li>
<li>The contents of the package</li>
</ul>