[TEST] #6

Open
wants to merge 712 commits into base: main
712 commits
190f06d
[pipelining] Lower _configure_data_parallel_mode to stage (#127946)
kwen2501 Jun 6, 2024
00c6ca4
[compiled autograd][cudagraphs] Inputs runtime wrapper to move cpu sc…
xmfan Jun 6, 2024
70724bd
Bugfix for nondeterminstic torch_key (#128111)
jansel Jun 6, 2024
01601eb
Retire torch.distributed.pipeline (#127354)
kwen2501 Jun 7, 2024
0c16800
[pipelining] include lifted constants in input_to_state (#128173)
pianpwk Jun 7, 2024
7efaeb1
[AOTI] docs: add suggestion to turn on freezing on CPU (#128010)
chunyuan-w Jun 7, 2024
5f81265
[Traceable FSDP2] Return early from _register_post_backward_hook when…
yf225 Jun 6, 2024
543a870
[pipelining] Rename ManualPipelineStage -> PipelineStage (#128157)
H-Huang Jun 7, 2024
3f9798a
add docstring to masked_fill, expand, select, unsqueeze, cat fns (#12…
zabboud Jun 7, 2024
771be55
Documenting `torch.onnx.operator.shape_as_tensor` (#128051)
GdoongMathew Jun 7, 2024
6e75024
Run TestAOTAutograd with dynamo (#128047)
jamesjwu Jun 6, 2024
224b433
Revert "Make ValueRange repr less chatty by default (#128043)"
pytorchmergebot Jun 7, 2024
3090667
[pipelining] pipeline() taking microbatch as example input (#128163)
kwen2501 Jun 7, 2024
a1b664a
Add default values to PyTorchMemEffAttention::AttentionKernel::Params…
cyyever Jun 7, 2024
23c156c
Revert "[inductor] simplify indexing (#127661)"
pytorchmergebot Jun 7, 2024
ac51f78
Revert "Complete revamp of float/promotion sympy handling (#126905)"
pytorchmergebot Jun 7, 2024
852b7b4
[inductor] Enable subprocess-based parallel compile as the default (#…
masnesral Jun 6, 2024
8d16a73
Manipulate triton_hash_with_backend so that it doesn't contain any ke…
masnesral Jun 6, 2024
c219fa5
[3/N] Remove unused functions (#128179)
cyyever Jun 7, 2024
1289526
Revert "Added memory budget to partitioner (#126320)"
pytorchmergebot Jun 7, 2024
fc6e3ff
[ROCm] Update triton pin to fix libtanh issue (#125396)
pragupta Jun 7, 2024
d9696ea
[AOTInductor] [Tooling] Update NaN and INF Checker for AOTInductor (#…
muchulee8 Jun 7, 2024
b9b89ed
[pipelining] fix LoopedBFS (#127796)
H-Huang Jun 7, 2024
6c824cd
[BE][c10d] fix use of TORCH_ERROR in TCPStore libuv backend (#127956)
XilunWu Jun 4, 2024
85758fa
[c10d][TCPStore] make TCPStore server use libuv by default (#127957)
XilunWu Jun 7, 2024
754e6d4
Make jobs with LF runners still pass lint (#128175)
ZainRizvi Jun 7, 2024
3aa623d
Fix assume_constant_result for UnspecializedNNModuleVariable methods …
BowenBao Jun 7, 2024
b741819
Fix 'get_attr' call in dynamo 'run_node' (#127696)
BowenBao Jun 7, 2024
19b31d8
Fix 'get_real_value' on placeholder nodes (#127698)
BowenBao Jun 7, 2024
662a78f
[dynamo] Inline the getattr of fx graph and proxy graph (#128172)
anijain2305 Jun 6, 2024
0c7f435
[inductor] simplify indexing (#127661)
shunting314 Jun 7, 2024
82d7a36
Added torchao nightly workflow (#128152)
xuzhao9 Jun 7, 2024
0a6df4f
delete inductor config.trace.compile_profile (#127143)
dshi7 Jun 7, 2024
8ca4cef
[C10D] Ensure gil is not released when calling toPyBytes (#128212)
wconstab Jun 7, 2024
cafbcb6
[BE]: Update ruff to 0.4.8 (#128214)
Skylion007 Jun 7, 2024
dcb63fc
[pipelining] Remove num_microbatches from stage (#128201)
H-Huang Jun 7, 2024
e647ea5
[pipelining] redirect README to document (#128205)
kwen2501 Jun 7, 2024
fdf1666
Change lerp decomp to use aten.as_strided_copy instead of prims.copy_…
angelayi Jun 7, 2024
8892dda
[TD] Test removal on sm86 (#127131)
clee2000 Jun 7, 2024
3a620a0
bug fix of dynamo_timed in cprofile (#128203)
dshi7 Jun 7, 2024
ba81c3c
[inductor] add cpp builder code. (take 2) (#125849)
xuhancn Jun 7, 2024
5b36241
update test_issue175 to handle inline_inbuilt_nn_modules (#128026)
laithsakka Jun 7, 2024
11f2d8e
Move inductor cuda 124 jobs to a separate workflow that is not trigge…
clee2000 Jun 7, 2024
09cccbc
[RFC] add per-collective timeout value in flight recorder (#128190)
c-p-i-o Jun 7, 2024
bef5861
[pipelining] pipelining.rst updates (#128228)
H-Huang Jun 7, 2024
39dd474
[inductor][dynamo-inline-nn-modules] Fix test with inlining flag (#12…
anijain2305 Jun 7, 2024
ef2b5ed
[4/N] Remove unused functions (#128193)
cyyever Jun 8, 2024
6478150
Inductor: Allow small sizes of m for mixed mm autotuning (#127663)
AlnisM Jun 8, 2024
5ef0810
[MPS] Include MPSGraphVenturaOps.h for complex types on macOS 12 (#12…
qqaatw Jun 7, 2024
ad96f99
[pipelining] Add pipe.build_stage() (#128240)
kwen2501 Jun 8, 2024
921aa19
[pipelining] Move modify_graph_op_device to _IR.py (#128241)
kwen2501 Jun 8, 2024
fe74bbd
init sigmoid comments (#127983)
Jun 8, 2024
f9508b4
[pipelining] Update Pipelining Docs (#128236)
wconstab Jun 7, 2024
0ef5229
Revert "Change lerp decomp to use aten.as_strided_copy instead of pri…
pytorchmergebot Jun 8, 2024
6220602
[torchbind] support query schema of methods (#128267)
ydwu4 Jun 7, 2024
6e5c2a1
[inductor] Add missing files to torch_key (#128230)
jansel Jun 7, 2024
1d84c7e
[DeviceMesh] Update get_group and add get_all_groups (#128097)
wz337 Jun 8, 2024
8a45cf4
[AOTI] align data_size of the constants (#127610)
chunyuan-w Jun 7, 2024
0e3fe69
[pipelining] Restore a stage constructor for tracer path (#128273)
kwen2501 Jun 8, 2024
2e42671
[pipelining] Rename to stage.py and schedules.py (#128278)
kwen2501 Jun 8, 2024
613c7d2
[pipelining] Format doc (#128279)
kwen2501 Jun 8, 2024
c446851
[fsdp2] update foreach_reduce accumulate_grad (#128117)
wanchaol Jun 7, 2024
ffc202a
Added remove_noop_ops to joint_graph_passes (#124451)
Chillee Jun 7, 2024
310f809
Added memory budget to partitioner (#126320)
Chillee Jun 7, 2024
cbb7e30
View specialization (#127641)
shazqadeer Jun 8, 2024
8a0bc8c
[fsdp2] simplify fsdp_param logic with DTensorSpec (#128242)
wanchaol Jun 8, 2024
94165db
Revert "[dynamo] Inline the getattr of fx graph and proxy graph (#128…
pytorchmergebot Jun 8, 2024
6e13c7e
Revert "[dynamo] Support if cond on UnspecializedNNModuleVariable and…
pytorchmergebot Jun 8, 2024
44371bd
Revert "[dynamo][nn-modules] Trace through nn.Module dunder methods f…
pytorchmergebot Jun 8, 2024
0e6c204
[pipelining] Friendly error message when not traceable (#128276)
kwen2501 Jun 8, 2024
73d6ec2
Increase verbosity of FX graph dumps (#128042)
ezyang Jun 8, 2024
695502c
[3/N] Change static functions in headers to inline (#128194)
cyyever Jun 8, 2024
917387f
[AOTI] fix a constant tensor device move issue (#128265)
desertfire Jun 8, 2024
348b181
Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_u…
XuehaiPan Jun 8, 2024
57a24c4
Revert "[RFC] add per-collective timeout value in flight recorder (#1…
pytorchmergebot Jun 8, 2024
02a901f
Revert "[RFC] Provide optional switches to _dump_nccl_trace (#127651)"
pytorchmergebot Jun 8, 2024
2369c71
[DSD][BE] Cleanup unused variables and rename variables to avoid expo…
fegin Jun 7, 2024
dcfa770
Flip default value for mypy disallow_untyped_defs [1/11] (#127838)
aorenste Jun 8, 2024
ea614fb
Flip default value for mypy disallow_untyped_defs [2/11] (#127839)
aorenste Jun 8, 2024
afe15d2
Flip default value for mypy disallow_untyped_defs [3/11] (#127840)
aorenste Jun 8, 2024
62bcdc0
Flip default value for mypy disallow_untyped_defs [4/11] (#127841)
aorenste Jun 8, 2024
3a0d088
Flip default value for mypy disallow_untyped_defs [5/11] (#127842)
aorenste Jun 8, 2024
7c12cc7
Flip default value for mypy disallow_untyped_defs [6/11] (#127843)
aorenste Jun 8, 2024
038b927
Flip default value for mypy disallow_untyped_defs [7/11] (#127844)
aorenste Jun 8, 2024
27f9d3b
Flip default value for mypy disallow_untyped_defs [8/11] (#127845)
aorenste Jun 8, 2024
8db9dfa
Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
aorenste Jun 8, 2024
5753628
Flip default value for mypy disallow_untyped_defs [10/11] (#127847)
aorenste Jun 8, 2024
33972df
[easy][inline-inbuilt-nn-modules] Fix expected graph for control flow…
anijain2305 Jun 8, 2024
3494f3f
[dynamo] Skip inlining builtin nn modules for torch.compile inside co…
anijain2305 Jun 8, 2024
0dd55ee
Fix bug in _update_process_group API (#128262)
pritamdamania87 Jun 8, 2024
aee154e
[Traceable FSDP2] Make FSDPParam._unsharded_param creation traceable …
yf225 Jun 8, 2024
6e7a234
[easy] Run autograd if any mutations on inputs that require grad (#12…
jamesjwu Jun 8, 2024
d34075e
Add Efficient Attention support on ROCM (#124885)
xinyazhang Jun 8, 2024
2c2cf1d
[dtensor][experiment] experimenting with displaying model parameters …
sinhaanshul Jun 7, 2024
f681e36
[dtensor][experiment] experimenting with displaying distributed model…
sinhaanshul Jun 7, 2024
7bfd1db
[4/N] Change static functions in headers to inline (#128286)
cyyever Jun 9, 2024
31c3fa6
[audio hash update] update the pinned audio hash (#128178)
pytorchupdatebot Jun 9, 2024
3964a3e
Complete revamp of float/promotion sympy handling (#126905)
ezyang Jun 9, 2024
4c97193
[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA …
eqy Jun 9, 2024
75b0720
Revert "Use hidden visibility in OBJECTCXX files (#127265)"
pytorchmergebot Jun 9, 2024
0bf2fe5
[RFC] Provide optional switches to _dump_nccl_trace (#127651)
c-p-i-o Jun 6, 2024
c7e2c9c
[c10d][doc] add a doc page for NCCL ENVs (#128235)
shuqiangzhang Jun 8, 2024
5e7377e
[Dynamo][TVM] Make the `opt_level` parameter adjustable (#127876)
mshr-h Jun 9, 2024
55b2a0a
[AOTAutograd] Use _set_grad_enabled instead of no_grad (#128183)
peterbell10 Jun 7, 2024
253fa9c
[AOTAutograd] Remove runtime import from view replay function (#128184)
peterbell10 Jun 7, 2024
cd2ad29
[inductor] Reduce binding overhead of _reinterpret_tensor (#128185)
peterbell10 Jun 7, 2024
d3817d8
Don't create python tuple when _maybe_handle_torch_function is called…
peterbell10 Jun 9, 2024
26f6a87
[5/N] Remove unused functions (#127185)
cyyever Jun 10, 2024
df43d58
fix miss isa bool check (#128274)
xuhancn Jun 10, 2024
b66e3f0
Set simdlen based on ATEN_CPU_CAPABILITY (#123514)
CaoE Jun 6, 2024
04da6ae
Add OpInfo entry for alias_copy (#127232) (#128142)
rec Jun 9, 2024
c993f1b
Fix edge cases for gather in inductor (#126893)
isuruf Jun 5, 2024
3b73f5d
Revert "Add OpInfo entry for alias_copy (#127232) (#128142)"
pytorchmergebot Jun 10, 2024
d22287d
Revert "Fix 'get_real_value' on placeholder nodes (#127698)"
pytorchmergebot Jun 10, 2024
ca561d6
Revert "Fix 'get_attr' call in dynamo 'run_node' (#127696)"
pytorchmergebot Jun 10, 2024
7b9c5e0
Turn on GraphTransformObserver for inductor (#127962)
shengfukevin Jun 10, 2024
8e482e9
Add some guard to size oblivious has_internal_overlap (#128328)
ezyang Jun 10, 2024
ab3a0b1
[RFC] add per-collective timeout value in flight recorder (#128190)
c-p-i-o Jun 7, 2024
4694830
[c10d] integrate PMI NCCL initialization to NCCL-PG (#128243)
shengbao-zheng Jun 10, 2024
08d038f
[PT2] Fix a typo and lint problem (#128258)
mengluy0125 Jun 10, 2024
8394148
Add docstring for the torch.distributed.elastic.utils.distributed.get…
afrittoli Jun 10, 2024
136bdb9
Update Kineto submodule with fix to test_basic_chrome_trace (#128333)
aaronenyeshi Jun 10, 2024
fa8ec8e
[dynamo] handle hashable exceptions in trace_rules lookup (#128078)
masnesral Jun 6, 2024
093a4ff
[export] FIx unflattener for preserving modules containing unused inp…
angelayi Jun 10, 2024
db2fa7b
Revert "[export] FIx unflattener for preserving modules containing un…
pytorchmergebot Jun 10, 2024
9cab598
Introduce int_oo (#127693)
ezyang Jun 9, 2024
5564655
[EZ] Fix typos in SECURITY.md (#128340)
malfet Jun 10, 2024
946f554
Flip default value for mypy disallow_untyped_defs [10+1/11] (#128293)
aorenste Jun 8, 2024
38e0a04
[AMD] Default to hipblaslt in gemm (#127944)
xw285cornell Jun 10, 2024
90bb510
Revert "Deprecate `torch._utils.is_compiling()` and `torch._dynamo.ex…
pytorchmergebot Jun 10, 2024
4460e48
Disable jacrev/jacfwd/hessian if compiling with dynamo (#128255)
guilhermeleobas Jun 10, 2024
b459713
[aota] compiled forward outputs requires_grad alignment with eager (#…
IvanKobzarev Jun 10, 2024
3a2d075
enable test_ParameterList with dynamo if nn module inlining enabled o…
laithsakka Jun 9, 2024
6630dcd
Add docstring for the torch.serialization.default_restore_location fu…
afrittoli Jun 10, 2024
58083ff
Improve unbacked reasoning involving has internal overlap (#128332)
ezyang Jun 10, 2024
a2d4fea
[easy] Move state_dict hooks tests to test_module_hooks and decorate …
mikaylagawarecki Jun 10, 2024
c38b338
Make nn.Module state_dict load_state_dict pre-hook and state_dict pos…
mikaylagawarecki Jun 10, 2024
583a56d
DOC: add docstring to construct_and_record_rdzv_event() (#128189)
loganthomas Jun 10, 2024
2176ef7
[compiled autograd] support .backward(inputs=) (#128252)
xmfan Jun 7, 2024
4bbadee
Revert "Set simdlen based on ATEN_CPU_CAPABILITY (#123514)"
pytorchmergebot Jun 10, 2024
a287ff7
Use init_torchbind_implementations in inductor torchbind tests. (#128…
ydwu4 Jun 10, 2024
05711ee
[dynamo][inlining inbuilt modules] Ensure BC for nn_module_stack (#12…
anijain2305 Jun 10, 2024
b2d6023
[RELAND][dynamo][nn-modules] Trace through nn.Module dunder methods f…
anijain2305 Jun 10, 2024
739aa22
[Fix] Parameter un/lifting issues in the TorchScript to ExportedProgr…
jiashenC Jun 10, 2024
2126ae1
Remove caffe2/perfkernels files (#128186)
cyyever Jun 10, 2024
3087595
[1/N] Remove inclusion of c10/util/string_utils.h (#128300)
cyyever Jun 10, 2024
f843ccb
[MTIA] Add set_device support (#128040)
egienvalue Jun 10, 2024
99f5a85
[Clang Tidy] Fix misc-header-include-cycle errors in clang-tidy and i…
cyyever Jun 10, 2024
734e8f6
[inductor] enable fx graph cache on torchbench (#128239)
masnesral Jun 7, 2024
d43745b
Merge branch 'main' into support_set_module_name
yiliu30 Jun 11, 2024
3b555ba
Add docstring for torch.utils.data.datapipes.decoder.basicandlers (#1…
arunppsg Jun 11, 2024
841d871
Make sure #126704 is BC for torch.save-ed `nn.Module` (#128344)
mikaylagawarecki Jun 10, 2024
d1d9bc7
init add comment (#128083)
Jun 11, 2024
793df7b
Prevent expansion of cat indexing to avoid int64 intermediate (#127815)
eellison Jun 10, 2024
e4bd0ad
[6/N] Remove unused functions (#128309)
cyyever Jun 11, 2024
4077cdd
[pipelining][doc] Update arg list of pipeline API (#128361)
kwen2501 Jun 10, 2024
665e568
[inductor][inlining nn module] Skip batchnorm version check test for …
anijain2305 Jun 10, 2024
ca45649
[easy][dynamo][inline work] Fix test with inlining inbuilt nn modules…
anijain2305 Jun 10, 2024
7afffdf
[CI] Comment hf_T5_generate, hf_GPT2 and timm_efficientnet in inducto…
zxd1997066 Jun 11, 2024
16e67be
Also preserve unbacked SymInts when partitioning as backward inputs (…
ezyang Jun 10, 2024
cba195c
Support aten operations with out tensor (#124926)
EikanWang Jun 7, 2024
fe39c07
[pipelining][doc] Remove duplicated words (#128368)
kwen2501 Jun 10, 2024
fa88f39
Revert "[inductor] enable fx graph cache on torchbench (#128239)"
pytorchmergebot Jun 11, 2024
5b5d269
Speed up fx graph iteration by implementing it in C++ (#128288)
oulgen Jun 10, 2024
24e7f29
Lowering for avg_pool_3d_backward (Fixes:#127101) (#127722)
Lourencom Jun 11, 2024
a32157c
Mark params static if inlining modules and freezing (#128355)
mlazos Jun 11, 2024
402b289
Properly register parameter for binary folding test (#128356)
mlazos Jun 11, 2024
f2d7f23
[dynamo][yolov3] Track UnspecializedNNModuleVariable for mutation (#1…
anijain2305 Jun 10, 2024
a206dcc
fb_memcache: Move to fbcode from thirdparty (#128174)
c00w Jun 11, 2024
207c224
[inductor] Fix lowering full with SymBool value (#128213)
peterbell10 Jun 10, 2024
648625b
Make TraceUtils.h to be device-agnostic (#126969)
FFFrog Jun 11, 2024
fc77fdc
[guard_size_oblivious] Add gso ExpandUtils:_sym_to (#128224)
IvanKobzarev Jun 10, 2024
55901fb
[fx] Preserve Fx graph node order in partitioner across runs (#115621)
kareemshaik80 Jun 11, 2024
9a38cae
[AOTI] Switch to use shim v2 (#127674)
hl475 Jun 11, 2024
053930e
[MPS][BE] Remove code duplication (#128373)
malfet Jun 11, 2024
c13e03c
Flip default value for mypy disallow_untyped_defs [10+2/11] (#128374)
aorenste Jun 11, 2024
f8c4599
[MPS] Make erfinv compilable for bfloat16 (#128375)
malfet Jun 11, 2024
2908105
[Static Runtime] Fix & run gen_static_runtime_ops (#128299)
davidberard98 Jun 10, 2024
a838e90
Add Intel Gaudi device/HPU to auto load in instantiate_device_type_t…
ankurneog Jun 11, 2024
4345d98
[dynamo] Fix for #127696 (#128358)
angelayi Jun 11, 2024
491c4a5
Revert "Make sure #126704 is BC for torch.save-ed `nn.Module` (#128344)"
pytorchmergebot Jun 11, 2024
1d233b8
Revert "Make nn.Module state_dict load_state_dict pre-hook and state_…
pytorchmergebot Jun 11, 2024
8a09940
[inductor] fix compile time regression by caching get_gpu_type (#128363)
wanchaol Jun 10, 2024
cac7a22
[cuDNN][Quantization] Don't print when plan finalization fails in cuD…
eqy Jun 11, 2024
205410c
add xpu to torch.tensors (#127280)
jingxu10 Jun 11, 2024
984b1a8
Fix 'get_attr' call in dynamo 'run_node' (#127696)
BowenBao Jun 7, 2024
61f922c
Fix 'get_real_value' on placeholder nodes (#127698)
BowenBao Jun 7, 2024
3e09123
Enable UFMT on test_nestedtensor.py (#128359)
YuqingJ Jun 11, 2024
45dccfd
[cuDNN][SDPA] Support different key, value dimension in cuDNN SDPA (#…
eqy Jun 11, 2024
adb6991
Revert "[RELAND][dynamo][nn-modules] Trace through nn.Module dunder m…
pytorchmergebot Jun 11, 2024
70a1e85
[Traceable FSDP2] Use custom ops for AllGather copy-in / copy-out and…
yf225 Jun 11, 2024
8c1247c
[BE] Fixed CPU autocast warning (#127774)
awgu Jun 4, 2024
a55d0d9
Fix side effect pruning (#128028)
zou3519 Jun 7, 2024
5fcb5f0
init reshape_from_tensor_shape comment (#128171)
Jun 11, 2024
1dd2431
[Test] Add test for only_active flag (#128191)
c-p-i-o Jun 11, 2024
eb567b1
Pass params to dump_nccl_trace_pickle (#128307)
c-p-i-o Jun 11, 2024
b79d056
[export] FIx unflattener for preserving modules containing unused inp…
angelayi Jun 11, 2024
4471731
Add docstring for the torch.fx.operator_schemas.create_type_hint func…
afrittoli Jun 11, 2024
94fea82
init sub comment (#128082)
Jun 11, 2024
c9c1fed
Revert "Flip default value for mypy disallow_untyped_defs [10+2/11] (…
pytorchmergebot Jun 11, 2024
5d8c7f3
Revert "Introduce int_oo (#127693)"
pytorchmergebot Jun 11, 2024
786c24a
[inductor] Always realize sigmoid for CPU (#128339)
desertfire Jun 11, 2024
6af4c6a
Migrate test to internal base class, fixes (#128367)
kurman Jun 12, 2024
fb013ec
Remove unused private List::ptr_to_first_element (#128405)
cyyever Jun 12, 2024
219da29
[7/N] Remove unused functions (#128407)
cyyever Jun 12, 2024
9538bf4
[2/N] Remove inclusion of c10/util/string_utils.h (#128372)
cyyever Jun 12, 2024
bb2a995
Back out "[Dynamo] Treat integers stored on nn.Modules as dynamic (#1…
doctorweichen Jun 12, 2024
3d55d84
[Fix] Check tensor dtype before using torch.allclose in _trace log (#…
jiashenC Jun 12, 2024
7f6daf2
[inductor] parallel compile: set LD_LIBRARY_PATH for sub-processes in…
masnesral Jun 12, 2024
85eeb90
[dynamo] Fix graph breaks related to HF ModelOutput (#127780)
williamwen42 Jun 11, 2024
3ddec71
Revert "[cuDNN][Quantization] Don't print when plan finalization fail…
pytorchmergebot Jun 12, 2024
7c20583
Improve convert fp32 to fp16 fx pass (#127829)
trieuat Jun 12, 2024
86b5df3
Documenting the torch.fx.annotate.annotate function (#128337)
kiszk Jun 12, 2024
8cf302d
[5/N] Change static functions in headers to inline (#128406)
cyyever Jun 12, 2024
02e7519
DOC: strip inaccurate either float32 or float64 statement from set_de…
loganthomas Jun 12, 2024
c0b87af
[RELAND2][dynamo][nn-modules] Trace through nn.Module dunder methods …
anijain2305 Jun 11, 2024
77a0ca6
Add threadfence to 2-stage reduction for correct writes visibility (#…
ngimel Jun 12, 2024
089f9a1
[tp] refactor and fix PrepareModuleInput for DTensor inputs (#128431)
wanchaol Jun 11, 2024
6231125
Add 1 test case for Convtranspose1D in op microbenchmark (#127216)
DiweiSun Jun 12, 2024
dcc0093
[BE][Easy] export explicitly imported public submodules (#127703)
XuehaiPan Jun 10, 2024
a421699
Revert "[tp] refactor and fix PrepareModuleInput for DTensor inputs (…
pytorchmergebot Jun 12, 2024
8b3daf1
Add FloatTrueDiv and ToFloat to SYMPY_INTERP (#128418)
masnesral Jun 11, 2024
0b331fd
[CUDA] Abate `SoftMax.cu` compiler warning spam (#128468)
eqy Jun 12, 2024
04037f3
[BE] sort imports in `torch/__init__.py` (#127708)
XuehaiPan Jun 10, 2024
1602c7d
[dynamo] Enable some inlining inbuilt nn module tests (#128440)
anijain2305 Jun 11, 2024
ebb00a9
[dynamo] Skip freezing expect failure for inlining inbuilt nn modules…
anijain2305 Jun 12, 2024
1edcb31
[RELAND][inductor][cpp] bf16/fp16 gemm template computed with fp32 (#…
jgong5 Jun 12, 2024
2386045
Add OpInfo entry for alias_copy (#127232) (#128142)
rec Jun 11, 2024
26433b8
[BE][Easy] sort `__all__` in `torch/__init__.py` (#127709)
XuehaiPan Jun 10, 2024
46a35a1
[BE] enable UFMT for `torch/__init__.py` (#127710)
XuehaiPan Jun 10, 2024
2e065f2
[Quant][Inductor] Bug fix: mutation nodes not handled correctly for Q…
Xia-Weiwen Jun 12, 2024
abc3eec
First version of AOTAutogradCache (#126791)
jamesjwu Jun 11, 2024
71f4915
Revert "First version of AOTAutogradCache (#126791)"
pytorchmergebot Jun 12, 2024
5ef70fa
Revert "Make torch_geometric models compatible with export (#123403)"…
chunyuan-w Jun 12, 2024
15ab636
Revert "Fix side effect pruning (#128028)"
pytorchmergebot Jun 12, 2024
3c971d2
Flip default value for mypy disallow_untyped_defs [final] (#127836)
aorenste Jun 12, 2024
b19c231
[ROCm] TunableOp for gemm_and_bias (#128143)
jeffdaily Jun 12, 2024
8df56af
Add support in Python API for the recommended max working set size. (…
kulinseth Jun 12, 2024
f2dcbe8
Revert "Prevent expansion of cat indexing to avoid int64 intermediate…
pytorchmergebot Jun 12, 2024
9e39c62
correct avx512_vnni isa name. (#128318)
xuhancn Jun 12, 2024
c5172b8
Revert "[AOTI] Switch to use shim v2 (#127674)"
pytorchmergebot Jun 12, 2024
81e4e12
Revert "Support aten operations with out tensor (#124926)"
pytorchmergebot Jun 12, 2024
f89574f
Revert "Pass params to dump_nccl_trace_pickle (#128307)"
pytorchmergebot Jun 12, 2024
5001f41
Revert "Make TraceUtils.h to be device-agnostic (#126969)"
pytorchmergebot Jun 12, 2024
48ef7e7
Merge branch 'main' into support_set_module_name
yiliu30 Jun 12, 2024
3d8dab9
correct the UT's doc
yiliu30 Jun 13, 2024
73b2bc8
fix lint
yiliu30 Jun 13, 2024
688c0b0
fix docstring
yiliu30 Jun 13, 2024
621ff11
update the cur mode
yiliu30 Jun 13, 2024
5378b56
fix litn
yiliu30 Jun 14, 2024
5 changes: 5 additions & 0 deletions .ci/docker/aotriton_version.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
0.6b
manylinux_2_17
rocm6
04b5df8c8123f90cba3ede7e971e6fbc6040d506
3db6ecbc915893ff967abd6e1b43bd5f54949868873be60dc802086c3863e648
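The five lines of `aotriton_version.txt` above are consumed positionally by the new `install_aotriton.sh` with a single `read`. A minimal sketch of that parsing (the file contents are reproduced from the diff; the `echo` at the end is mine, not part of the PR):

```shell
#!/bin/bash
# Sketch: how install_aotriton.sh parses aotriton_version.txt.
# Note that -d "\n" sets the read delimiter to a literal backslash
# ("\n" in double quotes is two characters), so read consumes the whole
# file, exits nonzero at EOF (hence the `|| true` in the real script),
# and default IFS splitting assigns the five whitespace-separated fields.
printf '%s\n' \
  '0.6b' \
  'manylinux_2_17' \
  'rocm6' \
  '04b5df8c8123f90cba3ede7e971e6fbc6040d506' \
  '3db6ecbc915893ff967abd6e1b43bd5f54949868873be60dc802086c3863e648' \
  > aotriton_version.txt

read -d "\n" VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256 < aotriton_version.txt || true
echo "version=${VER} platform=${MANYLINUX} rocm=${ROCMBASE}"
```

This is why the install script can build its download URL from `${VER}`, `${MANYLINUX}`, and `${ROCMBASE}` and then check the tarball against `${SHA256}`.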
50 changes: 25 additions & 25 deletions .ci/docker/build.sh
@@ -91,9 +91,9 @@ _UCC_COMMIT=20eae37090a4ce1b32bcce6144ccad0b49943e0b
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-focal-cuda12.4-cudnn8-py3-gcc9)
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.0
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -105,9 +105,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -119,9 +119,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn8-py3-gcc9-inductor-benchmarks)
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.4.0
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -134,9 +134,9 @@ case "$image" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9-inductor-benchmarks)
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -149,9 +149,9 @@ case "$image" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3.12-gcc9-inductor-benchmarks)
pytorch-linux-focal-cuda12.1-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
@@ -164,9 +164,9 @@ case "$image" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.4-cudnn8-py3.12-gcc9-inductor-benchmarks)
pytorch-linux-focal-cuda12.4-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.4.0
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
@@ -179,9 +179,9 @@ case "$image" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc9)
pytorch-linux-focal-cuda11.8-cudnn9-py3-gcc9)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -193,9 +193,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn8-py3-gcc9)
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.0
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -207,9 +207,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -221,9 +221,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn8-py3-gcc9)
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.0
CUDNN_VERSION=8
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@@ -330,10 +330,10 @@ case "$image" in
DOCS=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn8-py3.8-clang12)
pytorch-linux-jammy-cuda11.8-cudnn9-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.8
CUDNN_VERSION=8
CUDNN_VERSION=9
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
@@ -380,7 +380,7 @@ case "$image" in
ANACONDA_PYTHON_VERSION=3.9
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn8-py3.9-linter)
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-linter)
ANACONDA_PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CONDA_CMAKE=yes
@@ -447,7 +447,7 @@ tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
# when using cuDNN version 9, install it separately from CUDA
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if [[ ${CUDNN_VERSION} == 8 ]]; then
if [[ ${CUDNN_VERSION} == 9 ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
fi
fi
@@ -499,7 +499,7 @@ docker build \
"$@" \
.

# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn9-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to replace the
# "$UBUNTU_VERSION" == "18.04-rc"
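The `build.sh` hunks above flip the base-image logic: with cuDNN 9, the plain CUDA devel image (no `cudnnN` tag) is used and cuDNN is installed by `install_cudnn.sh` instead. A sketch of that selection, pulled out into a function for illustration (`pick_image_name` is my name, not one in the script):

```shell
#!/bin/bash
# Sketch of the IMAGE_NAME selection in .ci/docker/build.sh after the
# change: cuDNN 9 drops the cudnn tag because cuDNN is installed
# separately inside the image build.
pick_image_name() {
  local cuda_version="$1" cudnn_version="$2" ubuntu_version="$3"
  local image="nvidia/cuda:${cuda_version}-cudnn${cudnn_version}-devel-ubuntu${ubuntu_version}"
  if [[ ${cudnn_version} == 9 ]]; then
    image="nvidia/cuda:${cuda_version}-devel-ubuntu${ubuntu_version}"
  fi
  echo "${image}"
}

pick_image_name 12.4.0 9 22.04   # plain devel image; cuDNN added later
pick_image_name 11.8.0 8 20.04   # legacy path keeps the cudnn tag
```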
7 changes: 7 additions & 0 deletions .ci/docker/centos-rocm/Dockerfile
@@ -113,6 +113,13 @@ COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt

# Install AOTriton (Early fail)
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton

# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton-rocm.txt
@@ -1 +1 @@
bbe6246e37d8aa791c67daaf9d9d61b26c9ccfdc
01cbe5045a6898c9a925f01435c8277b2fe6afcc
23 changes: 23 additions & 0 deletions .ci/docker/common/install_aotriton.sh
@@ -0,0 +1,23 @@
#!/bin/bash

set -ex

source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"

TARBALL='aotriton.tar.bz2'
# This read command always returns with exit code 1
read -d "\n" VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256 < aotriton_version.txt || true
ARCH=$(uname -m)
AOTRITON_INSTALL_PREFIX="$1"
AOTRITON_URL="https://github.com/ROCm/aotriton/releases/download/${VER}/aotriton-${VER}-${MANYLINUX}_${ARCH}-${ROCMBASE}.tar.bz2"

cd "${AOTRITON_INSTALL_PREFIX}"
# Must use -L to follow redirects
curl -L --retry 3 -o "${TARBALL}" "${AOTRITON_URL}"
ACTUAL_SHA256=$(sha256sum "${TARBALL}" | cut -d " " -f 1)
if [ "${SHA256}" != "${ACTUAL_SHA256}" ]; then
echo -n "Error: The SHA256 of downloaded tarball is ${ACTUAL_SHA256},"
echo " which does not match the expected value ${SHA256}."
exit 1
fi
tar xf "${TARBALL}" && rm -rf "${TARBALL}"
2 changes: 1 addition & 1 deletion .ci/docker/common/install_base.sh
@@ -3,7 +3,7 @@
set -ex

install_ubuntu() {
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn9-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
17 changes: 6 additions & 11 deletions .ci/docker/common/install_cudnn.sh
@@ -1,23 +1,18 @@
#!/bin/bash

if [[ ${CUDNN_VERSION} == 8 ]]; then
if [[ -n "${CUDNN_VERSION}" ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn
pushd tmp_cudnn
if [[ ${CUDA_VERSION:0:4} == "12.4" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.9.7.29_cuda12-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "12.1" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.9.2.26_cuda12-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.7.0.84_cuda11-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/${CUDNN_NAME}.tar.xz
if [[ ${CUDA_VERSION:0:2} == "12" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "11" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda11-archive"
else
echo "Unsupported CUDA version ${CUDA_VERSION}"
exit 1
fi

curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/
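The rewritten installer above dispatches on the CUDA *major* version via the bash substring expansion `${CUDA_VERSION:0:2}` instead of matching full `major.minor` strings. A minimal sketch of that dispatch, with `pick_cudnn_archive` as a hypothetical helper name (the archive strings mirror the ones in the diff):

```shell
#!/bin/bash
# Sketch of the major-version dispatch used by install_cudnn.sh.
# pick_cudnn_archive is a hypothetical helper; it echoes the archive
# basename for the given CUDA version, or fails for unsupported ones.
pick_cudnn_archive() {
  local cuda_version="$1"
  case "${cuda_version:0:2}" in           # first two chars: "12", "11", ...
    12) echo "cudnn-linux-x86_64-9.1.0.70_cuda12-archive" ;;
    11) echo "cudnn-linux-x86_64-9.1.0.70_cuda11-archive" ;;
    *)  echo "Unsupported CUDA version ${cuda_version}" >&2; return 1 ;;
  esac
}

pick_cudnn_archive "12.4"   # -> cudnn-linux-x86_64-9.1.0.70_cuda12-archive
pick_cudnn_archive "11.8"   # -> cudnn-linux-x86_64-9.1.0.70_cuda11-archive
```

Collapsing the per-minor-version branches this way is what lets a single cuDNN 9.1.0.70 archive serve every 12.x and 11.x toolkit in CI.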
6 changes: 3 additions & 3 deletions .ci/docker/common/install_onnx.sh
@@ -30,10 +30,10 @@ pip_install \

pip_install coloredlogs packaging

pip_install onnxruntime==1.17.0
pip_install onnx==1.15.0
pip_install onnxruntime==1.18
pip_install onnx==1.16.0
# pip_install "onnxscript@git+https://github.com/microsoft/onnxscript@3e869ef8ccf19b5ebd21c10d3e9c267c9a9fa729" --no-deps
pip_install onnxscript==0.1.0.dev20240315 --no-deps
pip_install onnxscript==0.1.0.dev20240523 --no-deps

# Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/
4 changes: 2 additions & 2 deletions .ci/docker/ubuntu-cuda/Dockerfile
@@ -139,7 +139,7 @@ COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
ARG CUDNN_VERSION
ARG CUDA_VERSION
COPY ./common/install_cudnn.sh install_cudnn.sh
RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
RUN if [ -n "${CUDNN_VERSION}" ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh

# Install CUSPARSELT
@@ -152,7 +152,7 @@ RUN rm install_cusparselt.sh
RUN if [ -h /usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi
RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi
RUN if [ -h /usr/local/cuda-12.1/cuda-12.1 ]; then rm /usr/local/cuda-12.1/cuda-12.1; fi
RUN if [ -h /usr/local/cuda-12.1/cuda-12.4 ]; then rm /usr/local/cuda-12.1/cuda-12.4; fi
RUN if [ -h /usr/local/cuda-12.4/cuda-12.4 ]; then rm /usr/local/cuda-12.4/cuda-12.4; fi

USER jenkins
CMD ["bash"]
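The Dockerfile gate for `install_cudnn.sh` changed from a numeric equality test (`-eq 8`, firing only for cuDNN 8) to a non-empty test (`-n`, firing for any pinned version). A small sketch contrasting the two, with hypothetical helper names:

```shell
#!/bin/bash
# Sketch contrasting the old and new Dockerfile gates for install_cudnn.sh.
# should_install_old / should_install_new are hypothetical helper names.
should_install_old() { [ "${1:-0}" -eq 8 ]; }   # old: only cuDNN 8
should_install_new() { [ -n "$1" ]; }           # new: any pinned version

should_install_old 9 || echo "old gate skips cuDNN 9"
should_install_new 9 && echo "new gate installs cuDNN 9"
```

This is why the accompanying `install_cudnn.sh` change drops its own `== 8` check in favor of `-n "${CUDNN_VERSION}"`: the two gates now agree.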
7 changes: 7 additions & 0 deletions .ci/docker/ubuntu-rocm/Dockerfile
@@ -105,6 +105,13 @@ COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt

# Install AOTriton
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton

# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
40 changes: 32 additions & 8 deletions .ci/pytorch/test.sh
@@ -264,6 +264,18 @@ elif [[ $TEST_CONFIG == 'nogpu_AVX512' ]]; then
export ATEN_CPU_CAPABILITY=avx2
fi

# temp workarounds for https://github.com/pytorch/pytorch/issues/126692, remove when fixed
if [[ "$BUILD_ENVIRONMENT" != *-bazel-* ]]; then
pushd test
CUDA_VERSION=$(python -c "import torch; print(torch.version.cuda)")
if [ "$CUDA_VERSION" == "12.4" ]; then
ISCUDA124="cu124"
else
ISCUDA124=""
fi
popd
fi
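The `ISCUDA124` variable set above is spliced into the expected-accuracy CSV paths later in this script, selecting a per-CUDA-version directory. A sketch of how that composition behaves, with `expected_csv_path` as a hypothetical helper:

```shell
#!/bin/bash
# Sketch of how the ISCUDA124 suffix selects the expected-accuracy CSV
# directory (expected_csv_path is a hypothetical helper, not in test.sh).
expected_csv_path() {
  local cuda_version="$1" name="$2"
  local iscuda124=""
  if [ "$cuda_version" == "12.4" ]; then
    iscuda124="cu124"
  fi
  echo "benchmarks/dynamo/ci_expected_accuracy/${iscuda124}/${name}.csv"
}

expected_csv_path "12.4" "inductor_timm_training"
# -> benchmarks/dynamo/ci_expected_accuracy/cu124/inductor_timm_training.csv
expected_csv_path "12.1" "inductor_timm_training"
# -> benchmarks/dynamo/ci_expected_accuracy//inductor_timm_training.csv
```

Note that on non-12.4 builds the empty suffix produces a double slash (`ci_expected_accuracy//…`), exactly as in the real script; the kernel collapses repeated slashes, so the original top-level CSVs are still found.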

test_python_legacy_jit() {
time python test/run_test.py --include test_jit_legacy test_jit_fuser_legacy --verbose
assert_git_not_dirty
@@ -356,15 +368,15 @@ test_inductor_cpp_wrapper_abi_compatible() {

echo "Testing Inductor cpp wrapper mode with TORCHINDUCTOR_ABI_COMPATIBLE=1"
# cpu stack allocation causes segfault and needs more investigation
python test/run_test.py --include inductor/test_cpu_cpp_wrapper
PYTORCH_TESTING_DEVICE_ONLY_FOR="" python test/run_test.py --include inductor/test_cpu_cpp_wrapper
python test/run_test.py --include inductor/test_cuda_cpp_wrapper

TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/timm_models.py --device cuda --accuracy --amp \
--training --inductor --disable-cudagraphs --only vit_base_patch16_224 \
--output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_training.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/inductor_cpp_wrapper_training.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/inductor_timm_training.csv"
--expected "benchmarks/dynamo/ci_expected_accuracy/${ISCUDA124}/inductor_timm_training.csv"
}

# "Global" flags for inductor benchmarking controlled by TEST_CONFIG
@@ -526,10 +538,10 @@ test_single_dynamo_benchmark() {
--output "$TEST_REPORTS_DIR/${name}_${suite}.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/${name}_$suite.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/${TEST_CONFIG}_${name}.csv"
--expected "benchmarks/dynamo/ci_expected_accuracy/${ISCUDA124}/${TEST_CONFIG}_${name}.csv"
python benchmarks/dynamo/check_graph_breaks.py \
--actual "$TEST_REPORTS_DIR/${name}_$suite.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/${TEST_CONFIG}_${name}.csv"
--expected "benchmarks/dynamo/ci_expected_accuracy/${ISCUDA124}/${TEST_CONFIG}_${name}.csv"
fi
}

@@ -553,7 +565,11 @@ test_dynamo_benchmark() {
test_single_dynamo_benchmark "dashboard" "$suite" "$shard_id" "$@"
else
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --float32 "$@"
if [[ "${TEST_CONFIG}" == *freezing* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --float32 --freezing "$@"
else
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --float32 "$@"
fi
elif [[ "${TEST_CONFIG}" == *aot_inductor* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --bfloat16 "$@"
else
@@ -572,9 +588,11 @@ test_inductor_torchbench_smoketest_perf() {
--bfloat16 --inference --inductor --only hf_T5 --output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv"
TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/torchbench.py --device cuda --accuracy \
--bfloat16 --inference --inductor --only llama --output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv"
TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/torchbench.py --device cuda --accuracy \
--bfloat16 --inference --inductor --only moco --output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/inductor_torchbench_inference.csv"
--expected "benchmarks/dynamo/ci_expected_accuracy/${ISCUDA124}/inductor_torchbench_inference.csv"

python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --float16 --training \
--batch-size-file "$(realpath benchmarks/dynamo/torchbench_models_list.txt)" --only hf_Bert \
@@ -589,7 +607,13 @@ test_inductor_torchbench_smoketest_perf() {
# https://github.com/pytorch/pytorch/actions/runs/7158691360/job/19491437314,
# and thus we lower its threshold to reduce flakiness. If this continues to be a problem,
# we switch to use some other model.
python benchmarks/dynamo/check_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_inference_smoketest.csv" -t 4.9
# Use 4.7 for cuda 12.4, change back to 4.9 after fixing https://github.com/pytorch/pytorch/issues/126692
if [ "$CUDA_VERSION" == "12.4" ]; then
THRESHOLD=4.7
else
THRESHOLD=4.9
fi
python benchmarks/dynamo/check_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_inference_smoketest.csv" -t $THRESHOLD

# Check memory compression ratio for a few models
for test in hf_Albert timm_vision_transformer; do
@@ -608,7 +632,7 @@ test_inductor_torchbench_smoketest_perf() {
--only $test --output "$TEST_REPORTS_DIR/inductor_warm_start_smoketest_$test.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/inductor_warm_start_smoketest_$test.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/inductor_huggingface_training.csv"
--expected "benchmarks/dynamo/ci_expected_accuracy/${ISCUDA124}/inductor_huggingface_training.csv"
done
}
