[Tool] - Add mechanism to save operators' tensors to file #1174
Description of changes:
We have often had to write custom code to save an operator's tensors to file for debugging purposes. This is complicated by the fact that tensors may be partitioned, with the partitions accessed simultaneously (when using tensor parallelism). In addition, tensors are usually overwritten at each decoding step. Finally, in speculative inference mode, we want to be able to distinguish the tensors produced by each SSM from those produced by the LLM.
This PR introduces a simple way to automatically save the inference tensors to file. The weights are saved only once per model, whereas the inputs/outputs are saved once per iteration. Currently, the tensors are saved in text format for simplicity, but we can later switch to a binary format to save disk space.
To use this debugging tool, set the parameter `inference_debugging` to `True` when initializing the Python runtime, or pass the `--inference-debugging` flag when launching a FlexFlow C++ program.
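For example, from Python the flag might be enabled as in the sketch below. This assumes the runtime initializer accepts `inference_debugging` as a configuration option alongside the usual resource settings; the other values shown are illustrative placeholders, not requirements of this PR.

```python
# Sketch: enable dumping of operators' tensors to file from Python.
# Assumes ff.init() accepts inference_debugging among its configuration
# options; the resource settings below are illustrative placeholders.
import flexflow.serve as ff

ff.init(
    num_gpus=4,
    memory_per_gpu=14000,
    zero_copy_memory_per_node=30000,
    tensor_parallelism_degree=4,
    pipeline_parallelism_degree=1,
    inference_debugging=True,  # save weights once, inputs/outputs every iteration
)

# The rest of the inference workflow is unchanged; tensors are written to
# file automatically while the model runs.
```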