[Tool] - Add mechanism to save operators' tensors to file #1174
Description of changes:
We have often had to write custom code to save an operator's tensors to file for debugging purposes. This is complicated by the fact that tensors may be partitioned, with the partitions accessed simultaneously (when using tensor parallelism). In addition, tensors are usually overwritten at each decoding step. Finally, in speculative inference mode, we want to be able to distinguish the tensors produced by each SSM from those produced by the LLM.
This PR introduces a simple way to automatically save the inference tensors to file. The weights are saved only once per model, whereas the inputs/outputs are saved once per iteration. Currently, the tensors are saved in text format for simplicity, but we can later switch to a binary format to save disk space.
To use this debugging tool, set the parameter `inference_debugging` to `True` when initializing the Python runtime, or pass the `--inference-debugging` flag when launching a FlexFlow C++ program.
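For example, from Python the flag might be enabled as in the sketch below. This assumes the runtime initializer accepts `inference_debugging` as a configuration option alongside the usual resource settings; the other values shown are illustrative placeholders, not requirements of this PR.

```python
# Sketch: enable dumping of operators' tensors to file from Python.
# Assumes ff.init() accepts inference_debugging among its configuration
# options; the resource settings below are illustrative placeholders.
import flexflow.serve as ff

ff.init(
    num_gpus=4,
    memory_per_gpu=14000,
    zero_copy_memory_per_node=30000,
    tensor_parallelism_degree=4,
    pipeline_parallelism_degree=1,
    inference_debugging=True,  # save weights once, inputs/outputs every iteration
)

# The rest of the inference workflow is unchanged; tensors are written to
# file automatically while the model runs.
```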