Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add library to cache decode instr info in analysis tools #7113

Open
abhinav92003 opened this issue Dec 9, 2024 · 0 comments
Open

Add library to cache decode instr info in analysis tools #7113

abhinav92003 opened this issue Dec 9, 2024 · 0 comments
Assignees

Comments

@abhinav92003
Copy link
Contributor

Various analyzers require decoding instr encodings, e.g. opcode_mix and invariant_checker. Some want to remember the full decoded instr_t, whereas others want to record only specific information about the decoded instr. It would be nice to have a shared library to obviate the need for new analysis tools to re-implement the logic.

@abhinav92003 abhinav92003 self-assigned this Dec 9, 2024
abhinav92003 added a commit that referenced this issue Dec 9, 2024
Adds a new library to cache information about decoded instructions. This can be
used by analysis tools that need to decode the instr encodings in the trace.

The library allows the tools to specify what information they need to cache.

Refactors the invariant checker tool to use this library.

Issue: #7113
abhinav92003 added a commit that referenced this issue Dec 11, 2024
Moves the module mapper and related functionality into raw2trace_shared. This is in preparation
for further changes that include the module mapper logic in a new instr_decode_cache_t. To avoid
having to pull in the whole drmemtrace_raw2trace implementation into the instr decode cache
library, which will end up including it in the invariant checker and everything that depends on,
we separate out the module mapper into raw2trace_shared which is intended for such cases.

Issue: #7113
abhinav92003 added a commit that referenced this issue Dec 11, 2024
This is in preparation for further changes that include the module mapper logic in a new library (drmemtrace_instr_decode_cache) that provides an instr decode cache (instr_decode_cache_t). To avoid having to pull in the whole drmemtrace_raw2trace implementation into drmemtrace_instr_decode_cache, which will end up including it in the invariant checker and everything that depends on, we separate out the module mapper into raw2trace_shared which is intended for such cases.

Issue: #7113
abhinav92003 added a commit that referenced this issue Dec 12, 2024
…7123)

Moves the module mapper and related functionality into raw2trace_shared.

This is in preparation for further changes that include the module
mapper logic in a new library (drmemtrace_instr_decode_cache) that
provides an instr decode cache (instr_decode_cache_t). To avoid having
to pull in the whole drmemtrace_raw2trace implementation into
drmemtrace_instr_decode_cache, which will end up including it in the
invariant checker and everything that depends on, we separate out the
module mapper into raw2trace_shared which is intended for such cases.

This is a pure code move, no changes were made to logic or style.

Issue: #7113
abhinav92003 added a commit that referenced this issue Dec 15, 2024
Moves the module read functionality into raw2trace_shared.

Refactors opcode_mix and view to use the new function instead of
creating a raw2trace_directory_t.

This is in preparation for further changes that include the module read
logic in a new library (drmemtrace_decode_cache) that provides an instr
decode cache (decode_cache_t). To avoid having to pull in the whole
drmemtrace_raw2trace implementation into drmemtrace_decode_cache, which
will end up including it in the invariant checker and everything that
depends on, we separate out the module read logic into raw2trace_shared
which is intended for such cases.

While the raw2trace and raw2trace_shared implementation usually is made
to deal with only file streams, we make an exception here for
read_module_file() which uses DR file APIs instead. This is likely fine
because this function is not intended for use with code that wants file
streams, and it is better than other alternatives: moving
raw2trace-using functionality out of raw2trace_directory_t so we can use
raw2trace_directory_t in decode_cache_t (this would still add zlib/lz4
etc unnecessary file libs to drmemtrace_decode_cache), create yet
another raw2trace_shared_non_stream.cpp for shared code that uses DR
file APIs, or duplicating the read_module_file impl in decode_cache_t.

This also removes raw2trace_directory_t::initialize_module_file() as
that functionality is now provided by read_module_file in
raw2trace_shared.


Issue: #7113
abhinav92003 added a commit that referenced this issue Jan 24, 2025
Adds a new drmemtrace_decode_cache library to cache information about
decoded instructions using decode_cache_t. This can be used by analysis
tools that need to decode the instr encodings in the trace, to avoid
overhead of redundant decodes which can get expensive, and also to avoid
duplication of various related logic.

The library allows the tools to decide what information they need to
cache and add implementation for how to obtain it. Also, it uses
instr_noalloc_t when possible to reduce heap usage and
allocation/deallocation overhead.

If the trace does not include embedded encodings or if the user wants to
get encodings from the app binaries using module_mapper_t instead, they
can provide the module file path to the init API on the decode_cache_t
object. decode_cache_t keeps a single initialized module_mapper_t at any
time, which is shared between all decode_cache_t objects (even the ones
of different template types); this is done by tracking the count of
active objects that use the module mapper.

decode_cache_t provides the clear_cache() API which can be used in
parallel_shard_exit() to keep memory consumption in check by freeing up
cached decode info that may not be needed for result computation in
later print_results() which has to wait until all shards are done.

Refactors the invariant checker and opcode mix tools to use this
library.

Modifies add_encodings_to_memrefs to support a mode where encodings are
not set in the generated test memrefs but only the instr addr and size
fields are set.

Makes the opcode cache in opcode_mix_t per-shard instead of per-worker.
Decode info must not be cached per-worker as that may cause stale
encodings for non-first shards processed by the worker. This means the
worker init and worker exit APIs can be removed now from opcode_mix_t.

Adds decode_cache_test and opcode_mix_test unit tests that verify
operation of the decode_cache_t.

Issue: #7113
abhinav92003 added a commit that referenced this issue Jan 27, 2025
Refactors the view_t to use the new decode_cache_t instead of duplicating the logic.

Modifies decode_info_base_t::set_decode_info_derived to allow it to return the error
string. This is useful for cases where the decode_info_t object encountered some
error during its operation, which is especially possible when used with a
decode_cache_t with include_decoded_instr_=false in which case the decode_info_t
also performs the decoding on its own. Adds new tests to decode_cache_test to
verify some related cases.

Issue: #7113
abhinav92003 added a commit that referenced this issue Jan 27, 2025
Fixes the opcode_mix tool to get the filetype from the memtrace_stream_t interface instead
of the TRACE_MARKER_TYPE_FILETYPE marker memref which may not be seen if -skip_instrs is
applied by the user.

Adds a new unit test that runs the opcode_mix tool with -skip_instrs, which would fail
because of an unseen filetype prior to this fix.

Issue: #7224, #7113
abhinav92003 added a commit that referenced this issue Jan 28, 2025
Fixes the opcode_mix tool to get the filetype from the memtrace_stream_t
interface instead of the TRACE_MARKER_TYPE_FILETYPE marker memref which
may not be seen if -skip_instrs is applied by the user.

Modifies opcode_mix_t to use initialize_stream instead of initialize, to
get the serial stream.

Adds a new unit test that runs the opcode_mix tool with -skip_instrs,
which would fail because of an unseen filetype prior to this fix.

Augments decode_cache_t::init() documentation to suggest a reliable way
to obtain the filetype.

Issue: #7224, #7113
abhinav92003 added a commit that referenced this issue Jan 30, 2025
Moves the check that compares the current build enviroment with the arch bit of the trace file to decode_cache_t::init.

Due to some complexities in how build_target_arch_type() is declared, it had to be wrapped by a new API in
decode_cache_base_t. build_target_arch_type() is defined conditionally in trace_entry.h only if dr_defines.h
has already defined certain symbols. It is hard to ensure that for all builds, so we ensure it just for
decode_cache.cpp which is packaged as a separate lib unit drmemtrace_decode_cache.

Issue: #7113
abhinav92003 added a commit that referenced this issue Jan 30, 2025
Refactors the view_t tool to use the new decode_cache_t instead of
duplicating the instr decoding logic.

Modifies decode_info_base_t::set_decode_info_derived to allow it to
return the error string. This is useful for cases where there was an
error during its operation, which is especially possible when used with
a decode_cache_t with include_decoded_instr_ = false in which case the
decode_info_t also performs the decoding on its own.

Adds new tests to decode_cache_test to verify some related cases. Not
adding a new release note for the API signature change as there was no
intervening release since it was initially added.

Also fixes the view_t logic to get filetype, which should be obtained
from the memtrace_stream_t object instead of TRACE_MARKER_TYPE_FILETYPE
to handle cases where the filetype marker is not seen by the tool
because of analyzer options like -skip_instrs that skip over a certain
initial part of the trace.

Issue: #7113, #7224
abhinav92003 added a commit that referenced this issue Jan 30, 2025
Moves the check that compares the current build enviroment with the arch
bit of the trace file to decode_cache_t::init.

Due to some complexities in how build_target_arch_type() is declared in
trace_entry.h, it had to be wrapped by a new API in decode_cache_base_t.
build_target_arch_type() is defined conditionally in trace_entry.h only
if dr_defines.h has already defined certain symbols. It is hard to
ensure that for all builds dr_defines.h is included before
trace_entry.h, so we ensure it just for decode_cache.cpp which is
packaged as a separate lib unit drmemtrace_decode_cache. This also means
that the new API could not be in decode_cache_t which is a template
class that has to be defined in the header.

Issue: #7113, #7236
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant