drcachesim provides a drmemtrace analysis tool framework to make it easy to create new trace analysis tools. A new tool should subclass dynamorio::drmemtrace::analysis_tool_t.
Concurrent processing of traces is supported by logically splitting a trace into "shards", each of which is processed sequentially. The default shard is a traced application thread, but the tool interface also supports using physical cores as shards, with each containing an interleaved mix of application threads provided by the Scheduler. The shard type is available to a tool by overriding the initialize_shard_type() function.
For tools that support concurrent processing of shards and do not need to see a single time-sorted interleaved merged trace, the interface functions with the parallel_ prefix should be overridden, and parallel_shard_supported() should return true. parallel_shard_init_stream() will be invoked for each shard prior to invoking parallel_shard_memref() for each entry in that shard; the data structure returned from parallel_shard_init_stream() will be passed to parallel_shard_memref() for each trace entry for that shard. The concurrency model used guarantees that all entries from any one shard are processed by the same single worker thread, so no synchronization is needed inside the parallel_ functions. A single worker thread invokes print_results() as well.
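The per-shard flow described above can be sketched as follows. This is a minimal illustration, not the framework's implementation: toy_memref_t and toy_counter_t are hypothetical stand-ins that only mirror the shape of the parallel_ callbacks so the example builds without the DynamoRIO headers, and the init function's parameter list is simplified relative to the real interface. The mutex around shard registration is a choice made in this sketch for state shared across shards; the per-entry path touches only its own shard's data and takes no lock.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical, simplified stand-in for a trace entry. The real type is
// dynamorio::drmemtrace::memref_t, which carries far more information.
struct toy_memref_t {
    bool is_instr; // True for an instruction fetch.
};

// Hypothetical tool that mirrors the parallel_ callbacks. It deliberately
// does not derive from analysis_tool_t.
class toy_counter_t {
public:
    struct shard_data_t {
        uint64_t instrs = 0;
        uint64_t entries = 0;
    };

    bool
    parallel_shard_supported()
    {
        return true;
    }

    // Called once per shard. The returned pointer is handed back on every
    // later per-entry call for that shard. (Simplified parameter list.)
    void *
    parallel_shard_init_stream(int shard_index)
    {
        auto data = std::make_unique<shard_data_t>();
        void *raw = data.get();
        // The shard map is shared by all shards, so this sketch guards it.
        std::lock_guard<std::mutex> guard(shard_map_mutex_);
        shards_[shard_index] = std::move(data);
        return raw;
    }

    // Touches only this shard's data, so no lock is taken here.
    bool
    parallel_shard_memref(void *shard_data, const toy_memref_t &memref)
    {
        auto *shard = static_cast<shard_data_t *>(shard_data);
        ++shard->entries;
        if (memref.is_instr)
            ++shard->instrs;
        return true;
    }

    // Whole-trace aggregation, as a print_results() implementation might do
    // once all shards have finished.
    uint64_t
    total_instrs() const
    {
        uint64_t total = 0;
        for (const auto &entry : shards_)
            total += entry.second->instrs;
        return total;
    }

private:
    std::unordered_map<int, std::unique_ptr<shard_data_t>> shards_;
    std::mutex shard_map_mutex_;
};
```

Keeping all mutable per-entry state inside shard_data_t is what lets the hot path run without locks; only the one-time registration and the final aggregation see the shared map.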
For core-sharded analysis, if the thread-to-core scheduling occurs dynamically (this depends on the options passed to the analyzer: see the -core_sharding option documentation under Simulator Parameters), the speed of each parallel analysis thread affects the actual schedule. If a tool has a significant asymmetry and does not wish this to affect the schedule, a desired schedule should be recorded without the tool and then replayed with the tool. In replay mode the tool's speed will not affect the schedule.
For serial operation, process_memref() operates on a trace entry in a single, sorted, interleaved stream of trace entries. In the default mode of operation, the dynamorio::drmemtrace::analyzer_t class iterates over the trace and calls the process_memref() function of each tool. An alternative mode is supported which exposes the iterator and allows a separate control infrastructure to be built. This alternative mode does not support parallel operation at this time.
Both parallel and serial operation can be supported by a tool, typically by having process_memref() create per-shard data when it first sees a traced thread and then invoke parallel_shard_memref() to do its work.
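One way to arrange this delegation is sketched below. The types are hypothetical stand-ins (toy_memref_t, toy_dual_mode_tool_t) rather than the real dynamorio::drmemtrace classes, and the signatures are simplified; the point is only the pattern of lazily creating per-thread data in the serial path and forwarding to the parallel entry point.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical, simplified stand-in for dynamorio::drmemtrace::memref_t.
struct toy_memref_t {
    int64_t tid;   // Traced thread id.
    bool is_instr; // True for an instruction fetch.
};

// Hypothetical tool supporting both modes with a single implementation.
class toy_dual_mode_tool_t {
public:
    struct shard_data_t {
        uint64_t instrs = 0;
    };

    // Parallel entry point: all of the real work lives here.
    bool
    parallel_shard_memref(void *shard_data, const toy_memref_t &memref)
    {
        auto *shard = static_cast<shard_data_t *>(shard_data);
        if (memref.is_instr)
            ++shard->instrs;
        return true;
    }

    // Serial entry point: find (or lazily create) the data for the thread
    // that owns this entry, then delegate to the parallel function.
    bool
    process_memref(const toy_memref_t &memref)
    {
        std::unique_ptr<shard_data_t> &slot = serial_shards_[memref.tid];
        if (!slot)
            slot = std::make_unique<shard_data_t>();
        return parallel_shard_memref(slot.get(), memref);
    }

    // Helper for inspecting results in this sketch.
    uint64_t
    instrs_for(int64_t tid) const
    {
        auto it = serial_shards_.find(tid);
        return it == serial_shards_.end() ? 0 : it->second->instrs;
    }

private:
    std::unordered_map<int64_t, std::unique_ptr<shard_data_t>> serial_shards_;
};
```

With this layout the serial path adds only a map lookup per entry, and the counting logic exists in exactly one place.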
For both parallel and serial operation, the function print_results() should be overridden. It is called just once after processing all trace data and it should present the results of the analysis. For parallel operation, any desired aggregation across the whole trace should occur here as well, while shard-specific results can be presented in parallel_shard_exit().
Tools can also perform trace analysis by intervals, e.g. to generate a time series of their results, using the -interval_microseconds option. The generate_interval_snapshot() API allows the tool to create a snapshot of its internal state when a trace interval ends. These snapshots are then passed to the tool in a later print_interval_results() API call where the tool can generate and print results for each trace interval. The length of a trace interval is defined by the -interval_microseconds option, measured using the dynamorio::drmemtrace::TRACE_MARKER_TYPE_TIMESTAMP marker values. Trace interval analysis is also supported for the parallel mode, where the tool implements generate_shard_interval_snapshot() to generate a snapshot for shard-local intervals and the framework automatically combines the shard-local interval snapshots to create the whole-trace interval snapshots, using the tool's combine_interval_snapshots() API.
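The combining step can be illustrated with a small sketch. Everything here is hypothetical: toy_snapshot_t, interval_id_for(), and combine_snapshots() are illustrative names rather than the framework's types or signatures, and the sketch assumes that intervals are numbered from 1 starting at the first timestamp seen and that a shard with no snapshot for an interval is represented by a null pointer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical snapshot: a cumulative instruction count captured at the end
// of an interval. A real tool would store whatever state it needs.
struct toy_snapshot_t {
    uint64_t interval_id = 0;
    uint64_t instr_count = 0;
};

// Maps a timestamp marker value to an interval number.
// Assumption of this sketch: 1-based intervals measured from first_ts.
uint64_t
interval_id_for(uint64_t first_ts, uint64_t ts, uint64_t interval_microseconds)
{
    return (ts - first_ts) / interval_microseconds + 1;
}

// Merges the most recent shard-local snapshots into one whole-trace
// snapshot by summing their counters. Null entries are skipped.
toy_snapshot_t
combine_snapshots(const std::vector<const toy_snapshot_t *> &latest_shard_snapshots,
                  uint64_t interval_id)
{
    toy_snapshot_t merged;
    merged.interval_id = interval_id;
    for (const toy_snapshot_t *snapshot : latest_shard_snapshots) {
        if (snapshot != nullptr)
            merged.instr_count += snapshot->instr_count;
    }
    return merged;
}
```

Summation suits cumulative counters; a tool tracking something like a maximum or a set of unique addresses would need a different merge operation.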
Today, parallel analysis is only supported for offline traces. Support for online traces may be added in the future.
In the default mode of operation, the dynamorio::drmemtrace::analyzer_t class iterates over the trace and calls the appropriate dynamorio::drmemtrace::analysis_tool_t functions for each tool. An alternative mode is supported which exposes the iterator and allows a separate control infrastructure to be built.
As explained in Trace Format, each trace entry is of type dynamorio::drmemtrace::memref_t and represents one instruction or data reference or a metadata operation such as a thread exit or marker. There are built-in scheduling markers providing the timestamp and cpu identifier on each thread transition. Other built-in markers indicate disruptions in user mode control flow such as signal handler entry and exit.
The absolute ordinals for trace records and instruction fetches are available via the dynamorio::drmemtrace::memtrace_stream_t interface passed to the initialize_stream() function for serial operation and parallel_shard_init_stream() for parallel operation. If the iterator skips over some records that are not passed to the tools, these ordinals will include those skipped records. If a tool wishes to count only those records or instructions that it sees, it can add its own counters.
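The difference between the stream's absolute ordinal and a tool's own count can be sketched as below. toy_stream_t is a hypothetical stand-in for the small part of the stream interface used here, and advance() exists only to drive the example; toy_seen_counter_t is likewise illustrative.

```cpp
#include <cstdint>

// Hypothetical stand-in for the record-ordinal query of
// dynamorio::drmemtrace::memtrace_stream_t.
class toy_stream_t {
public:
    // Test-only helper: pretend the iterator moved past this many records.
    void
    advance(uint64_t records)
    {
        record_ordinal_ += records;
    }

    uint64_t
    get_record_ordinal() const
    {
        return record_ordinal_;
    }

private:
    uint64_t record_ordinal_ = 0;
};

// Hypothetical tool that keeps its own counter next to the stream ordinal.
class toy_seen_counter_t {
public:
    explicit toy_seen_counter_t(const toy_stream_t *stream)
        : stream_(stream)
    {
    }

    // Called for each record actually delivered to the tool.
    void
    on_record()
    {
        ++seen_records_;
    }

    uint64_t
    seen_records() const
    {
        return seen_records_;
    }

    // Records the iterator passed over without delivering them to the tool.
    uint64_t
    skipped_records() const
    {
        return stream_->get_record_ordinal() - seen_records_;
    }

private:
    const toy_stream_t *stream_;
    uint64_t seen_records_ = 0;
};
```

For instance, if the iterator skips 10 records and then delivers 3, the stream ordinal reads 13 while the tool's own counter reads 3.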
In some cases, a tool may want to observe the exact sequence of dynamorio::drmemtrace::trace_entry_t in an offline trace stored on disk. To support such use cases, the dynamorio::drmemtrace::trace_entry_t specialization of dynamorio::drmemtrace::analysis_tool_tmpl_t and dynamorio::drmemtrace::analyzer_tmpl_t can be used. Specifically, such tools should subclass dynamorio::drmemtrace::record_analysis_tool_t, and use the dynamorio::drmemtrace::record_analyzer_t class.
CMake support is provided for including the headers and linking the libraries of the drmemtrace framework. A new CMake function is defined in the DynamoRIO package which sets the include directory for using the
drmemtrace_analyzer library exported by the DynamoRIO package is the main library to link when building a new tool. The tools described above are also exported as the libraries
drmemtrace_syscall_mix and can be created using the basic_counts_tool_create(), opcode_mix_tool_create(), histogram_tool_create(), reuse_distance_tool_create(), reuse_time_tool_create(), view_tool_create(), cache_simulator_create(), tlb_simulator_create(), func_view_create(), and syscall_mix_tool_create() functions.
drmemtrace analysis tool framework allows loading separately built external tools that are not predefined. Such a tool can be loaded by drcachesim using the
The tool package should consist of:
Static library containing a subclass of dynamorio::drmemtrace::analysis_tool_t with the tool's internal logic. This library was described in the previous section.
Tool creator dynamic library containing the tool factory function.
Registration file should be placed in the tools subdirectory of the root of the DynamoRIO installation. Here toolname is the desired external name of the tool. This file should contain the following lines:
drcachesim to locate the tool's creator library. The 32 and 64 specifiers allow pointing at alternate-bitwidth paths for use if the target application creates a child process of a different bitwidth.
For more extensive actions on launching the tool, a custom front-end executable can be created that replaces drcachesim, modeled after histogram_launcher.cpp or opcode_mix_launcher.cpp.
The creator dynamic library should contain two exported functions:
drcachesim to create an analysis tool. As an example, see the minimal external analysis tool.
In addition to the analysis tool framework, which targets running multiple tools at once either in parallel across all traced threads or in a serial fashion, we provide a scheduler which will map inputs to a given set of outputs in a specified manner. This allows a tool such as a core simulator, or just a tool wanting its own control over advancing the trace stream (unlike the analysis tool framework where the framework controls the iteration), to request the next trace record for each output on its own. This scheduling is also available to any analysis tool when the input traces are sharded by core (see the -core_sharding option documentation under Simulator Parameters as well as Creating New Analysis Tools).
Here is a simple example of a single-output, serial stream. This also serves as an example of how to replace the now-removed old analysis tool framework's "external iterator" interface: