drmemtrace's behavior can be controlled through options passed after the -t drmemtrace but prior to the "--" delimiter on the command line:

$ bin64/drrun -t drmemtrace <options> <to> <drmemtrace> -- /path/to/target/app <args> <for> <app>

Boolean options can be disabled using a "-no_" prefix.
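
As an illustration of the command-line layout described above, the following minimal Python sketch assembles an argument list. The helper name build_drmemtrace_command, the "traces" directory, and the application argument are hypothetical; only the drrun layout, the "--" delimiter, and the "-no_" prefix come from the text above.

```python
def build_drmemtrace_command(options, app, app_args=(), drrun="bin64/drrun"):
    """Assemble an argv list: drrun -t drmemtrace <options> -- <app> <args>.

    `options` maps option names (without the leading '-') to values.
    True/False produce '-name' / '-no_name'; other values produce '-name value'.
    """
    argv = [drrun, "-t", "drmemtrace"]
    for name, value in options.items():
        if value is True:
            argv.append(f"-{name}")
        elif value is False:
            # Boolean options are disabled with the "-no_" prefix.
            argv.append(f"-no_{name}")
        else:
            argv += [f"-{name}", str(value)]
    # Everything after "--" belongs to the target application.
    return argv + ["--", app, *app_args]


cmd = build_drmemtrace_command(
    {"offline": True, "split_windows": False, "outdir": "traces"},  # hypothetical values
    "/path/to/target/app",
    ["arg1"],
)
print(" ".join(cmd))
```

Keeping the tool options and the application arguments in separate lists makes it harder to accidentally place a drmemtrace option after the "--" delimiter, where it would be passed to the application instead.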

The parameters available are described below:

-offline
default value: false
By default, traces are processed online, sent over a pipe to a simulator. If this option is enabled, trace data is instead written to files in -outdir for later offline analysis. No simulator is executed.
-ipc_name
default value: drcachesimpipe
For online tracing and simulation (the default, unless -offline is requested), specifies the name of the named pipe used to communicate between the target application processes and the caching device simulator. On Linux this can include an absolute path (if it doesn't, a default temp directory will be used). A unique name must be chosen for each instance of the simulator being run at any one time. On Windows, the name is limited to 247 characters.
-outdir
default value: .
For the offline analysis mode (when -offline is requested), specifies the path to a directory where per-thread trace files will be written. The contents of this directory are internal to the tool. Do not alter, add, or delete files here.
-subdir_prefix
default value: drmemtrace
For the offline analysis mode (when -offline is requested), specifies the prefix for the name of the sub-directory where per-thread trace files will be written. The sub-directory is created inside -outdir and has the form 'prefix.app-name.pid.id.dir'.
-indir
default value: ""
After a trace file is produced via -offline into -outdir, it can be passed to the simulator via this flag pointing at the subdirectory created in -outdir. The -offline tracing produces raw data files which are converted into final trace files on the first execution with -indir. The raw files can also be manually converted using the drraw2trace tool. Legacy single trace files with all threads interleaved into one are not supported with this option: use -infile instead. The contents of this directory are internal to the tool. Do not alter, add, or delete files here.
-multi_indir
default value: ""
Identical to -indir except this takes a colon-separated list of directories for offline analysis in core-sharded mode. Multiple inputs are not supported in any other mode; they are not supported currently with thread or shard limits; they are not supported for interval analysis, for replaying as-traced, or for core-sharded-on-disk inputs. Skipping is applied to every input thread. Auxiliary files such as the traced function list (see -funclist_file), the v2p file (see -v2p_file), schedule files for the invariant checker, and the module file (for legacy traces without encodings) are only auto-located in the first directory listed. Additional support may be added in the future. See -indir for further information on each input directory.
-infile
default value: ""
Directs the framework to use a single trace file. This could be a legacy all-software-threads-interleaved-into-one trace file, a core-sharded single hardware thread file mixing multiple software threads, or a single software thread selected from a directory (though in that case it is better to use -only_thread, -only_threads, or -only_shards). This method of input does not support any features that require auxiliary metadata files. Passing '-' will read from stdin as plain or gzip-compressed data.
-jobs
default value: -1
By default, both post-processing of offline raw trace files and analysis of trace files is parallelized. This option controls the number of concurrent jobs. 0 disables concurrency and uses a single thread to perform all operations. A negative value sets the job count to the number of hardware threads, with a cap of 16. This is ignored for -core_sharded where -cores sets the parallelism.
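
The -jobs rules above can be sketched as a small Python function. This is an illustration of the documented behavior, not the tool's actual code; the function name effective_jobs and the use of os.cpu_count() to obtain the hardware thread count are assumptions.

```python
import os

MAX_DEFAULT_JOBS = 16  # cap stated in the -jobs description


def effective_jobs(jobs, hw_threads=None):
    """Return the concurrent job count implied by a -jobs value.

    A return value of 0 means concurrency is disabled and a single
    thread performs all operations.
    """
    if hw_threads is None:
        hw_threads = os.cpu_count() or 1  # assumption: stand-in for hardware threads
    if jobs < 0:
        # Negative: use the hardware thread count, capped at 16.
        return min(hw_threads, MAX_DEFAULT_JOBS)
    return jobs


print(effective_jobs(-1, hw_threads=64))  # default -jobs on a 64-thread machine
```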
-module_file
default value: ""
The opcode_mix tool needs the modules.log file (generated by the offline post-processing step in the raw/ subdirectory) in addition to the trace file. If the file is named modules.log and is in the same directory as the trace file, or a raw/ subdirectory below the trace file, this parameter can be omitted.
-alt_module_dir
default value: ""
Specifies a directory containing libraries referenced in -module_file for analysis tools, or in the raw modules file for post-processing of offline raw trace files. This directory takes precedence over the recorded path.
-chunk_instr_count
default value: 10000000
Specifies the size in instructions of the chunks into which a trace output file is split inside a zipfile. This is the granularity of a fast seek. This only applies when generating .zip-format traces; when built without support for writing .zip files, this option is ignored. For 32-bit this cannot exceed 4G.
-instr_encodings
default value: false
By default instruction encodings are not sent to online tools, to reduce overhead. (Offline tools have them added by default.)
-funclist_file
default value: ""
The func_view tool needs the mapping from function name to identifier that was recorded during offline tracing. This data is stored in its own separate file in the raw/ subdirectory. If the file is named funclist.log and is in the same directory as the trace file, or a raw/ subdirectory below the trace file, this parameter can be omitted.
-cores
default value: 4
Specifies the number of cores to simulate. For -core_sharded, each core executes in parallel in its own worker thread and -jobs is ignored.
-line_size
default value: 64
Specifies the cache line size, which is assumed to be identical for L1 and L2 caches. Must be at least 4 and a power of 2.
-L1I_size
default value: 32K
Specifies the total size of each L1 instruction cache. L1I_size/L1I_assoc must be a power of 2 and a multiple of line_size.
-L1D_size
default value: 32K
Specifies the total size of each L1 data cache. L1D_size/L1D_assoc must be a power of 2 and a multiple of line_size.
-L1I_assoc
default value: 8
Specifies the associativity of each L1 instruction cache. L1I_size/L1I_assoc must be a power of 2 and a multiple of line_size.
-L1D_assoc
default value: 8
Specifies the associativity of each L1 data cache. L1D_size/L1D_assoc must be a power of 2 and a multiple of line_size.
-LL_size
default value: 8M
Specifies the total size of the unified last-level (L2) cache. LL_size/LL_assoc must be a power of 2 and a multiple of line_size.
-LL_assoc
default value: 16
Specifies the associativity of the unified last-level (L2) cache. LL_size/LL_assoc must be a power of 2 and a multiple of line_size.
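
The size and associativity constraints stated for the L1I, L1D, and LL caches can be checked before launching a run. The following Python sketch is a hypothetical helper (check_cache_geometry is not part of the tool); it only encodes the rules quoted above and treats a non-integer size/assoc as failing the power-of-2 rule.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def check_cache_geometry(size, assoc, line_size=64):
    """Return a list of violated constraints (empty if the geometry is valid)."""
    problems = []
    line_ok = line_size >= 4 and is_power_of_two(line_size)
    if not line_ok:
        problems.append("line_size must be at least 4 and a power of 2")
    if assoc <= 0 or size % assoc != 0:
        # A non-integer quotient cannot be a power of 2.
        problems.append("size/assoc must be a power of 2")
        return problems
    per_way = size // assoc
    if not is_power_of_two(per_way):
        problems.append("size/assoc must be a power of 2")
    if line_ok and per_way % line_size != 0:
        problems.append("size/assoc must be a multiple of line_size")
    return problems


KB, MB = 1024, 1024 * 1024
print(check_cache_geometry(32 * KB, 8))   # default L1 geometry
print(check_cache_geometry(8 * MB, 16))   # default LL geometry
print(check_cache_geometry(48 * KB, 8))   # 6K per way is not a power of 2
```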
-LL_miss_file
default value: ""
If non-empty, when running the cache simulator, requests that every last-level cache miss be written to a file at the specified path. Each miss is written in text format as a <process id, program counter, address> tuple. If this tool is linked with zlib, the file is written in gzip-compressed format. If non-empty, when running the cache miss analyzer, requests that prefetching hints based on the miss analysis be written to the specified file. Each hint is written in text format as a <program counter, stride, locality level> tuple.
-L0_filter
default value: false
DEPRECATED: Use the -L0I_filter and -L0D_filter options instead.
-L0I_filter
default value: false
Filters out instruction hits in a 'zero-level' cache during tracing itself, shrinking the final trace to only contain instructions that miss in this initial cache. This cache is direct-mapped with size equal to L0I_size. It uses virtual addresses regardless of -use_physical. The dynamic (pre-filtered) per-thread instruction count is tracked and supplied via a dynamorio::drmemtrace::TRACE_MARKER_TYPE_INSTRUCTION_COUNT marker at thread buffer boundaries and at thread exit.
-L0D_filter
default value: false
Filters out data hits in a 'zero-level' cache during tracing itself, shrinking the final trace to only contain data accesses that miss in this initial cache. This cache is direct-mapped with size equal to L0D_size. It uses virtual addresses regardless of -use_physical.
-L0I_size
default value: 32K
Specifies the size of the 'zero-level' instruction cache for L0I_filter. Must be a power of 2 and a multiple of line_size, unless it is set to 0, which disables instruction fetch entries from appearing in the trace.
-L0D_size
default value: 32K
Specifies the size of the 'zero-level' data cache for L0D_filter. Must be a power of 2 and a multiple of line_size, unless it is set to 0, which disables data entries from appearing in the trace.
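
To make the effect of the zero-level filters concrete, here is a minimal Python sketch of a direct-mapped cache that keeps only the accesses that miss. It is a conceptual model under simplifying assumptions (one address stream, whole-line tags), not the tracer's implementation; the function name l0_filter is hypothetical.

```python
def l0_filter(addresses, cache_size=32 * 1024, line_size=64):
    """Return the addresses that miss in a direct-mapped cache of cache_size bytes.

    A cache_size of 0 models the documented special case in which no
    entries of this kind appear in the trace at all.
    """
    if cache_size == 0:
        return []
    num_lines = cache_size // line_size
    slots = [None] * num_lines  # direct-mapped: one resident line per slot
    misses = []
    for addr in addresses:
        line = addr // line_size
        index = line % num_lines
        if slots[index] != line:
            slots[index] = line  # the missing line replaces the previous occupant
            misses.append(addr)
    return misses


# Two-line cache: 0 and 128 map to the same slot and evict each other.
print(l0_filter([0, 8, 64, 0, 128, 0], cache_size=128, line_size=64))
```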
-instr_only_trace
default value: false
If -instr_only_trace, only instruction fetch entries are included in the trace and data entries are omitted.
-coherence
default value: false
Writes to cache lines will invalidate other private caches that hold that line.
-use_physical
default value: false
If available, metadata with virtual-to-physical-address translation information is added to the trace. This is not possible from user mode on all platforms. The regular trace entries remain virtual, with a pair of markers of types dynamorio::drmemtrace::TRACE_MARKER_TYPE_PHYSICAL_ADDRESS and dynamorio::drmemtrace::TRACE_MARKER_TYPE_VIRTUAL_ADDRESS inserted at some prior point for each new or changed page mapping to show the corresponding physical addresses. If translation fails, a dynamorio::drmemtrace::TRACE_MARKER_TYPE_PHYSICAL_ADDRESS_NOT_AVAILABLE marker is inserted. This option may incur significant overhead, both from the physical translation itself and because it requires disabling optimizations. For -offline, this option must be passed to both the tracer (to insert the markers) and the simulator (to use the markers).
-virt2phys_freq
default value: 0
This option only applies if -use_physical is enabled. The virtual to physical mapping is cached for performance reasons, yet the underlying mapping can change without notice. This option controls the frequency with which the cached value is ignored in order to re-access the actual mapping and ensure accurate results. The units are the number of memory accesses per forced access. A value of 0 uses the cached values for the entire application execution.
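
The caching behavior that -virt2phys_freq controls can be pictured with the following Python sketch. It is a conceptual model only: the class name, the lookup callback, and the choice to refresh just the page being accessed are assumptions, not the tracer's actual design.

```python
class CachedTranslator:
    """Caches virtual-to-physical page lookups, with optional forced refreshes."""

    def __init__(self, lookup, freq=0):
        self.lookup = lookup  # hypothetical callback: virtual page -> physical page
        self.freq = freq      # memory accesses per forced access; 0 = never force
        self.cache = {}
        self.accesses = 0
        self.lookups = 0      # how many times the real mapping was consulted

    def translate(self, vpage):
        self.accesses += 1
        forced = self.freq > 0 and self.accesses % self.freq == 0
        if forced or vpage not in self.cache:
            self.cache[vpage] = self.lookup(vpage)
            self.lookups += 1
        return self.cache[vpage]


always_cached = CachedTranslator(lambda v: v + 100, freq=0)
refreshing = CachedTranslator(lambda v: v + 100, freq=4)
for _ in range(10):
    always_cached.translate(1)
    refreshing.translate(1)
print(always_cached.lookups, refreshing.lookups)
```

A smaller non-zero value trades more real lookups (and so more overhead) for a shorter window in which a stale mapping can be used.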
-v2p_file
default value: ""
The TLB_simulator tool can use v2p.textproto to translate virtual addresses to physical ones during offline analysis. If the file is named v2p.textproto and is in the same directory as the trace file, or a raw/ subdirectory below the trace file, this parameter can be omitted. This option overrides both -page_size and the page size marker in the trace (if present) with the page size in v2p.textproto. The option -use_physical (in offline mode) must be set to use the v2p.textproto mapping. Note that -use_physical does not need to be set during tracing.
-cpu_scheduling
default value: false
By default for online analysis, the simulator schedules threads to simulated cores in a static round-robin fashion. This option causes the scheduler to instead use the recorded cpu that each thread executed on (at a granularity of the trace buffer size) for scheduling, mapping traced cpus to cores and running each segment of each thread on the core that owns the recorded cpu for that segment. This option is not supported with -core_serial; use -cpu_schedule_file with -core_serial instead. For offline analysis, the recommendation is to not recreate the as-traced schedule (as it is not accurate due to overhead) and instead use a dynamic schedule via -core_serial. If only core-sharded-preferring tools are enabled (e.g., cache_simulator, TLB_simulator, schedule_stats), -core_serial is automatically turned on for offline analysis.
-max_trace_size
default value: 0
If non-zero, this sets a maximum size on the amount of raw trace data gathered for each thread. This is not an exact limit: it may be exceeded by the size of one internal buffer. Once reached, instrumentation continues for that thread, but no further data is recorded.
-max_global_trace_refs
default value: 0
If non-zero, this sets a maximum on the number of trace entry references (of any type: instructions, loads, stores, markers, etc.) recorded. Once reached, instrumented execution continues, but no further data is recorded. This is similar to -exit_after_tracing but without terminating the process. The reference count is approximate.
-align_endpoints
default value: true
When using attach/detach to trace a burst, the attach and detach processes are staggered, with the set of threads producing trace data incrementally growing or shrinking. This results in uneven thread activity at the start and end of the burst. If this option is enabled, tracing is nop-ed until fully attached to all threads and is nop-ed as soon as detach starts, eliminating the unevenness. This also allows omitting threads that did nothing during the burst.
-memdump_on_window
default value: false
Capture a memory dump upon the initiation of tracing triggered by -trace_after_instrs, -trace_instr_intervals_file, or -retrace_every_instrs. If -retrace_every_instrs is also enabled, a memory dump will be captured for each individual tracing window. This is only supported on X64 Linux.
-trace_after_instrs
default value: 0
If non-zero, this causes tracing to be suppressed until this many dynamic instruction executions are observed from the start of the application. At that point, regular tracing is put into place. The threshold should be considered approximate, especially for larger values. Use -trace_for_instrs, -max_trace_size, or -max_global_trace_refs to set a limit on the subsequent trace length. Use -retrace_every_instrs to trace repeatedly.
-trace_for_instrs
default value: 0
If non-zero, this stops recording a trace after the specified number of instructions are traced. Unlike -exit_after_tracing, which kills the application (and counts data as well as instructions), the application continues executing. This can be combined with -retrace_every_instrs. The actual trace period may vary slightly from this number due to optimizations that reduce the overhead of instruction counting.
-retrace_every_instrs
default value: 0
This option augments -trace_for_instrs. After tracing concludes, this option causes non-traced instructions to be counted and after the number specified by this option, tracing will start up again for the -trace_for_instrs duration. This process repeats itself. This can be combined with -trace_after_instrs for an initial period of non-tracing. Each tracing window is delimited by TRACE_MARKER_TYPE_WINDOW_ID markers. For -offline traces, each window is placed into its own separate set of output files, unless -no_split_windows is set.
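
The interaction of -trace_after_instrs, -trace_for_instrs, and -retrace_every_instrs can be sketched as a sequence of instruction ranges. The Python generator below is an idealized model that assumes a single instruction stream, exact counts, and a non-zero -trace_for_instrs; the real thresholds are approximate, as noted above. The function name tracing_windows is hypothetical.

```python
from itertools import islice


def tracing_windows(trace_after, trace_for, retrace_every):
    """Yield idealized (start, end) traced instruction ranges, end exclusive."""
    start = trace_after  # tracing is suppressed for the first trace_after instructions
    while True:
        yield (start, start + trace_for)
        if retrace_every == 0:
            return  # without -retrace_every_instrs there is only one window
        # Count retrace_every untraced instructions, then trace again.
        start += trace_for + retrace_every


# Hypothetical values: skip 100 instructions, trace 10, then retrace after every 50.
print(list(islice(tracing_windows(100, 10, 50), 3)))
```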
-trace_instr_intervals_file
default value: ""
File containing instruction intervals to trace in csv format. Intervals are specified as a <start, duration> pair per line. Example in: clients/drcachesim/tests/instr_intervals_example.csv
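
A file of this kind could be generated with a short Python sketch like the one below. The exact layout (no header line, comma with no surrounding spaces) is an assumption based only on the one-pair-per-line description above; the example file in the source tree remains the authority. The function name and interval values are hypothetical.

```python
import csv
import io


def intervals_to_csv(intervals):
    """Render (start, duration) pairs as CSV text, one pair per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for start, duration in intervals:
        writer.writerow([start, duration])
    return buf.getvalue()


# Hypothetical intervals: trace 50000 instructions starting at two different points.
text = intervals_to_csv([(1000000, 50000), (5000000, 50000)])
print(text, end="")
```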
-split_windows
default value: true
By default, offline traces in separate windows from -retrace_every_instrs are written to a different set of files for each window. If this option is disabled, all windows are concatenated into a single trace, separated by TRACE_MARKER_TYPE_WINDOW_ID markers.
-exit_after_tracing
default value: 0
If non-zero, after tracing the specified number of references, the process is exited with an exit code of 0. The reference count is approximate. Use -max_global_trace_refs instead to avoid terminating the process.
-raw_compress
default value: lz4
Specifies the compression type to use for raw offline files: "snappy", "snappy_nocrc" (snappy without checksums, which is much faster), "gzip", "zlib", "lz4", or "none". Whether this reduces overhead depends on the storage type: for an SSD, zlib and gzip typically add overhead and would only be used if space is at a premium; snappy_nocrc and lz4 are nearly always performance wins.
-compress
default value: zip
Specifies the compression type to use for trace files: "zip", "gzip", "zlib", "lz4", or "none". When fast skipping by instruction count is not needed, lz4 compression generally improves performance and is recommended. The impact on overhead depends on the storage type: for SSDs, zip and gzip often increase overhead and should only be chosen if space is limited.
-online_instr_types
default value: false
By default, offline traces include some information on the types of instructions, branches in particular. For online traces, this comes at a performance cost, so it is turned off by default.
-replace_policy
default value: LRU
Specifies the replacement policy for caches. Supported policies: LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First-In-First-Out).
-data_prefetcher
default value: nextline
Specifies the hardware data prefetcher policy. The currently supported policies are 'nextline' (fetch the subsequent cache line) and 'none' (disables hardware prefetching). The prefetcher is located between the L1D and LL caches.
-page_size
default value: 4K
Specifies the virtual/physical page size.
-TLB_L1I_entries
default value: 32
Specifies the number of entries in each L1 instruction TLB. Must be a power of 2.
-TLB_L1D_entries
default value: 32
Specifies the number of entries in each L1 data TLB. Must be a power of 2.
-TLB_L1I_assoc
default value: 32
Specifies the associativity of each L1 instruction TLB. Must be a power of 2.
-TLB_L1D_assoc
default value: 32
Specifies the associativity of each L1 data TLB. Must be a power of 2.
-TLB_L2_entries
default value: 1024
Specifies the number of entries in each unified L2 TLB. Must be a power of 2.
-TLB_L2_assoc
default value: 4
Specifies the associativity of each unified L2 TLB. Must be a power of 2.
-TLB_replace_policy
default value: LFU
Specifies the replacement policy for TLBs. Supported policies: LFU (Least Frequently Used), BIT_PLRU (Pseudo Least Recently Used), LRU (Least Recently Used), FIFO (First-In-First-Out).
-tool
default value: cache_simulator
Predefined types: cache_simulator, miss_analyzer, TLB_simulator, reuse_distance, reuse_time, histogram, basic_counts, invariant_checker, schedule_stats, or record_filter. The record_filter tool cannot be combined with the others as it operates on raw disk records. To invoke an external tool: specify its name as identified by a name.drcachesim config file in the DR tools directory.
-verbose
default value: 0
Verbosity level for notifications.
-show_func_trace
default value: true
In the func_trace tool, this controls whether every traced call is shown or instead only aggregate statistics are shown.
-test_mode
default value: false
Run extra analyses for sanity checks on the trace.
-test_mode_name
default value: ""
Run extra analyses for specific sanity checks by name on the trace.
-disable_optimizations
default value: false
Disables various optimizations where information is omitted from offline trace recording when it can be reconstructed during post-processing. This is meant for testing purposes.
-dr
default value: ""
Specifies the path of the DynamoRIO root directory.
-dr_debug
default value: false
Requests use of the debug build of DynamoRIO rather than the release build.
-dr_ops
default value: ""
Specifies the options to pass to DynamoRIO.
-tracer
default value: ""
The full path to the tracer library.
-tracer_alt
default value: ""
The full path to the tracer library for the other bitwidth, for use on child processes with a different bitwidth from their parent. If empty, such child processes will die with fatal errors.
-interval_microseconds
default value: 0
Desired length of each trace interval, defined in microseconds of trace time. Trace intervals are measured using the TRACE_MARKER_TYPE_TIMESTAMP marker values. If set, analysis tools receive a callback at the end of each interval, and one at the end of trace analysis to print the whole-trace interval results.
-interval_instr_count
default value: 0
Desired length of each trace interval, defined in instr count of each shard. With -parallel, this does not support whole trace intervals, only per-shard intervals. If set, analysis tools receive a callback at the end of each interval, and separate callbacks per shard at the end of trace analysis to print each shard's interval results.
-only_thread
default value: 0
Limits analysis to the single thread with the given identifier. 0 enables all threads. Applies only to -indir, not to -infile. Cannot be combined with -only_threads or -only_shards.
-only_threads
default value: ""
Limits analysis to the list of comma-separated thread ids. Applies only to -indir, not to -infile. Cannot be combined with -only_thread or -only_shards.
-only_shards
default value: ""
Limits analysis to the list of comma-separated shard ordinals. A shard is typically an input thread but might be a core for core-sharded-on-disk traces. The ordinal is 0-based and indexes into the sorted order of input filenames. Applies only to -indir, not to -infile. Cannot be combined with -only_thread or -only_threads.
-skip_instrs
default value: 0
Specifies the number of instructions to skip in the beginning of the trace analysis. For serial iteration, this number is computed just once across the interleaving sequence of all threads; for parallel iteration, each thread skips this many instructions (see -skip_to_timestamp for an alternative which does align all threads). When built with zipfile support, this skipping is optimized and large instruction counts can be quickly skipped; this is not the case for -skip_records or -skip_refs. This will skip over top-level metadata records (such as dynamorio::drmemtrace::TRACE_MARKER_TYPE_VERSION, dynamorio::drmemtrace::TRACE_MARKER_TYPE_FILETYPE, dynamorio::drmemtrace::TRACE_MARKER_TYPE_PAGE_SIZE, and dynamorio::drmemtrace::TRACE_MARKER_TYPE_CACHE_LINE_SIZE) and so those records will not appear to analysis tools; however, their contents can be obtained from dynamorio::drmemtrace::memtrace_stream_t API accessors. Synthetic records, such as dynamically injected system call or context switch sequences, are not counted at all.
-skip_records
default value: 0
Specifies the number of records to skip in the beginning of the trace analysis. For serial iteration, this number is computed just once across the interleaving sequence of all threads; for parallel iteration, each worker skips this many records (see -skip_to_timestamp for an alternative which aligns all threads). This skipping is not as fast as -skip_instrs. This will skip over top-level metadata records (such as dynamorio::drmemtrace::TRACE_MARKER_TYPE_VERSION, dynamorio::drmemtrace::TRACE_MARKER_TYPE_FILETYPE, dynamorio::drmemtrace::TRACE_MARKER_TYPE_PAGE_SIZE, and dynamorio::drmemtrace::TRACE_MARKER_TYPE_CACHE_LINE_SIZE) and so those records will not appear to analysis tools; however, their contents can be obtained from dynamorio::drmemtrace::memtrace_stream_t API accessors. Synthetic records, such as dynamically injected system call or context switch sequences, are not counted at all.
-skip_refs
default value: 0
This option is honored by certain tools such as the cache and TLB simulators. It causes them to ignore the specified count of non-marker records (i.e., actual address reference ('ref') records) at the start of the trace and only start processing after that count is reached. Since the framework is still iterating over those records and it is the tool that is ignoring them, this skipping may be slow for large skip values; consider -skip_instrs for a faster method of skipping inside the framework itself. To skip markers also, use -skip_records.
-skip_to_timestamp
default value: 0
Specifies a timestamp to start at, skipping over prior records in the trace. This is cross-cutting across all threads. If the target timestamp is not present as a timestamp marker, interpolation is used to approximate the target location in each thread. Only one of this and -skip_instrs can be specified. Requires -cpu_schedule_file to also be specified as a schedule file is required to translate the timestamp into per-thread instruction ordinals. When built with zipfile support, this skipping is optimized and large instruction counts can be quickly skipped. This will skip over top-level metadata records (such as dynamorio::drmemtrace::TRACE_MARKER_TYPE_VERSION, dynamorio::drmemtrace::TRACE_MARKER_TYPE_FILETYPE, dynamorio::drmemtrace::TRACE_MARKER_TYPE_PAGE_SIZE, and dynamorio::drmemtrace::TRACE_MARKER_TYPE_CACHE_LINE_SIZE) and so those records will not appear to analysis tools; however, their contents can be obtained from dynamorio::drmemtrace::memtrace_stream_t API accessors.
-L0_filter_until_instrs
default value: 0
Specifies the number of instructions to run in warmup mode. This instruction count is per-thread. In warmup mode, we filter accesses through the -L0{D,I}_filter caches. If neither -L0D_filter nor -L0I_filter is specified then both are assumed to be true. The size of these can be specified using -L0{D,I}_size. The filter instructions come after the -trace_after_instrs count and before the full trace. This is intended to be used together with other trace options (e.g., -trace_for_instrs, -exit_after_tracing, -max_trace_size, etc.) but with the difference that a filter trace is also collected. The filter trace and full trace are stored in a single file separated by a TRACE_MARKER_TYPE_FILTER_ENDPOINT marker. When used with windows (i.e., -retrace_every_instrs), each window contains a filter trace and a full trace. Therefore TRACE_MARKER_TYPE_WINDOW_ID markers indicate the start of filtered records.
-warmup_refs
default value: 0
This option is honored by certain tools such as the cache and TLB simulators. It causes them to not start analysis/simulation until this many non-marker records (i.e., actual memory reference ('ref') records) are seen. If -skip_refs is specified, the warmup records start after the skipped ones end. This flag is incompatible with -warmup_fraction.
-warmup_fraction
default value: 0
Specifies the fraction of last-level cache blocks to be loaded such that the cache is considered to be warmed up before simulation. The warmup fraction is computed after the skipped references and before simulated references. This flag is incompatible with -warmup_refs.
-sim_refs
default value: 8589934592G
This option is honored by certain tools such as the cache and TLB simulators. It causes them to only analyze this many non-marker records (i.e., actual memory reference ('ref') records) and then exit. If -skip_refs is specified, the analyzed records start after the skipped ones end; similarly, if -warmup_refs or -warmup_fraction is specified, the warmup records come prior to the -sim_refs records. Use -exit_after_records for a similar feature that works on all tools (but does not work with -warmup_*, -sim_refs, or -skip_refs).
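
The ordering of -skip_refs, -warmup_refs, and -sim_refs described above can be summarized with a small Python sketch that classifies each non-marker record by its position. The function name ref_phase and the phase labels are hypothetical; -warmup_fraction is not modeled because it depends on cache state rather than a fixed count.

```python
def ref_phase(index, skip_refs=0, warmup_refs=0, sim_refs=None):
    """Classify the 0-based non-marker record `index` into a processing phase.

    sim_refs=None stands in for the effectively unlimited default.
    """
    if index < skip_refs:
        return "skip"
    index -= skip_refs
    if index < warmup_refs:
        return "warmup"      # seen by the simulator but not counted in results
    index -= warmup_refs
    if sim_refs is None or index < sim_refs:
        return "simulate"
    return "done"            # the tool exits once sim_refs records are analyzed


# Hypothetical counts: skip 2, warm up on 3, simulate 4.
print([ref_phase(i, skip_refs=2, warmup_refs=3, sim_refs=4) for i in range(10)])
```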
-exit_after_records
default value: 8589934592G
Causes trace analyzers to only analyze this many records and then exit. If instructions are skipped (-skip_instrs), that happens first before record counting starts here. This is similar to -sim_refs, though it is implemented in the framework and so applies to all tools. This option is not compatible with -sim_refs, -skip_refs, -warmup_refs, or -warmup_fraction. For traces with multiple shards, each shard separately stops when it reaches this count within the shard.
-view_syntax
default value: att/arm/dr/riscv
Specifies the syntax to use when viewing disassembled offline traces. The option can be set to one of "att" (AT&T style), "intel" (Intel style), "dr" (DynamoRIO's native style with all implicit operands listed), "arm" (32-bit ARM style), and "riscv" (RISC-V style). An invalid specification falls back to the default, which is "att" for x86, "arm" for ARM (32-bit), "dr" for AArch64, and "riscv" for RISC-V.
-config_file
default value: ""
The full path to the cache hierarchy configuration file.
-add_noise_generator
default value: false
Adds synthetic trace records produced by a noise generator as another input workload to the scheduler. These synthetic records are interleaved by the scheduler with the target trace(s) records. Currently it only adds a single-process, single thread noise generator, but that may change in the future.
-report_top
default value: 10
Specifies the number of top results to be reported.
-reuse_distance_threshold
default value: 100
Specifies the reuse distance threshold for reporting the distant repeated references. A reference is a distant repeated reference if the distance to the previous reference on the same cache line exceeds the threshold.
-reuse_distance_histogram
default value: false
By default only the mean, median, and standard deviation of the reuse distances are reported. This option prints out the full histogram of reuse distances.
-reuse_skip_dist
default value: 500
Specifies the distance between nodes in the skip list. For optimal performance, set this to a value close to the estimated average reuse distance of the dataset.
-reuse_distance_limit
default value: 0
Specifies the maximum length of the access history list used for distance calculation. Setting this limit can significantly improve performance and reduce memory consumption for very long traces.
-reuse_verify_skip
default value: false
Verifies every skip list-calculated reuse distance with a full list walk. This incurs significant additional overhead. This option is only available in debug builds.
-reuse_histogram_bin_multiplier
default value: 1
The first histogram bin has a size of 1, meaning it contains the count for one distance. Each subsequent bin size is increased by this multiplier. For multipliers >1.0, this results in geometric growth of bin sizes, with multiple distance values being reported for each bin. For large traces, a value of 1.05 works well to limit the output to a reasonable number of bins. Note that this option only affects the printing of histograms via the -reuse_distance_histogram option; the raw histogram data is always collected at full precision.
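
The geometric bin growth can be illustrated with the Python sketch below. It assumes distances start at 0, that the bin size is tracked as a floating-point value, and that each bin covers at least one whole distance; the tool's exact rounding is not specified here, so treat the boundaries as an approximation. The function name histogram_bins is hypothetical.

```python
def histogram_bins(multiplier, num_bins):
    """Return [start, end) distance ranges for the first num_bins bins."""
    bins = []
    start = 0
    size = 1.0  # the first bin holds exactly one distance
    for _ in range(num_bins):
        end = start + max(1, int(size))  # assumption: truncate, never below 1
        bins.append((start, end))
        start = end
        size *= multiplier  # each subsequent bin grows by the multiplier
    return bins


print(histogram_bins(1, 3))  # default multiplier: one distance per bin
print(histogram_bins(2, 4))  # bin widths 1, 2, 4, 8
```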
-record_function
default value: ""
Records a trace of invocations of the specified function(s). The value should fit this format: function_name|func_args_num (e.g., -record_function "memset|3") with an optional suffix "|noret" (e.g., -record_function "free|1|noret"). The trace will contain information for each function invocation's return address, function argument value(s), and (unless "|noret" is specified) function return value. (If multiple requested functions map to the same address and differ in whether "noret" was specified, the attribute from the first one requested will be used. If they differ in the number of args, the minimum value will be used.) We only record pointer-sized arguments and return values. The trace identifies which function is involved via a numeric ID entry prior to each set of value entries. The mapping from numeric ID to library-qualified symbolic name is recorded during tracing in a file "funclist.log" whose format is described by the drmemtrace_get_funclist_path() function's documentation. If the target function is in the dynamic symbol table, then the function_name should be a mangled name (e.g. "_Znwm" for "operator new", "_ZdlPv" for "operator delete"). Otherwise, the function_name should be a demangled name. Recording multiple functions can be achieved by using the separator "&" (e.g., -record_function "memset|3&memcpy|3"), or specifying multiple -record_function options (e.g., -record_function "memset|3" -record_function "memcpy|3"). Note that the provided function name should be unique, and not collide with existing heap functions (see -record_heap_value) if the -record_heap option is enabled.
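
The value format shared by -record_function and -record_heap_value can be checked with a small parser before starting a long tracing run. The Python sketch below is a hypothetical helper, not part of the tool; it only encodes the "name|args" format with the optional "|noret" suffix and the "&" separator described above.

```python
def parse_record_function(value):
    """Parse 'name|num_args[|noret]&...' into (name, num_args, noret) tuples."""
    specs = []
    for item in value.split("&"):
        parts = item.split("|")
        has_suffix = len(parts) == 3
        if len(parts) not in (2, 3) or (has_suffix and parts[2] != "noret"):
            raise ValueError(f"bad function spec: {item!r}")
        name, num_args = parts[0], int(parts[1])
        specs.append((name, num_args, has_suffix))
    return specs


print(parse_record_function("memset|3&memcpy|3"))
print(parse_record_function("malloc|1&free|1|noret&realloc|2"))
```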
-record_heap
default value: false
A convenience option that enables recording a trace for the heap functions defined in -record_heap_value. Specifying this option is equivalent to -record_function [heap_functions], where [heap_functions] is the value in -record_heap_value.
-record_heap_value
default value: malloc|1&free|1|noret&realloc|2&calloc|2&tc_malloc|1&tc_free|1|noret&tc_realloc|2&tc_calloc|2&__libc_malloc|1&__libc_free|1|noret&__libc_realloc|2&__libc_calloc|2&_Znwm|1&_ZnwmRKSt9nothrow_t|2&_ZnwmSt11align_val_t|2&_ZnwmSt11align_val_tRKSt9nothrow_t|3&_ZnwmPv|2&_Znam|1&_ZnamRKSt9nothrow_t|2&_ZnamSt11align_val_t|2&_ZnamSt11align_val_tRKSt9nothrow_t|3&_ZnamPv|2&_ZdlPv|1|noret&_ZdlPvRKSt9nothrow_t|2|noret&_ZdlPvSt11align_val_t|2|noret&_ZdlPvSt11align_val_tRKSt9nothrow_t|3|noret&_ZdlPvm|2|noret&_ZdlPvmSt11align_val_t|3|noret&_ZdlPvS_|2|noret&_ZdaPv|1|noret&_ZdaPvRKSt9nothrow_t|2|noret&_ZdaPvSt11align_val_t|2|noret&_ZdaPvSt11align_val_tRKSt9nothrow_t|3|noret&_ZdaPvm|2|noret&_ZdaPvmSt11align_val_t|3|noret&_ZdaPvS_|2|noret
Functions recorded by -record_heap. The option value should fit the same format required by -record_function. These functions will not be traced unless -record_heap is specified.
-record_dynsym_only
default value: false
Symbol lookup can be expensive for large applications and libraries. This option causes the symbol lookup for -record_function and -record_heap to look in the dynamic symbol table only.
-record_replace_retaddr
default value: false
Function wrapping can be expensive for large concurrent applications. This option causes the post-function control point to be located using return address replacement, which has lower overhead, but runs the risk of breaking an application that examines or changes its own return addresses in the recorded functions.
-record_syscall
default value: ""
Record the parameters and success of the specified system call number(s). The option value should fit this format: syscall_number|parameter_number. E.g., -record_syscall "2|2" will record SYS_open's 2 parameters and whether successful (1 for success or 0 for failure, in a function return value record) for x86 Linux. SYS_futex is recorded by default on Linux and this option's value adds to futex rather than replacing it (setting futex to 0 parameters disables). The trace identifies which syscall owns each set of parameter and return value records via a numeric ID equal to the syscall number + TRACE_FUNC_ID_SYSCALL_BASE. Recording multiple syscalls can be achieved by using the separator "&" (e.g., -record_syscall "202|6&3|1"), or by specifying multiple -record_syscall options. It is up to the user to ensure the values are correct; a too-large parameter count may cause tracing to fail with an error mid-run.
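The relationship between a -record_syscall value and the numeric IDs seen in the trace can be sketched as follows. This is a minimal illustration, not the tool's code: the function name syscall_func_ids is hypothetical, and the value of TRACE_FUNC_ID_SYSCALL_BASE is not given in this section, so the sketch takes it as a parameter and uses a placeholder in the example call.

```python
def syscall_func_ids(value, syscall_base):
    """Map a -record_syscall value to {numeric_id: parameter_count}.

    syscall_base stands in for TRACE_FUNC_ID_SYSCALL_BASE; its real value
    is not stated here, so the caller must supply it.
    """
    ids = {}
    for item in value.split("&"):
        number, params = item.split("|")
        # Trace ID = syscall number + TRACE_FUNC_ID_SYSCALL_BASE.
        ids[int(number) + syscall_base] = int(params)
    return ids


# Placeholder base chosen only for this example; it is NOT the real constant.
EXAMPLE_BASE = 1_000_000
print(syscall_func_ids("202|6&3|1", EXAMPLE_BASE))
```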
-miss_count_threshold
default value: 50000
Specifies the minimum number of LLC misses of a load for it to be eligible for analysis in search of patterns in the miss address stream.
-miss_frac_threshold
default value: 0.005
Specifies the minimum fraction of LLC misses of a load (from all misses) for it to be eligible for analysis in search of patterns in the miss address stream.
-confidence_threshold
default value: 0.75
Specifies the minimum confidence to include a discovered pattern in the output results. Confidence in a discovered pattern for a load instruction is calculated as the fraction of the load's misses with the discovered pattern over all the load's misses.
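The confidence calculation described above amounts to a simple ratio. The sketch below is illustrative only: the function names are hypothetical, and treating the threshold as inclusive (>=) is an assumption based on the word "minimum".

```python
def pattern_confidence(misses_with_pattern, total_load_misses):
    """Fraction of a load's misses that follow the discovered pattern."""
    if total_load_misses <= 0:
        raise ValueError("total_load_misses must be positive")
    return misses_with_pattern / total_load_misses


def is_reported(misses_with_pattern, total_load_misses, confidence_threshold=0.75):
    """True if the pattern meets the minimum confidence (default 0.75).

    Assumption: the comparison is inclusive of the threshold.
    """
    confidence = pattern_confidence(misses_with_pattern, total_load_misses)
    return confidence >= confidence_threshold


print(is_reported(800, 1000))  # 80% of the load's misses match the pattern
print(is_reported(700, 1000))  # 70% of the load's misses match the pattern
```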
-enable_drstatecmp
default value: false
When true, this option enables the drstatecmp library that performs state comparisons to detect instrumentation-induced bugs due to state clobbering.
-enable_kernel_tracing
default value: false
By default, offline tracing only records a userspace trace. If this option is enabled, offline tracing will record each syscall's Kernel PT and write every syscall's PT and metadata to files in -outdir/kernel.raw/ for later offline analysis. This feature is available only on Intel CPUs that support Intel® Processor Trace.
-skip_kcore_dump
default value: false
By default, when -enable_kernel_tracing is set, offline tracing will dump kcore and kallsyms to the raw trace directory, which requires the user to run the target application with superuser permissions. However, if this option is enabled, we skip the dump, and since collecting kernel trace data using Intel-PT does not necessarily need superuser permissions, the target application can be run as normal. This may be useful if it is not feasible to run the application with superuser permissions and the user wants to use a different kcore dump, from a prior trace or created separately.
-kernel_trace_buffer_size_shift
default value: 8
When -enable_kernel_tracing is set, this is used to compute the size of the buffer used to collect kernel trace data. The size is computed as (1 << kernel_trace_buffer_size_shift) * page_size. Too large buffers can cause OOMs on apps with many threads, whereas too small buffers can cause decoding issues in raw2trace due to dropped trace data.
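The size formula above can be evaluated directly. In this sketch the function name is hypothetical and the 4096-byte page size is an assumption for the example; the actual page size depends on the system.

```python
def kernel_trace_buffer_bytes(shift, page_size):
    """Buffer size in bytes: (1 << kernel_trace_buffer_size_shift) * page_size."""
    return (1 << shift) * page_size


# Default shift of 8 with an assumed 4096-byte page.
print(kernel_trace_buffer_bytes(8, 4096))
```

Each increment of the shift doubles the buffer, which is worth keeping in mind for applications with many threads.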
-core_sharded
default value: false
By default, the sharding mode is determined by the preferred shard type of the tools selected (unless overridden, the default preferred type is thread-sharded). This option enables core-sharded, overriding tool defaults. Core-sharded analysis schedules the input software threads onto virtual cores and analyzes each core in parallel. Thus, each shard consists of pieces from many software threads. How the scheduling is performed is controlled by a set of options with the prefix "sched_" along with -cores. If only core-sharded-preferring tools are enabled (e.g., cache_simulator, TLB_simulator, schedule_stats) and they all support parallel operation, -core_sharded is automatically turned on for offline analysis.
-core_serial
default value: false
In this mode, scheduling is performed just like for -core_sharded. However, the resulting schedule is acted upon by a single analysis thread which walks the N cores in lockstep in round robin fashion. How the scheduling is performed is controlled by a set of options with the prefix "sched_" along with -cores. If only core-sharded-preferring tools are enabled (e.g., cache_simulator, TLB_simulator, schedule_stats) and not all of them support parallel operation, -core_serial is automatically turned on for offline analysis.
-sched_quantum
default value: 10000000
Applies to -core_sharded and -core_serial. Scheduling quantum in instructions, unless -sched_time is set, in which case this value is the quantum in simulated microseconds (equal to wall-clock microseconds multiplied by -sched_time_units_per_us).
-sched_time
default value: false
Applies to -core_sharded and -core_serial. Whether to use wall-clock time (multiplied by -sched_time_units_per_us) for measuring idle time and for the scheduling quantum (see -sched_quantum).
-sched_order_time
default value: true
Applies to -core_sharded and -core_serial. Whether to honor recorded timestamps for ordering.
-sched_syscall_switch_us
default value: 30000000
Minimum latency in timestamp units (us) to consider a non-blocking syscall as incurring a context switch (see -sched_blocking_switch_us for maybe-blocking syscalls). Applies to -core_sharded and -core_serial.
-sched_blocking_switch_us
default value: 500
Minimum latency in timestamp units (us) to consider any syscall that is marked as maybe-blocking to incur a context switch. Applies to -core_sharded and -core_serial.
-sched_block_scale
default value: 0.1
A system call considered to block (see -sched_blocking_switch_us) will block in the trace scheduler for an amount of simulator time equal to its as-traced latency in trace-time microseconds multiplied by this parameter and by -sched_time_units_per_us in simulated microseconds, subject to a maximum of -sched_block_max_us. A higher value here results in blocking syscalls keeping inputs unscheduled for longer. There is indirect overhead inflating the as-traced times, so a value below 1 is typical.
-sched_block_max_us
default value: 2500
The maximum blocked time, after scaling with -sched_block_scale.
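One possible reading of how -sched_block_scale and -sched_block_max_us combine is sketched below. This is an interpretation of the descriptions above, not the scheduler's actual code: the function name is hypothetical, and the order of operations (scale, then cap, then convert to time units using the default of 1000 listed for -sched_time_units_per_us) is an assumption.

```python
def scaled_block_time(latency_us, block_scale=0.1, block_max_us=2500,
                      time_units_per_us=1000):
    """Estimate the blocked time, in scheduler time units, for one syscall.

    Assumed steps: scale the as-traced latency, cap the result at the
    maximum blocked time, then convert microseconds into time units.
    """
    scaled_us = latency_us * block_scale
    capped_us = min(scaled_us, block_max_us)
    return capped_us * time_units_per_us


# A 1000 us as-traced latency scales to 100 us, well under the cap.
print(scaled_block_time(1000))
# A 100000 us as-traced latency scales to 10000 us and is capped at 2500 us.
print(scaled_block_time(100000))
```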
-record_file
default value: ""
Applies to -core_sharded and -core_serial. Path for storing record of schedule.
-replay_file
default value: ""
Applies to -core_sharded and -core_serial. Path with stored schedule for replay.
-cpu_schedule_file
default value: ""
Applies to -core_sharded and -core_serial. Path with stored as-traced schedule for replay. If specified with a non-zero -skip_to_timestamp, there is no replay and instead the file is used for the skip request.
-sched_switch_file
default value: ""
Applies to -core_sharded and -core_serial. Path to file holding context switch sequences. The file can contain multiple sequences each with regular trace headers and the sequence proper bracketed by TRACE_MARKER_TYPE_CONTEXT_SWITCH_START and TRACE_MARKER_TYPE_CONTEXT_SWITCH_END markers.
-sched_syscall_file
default value: ""
Path to file holding system call sequences. The file can contain multiple sequences each with regular trace headers and the sequence proper bracketed by TRACE_MARKER_TYPE_SYSCALL_TRACE_START and TRACE_MARKER_TYPE_SYSCALL_TRACE_END markers.
-sched_randomize
default value: false
Applies to -core_sharded and -core_serial. Disables the normal methods of choosing the next input based on priority, timestamps (if -sched_order_time is set), and FIFO order and instead selects the next input randomly. This is intended for experimental use in sensitivity studies.
-sched_disable_direct_switches
default value: false
Applies to -core_sharded and -core_serial. Disables switching to the recorded targets of TRACE_MARKER_TYPE_DIRECT_THREAD_SWITCH system call metadata markers and causes the associated system call to be treated like any other call with a switch being determined by latency and the next input in the queue. The TRACE_MARKER_TYPE_DIRECT_THREAD_SWITCH markers are not removed from the trace.
-sched_infinite_timeouts
default value: false
Applies to -core_sharded and -core_serial. Determines whether an unscheduled-indefinitely input really is never scheduled (set to true), or instead is treated as blocked for the maximum time (scaled by the regular block scale) (set to false).
-sched_time_units_per_us
default value: 1000
Time units per simulated microsecond. The units are either the instruction count plus the idle count (the default) or, if -sched_time is selected, wall-clock microseconds. This option value scales all of the -sched_*_us values as it converts time units into the simulated microseconds measured by those options.
-sched_migration_threshold_us
default value: 500
The minimum time in simulated microseconds that must have elapsed since an input last ran on a core before it can be migrated to another core.
-sched_rebalance_period_us
default value: 50000
The period in simulated microseconds at which per-core run queues are re-balanced to redistribute load.
-sched_exit_if_fraction_inputs_left
default value: 0.1
Applies to -core_sharded and -core_serial. When an input reaches EOF, if the number of non-EOF inputs left as a fraction of the original inputs is equal to or less than this value then the scheduler exits (sets all outputs to EOF) rather than finishing off the final inputs. This helps avoid long sequences of idles during staggered endings with fewer inputs left than cores and only a small fraction of the total instructions left in those inputs. Since the remaining instruction count is not considered (as it is not available), use discretion when raising this value on uneven inputs.
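The exit condition described above can be expressed as a single comparison. The sketch below is illustrative; the function name is hypothetical and the check is a restatement of the text, not the scheduler's implementation.

```python
def should_exit_early(inputs_left, original_inputs, exit_fraction=0.1):
    """True if the fraction of non-EOF inputs is at or below the threshold."""
    if original_inputs <= 0:
        raise ValueError("original_inputs must be positive")
    return inputs_left / original_inputs <= exit_fraction


# With 20 original inputs and the default of 0.1, the scheduler would keep
# going with 3 inputs left (0.15) and exit once only 1 remains (0.05).
print(should_exit_early(3, 20))
print(should_exit_early(1, 20))
```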
-sched_max_cores
default value: 0
If non-zero, only this many live cores can be scheduled at any one time. Other cores will remain idle.
-schedule_stats_print_every
default value: 500000
A letter is printed every N instructions or N waits.
-syscall_template_file
default value: ""
Path to the file that contains system call trace templates. If set, system call traces will be injected from the file into the resulting trace. This is still experimental so the template file format may change without backward compatibility.
-filter_stop_timestamp
default value: 0
Record filtering will be disabled (everything will be output) when the tool sees a TRACE_MARKER_TYPE_TIMESTAMP marker with timestamp greater than the specified value.
-filter_cache_size
default value: 0
Enables a data cache filter of the given size (in bytes), with a 64-byte line size and a direct-mapped LRU cache.
-filter_trace_types
default value: ""
Comma-separated integers for trace types to remove. See trace_type_t for the list of trace entry types.
-filter_marker_types
default value: ""
Comma-separated integers for marker types to remove. See trace_marker_type_t for the list of marker types.
-filter_encodings2regdeps
default value: false
This option is for -tool record_filter. When present, it converts the encoding of instructions from a real ISA to the DR_ISA_REGDEPS synthetic ISA.
-filter_keep_func_ids
default value: ""
This option is for -tool record_filter. It preserves TRACE_MARKER_TYPE_FUNC_[ID | ARG | RETVAL | RETADDR] markers for the listed function IDs and removes those belonging to unlisted function IDs.
-filter_modify_marker_value
default value: ""
This option is for -tool record_filter. It modifies the value of all listed TRACE_MARKER_TYPE_ markers in the trace with their corresponding new_value. The list must have an even size. Example: -filter_modify_marker_value 3,24,18,2048 sets all TRACE_MARKER_TYPE_CPU_ID == 3 in the trace to core 24 and TRACE_MARKER_TYPE_PAGE_SIZE == 18 to 2k.
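The list format used by -filter_modify_marker_value pairs each marker type with its new value. The following sketch shows how such a list breaks down into pairs; the function name is hypothetical and the code is not the record_filter tool's parser.

```python
def parse_marker_modifications(value):
    """Turn "type,new_value,type,new_value,..." into {type: new_value}."""
    numbers = [int(field) for field in value.split(",")]
    if len(numbers) % 2 != 0:
        raise ValueError("the list must have an even size")
    # Even positions are marker types; odd positions are their new values.
    return dict(zip(numbers[0::2], numbers[1::2]))


# The example from the option description: marker type 3 gets value 24 and
# marker type 18 gets value 2048.
print(parse_marker_modifications("3,24,18,2048"))
```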
-trim_before_timestamp
default value: 0
Removes all records (after headers) before the first TRACE_MARKER_TYPE_TIMESTAMP marker in the trace with timestamp greater than or equal to the specified value.
-trim_after_timestamp
default value: 0
Removes all records from the first TRACE_MARKER_TYPE_TIMESTAMP marker with timestamp larger than the specified value.
-trim_before_instr
default value: 0
Removes all records (after headers) before the first TRACE_MARKER_TYPE_TIMESTAMP marker in the trace that comes after the specified instruction ordinal.
-trim_after_instr
default value: 0
Removes all records from the first TRACE_MARKER_TYPE_TIMESTAMP marker in the trace that comes after the specified instruction ordinal.
-abort_on_invariant_error
default value: true
When set to true, the trace invariant checker analysis tool aborts when a trace invariant error is found. Otherwise it prints the error and continues. Also, the total invariant error count is printed at the end; a non-zero error count does not affect the exit code of the analyzer.
-pt2ir_best_effort
default value: false
By default, errors in decoding the kernel syscall PT trace in pt2ir are fatal to raw2trace. With -pt2ir_best_effort, those errors do not cause failures and their counts are reported by raw2trace at the end. This may result in a trace where not all syscalls have a trace, and the ones that do may have some PC discontinuities due to non-fatal decoding errors (these discontinuities will still be reported by the invariant checker). When using this option, it is prudent to check raw2trace stats to confirm that the error counts are within expected bounds (total syscall traces converted, syscall traces that were dropped from final trace because conversion failed, syscall traces found to be empty, and non-fatal decode errors seen in converted syscall traces).
-scale_timers
default value: 1
If >1, application timer initial durations and periodic durations are inflated by this scale. This can help preserve relative timing between timer-based application work and other application work in the presence of significant slowdowns from tracing. Currently only supported on Linux.
-scale_timeouts
default value: 1
If >1, time arguments to certain system calls (currently Linux-only sleeps) are multiplied by the specified value. This can help preserve relative timing among application threads in the presence of significant slowdowns from tracing.