DynamoRIO
Simulator Parameters

drcachesim's behavior can be controlled through options passed after -t drcachesim but prior to the "--" delimiter on the command line:

$ bin64/drrun -t drcachesim <options> <to> <drcachesim> -- /path/to/target/app <args> <for> <app>

Boolean options can be disabled using a "-no_" prefix.
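
As a rough illustration of the placement rule and the "-no_" prefix, the following Python sketch assembles an argument list that mirrors the command line shown above. The helper names (bool_option, build_drcachesim_argv) and the "traces" directory are hypothetical and chosen only for this example; they are not part of DynamoRIO.

```python
def bool_option(name, enabled):
    """Render a boolean option; a disabled one gets the "-no_" prefix."""
    return f"-{name}" if enabled else f"-no_{name}"


def build_drcachesim_argv(drrun, tool_options, app_argv):
    """Put tool options after "-t drcachesim" and before the "--" delimiter."""
    return [drrun, "-t", "drcachesim", *tool_options, "--", *app_argv]


# Hypothetical invocation: offline tracing into a directory named "traces".
argv = build_drcachesim_argv(
    "bin64/drrun",
    [bool_option("offline", True), "-outdir", "traces"],
    ["/path/to/target/app"],
)
print(" ".join(argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later handed to a process launcher.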

The parameters available are described below:

  • -offline
    default value: false
    By default, traces are processed online, sent over a pipe to a simulator. If this option is enabled, trace data is instead written to files in -outdir for later offline analysis. No simulator is executed.
  • -ipc_name
    default value: drcachesimpipe
    For online tracing and simulation (the default, unless -offline is requested), specifies the name of the named pipe used to communicate between the target application processes and the caching device simulator. On Linux this can include an absolute path (if it doesn't, a default temp directory will be used). A unique name must be chosen for each instance of the simulator being run at any one time. On Windows, the name is limited to 247 characters.
  • -outdir
    default value: .
    For the offline analysis mode (when -offline is requested), specifies the path to a directory where per-thread trace files will be written.
  • -subdir_prefix
    default value: drmemtrace
    For the offline analysis mode (when -offline is requested), specifies the prefix for the name of the sub-directory where per-thread trace files will be written. The sub-directory is created inside -outdir and has the form 'prefix.app-name.pid.id.dir'.
  • -indir
    default value: ""
    After a trace file is produced via -offline into -outdir, it can be passed to the simulator via this flag pointing at the subdirectory created in -outdir. The -offline tracing produces raw data files which are converted into final trace files on the first execution with -indir. The raw files can also be manually converted using the drraw2trace tool. Legacy single trace files with all threads interleaved into one are not supported with this option: use -infile instead.
  • -infile
    default value: ""
    Directs the simulator to use a single all-threads-interleaved-into-one trace file. This is a legacy file format that is no longer produced.
  • -jobs
    default value: -1
    By default, both post-processing of offline raw trace files and analysis of trace files is parallelized. This option controls the number of concurrent jobs. 0 disables concurrency and uses a single thread to perform all operations. A negative value sets the job count to the number of hardware threads, with a cap of 16.
  • -module_file
    default value: ""
    The opcode_mix tool needs the modules.log file (generated by the offline post-processing step in the raw/ subdirectory) in addition to the trace file. If the file is named modules.log and is in the same directory as the trace file, or a raw/ subdirectory below the trace file, this parameter can be omitted.
  • -alt_module_dir
    default value: ""
    Specifies a directory containing libraries referenced in -module_file for analysis tools, or in the raw modules file for post-processing of offline raw trace files. This directory takes precedence over the recorded path.
  • -funclist_file
    default value: ""
    The func_view tool needs the mapping from function name to identifier that was recorded during offline tracing. This data is stored in its own separate file in the raw/ subdirectory. If the file is named funclist.log and is in the same directory as the trace file, or a raw/ subdirectory below the trace file, this parameter can be omitted.
  • -cores
    default value: 4
    Specifies the number of cores to simulate.
  • -line_size
    default value: 64
    Specifies the cache line size, which is assumed to be identical for L1 and L2 caches. Must be a power of 2.
  • -L1I_size
    default value: 32K
    Specifies the total size of each L1 instruction cache. Must be a power of 2 and a multiple of -line_size.
  • -L1D_size
    default value: 32K
    Specifies the total size of each L1 data cache. Must be a power of 2 and a multiple of -line_size.
  • -L1I_assoc
    default value: 8
    Specifies the associativity of each L1 instruction cache. Must be a power of 2.
  • -L1D_assoc
    default value: 8
    Specifies the associativity of each L1 data cache. Must be a power of 2.
  • -LL_size
    default value: 8M
    Specifies the total size of the unified last-level (L2) cache. Must be a power of 2 and a multiple of -line_size.
  • -LL_assoc
    default value: 16
    Specifies the associativity of the unified last-level (L2) cache. Must be a power of 2.
  • -LL_miss_file
    default value: ""
    If non-empty, the effect depends on which simulator is running. With the cache simulator, every last-level cache miss is written to a file at the specified path, in text format as a <program counter, address> pair; if this tool is linked with zlib, the file is written in gzip-compressed format. With the cache miss analyzer, prefetching hints based on the miss analysis are written to the specified file, each in text format as a <program counter, stride, locality level> tuple.
  • -L0_filter
    default value: false
    Filters out instruction and data hits in a 'zero-level' cache during tracing itself, shrinking the final trace to only contain instruction and data accesses that miss in this initial cache. This cache is direct-mapped with sizes equal to -L0I_size and -L0D_size. It uses virtual addresses regardless of -use_physical. The dynamic (pre-filtered) per-thread instruction count is tracked and supplied via a TRACE_MARKER_TYPE_INSTRUCTION_COUNT marker at thread buffer boundaries and at thread exit.
  • -L0I_size
    default value: 32K
    Specifies the size of the 'zero-level' instruction cache for -L0_filter. Must be a power of 2 and a multiple of -line_size, unless it is set to 0, which disables instruction fetch entries from appearing in the trace.
  • -L0D_size
    default value: 32K
    Specifies the size of the 'zero-level' data cache for -L0_filter. Must be a power of 2 and a multiple of -line_size, unless it is set to 0, which disables data entries from appearing in the trace.
  • -instr_only_trace
    default value: false
    If enabled, only instruction fetch entries are included in the trace; data entries are omitted.
  • -coherence
    default value: false
    Writes to cache lines will invalidate other private caches that hold that line.
  • -use_physical
    default value: false
    If available, the default virtual addresses will be translated to physical. This is not possible from user mode on all platforms. This is not supported with -offline at this time.
  • -virt2phys_freq
    default value: 0
    This option only applies if -use_physical is enabled. The virtual to physical mapping is cached for performance reasons, yet the underlying mapping can change without notice. This option controls the frequency with which the cached value is ignored in order to re-access the actual mapping and ensure accurate results. The units are the number of memory accesses per forced access. A value of 0 uses the cached values for the entire application execution.
  • -cpu_scheduling
    default value: false
    By default, the simulator schedules threads to simulated cores in a static round-robin fashion. This option causes the scheduler to instead use the recorded CPU that each thread executed on (at a granularity of the trace buffer size) for scheduling, mapping traced CPUs to cores and running each segment of each thread on the core that owns the recorded CPU for that segment.
  • -max_trace_size
    default value: 0
    If non-zero, this sets a maximum size on the amount of raw trace data gathered for each thread. This is not an exact limit: it may be exceeded by the size of one internal buffer. Once reached, instrumentation continues for that thread, but no further data is recorded.
  • -max_global_trace_refs
    default value: 0
    If non-zero, this sets a maximum on the number of trace entry references (of any type: instructions, loads, stores, markers, etc.) recorded. Once reached, instrumented execution continues, but no further data is recorded. This is similar to -exit_after_tracing but without terminating the process. The reference count is approximate.
  • -trace_after_instrs
    default value: 0
    If non-zero, tracing is suppressed until this many dynamic instruction executions have been observed. At that point, regular tracing begins. Use -max_trace_size to set a limit on the subsequent trace length.
  • -exit_after_tracing
    default value: 0
    If non-zero, after tracing the specified number of references, the process is exited with an exit code of 0. The reference count is approximate. Use -max_global_trace_refs instead to avoid terminating the process.
  • -online_instr_types
    default value: false
    By default, offline traces include some information on the types of instructions, branches in particular. For online traces, this comes at a performance cost, so it is turned off by default.
  • -replace_policy
    default value: LRU
    Specifies the replacement policy for caches. Supported policies: LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First-In-First-Out).
  • -data_prefetcher
    default value: nextline
    Specifies the hardware data prefetcher policy. The currently supported policies are 'nextline' (fetch the subsequent cache line) and 'none' (disables hardware prefetching). The prefetcher is located between the L1D and LL caches.
  • -page_size
    default value: 4K
    Specifies the virtual/physical page size.
  • -TLB_L1I_entries
    default value: 32
    Specifies the number of entries in each L1 instruction TLB. Must be a power of 2.
  • -TLB_L1D_entries
    default value: 32
    Specifies the number of entries in each L1 data TLB. Must be a power of 2.
  • -TLB_L1I_assoc
    default value: 32
    Specifies the associativity of each L1 instruction TLB. Must be a power of 2.
  • -TLB_L1D_assoc
    default value: 32
    Specifies the associativity of each L1 data TLB. Must be a power of 2.
  • -TLB_L2_entries
    default value: 1024
    Specifies the number of entries in each unified L2 TLB. Must be a power of 2.
  • -TLB_L2_assoc
    default value: 4
    Specifies the associativity of each unified L2 TLB. Must be a power of 2.
  • -TLB_replace_policy
    default value: LFU
    Specifies the replacement policy for TLBs. Supported policies: LFU (Least Frequently Used).
  • -simulator_type
    default value: cache
    Specifies the type of the simulator. Supported types: cache, miss_analyzer, TLB, reuse_distance, reuse_time, histogram, or basic_counts.
  • -verbose
    default value: 0
    Verbosity level for notifications.
  • -show_func_trace
    default value: true
    In the func_view tool, this controls whether every traced call is shown or instead only aggregate statistics are shown.
  • -disable_optimizations
    default value: false
    Disables various optimizations where information is omitted from offline trace recording when it can be reconstructed during post-processing. This is meant for testing purposes.
  • -dr
    default value: ""
    Specifies the path of the DynamoRIO root directory.
  • -dr_debug
    default value: false
    Requests use of the debug build of DynamoRIO rather than the release build.
  • -dr_ops
    default value: ""
    Specifies the options to pass to DynamoRIO.
  • -tracer
    default value: ""
    The full path to the tracer library.
  • -tracer_alt
    default value: ""
    The full path to the tracer library for the other bitwidth, for use on child processes with a different bitwidth from their parent. If empty, such child processes will die with fatal errors.
  • -skip_refs
    default value: 0
    Specifies the number of references to skip in the beginning of the application execution. These memory references are dropped instead of being simulated.
  • -warmup_refs
    default value: 0
    Specifies the number of memory references used to warm up caches before simulation. The warmup references come after the skipped references and before the simulated references. This flag is incompatible with -warmup_fraction.
  • -warmup_fraction
    default value: 0
    Specifies the fraction of last-level cache blocks that must be loaded for the cache to be considered warmed up before simulation. The warmup fraction is computed after the skipped references and before the simulated references. This flag is incompatible with -warmup_refs.
  • -sim_refs
    default value: 8589934592G
    Specifies the number of memory references to simulate. The simulated references come after the skipped and warmup references, and the references following the simulated ones are dropped.
  • -view_syntax
    default value: att/arm/dr
    Specifies the syntax to use when viewing disassembled offline traces. The option can be set to one of "att" (AT&T style), "intel" (Intel style), "dr" (DynamoRIO's native style with all implicit operands listed), and "arm" (32-bit ARM style). An invalid specification falls back to the default, which is "att" for x86, "arm" for ARM (32-bit), and "dr" for AArch64.
  • -config_file
    default value: ""
    The full path to the cache hierarchy configuration file.
  • -report_top
    default value: 10
    Specifies the number of top results to be reported.
  • -reuse_distance_threshold
    default value: 100
    Specifies the reuse distance threshold for reporting the distant repeated references. A reference is a distant repeated reference if the distance to the previous reference on the same cache line exceeds the threshold.
  • -reuse_distance_histogram
    default value: false
    By default only the mean, median, and standard deviation of the reuse distances are reported. This option prints out the full histogram of reuse distances.
  • -reuse_skip_dist
    default value: 500
    Specifies the distance between nodes in the skip list. For optimal performance, set this to a value close to the estimated average reuse distance of the dataset.
  • -reuse_verify_skip
    default value: false
    Verifies every skip list-calculated reuse distance with a full list walk. This incurs significant additional overhead. This option is only available in debug builds.
  • -record_function
    default value: ""
    Records a trace of invocations of the function(s) specified in the option value. The value should have the format function_name|func_args_num (e.g., -record_function "memset|3"), with an optional suffix "|noret" (e.g., -record_function "free|1|noret"). For each function invocation, the trace contains the return address, the function argument value(s), and (unless "|noret" is specified) the function return value. (If multiple requested functions map to the same address and differ in whether "noret" was specified, the attribute from the first one requested is used. If they differ in the number of args, the minimum value is used.) Only pointer-sized arguments and return values are recorded. The trace identifies which function is involved via a numeric ID entry prior to each set of value entries. The mapping from numeric ID to library-qualified symbolic name is recorded during tracing in a file "funclist.log" whose format is described by the drmemtrace_get_funclist_path() function's documentation. If the target function is in the dynamic symbol table, then the function_name should be a mangled name (e.g. "_Znwm" for "operator new", "_ZdlPv" for "operator delete"). Otherwise, the function_name should be a demangled name. Multiple functions can be recorded by using the separator "&" (e.g., -record_function "memset|3&memcpy|3"), or by specifying multiple -record_function options (e.g., -record_function "memset|3" -record_function "memcpy|3"). Note that each provided function name should be unique and should not collide with the existing heap functions (see -record_heap_value) if the -record_heap option is enabled.
  • -record_heap
    default value: false
    A convenience option that enables recording a trace for the heap functions defined in -record_heap_value. Specifying this option is equivalent to -record_function [heap_functions], where [heap_functions] is the value in -record_heap_value.
  • -record_heap_value
    default value: malloc|1&free|1|noret&tc_malloc|1&tc_free|1|noret&__libc_malloc|1&__libc_free|1|noret&calloc|2&_Znwm|1&_ZnwmRKSt9nothrow_t|2&_ZnwmSt11align_val_t|2&_ZnwmSt11align_val_tRKSt9nothrow_t|3&_ZnwmPv|2&_Znam|1&_ZnamRKSt9nothrow_t|2&_ZnamSt11align_val_t|2&_ZnamSt11align_val_tRKSt9nothrow_t|3&_ZnamPv|2&_ZdlPv|1|noret&_ZdlPvRKSt9nothrow_t|2|noret&_ZdlPvSt11align_val_t|2|noret&_ZdlPvSt11align_val_tRKSt9nothrow_t|3|noret&_ZdlPvm|2|noret&_ZdlPvmSt11align_val_t|3|noret&_ZdlPvS_|2|noret&_ZdaPv|1|noret&_ZdaPvRKSt9nothrow_t|2|noret&_ZdaPvSt11align_val_t|2|noret&_ZdaPvSt11align_val_tRKSt9nothrow_t|3|noret&_ZdaPvm|2|noret&_ZdaPvmSt11align_val_t|3|noret&_ZdaPvS_|2|noret
    Functions recorded by -record_heap. The option value should fit the same format required by -record_function. These functions will not be traced unless -record_heap is specified.
  • -record_dynsym_only
    default value: false
    Symbol lookup can be expensive for large applications and libraries. This option causes the symbol lookup for -record_function and -record_heap to look in the dynamic symbol table only.
  • -record_replace_retaddr
    default value: false
    Function wrapping can be expensive for large concurrent applications. This option causes the post-function control point to be located using return address replacement, which has lower overhead, but runs the risk of breaking an application that examines or changes its own return addresses in the recorded functions.
  • -miss_count_threshold
    default value: 50000
    Specifies the minimum number of LLC misses of a load for it to be eligible for analysis in search of patterns in the miss address stream.
  • -miss_frac_threshold
    default value: 0.005
    Specifies the minimum fraction of LLC misses of a load (from all misses) for it to be eligible for analysis in search of patterns in the miss address stream.
  • -confidence_threshold
    default value: 0.75
    Specifies the minimum confidence to include a discovered pattern in the output results. Confidence in a discovered pattern for a load instruction is calculated as the fraction of the load's misses with the discovered pattern over all the load's misses.
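
The -jobs description above states three cases: a negative value, zero, and a positive value. The following Python sketch restates them as a function. It is only a reading of the text, not the tool's implementation; the function name effective_jobs and the hardware_threads parameter are hypothetical.

```python
import os

JOB_CAP = 16  # cap stated in the -jobs description


def effective_jobs(jobs, hardware_threads=None):
    """Return how many threads do the work for a given -jobs value."""
    if hardware_threads is None:
        hardware_threads = os.cpu_count() or 1
    if jobs < 0:
        # Negative: one job per hardware thread, capped at 16.
        return min(hardware_threads, JOB_CAP)
    if jobs == 0:
        # Zero: concurrency disabled, a single thread performs all operations.
        return 1
    return jobs


print(effective_jobs(-1))  # result depends on the machine running the sketch
```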
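
The cache size, associativity, and line size options above all carry power-of-2 and multiple-of constraints. The Python sketch below checks those constraints and derives the number of sets with the textbook formula size / (line_size * associativity). Two points are assumptions rather than statements from this page: that the K, M, and G suffixes are multiples of 1024, and that the simulator derives its geometry in this way. All function names are hypothetical.

```python
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # assumed binary multipliers


def parse_size(text):
    """Convert a value such as "32K" or "8M" to a plain integer."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1].upper()]
    return int(text)


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def check_cache(size, assoc, line_size):
    """Return a list of violated constraints (empty if the values are valid)."""
    problems = []
    if not is_power_of_two(line_size):
        problems.append("line size must be a power of 2")
    if not is_power_of_two(size):
        problems.append("size must be a power of 2")
    if line_size > 0 and size % line_size != 0:
        problems.append("size must be a multiple of the line size")
    if not is_power_of_two(assoc):
        problems.append("associativity must be a power of 2")
    return problems


def num_sets(size, assoc, line_size):
    """Textbook set count for a set-associative cache."""
    return size // (line_size * assoc)


# Defaults listed above: -L1D_size 32K, -L1D_assoc 8, -LL_size 8M, -LL_assoc 16.
line = 64
for name, size_text, assoc in [("L1D", "32K", 8), ("LL", "8M", 16)]:
    size = parse_size(size_text)
    print(name, check_cache(size, assoc, line), num_sets(size, assoc, line))
```

Under the same suffix assumption, the -sim_refs default of 8589934592G works out to 2^63.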
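
The -skip_refs, -warmup_refs, and -sim_refs descriptions above divide the reference stream into consecutive windows: skipped, warmup, simulated, and dropped. A minimal Python sketch of that ordering follows. It assumes a 0-based position in the reference stream and does not model -warmup_fraction; the function name classify_reference is hypothetical.

```python
def classify_reference(index, skip_refs=0, warmup_refs=0, sim_refs=None):
    """Say which window the reference at 0-based position `index` falls in.

    sim_refs=None stands for "no practical limit" in this sketch.
    """
    if index < skip_refs:
        return "skipped"
    if index < skip_refs + warmup_refs:
        return "warmup"
    if sim_refs is None or index < skip_refs + warmup_refs + sim_refs:
        return "simulated"
    return "dropped"


# Small made-up window sizes to show the ordering.
labels = [classify_reference(i, skip_refs=2, warmup_refs=3, sim_refs=4) for i in range(10)]
print(labels)
```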
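
The -reuse_distance_threshold description above defines a distant repeated reference in terms of the distance to the previous reference on the same cache line. The Python sketch below uses the common definition of reuse distance, the number of distinct cache lines touched between two accesses to the same line, and a simple list in place of the skip list mentioned under -reuse_skip_dist. Whether the tool counts distances in exactly this way is an assumption, and the function names are hypothetical.

```python
def reuse_distances(addresses, line_size=64):
    """Return one distance per access; None marks the first touch of a line."""
    stack = []  # cache lines, most recently used first
    distances = []
    for addr in addresses:
        line = addr // line_size
        if line in stack:
            # Position in the stack = distinct lines touched since the last use.
            distance = stack.index(line)
            stack.remove(line)
        else:
            distance = None
        stack.insert(0, line)
        distances.append(distance)
    return distances


def distant_repeated(addresses, threshold=100, line_size=64):
    """Addresses whose reuse distance exceeds the threshold."""
    pairs = zip(addresses, reuse_distances(addresses, line_size))
    return [addr for addr, dist in pairs if dist is not None and dist > threshold]


trace = [0, 64, 128, 0, 64]  # made-up addresses covering three cache lines
print(reuse_distances(trace))
print(distant_repeated(trace, threshold=1))
```

The linear list walk keeps the sketch short; it is far slower than a skip list on long traces.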
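
The -record_function value format described above (entries of the form function_name|func_args_num with an optional "|noret" suffix, joined by "&") can be sketched as a small parser. This Python code only illustrates the documented format; it is not the tracer's own parser, and the names RecordedFunction and parse_record_function are hypothetical.

```python
from typing import NamedTuple


class RecordedFunction(NamedTuple):
    name: str
    num_args: int
    noret: bool


def parse_record_function(value):
    """Split a -record_function style value into its entries."""
    entries = []
    for item in value.split("&"):
        if not item:
            continue  # tolerate an empty value or a stray separator
        fields = item.split("|")
        if len(fields) not in (2, 3):
            raise ValueError(f"expected name|num_args[|noret], got {item!r}")
        if len(fields) == 3 and fields[2] != "noret":
            raise ValueError(f"unknown suffix in {item!r}")
        entries.append(RecordedFunction(fields[0], int(fields[1]), len(fields) == 3))
    return entries


for entry in parse_record_function("memset|3&free|1|noret&_Znwm|1"):
    print(entry)
```

The same function accepts the -record_heap_value default, since that option uses the same format.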
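
The three miss analyzer thresholds above (-miss_count_threshold, -miss_frac_threshold, and -confidence_threshold) can be read as two filters applied in sequence. The Python sketch below encodes that reading. It assumes that a load must satisfy both minimums to be eligible and that "minimum" means greater than or equal to; neither detail is confirmed by this page, and the function names are hypothetical.

```python
def is_eligible(load_misses, total_misses,
                miss_count_threshold=50000, miss_frac_threshold=0.005):
    """Assumed rule: a load must meet both the count and the fraction minimum."""
    if total_misses <= 0:
        return False
    return (load_misses >= miss_count_threshold
            and load_misses / total_misses >= miss_frac_threshold)


def pattern_confidence(pattern_misses, load_misses):
    """Fraction of the load's misses that follow the discovered pattern."""
    return pattern_misses / load_misses


def is_reported(pattern_misses, load_misses, confidence_threshold=0.75):
    return pattern_confidence(pattern_misses, load_misses) >= confidence_threshold


# Made-up counts: one load with 60000 LLC misses out of 1000000 overall,
# 45000 of which follow a single stride pattern.
print(is_eligible(60000, 1_000_000))
print(pattern_confidence(45000, 60000))
print(is_reported(45000, 60000))
```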