In addition to a CPU cache simulator, other predefined analysis tools are available that operate on memory address traces. Which set of tools is used can be selected with the -tool
parameter. New, custom tools can also be created, as described in Creating New Analysis Tools.
Cache Simulator
This is the default tool. Here is an exmample of running it on an offline trace:
$ bin64/drrun -t drmemtrace -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drmemtrace -indir drmemtrace.*.dir
Cache simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 258,433
Misses: 1,148
Miss rate: 0.44%
L1D stats:
Hits: 93,654
Misses: 2,624
Prefetch hits: 458
Prefetch misses: 2,166
Miss rate: 2.73%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,895
Misses: 99
Miss rate: 1.10%
L1D stats:
Hits: 3,448
Misses: 156
Prefetch hits: 26
Prefetch misses: 130
Miss rate: 4.33%
Core #2 (1 thread(s))
L1I stats:
Hits: 4,150
Misses: 101
Miss rate: 2.38%
L1D stats:
Hits: 1,578
Misses: 130
Prefetch hits: 25
Prefetch misses: 105
Miss rate: 7.61%
Core #3 (0 thread(s))
LL stats:
Hits: 1,414
Misses: 2,844
Prefetch hits: 824
Prefetch misses: 1,577
Local miss rate: 66.79%
Child hits: 370,667
Total miss rate: 0.76%
TLB Simulator
To simulate TLB devices instead of caches, pass TLB
to -tool:
$ bin64/drrun -t drmemtrace -tool TLB -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
TLB simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 252,412
Misses: 401
Miss rate: 0.16%
L1D stats:
Hits: 87,132
Misses: 9,127
Miss rate: 9.48%
LL stats:
Hits: 9,315
Misses: 213
Local miss rate: 2.24%
Child hits: 339,544
Total miss rate: 0.06%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,709
Misses: 20
Miss rate: 0.23%
L1D stats:
Hits: 3,544
Misses: 55
Miss rate: 1.53%
LL stats:
Hits: 15
Misses: 60
Local miss rate: 80.00%
Child hits: 12,253
Total miss rate: 0.49%
Core #2 (1 thread(s))
L1I stats:
Hits: 1,622
Misses: 21
Miss rate: 1.28%
L1D stats:
Hits: 689
Misses: 35
Miss rate: 4.83%
LL stats:
Hits: 3
Misses: 53
Local miss rate: 94.64%
Child hits: 2,311
Total miss rate: 2.24%
Core #3 (0 thread(s))
Reuse Distance
To compute reuse distance metrics:
$ bin64/drrun -t drmemtrace -tool reuse_distance -reuse_distance_histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse distance tool aggregated results:
Total accesses: 349632
Unique accesses: 196603
Unique cache lines accessed: 4235
Reuse distance mean: 14.64
Reuse distance median: 1
Reuse distance standard deviation: 104.10
Reuse distance histogram:
Distance Count Percent Cumulative
0 153029 44.36% 44.36%
1 101294 29.37% 73.73%
2 14116 4.09% 77.82%
3 14248 4.13% 81.95%
4 8894 2.58% 84.53%
5 2733 0.79% 85.32%
...
==================================================
Reuse distance tool results for shard 29327 (thread 29327):
Total accesses: 335084
Unique accesses: 187927
Unique cache lines accessed: 4148
Reuse distance mean: 14.77
Reuse distance median: 1
Reuse distance standard deviation: 106.02
Reuse distance histogram:
Distance Count Percent Cumulative
0 147157 44.47% 44.47%
1 96820 29.26% 73.72%
2 13613 4.11% 77.84%
3 13834 4.18% 82.02%
4 8666 2.62% 84.64%
5 2552 0.77% 85.41%
...
3658 29 0.01% 100.00%
3851 1 0.00% 100.00%
Reuse distance threshold = 100 cache lines
Top 10 frequently referenced cache lines
cache line: #references #distant refs
0x7f2a86b3fd80: 27980, 0
0x7f2a86b3fdc0: 18823, 0
0x7f2a88388fc0: 16409, 111
0x7f2a8838abc0: 15176, 6
0x7f2a883884c0: 9930, 20
0x7f2a88388480: 7944, 20
0x7f2a88388500: 7574, 20
0x7f2a88398d00: 7390, 100
0x7f2a86b3fd40: 6668, 0
0x7f2a88388440: 5717, 20
Top 10 distant repeatedly referenced cache lines
cache line: #references #distant refs
0x7f2a885a4180: 246, 132
0x7f2a87504ec0: 202, 128
0x7f2a875044c0: 323, 126
0x7f2a885a4480: 220, 126
0x7f2a87504f00: 293, 124
0x7f2a86fd7e00: 289, 124
0x7f2a875049c0: 221, 124
0x7f2a875053c0: 270, 122
0x7f2a86db9c00: 269, 122
0x7f2a875047c0: 201, 122
==================================================
Reuse distance tool results for shard 29328 (thread 29328):
Total accesses: 12216
Unique accesses: 7251
Unique cache lines accessed: 319
Reuse distance mean: 12.98
Reuse distance median: 1
Reuse distance standard deviation: 38.19
Reuse distance histogram:
Distance Count Percent Cumulative
0 4965 41.73% 41.73%
1 3758 31.59% 73.32%
2 411 3.45% 76.78%
3 348 2.93% 79.70%
4 179 1.50% 81.21%
5 152 1.28% 82.48%
...
Reuse Time
A reuse time tool is also provided, which counts the total number of memory accesses (without considering uniqueness) between accesses to the same address:
$ bin64/drrun -t drmemtrace -tool reuse_time -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse time tool aggregated results:
Total accesses: 88281
Total instructions: 261315
Mean reuse time: 433.47
Reuse time histogram:
Distance Count Percent Cumulative
1 27893 32.84% 32.84%
2 10948 12.89% 45.73%
3 5789 6.82% 52.54%
...
==================================================
Reuse time tool results for shard 29482 (thread 29482):
Total accesses: 84194
Total instructions: 250854
Mean reuse time: 450.01
Reuse time histogram:
Distance Count Percent Cumulative
1 26677 32.86% 32.86%
2 10508 12.95% 45.81%
3 5427 6.69% 52.50%
...
==================================================
Reuse time tool results for shard 29483 (thread 29483):
Total accesses: 3411
Total instructions: 8805
Mean reuse time: 86.36
Reuse time histogram:
Distance Count Percent Cumulative
1 1014 31.56% 31.56%
2 363 11.30% 42.86%
3 308 9.59% 52.44%
Event Counts
To simply see the counts of instructions and memory references broken down by thread use the basic counts tool:
$ bin64/drrun -t drmemtrace -tool basic_counts -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Basic counts tool results:
Total counts:
134566 total (fetched) instructions
13838 total unique (fetched) instructions
423 total non-fetched instructions
0 total prefetches
30919 total data loads
13122 total data stores
0 total icache flushes
0 total dcache flushes
3 total threads
1134 total timestamp + cpuid markers
0 total idle markers
0 total wait markers
5 total kernel transfer markers
10 total function id markers
0 total function return address markers
30 total function argument markers
5 total function return value markers
0 total physical address + virtual address marker pairs
0 total physical address unavailable markers
75 total system call number markers
0 total blocking system call markers
12 total other markers
0 total encodings
Thread 64951 counts:
130900 (fetched) instructions
13469 unique (fetched) instructions
423 non-fetched instructions
0 prefetches
29844 data loads
12594 data stores
0 icache flushes
0 dcache flushes
1072 timestamp + cpuid markers
0 idle markers
0 wait markers
5 kernel transfer markers
4 function id markers
0 function return address markers
12 function argument markers
2 function return value markers
0 physical address + virtual address marker pairs
0 physical address unavailable markers
56 system call number markers
0 blocking system call markers
4 other markers
0 encodings
Thread 64958 counts:
1861 (fetched) instructions
1009 unique (fetched) instructions
0 non-fetched instructions
0 prefetches
538 data loads
262 data stores
0 icache flushes
0 dcache flushes
30 timestamp + cupid markers
0 idle markers
0 wait markers
0 kernel transfer markers
2 function id markers
0 function return address markers
6 function argument markers
1 function return value markers
0 physical address + virtual address marker pairs
0 physical address unavailable markers
9 system call number markers
0 blocking system call markers
4 other markers
0 encodings
Thread 64959 counts:
1805 (fetched) instructions
978 unique (fetched) instructions
0 non-fetched instructions
0 prefetches
537 data loads
266 data stores
0 icache flushes
0 dcache flushes
32 timestamp + cpuid markers
0 idle markers
0 wait markers
0 kernel transfer markers
4 function id markers
0 function return address markers
12 function argument markers
2 function return value markers
0 physical address + virtual address marker pairs
0 physical address unavailable markers
10 system call number markers
0 blocking system call markers
4 other markers
0 encodings
The non-fetched instructions are x86 string loop instructions, where subsequent iterations do not incur a fetch. They are included in the trace as a different type of trace entry to support core simulators in addition to cache simulators.
Opcode Mix
The opcode_mix tool uses the non-fetched instruction information along with the preserved libraries and binaries from the traced execution to gather more information on each executed instruction than was stored in the trace. To run on online traces, pass the -instr_encodings
option. The results are broken down by the opcodes used in DR's IR, where for x86 mov
is split into a separate opcode for load and store but both have the same public string "mov":
$ bin64/drrun -t drmemtrace -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drmemtrace -tool opcode_mix -indir drmemtrace.*.dir
Opcode mix tool results:
267271 : total executed instructions
36432 : mov
31075 : mov
24715 : add
22579 : test
22539 : cmp
12137 : lea
11136 : jnz
10568 : movzx
10243 : jz
9056 : and
8064 : jnz
7279 : jz
5659 : push
4528 : sub
4357 : pop
4001 : shr
3427 : jnbe
2634 : mov
2469 : shl
2344 : jb
2291 : ret
2178 : xor
2164 : call
2111 : pcmpeqb
1472 : movdqa
...
Human-Readable View
The view tool prints out the contents of the trace for human viewing, including disassembling instructions in AT&T, Intel, Arm, or DR format (to see disassembly for online traces, pass the -instr_encodings
option). The -skip_refs and -sim_refs flags can be used to set a start point and end point for the disassembled view. Note that these flags compute the number of trace entry records which are skipped or displayed which is distinct from the number of instruction records.
The tool displays loads and stores, as well as metadata marker entries for timestamps, on which core and thread the subsequent instruction sequence was executed, and kernel and system call transfers (these correspond to signal or event handler interruptions of the regular execution flow).
In its first two columns, the tool displays the trace record ordinal and the instruction fetch ordinal.
$ $ bin64/drrun -t drmemtrace -tool view -indir drmemtrace.*.dir -sim_refs 20
Output format:
<--record#-> <--instr#->: <---tid---> <record details>
------------------------------------------------------------
1 0: 3256418 <marker: version 6>
2 0: 3256418 <marker: filetype 0x240>
3 0: 3256418 <marker: cache line size 64>
4 0: 3256418 <marker: chunk instruction count 1024>
5 0: 3256418 <marker: page size 4096>
6 0: 3256418 <marker: timestamp 13312410768080478>
7 0: 3256418 <marker: tid 3256418 on core 7>
8 1: 3256418 ifetch 3 byte(s) @ 0x00007fc205a61940 48 89 e7 mov %rsp, %rdi
9 2: 3256418 ifetch 5 byte(s) @ 0x00007fc205a61943 e8 b8 0c 00 00 call $0x00007fc205a62600
10 2: 3256418 write 8 byte(s) @ 0x00007fff9a9e3528 by PC 0x00007fc205a61943
11 3: 3256418 ifetch 1 byte(s) @ 0x00007fc205a62600 55 push %rbp
12 3: 3256418 write 8 byte(s) @ 0x00007fff9a9e3520 by PC 0x00007fc205a62600
13 4: 3256418 ifetch 3 byte(s) @ 0x00007fc205a62601 48 89 e5 mov %rsp, %rbp
14 5: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62604 41 57 push %r15
15 5: 3256418 write 8 byte(s) @ 0x00007fff9a9e3518 by PC 0x00007fc205a62604
16 6: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62606 41 56 push %r14
17 6: 3256418 write 8 byte(s) @ 0x00007fff9a9e3510 by PC 0x00007fc205a62606
18 7: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62608 41 55 push %r13
19 7: 3256418 write 8 byte(s) @ 0x00007fff9a9e3508 by PC 0x00007fc205a62608
20 8: 3256418 ifetch 2 byte(s) @ 0x00007fc205a6260a 41 54 push %r12
View tool results:
8 : total instructions
An example of thread switches:
------------------------------------------------------------
46 0: 3264758 <marker: timestamp 13312413437398055>
47 0: 3264758 <marker: tid 3264758 on core 2>
48 1: 3264758 ifetch 3 byte(s) @ 0x00007f4ea89e4940 48 89 e7 mov %rsp, %rdi
49 2: 3264758 ifetch 5 byte(s) @ 0x00007f4ea89e4943 e8 b8 0c 00 00 call $0x00007f4ea89e5600
50 2: 3264758 write 8 byte(s) @ 0x00007ffd93a0cf18 by PC 0x00007f4ea89e4943
...
2854543 2149665: 3264758 ifetch 5 byte(s) @ 0x00007f4ea7c87f8c b8 0e 00 00 00 mov $0x0000000e, %eax
2854544 2149666: 3264758 ifetch 2 byte(s) @ 0x00007f4ea7c87f91 0f 05 syscall
------------------------------------------------------------
2854545 2149666: 3264760 <marker: timestamp 13312413438835999>
2854546 2149666: 3264760 <marker: tid 3264760 on core 11>
2854547 2149667: 3264760 ifetch 3 byte(s) @ 0x00007f4ea7d0b099 48 85 c0 test %rax, %rax
2854548 2149668: 3264760 ifetch 2 byte(s) @ 0x00007f4ea7d0b09c 7c 18 jl $0x00007f4ea7d0b0b6
...
Here is an example of a signal handler interrupting the regular flow, with metadata showing that the signal was delivered just after an untaken conditional branch:
801343 601827: 1159769 ifetch 2 byte(s) @ 0x00007fc2c3aa5c70 75 57 jnz $0x00007fc2c3aa5cc9 (untaken)
801344 601827: 1159769 <marker: kernel xfer from 0x7fc2c3aa5c72 to handler>
801345 601827: 1159769 <marker: timestamp 13335923552684013>
801346 601827: 1159769 <marker: tid 1159769 on core 7>
801347 601828: 1159769 ifetch 1 byte(s) @ 0x00007fc2c03fa259 55 push %rbp
801348 601828: 1159769 write 8 byte(s) @ 0x00007fff8044e930 by PC 0x00007fc2c03fa259
801349 601829: 1159769 ifetch 3 byte(s) @ 0x00007fc2c03fa25a 48 89 e5 mov %rsp, %rbp
801350 601830: 1159769 ifetch 3 byte(s) @ 0x00007fc2c03fa25d 89 7d fc mov %edi, -0x04(%rbp)
801351 601830: 1159769 write 4 byte(s) @ 0x00007fff8044e92c by PC 0x00007fc2c03fa25d
801352 601831: 1159769 ifetch 4 byte(s) @ 0x00007fc2c03fa260 48 89 75 f0 mov %rsi, -0x10(%rbp)
801353 601831: 1159769 write 8 byte(s) @ 0x00007fff8044e920 by PC 0x00007fc2c03fa260
801354 601832: 1159769 ifetch 4 byte(s) @ 0x00007fc2c03fa264 48 89 55 e8 mov %rdx, -0x18(%rbp)
801355 601832: 1159769 write 8 byte(s) @ 0x00007fff8044e918 by PC 0x00007fc2c03fa264
801356 601833: 1159769 ifetch 4 byte(s) @ 0x00007fc2c03fa268 83 7d fc 1a cmp -0x04(%rbp), $0x1a
801357 601833: 1159769 read 4 byte(s) @ 0x00007fff8044e92c by PC 0x00007fc2c03fa268
801358 601834: 1159769 ifetch 2 byte(s) @ 0x00007fc2c03fa26c 75 0f jnz $0x00007fc2c03fa27d (untaken)
801359 601835: 1159769 ifetch 6 byte(s) @ 0x00007fc2c03fa26e 8b 05 c0 3e 00 00 mov <rel> 0x00007fc2c03fe134, %eax
801360 601835: 1159769 read 4 byte(s) @ 0x00007fc2c03fe134 by PC 0x00007fc2c03fa26e
801361 601836: 1159769 ifetch 3 byte(s) @ 0x00007fc2c03fa274 83 c0 01 add $0x01, %eax
801362 601837: 1159769 ifetch 6 byte(s) @ 0x00007fc2c03fa277 89 05 b7 3e 00 00 mov %eax, <rel> 0x00007fc2c03fe134
801363 601837: 1159769 write 4 byte(s) @ 0x00007fc2c03fe134 by PC 0x00007fc2c03fa277
801364 601838: 1159769 ifetch 1 byte(s) @ 0x00007fc2c03fa27d 90 nop
801365 601839: 1159769 ifetch 1 byte(s) @ 0x00007fc2c03fa27e 5d pop %rbp
801366 601839: 1159769 read 8 byte(s) @ 0x00007fff8044e930 by PC 0x00007fc2c03fa27e
801367 601840: 1159769 ifetch 1 byte(s) @ 0x00007fc2c03fa27f c3 ret (target 0x7fc2c3a5af90)
801368 601840: 1159769 read 8 byte(s) @ 0x00007fff8044e938 by PC 0x00007fc2c03fa27f
801369 601841: 1159769 ifetch 7 byte(s) @ 0x00007fc2c3a5af90 48 c7 c0 0f 00 00 00 mov $0x0000000f, %rax
801370 601842: 1159769 ifetch 2 byte(s) @ 0x00007fc2c3a5af97 0f 05 syscall
801371 601842: 1159769 <marker: system call 15>
801372 601842: 1159769 <marker: timestamp 13335923552684023>
801373 601842: 1159769 <marker: tid 1159769 on core 7>
801374 601842: 1159769 <marker: syscall xfer from 0x7fc2c3a5af99>
801375 601842: 1159769 <marker: timestamp 13335923552684029>
801376 601842: 1159769 <marker: tid 1159769 on core 7>
801377 601843: 1159769 ifetch 4 byte(s) @ 0x00007fc2c3aa5c72 48 83 c4 48 add $0x48, %rsp
View Function Calls
The func_view tool records function argument and return values for function names specified at tracing time. See Tracing Function Calls for more information.
$ bin64/drrun -t drmemtrace -offline -record_function 'fib|1' -- ~/test/fib 5
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drmemtrace -tool func_view -indir drmemtrace.*.dir
0x7fc06d2288eb => common.fib!fib(0x5)
0x7fc06d22888e => common.fib!fib(0x4)
0x7fc06d22888e => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
0x7fc06d22889d => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
=> 0x5
0x7fc06d22889d => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
=> 0x8
Function view tool results:
Function id=0: common.fib!fib
15 calls
15 returns
Cache Line Histogram
The top referenced cache lines are displayed by the histogram
tool:
$ bin64/drrun -t drmemtrace -tool histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Cache line histogram tool results:
icache: 1134 unique cache lines
dcache: 3062 unique cache lines
icache top 10
0x7facdd013780: 30929
0x7facdb789fc0: 27664
0x7facdb78a000: 18629
0x7facdd003e80: 18176
0x7facdd003500: 11121
0x7facdd0034c0: 9763
0x7facdd005940: 8865
0x7facdd003480: 8277
0x7facdb789f80: 6660
0x7facdd003540: 5888
dcache top 10
0x7ffcc35e7d80: 4088
0x7ffcc35e7d40: 3497
0x7ffcc35e7e00: 3478
0x7ffcc35e7f40: 2919
0x7ffcc35e7dc0: 2837
0x7facdbe2e980: 2452
0x7facdbe2ec80: 2273
0x7ffcc35e7e80: 2194
0x7facdb6625c0: 2016
0x7ffcc35e7e40: 1997
Invariant Checker
The invariant_checker tool performs sanity checks on a trace, focusing on program counter continuity and guarantees around kernel control transfer interruptions. It optionally checks for restricted behavior that technically is legal but is not expected to happen in the target trace, helping to identify tracing problems and suitability for use of a trace for core simulation (see Core Simulation Support).
System Call Mix
The system call mix tool counts the frequency of system calls in a trace. It works for both online and offline traces. It uses the dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL markers that store the system call number. This works only with traces that have these markers; that is, offline traces must have dynamorio::drmemtrace::OFFLINE_FILE_TYPE_SYSCALL_NUMBERS in their file type.
$ bin64/drrun -t drmemtrace -indir drmemtrace.ls.*.dir -tool syscall_mix
Syscall mix tool results:
count : syscall_num
17 : 9
16 : 1
8 : 3
7 : 262
6 : 257
5 : 0
5 : 10
3 : 12
2 : 217
2 : 16
2 : 17
2 : 137
2 : 21
1 : 158
1 : 302
1 : 334
1 : 218
1 : 231
1 : 318
1 : 11
1 : 273
Record Filter
The record filter tool modifies a target trace. It contains several varieties of filters which selectively remove records from the tool. The filters currently provided include:
- Removing records of types specified by the -filter_trace_types option. The types are identified by their dynamorio::drmemtrace::trace_type_t enum numeric value.
- Remove marker records of marker types specified by the -filter_marker_types option. The types are identified by their dynamorio::drmemtrace::trace_marker_type_t enum numeric value.
- Running a simple data cache filter and removing hits. The cache is enbabled and its size specified by the -filter_cache_size option.
- Trimming the start (via -trim_before_timestamp) and end (via -trim_after_timestamp) of a trace. Any now-empty shards are deleted entirely.
A filter can be applied only to the start of a trace using the -filter_stop_timestamp option.
Example of removing function markers:
$ bin64/drrun -t drmemtrace -indir mytracedir -tool basic_counts
...
9009 total function id markers
5006 total function return address markers
6007 total function argument markers
4003 total function return value markers
...
$ bin64/drrun -t drmemtrace -tool record_filter -filter_marker_types 4,5,6,7 -indir mytracedir -outdir newdir
Output 1280800 entries from 1304825 entries.
$ bin64/drrun -t drmemtrace -indir newdir -tool basic_counts
...
0 total function id markers
0 total function return address markers
0 total function argument markers
0 total function return value markers
...