DynamoRIO
Analysis Tool Suite

In addition to a CPU cache simulator, other analysis tools are available that operate on memory address traces. Select the tool with the -simulator_type parameter. Custom tools can also be created, as described in Creating New Analysis Tools.
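
When scripting several runs, it can help to assemble the command line programmatically. The following is a minimal sketch in Python, not part of DynamoRIO: the helper build_command and the TOOLS list are hypothetical names, and the list contains only the -simulator_type values that appear in the examples on this page. The only flags used are ones shown on this page (-t drcachesim, -simulator_type, -indir).

```python
# Tool names as they appear in the examples on this page (not an
# authoritative list of every value -simulator_type accepts).
TOOLS = ["TLB", "reuse_distance", "reuse_time", "basic_counts",
         "opcode_mix", "view", "func_view", "histogram"]


def build_command(tool=None, indir=None, app=None, drrun="bin64/drrun"):
    """Return the argv list for one drcachesim run (hypothetical helper)."""
    if (indir is None) == (app is None):
        raise ValueError("pass exactly one of indir or app")
    argv = [drrun, "-t", "drcachesim"]
    if tool is not None:
        # Leaving -simulator_type out selects the default cache simulator.
        if tool not in TOOLS:
            raise ValueError(f"unknown tool: {tool}")
        argv += ["-simulator_type", tool]
    if indir is not None:
        # Analyze a previously recorded offline trace directory.
        argv += ["-indir", indir]
    else:
        # Trace the application and analyze it in the same run.
        argv += ["--"] + list(app)
    return argv


print(build_command("basic_counts", app=["/path/to/pi_estimator"]))
```

The returned list can be handed to subprocess.run. Shell features such as ~ or the drmemtrace.*.dir glob are not expanded in that case, so paths must be resolved beforehand.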

Cache Simulator

This is the default tool. Here is an example of running it on an offline trace:

$ bin64/drrun -t drcachesim -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -indir drmemtrace.*.dir
Cache simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 258,433
Misses: 1,148
Miss rate: 0.44%
L1D stats:
Hits: 93,654
Misses: 2,624
Prefetch hits: 458
Prefetch misses: 2,166
Miss rate: 2.73%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,895
Misses: 99
Miss rate: 1.10%
L1D stats:
Hits: 3,448
Misses: 156
Prefetch hits: 26
Prefetch misses: 130
Miss rate: 4.33%
Core #2 (1 thread(s))
L1I stats:
Hits: 4,150
Misses: 101
Miss rate: 2.38%
L1D stats:
Hits: 1,578
Misses: 130
Prefetch hits: 25
Prefetch misses: 105
Miss rate: 7.61%
Core #3 (0 thread(s))
LL stats:
Hits: 1,414
Misses: 2,844
Prefetch hits: 824
Prefetch misses: 1,577
Local miss rate: 66.79%
Child hits: 370,667
Total miss rate: 0.76%
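
The percentages in this sample output are consistent with simple ratios of the counts shown. The sketch below reproduces them in Python. The formulas are inferred from the numbers above rather than taken from the simulator's source, and miss_rate is a hypothetical helper name.

```python
def miss_rate(hits, misses, child_hits=0):
    """Misses as a percentage of all accesses counted at this level."""
    return 100.0 * misses / (hits + misses + child_hits)


# L1I of core #0: misses / (hits + misses)
print(f"{miss_rate(258433, 1148):.2f}%")          # 0.44%

# LL local miss rate: only accesses that reached the last-level cache
print(f"{miss_rate(1414, 2844):.2f}%")            # 66.79%

# LL total miss rate: accesses satisfied by child caches also count
print(f"{miss_rate(1414, 2844, 370667):.2f}%")    # 0.76%

# In this sample, the LL "Child hits" figure equals the sum of the L1
# hits and L1 prefetch hits across all cores.
l1_hits = 258433 + 93654 + 8895 + 3448 + 4150 + 1578
l1_prefetch_hits = 458 + 26 + 25
print(l1_hits + l1_prefetch_hits)                 # 370667
```

The gap between the local and total miss rates shows why both are reported: most accesses never reach the last-level cache, so a high local miss rate there can coexist with a low overall miss rate.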

TLB Simulator

To simulate TLB devices instead of caches, pass TLB to -simulator_type:

$ bin64/drrun -t drcachesim -simulator_type TLB -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
TLB simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 252,412
Misses: 401
Miss rate: 0.16%
L1D stats:
Hits: 87,132
Misses: 9,127
Miss rate: 9.48%
LL stats:
Hits: 9,315
Misses: 213
Local miss rate: 2.24%
Child hits: 339,544
Total miss rate: 0.06%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,709
Misses: 20
Miss rate: 0.23%
L1D stats:
Hits: 3,544
Misses: 55
Miss rate: 1.53%
LL stats:
Hits: 15
Misses: 60
Local miss rate: 80.00%
Child hits: 12,253
Total miss rate: 0.49%
Core #2 (1 thread(s))
L1I stats:
Hits: 1,622
Misses: 21
Miss rate: 1.28%
L1D stats:
Hits: 689
Misses: 35
Miss rate: 4.83%
LL stats:
Hits: 3
Misses: 53
Local miss rate: 94.64%
Child hits: 2,311
Total miss rate: 2.24%
Core #3 (0 thread(s))
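
Conceptually, a TLB simulation differs from a cache simulation in that lookups are keyed by page number rather than by cache line. The toy model below illustrates the idea with a single fully associative, LRU-replaced TLB. It is a sketch under stated assumptions, not the simulator's implementation: the 4 KiB page size, the ToyTLB class, and the entry count are all hypothetical.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class ToyTLB:
    """Fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()  # page number -> None, least recent first
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        page = addr >> PAGE_SHIFT
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.pages) >= self.entries:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = None


tlb = ToyTLB(entries=2)
for addr in (0x1000, 0x1008, 0x2000, 0x3000, 0x1000):
    tlb.access(addr)
print(tlb.hits, tlb.misses)  # 1 4
```

Only the second access hits, because it falls in the same page as the first. The final access misses because its page was evicted when the third distinct page arrived.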

Reuse Distance

To compute reuse distance metrics, pass reuse_distance to -simulator_type. The example below also passes -reuse_distance_histogram:

$ bin64/drrun -t drcachesim -simulator_type reuse_distance -reuse_distance_histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse distance tool aggregated results:
Total accesses: 349632
Unique accesses: 196603
Unique cache lines accessed: 4235
Reuse distance mean: 14.64
Reuse distance median: 1
Reuse distance standard deviation: 104.10
Reuse distance histogram:
Distance Count Percent Cumulative
0 153029 44.36% 44.36%
1 101294 29.37% 73.73%
2 14116 4.09% 77.82%
3 14248 4.13% 81.95%
4 8894 2.58% 84.53%
5 2733 0.79% 85.32%
...
==================================================
Reuse distance tool results for shard 29327 (thread 29327):
Total accesses: 335084
Unique accesses: 187927
Unique cache lines accessed: 4148
Reuse distance mean: 14.77
Reuse distance median: 1
Reuse distance standard deviation: 106.02
Reuse distance histogram:
Distance Count Percent Cumulative
0 147157 44.47% 44.47%
1 96820 29.26% 73.72%
2 13613 4.11% 77.84%
3 13834 4.18% 82.02%
4 8666 2.62% 84.64%
5 2552 0.77% 85.41%
...
3658 29 0.01% 100.00%
3851 1 0.00% 100.00%
Reuse distance threshold = 100 cache lines
Top 10 frequently referenced cache lines
cache line: #references #distant refs
0x7f2a86b3fd80: 27980, 0
0x7f2a86b3fdc0: 18823, 0
0x7f2a88388fc0: 16409, 111
0x7f2a8838abc0: 15176, 6
0x7f2a883884c0: 9930, 20
0x7f2a88388480: 7944, 20
0x7f2a88388500: 7574, 20
0x7f2a88398d00: 7390, 100
0x7f2a86b3fd40: 6668, 0
0x7f2a88388440: 5717, 20
Top 10 distant repeatedly referenced cache lines
cache line: #references #distant refs
0x7f2a885a4180: 246, 132
0x7f2a87504ec0: 202, 128
0x7f2a875044c0: 323, 126
0x7f2a885a4480: 220, 126
0x7f2a87504f00: 293, 124
0x7f2a86fd7e00: 289, 124
0x7f2a875049c0: 221, 124
0x7f2a875053c0: 270, 122
0x7f2a86db9c00: 269, 122
0x7f2a875047c0: 201, 122
==================================================
Reuse distance tool results for shard 29328 (thread 29328):
Total accesses: 12216
Unique accesses: 7251
Unique cache lines accessed: 319
Reuse distance mean: 12.98
Reuse distance median: 1
Reuse distance standard deviation: 38.19
Reuse distance histogram:
Distance Count Percent Cumulative
0 4965 41.73% 41.73%
1 3758 31.59% 73.32%
2 411 3.45% 76.78%
3 348 2.93% 79.70%
4 179 1.50% 81.21%
5 152 1.28% 82.48%
...
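
For readers unfamiliar with the metric, the sketch below computes reuse distance in its usual sense: the number of distinct cache lines touched between two consecutive accesses to the same line. It is a simple illustration, not the tool's implementation. The 64-byte line size is an assumption, and the list-based LRU stack costs O(n) per access, which a production tool would avoid.

```python
LINE_SHIFT = 6  # assumption: 64-byte cache lines


def reuse_distances(addresses):
    """Return one distance per access; None marks a first-time access."""
    stack = []   # cache lines ordered from least to most recently used
    out = []
    for addr in addresses:
        line = addr >> LINE_SHIFT
        if line in stack:
            pos = stack.index(line)
            # Lines above this one in the stack were touched since its last use.
            out.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            out.append(None)
        stack.append(line)
    return out


# Lines A=0x000, B=0x040, C=0x080; access order A B C A A B.
print(reuse_distances([0x000, 0x040, 0x080, 0x004, 0x008, 0x040]))
# [None, None, None, 2, 0, 2]
```

An immediate re-access of the same line has distance 0, which corresponds to the first row of the histograms above.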

Reuse Time

A reuse time tool is also provided, which counts the total number of memory accesses (without considering uniqueness) between accesses to the same address:

$ bin64/drrun -t drcachesim -simulator_type reuse_time -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse time tool aggregated results:
Total accesses: 88281
Total instructions: 261315
Mean reuse time: 433.47
Reuse time histogram:
Distance Count Percent Cumulative
1 27893 32.84% 32.84%
2 10948 12.89% 45.73%
3 5789 6.82% 52.54%
...
==================================================
Reuse time tool results for shard 29482 (thread 29482):
Total accesses: 84194
Total instructions: 250854
Mean reuse time: 450.01
Reuse time histogram:
Distance Count Percent Cumulative
1 26677 32.86% 32.86%
2 10508 12.95% 45.81%
3 5427 6.69% 52.50%
...
==================================================
Reuse time tool results for shard 29483 (thread 29483):
Total accesses: 3411
Total instructions: 8805
Mean reuse time: 86.36
Reuse time histogram:
Distance Count Percent Cumulative
1 1014 31.56% 31.56%
2 363 11.30% 42.86%
3 308 9.59% 52.44%
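
The difference from reuse distance is that every intervening access counts, whether or not it repeats an address. A minimal sketch follows, assuming reuse time is measured as the difference in access index (so an immediate re-access has time 1, in line with the histograms above starting at 1). Keying on the raw address rather than on a cache line is also an assumption of this sketch.

```python
def reuse_times(addresses):
    """Return one reuse time per access; None marks a first-time access."""
    last_seen = {}   # address -> index of its most recent access
    out = []
    for i, addr in enumerate(addresses):
        if addr in last_seen:
            out.append(i - last_seen[addr])
        else:
            out.append(None)
        last_seen[addr] = i
    return out


# Access order A B C A A B, using small integers as addresses.
print(reuse_times([1, 2, 3, 1, 1, 2]))
# [None, None, None, 3, 1, 4]
```

Under the usual definition of reuse distance, the same access pattern yields 2, 0, 2 for the repeated accesses, because only distinct intervening items are counted there.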

Event Counts

To see the counts of instructions and memory references broken down by thread, use the basic_counts tool:

$ bin64/drrun -t drcachesim -simulator_type basic_counts -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Basic counts tool results:
Total counts:
267193 total (fetched) instructions
345 total non-fetched instructions
0 total prefetches
67686 total data loads
22503 total data stores
3 total threads
280 total scheduling markers
0 total transfer markers
3 total other markers
Thread 247451 counts:
255009 (fetched) instructions
345 non-fetched instructions
0 prefetches
64453 data loads
21243 data stores
258 scheduling markers
0 transfer markers
1 other markers
Thread 247453 counts:
9195 (fetched) instructions
0 non-fetched instructions
0 prefetches
2444 data loads
937 data stores
12 scheduling markers
0 transfer markers
1 other markers
Thread 247454 counts:
2989 (fetched) instructions
0 non-fetched instructions
0 prefetches
789 data loads
323 data stores
10 scheduling markers
0 transfer markers
1 other markers

The non-fetched instructions are x86 string loop instructions, where subsequent iterations do not incur a fetch. They are included in the trace as a different type of trace entry to support core simulators in addition to cache simulators.
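
When post-processing this report in a script, a quick sanity check is that the per-thread figures add up to the totals, as they do in the sample above. The following sketch parses the text format shown on this page. parse_basic_counts and threads_sum_to_total are hypothetical helpers, SAMPLE is an excerpt of the output above, and the parser assumes the layout does not change between versions.

```python
import re

SAMPLE = """\
Total counts:
67686 total data loads
22503 total data stores
Thread 247451 counts:
64453 data loads
21243 data stores
Thread 247453 counts:
2444 data loads
937 data stores
Thread 247454 counts:
789 data loads
323 data stores
"""


def parse_basic_counts(text):
    """Map each section ("Total", "Thread <id>") to {counter name: value}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(" counts:"):
            current = line[:-len(" counts:")]
            sections[current] = {}
            continue
        # Drop the word "total" so both section kinds share the same keys.
        m = re.match(r"(\d+) (?:total )?(.+)", line)
        if m and current is not None:
            sections[current][m.group(2)] = int(m.group(1))
    return sections


def threads_sum_to_total(sections):
    """Check that per-thread counters add up to the totals."""
    threads = [v for k, v in sections.items() if k.startswith("Thread ")]
    return all(sum(t.get(key, 0) for t in threads) == total
               for key, total in sections["Total"].items()
               if key != "threads")   # "threads" is a count of sections


sections = parse_basic_counts(SAMPLE)
print(sections["Total"]["data loads"], threads_sum_to_total(sections))
# 67686 True
```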

Opcode Mix

The opcode_mix tool uses the non-fetched instruction information along with the preserved libraries and binaries from the traced execution to gather more information on each executed instruction than was stored in the trace. It only supports offline traces, and the modules.log file created during post-processing of the trace must be preserved. The results are broken down by the opcodes used in DR's IR. On x86, for example, mov is split into separate opcodes for load and store, but both share the public string "mov", so the same name can appear on more than one row:

$ bin64/drrun -t drcachesim -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -simulator_type opcode_mix -indir drmemtrace.*.dir
Opcode mix tool results:
267271 : total executed instructions
36432 : mov
31075 : mov
24715 : add
22579 : test
22539 : cmp
12137 : lea
11136 : jnz
10568 : movzx
10243 : jz
9056 : and
8064 : jnz
7279 : jz
5659 : push
4528 : sub
4357 : pop
4001 : shr
3427 : jnbe
2634 : mov
2469 : shl
2344 : jb
2291 : ret
2178 : xor
2164 : call
2111 : pcmpeqb
1472 : movdqa
...
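
Because several IR opcodes can share one public string, a per-mnemonic total requires adding the rows with the same name. The sketch below does this for the "count : name" layout shown above. merge_opcodes is a hypothetical helper, and the input lines are copied from the sample output.

```python
from collections import Counter


def merge_opcodes(lines):
    """Sum the counts of rows that share a public opcode string."""
    totals = Counter()
    for line in lines:
        count, sep, name = line.partition(" : ")
        count, name = count.strip(), name.strip()
        if not sep or not count.isdigit():
            continue   # skip headers and "..." lines
        if name == "total executed instructions":
            continue   # summary row, not an opcode
        totals[name] += int(count)
    return totals


rows = [
    "Opcode mix tool results:",
    "267271 : total executed instructions",
    "36432 : mov",
    "31075 : mov",
    "11136 : jnz",
    "8064 : jnz",
    "2634 : mov",
    "...",
]
merged = merge_opcodes(rows)
print(merged["mov"], merged["jnz"])  # 70141 19200
```

Merging discards the distinction that the separate rows carry, so keep the raw report if the split between the underlying opcodes matters.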

View Disassembly

The view tool prints disassembled instructions in att, intel, arm, or DR format for offline traces. The -skip_refs and -sim_refs flags set the start and end points of the disassembled view. For this tool, both flags count instructions skipped or displayed, not trace entries.

The tool also displays metadata marker entries: timestamps, the thread and core on which the subsequent instruction sequence executed, and kernel and system call transfers, which correspond to signal or event handler interruptions of the regular execution flow.

$ bin64/drrun -t drcachesim -simulator_type view -sim_refs 20 -indir drmemtrace.*.dir
<marker: timestamp 13218166936578899>
<marker: tid 46977 on core 7>
0x00007f3a5127d870 48 83 ec 48 sub $0x48, %rsp
0x00007f3a5127d874 0f 31 rdtsc
0x00007f3a5127d876 48 c1 e2 20 shl $0x20, %rdx
0x00007f3a5127d87a 89 c0 mov %eax, %eax
0x00007f3a5127d87c 48 09 c2 or %rax, %rdx
0x00007f3a5127d87f 48 8b 05 ea 25 22 00 mov <rel> 0x00007f3a5149fe70, %rax
0x00007f3a5127d886 48 89 15 d3 23 22 00 mov %rdx, <rel> 0x00007f3a5149fc60
0x00007f3a5127d88d 48 8d 15 dc 25 22 00 lea <rel> 0x00007f3a5149fe70, %rdx
0x00007f3a5127d894 49 89 d6 mov %rdx, %r14
0x00007f3a5127d897 4c 2b 35 62 27 22 00 sub <rel> 0x00007f3a514a0000, %r14
0x00007f3a5127d89e 48 85 c0 test %rax, %rax
0x00007f3a5127d8a1 48 89 15 40 31 22 00 mov %rdx, <rel> 0x00007f3a514a09e8
0x00007f3a5127d8a8 4c 89 35 29 31 22 00 mov %r14, <rel> 0x00007f3a514a09d8
0x00007f3a5127d8af 0f 84 9b 00 00 00 jz $0x00007f3a5127d950
0x00007f3a5127d8b5 4c 8d 05 84 27 22 00 lea <rel> 0x00007f3a514a0040, %r8
0x00007f3a5127d8bc 49 b9 d8 03 00 80 03 mov $0x00000003800003d8, %r9
00 00 00
0x00007f3a5127d8c6 48 b9 78 fb ff 7f 03 mov $0x000000037ffffb78, %rcx
00 00 00
0x00007f3a5127d8d0 48 8d 35 41 31 22 00 lea <rel> 0x00007f3a514a0a18, %rsi
0x00007f3a5127d8d7 bf ff ff ff 6f mov $0x6fffffff, %edi
0x00007f3a5127d8dc 41 bb ff fd ff 6f mov $0x6ffffdff, %r11d
View tool results:
20 : total disassembled instructions

Here is an example of a signal handler interrupting the regular flow:

0x00007fa87c6c0512 eb 5a jmp $0x00007fa87c6c056e
0x00007fa87c6c056e 80 bd 7c ff ff ff 00 cmp -0x84(%rbp), $0x00
0x00007fa87c6c0575 0f 85 e5 03 00 00 jnz $0x00007fa87c6c0960
<marker: kernel xfer to handler>
<marker: timestamp 13218875821472138>
<marker: tid 159754 on core 0>
0x00007fa879bb88dc 55 push %rbp
0x00007fa879bb88dd 48 89 e5 mov %rsp, %rbp
0x00007fa879bb88e0 48 83 ec 40 sub $0x40, %rsp
0x00007fa879bb88e4 89 7d dc mov %edi, -0x24(%rbp)
0x00007fa879bb88e7 48 89 75 d0 mov %rsi, -0x30(%rbp)
0x00007fa879bb88eb 48 89 55 c8 mov %rdx, -0x38(%rbp)
0x00007fa879bb88ef 83 7d dc 0a cmp -0x24(%rbp), $0x0a
0x00007fa879bb88f3 74 0e jz $0x00007fa879bb8903
0x00007fa879bb8903 48 8b 45 c8 mov -0x38(%rbp), %rax
0x00007fa879bb8907 48 83 c0 28 add $0x28, %rax
0x00007fa879bb890b 48 89 45 f8 mov %rax, -0x08(%rbp)
0x00007fa879bb890f 48 8b 45 f8 mov -0x08(%rbp), %rax
0x00007fa879bb8913 48 8b 80 80 00 00 00 mov 0x80(%rax), %rax
0x00007fa879bb891a 48 89 45 f0 mov %rax, -0x10(%rbp)
0x00007fa879bb891e eb 6d jmp $0x00007fa879bb898d
0x00007fa879bb898d 90 nop
0x00007fa879bb898e c9 leave
0x00007fa879bb898f c3 ret
0x00007fa87c6ca3a0 48 c7 c0 0f 00 00 00 mov $0x0000000f, %rax
0x00007fa87c6ca3a7 0f 05 syscall
<marker: timestamp 13218875821472148>
<marker: tid 159754 on core 0>
<marker: syscall xfer>
<marker: timestamp 13218875821475975>
<marker: tid 159754 on core 4>
0x00007fa87c6c057b 48 8b 75 c8 mov -0x38(%rbp), %rsi
0x00007fa87c6c057f 64 48 33 34 25 28 00 xor %fs:0x28, %rsi
00 00
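
The marker lines make this output straightforward to scan with a script. As a sketch, the helper below counts instruction lines and reports when a thread's core changes between consecutive "tid ... on core ..." markers, as happens after the syscall above. summarize_view is a hypothetical name, and the parsing relies only on the line shapes visible in the samples on this page.

```python
import re

TID_MARKER = re.compile(r"<marker: tid (\d+) on core (\d+)>")


def summarize_view(lines):
    """Return (instruction count, [(tid, old core, new core), ...])."""
    instrs = 0
    migrations = []
    last_core = {}   # tid -> core seen in the most recent marker
    for line in lines:
        line = line.strip()
        m = TID_MARKER.fullmatch(line)
        if m:
            tid, core = int(m.group(1)), int(m.group(2))
            if tid in last_core and last_core[tid] != core:
                migrations.append((tid, last_core[tid], core))
            last_core[tid] = core
        elif line.startswith("0x"):
            # Instruction lines begin with the PC; wrapped raw-byte
            # continuation lines such as "00 00" do not.
            instrs += 1
    return instrs, migrations


sample = [
    "0x00007fa87c6ca3a7 0f 05 syscall",
    "<marker: timestamp 13218875821472148>",
    "<marker: tid 159754 on core 0>",
    "<marker: syscall xfer>",
    "<marker: timestamp 13218875821475975>",
    "<marker: tid 159754 on core 4>",
    "0x00007fa87c6c057b 48 8b 75 c8 mov -0x38(%rbp), %rsi",
    "0x00007fa87c6c057f 64 48 33 34 25 28 00 xor %fs:0x28, %rsi",
    "00 00",
]
print(summarize_view(sample))  # (3, [(159754, 0, 4)])
```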

View Function Calls

The func_view tool records function argument and return values for function names specified at tracing time. See Tracing Function Calls for more information.

$ bin64/drrun -t drcachesim -offline -record_function 'fib|1' -- ~/test/fib 5
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -simulator_type func_view -indir drmemtrace.*.dir
0x7fc06d2288eb => common.fib!fib(0x5)
0x7fc06d22888e => common.fib!fib(0x4)
0x7fc06d22888e => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
0x7fc06d22889d => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
=> 0x5
0x7fc06d22889d => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
=> 0x8
Function view tool results:
Function id=0: common.fib!fib
15 calls
15 returns
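
The call tree and the summary can be cross-checked against a plain recursive definition. The source of ~/test/fib is not shown here, so the sketch below is an assumption inferred from the traced values: both fib(0) and fib(1) return 1, and the n-1 branch is called first. count_fib_calls is a hypothetical helper.

```python
def count_fib_calls(n):
    """Return (fib(n), number of calls made), with fib(0) = fib(1) = 1."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        if k <= 1:
            return 1
        # Call order matches the trace: fib(k - 1) first, then fib(k - 2).
        return fib(k - 1) + fib(k - 2)

    return fib(n), calls


print(count_fib_calls(5))  # (8, 15)
```

The result agrees with the final return value 0x8 and the 15 calls and 15 returns reported by the tool.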

Cache Line Histogram

The histogram tool displays the most frequently referenced cache lines:

$ bin64/drrun -t drcachesim -simulator_type histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Cache line histogram tool results:
icache: 1134 unique cache lines
dcache: 3062 unique cache lines
icache top 10
0x7facdd013780: 30929
0x7facdb789fc0: 27664
0x7facdb78a000: 18629
0x7facdd003e80: 18176
0x7facdd003500: 11121
0x7facdd0034c0: 9763
0x7facdd005940: 8865
0x7facdd003480: 8277
0x7facdb789f80: 6660
0x7facdd003540: 5888
dcache top 10
0x7ffcc35e7d80: 4088
0x7ffcc35e7d40: 3497
0x7ffcc35e7e00: 3478
0x7ffcc35e7f40: 2919
0x7ffcc35e7dc0: 2837
0x7facdbe2e980: 2452
0x7facdbe2ec80: 2273
0x7ffcc35e7e80: 2194
0x7facdb6625c0: 2016
0x7ffcc35e7e40: 1997
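
The report above can be thought of as a frequency count over line-aligned addresses. The following sketch shows that idea for a list of data addresses. It is not the tool's implementation: the 64-byte line size is an assumption (consistent with the aligned addresses in the sample output), and top_lines is a hypothetical helper.

```python
from collections import Counter

LINE_SIZE = 64  # assumption: 64-byte cache lines


def top_lines(addresses, n=10):
    """Return (number of unique lines, n most referenced (line, count) pairs)."""
    # Clearing the low bits maps every address to the start of its line.
    counts = Counter(addr & ~(LINE_SIZE - 1) for addr in addresses)
    return len(counts), counts.most_common(n)


unique, top = top_lines(
    [0x7ffcc35e7d80, 0x7ffcc35e7d88, 0x7ffcc35e7dbf,
     0x7ffcc35e7dc0, 0x7ffcc35e7d40],
    n=1,
)
print(unique, [(hex(line), count) for line, count in top])
# 3 [('0x7ffcc35e7d80', 3)]
```

The first three addresses fall in the same 64-byte line, so they count as three references to one line rather than as three separate entries.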