DynamoRIO
Analysis Tool Suite

In addition to a CPU cache simulator, other analysis tools are available that operate on memory address traces. The tool to run is selected with the -simulator_type parameter. New, custom tools can also be created, as described in Creating New Analysis Tools.

Cache Simulator

This is the default tool. Here is an example of running it on an offline trace:

$ bin64/drrun -t drcachesim -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -indir drmemtrace.*.dir
Cache simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 258,433
Misses: 1,148
Miss rate: 0.44%
L1D stats:
Hits: 93,654
Misses: 2,624
Prefetch hits: 458
Prefetch misses: 2,166
Miss rate: 2.73%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,895
Misses: 99
Miss rate: 1.10%
L1D stats:
Hits: 3,448
Misses: 156
Prefetch hits: 26
Prefetch misses: 130
Miss rate: 4.33%
Core #2 (1 thread(s))
L1I stats:
Hits: 4,150
Misses: 101
Miss rate: 2.38%
L1D stats:
Hits: 1,578
Misses: 130
Prefetch hits: 25
Prefetch misses: 105
Miss rate: 7.61%
Core #3 (0 thread(s))
LL stats:
Hits: 1,414
Misses: 2,844
Prefetch hits: 824
Prefetch misses: 1,577
Local miss rate: 66.79%
Child hits: 370,667
Total miss rate: 0.76%
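
The percentages in this output can be reproduced from the raw counts. The sketch below assumes, based on the numbers above rather than on the simulator's source, that a level's miss rate is misses / (hits + misses), and that the LL total miss rate divides LL misses by every access that entered the hierarchy (child hits plus LL hits and misses):

```python
def miss_rate(hits: int, misses: int) -> float:
    """Fraction of the accesses to one cache level that missed."""
    accesses = hits + misses
    return misses / accesses if accesses else 0.0


def total_miss_rate(child_hits: int, ll_hits: int, ll_misses: int) -> float:
    """Last-level misses as a fraction of every access that entered the hierarchy."""
    accesses = child_hits + ll_hits + ll_misses
    return ll_misses / accesses if accesses else 0.0


# Counts copied from the Core #0 L1D section and the LL section above.
l1d = miss_rate(93_654, 2_624)
ll_local = miss_rate(1_414, 2_844)
ll_total = total_miss_rate(370_667, 1_414, 2_844)

print(f"Core #0 L1D miss rate: {l1d:.2%}")       # 2.73%
print(f"LL local miss rate:    {ll_local:.2%}")  # 66.79%
print(f"LL total miss rate:    {ll_total:.2%}")  # 0.76%
```

The Child hits figure (370,667) equals the sum of the L1I hits, L1D hits, and L1D prefetch hits of the three active cores, which is why the total miss rate is so much lower than the LL's local miss rate.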

TLB Simulator

To simulate TLB devices instead of caches, pass TLB to -simulator_type:

$ bin64/drrun -t drcachesim -simulator_type TLB -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
TLB simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 252,412
Misses: 401
Miss rate: 0.16%
L1D stats:
Hits: 87,132
Misses: 9,127
Miss rate: 9.48%
LL stats:
Hits: 9,315
Misses: 213
Local miss rate: 2.24%
Child hits: 339,544
Total miss rate: 0.06%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,709
Misses: 20
Miss rate: 0.23%
L1D stats:
Hits: 3,544
Misses: 55
Miss rate: 1.53%
LL stats:
Hits: 15
Misses: 60
Local miss rate: 80.00%
Child hits: 12,253
Total miss rate: 0.49%
Core #2 (1 thread(s))
L1I stats:
Hits: 1,622
Misses: 21
Miss rate: 1.28%
L1D stats:
Hits: 689
Misses: 35
Miss rate: 4.83%
LL stats:
Hits: 3
Misses: 53
Local miss rate: 94.64%
Child hits: 2,311
Total miss rate: 2.24%
Core #3 (0 thread(s))

Reuse Distance

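The reuse distance of a memory access is the number of distinct cache lines accessed since the previous access to the same cache line. To compute reuse distance metrics, including the histogram, run the reuse_distance tool with -reuse_distance_histogram: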

$ bin64/drrun -t drcachesim -simulator_type reuse_distance -reuse_distance_histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse distance tool aggregated results:
Total accesses: 349632
Unique accesses: 196603
Unique cache lines accessed: 4235
Reuse distance mean: 14.64
Reuse distance median: 1
Reuse distance standard deviation: 104.10
Reuse distance histogram:
Distance Count Percent Cumulative
0 153029 44.36% 44.36%
1 101294 29.37% 73.73%
2 14116 4.09% 77.82%
3 14248 4.13% 81.95%
4 8894 2.58% 84.53%
5 2733 0.79% 85.32%
...
==================================================
Reuse distance tool results for shard 29327 (thread 29327):
Total accesses: 335084
Unique accesses: 187927
Unique cache lines accessed: 4148
Reuse distance mean: 14.77
Reuse distance median: 1
Reuse distance standard deviation: 106.02
Reuse distance histogram:
Distance Count Percent Cumulative
0 147157 44.47% 44.47%
1 96820 29.26% 73.72%
2 13613 4.11% 77.84%
3 13834 4.18% 82.02%
4 8666 2.62% 84.64%
5 2552 0.77% 85.41%
...
3658 29 0.01% 100.00%
3851 1 0.00% 100.00%
Reuse distance threshold = 100 cache lines
Top 10 frequently referenced cache lines
cache line: #references #distant refs
0x7f2a86b3fd80: 27980, 0
0x7f2a86b3fdc0: 18823, 0
0x7f2a88388fc0: 16409, 111
0x7f2a8838abc0: 15176, 6
0x7f2a883884c0: 9930, 20
0x7f2a88388480: 7944, 20
0x7f2a88388500: 7574, 20
0x7f2a88398d00: 7390, 100
0x7f2a86b3fd40: 6668, 0
0x7f2a88388440: 5717, 20
Top 10 distant repeatedly referenced cache lines
cache line: #references #distant refs
0x7f2a885a4180: 246, 132
0x7f2a87504ec0: 202, 128
0x7f2a875044c0: 323, 126
0x7f2a885a4480: 220, 126
0x7f2a87504f00: 293, 124
0x7f2a86fd7e00: 289, 124
0x7f2a875049c0: 221, 124
0x7f2a875053c0: 270, 122
0x7f2a86db9c00: 269, 122
0x7f2a875047c0: 201, 122
==================================================
Reuse distance tool results for shard 29328 (thread 29328):
Total accesses: 12216
Unique accesses: 7251
Unique cache lines accessed: 319
Reuse distance mean: 12.98
Reuse distance median: 1
Reuse distance standard deviation: 38.19
Reuse distance histogram:
Distance Count Percent Cumulative
0 4965 41.73% 41.73%
1 3758 31.59% 73.32%
2 411 3.45% 76.78%
3 348 2.93% 79.70%
4 179 1.50% 81.21%
5 152 1.28% 82.48%
...
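
To make the metric concrete, the sketch below computes reuse distances with a simple LRU stack over 64-byte lines (the line size reported by the view tool's trace header). It is quadratic and only illustrates the definition; it is not the tool's implementation, and the addresses are made up:

```python
from collections import Counter

LINE_SIZE = 64  # bytes; the view tool's output reports "cache line size 64"


def reuse_distances(addresses, line_size=LINE_SIZE):
    """Return (histogram, cold) for a sequence of byte addresses.

    Keeps an LRU stack of cache lines. The reuse distance of an access is the
    number of distinct lines used since the last access to its line, which is
    the line's depth in the stack. First touches ("cold") have no distance.
    """
    stack = []  # least recently used line first, most recent last
    hist = Counter()
    cold = 0
    for addr in addresses:
        line = addr // line_size
        if line in stack:
            depth = len(stack) - 1 - stack.index(line)  # distinct lines used since
            hist[depth] += 1
            stack.remove(line)
        else:
            cold += 1
        stack.append(line)
    return hist, cold


# Made-up addresses: 0x1000, 0x1008, and 0x1010 all fall in the same 64-byte line.
hist, cold = reuse_distances([0x1000, 0x1008, 0x2000, 0x3000, 0x1010, 0x2000])
print(sorted(hist.items()), cold)  # [(0, 1), (2, 2)] 3
```

A distance of 0 means the previous access touched the same line. The output above is consistent with this: for shard 29327, Unique accesses equals Total accesses minus the distance-0 count (335,084 - 147,157 = 187,927), and the histogram percentages are relative to accesses that have a distance, i.e. Total accesses minus Unique cache lines accessed (147,157 / 330,936 = 44.47%).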

Reuse Time

A reuse time tool is also provided, which counts the total number of memory accesses (without considering uniqueness) between accesses to the same address:

$ bin64/drrun -t drcachesim -simulator_type reuse_time -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse time tool aggregated results:
Total accesses: 88281
Total instructions: 261315
Mean reuse time: 433.47
Reuse time histogram:
Distance Count Percent Cumulative
1 27893 32.84% 32.84%
2 10948 12.89% 45.73%
3 5789 6.82% 52.54%
...
==================================================
Reuse time tool results for shard 29482 (thread 29482):
Total accesses: 84194
Total instructions: 250854
Mean reuse time: 450.01
Reuse time histogram:
Distance Count Percent Cumulative
1 26677 32.86% 32.86%
2 10508 12.95% 45.81%
3 5427 6.69% 52.50%
...
==================================================
Reuse time tool results for shard 29483 (thread 29483):
Total accesses: 3411
Total instructions: 8805
Mean reuse time: 86.36
Reuse time histogram:
Distance Count Percent Cumulative
1 1014 31.56% 31.56%
2 363 11.30% 42.86%
3 308 9.59% 52.44%
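
The difference from reuse distance is easiest to see on a tiny trace. This sketch keys on raw addresses, as the description above does, and, since the histograms above start at 1, assumes reuse time is the gap in access positions (an immediate repeat is 1). It illustrates the metric only and is not the tool's code; the addresses are made up:

```python
from collections import Counter


def reuse_times(addresses):
    """Histogram of reuse times over a sequence of data addresses.

    The reuse time of an access is how many accesses ago the same address was
    last touched, so an immediate repeat has reuse time 1. Every intervening
    access counts, including repeated accesses to other addresses.
    """
    last_seen = {}  # address -> position of its most recent access
    hist = Counter()
    for pos, addr in enumerate(addresses):
        if addr in last_seen:
            hist[pos - last_seen[addr]] += 1
        last_seen[addr] = pos
    return hist


def mean_reuse_time(hist):
    """Average reuse time over all accesses that have one."""
    total = sum(hist.values())
    return sum(t * n for t, n in hist.items()) / total if total else 0.0


trace = [0x10, 0x10, 0x20, 0x20, 0x20, 0x10]  # made-up addresses
hist = reuse_times(trace)
print(sorted(hist.items()), mean_reuse_time(hist))  # [(1, 3), (4, 1)] 1.75
```

The last access to 0x10 has a reuse time of 4, yet only one other address (0x20) is touched in between, so its reuse distance would be 1.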

Event Counts

To simply see the counts of instructions and memory references, broken down by thread, use the basic_counts tool:

$ bin64/drrun -t drcachesim -simulator_type basic_counts -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Basic counts tool results:
Total counts:
267193 total (fetched) instructions
345 total non-fetched instructions
0 total prefetches
67686 total data loads
22503 total data stores
3 total threads
280 total scheduling markers
0 total transfer markers
3 total other markers
Thread 247451 counts:
255009 (fetched) instructions
345 non-fetched instructions
0 prefetches
64453 data loads
21243 data stores
258 scheduling markers
0 transfer markers
1 other markers
Thread 247453 counts:
9195 (fetched) instructions
0 non-fetched instructions
0 prefetches
2444 data loads
937 data stores
12 scheduling markers
0 transfer markers
1 other markers
Thread 247454 counts:
2989 (fetched) instructions
0 non-fetched instructions
0 prefetches
789 data loads
323 data stores
10 scheduling markers
0 transfer markers
1 other markers

The non-fetched instructions are x86 string loop instructions (those with a rep prefix), whose iterations after the first do not incur an instruction fetch. They are recorded as a distinct type of trace entry so that core simulators, and not only cache simulators, can account for them.
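
The per-thread counts above add up to the totals (for example, 255,009 + 9,195 + 2,989 = 267,193 fetched instructions). A minimal sketch of that kind of tally follows; the trace is a hypothetical list of (thread id, entry kind) pairs, and the kind names are invented for illustration rather than taken from drmemtrace:

```python
from collections import Counter, defaultdict

# Hypothetical, simplified trace: (thread id, entry kind) pairs. The kind names
# are made up for this sketch and are not drmemtrace's actual entry types.
# "no_fetch" marks a later iteration of an x86 rep string instruction.
TRACE = [
    (101, "instr"), (101, "load"), (101, "instr"), (101, "store"),
    (102, "instr"), (102, "load"), (102, "marker"),
    (101, "instr"), (101, "no_fetch"), (101, "no_fetch"),
]


def basic_counts(entries):
    """Tally entry kinds per thread, then sum the per-thread tallies."""
    per_thread = defaultdict(Counter)
    for tid, kind in entries:
        per_thread[tid][kind] += 1
    total = sum(per_thread.values(), Counter())
    return per_thread, total


per_thread, total = basic_counts(TRACE)
for kind in ("instr", "no_fetch", "load", "store", "marker"):
    print(f"{total[kind]} total {kind}")
print(f"{len(per_thread)} total threads")
```

Keeping the non-fetched iterations as their own kind lets a consumer decide whether to count them as executed instructions (as a core model would) or ignore them (as an instruction cache model would).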

Opcode Mix

The opcode_mix tool combines the non-fetched instruction information with the libraries and binaries preserved from the traced execution to gather more information about each executed instruction than the trace itself stores. It supports only offline traces, and the modules.log file created during post-processing of the trace must be preserved. The results are broken down by the opcodes of DR's IR, which can be finer-grained than the printed names: on x86, for example, mov is split into separate opcodes for load and store, but both print as "mov". This is why some names, such as mov, jnz, and jz, appear more than once in the output:

$ bin64/drrun -t drcachesim -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -simulator_type opcode_mix -indir drmemtrace.*.dir
Opcode mix tool results:
267271 : total executed instructions
36432 : mov
31075 : mov
24715 : add
22579 : test
22539 : cmp
12137 : lea
11136 : jnz
10568 : movzx
10243 : jz
9056 : and
8064 : jnz
7279 : jz
5659 : push
4528 : sub
4357 : pop
4001 : shr
3427 : jnbe
2634 : mov
2469 : shl
2344 : jb
2291 : ret
2178 : xor
2164 : call
2111 : pcmpeqb
1472 : movdqa
...
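
Because several IR opcodes can share a printed name, a per-name total requires summing the repeated rows. The sketch below parses the "count : name" rows shown above and merges duplicates. It sees only the visible, truncated part of the listing, so its totals are lower bounds:

```python
from collections import Counter

# The visible rows of the opcode_mix output above; the real listing continues past "...".
OUTPUT = """\
36432 : mov
31075 : mov
24715 : add
22579 : test
22539 : cmp
12137 : lea
11136 : jnz
10568 : movzx
10243 : jz
9056 : and
8064 : jnz
7279 : jz
5659 : push
4528 : sub
4357 : pop
4001 : shr
3427 : jnbe
2634 : mov
2469 : shl
2344 : jb
2291 : ret
2178 : xor
2164 : call
2111 : pcmpeqb
1472 : movdqa
...
"""


def merge_by_name(text: str) -> Counter:
    """Sum the counts of rows that share a printed opcode name."""
    merged = Counter()
    for line in text.splitlines():
        count, sep, name = line.partition(" : ")
        if not sep or not count.strip().isdigit():
            continue  # skip "..." and anything that is not a "count : name" row
        merged[name.strip()] += int(count)
    return merged


merged = merge_by_name(OUTPUT)
print(merged["mov"], merged["jnz"], merged["jz"])  # 70141 19200 17522
```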

Human-Readable View

The view tool prints the contents of the trace in human-readable form and, for offline traces, disassembles instructions in AT&T, Intel, Arm, or DR format. The -skip_refs flag sets how many instructions to skip before the view begins, and -sim_refs sets how many to display. Both flags count instructions, not trace entries: the loads, stores, and markers interleaved with the instructions are not counted.

Besides instructions, the tool displays loads and stores, as well as metadata markers recording timestamps, the core and thread on which the subsequent instruction sequence executed, and kernel and system call transfers (which correspond to signal or event handler interruptions of the regular execution flow).

$ bin64/drrun -t drcachesim -simulator_type view -sim_refs 20 -indir drmemtrace.*.dir
T80431 <marker: version 3>
T80431 <marker: filetype 0x40>
T80431 <marker: cache line size 64>
T80431 <marker: timestamp 13269546858099127>
T80431 <marker: tid 80431 on core 1>
T80431 0x00007f2ae335d090 48 89 e7 mov %rsp, %rdi
T80431 0x00007f2ae335d093 e8 48 0d 00 00 call $0x00007f2ae335dde0
T80431 write 8 byte(s) @ 0x7ffdf5770ac8
T80431 0x00007f2ae335dde0 55 push %rbp
T80431 write 8 byte(s) @ 0x7ffdf5770ac0
T80431 0x00007f2ae335dde1 48 89 e5 mov %rsp, %rbp
T80431 0x00007f2ae335dde4 41 57 push %r15
T80431 write 8 byte(s) @ 0x7ffdf5770ab8
T80431 0x00007f2ae335dde6 49 89 ff mov %rdi, %r15
T80431 0x00007f2ae335dde9 41 56 push %r14
T80431 write 8 byte(s) @ 0x7ffdf5770ab0
T80431 0x00007f2ae335ddeb 41 55 push %r13
T80431 write 8 byte(s) @ 0x7ffdf5770aa8
T80431 0x00007f2ae335dded 41 54 push %r12
T80431 write 8 byte(s) @ 0x7ffdf5770aa0
T80431 0x00007f2ae335ddef 53 push %rbx
T80431 write 8 byte(s) @ 0x7ffdf5770a98
T80431 0x00007f2ae335ddf0 48 83 ec 38 sub $0x38, %rsp
T80431 0x00007f2ae335ddf4 0f 31 rdtsc
T80431 0x00007f2ae335ddf6 48 c1 e2 20 shl $0x20, %rdx
T80431 0x00007f2ae335ddfa 48 09 d0 or %rdx, %rax
T80431 0x00007f2ae335ddfd 48 8d 15 74 90 02 00 lea <rel> 0x00007f2ae3386e78, %rdx
T80431 0x00007f2ae335de04 48 89 05 75 87 02 00 mov %rax, <rel> 0x00007f2ae3386580
T80431 write 8 byte(s) @ 0x7f2ae3386580
T80431 0x00007f2ae335de0b 48 8b 05 66 90 02 00 mov <rel> 0x00007f2ae3386e78, %rax
T80431 read 8 byte(s) @ 0x7f2ae3386e78
T80431 0x00007f2ae335de12 49 89 d4 mov %rdx, %r12
T80431 0x00007f2ae335de15 4c 2b 25 e4 91 02 00 sub <rel> 0x00007f2ae3387000, %r12
T80431 read 8 byte(s) @ 0x7f2ae3387000
T80431 0x00007f2ae335de1c 48 89 15 d5 9b 02 00 mov %rdx, <rel> 0x00007f2ae33879f8
View tool results:
20 : total disassembled instructions
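
The difference between counting instructions and counting entries can be modeled in a few lines. This is a simplified sketch: the entries are hypothetical (kind, text) pairs copied from the start of the output above, and the boundary handling (header markers shown only when nothing is skipped, output ending right after the last requested instruction, as in the example where the final store's write is not printed) is inferred from the sample rather than from the tool's code:

```python
def view_window(entries, skip_refs=0, sim_refs=None):
    """Pick the entries a view limited by instruction counts would print.

    `entries` is a list of (kind, text) pairs; kind "instr" marks an
    instruction and every other kind (read, write, marker) is a non-counted
    entry. sim_refs=None means no limit.
    """
    if sim_refs == 0:
        return []
    shown = []
    seen = 0  # instructions encountered so far, skipped ones included
    for kind, text in entries:
        if kind == "instr":
            seen += 1
        # A non-instruction entry goes with the instruction before it; entries
        # before the first instruction (header markers) show only if nothing is skipped.
        if skip_refs == 0 or seen > skip_refs:
            shown.append(text)
        if kind == "instr" and sim_refs is not None and seen >= skip_refs + sim_refs:
            break  # stop right after the last requested instruction
    return shown


entries = [
    ("marker", "<marker: version 3>"),
    ("marker", "<marker: tid 80431 on core 1>"),
    ("instr", "mov %rsp, %rdi"),
    ("instr", "call $0x00007f2ae335dde0"),
    ("write", "write 8 byte(s) @ 0x7ffdf5770ac8"),
    ("instr", "push %rbp"),
    ("write", "write 8 byte(s) @ 0x7ffdf5770ac0"),
    ("instr", "mov %rsp, %rbp"),
]

for line in view_window(entries, skip_refs=1, sim_refs=2):
    print(line)
```

Asking for two instructions after skipping one prints three lines, because the write performed by the call is shown but not counted.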

Here is an example of a thread switch:

------------------------------------------------------------
T342625 <marker: timestamp 13260900247983768>
T342625 <marker: tid 342625 on core 3>
T342625 0x0000000000402460 31 ed xor %ebp, %ebp
T342625 0x0000000000402462 49 89 d1 mov %rdx, %r9
T342625 0x0000000000402465 5e pop %rsi
T342625 read 8 byte(s) @ 0x7ffe70dce480
T342625 0x0000000000402466 48 89 e2 mov %rsp, %rdx
...
T342625 0x0000000000467c42 4d 89 c8 mov %r9, %r8
T342625 0x0000000000467c45 4c 8b 54 24 08 mov 0x08(%rsp), %r10
T342625 read 8 byte(s) @ 0x7ffe70dce100
T342625 0x0000000000467c4a b8 38 00 00 00 mov $0x00000038, %eax
T342625 0x0000000000467c4f 0f 05 syscall
------------------------------------------------------------
T342626 <marker: timestamp 13260900248221723>
T342626 <marker: tid 342626 on core 0>
T342626 0x0000000000467c51 48 85 c0 test %rax, %rax
T342626 0x0000000000467c54 7c 13 jl $0x0000000000467c69
T342626 0x0000000000467c56 74 01 jz $0x0000000000467c59
T342626 0x0000000000467c59 31 ed xor %ebp, %ebp
T342626 0x0000000000467c5b 58 pop %rax
T342626 read 8 byte(s) @ 0x7f899f928e70
T342626 0x0000000000467c5c 5f pop %rdi
T342626 read 8 byte(s) @ 0x7f899f928e78
T342626 0x0000000000467c5d ff d0 call %rax
T342626 write 8 byte(s) @ 0x7f899f928e78
T342626 0x0000000000404a30 41 54 push %r12
T342626 write 8 byte(s) @ 0x7f899f928e70
...
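
In this excerpt, %eax is loaded with 0x38 before the syscall; on x86-64 Linux, system call 56 is clone. The new thread T342626 therefore starts at the instruction right after the syscall, where clone's return value is tested (it is 0 in the new thread, so the jz is taken). The sketch below only checks that address arithmetic, using the encodings shown above:

```python
# Values copied from the thread-switch excerpt above.
SYSCALL_PC = 0x467C4F                     # address of the syscall in T342625
SYSCALL_ENCODING = bytes.fromhex("0f05")  # the syscall instruction's bytes
CHILD_FIRST_PC = 0x467C51                 # first instruction traced for T342626
SYSCALL_NR = 0x38                         # moved into %eax just before the syscall


def next_pc(pc: int, encoding: bytes) -> int:
    """Address of the instruction that follows one with this encoding at pc."""
    return pc + len(encoding)


print(SYSCALL_NR, hex(next_pc(SYSCALL_PC, SYSCALL_ENCODING)))  # 56 0x467c51
```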

Here is an example of a signal handler interrupting the regular flow, with metadata showing that the signal was delivered just after a non-taken conditional branch:

T585061 0x00007fdb4e95128f 41 f6 44 24 08 08 test 0x08(%r12), $0x08
T585061 read 1 byte(s) @ 0x7ffd5af76b08
T585061 0x00007fdb4e951295 0f 85 28 04 00 00 jnz $0x00007fdb4e9516c3
T585061 <marker: kernel xfer from 0x7fdb4e95129b to handler>
T585061 <marker: timestamp 13269730052517230>
T585061 <marker: tid 585061 on core 3>
T585061 0x00007fdb4ace9dba 55 push %rbp
T585061 write 8 byte(s) @ 0x7ffd5af763d0
T585061 0x00007fdb4ace9dbb 48 89 e5 mov %rsp, %rbp
T585061 0x00007fdb4ace9dbe 89 7d fc mov %edi, -0x04(%rbp)
T585061 write 4 byte(s) @ 0x7ffd5af763cc
T585061 0x00007fdb4ace9dc1 48 89 75 f0 mov %rsi, -0x10(%rbp)
T585061 write 8 byte(s) @ 0x7ffd5af763c0
T585061 0x00007fdb4ace9dc5 48 89 55 e8 mov %rdx, -0x18(%rbp)
T585061 write 8 byte(s) @ 0x7ffd5af763b8
T585061 0x00007fdb4ace9dc9 83 7d fc 1a cmp -0x04(%rbp), $0x1a
T585061 read 4 byte(s) @ 0x7ffd5af763cc
T585061 0x00007fdb4ace9dcd 75 0f jnz $0x00007fdb4ace9dde
T585061 0x00007fdb4ace9dcf 8b 05 7f 23 20 00 mov <rel> 0x00007fdb4aeec154, %eax
T585061 read 4 byte(s) @ 0x7fdb4aeec154
T585061 0x00007fdb4ace9dd5 83 c0 01 add $0x01, %eax
T585061 0x00007fdb4ace9dd8 89 05 76 23 20 00 mov %eax, <rel> 0x00007fdb4aeec154
T585061 write 4 byte(s) @ 0x7fdb4aeec154
T585061 0x00007fdb4ace9dde 90 nop
T585061 0x00007fdb4ace9ddf 5d pop %rbp
T585061 read 8 byte(s) @ 0x7ffd5af763d0
T585061 0x00007fdb4ace9de0 c3 ret
T585061 read 8 byte(s) @ 0x7ffd5af763d8
T585061 0x00007fdb4e95c140 48 c7 c0 0f 00 00 00 mov $0x0000000f, %rax
T585061 0x00007fdb4e95c147 0f 05 syscall
T585061 <marker: timestamp 13269730052517239>
T585061 <marker: tid 585061 on core 3>
T585061 <marker: syscall xfer from 0x7fdb4e95c149>
T585061 <marker: timestamp 13269730052520271>
T585061 <marker: tid 585061 on core 3>
T585061 0x00007fdb4e95129b 48 8b 1d 8e 40 01 00 mov <rel> 0x00007fdb4e965330, %rbx
T585061 read 8 byte(s) @ 0x7fdb4e965330
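
The transfer markers can be cross-checked against the instruction encodings. In the sketch below, the interrupted address parsed from the kernel xfer marker equals the fall-through address of the 6-byte jnz, which is how the branch is known to be non-taken; it is also where execution resumes after the handler returns through the final syscall (0xf, which is rt_sigreturn on x86-64 Linux). The regular expression is written only against the marker text shown here:

```python
import re

# Values copied from the signal example above.
BRANCH_PC = 0x00007FDB4E951295
BRANCH_ENCODING = bytes.fromhex("0f8528040000")  # jnz, not taken
XFER_MARKER = "<marker: kernel xfer from 0x7fdb4e95129b to handler>"
RESUME_PC = 0x00007FDB4E95129B  # first instruction after the syscall xfer


def xfer_source(marker: str) -> int:
    """Extract the interrupted address from a kernel or syscall xfer marker."""
    match = re.search(r"xfer from (0x[0-9a-f]+)", marker)
    if match is None:
        raise ValueError(f"not an xfer marker: {marker!r}")
    return int(match.group(1), 16)


interrupted = xfer_source(XFER_MARKER)
fallthrough = BRANCH_PC + len(BRANCH_ENCODING)
print(hex(interrupted), interrupted == fallthrough, interrupted == RESUME_PC)
# 0x7fdb4e95129b True True
```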

View Function Calls

The func_view tool displays the argument and return values recorded for the functions specified at tracing time (here with -record_function). See Tracing Function Calls for more information.

$ bin64/drrun -t drcachesim -offline -record_function 'fib|1' -- ~/test/fib 5
$ bin64/drrun -t drcachesim -simulator_type func_view -indir drmemtrace.*.dir
0x7fc06d2288eb => common.fib!fib(0x5)
0x7fc06d22888e => common.fib!fib(0x4)
0x7fc06d22888e => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
0x7fc06d22889d => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
=> 0x5
0x7fc06d22889d => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
=> 0x8
Function view tool results:
Function id=0: common.fib!fib
15 calls
15 returns
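
The summary's 15 calls follow from the shape of the recursion. Judging from the logged values (the program's source is not shown, so this version of fib is an assumption), fib returns 1 for arguments 0 and 1 and otherwise calls itself twice. The sketch below counts the invocations needed for fib(5):

```python
from functools import lru_cache


def fib(n: int) -> int:
    """Recursive Fibonacci with fib(0) == fib(1) == 1, matching the values logged above."""
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def calls(n: int) -> int:
    """Number of fib() invocations, the outermost one included, made to compute fib(n)."""
    return 1 if n < 2 else 1 + calls(n - 1) + calls(n - 2)


print(fib(5), calls(5))  # 8 15
```

Every call returns exactly once, so the number of returns matches the number of calls.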

Cache Line Histogram

The top referenced cache lines are displayed by the histogram tool:

$ bin64/drrun -t drcachesim -simulator_type histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Cache line histogram tool results:
icache: 1134 unique cache lines
dcache: 3062 unique cache lines
icache top 10
0x7facdd013780: 30929
0x7facdb789fc0: 27664
0x7facdb78a000: 18629
0x7facdd003e80: 18176
0x7facdd003500: 11121
0x7facdd0034c0: 9763
0x7facdd005940: 8865
0x7facdd003480: 8277
0x7facdb789f80: 6660
0x7facdd003540: 5888
dcache top 10
0x7ffcc35e7d80: 4088
0x7ffcc35e7d40: 3497
0x7ffcc35e7e00: 3478
0x7ffcc35e7f40: 2919
0x7ffcc35e7dc0: 2837
0x7facdbe2e980: 2452
0x7facdbe2ec80: 2273
0x7ffcc35e7e80: 2194
0x7facdb6625c0: 2016
0x7ffcc35e7e40: 1997
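
Every address in these lists is a multiple of 0x40, the start of a 64-byte cache line. A minimal sketch of bucketing references into lines and ranking them follows; it assumes 64-byte lines, as in the trace header shown by the view tool, uses made-up addresses, and makes no claim about how the histogram tool is implemented:

```python
from collections import Counter

LINE_SIZE = 64  # bytes; matches the "cache line size 64" marker shown by the view tool


def line_of(addr: int, line_size: int = LINE_SIZE) -> int:
    """Start address of the cache line containing addr (line_size must be a power of two)."""
    return addr & ~(line_size - 1)


def top_lines(addresses, n=10):
    """Return the n most frequently referenced cache lines as (line, count) pairs."""
    return Counter(line_of(a) for a in addresses).most_common(n)


# Made-up addresses: the first three share one line, the last starts the next line.
refs = [0x7FFCC35E7D80, 0x7FFCC35E7D88, 0x7FFCC35E7DBF, 0x7FFCC35E7DC0]
for line, count in top_lines(refs, n=2):
    print(f"{line:#x}: {count}")
```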