In addition to a CPU cache simulator, other analysis tools are available that operate on memory address traces. Which tool is used can be selected with the -simulator_type
parameter. New, custom tools can also be created, as described in Creating New Analysis Tools.
Cache Simulator
This is the default tool. Here is an exmample of running it on an offline trace:
$ bin64/drrun -t drcachesim -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -indir drmemtrace.*.dir
Cache simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 258,433
Misses: 1,148
Miss rate: 0.44%
L1D stats:
Hits: 93,654
Misses: 2,624
Prefetch hits: 458
Prefetch misses: 2,166
Miss rate: 2.73%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,895
Misses: 99
Miss rate: 1.10%
L1D stats:
Hits: 3,448
Misses: 156
Prefetch hits: 26
Prefetch misses: 130
Miss rate: 4.33%
Core #2 (1 thread(s))
L1I stats:
Hits: 4,150
Misses: 101
Miss rate: 2.38%
L1D stats:
Hits: 1,578
Misses: 130
Prefetch hits: 25
Prefetch misses: 105
Miss rate: 7.61%
Core #3 (0 thread(s))
LL stats:
Hits: 1,414
Misses: 2,844
Prefetch hits: 824
Prefetch misses: 1,577
Local miss rate: 66.79%
Child hits: 370,667
Total miss rate: 0.76%
TLB Simulator
To simulate TLB devices instead of caches, pass TLB
to -simulator_type:
$ bin64/drrun -t drcachesim -simulator_type TLB -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
TLB simulation results:
Core #0 (1 thread(s))
L1I stats:
Hits: 252,412
Misses: 401
Miss rate: 0.16%
L1D stats:
Hits: 87,132
Misses: 9,127
Miss rate: 9.48%
LL stats:
Hits: 9,315
Misses: 213
Local miss rate: 2.24%
Child hits: 339,544
Total miss rate: 0.06%
Core #1 (1 thread(s))
L1I stats:
Hits: 8,709
Misses: 20
Miss rate: 0.23%
L1D stats:
Hits: 3,544
Misses: 55
Miss rate: 1.53%
LL stats:
Hits: 15
Misses: 60
Local miss rate: 80.00%
Child hits: 12,253
Total miss rate: 0.49%
Core #2 (1 thread(s))
L1I stats:
Hits: 1,622
Misses: 21
Miss rate: 1.28%
L1D stats:
Hits: 689
Misses: 35
Miss rate: 4.83%
LL stats:
Hits: 3
Misses: 53
Local miss rate: 94.64%
Child hits: 2,311
Total miss rate: 2.24%
Core #3 (0 thread(s))
Reuse Distance
To compute reuse distance metrics:
$ bin64/drrun -t drcachesim -simulator_type reuse_distance -reuse_distance_histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse distance tool aggregated results:
Total accesses: 349632
Unique accesses: 196603
Unique cache lines accessed: 4235
Reuse distance mean: 14.64
Reuse distance median: 1
Reuse distance standard deviation: 104.10
Reuse distance histogram:
Distance Count Percent Cumulative
0 153029 44.36% 44.36%
1 101294 29.37% 73.73%
2 14116 4.09% 77.82%
3 14248 4.13% 81.95%
4 8894 2.58% 84.53%
5 2733 0.79% 85.32%
...
==================================================
Reuse distance tool results for shard 29327 (thread 29327):
Total accesses: 335084
Unique accesses: 187927
Unique cache lines accessed: 4148
Reuse distance mean: 14.77
Reuse distance median: 1
Reuse distance standard deviation: 106.02
Reuse distance histogram:
Distance Count Percent Cumulative
0 147157 44.47% 44.47%
1 96820 29.26% 73.72%
2 13613 4.11% 77.84%
3 13834 4.18% 82.02%
4 8666 2.62% 84.64%
5 2552 0.77% 85.41%
...
3658 29 0.01% 100.00%
3851 1 0.00% 100.00%
Reuse distance threshold = 100 cache lines
Top 10 frequently referenced cache lines
cache line: #references #distant refs
0x7f2a86b3fd80: 27980, 0
0x7f2a86b3fdc0: 18823, 0
0x7f2a88388fc0: 16409, 111
0x7f2a8838abc0: 15176, 6
0x7f2a883884c0: 9930, 20
0x7f2a88388480: 7944, 20
0x7f2a88388500: 7574, 20
0x7f2a88398d00: 7390, 100
0x7f2a86b3fd40: 6668, 0
0x7f2a88388440: 5717, 20
Top 10 distant repeatedly referenced cache lines
cache line: #references #distant refs
0x7f2a885a4180: 246, 132
0x7f2a87504ec0: 202, 128
0x7f2a875044c0: 323, 126
0x7f2a885a4480: 220, 126
0x7f2a87504f00: 293, 124
0x7f2a86fd7e00: 289, 124
0x7f2a875049c0: 221, 124
0x7f2a875053c0: 270, 122
0x7f2a86db9c00: 269, 122
0x7f2a875047c0: 201, 122
==================================================
Reuse distance tool results for shard 29328 (thread 29328):
Total accesses: 12216
Unique accesses: 7251
Unique cache lines accessed: 319
Reuse distance mean: 12.98
Reuse distance median: 1
Reuse distance standard deviation: 38.19
Reuse distance histogram:
Distance Count Percent Cumulative
0 4965 41.73% 41.73%
1 3758 31.59% 73.32%
2 411 3.45% 76.78%
3 348 2.93% 79.70%
4 179 1.50% 81.21%
5 152 1.28% 82.48%
...
Reuse Time
A reuse time tool is also provided, which counts the total number of memory accesses (without considering uniqueness) between accesses to the same address:
$ bin64/drrun -t drcachesim -simulator_type reuse_time -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Reuse time tool aggregated results:
Total accesses: 88281
Total instructions: 261315
Mean reuse time: 433.47
Reuse time histogram:
Distance Count Percent Cumulative
1 27893 32.84% 32.84%
2 10948 12.89% 45.73%
3 5789 6.82% 52.54%
...
==================================================
Reuse time tool results for shard 29482 (thread 29482):
Total accesses: 84194
Total instructions: 250854
Mean reuse time: 450.01
Reuse time histogram:
Distance Count Percent Cumulative
1 26677 32.86% 32.86%
2 10508 12.95% 45.81%
3 5427 6.69% 52.50%
...
==================================================
Reuse time tool results for shard 29483 (thread 29483):
Total accesses: 3411
Total instructions: 8805
Mean reuse time: 86.36
Reuse time histogram:
Distance Count Percent Cumulative
1 1014 31.56% 31.56%
2 363 11.30% 42.86%
3 308 9.59% 52.44%
Event Counts
To simply see the counts of instructions and memory references broken down by thread use the basic counts tool:
$ bin64/drrun -t drcachesim -simulator_type basic_counts -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Basic counts tool results:
Total counts:
267193 total (fetched) instructions
345 total non-fetched instructions
0 total prefetches
67686 total data loads
22503 total data stores
3 total threads
280 total scheduling markers
0 total transfer markers
0 total function id markers
0 total function return address markers
0 total function argument markers
0 total function return value markers
0 total physical address + virtual address marker pairs
0 total physical address unavailable markers
3 total other markers
Thread 247451 counts:
255009 (fetched) instructions
345 non-fetched instructions
0 prefetches
64453 data loads
21243 data stores
258 scheduling markers
0 transfer markers
0 function id markers
0 function return address markers
0 function argument markers
0 function return value markers
0 physical address + virtual address marker pairs
0 physical address unavailable markers
1 other markers
Thread 247453 counts:
9195 (fetched) instructions
0 non-fetched instructions
0 prefetches
2444 data loads
937 data stores
12 scheduling markers
0 transfer markers
0 function id markers
0 function return address markers
0 function argument markers
0 function return value markers
0 physical address + virtual address marker pairs
0 physical address unavailable markers
1 other markers
Thread 247454 counts:
2989 (fetched) instructions
0 non-fetched instructions
0 prefetches
789 data loads
323 data stores
10 scheduling markers
0 transfer markers
0 function id markers
0 function return address markers
0 function argument markers
0 function return value markers
0 physical address + virtual address marker pairs
0 physical address unavailable markers
1 other markers
The non-fetched instructions are x86 string loop instructions, where subsequent iterations do not incur a fetch. They are included in the trace as a different type of trace entry to support core simulators in addition to cache simulators.
Opcode Mix
The opcode_mix tool uses the non-fetched instruction information along with the preserved libraries and binaries from the traced execution to gather more information on each executed instruction than was stored in the trace. To run on online traces, pass the -instr_encodings
option. The results are broken down by the opcodes used in DR's IR, where for x86 mov
is split into a separate opcode for load and store but both have the same public string "mov":
$ bin64/drrun -t drcachesim -offline -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -simulator_type opcode_mix -indir drmemtrace.*.dir
Opcode mix tool results:
267271 : total executed instructions
36432 : mov
31075 : mov
24715 : add
22579 : test
22539 : cmp
12137 : lea
11136 : jnz
10568 : movzx
10243 : jz
9056 : and
8064 : jnz
7279 : jz
5659 : push
4528 : sub
4357 : pop
4001 : shr
3427 : jnbe
2634 : mov
2469 : shl
2344 : jb
2291 : ret
2178 : xor
2164 : call
2111 : pcmpeqb
1472 : movdqa
...
Human-Readable View
The view tool prints out the contents of the trace for human viewing, including disassembling instructions in AT&T, Intel, Arm, or DR format (to see disassembly for online traces, pass the -instr_encodings
option). The -skip_refs and -sim_refs flags can be used to set a start point and end point for the disassembled view. Note that these flags compute the number of trace entry records which are skipped or displayed which is distinct from the number of instruction records.
The tool displays loads and stores, as well as metadata marker entries for timestamps, on which core and thread the subsequent instruction sequence was executed, and kernel and system call transfers (these correspond to signal or event handler interruptions of the regular execution flow).
In its first two columns, the tool displays the trace record ordinal and the instruction fetch ordinal.
$ $ bin64/drrun -t drcachesim -simulator_type view -indir drmemtrace.*.dir -sim_refs 20
Output format:
<--record#-> <--instr#->: <---tid---> <record details>
------------------------------------------------------------
1 0: 3256418 <marker: version 4>
2 0: 3256418 <marker: filetype 0x240>
3 0: 3256418 <marker: cache line size 64>
4 0: 3256418 <marker: chunk instruction count 1024>
5 0: 3256418 <marker: page size 4096>
6 0: 3256418 <marker: timestamp 13312410768080478>
7 0: 3256418 <marker: tid 3256418 on core 7>
8 1: 3256418 ifetch 3 byte(s) @ 0x00007fc205a61940 48 89 e7 mov %rsp, %rdi
9 2: 3256418 ifetch 5 byte(s) @ 0x00007fc205a61943 e8 b8 0c 00 00 call $0x00007fc205a62600
10 2: 3256418 write 8 byte(s) @ 0x00007fff9a9e3528 by PC 0x00007fc205a61943
11 3: 3256418 ifetch 1 byte(s) @ 0x00007fc205a62600 55 push %rbp
12 3: 3256418 write 8 byte(s) @ 0x00007fff9a9e3520 by PC 0x00007fc205a62600
13 4: 3256418 ifetch 3 byte(s) @ 0x00007fc205a62601 48 89 e5 mov %rsp, %rbp
14 5: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62604 41 57 push %r15
15 5: 3256418 write 8 byte(s) @ 0x00007fff9a9e3518 by PC 0x00007fc205a62604
16 6: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62606 41 56 push %r14
17 6: 3256418 write 8 byte(s) @ 0x00007fff9a9e3510 by PC 0x00007fc205a62606
18 7: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62608 41 55 push %r13
19 7: 3256418 write 8 byte(s) @ 0x00007fff9a9e3508 by PC 0x00007fc205a62608
20 8: 3256418 ifetch 2 byte(s) @ 0x00007fc205a6260a 41 54 push %r12
View tool results:
8 : total instructions
An example of thread switches:
------------------------------------------------------------
46 0: 3264758 <marker: timestamp 13312413437398055>
47 0: 3264758 <marker: tid 3264758 on core 2>
48 1: 3264758 ifetch 3 byte(s) @ 0x00007f4ea89e4940 48 89 e7 mov %rsp, %rdi
49 2: 3264758 ifetch 5 byte(s) @ 0x00007f4ea89e4943 e8 b8 0c 00 00 call $0x00007f4ea89e5600
50 2: 3264758 write 8 byte(s) @ 0x00007ffd93a0cf18 by PC 0x00007f4ea89e4943
...
2854543 2149665: 3264758 ifetch 5 byte(s) @ 0x00007f4ea7c87f8c b8 0e 00 00 00 mov $0x0000000e, %eax
2854544 2149666: 3264758 ifetch 2 byte(s) @ 0x00007f4ea7c87f91 0f 05 syscall
------------------------------------------------------------
2854545 2149666: 3264760 <marker: timestamp 13312413438835999>
2854546 2149666: 3264760 <marker: tid 3264760 on core 11>
2854547 2149667: 3264760 ifetch 3 byte(s) @ 0x00007f4ea7d0b099 48 85 c0 test %rax, %rax
2854548 2149668: 3264760 ifetch 2 byte(s) @ 0x00007f4ea7d0b09c 7c 18 jl $0x00007f4ea7d0b0b6
...
Here is an example of a signal handler interrupting the regular flow, with metadata showing that the signal was delivered just after a non-taken conditional branch:
2851502 2147588: 3264758 ifetch 2 byte(s) @ 0x00007f4ea7c87a54 eb 14 jmp $0x00007f4ea7c87a6a
2851503 2147588: 3264758 <marker: kernel xfer from 0x7f4ea7c87a6a to handler>
2851504 2147588: 3264758 <marker: timestamp 13312413438786440>
2851505 2147588: 3264758 <marker: tid 3264758 on core 8>
2851506 2147589: 3264758 ifetch 1 byte(s) @ 0x00007f4ea47fbdba 55 push %rbp
2851507 2147589: 3264758 write 8 byte(s) @ 0x00007ffd93a0be30 by PC 0x00007f4ea47fbdba
2851508 2147590: 3264758 ifetch 3 byte(s) @ 0x00007f4ea47fbdbb 48 89 e5 mov %rsp, %rbp
2851509 2147591: 3264758 ifetch 3 byte(s) @ 0x00007f4ea47fbdbe 89 7d fc mov %edi, -0x04(%rbp)
2851510 2147591: 3264758 write 4 byte(s) @ 0x00007ffd93a0be2c by PC 0x00007f4ea47fbdbe
2851511 2147592: 3264758 ifetch 4 byte(s) @ 0x00007f4ea47fbdc1 48 89 75 f0 mov %rsi, -0x10(%rbp)
2851512 2147592: 3264758 write 8 byte(s) @ 0x00007ffd93a0be20 by PC 0x00007f4ea47fbdc1
2851513 2147593: 3264758 ifetch 4 byte(s) @ 0x00007f4ea47fbdc5 48 89 55 e8 mov %rdx, -0x18(%rbp)
2851514 2147593: 3264758 write 8 byte(s) @ 0x00007ffd93a0be18 by PC 0x00007f4ea47fbdc5
2851515 2147594: 3264758 ifetch 4 byte(s) @ 0x00007f4ea47fbdc9 83 7d fc 1a cmp -0x04(%rbp), $0x1a
2851516 2147594: 3264758 read 4 byte(s) @ 0x00007ffd93a0be2c by PC 0x00007f4ea47fbdc9
2851517 2147595: 3264758 ifetch 2 byte(s) @ 0x00007f4ea47fbdcd 75 0f jnz $0x00007f4ea47fbdde
2851518 2147596: 3264758 ifetch 6 byte(s) @ 0x00007f4ea47fbdcf 8b 05 7f 23 20 00 mov <rel> 0x00007f4ea49fe154, %eax
2851519 2147596: 3264758 read 4 byte(s) @ 0x00007f4ea49fe154 by PC 0x00007f4ea47fbdcf
2851520 2147597: 3264758 ifetch 3 byte(s) @ 0x00007f4ea47fbdd5 83 c0 01 add $0x01, %eax
2851521 2147598: 3264758 ifetch 6 byte(s) @ 0x00007f4ea47fbdd8 89 05 76 23 20 00 mov %eax, <rel> 0x00007f4ea49fe154
2851522 2147598: 3264758 write 4 byte(s) @ 0x00007f4ea49fe154 by PC 0x00007f4ea47fbdd8
2851523 2147599: 3264758 ifetch 1 byte(s) @ 0x00007f4ea47fbdde 90 nop
2851524 2147600: 3264758 ifetch 1 byte(s) @ 0x00007f4ea47fbddf 5d pop %rbp
2851525 2147600: 3264758 read 8 byte(s) @ 0x00007ffd93a0be30 by PC 0x00007f4ea47fbddf
2851526 2147601: 3264758 ifetch 1 byte(s) @ 0x00007f4ea47fbde0 c3 ret
2851527 2147601: 3264758 read 8 byte(s) @ 0x00007ffd93a0be38 by PC 0x00007f4ea47fbde0
2851528 2147602: 3264758 ifetch 7 byte(s) @ 0x00007f4ea7c3daa0 48 c7 c0 0f 00 00 00 mov $0x0000000f, %rax
2851529 2147603: 3264758 ifetch 2 byte(s) @ 0x00007f4ea7c3daa7 0f 05 syscall
2851530 2147603: 3264758 <marker: timestamp 13312413438787645>
2851531 2147603: 3264758 <marker: tid 3264758 on core 8>
2851532 2147603: 3264758 <marker: syscall xfer from 0x7f4ea7c3daa9>
2851533 2147603: 3264758 <marker: timestamp 13312413438787652>
2851534 2147603: 3264758 <marker: tid 3264758 on core 8>
2851535 2147604: 3264758 ifetch 5 byte(s) @ 0x00007f4ea7c87a6a e8 11 8b 07 00 call $0x00007f4ea7d00580
View Function Calls
The func_view tool records function argument and return values for function names specified at tracing time. See Tracing Function Calls for more information.
$ bin64/drrun -t drcachesim -offline -record_function 'fib|1' -- ~/test/fib 5
Estimation of pi is 3.142425985001098
$ bin64/drrun -t drcachesim -simulator_type func_view -indir drmemtrace.*.dir
0x7fc06d2288eb => common.fib!fib(0x5)
0x7fc06d22888e => common.fib!fib(0x4)
0x7fc06d22888e => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
0x7fc06d22889d => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
=> 0x5
0x7fc06d22889d => common.fib!fib(0x3)
0x7fc06d22888e => common.fib!fib(0x2)
0x7fc06d22888e => common.fib!fib(0x1) => 0x1
0x7fc06d22889d => common.fib!fib(0x0) => 0x1
=> 0x2
0x7fc06d22889d => common.fib!fib(0x1) => 0x1
=> 0x3
=> 0x8
Function view tool results:
Function id=0: common.fib!fib
15 calls
15 returns
Cache Line Histogram
The top referenced cache lines are displayed by the histogram
tool:
$ bin64/drrun -t drcachesim -simulator_type histogram -- ~/test/pi_estimator
Estimation of pi is 3.142425985001098
---- <application exited with code 0> ----
Cache line histogram tool results:
icache: 1134 unique cache lines
dcache: 3062 unique cache lines
icache top 10
0x7facdd013780: 30929
0x7facdb789fc0: 27664
0x7facdb78a000: 18629
0x7facdd003e80: 18176
0x7facdd003500: 11121
0x7facdd0034c0: 9763
0x7facdd005940: 8865
0x7facdd003480: 8277
0x7facdb789f80: 6660
0x7facdd003540: 5888
dcache top 10
0x7ffcc35e7d80: 4088
0x7ffcc35e7d40: 3497
0x7ffcc35e7e00: 3478
0x7ffcc35e7f40: 2919
0x7ffcc35e7dc0: 2837
0x7facdbe2e980: 2452
0x7facdbe2ec80: 2273
0x7ffcc35e7e80: 2194
0x7facdb6625c0: 2016
0x7ffcc35e7e40: 1997
Invariant Checker
The invariant_checker tool performs sanity checks on a trace, focusing on program counter continuity and guarantees around kernel control transfer interruptions. It optionally checks for restricted behavior that technically is legal but is not expected to happen in the target trace, helping to identify tracing problems and suitability for use of a trace for core simulation.