Simulator Details

The simulator is designed to be extended to model a variety of caching devices; currently, CPU caches and TLBs are implemented. The type of device to simulate is selected with the parameter "-simulator_type" (see Simulator Parameters).
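
For example, the TLB simulator can be selected as follows, where "myapp" is just a placeholder for the application being traced (the cache simulator is the default):

$ bin64/drrun -t drcachesim -simulator_type TLB -- ./myapp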

The CPU cache simulator models a configurable number of cores, each with an L1 data cache and an L1 instruction cache. Currently there is a single shared L2 unified cache, but we would like to extend support to arbitrary cache hierarchies (see Current Limitations). The cache line size and each cache's total size and associativity are user-specified (see Simulator Parameters).
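
For example, a four-core configuration with 32KB L1 caches, an 8MB last-level cache, and 64-byte lines might be requested as follows (the exact parameter names and their defaults are listed under Simulator Parameters; the values here are only illustrative):

$ bin64/drrun -t drcachesim -cores 4 -line_size 64 -L1I_size 32768 -L1D_size 32768 -LL_size 8388608 -- ./myapp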

The TLB simulator models a configurable number of cores, each with an L1 instruction TLB, an L1 data TLB, and an L2 unified TLB. The number of entries and the associativity of each TLB, as well as the virtual/physical page size, are user-specified (see Simulator Parameters).
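
A TLB simulation is configured in the same way; the parameter names below are meant to mirror those documented under Simulator Parameters, and the values are only illustrative:

$ bin64/drrun -t drcachesim -simulator_type TLB -cores 4 -TLB_L1I_entries 64 -TLB_L1D_entries 64 -TLB_L2_entries 1024 -page_size 4096 -- ./myapp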

Neither simulator has a simple way to know which core any particular thread executed on for each of its instructions. The tracer records which core a thread is running on each time it writes out a full trace buffer, giving an approximation of the actual scheduling (at the granularity of the trace buffer size). By default, the cache and TLB simulators ignore that information and schedule threads to the simulated cores in a static round-robin fashion, with load balancing to fill in gaps with new threads after threads exit. The option "-cpu_scheduling" (see Simulator Parameters) can be used instead to map each physical cpu to a simulated core and to use the recorded cpu on which each segment of thread execution occurred to schedule execution in a manner that more closely resembles the traced execution on the physical machine. Below is an example of the output produced with this option when running an application with many threads on a physical machine with 8 cpus; the 8 cpus are mapped to the 4 simulated cores:

$ bin64/drrun -t drcachesim -cpu_scheduling -- ~/test/pi_estimator 20
Estimation of pi is 3.141592653798125
<Stopping application /home/bruening/dr/test/threadsig (213517)>
---- <application exited with code 0> ----
Cache simulation results:
Core #0 (2 traced CPU(s): #2, #5)
  L1I stats:
    Hits: 2,756,429
    Misses: 1,190
    Miss rate: 0.04%
  L1D stats:
    Hits: 1,747,822
    Misses: 13,511
    Prefetch hits: 2,354
    Prefetch misses: 11,157
    Miss rate: 0.77%
Core #1 (2 traced CPU(s): #4, #0)
  L1I stats:
    Hits: 472,948
    Misses: 299
    Miss rate: 0.06%
  L1D stats:
    Hits: 895,099
    Misses: 1,224
    Prefetch hits: 253
    Prefetch misses: 971
    Miss rate: 0.14%
Core #2 (2 traced CPU(s): #1, #7)
  L1I stats:
    Hits: 448,581
    Misses: 649
    Miss rate: 0.14%
  L1D stats:
    Hits: 811,483
    Misses: 1,723
    Prefetch hits: 378
    Prefetch misses: 1,345
    Miss rate: 0.21%
Core #3 (2 traced CPU(s): #6, #3)
  L1I stats:
    Hits: 275,192
    Misses: 154
    Miss rate: 0.06%
  L1D stats:
    Hits: 522,655
    Misses: 850
    Prefetch hits: 173
    Prefetch misses: 677
    Miss rate: 0.16%
LL stats:
    Hits: 12,491
    Misses: 7,109
    Prefetch hits: 8,922
    Prefetch misses: 5,228
    Local miss rate: 36.27%
    Child hits: 7,933,367
    Total miss rate: 0.09%

The memory access traces include optimizations that group together the references from a single basic block. As a result, some thread interleavings that could occur natively may not be observed. There are no other disruptions to thread ordering, however: the application runs with all of its threads concurrently, just as it would natively (although more slowly).

Once every process has exited, the simulator prints cache miss statistics for each cache to stderr. The simulator is designed to be extensible, allowing for different cache studies to be carried out: see Extending the Simulator.

For an L2 caching device, the L1 caching devices are considered its children. Two separate miss rates are computed: the "Local miss rate" considers only the requests that reach the L2, while the "Total miss rate" also counts the child hits. This generalizes to deeper hierarchies: lower level caches are children, and the reported child hits are cumulative across all lower levels.
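
Using the demand (non-prefetch) LL numbers from the example output above:

Local miss rate = 7,109 / (12,491 + 7,109) = 36.27%
Total miss rate = 7,109 / (7,933,367 + 12,491 + 7,109) = 0.09%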

For memory requests that cross block boundaries, each block touched is counted separately, resulting in separate hit and miss statistics. This can be changed by implementing a custom statistics gatherer (see Extending the Simulator).
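
For example, with 64-byte cache lines, an 8-byte load at address 0x7c spans the lines starting at 0x40 and 0x80, so it is counted as two separate requests, each of which can hit or miss independently.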

Software and hardware prefetches are combined in the prefetch hit and miss statistics, which are reported separately from regular loads and stores. To isolate software prefetch statistics, disable the hardware prefetcher by running with "-data_prefetcher none" (see Simulator Parameters). While misses from software prefetches are included in cache miss files, misses from hardware prefetches are not.
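
For example, to gather statistics that reflect only software prefetches (with "myapp" again standing in for the traced application):

$ bin64/drrun -t drcachesim -data_prefetcher none -- ./myapp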