DynamoRIO
Linux
DynamoRIO PC self-sampling
A client can use dr_set_itimer() for programmatic PC self-sampling, with dr_where_am_i() reporting where each sample was taken. This gives a general breakdown of where time is spent across the overall instrumentation system, and the sampled PCs can be used to drill down further offline.
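dr_set_itimer() is only usable inside a DR client, so the standalone sketch below shows the same self-sampling idea with the plain POSIX setitimer(ITIMER_PROF) and SIGPROF pair rather than DR's API. A profiling timer periodically interrupts the process, and the signal handler attributes each sample to a category. The phase enum and the helpers start_sampling(), stop_sampling(), burn_cpu(), and sample_demo() are hypothetical. The phase tag stands in for the categories dr_where_am_i() reports; a real client would record the interrupted PC instead.

```c
#define _XOPEN_SOURCE 700   /* for sigaction(), setitimer(), SA_RESTART */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical phases standing in for the categories dr_where_am_i() reports. */
enum phase { PHASE_SETUP, PHASE_WORK, NUM_PHASES };

static volatile sig_atomic_t current_phase = PHASE_SETUP;
static volatile sig_atomic_t samples[NUM_PHASES];

/* SIGPROF handler: attribute one sample to whatever phase is active. */
static void on_sigprof(int sig)
{
    (void)sig;
    int p = current_phase;
    if (p >= 0 && p < NUM_PHASES)
        samples[p]++;
}

/* Ask for SIGPROF every interval_usec microseconds of CPU time. */
static int start_sampling(long interval_usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval it;
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = interval_usec;
    it.it_value = it.it_interval;
    return setitimer(ITIMER_PROF, &it, NULL);
}

/* Disarm the profiling timer. */
static int stop_sampling(void)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);
    return setitimer(ITIMER_PROF, &it, NULL);
}

/* Spin for roughly cpu_ms milliseconds of CPU time while tagged as phase p. */
static void burn_cpu(enum phase p, long cpu_ms)
{
    assert(p < NUM_PHASES);
    current_phase = p;
    clock_t end = clock() + (clock_t)(cpu_ms * CLOCKS_PER_SEC / 1000);
    volatile unsigned long spins = 0;
    while (clock() < end)
        spins++;
}

/* Sample a two-phase workload: fills counts_out with per-phase sample
 * counts and returns the total, or -1 if the timer could not be armed. */
long sample_demo(long counts_out[NUM_PHASES])
{
    long total = 0;
    for (int i = 0; i < NUM_PHASES; i++)
        samples[i] = 0;
    if (start_sampling(1000) != 0) /* request one sample per ms of CPU time */
        return -1;
    burn_cpu(PHASE_SETUP, 50);
    burn_cpu(PHASE_WORK, 250);
    stop_sampling();
    for (int i = 0; i < NUM_PHASES; i++) {
        counts_out[i] = samples[i];
        total += counts_out[i];
    }
    return total;
}
```

The actual sampling granularity depends on the kernel's CPU-time accounting, so the per-phase counts are approximate; the point is the shape of the breakdown, not exact numbers.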
PC sampling via DR's -prof_pcs runtime option is also available internally, to varying degrees on different platforms, but it is not polished and is missing some pieces (see the bottom of this page).
External sampling tools
Perf and oprofile are the two most prominent sampling profilers on Linux today. Perf is newer and has a nicer interface, but it must be patched and built from source to get symbols for DR. oprofile is typically available on older distros, but it's not available on Ubuntu Precise, it seems to cause system lockups, and we're not sure we trust its results.
Before taking measurements for any profile-driven micro-optimization, make sure to disable CPU frequency scaling:
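Assuming the standard Linux cpufreq sysfs interface, one way to confirm that frequency scaling is effectively off is to check that every CPU's scaling_governor reads performance. The helpers read_governor() and frequency_scaling_pinned() below are a hypothetical sketch; the path format is a parameter so it can point at the real sysfs files or at test files.

```c
#include <stdio.h>
#include <string.h>

/* Usual sysfs location of the cpufreq governor for CPU N on Linux. */
#define CPUFREQ_GOVERNOR_FMT "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor"

/* Read the governor name from path into buf, stripping the trailing newline.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int read_governor(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if CPUs 0..ncpus-1 all use the "performance" governor,
 * 0 if any CPU uses a different governor, -1 if cpufreq is unavailable. */
int frequency_scaling_pinned(int ncpus, const char *path_fmt)
{
    char path[256], gov[64];
    for (int cpu = 0; cpu < ncpus; cpu++) {
        snprintf(path, sizeof path, path_fmt, cpu);
        if (read_governor(path, gov, sizeof gov) != 0)
            return -1;
        if (strcmp(gov, "performance") != 0)
            return 0;
    }
    return 1;
}
```

For example, frequency_scaling_pinned(4, CPUFREQ_GOVERNOR_FMT) returning 1 means CPUs 0 through 3 all report the performance governor. Changing the governor itself needs root, e.g. by writing to those files or with cpupower frequency-set -g performance.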
oprofile
To install oprofile, type:
To make sudo opcontrol work without a password, run sudo visudo and add one line to the /etc/sudoers file:
To run oprofile, you can use a script like the following to start and stop it around the command you wish to run:
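The same start/run/stop pattern can also be sketched in C, with the profiler control commands passed in as argument vectors. run_and_wait() and profile_command() are hypothetical helpers, not part of oprofile; the main design point is that the stop command runs even when the profiled command fails.

```c
#define _POSIX_C_SOURCE 200809L /* for fork(), execvp(), waitpid() */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) and wait for it. Returns its exit status
 * (127 if exec failed), or -1 if fork/wait failed or it was killed. */
static int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Start the profiler, run the target, then always stop the profiler.
 * Returns the target's exit status, or -1 if the profiler failed to start. */
int profile_command(char *const start[], char *const target[],
                    char *const stop[])
{
    if (run_and_wait(start) != 0)
        return -1;
    int rc = run_and_wait(target);
    (void)run_and_wait(stop); /* stop even if the target failed */
    return rc;
}
```

For oprofile, start and stop would presumably be something like {"sudo", "opcontrol", "--start", NULL} and {"sudo", "opcontrol", "--stop", NULL}, followed by opreport; check opcontrol(1) for the exact setup options your version needs.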
Example report output:
perf
Perf currently does not handle symbols in DSOs that have a preferred base address, and support for following .gnu_debuglink was only added recently. Since profiling without symbols isn't very useful, the following instructions are for building perf from source with a patch I wrote to fix the problem.
The patch to get good symbols with perf is available here: https://github.com/rnk/linux/compare/perf-p_vaddr.diff
You can clone the entire branch, or you can apply the patch to some other copy of the Linux kernel source. Either way, cd into tools/perf and run 'make' to build just perf. It will warn you about each library or header that it can't find, so you can install the corresponding packages.
Running 'make install' as a normal user will install to $HOME/bin and $HOME/libexec.
Doing a run and getting a report is simple:
Example output:
The perf-NNN.map DSO corresponds to DR's code cache. As the output above shows, at the time of writing stub updating is a hotspot. To focus on just DR, pass "-d libdynamorio.3.2".
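As far as we understand perf's behavior, samples in executable memory with no backing file are reported under a pseudo-DSO named perf-<pid>.map, after the /tmp/perf-<pid>.map file that JIT compilers can write to name such code. DR's code cache lives in anonymous mappings, which is presumably why it shows up this way. The hypothetical helper is_anon_exec() below parses one line of /proc/<pid>/maps and reports whether it describes such an anonymous executable region.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps, e.g.
 *   "10000000-10100000 rwxp 00000000 00:00 0"
 * Stores the range in *start/*end and returns 1 if the mapping is
 * executable and has no pathname (anonymous), 0 otherwise. */
int is_anon_exec(const char *line, unsigned long *start, unsigned long *end)
{
    char perms[5], dev[16], path[256];
    unsigned long offset, inode;
    int fields = sscanf(line, "%lx-%lx %4s %lx %15s %lu %255s",
                        start, end, perms, &offset, dev, &inode, path);
    if (fields < 6)
        return 0; /* malformed line */
    /* Exactly six fields means no pathname, i.e. an anonymous mapping. */
    return fields == 6 && strlen(perms) == 4 && perms[2] == 'x';
}
```

Reading /proc/self/maps line by line with fgets() and passing each line to is_anon_exec() lists the address ranges that perf would lump together under perf-NNN.map for the current process.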
To get a combined source and asm annotation, you can use "perf annotate -s insert_exit_stub_other_flags". Example output:
97% of the samples in this function were on "add $0x5, %rax", which is misleading: sampling often attributes a stall to the instruction after the one that caused it, so the expensive instruction is more likely the preceding "xchgl %r14d, (%rdx)", which we use to atomically update the code cache. In this particular case we happen to be emitting the full exit stub, so this update probably does not need to be atomic.
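The trade-off above can be sketched with C11 atomics. patch_slot() is a hypothetical helper, not DR's code: when other threads may be executing the code being patched, it uses an atomic exchange, which on x86 typically compiles to an xchg with a memory operand and therefore carries an implicit lock; when the code was just emitted and nothing can reach it yet, a relaxed store (an ordinary mov on x86-64) is enough. Real code-cache patching has further constraints, such as alignment of the patched bytes, that this sketch ignores.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Replace the 32-bit value in a patchable slot and return the old value.
 * code_is_live: true if other threads might concurrently execute or read
 * the slot, which requires a single atomic read-modify-write. */
uint32_t patch_slot(_Atomic uint32_t *slot, uint32_t value, bool code_is_live)
{
    if (code_is_live)
        return atomic_exchange(slot, value); /* xchg on x86: implicitly locked */

    /* Freshly emitted, unreachable code: no other thread can observe the
     * slot yet, so a separate load and a relaxed store are enough. */
    uint32_t old = atomic_load_explicit(slot, memory_order_relaxed);
    atomic_store_explicit(slot, value, memory_order_relaxed);
    return old;
}
```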
Windows
We've used AMD CodeAnalyst successfully.
TODO: add more detail.
Cross-platform -prof_pcs
There are many open issues about cleaning this up, such as issues 140, 359, and 767. On Linux, the dr_set_itimer() approach described above provides programmatic support instead.