We provide several examples in the samples directory of the release package to illustrate how the DynamoRIO API is used to build a DynamoRIO client.
There are also samples for the Dr. Memory Framework located in the drmemory/drmf/samples directory of the release package.
For larger examples of clients, see the provided Available Tools, which are more substantial and more polished end-user clients than these samples.
The sample bbbuf.c demonstrates how to use a TLS field for per-thread basic block profiling.
The sample bbcount.c illustrates how to perform performant instrumentation for reporting the dynamic execution count of all basic blocks.
The sample bbsize.c collects statistics on the sizes of all basic blocks in the target application.
The sample cbr.c collects conditional branch execution information and shows how to dynamically update or replace instrumented code after it executes.
The sample cbrtrace.c collects conditional branch execution traces and writes them into files.
The sample countcalls.c reports the dynamic execution count for direct calls, indirect calls, and returns in the target application. It illustrates how to perform performant inline increments and use per-thread data structures.
The sample div.c demonstrates profiling the types of values fed to a particular opcode.
The sample empty.c is provided as an example client that does nothing.
The sample inc2add.c performs a dynamic optimization: it converts the "inc" instruction to "add 1" without perturbing the target application's behavior.
The sample inline.c performs an optimization that uses the custom trace API to inline entire callees into traces.
The sample inscount.cpp reports the dynamic count of the total number of instructions executed by inserting performant clean calls that DynamoRIO automatically inlines. It also illustrates use of the DynamoRIO Option Parser.
The sample instrace_simple.c is provided as an example client that illustrates how to gather an instruction trace in a simple, cross-platform way. It is much slower than instrace_x86_binary, however.
The sample instrace_x86.c is provided as an example client that illustrates how to create a private code cache and perform lean procedure calls to generate an instruction trace in a performant manner. It is compiled into two different libraries: instrace_x86_text, which produces a readable, text-mode trace, and instrace_x86_binary, which produces a binary-format trace file and is much faster.
The sample instrcalls.c demonstrates how to instrument direct calls, indirect calls and returns.
The sample memtrace_simple.c is provided as an example client that illustrates how to gather a memory trace in a simple, cross-platform way. It is much slower than memtrace_x86_binary, however.
The sample memtrace_x86.c is provided as an example client that illustrates how to create a private code cache and perform lean procedure calls to generate a memory trace. It is compiled into two different libraries: memtrace_x86_text, which produces a readable, text-mode trace, and memtrace_x86_binary, which produces a binary-format trace file and is much faster.
The sample hot_bbcount.c uses the drbbdup extension to count the execution of hot basic blocks which have exceeded a given hit threshold.
The sample opcode_count.cpp takes an opcode as input and uses drmgr's instrumentation events to count the number of times instructions with that opcode are executed.
The sample modxfer.c reports the control flow transfers between modules.
The sample modxfer_app2lib.c reports the control flow transfers between the application executable and other dynamic libraries and modules. It illustrates how to perform performant clean calls on different modules.
The sample opcodes.c computes dynamic execution counts broken down by instruction opcode.
The sample prefetch.c demonstrates modifying the dynamic code stream for compatibility between different processor types.
The sample signal.c demonstrates how to use the signal event.
The sample stl_test.c is provided as an example client that uses C++ STL containers.
The sample syscall.c demonstrates how to use the system call events and API routines.
The sample tracedump.c is provided as a standalone application that disassembles a trace dump in binary format produced by the -tracedump_binary option.
The sample ssljack.c demonstrates how to hook OpenSSL and GnuTLS functions, using the drwrap extension.
We now illustrate how to use the above API to implement a simple instrumentation client for counting the number of executed call and return instructions in the input program. Full code for this example is in the file countcalls.c.
The client maintains a set of three counters, num_direct_calls, num_indirect_calls, and num_returns, to count three different types of instructions during execution. It keeps both thread-private and global versions of these counters. The client initializes everything in its dr_client_main() routine, which registers the event callbacks described next.
The client provides an event_exit routine that displays the final values of the global counters as well as a thread_exit routine that shows the counter totals on a per-thread basis.
The client keeps track of each thread's instruction counts separately. To do this, it defines a data structure holding the three counters, which is allocated separately for each thread.
The thread initialization hook sets up this data structure, and the thread exit hook displays the thread-private totals.
The real work is done in the basic block hook. We simply look for the instructions we're interested in and insert an increment of the appropriate thread-local and global counters, remembering to save the flags, of course. This sample has separate paths for incrementing the thread private counts for shared vs. thread-private caches (see the -thread_private option) to illustrate the differences in targeting for them. Note that the shared path would work fine with private caches.
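The inserted increments live in the code cache and cannot be reproduced outside DynamoRIO, but the bookkeeping pattern can. The following is a plain-C model only. It assumes POSIX threads stand in for application threads, and that a direct call to the hypothetical count_event() stands in for each executed call or return. The global counters are C11 atomics to keep the model race-free; that is a choice of this sketch, not a description of countcalls.c.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Plain-C model of countcalls.c's counters; no DynamoRIO APIs are used.
 * Executed instructions are simulated by calling count_event() directly. */
typedef enum { EV_DIRECT_CALL, EV_INDIRECT_CALL, EV_RETURN } event_kind_t;

/* Per-thread counters, analogous to the structure allocated per thread. */
typedef struct {
    unsigned long num_direct_calls;
    unsigned long num_indirect_calls;
    unsigned long num_returns;
} per_thread_t;

/* Global counters shared by all threads; atomics keep this model race-free. */
static atomic_ulong global_direct_calls;
static atomic_ulong global_indirect_calls;
static atomic_ulong global_returns;

/* Stands in for the inserted increments: bump both the thread-local and the
 * global counter for one executed instruction of the given kind. */
static void count_event(per_thread_t *data, event_kind_t kind)
{
    switch (kind) {
    case EV_DIRECT_CALL:
        data->num_direct_calls++;
        atomic_fetch_add(&global_direct_calls, 1);
        break;
    case EV_INDIRECT_CALL:
        data->num_indirect_calls++;
        atomic_fetch_add(&global_indirect_calls, 1);
        break;
    case EV_RETURN:
        data->num_returns++;
        atomic_fetch_add(&global_returns, 1);
        break;
    }
}

/* A simulated application thread: its data starts zeroed ("thread init")
 * and its totals are printed when it finishes ("thread exit"). */
static void *worker(void *arg)
{
    int iterations = *(const int *)arg;
    per_thread_t data = { 0 };
    for (int i = 0; i < iterations; i++) {
        count_event(&data, EV_DIRECT_CALL);
        count_event(&data, EV_RETURN);
        if (i % 2 == 0) { /* every other iteration also makes an indirect call */
            count_event(&data, EV_INDIRECT_CALL);
            count_event(&data, EV_RETURN);
        }
    }
    printf("thread totals: %lu direct, %lu indirect, %lu returns\n",
           data.num_direct_calls, data.num_indirect_calls, data.num_returns);
    return NULL;
}

/* Runs up to 16 simulated threads and waits for all of them to finish. */
static void run_threads(int nthreads, int iterations)
{
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, worker, &iterations);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
}
```

With run_threads(4, 10), each thread reports 10 direct calls, 5 indirect calls, and 15 returns. The global counters end at 40, 20, and 60, the sums of the per-thread totals.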
- Building the Example
For general instructions on building a client, see Building a Client.
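To build the instrcalls.c client using CMake, if DYNAMORIO_HOME is set to the base of the DynamoRIO release package, create and enter a build directory, run cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples, and then run make instrcalls.
To build 32-bit samples when using gcc with a default of 64-bit, set CFLAGS=-m32 (and CXXFLAGS=-m32 for the C++ samples) in the environment the first time you run cmake in the build directory.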
The result is a shared library instrcalls.dll or libinstrcalls.so. To invoke the client library, follow the instructions under Deployment.
The next example shows how to use the provided control flow instrumentation routines, which allow more sophisticated profiling than simply counting instructions. Full code for this example is in the file instrcalls.c.
As in the previous example, the client is interested in direct and indirect calls and returns. The client wants to analyze the target address of each dynamic instance of a call or return. For our example, we simply dump the data in text format to a separate file for each thread. Since FILE cannot be exported from a DLL on Windows, we use the DynamoRIO-provided file_t type that hides the distinction between FILE and HANDLE to allow the same code to work on Linux and Windows. We make use of the thread initialization and exit routines to open and close the file. We store the file for a thread in the user slot in the drcontext.
The basic block hook inserts a call to a procedure for each type of instruction, using the API-provided dr_insert_call_instrumentation and dr_insert_mbr_instrumentation routines, which insert calls to procedures with a certain signature.
Each of these procedures receives both the address of the instruction and the address of its target. These routines could perform some sort of analysis based on these addresses. In our example we simply print out the data.
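To make the callback shape concrete, here is a standalone C sketch of the logging side only. It assumes a hypothetical pc_t in place of DynamoRIO's app_pc and passes the per-thread FILE * explicitly. In the actual client the file is a file_t fetched from the drcontext's user slot, and the callbacks receive only the two addresses. The callback names here are illustrative.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for DynamoRIO's app_pc in this standalone model. */
typedef uintptr_t pc_t;

/* Formats one control-transfer record into buf; returns snprintf's result. */
static int format_transfer(char *buf, size_t len, const char *kind,
                           pc_t instr_addr, pc_t target_addr)
{
    return snprintf(buf, len, "%s @ 0x%" PRIxPTR " to 0x%" PRIxPTR "\n",
                    kind, instr_addr, target_addr);
}

/* Writes one record to the calling thread's log. In the real client the
 * file comes from the drcontext; here it is passed in explicitly. */
static void log_transfer(FILE *log, const char *kind, pc_t instr_addr, pc_t target_addr)
{
    char line[128];
    format_transfer(line, sizeof line, kind, instr_addr, target_addr);
    fputs(line, log);
}

/* Callbacks taking (instruction address, target address), plus the log. */
static void at_call(FILE *log, pc_t instr_addr, pc_t target_addr)
{
    log_transfer(log, "CALL", instr_addr, target_addr);
}

static void at_call_ind(FILE *log, pc_t instr_addr, pc_t target_addr)
{
    log_transfer(log, "CALL INDIRECT", instr_addr, target_addr);
}

static void at_return(FILE *log, pc_t instr_addr, pc_t target_addr)
{
    log_transfer(log, "RETURN", instr_addr, target_addr);
}
```

Each thread gets one line per dynamic call or return. An analysis client would replace the formatting with whatever it wants to compute from the address pair.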
In this example, we show how to update or replace existing instrumentation after it executes. This ability is useful for clients performing adaptive optimization. Here, however, we are interested in recording the direction of all conditional branches, but wish to remove the overhead of instrumentation once we've gathered that information. This code could form part of a dynamic CFG builder, where we want to observe the control-flow edges that execute at runtime but remove the instrumentation once it has served its purpose.
While DynamoRIO supports direct fragment replacement, another method for re-instrumentation is to flush the fragment from the code cache and rebuild it in the basic block event callback. In other words, we take the following approach:
- In the basic block event callback, insert separate instrumentation for the taken and fall-through edges.
- When the basic block executes, note the direction taken and flush the fragment from the code cache.
- When the basic block event triggers again, insert instrumentation only for the unseen edge. After both edges have triggered, remove all instrumentation for the cbr.
We insert separate clean calls for the taken and fall-through cases. In each clean call, we record the observed direction and immediately flush the basic block using dr_flush_region(). Since that routine removes the calling block, we redirect execution to the target or fall-through address with dr_redirect_execution(). The file cbr.c contains the full code for this sample.
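The three steps above amount to a small per-branch state machine. This plain-C model captures only its decision logic. The names cbr_state_t, plan_instrumentation(), record_direction(), and fully_observed() are hypothetical and are not taken from cbr.c. In the real client, a true result from the recording step is the point at which the clean call flushes the block with dr_flush_region() and resumes execution with dr_redirect_execution().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-branch record: which edges of one conditional branch
 * have been observed so far. */
typedef struct {
    uintptr_t branch_pc; /* address of the cbr (model only) */
    bool taken_seen;
    bool fallthrough_seen;
} cbr_state_t;

/* Which clean calls the basic block event would insert for this cbr. */
typedef struct {
    bool instrument_taken;
    bool instrument_fallthrough;
} cbr_plan_t;

/* Building (or rebuilding) the block: instrument only unseen edges. */
static cbr_plan_t plan_instrumentation(const cbr_state_t *s)
{
    cbr_plan_t plan = { !s->taken_seen, !s->fallthrough_seen };
    return plan;
}

/* Executing the block: record the direction. Returns true when the current
 * instrumentation has become stale and the block should be flushed. */
static bool record_direction(cbr_state_t *s, bool taken)
{
    bool *seen = taken ? &s->taken_seen : &s->fallthrough_seen;
    if (*seen)
        return false; /* nothing new: keep the existing code */
    *seen = true;
    return true;
}

/* True once both edges are known and the cbr needs no instrumentation. */
static bool fully_observed(const cbr_state_t *s)
{
    return s->taken_seen && s->fallthrough_seen;
}
```

Each flush rebuilds the block with strictly less instrumentation, so a given branch is flushed at most twice.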
For the next example we consider a client that performs a simple optimization. The optimizer replaces every increment/decrement operation with a corresponding add/subtract operation when running on a Pentium 4, where the add/subtract is less expensive. For optimizations, we are less concerned with covering all the code that is executed; on the contrary, in order to amortize the optimization overhead, we only want to apply the optimization to hot code. Thus, we apply the optimization at the trace level rather than the basic block level. Full code for this example is in the file inc2add.c.
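The paragraph does not spell out how the rewrite avoids perturbing the application. One constraint any such pass must respect comes from x86 semantics: add and sub update the carry flag (CF), while inc and dec leave it unchanged. The following plain-C model works on a hypothetical toy instruction list, not DynamoRIO's instr_t API, and omits the processor-family check. It replaces an inc or dec only when CF is overwritten before anything reads it, and treats leaving the trace as a possible read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set for this model only; not DynamoRIO's opcodes. */
typedef enum {
    TOY_INC, TOY_DEC, TOY_ADD_IMM, TOY_SUB_IMM,
    TOY_ADC, TOY_CMP, TOY_MOV, TOY_EXIT_JMP
} toy_op_t;

typedef struct {
    toy_op_t op;
    int imm;
} toy_instr_t;

/* Simplified flag effects: ADC reads CF (and also writes it). */
static bool toy_reads_cf(toy_op_t op) { return op == TOY_ADC; }

static bool toy_writes_cf(toy_op_t op)
{
    return op == TOY_ADD_IMM || op == TOY_SUB_IMM || op == TOY_ADC || op == TOY_CMP;
}

/* add/sub write CF but inc/dec do not, so the rewrite is safe only if CF is
 * overwritten before anything reads it. An exit jump or the end of the
 * trace is treated conservatively as a possible read. */
static bool cf_dead_after(const toy_instr_t *trace, size_t n, size_t i)
{
    for (size_t j = i + 1; j < n; j++) {
        if (toy_reads_cf(trace[j].op) || trace[j].op == TOY_EXIT_JMP)
            return false;
        if (toy_writes_cf(trace[j].op))
            return true;
    }
    return false;
}

/* Rewrites inc -> add 1 and dec -> sub 1 where safe; returns the count. */
static int inc2add(toy_instr_t *trace, size_t n)
{
    int replaced = 0;
    for (size_t i = 0; i < n; i++) {
        toy_op_t op = trace[i].op;
        if ((op != TOY_INC && op != TOY_DEC) || !cf_dead_after(trace, n, i))
            continue;
        trace[i].op = (op == TOY_INC) ? TOY_ADD_IMM : TOY_SUB_IMM;
        trace[i].imm = 1;
        replaced++;
    }
    return replaced;
}
```

For example, an inc followed by a cmp is rewritten because cmp overwrites CF, while an inc followed by adc is left alone because adc consumes the CF value the application expects.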
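This example demonstrates the custom tracing interface. It changes DynamoRIO's tracing behavior to favor making traces that start at a call and end right after a return. It demonstrates the use of both custom trace API elements: dr_mark_trace_head(), which marks a basic block as a trace head, and the end-trace event registered with dr_register_end_trace_event(), which decides when a trace should end.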
Full code for this example is in the file inline.c.
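The decision behind the end-trace event can be modeled in plain C. This sketch is hypothetical throughout. The model_action_t values only mirror the three kinds of answers an end-trace callback can give: let DynamoRIO decide, end now, or continue. The block classification is simplified, and MAX_TRACE_BLOCKS is an arbitrary cap. Unlike the real event, which is consulted about the next block before it is added, the model decides after each block is appended. It continues through nested calls and ends the trace right after the return from the callee at which the trace started.

```c
#include <stdbool.h>

/* Mirrors the three possible answers of an end-trace callback (model only). */
typedef enum { MODEL_DR_DECIDES, MODEL_END_NOW, MODEL_CONTINUE } model_action_t;

/* Simplified classification of the block just added to the trace. */
typedef enum { BLOCK_PLAIN, BLOCK_ENDS_IN_CALL, BLOCK_ENDS_IN_RET } block_kind_t;

/* Hypothetical cap so a callee that never returns cannot grow a trace forever. */
#define MAX_TRACE_BLOCKS 32

typedef struct {
    bool started_at_call; /* the trace head was a call target we marked */
    int call_depth;       /* calls nested inside the inlined callee */
    int num_blocks;       /* blocks appended so far */
} trace_state_t;

/* Called as each block is appended: should the trace end after it? */
static model_action_t end_trace_decision(trace_state_t *t, block_kind_t kind)
{
    if (!t->started_at_call)
        return MODEL_DR_DECIDES; /* not one of ours: use default heuristics */
    t->num_blocks++;
    if (t->num_blocks >= MAX_TRACE_BLOCKS)
        return MODEL_END_NOW;
    if (kind == BLOCK_ENDS_IN_CALL) {
        t->call_depth++;
    } else if (kind == BLOCK_ENDS_IN_RET) {
        if (t->call_depth == 0)
            return MODEL_END_NOW; /* returned from the inlined callee */
        t->call_depth--;
    }
    return MODEL_CONTINUE;
}
```

Tracking call depth is what lets the trace span a whole callee, including any calls it makes, rather than stopping at the first return it sees.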
Because saving the x87 floating point state is very expensive, on x86 DynamoRIO seeks to do so on an as-needed basis. If a client wishes to use floating point operations and is unsure whether its compiler will use x87 or not, or if it wishes to use MMX registers, it must save and restore the application's floating point state around the usage. For an inserted clean call out of the code cache, this can be conveniently done using dr_insert_clean_call() and passing true for the save_fpstate parameter. It can also be done explicitly using the proc_save_fpstate() and proc_restore_fpstate() routines.
These routines must be used if x87 floating point operations are performed in non-inserted-call locations, such as event callbacks. Note that there are restrictions on how these methods may be called: see the documentation in the header files for additional information. Note also that the floating point state must be saved around calls to our provided printing routines when they are used to print floats. However, it is not necessary to save and restore the floating point state around floating point operations if they are being used in the initialization or termination routines.
On ARM and AArch64 the SIMD/FP registers are always saved, so proc_save_fpstate and proc_restore_fpstate are no-ops. On x86, modern compilers typically do not use x87 operations, but to be safe clients are still advised to either avoid floating-point operations or use the preservation routines listed here.
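When calling the save and restore routines directly, the state buffer must be suitably sized and aligned. On x86 the fxsave area is 512 bytes and must be 16-byte aligned; DynamoRIO's header documentation gives the exact requirements, including proc_fpstate_save_size(). A common idiom is to over-allocate a stack buffer and round the pointer up. The helper below shows only that pointer arithmetic in plain C. align_forward, FP_STATE_BYTES, and FP_ALIGN are hypothetical names, and the DynamoRIO calls appear only in a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size for this model; a real client would size the buffer
 * with proc_fpstate_save_size(). 512 bytes matches the x86 fxsave area. */
#define FP_STATE_BYTES 512
#define FP_ALIGN 16

/* Rounds p up to the next multiple of align, which must be a power of two. */
static unsigned char *align_forward(unsigned char *p, size_t align)
{
    uintptr_t v = (uintptr_t)p;
    return (unsigned char *)((v + align - 1) & ~(uintptr_t)(align - 1));
}

/* Usage in a client (sketch, not compiled here):
 *   unsigned char fp_raw[FP_STATE_BYTES + FP_ALIGN];
 *   unsigned char *fp_align = align_forward(fp_raw, FP_ALIGN);
 *   proc_save_fpstate(fp_align);
 *   ... floating point work ...
 *   proc_restore_fpstate(fp_align);
 */
```

Rounding up skips at most FP_ALIGN - 1 bytes, so the extra FP_ALIGN bytes guarantee the aligned region still holds the full state.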
This example client counts the number of basic blocks processed and keeps statistics on their average size using floating point operations. Full code for this example is in the file bbsize.c.
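The statistics themselves are simple floating point bookkeeping. In this standalone sketch, bb_stats_t and bb_stats_add() are hypothetical names, and the incremental-mean formula is one reasonable choice, not necessarily what bbsize.c does. Each processed block updates a count, a running mean, and a maximum. Inside a client on x86, this update is the kind of floating point work that needs the save and restore routines around it when it runs in the basic block event. A shared structure like this would also need a lock, because the basic block event can run in several threads at once.

```c
#include <stddef.h>

/* Running statistics over basic block sizes, updated as each block is seen. */
typedef struct {
    size_t num_bb;   /* blocks processed */
    double ave_size; /* mean instructions per block */
    size_t max_size; /* largest block seen */
} bb_stats_t;

/* Incremental mean: updates the average without storing a separate sum. */
static void bb_stats_add(bb_stats_t *s, size_t num_instrs)
{
    s->num_bb++;
    s->ave_size += ((double)num_instrs - s->ave_size) / (double)s->num_bb;
    if (num_instrs > s->max_size)
        s->max_size = num_instrs;
}
```

Feeding blocks of 2, 4, and 6 instructions leaves num_bb at 3, ave_size at 4.0, and max_size at 6.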
The new Windows GUI will display custom client statistics if they are placed in shared memory with a certain name. The sample stats.c implements this protocol in a sample client that counts total instructions, floating-point instructions, and system calls.
Note that the stats.c example client and the Windows GUI must both be run within the same session in order for the statistics to be shared properly. They can be modified to use a "Global" prefix instead of "Local" for cross-session sharing, though this requires running with administrative privileges.