DynamoRIO
Symbol Access Library

The drsyms DynamoRIO Extension provides symbol information. Currently drsyms supports reading symbol information from Windows PDB files or Linux ELF, Mac Mach-O, or Windows PECOFF files with DWARF2 line information.

Setup

To use drsyms with your client, include this line in your client's CMakeLists.txt file:

use_DynamoRIO_extension(clientname drsyms)

This automatically sets up the include path and library dependency.

The drsyms library on Windows relies on Microsoft's dbghelp.dll library, and a copy of dbghelp.dll is included. You can also download the Debugging Tools for Windows package from http://www.microsoft.com/whdc/devtools/debugging/default.mspx and place dbghelp.dll in the same directory as either drsyms.dll or your client library. Version 6.0 or higher of dbghelp.dll is required, and 6.3 or higher is recommended. Recent versions of Windows ship suitable versions of dbghelp.dll in their system directories, but be aware that those copies are not redistributable.

The 64-bit version 6.5.0003.7 of dbghelp.dll (the one included with Visual Studio 2005) has a bug that can cause crashes. We recommend avoiding this version.

More recent versions of dbghelp.dll on Windows can use significant amounts of stack space; we have observed usage of 36KB. This fits within the default stack size for a DynamoRIO client, but if you trim the stack size via the DynamoRIO runtime option -stack_size, be aware that anything lower than 36KB is likely to be problematic on Windows.

On Windows, drsyms supports Cygwin and MinGW symbols and will find file and line information if it is in DWARF2 format. The stabs format is not supported. Cygwin and MinGW gcc versions prior to 4.3 use stabs by default; pass the -ggdb flag to request DWARF2.

The drsyms library on Linux and Mac uses bundled copies of libelf, libdwarf, and libelftc built from the elftoolchain project and requires no setup.

Search Paths

On Linux, drsyms looks for a library's symbols in the default debug directories: a .debug subdirectory of the library's directory, or /usr/lib/debug.
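When deciding where to place a separate debug file, the per-library .debug directory can be derived from the library's path. The helper below, debug_subdir_for(), is a hypothetical name used only for illustration. It assumes an absolute, '/'-separated path and does not address which file name drsyms expects inside that directory, which this page does not specify.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<directory of lib_path>/.debug" into buf.
 * The other location drsyms consults, /usr/lib/debug, is fixed.
 * Returns 0 on success, or -1 if lib_path has no directory component
 * or buf is too small for the result. */
static int
debug_subdir_for(const char *lib_path, char *buf, size_t buf_sz)
{
    const char *slash = strrchr(lib_path, '/');
    if (slash == NULL)
        return -1; /* bare file name: no library directory to derive */
    int dir_len = (int)(slash - lib_path);
    int n = snprintf(buf, buf_sz, "%.*s/.debug", dir_len, lib_path);
    return (n < 0 || (size_t)n >= buf_sz) ? -1 : 0;
}
```

For example, a library at /usr/lib/x86_64-linux-gnu/libfoo.so.1 maps to /usr/lib/x86_64-linux-gnu/.debug.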

On Mac, for a binary foo, drsyms looks for symbols in the default dsymutil path of foo.dSYM/Contents/Resources/DWARF/foo.
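By default, dsymutil writes foo.dSYM next to foo, so the path drsyms checks can be derived from the binary's own path. This is a minimal sketch under that assumption; dsym_path_for() is a hypothetical helper and assumes a '/'-separated path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<binary_path>.dSYM/Contents/Resources/DWARF/<basename>",
 * assuming the .dSYM bundle sits next to the binary (dsymutil's default).
 * Returns 0 on success, or -1 if buf is too small. */
static int
dsym_path_for(const char *binary_path, char *buf, size_t buf_sz)
{
    const char *slash = strrchr(binary_path, '/');
    const char *base = (slash == NULL) ? binary_path : slash + 1;
    int n = snprintf(buf, buf_sz, "%s.dSYM/Contents/Resources/DWARF/%s",
                     binary_path, base);
    return (n < 0 || (size_t)n >= buf_sz) ? -1 : 0;
}
```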

On Windows, drsyms honors the _NT_SYMBOL_PATH environment variable as a local cache of pdb files. However, drsyms does not support symbol store paths (those that contain srv) when used inside a DynamoRIO client. Such paths should work fine in standalone applications, provided that the Windows loader can locate both symsrv.dll and dbghelp.dll.

Exported Functions

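For clients interested only in locating specific functions exported from a library, drsyms is not necessary: the core DynamoRIO API provides functions for iterating modules and looking up module exports. The following core DynamoRIO API functions are relevant:

  • dr_lookup_module()
  • dr_get_proc_address()
  • dr_module_iterator_start()
  • dr_module_iterator_hasnext()
  • dr_module_iterator_next()
  • dr_module_iterator_stop()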

These core API routines can be more efficient to use than the drsyms routines, as the latter must locate, load, and parse debug information. However, the core API provides no mechanism for iterating over all exported functions: for that, drsyms must be used. The drsyms API will operate even when no debug information is available, in which case only exported functions will be considered.
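As a rough sketch of the core-API approach, the client code below walks the currently loaded modules and asks each one for a named export with dr_get_proc_address(), without loading any debug information. find_export() is a hypothetical name. Only modules already loaded when it runs are visited; a client that must also catch later loads would perform the same check in a module load event.

```c
#include "dr_api.h"

/* Hypothetical helper: reports every loaded module that exports export_name. */
static void
find_export(const char *export_name)
{
    dr_module_iterator_t *iter = dr_module_iterator_start();
    while (dr_module_iterator_hasnext(iter)) {
        module_data_t *mod = dr_module_iterator_next(iter);
        /* Looks only at the module's export table; no symbol files are read. */
        generic_func_t fn = dr_get_proc_address(mod->handle, export_name);
        if (fn != NULL) {
            const char *modname = dr_module_preferred_name(mod);
            dr_fprintf(STDERR, "%s exported by %s at " PFX "\n", export_name,
                       modname == NULL ? "<noname>" : modname, (ptr_uint_t)fn);
        }
        /* Each module_data_t returned by the iterator must be freed. */
        dr_free_module_data(mod);
    }
    dr_module_iterator_stop(iter);
}
```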

On Linux, drsyms searches both .dynsym and .symtab, while dr_get_proc_address() only searches .dynsym. Global symbols in an executable (i.e., a non-library) are only present in .symtab by default. If the executable is built with the -rdynamic flag to gcc, its global symbols are placed in .dynsym, and thus dr_get_proc_address() will find them. Regardless of how the executable was built, drsyms will find its global symbols as long as it is not stripped.

API

All functions return a success code of type drsym_error_t.

Prior to use, drsyms must be initialized by a call to drsym_init(). The drsyms API will eventually support both sideline and online use, and the parameter to drsym_init() will specify the symbol server to use for sideline use. Currently only online use is supported, so NULL should be passed.

Symbol lookup is supported in both directions: from an address to a symbol via drsym_lookup_address(), and from a symbol to an address via drsym_lookup_symbol(). All symbols in a given module can be enumerated via drsym_enumerate_symbols(). On Windows, however, when looking for a particular match where a full search is not required (i.e., the search targets only function symbols), drsym_search_symbols() is significantly faster and uses less memory than a full enumeration. In fact, drsym_search_symbols() is usually faster than drsym_lookup_symbol().
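Both drsym_enumerate_symbols() and drsym_search_symbols() report matches through a callback of type drsym_enumerate_cb, which receives the symbol name, its offset from the module base, and a user data pointer, and returns whether to continue. The sketch below is a callback that keeps the first match and then stops. first_match_t and record_first_match() are hypothetical names. It is written as plain C so that it compiles on its own, with <stdbool.h> standing in for the bool that dr_api.h normally provides. The drsyms call appears only in a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state for a search that only needs the first hit. */
typedef struct {
    bool found;
    size_t modoffs; /* offset from the module base, as drsyms reports it */
    char name[256];
} first_match_t;

/* Same shape as drsym_enumerate_cb: (name, modoffs, data) -> keep going? */
static bool
record_first_match(const char *name, size_t modoffs, void *data)
{
    first_match_t *m = (first_match_t *)data;
    m->found = true;
    m->modoffs = modoffs;
    strncpy(m->name, name, sizeof(m->name) - 1);
    m->name[sizeof(m->name) - 1] = '\0';
    return false; /* stop after the first match */
}

/* Inside a client (needs drsyms.h), a non-full search might look like:
 *   first_match_t m = { 0 };
 *   drsym_search_symbols(modpath, "malloc", false, record_first_match, &m);
 */
```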

For C++ applications, each routine that handles symbols accepts a flags argument that controls how or whether C++ symbols are demangled or undecorated. There are currently three demangling modes, plus a Windows-only modifier and a default alias:

  • DRSYM_LEAVE_MANGLED: Matches against or returns the mangled C++ symbol.
  • DRSYM_DEMANGLE: Matches against or returns a "short" demangled C++ symbol. Templates are collapsed to <> and parameters are omitted entirely, without parentheses.
  • DRSYM_DEMANGLE_FULL: Matches against or returns the fully demangled C++ symbol with both template arguments and parameter types.
  • DRSYM_DEMANGLE_PDB_TEMPLATES: Only applies to Windows PDB symbols. Does not affect matches, but returns symbols with parameters omitted without parentheses and templates fully expanded.
  • DRSYM_DEFAULT_FLAGS: This is equivalent to DRSYM_DEMANGLE.

On Windows, this functionality is reduced due to the limitations of dbghelp.dll. For all routines except drsym_demangle_symbol(), the only user-supplied flag that has any effect is DRSYM_DEMANGLE_PDB_TEMPLATES: all symbols returned or searched use default demangling, with DRSYM_DEMANGLE_PDB_TEMPLATES additionally applied if requested.
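The short and full demangling modes can be compared directly with drsym_demangle_symbol(). This sketch assumes the drsyms.h prototype size_t drsym_demangle_symbol(char *dst, size_t dst_sz, const char *mangled, uint flags) and treats a zero return as failure; consult the header for how truncation is reported. show_demangling() is a hypothetical name, and _ZN3foo3barEi (the Itanium-ABI encoding of foo::bar(int)) is only an illustration.

```c
#include "dr_api.h"
#include "drsyms.h"

/* Hypothetical helper: prints one mangled name under two demangling modes. */
static void
show_demangling(const char *mangled)
{
    char buf[256];
    const uint modes[] = { DRSYM_DEMANGLE, DRSYM_DEMANGLE_FULL };
    const char *labels[] = { "DRSYM_DEMANGLE", "DRSYM_DEMANGLE_FULL" };
    for (int i = 0; i < 2; i++) {
        size_t len = drsym_demangle_symbol(buf, sizeof(buf), mangled, modes[i]);
        if (len == 0) /* assumed failure indication */
            dr_fprintf(STDERR, "%s: could not demangle %s\n", labels[i], mangled);
        else
            dr_fprintf(STDERR, "%s: %s\n", labels[i], buf);
    }
}

/* e.g. show_demangling("_ZN3foo3barEi"); */
```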

For an example of usage, see the instrcalls sample client distributed with DynamoRIO.

When finished with the library, call drsym_exit().

Memory Usage

When running large applications, loading debug information for all of their modules can occupy a lot of memory, from hundreds of megabytes into the gigabyte range. Use the drsym_free_resources() routine when finished with a module to unload its debug information. Normally, a client will query for symbols in the module load event, and then won't need symbols again until perhaps a callstack walk later on. Thus, we recommend that a client call drsym_free_resources() at the end of its module load event. Due to fragmentation concerns, it is not easy for drsyms itself to perform internal garbage collection at any high frequency.
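The pattern recommended above (initialize once, query in the module load event, free that module's debug information before returning, and call drsym_exit() at shutdown) might be sketched as the client below. The target symbol "malloc" is a hypothetical choice. drsym_init() is passed 0, the NULL value the API section describes for online use. The sketch assumes the module's start field matches the base that drsyms offsets are relative to.

```c
#include "dr_api.h"
#include "drsyms.h"

static void
event_module_load(void *drcontext, const module_data_t *mod, bool loaded)
{
    (void)drcontext;
    (void)loaded;
    if (mod->full_path == NULL)
        return;
    size_t modoffs;
    /* "malloc" is a hypothetical target; look up whatever the client needs. */
    if (drsym_lookup_symbol(mod->full_path, "malloc", &modoffs,
                            DRSYM_DEFAULT_FLAGS) == DRSYM_SUCCESS) {
        app_pc addr = mod->start + modoffs; /* offset -> absolute address */
        dr_fprintf(STDERR, "malloc in %s at " PFX "\n", mod->full_path, addr);
    }
    /* Drop this module's debug information now rather than keeping it resident. */
    drsym_free_resources(mod->full_path);
}

static void
event_exit(void)
{
    dr_unregister_module_load_event(event_module_load);
    drsym_exit();
}

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    (void)id;
    (void)argc;
    (void)argv;
    /* Online use only: the parameter is NULL (0). */
    if (drsym_init(0) != DRSYM_SUCCESS) {
        dr_fprintf(STDERR, "drsyms failed to initialize\n");
        return;
    }
    dr_register_module_load_event(event_module_load);
    dr_register_exit_event(event_exit);
}
```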

Module Bases

All drsyms functions operate on offsets relative to a module base, rather than on absolute addresses for any given mapping. These are offsets from the in-memory mapping of the module. For Mach-O executables, the module base is considered to be placed after any __PAGEZERO segment. This is especially important for PIE executables on macOS, where the randomized shift lies between the __PAGEZERO segment and the __TEXT segment.
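Converting an absolute application address into the offset drsyms expects is a subtraction of the module base. The sketch below does this with dr_lookup_module() and passes the result to drsym_lookup_address(). It assumes that the start field reported by the core API matches the drsyms module base, including the __PAGEZERO adjustment on Mach-O, which is worth verifying on that platform. print_symbol_for() and MAX_SYM_NAME are hypothetical names.

```c
#include "dr_api.h"
#include "drsyms.h"

#define MAX_SYM_NAME 256 /* hypothetical buffer size */

/* Hypothetical helper: prints "pc = module!symbol+offset" for an absolute pc. */
static void
print_symbol_for(app_pc pc)
{
    module_data_t *mod = dr_lookup_module(pc);
    if (mod == NULL) {
        dr_fprintf(STDERR, PFX " is not inside a known module\n", pc);
        return;
    }
    char name[MAX_SYM_NAME];
    char file[MAXIMUM_PATH];
    drsym_info_t sym;
    sym.struct_size = sizeof(sym);
    sym.name = name;
    sym.name_size = sizeof(name);
    sym.file = file;
    sym.file_size = sizeof(file);
    /* drsyms takes an offset from the module base, not an absolute address. */
    size_t modoffs = (size_t)(pc - mod->start);
    drsym_error_t res =
        drsym_lookup_address(mod->full_path, modoffs, &sym, DRSYM_DEFAULT_FLAGS);
    /* Missing line information still leaves a valid symbol name. */
    if (res == DRSYM_SUCCESS || res == DRSYM_ERROR_LINE_NOT_AVAILABLE) {
        const char *modname = dr_module_preferred_name(mod);
        dr_fprintf(STDERR, PFX " = %s!%s+" PIFX "\n", pc,
                   modname == NULL ? "<noname>" : modname, sym.name,
                   (ptr_uint_t)(modoffs - sym.start_offs));
    } else {
        dr_fprintf(STDERR, PFX " has no symbol information\n", pc);
    }
    dr_free_module_data(mod);
}
```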