DynamoRIO
The drsyms DynamoRIO Extension provides symbol information. Currently drsyms supports reading symbol information from Windows PDB files, or from Linux ELF, Mac Mach-O, or Windows PECOFF files with DWARF2 line information.
Setup
To use drsyms with your client, simply include this line in your client's CMakeLists.txt file:
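Assuming DynamoRIO's standard CMake support and a client target named clientname (a placeholder for your own target name), the line is expected to take this form:

```cmake
use_DynamoRIO_extension(clientname drsyms)
```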
That will automatically set up the include path and library dependence.
The drsyms library on Windows relies on the dbghelp.dll library from Microsoft. A copy of the dbghelp.dll library is included. You can also download the Debugging Tools for Windows package from http://www.microsoft.com/whdc/devtools/debugging/default.mspx and place dbghelp.dll in the same directory as either drsyms.dll or your client library. Version 6.0 or higher of dbghelp.dll is required, and 6.3 or higher is recommended. Recent versions of Windows do have these versions of dbghelp.dll in their system directories, but be aware that those versions are not redistributable.
The 64-bit version 6.5.0003.7 of dbghelp.dll (the one included with Visual Studio 2005) has a bug that can cause crashes. We recommend avoiding this version.
More recent versions of dbghelp.dll on Windows can use significant amounts of stack space. We have observed usage of 36KB. This is within the default stack size for a DynamoRIO client, but if you trim the stack size via the DynamoRIO runtime option -stack_size, be aware that anything lower than 36KB is likely to be problematic on Windows.
On Windows, drsyms does support Cygwin and MinGW symbols, and will find file and line information if it is in DWARF2 format. The stabs format is not supported. Cygwin and MinGW gcc versions prior to 4.3 use stabs by default; pass the -ggdb flag to request DWARF2.
The drsyms library on Linux and Mac uses bundled copies of libelf, libdwarf, and libelftc built from the elftoolchain project and requires no setup.
Search Paths
On Linux, drsyms will look in the default debug directories for symbols for a library: either a .debug subdirectory of the library's directory or /usr/lib/debug.
On Mac, for binary foo, drsyms will look in the default dsymutil path of foo.dSYM/Contents/Resources/DWARF/foo for symbols.
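The Mac layout can be illustrated with a small helper that builds the expected path from a binary's path. This is only a sketch in plain C: the function name dsym_path_for is hypothetical and not part of drsyms, and it assumes the .dSYM bundle sits next to the binary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of drsyms): writes
 * "<binary_path>.dSYM/Contents/Resources/DWARF/<basename>" into out.
 * Returns 0 on success, or -1 if the result does not fit in out. */
static int
dsym_path_for(const char *binary_path, char *out, size_t out_size)
{
    const char *slash = strrchr(binary_path, '/');
    const char *base = (slash == NULL) ? binary_path : slash + 1;
    int n = snprintf(out, out_size, "%s.dSYM/Contents/Resources/DWARF/%s",
                     binary_path, base);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For a binary at /tmp/foo, the helper produces /tmp/foo.dSYM/Contents/Resources/DWARF/foo.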
On Windows, the _NT_SYMBOL_PATH environment variable is honored by drsyms as a local cache of pdb files. However, drsyms does not support symbol store paths (those that contain srv) when used inside of a DynamoRIO client. Such paths should work fine when used in standalone applications, provided that both symsrv.dll and dbghelp.dll are locatable by the Windows loader.
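A client launcher might want to warn when _NT_SYMBOL_PATH contains a symbol store element. The sketch below is plain C and rests on an assumption about Microsoft's symbol path syntax: elements are separated by semicolons, and symbol store elements begin with srv* or symsrv*. Both helper names are hypothetical and not part of drsyms.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test of whether s begins with prefix. */
static int
starts_with_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if any ';'-separated element of the
 * symbol path begins with "srv*" or "symsrv*" (assumed store syntax). */
static int
has_symbol_store_element(const char *sympath)
{
    const char *elem = sympath;
    while (elem != NULL && *elem != '\0') {
        if (starts_with_nocase(elem, "srv*") ||
            starts_with_nocase(elem, "symsrv*"))
            return 1;
        elem = strchr(elem, ';');
        if (elem != NULL)
            elem++; /* Step past the separator to the next element. */
    }
    return 0;
}
```

The value to check would typically come from getenv("_NT_SYMBOL_PATH"). Matching on element prefixes rather than on any occurrence of srv avoids flagging ordinary directories whose names happen to contain those letters.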
Exported Functions
For clients interested only in locating specific functions exported from a library, it is not necessary to use drsyms, as the core DynamoRIO API provides functions for iterating modules and looking up module exports. The following core DynamoRIO API functions are relevant:
- dr_get_proc_address()
- dr_get_application_name()
- dr_register_module_load_event()
- dr_lookup_module()
- dr_lookup_module_by_name()
- dr_module_iterator_start()
These core API routines can be more efficient to use than the drsyms routines, as the latter must locate, load, and parse debug information. However, the core API provides no mechanism for iterating over all exported functions: for that, drsyms must be used. The drsyms API will operate even when no debug information is available, in which case only exported functions will be considered.
On Linux, drsyms searches both .dynsym and .symtab, while dr_get_proc_address() only searches .dynsym. Global symbols in an executable (i.e., a non-library) are only present in .symtab by default. If the executable is built with the -rdynamic flag to gcc, its global symbols will be placed in .dynsym and thus dr_get_proc_address() will find them. Regardless of how it was built, if it's not stripped, drsyms will find the global symbols.
API
All functions return a success code of type drsym_error_t.
Prior to use, drsyms must be initialized by a call to drsym_init(). The drsyms API will eventually support both sideline and online use, and the parameter to drsym_init() will specify the symbol server to use for sideline use. Today only online use is supported and NULL should be passed.
Symbol lookup is supported in both directions: from an address to a symbol via drsym_lookup_address(), and from a symbol to an address via drsym_lookup_symbol(). All symbols in a given module can be enumerated via drsym_enumerate_symbols(). On Windows, however, using drsym_search_symbols() for a particular match where a full search is not required (i.e., the search is only targeting function symbols) is significantly faster and uses less memory than a full enumeration. In fact, drsym_search_symbols() is usually faster than drsym_lookup_symbol().
For C++ applications, each routine that handles symbols accepts a flags argument that controls how or whether C++ symbols are demangled or undecorated. The following flags are currently defined:
- DRSYM_LEAVE_MANGLED: Matches against or returns the mangled C++ symbol.
- DRSYM_DEMANGLE: Matches against or returns a "short" demangled C++ symbol. Templates are collapsed to <>, while parameters are omitted entirely, without parentheses.
- DRSYM_DEMANGLE_FULL: Matches against or returns the fully demangled C++ symbol with both template arguments and parameter types.
- DRSYM_DEMANGLE_PDB_TEMPLATES: Only applies to Windows PDB symbols. Does not affect matches, but returns symbols with parameters omitted without parentheses and templates fully expanded.
- DRSYM_DEFAULT_FLAGS: This is equivalent to DRSYM_DEMANGLE.
On Windows, this functionality is reduced due to the limitations of dbghelp.dll. For all routines except drsym_demangle_symbol(), the only flag that the user passes that has any effect is DRSYM_DEMANGLE_PDB_TEMPLATES: all symbols returned or searched either use default demangling or additionally use DRSYM_DEMANGLE_PDB_TEMPLATES if requested.
For an example of usage see the instrcalls sample client distributed with DynamoRIO.
When finished with the library, call drsym_exit().
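The lifecycle described above (initialize, look up, exit) can be sketched as a minimal client in the style of the instrcalls sample. This is an illustration rather than a complete tool: it needs the DynamoRIO SDK headers to build, the helper name print_symbol_for and the buffer size MAX_SYM_RESULT are choices made for this example, and error handling is kept to a minimum.

```c
#include <stddef.h>
#include "dr_api.h"
#include "drsyms.h"

#define MAX_SYM_RESULT 256 /* Illustrative symbol-name buffer size. */

/* Illustrative helper: prints "symbol+offset" for an application address. */
static void
print_symbol_for(app_pc addr)
{
    char name[MAX_SYM_RESULT];
    char file[MAXIMUM_PATH];
    drsym_info_t sym;
    drsym_error_t res;
    module_data_t *mod = dr_lookup_module(addr);
    if (mod == NULL)
        return;
    sym.struct_size = sizeof(sym);
    sym.name = name;
    sym.name_size = sizeof(name);
    sym.file = file;
    sym.file_size = sizeof(file);
    /* drsyms expects an offset from the module base, not an absolute address. */
    res = drsym_lookup_address(mod->full_path, addr - mod->start, &sym,
                               DRSYM_DEFAULT_FLAGS);
    /* A symbol can be found even when line information is unavailable. */
    if (res == DRSYM_SUCCESS || res == DRSYM_ERROR_LINE_NOT_AVAILABLE) {
        dr_fprintf(STDERR, "%s+" PIFX "\n", sym.name,
                   (ptr_uint_t)(addr - mod->start - sym.start_offs));
    }
    dr_free_module_data(mod);
}

static void
event_exit(void)
{
    drsym_exit();
}

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    /* 0 is the null value for the drsym_init() parameter (online use). */
    if (drsym_init(0) != DRSYM_SUCCESS)
        dr_fprintf(STDERR, "drsym_init failed\n");
    dr_register_exit_event(event_exit);
}
```

A real client would call print_symbol_for() from its instrumentation or callstack-walking code; the instrcalls sample shows one such arrangement.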
Memory Usage
When running large applications, loading debug information for all of their modules can occupy a lot of memory, from hundreds of megabytes into the gigabyte range. Use the drsym_free_resources() routine when finished with a module to unload its debug information. Normally, a client will query for symbols in the module load event, and then won't need symbols again until perhaps a callstack walk later on. Thus, we recommend that a client call drsym_free_resources() at the end of its module load event. Due to fragmentation concerns, it is not easy for drsyms itself to perform internal garbage collection at any high frequency.
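The recommended pattern can be sketched as a module load event that performs its queries and then releases the module's debug information. This sketch needs the DynamoRIO SDK headers to build, and it assumes drsym_init() has already been called. The symbol name "malloc" is only an illustrative target.

```c
#include <stddef.h>
#include "dr_api.h"
#include "drsyms.h"

/* Module load event: query the symbols of interest, then unload the
 * module's debug information to keep memory usage down.
 * Register it with dr_register_module_load_event(event_module_load). */
static void
event_module_load(void *drcontext, const module_data_t *mod, bool loaded)
{
    size_t offs;
    /* "malloc" is an illustrative symbol name, not a requirement. */
    if (drsym_lookup_symbol(mod->full_path, "malloc", &offs,
                            DRSYM_DEFAULT_FLAGS) == DRSYM_SUCCESS) {
        /* The result is an offset from the module base. */
        dr_fprintf(STDERR, "malloc found at " PFX "\n", mod->start + offs);
    }
    /* Done with this module's symbols for now: free them. */
    drsym_free_resources(mod->full_path);
}
```

If symbols are needed again later, for instance during a callstack walk, drsyms will have to load the debug information again, so this trades lookup latency for a smaller memory footprint.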
Module Bases
All drsyms functions operate on relative offsets from a module base, rather than absolute addresses for any given mapping. These are offsets from the in-memory mapping of the module. For Mach-O executables, the module base is considered to be placed after any __PAGEZERO segment. This is especially important for PIE executables on macOS, where the randomized shift is between the __PAGEZERO segment and the __TEXT segment.
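The conversion from an absolute address to a module-relative offset is a simple subtraction, shown here as a plain C sketch. The helper name module_offset is hypothetical. In a client, the base and end values would come from a module's start and end fields as returned by dr_lookup_module().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts an absolute address into an offset from
 * the module base. Returns 0 and stores the offset in *offs when addr
 * lies in [base, end); returns -1 otherwise. */
static int
module_offset(uintptr_t addr, uintptr_t base, uintptr_t end, size_t *offs)
{
    if (addr < base || addr >= end)
        return -1;
    *offs = (size_t)(addr - base);
    return 0;
}
```

For a module mapped at 0x400000, the address 0x401234 corresponds to offset 0x1234, which is the value to pass to drsym_lookup_address().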