DynamoRIO
|
The overall goal is to make the source code as readable and as self-documenting as possible. Everyone following the same style guidelines is an important part of keeping the code consistent and maintainable.
Automated Formatting
We employ automated code formatting via clang-format
version 14.0. The .clang-format
top-level file specifies the style rules for all .h
, .c
, and .cpp
source code files. Developers are expected to set up their editors to use clang-format
when saving each file (see the clang-format
documentation for pointers to vim, emacs, and Visual Studio setup instructions). Our test suite includes a format check and will fail any code whose formatting does not match the clang-format
output.
Legacy Code
Some of the style conventions have changed over time, but we have not wanted to incur the cost in time and history confusion of changing all old code. Thus, you may observe old code that does not comply with some of these conventions. These listed conventions overrule surrounding code! Please change the style of old code when you are making other changes to the same lines.
Naming Conventions
- Header files consisting of exported interface content that is also used internally are named with an
_api.h
suffix: e.g.,encode_api.h
. If the content is solely used externally, it should be named the same as its exported name: e.g.,dr_inject.h
. For the former case, where the exported name is different, the include guards should use the exported name. Variable and function names use only lowercase letters. Multi-word function and variable names are all lowercase with underscores delimiting words. Do not use CamelCase for names, unless mirroring Windows-defined data structures.
GOOD:
instr_get_target()
BAD:
instrGetTarget()
- Type names are all lowercase, with underscores dividing words, and ending in
_t
:build_bb_t
This is true for C++ class names as well.
- The name of a struct in a C typedef should be the type name with an underscore prefixed: typedef struct _build_bb_t {...} build_bb_t;
- Constants should be in all capital letters, with underscores dividing words. Enum members should use a common descriptive prefix. static const int MAX_SIZE = 256;enum {DUMPCORE_DEADLOCK = 0x0004,DUMPCORE_ASSERTION = 0x0008,};
- For C++ code, class member fields are named like other variables except they contain an underscore suffix. Structs that contain methods other than constructors also have this suffix (but usually that should be a class); simple structs with no method other than a constructor do not have this suffix, nor do constants if they follow the constant naming conventions. class apple_t {...int num_seeds_;};struct truck_t {int num_wheels;};
- Preprocessor defines and macros should be in all capital letters, with underscores dividing words. #ifdef WINDOWS# define IF_WINDOWS(x) x#else# define IF_WINDOWS(x)#endif
- Preprocessor defines that include a leading or trailing comma should have a corresponding leading or trailing underscore: #define _IF_WINDOWS(x) , x#define IF_WINDOWS_(x) x,
- Functions that operate on a data structure should contain that structure as a prefix. For example, all of the routines that operate on the
instr_t
struct begin withinstr_
. - In
core/
, short names or any global name with a chance of colliding with names from an including application linking statically should be qualified with ad_r_
prefx: e.g.,d_r_dispatch
. This is distinct from thedr_
prefix which is used on exported interface names. - Use
static
when possible for every function or variable that is not needed outside of its own file. - Do not shadow global variables (or variables in containing scopes) by using local variables of the same name: choose a distinct name for the local variable.
- Template parameters in C++ should have a descriptive CamelCase identifier.
Types
- See above for naming conventions for types.
When declaring a function with no arguments, always explicitly use the
void
keyword. Otherwise the compiler will not be able to check whether you are incorrectly passing arguments to that function.GOOD:
int foo(void);
BAD:
int foo();
Use the
IN
,OUT
, andINOUT
labels to describe function parameters. This is a recent addition to DynamoRIO so you will see many older functions without these labels, but use them on all new functions.GOOD:
int foo(IN int length, OUT char *buf);
BAD:
int foo(int length, char **buf);
Only use boolean types as conditionals. This means using explicit NULL comparisons and result comparisons. In particular with functions like strcmp() and memcmp(), the use of ! is counter-intuitive.
GOOD:
if (p == NULL) ...
BAD:
if (!p)
GOOD:
if (p != NULL) ...
BAD:
if (p)
GOOD:
if (strncmp(...) == 0) ...
BAD:
if (!strncmp(...))
- Use constants of the appropriate type. Assign or compare a character to '\0' not to
0
. It's much easier to read
if (i == 0)
thanif (0 == i)
. The compiler, with all warnings turned on (which we have), will warn you if you use assignment rather than equality.GOOD:
if (i == 0) ...
BAD:
if (0 == i)
Use the
TEST
and related macros for testing bits.GOOD:
if (TEST(BITMASK, x))
BAD:
if ((x & BITMASK) != 0)
- Write code that is 32-bit and 64-bit aware:
- Use int and uint for 32-bit integers. Do not use long as its size is 64-bit for Linux but 32-bit for Windows. We assume that int is a 32-bit type.
- Use int64 and uint64 for 64-bit integers. Use
INT64_FORMAT
and related macros for printing 64-bit integers. - Use ptr_uint_t and ptr_int_t for pointer-sized integers.
- Use size_t for sizes of memory regions.
- Use reg_t for register-sized values whose type is not known.
- Use
ASSERT_TRUNCATE
macros when casting to a smaller type. - Use
PFX
(rather than p, which is inconsistent across compilers) and other printf macros for printing pointer-sized variables in code using general printing libraries. For code that exclusively uses DR's own printing facilities, p is allowed: its improved code readability and simplicity outweigh the risk of such code being copied into non-DR locations and resulting in inconsistent output. - When generating code or writing assembler code, be aware of stack alignment restrictions.
- Invalid addresses, either pointers to our data structures or application addresses that we're manipulating, have the value NULL, not 0. 0 is only for arithmetic types.
const
makes code easier to read and lets the compiler complain about errors and generate better code. It is also required for the most efficient self-protection. Use whenever possible. However, do not mark simple scalar type function parameters asconst
.Place
*
(or&
for C++ references) prefixing variable names (C style), not suffixing type names (Java style):GOOD:
char *foo;
BAD:
char* foo;
In a struct, union, or class, list each field on its own line with its own type declaration, even when sharing the type of the prior field. Similarly, declare global variables separately. Local variables of the same type can optionally be combined on a line.
GOOD:
struct foo {int field1;int field2;};BAD:
struct foo {int field1, field2;};- Do not assume that
char
is signed: use oursbyte
typedef for a signed one-byte type.
Commenting Conventions
- For C code,
/∗ ∗/
comments are preferable to//
. Put stars on each line of a multi-line comment, like this:/* multi-line comment * with stars */
The trailing∗/
can be either on its own line or the end of the preceding line, but on its own line is preferred.
For C++ code, //
comments are allowed.
- Make liberal use of comments. However, too many comments can impair readability. Choose self-descriptive function and variable names to reduce the number of comments needed.
Do not use large, clunky function headers that simply duplicate information in the code itself. Such headers tend to contain stale, incorrect information, for two reasons: the code is often updated without maintaining the header, and since the headers are a pain to type they are often copied from other functions and not completely modified for their new home. They also make it harder to see the code or to group related functions, as they take up so much screen space. It is better to have leaner, more maintainable, and more readable implementation files by using self-descriptive function and parameter names and placing comments for function parameters next to the parameters themselves.
GOOD:
/* Retrieves the name of the logfile for a particular thread. * Returns false if no such thread exists. */ bool get_logfile(IN thread_id_t thread, OUT char **fname, IN size_t fname_len)
BAD:
/*------------------------------------------------------ * Name: get_logfile * * Purpose: * Retrieves the name of the logfile for a particular thread. * * Parameters: * [IN] thread = which thread * [OUT](IN] fname = where to store the logfile name * [IN] fname_len = the size of the fname buffer * * Returns: * True if successful. * False if no such thread exists. * * Side effects: * None. * ------------------------------------------------------ */ bool get_logfile(thread_id_t thread, char *fname, size_t fname_len)
Use doxygen comments on all function and type declarations that are exported as part of the API. For comments starting with
/∗∗
, leave the rest of the first line empty, unless the entire comment is a single line. Some examples (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries):DR_API /** * Returns the entry point of the function with the given name in the module * with the given base. Returns NULL on failure. * \note Currently Windows only. */ generic_func_t dr_get_proc_address(IN module_handle_t lib, IN const char *name); /** * Data structure passed with a signal event. Contains the machine * context at the signal interruption point and other signal * information. */ typedef struct _dr_siginfo_t { int sig; /**< The signal number. */ void *drcontext; /**< The context of the thread receiving the signal. */ dr_mcontext_t mcontext; /**< The machine state at the signal interruption point. */ siginfo_t siginfo; /**< The signal information provided by the kernel. **/ } dr_siginfo_t;
Within doxygen comments, create links using parentheses for functions
foo()
and a leading#
for other items such as types#dr_siginfo_t
or defines#DR_REG_START_GPR
. See the doxygen documentation for more information: http://www.doxygen.nl/manual/autolink.html- NEVER check in commented-out code. This is unacceptable. If you feel strongly that you need to leave code in that is disabled, use conditional compilation (e.g.,
#if DISABLED_UNTIL_BUG_812_IS_FIXED
), and perhaps additionally explain in a comment why the code is disabled. - Sloppy comments full of misspelled words, etc. are an indication of carelessness. We do not want carelessly written code, and we do not want carelessly written comments.
- Comments that contain more than one sentence should be properly capitalized and punctuated and should use complete sentences. Single-sentence comments should also prefer capitalization, punctuation, and to use a complete sentence when occupying an entire line. For comments that are inside a line of code or at the end of a line of code, sentence fragments or phrases are fine.
- Use
XXX
in comments to indicate code that could be optimized or something that may warrant re-examination later. Include the issue number using the syntaxi#<number>
. For example (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries):/* XXX i#391: this could be done more efficiently via ... */
- Use
FIXME
orTODO
in comments to indicate missing features that are required and not just optimizations or optional improvements (useXXX
for those). Include the issue number using the syntaxi#<number>
. For example (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries):/* FIXME i#999: we do not yet handle a corner case where ... */
- Mark any temporary or unfinished code unsuitable for committing with a
NOCHECKIN
comment. Themake/codereview.cmake
script will remind you to clean up the code.x = 4; /* NOCHECKIN: temporary debugging change */
- For banner comments that separate out groups of related functions, use the following style (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries):
/**************************************************************************** * Name for this group of functions */
If a closing marker is needed use this style:/* ****************************************************************************/
Warnings Are Errors
Uninitialized variables warning (W4701 for cl): Don't initialize variables when you don't need to, so that we can still have good warnings about uninitialized variables in the future. Only if the compiler can't analyze code properly is it better to err on the side of a deterministic bug and set to 0 or
{0}
.Use
do {} while ()
loops to help the compiler figure out that variables will get initialized. The generated code on those constructs is faster and better predicted (although optimizations should be able to transform simple loops).- For suggested use of static analysis tools: PreFAST or /analyze for new code, refer to case 3966.
Program Structure
- Keep the line length to 90 characters or less.
Use an indentation level of 4 spaces (no tabs, always expand them to spaces when saving the file). (Exception: in CMakeLists.txt and other CMake scripts, use an indentation level of 2 spaces.)
WARNING: Emacs defaults are not always correct here. Make sure your .emacs contains the following:
; always expand tabs to spaces(setq-default indent-tabs-mode 'nil); want "gnu" style but indent of 4:(setq c-basic-offset 4)(add-hook 'c-mode-hook '(lambda ()(setq c-basic-offset 4)))For CMake, use cmake-mode which does default to 2 spaces.
- K&R-style braces: opening braces at the end of the line preceding the new code block, closing braces on their own line at the same indentation as the line preceding the code block. Functions are an exception – see below.
Functions should have their type on a separate line from their name. Place the function's opening brace on a line by itself at zero indentation.
intfoo(int x, int y){return 42;}Function declarations should also have the type on a separate line, although this rule can be relaxed for short (single-line) signatures with a one-line comment or no comment beforehand.
Put spaces after commas in parameter and argument lists
GOOD:
foo(x, y, z);
BAD:
foo(x,y,z);
Do not put spaces between a function name and the following parenthesis. Do put a space between a
for
,if
,while
,do
, orswitch
and the following parenthesis. Do not put spaces after an opening parenthesis or before a closing parenthesis, unless the interior expression is complex and contains multiple layers of parentheses.GOOD:
foo(x, y, z);
BAD:
foo (x, y, z);
BAD:
foo( x, y, z);
BAD:
foo(x, y, z );
GOOD:
if (x == 6)
BAD:
if( x==6 )
Always put the body of an if or loop on a separate line from the line containing the keyword.
GOOD:
if (x == 6)y = 5;BAD:
if (x==6) y = 5;- A multi-line (not just multi-statement) body (of an if, loop, etc.) should always be surrounded with braces (to avoid errors in later statements arising from indentation mistakes).
- Statements should always begin on a new line (do not put multiple statements on the same line).
The case statements of a switch statement should not be indented: they should line up with the switch itself. GOOD:
BAD:
- Indent nested preprocessor statements. The
#
character must be in the first column, but the rest of the statement can be indented.#ifdef OUTERDEF# ifdef INNERDEF# define INSIDE# endif#else# define OUTSIDE#endif - Macros, globals, and typedefs should be declared either at the top of the file or at the top of a related group of functions that are delineated by a banner comment. Do not place global declarations or defines at random places in the middle of a file.
- To make the code easier to read, use the
DODEBUG
,DOSTATS
, orDOLOG
macros, or theIF_WINDOWS
and related macros, rather than ifdefs, for common defines. - Do not use
DO_ONCE(SYSLOG_INTERNAL
. Instead use two new macros:DODEBUG_ONCE
andSYSLOG_INTERNAL_*_ONCE
. - Use
make/codereview.cmake
's style checks to examine the code for known poor coding patterns. In the future we may add checks usingastyle
(issue 83). - In .asm files, place opcodes on column 8 and start operands on column 17.
- Multi-statement macros should always be inside "do { ... } while (0)" to avoid mistaken sequences with use as the body of an if() or other construct.
When using DEBUG_DECLARE or other conditional macros at the start of a line, move any code not within the macro to the subsequent line, to aid readability and avoid the reader skipping over it under the assumption that it's debug-only. E.g.:
GOOD:
DEBUG_DECLARE(bool res =)foo(bar);BAD:
DEBUG_DECLARE(bool res =) foo(bar);Avoid embedding assignments inside expressions. We consider a separate assignment statement to be more readable. E.g.:
GOOD:
x = foo();if (x == 0) { ...BAD:
if ((x = foo()) == 0) { ...Avoid embedding non-const expressions inside macros. We consider a separate expression statement to be more readable, as well as avoiding hidden errors when macros such as ASSERT are disabled in non-debug build. Example:
GOOD:
bool success = set_some_value();ASSERT(success);BAD:
ASSERT(set_some_value());Use named constants rather than "magic values" embedded in the code. Recognizing and naming constants up front and centralizing them makes assumptions clearer, code more readable, and modifying and maintaining the code easier.
GOOD:
char buf[MAXIMUM_LINE_LENGTH];
BAD:
char buf[128];
File Organization
- Write OS-independent code as much as possible and keep it in the base
core/
directory. If code must diverge for Windows versus Linux, provide an OS-independent interface documented incore/os_shared.h
and implemented separately incore/unix/
andcore/windows/
.
C++
- While the core DynamoRIO library and API are C, we do support C++ clients and have some C++ tests and clients ourselves. For broad compiler support we limit our code to C++11.
- C++ exception use is not allowed, for maximum interoperability to enable using libraries and source code in other environments where exceptions are not permitted.