This document is meant to serve as a guide to writing C++ in the HHVM codebase, covering when and how to use various language features as well as how code should be formatted. Our goal is to ensure a consistently high-quality codebase that is easy to read and contribute to, especially for newcomers.
The HHVM codebase contains a wide variety of code from many different authors. It's been through a few different major stages in its life, including stints in multiple different repositories. As a result, large (primarily older) parts of the codebase do not fit this guide. When in doubt about how to write or format something, always prefer the advice here over existing conventions in the code. If you're already touching some older code as part of your work, please do clean it up as you go along. But please do not spend hours applying the formatting guidelines here to code you aren't otherwise modifying. While we'd love for the entire codebase to follow this guide, we'd rather get there gradually than lose lots of git history and developer time to purely cosmetic changes. That said, if cosmetic changes that you're making as part of a larger diff keep growing in scope, it may be worth pulling them out into a separate diff.
There's no well-defined cutoff here - just try to minimize effort for your reviewers. A good rule of thumb is that if your cosmetic changes require adding significant new sections to the diff (such as a function rename that touches all callsites), it should probably be pulled out into its own diff.
Every .cpp file in the HHVM repository should have a corresponding .h file with the same name, and which declares its public interfaces. We tend to value API documentation more heavily than inline implementation comments, so all declarations in headers (classes, enums, functions, constants, etc.) should be documented. See Comments and Documentation for more details.
Build times are a frequent source of pain in many large C++ projects. Try not to make large header files that mostly serve to include groups of other large header files. This can discourage "include what you use," discussed in the "What to include section".
To prevent multiple inclusion, all headers should have define guards, which should have the form
#ifndef incl_HPHP_<OPTIONAL-DIR>_<FILE>_H_
#define incl_HPHP_<OPTIONAL-DIR>_<FILE>_H_
// Lots of nice codez.
#endif
The optional dir might be something like JIT
, but does not have to include
the less interesting parts of the path, like RUNTIME
or BASE
. Just make
sure it's unique across the whole codebase.
The golden rule for what to include is "include what you use" (IWYU). In brief, this means you should not rely on any headers you include to transitively include other headers which have definitions you require. You should also prefer to forward declare structs and classes when the definition is not needed (so, "don't include what you don't use"), which helps reduce HHVM's nontrivial build time.
To make it easier to achieve IWYU, we have the following guidelines for includes:
- Always include the corresponding .h for a .cpp first, before even system headers.
- Separate includes into groups: C++ standard library headers, external projects
(such as Boost and Intel TBB), and finally headers within HHVM. Each group
should be separated by a newline, for readability. (Whether to separate HHVM
includes by subsystem (e.g.,
jit
) is left up to the author.) - Keep headers alphabetized within each group. This makes it easier to ensure that all necessary includes are made, and no extraneous ones are left behind.
- Use double quotes for Folly and HHVM headers and angle brackets for all others.
As an example, here is what the include section might look like for a file
named bytecode.cpp
:
#include "hphp/runtime/vm/bytecode.h"
#include <cstdio>
#include <string>
#include <boost/program_options/options_description.hpp>
#include "hphp/runtime/vm/class.h"
#include "hphp/runtime/vm/func.h"
#include "hphp/runtime/vm/hhbc.h"
#include "hphp/util/string.h"
Defining functions inline is encouraged for very short functions.
When defining inline member functions on structs or classes which have tight, compact interfaces (e.g., a smart pointer class, or any wrapper class), prefer to define the functions in the class definition, for concision.
However, for classes with more complex, malleable APIs where inline helpers
proliferate (e.g., Func, Class, IRInstruction, etc.), restrict the class
definition to member function prototypes only. This makes the API much
cleaner. For these classes, define all inline functions in a corresponding
-inl.h
file.
// At the bottom of func.h.
#include "hphp/runtime/vm/func-inl.h"
// After the copyright in func-inl.h.
namespace HPHP {
// Definitions go here.
}
For API's large enough to warrant -inl.h files, move all definitions into the -inl.h, even one-line accessors. This serves both to keep the API cleaner and to avoid splitting implementations among three files (the header, the inline, and the source).
Some files, with or without a corresponding -inl.h file, may need a -defs.h file. This file also contains definitions of inline functions, but it is not included by the main header. It is intended to be used when only a few callers need access to the definitions, or when the definitions can't be in the main header because it would create circular dependencies. It should be included directly by the callers that do need access to the definitions it contains.
Classes are used extensively throughout the HHVM codebase, with a number of coding conventions. See also Naming for conventions around class naming.
In C++, struct
and class
have nearly identical meanings; the only
difference lies in the default accessibility (struct
defaults to public, and
class
, to private).
We do not assign further meaning to these keywords, so we use struct
everywhere. Efforts to compile under MSVC also require that we use the same
keyword between a struct/class definition and its forward declarations due to
MSVC's failure to adhere to the C++ spec, and sticking to struct
everywhere
makes this easier.
Try to avoid the protected
keyword. It tends to give a false sense of
security about encapsulation: since anyone can inherit from your class, anyone
can access the protected
member with a little extra effort.
By default, always use explicit
for single-argument, non-initializer list
constructors.
struct MyStruct {
// We don't want to implicitly convert ints to MyStructs
explicit MyStruct(int foo);
// Two-argument constructor; no need for explicit
MyStruct(const std::string& name, int age);
};
Prefer declaring public member variables to using getters and setters. Getters and setters that don't manage object state in a nontrivial way serve to bloat the API and introduce unnecessary boilerplate.
Getters are, of course, encouraged for private members. Avoid prefixing getters
with get
:
struct Func {
const SVInfoVec& staticVars() const;
void setStaticVars(const SVInfoVec&);
BuiltinFunction builtinFuncPtr() const;
static constexpr ptrdiff_t sharedBaseOff();
};
Adhere to the following order for declarations in a struct or class definition:
- Friend classes.
- Nested classes, enums, typedefs. (If possible, just declare the nested class and define it following the enclosing class definition.)
- Constructors, destructor.
- Member functions, including static functions, documented and grouped coherently.
- Constants and static data members.
- All instance data members, regardless of accessibility.
Private member functions can be interspersed with public functions, or relegated to a single section before the data members. However, all instance properties must occur contiguously at the end of the class definition.
Very few language features are unconditionally banned. However, if you want to
use one of the more controversial constructs such as goto
or operator,()
,
you'd better have a convincing argument as to why it's better than the
alternatives. C++ is a very large and complex language and we don't want to
artificially limit what developers can do, but that puts a lot of
responsibility on your shoulders.
Avoiding restrictions on useful language features (e.g., exceptions, templates, C++11 lambdas) is a major motivating factor for maintaining our own style guide rather than adopting an existing one.
All HHVM code should be scoped in namespace HPHP { /* everything */ }
. Large
submodules such as HPHP::jit
and HPHP::rds
may be contained in their own
namespace within HPHP
. We often use anonymous namespaces instead of the
static
keyword to keep symbols internal to their translation unit. This is
mostly left up to the author; just keep in mind that classes and structs,
unlike functions and variables, must be in an anonymous namespace in order to
be properly hidden.
Avoid using namespace
whenever possible, especially in headers. It is
acceptable in .cpp
files in very limited scopes (function-level or deeper) if
it will significantly aid in readability of the code that follows. using namespace std;
at the top of a .cpp
is explicitly disallowed.
Prefer enum class
whenever possible. Old-style enums are generally only
acceptable if you expect that your type will be frequently used in an integer
context, such as array indexing.
HHVM code adheres to the some broad naming conventions.
When the convention is left open, in general, prefer the local conventions
used in the file you are working on---e.g., in a struct whose data members all
have m_namesLikeThis
, prefer m_anotherNameLikeThis
to m_this_style
, even
though the latter is found in other parts of the codebase.
Use lowerCamelCase
or lower_case_with_underscores
for all local variables,
adhering to whichever is the discernable local convention if possible. Static
variables (whether declared in an anonymous namespace or with the static
keyword) should additionally be prefixed by s
(e.g., s_funcVec
). Globals,
likewise, should be prefixed by g_
(e.g., g_context
).
All constants should be prefixed with k
and use CamelCase
, e.g.,
kInvalidHandle
. Prefer constexpr
to const
whenever possible.
As with variables, use lowerCamelCase
or lower_case_with_underscores
for
all data members. Additionally, private instance members should be prefixed
with m_
(e.g., m_cls
, m_baseCls
, m_base_cls
), and all static members
should be prefixed with s_
(e.g., s_instance
). Prefer to leave public
members unprefixed.
We generally prefer lowerCamelCase
for header-exposed functions, including
member functions, although we use lower_case_with_underscores
as well (e.g.,
hphp_session_init
), more commonly in file-local scopes. As usual, follow the
local naming conventions of the file you are working in.
If you are modeling a class after an existing pattern, such as an STL
container, prefer to follow the appropriate conventions (e.g.,
my_list::push_back
is preferred over my_list::pushBack
).
Classes use UpperCamelCase
, except when modeling existing patterns like STL
containers or smart pointers.
New namespaces should use lowercase
---and single-word namespaces are greatly
prefered for common usage. For longer namespaces (e.g., vasm_detail
), use
lower_case_with_underscores
.
Prefer correctly capitalizing acronyms in new code (e.g., prefer IRTranslator
to HhbcTranslator
). In this vein, prefer ID
(e.g., TransID
) to Id
(e.g., FuncId
) in new code.
While consistent code formatting doesn't directly affect correctness, it makes it easier to read and maintain. For this reason, we've come up with a set of guidelines about how code should be formatted. There's a good chance that some of these will conflict with your own personal preferred style, but we believe that having a consistently easy to read codebase is more important than letting each developer write code that he or she thinks is uncompromisingly beautiful.
Anything not specified here is left up to the judgment of the developer. However, this document is not set in stone, so if a particular formatting issue keeps coming up in code review it probably deserves a few lines in here.
- All indentation is to be done using spaces.
- Each indentation level is 2 spaces wide.
- Lines may be no longer than 80 characters, unless absolutely required for some syntactic reason.
- Lines should not have any trailing whitespace. This includes blank lines at non-zero indentation levels; the only character on those lines should be a newline.
- When declaring a variable or typedef, the
*
and&
characters for pointer and reference types should be adjacent to the type, not the name (e.g.,const Func*& func
). - Limit variable declarations to one per line.
The following function signatures are formatted properly:
// If arguments would fit on 1 line:
inline void Func::appendParam(bool ref, const Func::ParamInfo& info) {
}
// If the arguments need to wrap, we have two accepted styles, both of which
// are OK even if the wrapping wasn't necessary:
SSATmp* HhbcTranslator::ldClsPropAddr(Block* catchBlock,
SSATmp* ssaCls,
SSATmp* ssaName,
bool raise) {
doSomeStuff();
}
// This style is helpful if any of the function, argument, or type names
// involved are particularly long.
SSATmp* HhbcTranslator::ldClsPropAddr(
Block* catchBlock,
SSATmp* ssaCls,
SSATmp* ssaName,
bool raise
) {
doSomeStuff();
}
Always keep the type on the same line as the function name, unless it would
leave insufficient room for arguments. Do likewise with other modifying
keywords (inline
, static
, any attributes).
Wrapped arguments should always be aligned with the argument on the previous line. The opening curly brace should be on the same line as the last argument, with the exception of class constructors (see the Constructor initializer list section). When writing function declarations in headers, include argument names unless they add no value:
struct Person {
// The single string argument here is obviously the name.
void setName(const std::string&);
// Two string arguments, so it's not obvious what each one is without names.
void setFavorites(const std::string& color, const std::string& animal);
};
Conditional and loop statements should be formatted like so:
if (vmpc() == nullptr) {
fprintf(stderr, "whoops!\n");
std::abort();
}
Note that there is a single space after the if
keyword, no spaces between
condition
and the surrounding parentheses, and a single space between the )
and the {
. As with all blocks, the body should be one indentation level
deeper than the if
. If the entire statement (condition and body) fits on
one line, you may leave it on one line, omitting the curly braces. In all
other cases, the braces are required. For example, the following are OK:
if (obj->_count == 0) deleteObject(obj);
for (auto block : blocks) block->setParent(nullptr);
But these are not acceptable:
if (veryLongVariableName.hasVeryLongFieldName() &&
(rand() % 5) == 0) launchRocket();
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
Avoid assignments in conditional expressions, unless the variable is declared within the condition, e.g.,
if (auto const unit = getMyUnit(from, these, args)) {
// Do stuff with unit.
}
Prefer C++11 foreach syntax to explicit iterators:
for (auto const& thing : thingVec) {
// Do stuff with thing.
}
- All binary operators should have one space on each side, except for
.
,->
,.*
, and->*
which should have zero. - Do not include redundant parentheses unless you think the expression would be
confusing to read otherwise. A good rule of thumb is that if you and/or your
reviewers have to look at a chart of operator precedence to decide if the
expression parses as expected, you probably need some extra parentheses. GCC
or clang may suggest extra parens in certain situations; we compile with
-Werror
so you must always follow those guidelines. - If an expression does not fit on one line, attempt to wrap it after an operator (rather than an identifier or keyword) and indent subsequent lines with the beginning of the current parenthesis/brace nesting level. For example, here are some long expressions, formatted appropriately:
if (RuntimeOption::EvalJitRegionSelector != "" &&
(RuntimeOption::EvalHHIRRefcountOpts ||
RuntimeOption::EvalHHITExtraOptPass) &&
Func::numLoadedFuncs() < 600) {
// ...
}
longFunctionName(argumentTheFirst,
argumentTheSecond,
argumentTheThird,
argumentTheFourth);
- Function calls should be formatted primarily using the previous rule. If one or more of the arguments to the function is very wide, it may be necessary to shift all the arguments down one line and align them one level deeper than the current scope. This is always acceptable, but is especially common when passing lambdas:
m_irb->ifThen(
[&](Block* taken) {
gen(CheckType, Type::Int, taken, src);
},
[&] {
doSomeStuff();
lotsOfNonTrivialCode();
// etc...
}
);
If an initializer list can be kept on a single line, it is fine to do so:
MyClass::MyClass(uint64_t idx) : m_idx(idx) {}
MyClass::MyClass(const Func* func) : m_idx(-1) {
// Do stuff.
}
Otherwise, it is always correct to format lists thusly:
MyClass::MyClass(const Class* cls, const Func* func, const Class* ctx)
: m_cls(cls)
, m_func(func)
, m_ctx(ctx)
, m_isMyConditionMet(false)
{}
MyClass::MyClass(const Class* cls, const Func* func)
: m_cls(cls)
, m_func(func)
, m_ctx(nullptr)
, m_isMyConditionMet(false)
{
// Do stuff.
}
We don't nest namespaces very deeply, so prefer to keep the scoping to a single line:
namespace HPHP { namespace jit { namespace x64 {
///////////////////////////////////////////////////////////////////////////////
/*
* Some nice documentation.
*/
struct SomeNiceThing {
// some nice properties
};
///////////////////////////////////////////////////////////////////////////////
}}}
Do not increase the indentation level when entering namespace scope. Instead, consider adding a line of forward slashes as a separator, to more clearly delineate the namespace (this is especially useful for anonymous namespaces in source files). This form of delineation is encouraged, but we have no strict convention for its formatting (you'll see 70- or 79- or 80-character separators, with or without an extra newline between it and the braces, etc.).
All public and private APIs in headers should be documented in detail. Names and notions which are not obvious (e.g., "persistent" or "simple") should be explained. Preconditions and postconditions should be noted.
Inline code comments are encouraged for complex logic, but their density is left up to the author. Rather than summarizing/paraphrasing what your code is doing, focus on explaining what overarching goal the code is achieving and/or why that goal is necessary or desirable.
Here are some comment styles we use or avoid:
// This style of comment is the most common for relatively short inline
// comments. It's fine if it's multi-line.
//
// It's also fine if it has line breaks. The extra newline aids readability in
// this case.
/*
* This style of comment is the right one to use for struct/function
* documentation. Prefer one star on the opening line, as opposed to the
* doxygen standard of two.
*
* This is also sometimes used for inline code comments, although the // style
* makes it easier to comment out blocks of code.
*/
struct ClassLikeThing {
std::vector<const Func*> methods; // This is fine for short annotations.
/* This is also ok, though try not to mix and match too much in a file. */
std::vector<const ClassLikeThing*> parents;
};
/* Don't write multiline comments where some lines are missing their prefix.
This is pretty weird. */
Try to use complete sentences in all but the shortest of comments. All comments should be flowed to 79 characters in width.
Delineate sections of code with a line of forward slashes. There is no strict
convention, but prefer lines of slashes to other delineators (e.g., /*****/
,
five newlines, ASCII-art cartoon characters).
All files must begin with the HHVM copyright notice:
/*
+----------------------------------------------------------------------+
| HipHop for PHP |
+----------------------------------------------------------------------+
| Copyright (c) 2010-201x Facebook, Inc. (http://www.facebook.com) |
+----------------------------------------------------------------------+
| This source file is subject to version 3.01 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available through the world-wide-web at the following url: |
| http://www.php.net/license/3_01.txt |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| [email protected] so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
*/
// File contents start here.