Skip to content

Commit

Permalink
[native] Add LinuxMemoryChecker check/warning to ensure system-mem-li…
Browse files Browse the repository at this point in the history
…mit-gb is reasonably set

Add additional checks and warnings to ensure
system-memory-gb <= system-mem-limit-gb < available machine memory of deployment.

For cgroup v1:
Set available machine memory of deployment to be the smaller number
between /proc/meminfo and memory.limit_in_bytes.

For cgroup v2:
Set available machine memory of deployment to be the smaller number
between /proc/meminfo and memory.max.
If memory.max contains "max" string, then look at
/proc/meminfo for the MemTotal, otherwise use the
value in memory.max.
  • Loading branch information
minhancao committed Feb 11, 2025
1 parent d827df7 commit 988ccf0
Show file tree
Hide file tree
Showing 7 changed files with 407 additions and 125 deletions.
13 changes: 9 additions & 4 deletions presto-docs/src/main/sphinx/presto_cpp/properties.rst
Original file line number Diff line number Diff line change
Expand Up @@ -134,13 +134,15 @@ The configuration properties of Presto C++ workers are described here, in alphab
^^^^^^^^^^^^^^^^^^^^

* **Type:** ``integer``
* **Default value:** ``40``
* **Default value:** ``57``

Memory allocation limit enforced by an internal memory allocator. It consists of two parts:
1) Memory used by the queries as specified in ``query-memory-gb``; 2) Memory used by the
system, such as disk spilling and cache prefetch.

Set ``system-memory-gb`` to the available machine memory of the deployment.
Set ``system-memory-gb`` to about 90% of available machine memory of the deployment.
This allows some buffer room to handle unaccounted memory in order to prevent out-of-memory conditions.
Default value of 57 gb is calculated based on available machine memory of 64 gb.


``query-memory-gb``
Expand Down Expand Up @@ -309,11 +311,14 @@ server is under low memory pressure.
^^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``integer``
* **Default value:** ``55``
* **Default value:** ``60``

Specifies the system memory limit that triggers the memory pushback or heap dump if
the server memory usage is beyond this limit. A value of zero means no limit is set.
This only applies if ``system-mem-pushback-enabled`` is ``true``.
This only applies if ``system-mem-pushback-enabled`` is ``true``.
Set ``system-mem-limit-gb`` to be greater than or equal to system-memory-gb but not
higher than the available machine memory of the deployment.
Default value of 60 gb is calculated based on available machine memory of 64 gb.

``system-mem-shrink-gb``
^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
141 changes: 126 additions & 15 deletions presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,12 @@
#include <sys/stat.h>
#include "presto_cpp/main/PeriodicMemoryChecker.h"
#include "presto_cpp/main/common/Configs.h"
#include "presto_cpp/main/common/Utils.h"

namespace facebook::presto {

using int128_t = __int128_t;

class LinuxMemoryChecker : public PeriodicMemoryChecker {
public:
explicit LinuxMemoryChecker(const PeriodicMemoryChecker::Config& config)
Expand All @@ -29,13 +32,32 @@ class LinuxMemoryChecker : public PeriodicMemoryChecker {
// it's mounted.
struct stat buffer;
if ((stat(kCgroupV1Path, &buffer) == 0)) {
statFile_ = kCgroupV1Path;
} else if ((stat(kCgroupV2Path, &buffer) == 0)) {
statFile_ = kCgroupV2Path;
} else {
statFile_ = "None";
PRESTO_STARTUP_LOG(INFO) << "Using cgroup v1.";
if (stat(kCgroupV1MemStatFile, &buffer) == 0) {
memStatFile_ = kCgroupV1MemStatFile;
}
if ((stat(kCgroupV1MaxMemFile, &buffer) == 0)) {
memMaxFile_ = kCgroupV1MaxMemFile;
}
}
LOG(INFO) << fmt::format("Using memory stat file {}", statFile_);

// In cgroup v2.
else {
PRESTO_STARTUP_LOG(INFO) << "Using cgroup v2.";
if (stat(kCgroupV2MemStatFile, &buffer) == 0) {
memStatFile_ = kCgroupV2MemStatFile;
}
if ((stat(kCgroupV2MaxMemFile, &buffer) == 0)) {
memMaxFile_ = kCgroupV2MaxMemFile;
}
}

PRESTO_STARTUP_LOG(INFO) << fmt::format(
"Using memory stat file: {}",
memStatFile_.empty() ? memInfoFile_ : memStatFile_);
PRESTO_STARTUP_LOG(INFO) << fmt::format(
"Using memory max file {}",
memMaxFile_.empty() ? memInfoFile_ : memMaxFile_);
}

~LinuxMemoryChecker() override {}
Expand All @@ -45,8 +67,91 @@ class LinuxMemoryChecker : public PeriodicMemoryChecker {
}

void setStatFile(std::string statFile) {
statFile_ = statFile;
LOG(INFO) << fmt::format("Changed to using memory stat file {}", statFile_);
memStatFile_ = statFile;
LOG(INFO) << fmt::format(
"Changed to using memory stat file {}", memStatFile_);
}

// This function is used for testing.
void setMemMaxFile(const std::string& memMaxFile) {
memMaxFile_ = memMaxFile;
}

// This function is used for testing.
void setMemInfoFile(const std::string& memInfoFile) {
memInfoFile_ = memInfoFile;
}

void start() override {
// Check system-memory-gb < system-mem-limit-gb < available machine memory
// of deployment.
auto* systemConfig = SystemConfig::instance();
int64_t systemMemoryInBytes = systemConfig->systemMemoryGb() << 30;
PRESTO_STARTUP_LOG(INFO)
<< fmt::format("System memory in bytes: {}", systemMemoryInBytes);

PRESTO_STARTUP_LOG(INFO) << fmt::format(
"System memory limit in bytes: {}", config_.systemMemLimitBytes);

auto availableMemoryOfDeployment = getAvailableMemoryOfDeployment();
PRESTO_STARTUP_LOG(INFO) << fmt::format(
"Available machine memory of deployment in bytes: {}",
availableMemoryOfDeployment);

VELOX_CHECK_LE(
config_.systemMemLimitBytes,
availableMemoryOfDeployment,
"system memory limit = {} bytes is higher than the available machine memory of deployment = {} bytes.",
config_.systemMemLimitBytes,
availableMemoryOfDeployment);

if (config_.systemMemLimitBytes < systemMemoryInBytes) {
LOG(WARNING) << "system-mem-limit-gb is smaller than system-memory-gb. "
<< "Expected: system-mem-limit-gb >= system-memory-gb.";
}

PeriodicMemoryChecker::start();
}

int128_t getAvailableMemoryOfDeployment() {
// Set the available machine memory of deployment to be the smaller number
// between /proc/meminfo and memMaxFile_.
int128_t availableMemoryOfDeployment = 0;
// meminfo's units is in kB.
folly::gen::byLine(memInfoFile_.c_str()) |
[&](const folly::StringPiece& line) -> void {
if (availableMemoryOfDeployment != 0) {
return;
}
availableMemoryOfDeployment = static_cast<int128_t>(
extractNumericConfigValueWithRegex(line, kMemTotalRegex) * 1024);
};

// For cgroup v1, memory.limit_in_bytes can default to a really big numeric
// value in bytes like 9223372036854771712 to represent that
// memory.limit_in_bytes is not set to a value. The default value here is
// set to PAGE_COUNTER_MAX, which is LONG_MAX/PAGE_SIZE on the 64-bit
// platform. The default value can vary based upon the platform's PAGE_SIZE.
// If memory.limit_in_bytes contains a really big numeric value, then we
// will use MemTotal from /proc/meminfo.

// For cgroup v2, memory.max can contain a numeric value in bytes or string
// "max" which represents no value has been set. If memory.max contains
// "max", then we will use MemTotal from /proc/meminfo.
if (!memMaxFile_.empty()) {
folly::gen::byLine(memMaxFile_.c_str()) |
[&](const folly::StringPiece& line) -> void {
if (line == "max") {
return;
}
availableMemoryOfDeployment =
std::min(availableMemoryOfDeployment, folly::to<int128_t>(line));
return;
};
}

// Unit is in bytes.
return availableMemoryOfDeployment;
}

protected:
Expand Down Expand Up @@ -80,8 +185,8 @@ class LinuxMemoryChecker : public PeriodicMemoryChecker {
size_t inactiveAnon = 0;
size_t activeAnon = 0;

if (statFile_ != "None") {
folly::gen::byLine(statFile_.c_str()) |
if (!memStatFile_.empty()) {
folly::gen::byLine(memStatFile_.c_str()) |
[&](const folly::StringPiece& line) -> void {
if (inactiveAnon == 0) {
inactiveAnon =
Expand All @@ -103,7 +208,7 @@ class LinuxMemoryChecker : public PeriodicMemoryChecker {
}

// Last resort use host machine info.
folly::gen::byLine("/proc/meminfo") |
folly::gen::byLine(memInfoFile_.c_str()) |
[&](const folly::StringPiece& line) -> void {
if (memAvailable == 0) {
memAvailable =
Expand Down Expand Up @@ -143,10 +248,16 @@ class LinuxMemoryChecker : public PeriodicMemoryChecker {
const boost::regex kInactiveAnonRegex{R"!(inactive_anon\s*(\d+)\s*)!"};
const boost::regex kActiveAnonRegex{R"!(active_anon\s*(\d+)\s*)!"};
const boost::regex kMemAvailableRegex{R"!(MemAvailable:\s*(\d+)\s*kB)!"};
const boost::regex kMemTotalRegex{R"!(MemTotal:\s*(\d+)\s*kB)!"};
const char* kCgroupV1Path = "/sys/fs/cgroup/memory/memory.stat";
const char* kCgroupV2Path = "/sys/fs/cgroup/memory.stat";
std::string statFile_;
const boost::regex kMemTotalRegex{R"!(MemTotal:\s*(\d+)\s+kB)!"};
const char* kCgroupV1Path = "/sys/fs/cgroup/memory";
const char* kCgroupV1MemStatFile = "/sys/fs/cgroup/memory/memory.stat";
const char* kCgroupV2MemStatFile = "/sys/fs/cgroup/memory.stat";
const char* kCgroupV1MaxMemFile =
"/sys/fs/cgroup/memory/memory.limit_in_bytes";
const char* kCgroupV2MaxMemFile = "/sys/fs/cgroup/memory.max";
std::string memInfoFile_ = "/proc/meminfo";
std::string memStatFile_;
std::string memMaxFile_;

size_t extractNumericConfigValueWithRegex(
const folly::StringPiece& line,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -71,10 +71,10 @@ class PeriodicMemoryChecker {

/// Starts the 'PeriodicMemoryChecker'. A background scheduler will be
/// launched to perform the checks. This should only be called once.
void start();
virtual void start();

/// Stops the 'PeriodicMemoryChecker'.
void stop();
virtual void stop();

protected:
/// Returns current system memory usage. The returned value is used to compare
Expand Down
4 changes: 2 additions & 2 deletions presto-native-execution/presto_cpp/main/common/Configs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -168,9 +168,9 @@ SystemConfig::SystemConfig() {
STR_PROP(kSpillerDirectoryCreateConfig, ""),
NONE_PROP(kSpillerSpillPath),
NUM_PROP(kShutdownOnsetSec, 10),
NUM_PROP(kSystemMemoryGb, 40),
NUM_PROP(kSystemMemoryGb, 57),
BOOL_PROP(kSystemMemPushbackEnabled, false),
NUM_PROP(kSystemMemLimitGb, 55),
NUM_PROP(kSystemMemLimitGb, 60),
NUM_PROP(kSystemMemShrinkGb, 8),
BOOL_PROP(kMallocMemHeapDumpEnabled, false),
BOOL_PROP(kSystemMemPushbackAbortEnabled, false),
Expand Down
Loading

0 comments on commit 988ccf0

Please sign in to comment.