Introduction
mmap() and related calls can be used to control a process's virtual memory space. malloc() uses them as an alternative to brk()/sbrk() to allocate large blocks of memory. For most applications, negotiating memory allocation directly with the operating system should be avoided in favor of portable library calls. When writing a memory manager, however, talking directly to the operating system has advantages. In particular, contiguous address spaces much larger than the amount of usable memory, to which physical memory is added on demand, simplify the writing of memory management code. These experiments refer to Linux only.
Baseline experiments
These demonstrate that Linux uses demand paging even without the MAP_NORESERVE flag, and even via plain malloc().
mmap() without MAP_NORESERVE
#include <sys/mman.h>
#include <iostream>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>
#include <assert.h>
std::size_t const K = 1024;
std::size_t const M = K * K;
std::size_t const G = K * M;
std::size_t const T = K * G;
std::size_t const micro = 1'000'000;
void
dumpStats()
{
rusage usage;
int r = getrusage(RUSAGE_SELF, &usage);
int64_t userTime = usage.ru_utime.tv_sec * micro + usage.ru_utime.tv_usec;
int64_t sysTime = usage.ru_stime.tv_sec * micro + usage.ru_stime.tv_usec;
std::cout << "user time: " << userTime << " system time: " << sysTime << std::endl;
std::cout << "minor faults: " << usage.ru_minflt << " major faults: " << usage.ru_majflt << std::endl;
}
int
main(int, char**)
{
//
// We start by reserving 4 GB of memory.
//
void* addr = mmap(NULL, // let OS choose the address
4 * G,
PROT_READ | PROT_WRITE, // read, write but not execute
MAP_ANONYMOUS | MAP_PRIVATE,
-1, // no file backing
0); // no offset
std::cout << "got 4 GB block of memory at " << addr << std::endl;
dumpStats();
//
// Now read or write first byte of every 4 KB block in the first 4 GB.
//
char c = 0;
for (std::size_t i = 0; i < M; ++i)
{
char* location = static_cast<char*>(addr) + i * 4 * K;
c += *location; // read
//*location = 'a'; // write
}
assert(c == 0);
dumpStats();
}
reading
got 4 GB block of memory at 0x7f30d6a00000
user time: 863 system time: 1727
minor faults: 147 major faults: 0
user time: 70997 system time: 437985
minor faults: 1048723 major faults: 0
writing
got 4 GB block of memory at 0x720ef7200000
user time: 1028 system time: 2057
minor faults: 146 major faults: 0
user time: 93972 system time: 1045698
minor faults: 1048722 major faults: 0
malloc()
#include <sys/mman.h>
#include <iostream>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>
#include <assert.h>
#include <stdlib.h>
std::size_t const K = 1024;
std::size_t const M = K * K;
std::size_t const G = K * M;
std::size_t const T = K * G;
std::size_t const micro = 1'000'000;
void
dumpStats()
{
rusage usage;
int r = getrusage(RUSAGE_SELF, &usage);
int64_t userTime = usage.ru_utime.tv_sec * micro + usage.ru_utime.tv_usec;
int64_t sysTime = usage.ru_stime.tv_sec * micro + usage.ru_stime.tv_usec;
std::cout << "user time: " << userTime << " system time: " << sysTime << std::endl;
std::cout << "minor faults: " << usage.ru_minflt << " major faults: " << usage.ru_majflt << std::endl;
}
int
main(int, char**)
{
//
// We start by reserving 4 GB of memory.
//
void* addr = malloc(4 * G);
std::cout << "got 4 GB block of memory at " << addr << std::endl;
dumpStats();
//
// Now read or write first byte of every 4 KB block in the first 4 GB.
//
char c = 0;
for (std::size_t i = 0; i < M; ++i)
{
char* location = static_cast<char*>(addr) + i * 4 * K;
c += *location; // read
//*location = 'a'; // write
}
assert(c == 0);
dumpStats();
}
reading
got 4 GB block of memory at 0x79e9e7e00010
user time: 0 system time: 2458
minor faults: 147 major faults: 0
user time: 78939 system time: 374712
minor faults: 1048722 major faults: 0
writing
got 4 GB block of memory at 0x7d2a87e00010
user time: 792 system time: 1584
minor faults: 148 major faults: 0
user time: 101010 system time: 1050105
minor faults: 1048723 major faults: 0
Attempts to pre-fault mmap()ed address space
Here we allocate 1 TB of address space and read or write the first 4 GB. We try various ways to pre-fault the first 4 GB so that page faults do not occur during the loop.
Baseline: Accessing memory without pre-faulting
Here we see that reading or writing in the loop causes page faults that populate the address space with memory.
#include <sys/mman.h>
#include <iostream>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>
#include <assert.h>
std::size_t const K = 1024;
std::size_t const M = K * K;
std::size_t const G = K * M;
std::size_t const T = K * G;
std::size_t const micro = 1'000'000;
void
dumpStats()
{
rusage usage;
int r = getrusage(RUSAGE_SELF, &usage);
int64_t userTime = usage.ru_utime.tv_sec * micro + usage.ru_utime.tv_usec;
int64_t sysTime = usage.ru_stime.tv_sec * micro + usage.ru_stime.tv_usec;
std::cout << "user time: " << userTime << " system time: " << sysTime << std::endl;
std::cout << "minor faults: " << usage.ru_minflt << " major faults: " << usage.ru_majflt << std::endl;
}
int
main(int, char**)
{
//
// We start by reserving 1 TB of address space.
//
void* addr = mmap(NULL, // let OS choose the address
T,
PROT_READ | PROT_WRITE, // read, write but not execute
MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE,
-1, // no file backing
0); // no offset
std::cout << "got 1 TB block of address space at " << addr << std::endl;
dumpStats();
//
// Now read or write first byte of every 4 KB block in the first 4 GB.
//
char c = 0;
for (std::size_t i = 0; i < M; ++i)
{
char* location = static_cast<char*>(addr) + i * 4 * K;
c += *location; // read
//*location = 'a'; // write
}
assert(c == 0);
dumpStats();
}
reading
got 1 TB block of address space at 0x78ad31e00000
user time: 0 system time: 2605
minor faults: 147 major faults: 0
user time: 66970 system time: 441803
minor faults: 1048723 major faults: 0
writing
got 1 TB block of address space at 0x7b3ec2a00000
user time: 888 system time: 1776
minor faults: 147 major faults: 0
user time: 85987 system time: 1050852
minor faults: 1048723 major faults: 0
Remapping with MAP_POPULATE
Here we insert a fragment of code that uses the MAP_FIXED flag to replace the first 4 GB of our mapping with a new mapping using the MAP_POPULATE flag before executing the loop. Now all the page faults are moved into this mmap() call and the user time spent in the loop is reduced.
//
// Replace the first 4 GB with a map using the MAP_POPULATE flags.
//
addr = mmap(addr,
4 * G,
PROT_READ | PROT_WRITE, // read, write but not execute
MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE,
-1, // no file backing
0); // no offset
dumpStats();
reading
got 1 TB block of address space at 0x785ae9a00000
user time: 887 system time: 1774
minor faults: 147 major faults: 0
user time: 1000 system time: 996421
minor faults: 1048723 major faults: 0
user time: 11562 system time: 996421
minor faults: 1048723 major faults: 0
writing
got 1 TB block of address space at 0x78d2d7c00000
user time: 0 system time: 2644
minor faults: 146 major faults: 0
user time: 0 system time: 1000570
minor faults: 1048722 major faults: 0
user time: 10995 system time: 1000591
minor faults: 1048722 major faults: 0
Remapping with MAP_LOCKED
We repeat the above experiment using MAP_LOCKED, inserting the following fragment of code.
addr = mmap(addr,
4 * G,
PROT_READ | PROT_WRITE, // read, write but not execute
MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_LOCKED,
-1, // no file backing
0); // no offset
dumpStats();
reading
got 1 TB block of address space at 0x6f6ded000000
user time: 0 system time: 2553
minor faults: 146 major faults: 0
user time: 0 system time: 1002611
minor faults: 1048722 major faults: 0
user time: 10605 system time: 1002611
minor faults: 1048722 major faults: 0
writing
got 1 TB block of address space at 0x79c30ae00000
user time: 1042 system time: 2084
minor faults: 146 major faults: 0
user time: 1042 system time: 1005667
minor faults: 1048722 major faults: 0
user time: 11004 system time: 1006385
minor faults: 1048722 major faults: 0
Explicit locking with mlock()
We repeat the above experiment using the mlock() system call, inserting the following fragment of code.
int err = mlock(addr, 4 * G);
if (err)
{
perror("mlock()");
abort();
}
dumpStats();
reading
got 1 TB block of address space at 0x703950400000
user time: 706 system time: 1412
minor faults: 147 major faults: 0
user time: 999 system time: 995277
minor faults: 1048723 major faults: 0
user time: 10996 system time: 995674
minor faults: 1048723 major faults: 0
writing
got 1 TB block of address space at 0x74783a000000
user time: 975 system time: 1951
minor faults: 146 major faults: 0
user time: 999 system time: 1001665
minor faults: 1048722 major faults: 0
user time: 11835 system time: 1001665
minor faults: 1048722 major faults: 0
Conclusion
All three methods of pre-faulting the first 4 GB of address space work. The number of minor page faults is not reduced, but this may be an accounting issue: each page populated during the system call is still counted as a minor page fault. The time spent in the loop accessing the first 4 GB of address space is reduced significantly. One way of exploiting this for the garbage collected arenas is to set the tripwire at a whole-megabyte boundary, then, when allocating into a fresh megabyte for the first time, pre-fault it using one of these three methods. MAP_POPULATE is probably going to cause the fewest issues, since it does not lock the memory against being swapped out in the future.