Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap() and demand paging experiments #1196

Open
stevenmeker opened this issue Jan 23, 2025 · 0 comments
Open

mmap() and demand paging experiments #1196

stevenmeker opened this issue Jan 23, 2025 · 0 comments

Comments

@stevenmeker
Copy link
Collaborator

Introduction

mmap() and related calls can be used to control a processes virtual memory space. They are used by malloc() as an alternative to brk()/sbrk() to allocate large blocks of memory. For most applications, negotiating memory allocation directly with the operating system is to be avoided in favor of portable library calls. However for writing a memory manager there are advantages of talking directly to the operating system. In particular, contiguous address spaces, much larger than the amount of usable memory, to which memory is added on demand, simplify the writing of memory management code. These experiments refer to Linux only.

Baseline experiments

These demonstrate that even without the MAP_NORESERVE flag or even using malloc(), Linux uses demand paging.

mmap() without MAP_NORESERVE

#include <cstddef>
#include <cstdint>
#include <sys/resource.h>
#include <assert.h>

std::size_t const K = 1024;
std::size_t const M = K * K;
std::size_t const G = K * M;
std::size_t const T = K * G;
std::size_t const micro = 1'000'000;

void
dumpStats()
{
  rusage usage;
  int r = getrusage(RUSAGE_SELF, &usage);
  int64_t userTime = usage.ru_utime.tv_sec * micro + usage.ru_utime.tv_usec;
  int64_t sysTime = usage.ru_stime.tv_sec * micro + usage.ru_stime.tv_usec;
  std::cout << "user time: " << userTime << "  system time: " << sysTime << std::endl;
  std::cout << "minor faults: " << usage.ru_minflt << "  major faults: " << usage.ru_majflt << std::endl;
}

int
main(int, char**)
{
  //
  //	We start by reserving 4 GB of memory.
  //
  void* addr = mmap(NULL,  // let OS choose the address
		    4 * G,
		    PROT_READ | PROT_WRITE,  // read, write but not execute
		    MAP_ANONYMOUS | MAP_PRIVATE,
		    -1,  // no file backing
		    0);  // no offset
  std::cout << "got 4 GB block of memory at " << addr << std::endl;
  dumpStats();
  //
  //	Now read or write first byte of every 4 KB block in the first 4 GB.
  //
  char c = 0;
  for (std::size_t i = 0; i < M; ++i)
    {
      char* location = static_cast<char*>(addr) + i * 4 * K;
      c += *location;  // read
      //*location = 'a';  // write
    }
  assert(c == 0);
  dumpStats();
}

reading

got 4 GB block of memory at 0x7f30d6a00000
user time: 863  system time: 1727
minor faults: 147  major faults: 0
user time: 70997  system time: 437985
minor faults: 1048723  major faults: 0

writing

got 4 GB block of memory at 0x720ef7200000
user time: 1028  system time: 2057
minor faults: 146  major faults: 0
user time: 93972  system time: 1045698
minor faults: 1048722  major faults: 0

malloc()

#include <sys/mman.h>
#include <iostream>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>
#include <assert.h>
#include <stdlib.h>

std::size_t const K = 1024;
std::size_t const M = K * K;
std::size_t const G = K * M;
std::size_t const T = K * G;
std::size_t const micro = 1'000'000;

void
dumpStats()
{
  rusage usage;
  int r = getrusage(RUSAGE_SELF, &usage);
  int64_t userTime = usage.ru_utime.tv_sec * micro + usage.ru_utime.tv_usec;
  int64_t sysTime = usage.ru_stime.tv_sec * micro + usage.ru_stime.tv_usec;
  std::cout << "user time: " << userTime << "  system time: " << sysTime << std::endl;
  std::cout << "minor faults: " << usage.ru_minflt << "  major faults: " << usage.ru_majflt << std::endl;
}

int
main(int, char**)
{
  //
  //	We start by reserving 4 GB of memory.
  //
  void* addr = malloc(4 * G);
  std::cout << "got 4 GB block of memory at " << addr << std::endl;
  dumpStats();
  //
  //	Now read or write first byte of every 4 KB block in the first 4 GB.
  //
  char c = 0;
  for (std::size_t i = 0; i < M; ++i)
    {
      char* location = static_cast<char*>(addr) + i * 4 * K;
      c += *location;  // read
      //*location = 'a';  // write
    }
  assert(c == 0);
  dumpStats();
}

reading

got 4 GB block of memory at 0x79e9e7e00010
user time: 0  system time: 2458
minor faults: 147  major faults: 0
user time: 78939  system time: 374712
minor faults: 1048722  major faults: 0

writing

got 4 GB block of memory at 0x7d2a87e00010
user time: 792  system time: 1584
minor faults: 148  major faults: 0
user time: 101010  system time: 1050105
minor faults: 1048723  major faults: 0

Attempts to pre-fault mmap()ed address space

Here we allocate 1 TB of address space and will read or write to the first 4 GB. We try various ways to pre-fault the first 4 GB so page faults don't occur during the loop.

Baseline: Accessing memory without pre-faulting

Here we see that the reading or writing in the loop cause page faults to populate the address space with memory.

#include <sys/mman.h>
#include <iostream>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>

std::size_t const K = 1024;
std::size_t const M = K * K;
std::size_t const G = K * M;
std::size_t const T = K * G;
std::size_t const micro = 1'000'000;

void
dumpStats()
{
  rusage usage;
  int r = getrusage(RUSAGE_SELF, &usage);
  int64_t userTime = usage.ru_utime.tv_sec * micro + usage.ru_utime.tv_usec;
  int64_t sysTime = usage.ru_stime.tv_sec * micro + usage.ru_stime.tv_usec;
  std::cout << "user time: " << userTime << "  system time: " << sysTime << std::endl;
  std::cout << "minor faults: " << usage.ru_minflt << "  major faults: " << usage.ru_majflt << std::endl;
}

int
main(int, char**)
{
  //
  //	We start by reserving 1 TB of address space.
  //
  void* addr = mmap(NULL,  // let OS choose the address
		    T,
		    PROT_READ | PROT_WRITE,  // read, write but not execute
		    MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE,
		    -1,  // no file backing
		    0);  // no offset
  std::cout << "got 1 TB block of address space at " << addr << std::endl;
  dumpStats();
  //
  //	Now read or write first byte of every 4 KB block in the first 4 GB.
  //
  char c = 0;
  for (std::size_t i = 0; i < M; ++i)
    {
      char* location = static_cast<char*>(addr) + i * 4 * K;
      c += *location;  // read
      //*location = 'a';  // write
    }
  assert(c == 0);
  dumpStats();
}

reading

got 1 TB block of address space at 0x78ad31e00000
user time: 0  system time: 2605
minor faults: 147  major faults: 0
user time: 66970  system time: 441803
minor faults: 1048723  major faults: 0

writing

got 1 TB block of address space at 0x7b3ec2a00000
user time: 888  system time: 1776
minor faults: 147  major faults: 0
user time: 85987  system time: 1050852
minor faults: 1048723  major faults: 0

Remapping with MAP_POPULATE

Here we insert a fragement of code that uses the MAP_FIXED flag to replace the first 4 GB of our mapping with a new mapping using the MAP_POPULATE before executing the loop. Now all the page faults are moved to this mmap() call and the user time is reduced.

  //
  //	Replace the first 4 GB with a map using the MAP_POPULATE flags.
  //
  addr = mmap(addr,
	      4 * G,
	      PROT_READ | PROT_WRITE,  // read, write but not execute
	      MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE,
	      -1,  // no file backing
	      0);  // no offset
  dumpStats();

reading

got 1 TB block of address space at 0x785ae9a00000
user time: 887  system time: 1774
minor faults: 147  major faults: 0
user time: 1000  system time: 996421
minor faults: 1048723  major faults: 0
user time: 11562  system time: 996421
minor faults: 1048723  major faults: 0

writing

write
got 1 TB block of address space at 0x78d2d7c00000
user time: 0  system time: 2644
minor faults: 146  major faults: 0
user time: 0  system time: 1000570
minor faults: 1048722  major faults: 0
user time: 10995  system time: 1000591
minor faults: 1048722  major faults: 0

Remapping with MAP_LOCKED

We repeat the above experiment using MAP_LOCKED, inserting the following fragment of code.

  addr = mmap(addr,
	      4 * G,
	      PROT_READ | PROT_WRITE,  // read, write but not execute
	      MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_LOCKED,
	      -1,  // no file backing
	      0);  // no offset
  dumpStats();

reading

got 1 TB block of address space at 0x6f6ded000000
user time: 0  system time: 2553
minor faults: 146  major faults: 0
user time: 0  system time: 1002611
minor faults: 1048722  major faults: 0
user time: 10605  system time: 1002611
minor faults: 1048722  major faults: 0

writing

got 1 TB block of address space at 0x79c30ae00000
user time: 1042  system time: 2084
minor faults: 146  major faults: 0
user time: 1042  system time: 1005667
minor faults: 1048722  major faults: 0
user time: 11004  system time: 1006385
minor faults: 1048722  major faults: 0

Explicit locking with mlock()

We repeat the above experiment using the mlock() system call, inserting the following fragment of code.

  int err = mlock(addr, 4 * G);
  if (err)
    {
      perror("mlock()");
      abort();
    }
  dumpStats();

reading

got 1 TB block of address space at 0x703950400000
user time: 706  system time: 1412
minor faults: 147  major faults: 0
user time: 999  system time: 995277
minor faults: 1048723  major faults: 0
user time: 10996  system time: 995674
minor faults: 1048723  major faults: 0

writing

got 1 TB block of address space at 0x74783a000000
user time: 975  system time: 1951
minor faults: 146  major faults: 0
user time: 999  system time: 1001665
minor faults: 1048722  major faults: 0
user time: 11835  system time: 1001665
minor faults: 1048722  major faults: 0

Conclusion

All three methods of pre-faulting the first 4 GB of address space work. The number of minor page faults is not reduced, however this may be an accounting issue - each page populated during the system call is counted as a minor page fault. The time in the loop accessing the first 4 GB of address space is reduced significantly. One way of exploiting this for the garbage collected arenas is to set the tripwire at a whole megabyte boundary. Then when allocating into a fresh megabyte for the first time, to pre-fault it using one of these three methods; MAP_POPULATE is probably going to cause the fewest issues since it doesn't lock the memory against being swapped out in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant