c: option to use mmap when given a file name #2

Open: rmg wants to merge 3 commits into master

Conversation

@rmg (Owner) commented Aug 8, 2019

mmap() is supposed to be fast, right? Turns out we skip so aggressively
that the overhead of the extra bookkeeping and round-tripping with the
kernel that comes with mmap actually ends up costing us more than any
gains.

This is somewhat surprising given that mmap() ends up being a slight
performance boost for ripgrep when dealing with large input files, and
that's the specific case I tested with. What I suspect is happening is
that we are skipping so aggressively that the OS hasn't been able to
read the next page for us by the time we finish with the current one.
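
For reference, a minimal sketch of what the mmap path in this branch could look like, using hypothetical names (scan_buffer() stands in for the repo's actual scanning routine, and the real patch may differ): when a file name is given, map the whole file read-only and scan it in place, with an optional sequential-access hint to the kernel.

/* Minimal sketch, not the actual patch: map a named file and scan it
 * in place. scan_buffer() is a hypothetical stand-in for the scanner. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

extern void scan_buffer(const unsigned char *buf, size_t len);

int scan_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* nothing to map */

    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

    /* Hint that access is sequential so the kernel can read ahead;
     * given the aggressive skipping described above, this may or may
     * not actually help. */
    madvise(map, (size_t)st.st_size, MADV_SEQUENTIAL);

    scan_buffer((const unsigned char *)map, (size_t)st.st_size);

    munmap(map, (size_t)st.st_size);
    close(fd);
    return 0;
}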

@sam-github

s/hexgrep (mmap-maybe *% u=) % sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"; sleep 5; sudo perf stat -B ./scan-c-fast-mmap raw.tar > /dev/null

 Performance counter stats for './scan-c-fast-mmap raw.tar':

          8,373.52 msec task-clock                #    0.208 CPUs utilized          
            87,306      context-switches          #    0.010 M/sec                  
             4,535      cpu-migrations            #    0.542 K/sec                  
           111,009      page-faults               #    0.013 M/sec                  
    14,792,287,029      cycles                    #    1.767 GHz                    
     9,287,497,100      stalled-cycles-frontend   #   62.79% frontend cycles idle   
    14,408,571,443      instructions              #    0.97  insn per cycle         
                                                  #    0.64  stalled cycles per insn
     2,728,432,805      branches                  #  325.841 M/sec                  
        52,656,715      branch-misses             #    1.93% of all branches        

      40.302108933 seconds time elapsed

       1.484495000 seconds user
       7.883474000 seconds sys


s/hexgrep (mmap-maybe *% u=) % sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"; sleep 5; sudo perf stat -B ./scan-c-fast raw.tar > /dev/null 

 Performance counter stats for './scan-c-fast raw.tar':

          5,721.63 msec task-clock                #    0.257 CPUs utilized          
            49,614      context-switches          #    0.009 M/sec                  
             1,989      cpu-migrations            #    0.348 K/sec                  
                70      page-faults               #    0.012 K/sec                  
     9,357,265,473      cycles                    #    1.635 GHz                    
     5,854,879,226      stalled-cycles-frontend   #   62.57% frontend cycles idle   
     8,033,607,958      instructions              #    0.86  insn per cycle         
                                                  #    0.73  stalled cycles per insn
     1,570,690,434      branches                  #  274.518 M/sec                  
        38,856,806      branch-misses             #    2.47% of all branches        

      22.280723062 seconds time elapsed

       1.310735000 seconds user
       4.996655000 seconds sys

@sam-github

Not sure if that is interesting... I think it's just showing that when using mmap, the data comes in via page faults, and it doesn't when using read(2); that version would show higher syscall read counts instead.
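
To illustrate the contrast, here is a rough sketch of a plain read(2) loop, again with hypothetical names (process_chunk() stands in for the scanner, and the real buffer size in the repo may differ). Every refill is an explicit syscall, so a perf run of this version would show read calls climbing rather than minor page faults.

/* Sketch of the buffered-read alternative: data arrives via explicit
 * read(2) calls rather than page faults on a mapping. */
#include <stdio.h>
#include <unistd.h>

#define CHUNK (1 << 20)  /* 1 MiB buffer; the real size may differ */

extern void process_chunk(const unsigned char *buf, size_t len);

int scan_read(int fd) {
    static unsigned char buf[CHUNK];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        process_chunk(buf, (size_t)n);
    }
    if (n < 0) { perror("read"); return -1; }
    return 0;
}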

rmg added 3 commits August 8, 2019 17:31

- More performance tuning that theoretically helps but doesn't seem to move the needle.
- In theory this makes certain operations more efficient.
- mmap() is supposed to be fast, right? (the full commit message repeats the PR description above)

@rmg (Owner, Author) commented Aug 9, 2019

I see you're clearing the cache for these. Do the results look much different when the cache is warm? I know the wall clock times keep a similar ratio in that case.

@sam-github

s/hexgrep (mmap-maybe *% u=) % /usr/bin/time -v ./scan-c-fast-mmap raw.tar > /dev/null; /usr/bin/time -v ./scan-c-fast-mmap raw.tar > /dev/null; /usr/bin/time -v ./scan-c-fast-mmap raw.tar > /dev/null;
        Command being timed: "./scan-c-fast-mmap raw.tar"
        User time (seconds): 1.42
        System time (seconds): 6.32
        Percent of CPU this job got: 19%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:39.48
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3467176
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 24
        Minor (reclaiming a frame) page faults: 313124
        Voluntary context switches: 85877
        Involuntary context switches: 3121
        Swaps: 0
        File system inputs: 22100872
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
        Command being timed: "./scan-c-fast-mmap raw.tar"
        User time (seconds): 1.42
        System time (seconds): 6.52
        Percent of CPU this job got: 19%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:40.23
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3505092
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 54
        Minor (reclaiming a frame) page faults: 318291
        Voluntary context switches: 88453
        Involuntary context switches: 2935
        Swaps: 0
        File system inputs: 22538984
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
        Command being timed: "./scan-c-fast-mmap raw.tar"
        User time (seconds): 1.48
        System time (seconds): 6.33
        Percent of CPU this job got: 20%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.41
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3576548
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 49
        Minor (reclaiming a frame) page faults: 311271
        Voluntary context switches: 85585
        Involuntary context switches: 2582
        Swaps: 0
        File system inputs: 21941168
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
