c: option to use mmap when given a file name #2

Open: rmg wants to merge 3 commits into master

Conversation

@rmg (Owner) commented Aug 8, 2019

mmap() is supposed to be fast, right? Turns out we skip so aggressively
that the overhead of the extra bookkeeping and round-tripping with the
kernel that comes with mmap actually ends up costing us more than any
gains.

This is somewhat surprising given that mmap() ends up being a slight
performance boost for ripgrep when dealing with large input files, and
that's the specific case I tested with. What I suspect is happening is
that we are skipping so aggressively that the OS hasn't been able to
read the next page for us by the time we finish with the current one.
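
For reference, a minimal sketch of what the mmap path in this branch could look like, using hypothetical names (scan_buffer() stands in for the repo's actual scanning routine, and the real patch may differ): when a file name is given, map the whole file read-only and scan it in place, with an optional sequential-access hint to the kernel.

/* Minimal sketch, not the actual patch: map a named file and scan it
 * in place. scan_buffer() is a hypothetical stand-in for the scanner. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

extern void scan_buffer(const unsigned char *buf, size_t len);

int scan_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* nothing to map */

    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

    /* Hint that access is sequential so the kernel can read ahead;
     * given the aggressive skipping described above, this may or may
     * not actually help. */
    madvise(map, (size_t)st.st_size, MADV_SEQUENTIAL);

    scan_buffer((const unsigned char *)map, (size_t)st.st_size);

    munmap(map, (size_t)st.st_size);
    close(fd);
    return 0;
}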

@sam-github

s/hexgrep (mmap-maybe *% u=) % sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"; sleep 5; sudo perf stat -B ./scan-c-fast-mmap raw.tar > /dev/null

 Performance counter stats for './scan-c-fast-mmap raw.tar':

          8,373.52 msec task-clock                #    0.208 CPUs utilized          
            87,306      context-switches          #    0.010 M/sec                  
             4,535      cpu-migrations            #    0.542 K/sec                  
           111,009      page-faults               #    0.013 M/sec                  
    14,792,287,029      cycles                    #    1.767 GHz                    
     9,287,497,100      stalled-cycles-frontend   #   62.79% frontend cycles idle   
    14,408,571,443      instructions              #    0.97  insn per cycle         
                                                  #    0.64  stalled cycles per insn
     2,728,432,805      branches                  #  325.841 M/sec                  
        52,656,715      branch-misses             #    1.93% of all branches        

      40.302108933 seconds time elapsed

       1.484495000 seconds user
       7.883474000 seconds sys


s/hexgrep (mmap-maybe *% u=) % sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"; sleep 5; sudo perf stat -B ./scan-c-fast raw.tar > /dev/null 

 Performance counter stats for './scan-c-fast raw.tar':

          5,721.63 msec task-clock                #    0.257 CPUs utilized          
            49,614      context-switches          #    0.009 M/sec                  
             1,989      cpu-migrations            #    0.348 K/sec                  
                70      page-faults               #    0.012 K/sec                  
     9,357,265,473      cycles                    #    1.635 GHz                    
     5,854,879,226      stalled-cycles-frontend   #   62.57% frontend cycles idle   
     8,033,607,958      instructions              #    0.86  insn per cycle         
                                                  #    0.73  stalled cycles per insn
     1,570,690,434      branches                  #  274.518 M/sec                  
        38,856,806      branch-misses             #    2.47% of all branches        

      22.280723062 seconds time elapsed

       1.310735000 seconds user
       4.996655000 seconds sys

@sam-github

Not sure if that is interesting... I think it's just showing that when using mmap, the data comes in via page faults, and it doesn't when using read(2); that version would show higher syscall read counts instead.
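
To illustrate the contrast, here is a rough sketch of a plain read(2) loop, again with hypothetical names (process_chunk() stands in for the scanner, and the real buffer size in the repo may differ). Every refill is an explicit syscall, so a perf run of this version would show read calls climbing rather than minor page faults.

/* Sketch of the buffered-read alternative: data arrives via explicit
 * read(2) calls rather than page faults on a mapping. */
#include <stdio.h>
#include <unistd.h>

#define CHUNK (1 << 20)  /* 1 MiB buffer; the real size may differ */

extern void process_chunk(const unsigned char *buf, size_t len);

int scan_read(int fd) {
    static unsigned char buf[CHUNK];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        process_chunk(buf, (size_t)n);
    }
    if (n < 0) { perror("read"); return -1; }
    return 0;
}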

rmg added 3 commits August 8, 2019 17:31

- More performance tuning that theoretically helps but doesn't seem to move the needle.
- In theory this makes certain operations more efficient.
- mmap() is supposed to be fast, right? (the full commit message repeats the PR description above)

@rmg (Owner, Author) commented Aug 9, 2019

I see you're clearing the cache for these. Do the results look much different when the cache is warm? I know the wall clock times keep a similar ratio in that case.

@sam-github

s/hexgrep (mmap-maybe *% u=) % /usr/bin/time -v ./scan-c-fast-mmap raw.tar > /dev/null; /usr/bin/time -v ./scan-c-fast-mmap raw.tar > /dev/null; /usr/bin/time -v ./scan-c-fast-mmap raw.tar > /dev/null;
        Command being timed: "./scan-c-fast-mmap raw.tar"
        User time (seconds): 1.42
        System time (seconds): 6.32
        Percent of CPU this job got: 19%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:39.48
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3467176
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 24
        Minor (reclaiming a frame) page faults: 313124
        Voluntary context switches: 85877
        Involuntary context switches: 3121
        Swaps: 0
        File system inputs: 22100872
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
        Command being timed: "./scan-c-fast-mmap raw.tar"
        User time (seconds): 1.42
        System time (seconds): 6.52
        Percent of CPU this job got: 19%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:40.23
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3505092
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 54
        Minor (reclaiming a frame) page faults: 318291
        Voluntary context switches: 88453
        Involuntary context switches: 2935
        Swaps: 0
        File system inputs: 22538984
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
        Command being timed: "./scan-c-fast-mmap raw.tar"
        User time (seconds): 1.48
        System time (seconds): 6.33
        Percent of CPU this job got: 20%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.41
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3576548
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 49
        Minor (reclaiming a frame) page faults: 311271
        Voluntary context switches: 85585
        Involuntary context switches: 2582
        Swaps: 0
        File system inputs: 21941168
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
