Avoid explicit calls to select.select() #98

Open
wants to merge 1 commit into
base: master

jukuisma

Use the higher-level selectors module instead:

https://docs.python.org/3/library/selectors.html

The selectors module uses the most efficient implementation available on the current platform. On Linux, selectors.DefaultSelector resolves to EpollSelector:

$ python3 -c "import selectors; print(selectors.DefaultSelector())"
<selectors.EpollSelector object at 0x7a6a66f02120>

Fixes: #97
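
For readers unfamiliar with the module, here is a minimal sketch of the general pattern: wait for a descriptor to become readable with `selectors.DefaultSelector` rather than calling `select.select()` directly. This is not the code from this PR; the helper name `read_when_ready`, the timeout values, and the pipe used for the demonstration are all assumptions made for illustration, and the sketch assumes a POSIX platform where pipes can be polled.

```python
import os
import selectors


def read_when_ready(fd, timeout=1.0, size=4096):
    """Hypothetical helper: wait until fd is readable, then read from it.

    Returns None if the timeout expires before any data arrives.
    """
    # DefaultSelector is an alias for the best selector class on this platform.
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        if not sel.select(timeout):  # an empty list means the timeout expired
            return None
        return os.read(fd, size)


# Demonstration on a pipe (an assumption; any readable descriptor would do).
read_fd, write_fd = os.pipe()
try:
    before = read_when_ready(read_fd, timeout=0.05)  # nothing written yet
    os.write(write_fd, b"{ready}")
    after = read_when_ready(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(before, after)
```

Creating the selector per call keeps the sketch short; long-lived code would more likely register the descriptor once and reuse the selector.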


jukuisma commented Sep 24, 2024

TODO: Benchmarking. I don't expect this to be slower than select.select(), but better safe than sorry. Done here: #98 (comment)

@jukuisma

Benchmarking

vagrant@almalinux:~/test$ cat benchmark.py 
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

for i in range(512):
    print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))

Old:

vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-1.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m48.851s
user    0m42.612s
sys     0m6.483s

New:

vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-2.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m47.134s
user    0m41.186s
sys     0m6.156s

And with just the single batched call (the per-file loop commented out):

vagrant@almalinux:~/test$ cat benchmark.py 
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

# for i in range(512):
#     print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))

To average over several runs, old:

vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.78 msec task-clock:u                     #    1.009 CPUs utilized               ( +-  0.26% )
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
             11170      page-faults:u                    #    3.237 K/sec                       ( +-  1.23% )
       14374934023      cycles:u                         #    4.166 GHz                         ( +-  0.26% )
          17189602      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  2.14% )
          71707628      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 11.28% )
       21913527379      instructions:u                   #    1.52  insn per cycle
                                                  #    0.00  stalled cycles per insn     ( +-  0.04% )
        4602999890      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u

           3.41934 +- 0.00858 seconds time elapsed  ( +-  0.25% )

New:

vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null 

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.74 msec task-clock:u                     #    1.010 CPUs utilized               ( +-  0.13% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             11305      page-faults:u                    #    3.276 K/sec                       ( +-  0.81% )
       14332278239      cycles:u                         #    4.153 GHz                         ( +-  0.11% )
          16894272      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  1.35% )
          72376828      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 15.40% )
       21919017057      instructions:u                   #    1.53  insn per cycle            
                                                  #    0.00  stalled cycles per insn     ( +-  0.03% )
        4604247475      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u                                                       

           3.41807 +- 0.00510 seconds time elapsed  ( +-  0.15% )

I'm seeing next to no difference between the two solutions on my AlmaLinux 9 VM (3.41934 s vs. 3.41807 s elapsed, averaged over 10 runs each). Testing larger images could be useful, but all our test images are small so as not to bloat the git repo. All 512 images are copies of: https://github.com/Digital-Preservation-Finland/file-scraper/blob/master/tests/data/image_jpeg/valid_2.2.1_exif_metadata.jpg
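
Where `perf` is not available, a rough stand-in for the repeated-run measurement above could look like the sketch below. It is only an approximation: it measures wall-clock time, and the spread it prints is a plain sample standard deviation, which may not match the `+-` figure `perf stat` reports. The function name `time_command` is hypothetical, and the trivial `-c "pass"` command is a placeholder for `benchmark.py`.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10):
    """Run argv `runs` times; return (mean, sample stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout, mirroring the "> /dev/null" redirection used above.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Placeholder command; substitute [sys.executable, "benchmark.py"] to time the
# actual benchmark script.
mean, stdev = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"{mean:.5f} s mean, {stdev:.5f} s stdev over 3 runs")
```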

jukuisma changed the title from "Draft: Avoid explicit calls to select.select()" to "Avoid explicit calls to select.select()" on Sep 25, 2024
Successfully merging this pull request may close these issues.

ValueError: filedescriptor out of range in select()
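
For context on that error: CPython's `select.select()` rejects any descriptor numbered at or above `FD_SETSIZE` (1024 on typical Linux builds), whereas poll- and epoll-based selectors have no such limit. The sketch below tries to reproduce the difference. It is an illustration only, not taken from the linked issue: the descriptor number 2000 and the function names are arbitrary assumptions, the script may raise the process's soft open-file limit as a side effect, and it returns `None` if the environment does not allow a descriptor that high.

```python
import os
import resource
import select
import selectors

HIGH_FD = 2000  # arbitrary (assumed) descriptor number above FD_SETSIZE


def high_fd_pipe(target=HIGH_FD):
    """Return (read_fd, write_fd) with read_fd == target, or None if impossible."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        if soft != resource.RLIM_INFINITY and soft <= target:
            # Side effect: raises the soft open-file limit for this process.
            resource.setrlimit(resource.RLIMIT_NOFILE, (target + 1, hard))
        read_fd, write_fd = os.pipe()
        os.dup2(read_fd, target)  # duplicate the read end onto the high number
        os.close(read_fd)
        return target, write_fd
    except (ValueError, OSError):
        return None


def compare_on_high_fd():
    """Return (select_failed, selector_ready), or None if no high fd is available."""
    pair = high_fd_pipe()
    if pair is None:
        return None
    read_fd, write_fd = pair
    try:
        os.write(write_fd, b"x")
        try:
            select.select([read_fd], [], [], 0)
            select_failed = False
        except ValueError:  # "filedescriptor out of range in select()"
            select_failed = True
        with selectors.DefaultSelector() as sel:
            sel.register(read_fd, selectors.EVENT_READ)
            selector_ready = bool(sel.select(timeout=1))
        return select_failed, selector_ready
    finally:
        os.close(read_fd)
        os.close(write_fd)


result = compare_on_high_fd()
print(result)
```

On a platform where `DefaultSelector` falls back to `SelectSelector`, the same limit would still apply, so the benefit depends on a poll-, epoll-, or kqueue-style backend being available.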