Avoid explicit calls to select.select() #98

jukuisma · 2024-09-24T13:29:15Z

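Use the higher-level `selectors` module instead:

https://docs.python.org/3/library/selectors.html

Unlike `select.select()`, which raises the `ValueError` reported in #97 once a file descriptor number reaches `FD_SETSIZE` (typically 1024), `selectors` picks the most efficient implementation available on the current platform. On Linux, it defaults to `EpollSelector`: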

$ python3 -c "import selectors; print(selectors.DefaultSelector())"
<selectors.EpollSelector object at 0x7a6a66f02120>
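For illustration, here is a minimal sketch of waiting for readable data through `selectors` rather than `select.select()`. It is not the code from this PR: the helper names `read_when_ready` and `demo` are hypothetical, and an `os.pipe()` stands in for the stdout pipe of a child process. It assumes a POSIX platform, since on Windows selectors only work with sockets.

```python
import os
import selectors


def read_when_ready(fd, timeout=1.0, chunk_size=4096):
    """Wait up to `timeout` seconds for `fd` to become readable.

    Returns the bytes read, or b"" if nothing arrived in time.
    Hypothetical helper for illustration, not part of pyexiftool.
    """
    # DefaultSelector resolves to epoll, kqueue, poll or select,
    # whichever is the best choice on the current platform.
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        if not sel.select(timeout):
            return b""  # timed out, nothing to read
        return os.read(fd, chunk_size)


def demo():
    """Write to a pipe, then read it back through the selector."""
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"{ready}\n")
        return read_when_ready(read_end)
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(demo())
```

The selector is created per call to keep the sketch short; a long-lived process wrapper would more likely register the descriptor once and reuse the selector.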

Fixes: #97

jukuisma · 2024-09-24T13:44:09Z

~~TODO: Benchmarking. I don't foresee this being slower than select.select(), but better safe than sorry.~~ Done here: #98 (comment)

jukuisma · 2024-09-25T13:04:33Z

Benchmarking

vagrant@almalinux:~/test$ cat benchmark.py 
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

for i in range(512):
    print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))

Old:

vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-1.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m48.851s
user    0m42.612s
sys     0m6.483s

New:

vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-2.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m47.134s
user    0m41.186s
sys     0m6.156s

And just:

vagrant@almalinux:~/test$ cat benchmark.py 
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

# for i in range(512):
#     print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))

To get some rounds in, old:

vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.78 msec task-clock:u                     #    1.009 CPUs utilized               ( +-  0.26% )
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
             11170      page-faults:u                    #    3.237 K/sec                       ( +-  1.23% )
       14374934023      cycles:u                         #    4.166 GHz                         ( +-  0.26% )
          17189602      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  2.14% )
          71707628      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 11.28% )
       21913527379      instructions:u                   #    1.52  insn per cycle
                                                  #    0.00  stalled cycles per insn     ( +-  0.04% )
        4602999890      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u

           3.41934 +- 0.00858 seconds time elapsed  ( +-  0.25% )

New:

vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null 

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.74 msec task-clock:u                     #    1.010 CPUs utilized               ( +-  0.13% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             11305      page-faults:u                    #    3.276 K/sec                       ( +-  0.81% )
       14332278239      cycles:u                         #    4.153 GHz                         ( +-  0.11% )
          16894272      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  1.35% )
          72376828      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 15.40% )
       21919017057      instructions:u                   #    1.53  insn per cycle            
                                                  #    0.00  stalled cycles per insn     ( +-  0.03% )
        4604247475      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u                                                       

           3.41807 +- 0.00510 seconds time elapsed  ( +-  0.15% )

I'm seeing next to no difference between the two solutions on my alma9 VM: 3.419 s versus 3.418 s elapsed over 10 runs, which is well within the run-to-run variation. Testing larger images could be useful, but all our test images are small so as not to bloat the git repo. These 512 images are all copies of: https://github.com/Digital-Preservation-Finland/file-scraper/blob/master/tests/data/image_jpeg/valid_2.2.1_exif_metadata.jpg
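If `perf` is not available, a rough in-process equivalent of the repeated runs above can be sketched with the standard library. This is only an illustrative sketch: `time_runs` is a hypothetical helper, and unlike `perf stat` it measures only the callable, not interpreter startup.

```python
import statistics
import time


def time_runs(func, runs=10):
    """Call `func` `runs` times; return (mean, stdev) of wall-clock seconds.

    Hypothetical helper for illustration. `runs` must be at least 2,
    because statistics.stdev() needs two or more samples.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Placeholder workload; in a real comparison this would be the
    # get_metadata() call from benchmark.py.
    mean, stdev = time_runs(lambda: sum(range(100_000)))
    print(f"{mean:.5f} +- {stdev:.5f} seconds per run")
```

Running this once per installed package version and comparing the means against the reported standard deviations gives the same kind of "is the difference above the noise" check as the `perf stat -r 10` output.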

jukuisma force-pushed the select-bugfix branch from 35a2e60 to c340c3f Compare September 24, 2024 13:37

jukuisma changed the title ~~Draft: Avoid explicit calls to select.select()~~ Avoid explicit calls to select.select() Sep 25, 2024

jukuisma mentioned this pull request Sep 25, 2024

ValueError: filedescriptor out of range in select() #97

Open

jukuisma mentioned this pull request Dec 23, 2024

Too many file errors cause library to hang #96

Open
