faroit · Oct 25, 2019 · Oct 28, 2019
diff --git a/AUDIO/.gitkeep b/AUDIO/.gitkeep
diff --git a/Dockerfile b/Dockerfile
@@ -34,8 +34,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
          libavutil-dev \
          libswresample-dev \
          libfftw3-dev \
-         libmad0 \
-         libmad0-dev \
          python-gst-1.0 \
          python3-gst-1.0 \
          libsndfile1 &&\
@@ -47,7 +45,4 @@ WORKDIR /app
 
 # install requirements, starting with pycairo because it fails in a different order
 RUN pip install pycairo
-RUN pip install --requirement /app/requirements.txt
-
-# install torchaudio from source
-RUN git clone https://github.com/pytorch/audio.git pytorchaudio && cd pytorchaudio && python setup.py install
+RUN pip install --requirement /app/requirements.txt
diff --git a/README.md b/README.md
@@ -4,28 +4,28 @@ The aim of his repository is to evaluate the loading performance of various audi
 
 This is relevant for machine learning models that today often process raw (time domain) audio and assembling a batch on the fly. It is therefore important to load the audio as fast as possible. At the same time a library should ideally support a variety of uncompressed and compressed audio formats and also is capable of loading only chunks of audio (seeking). The latter is especially important for models that cannot easily work with samples of variable length (convnets).
 
-## Tested Libraries 
+## Tested Libraries
 
 | Library                 | Version | Short-Name/Code       | Out Type          | Supported codecs  | Excerpts/Seeking |
 |-------------------------|---------|-----------------------|-------------------|-------------------| -----------------|
-| [scipy.io.wavfile](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 0.14.0 | [`scipy`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L55)       | Numpy      | PCM (only 16 bit)   | ❌        |
-| [scipy.io.wavfile memmap](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 0.14.0 | [`scipy_mmap`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L61)  | Numpy      | PCM (only 16 bit)   | ✅        |
-| [soundfile](https://pysoundfile.readthedocs.io/en/0.9.0/) ([libsndfile](http://www.mega-nerd.com/libsndfile/)) | 0.9.0 | [`soundfile`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L50)   | Numpy   | PCM, Ogg, Flac | ✅             |
+| [scipy.io.wavfile](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 1.4.1 | [`scipy`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L55)       | Numpy      | PCM (only 16 bit)   | ❌        |
+| [scipy.io.wavfile memmap](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 1.4.1 | [`scipy_mmap`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L61)  | Numpy      | PCM (only 16 bit)   | ✅        |
+| [soundfile](https://pysoundfile.readthedocs.io/en/0.9.0/) ([libsndfile](http://www.mega-nerd.com/libsndfile/)) | 0.9.0 | [`soundfile`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L50)   | Numpy | PCM, Ogg, Flac | ✅             |
 | [pydub](https://github.com/jiaaro/pydub) | 0.23.1 | [`pydub`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L97) | Python Array |  PCM, MP3, OGG or other FFMPEG/libav supported codec | ❌ |
 | [aubio](https://github.com/aubio/aubio) | 0.4.9 | [`aubio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L32) | Numpy Array | PCM, MP3, OGG or other avconv supported code |  ✅ |
-| [audioread](https://github.com/beetbox/audioread) ([libmad](https://www.underbit.com/products/mad/))  | 2.1.6 | [`ar_mad`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L77) | Numpy Array | FFMPEG | ❌ |
-| [audioread](https://github.com/beetbox/audioread) ([gstreamer](https://gstreamer.freedesktop.org/)) |2.1.6 | [`ar_gstreamer`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L67) |2.1.6 | Numpy Array | all of FFMPEG | ❌ |
-| [audioread](https://github.com/beetbox/audioread) ([FFMPEG](https://www.ffmpeg.org/)) | 2.1.6 | [`ar_ffmpeg`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L87) | Numpy Array | all of FFMPEG | ❌ |
-| [librosa](https://librosa.github.io/)  | 0.6.2 | [`librosa`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L104) | Numpy Array | relies on audioread |  ✅ |
-| [tensorflow 1.13 `contrib.ffmpeg`](https://www.tensorflow.org/api_docs/python/tf/contrib/ffmpeg/decode_audio) | 1.13 | [`tf_decode`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L21) | Tensorflow Tensor | All codecs supported by FFMPEG |  ❌ |
-| [torchaudio](https://github.com/pytorch/audio) | 0.3.0 | [`torchaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L45) | PyTorch Tensor | all codecs supported by Sox |  ✅ |
-
-### Not tested
+| [audioread](https://github.com/beetbox/audioread) ([gstreamer](https://gstreamer.freedesktop.org/)) | 2.1.8 | [`ar_gstreamer`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L67) | Numpy Array | all of FFMPEG | ❌ |
+| [audioread](https://github.com/beetbox/audioread) ([FFMPEG](https://www.ffmpeg.org/)) | 2.1.8 | [`ar_ffmpeg`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L87) | Numpy Array | all of FFMPEG | ❌ |
+| [librosa](https://librosa.github.io/)  | 0.7.2 | [`librosa`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L104) | Numpy Array | relies on audioread |  ✅ |
+| [tensorflow `tf.io.audio.decode_wav`](https://www.tensorflow.org/api_docs/python/tf/contrib/ffmpeg/decode_audio) | 2.1.0 | [`tf_decode_wav`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L22) | Tensorflow Tensor | PCM (only 16 bit) |  ❌ |
+| [tensorflow-io `from_audio`](https://www.tensorflow.org/io/api_docs/python/tfio/v0/IOTensor#from_audio) | 0.11.0 | [`tfio_fromaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L22) | Tensorflow Tensor | PCM, Ogg, Flac |  ✅ |
+| [torchaudio](https://github.com/pytorch/audio) (sox) | 0.4.0 | [`torchaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L45) | PyTorch Tensor | all codecs supported by Sox |  ✅ |
+| [torchaudio](https://github.com/pytorch/audio) (soundfile) | 0.4.0 | [`torchaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L45) | PyTorch Tensor | all codecs supported by libsndfile |  ✅ |
+### Not included
 
 * __[audioread (coreaudio)](https://github.com/beetbox/audioread/blob/master/audioread/macca.py)__: only available on macOS.
-* __[madmom](https://github.com/CPJKU/madmom):__ same ffmpeg interface as `ar_ffmpeg`.
-* __[tensorflow 2 `decode_wav`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/audio/decode_wav):__ Not released yet.
-* __[python builtin `wave`](https://docs.python.org/3.7/library/wave.html)__: TODO
+* __[madmom](https://github.com/CPJKU/madmom)__: same ffmpeg interface as `ar_ffmpeg`.
+* __[pymad](https://github.com/jaqx0r/pymad)__: supports only MP3 and is very slow.
+* __[python builtin `wave`](https://docs.python.org/3.7/library/wave.html)__: TODO (open for PR)
 
 ## Results
 
@@ -73,7 +73,13 @@ Build the docker container using
 docker build -t audio_benchmark .
 ```
 It installs all the package requirements for all audio libraries.
-Afterwards, mount the data directory into the docker container.
+Afterwards, mount the data directory into the docker container and run `run.sh` inside the
+container, e.g.:
+
+```bash
+docker run -v /home/user/repos/python_audio_loading_benchmark/:/app \
+    -it audio_benchmark:latest /bin/bash run.sh
+```
 
 ### Setting up in a virtual environment
 
@@ -105,4 +111,4 @@ The data is generated by using a shell script. To generate the data in the folde
 
 ## Contribution
 
-We encourage interested users to contribute to this repository in the issue section and via pull requests. Particularly interesting are notifications of new tools and new versions of existing packages. Since benchmarks are subjective, I (@faroit) will reran the benchmark on our server again.
+We encourage interested users to contribute to this repository in the issue section and via pull requests. Particularly interesting are notifications of new tools and new versions of existing packages. Since benchmarks are subjective, I (@faroit) will rerun the benchmark on our server.
diff --git a/benchmark_metadata.py b/benchmark_metadata.py
@@ -61,7 +61,6 @@ def __len__(self):
         'soundfile',
         'sox',
         'audioread',
-        # 'pydub',  # too slow
     ]
 
     for lib in libs:

diff --git a/benchmark_np.py b/benchmark_np.py
@@ -1,13 +1,11 @@
-import matplotlib
-matplotlib.use('Agg')
 import os
 import os.path
 import random
-import time
+import timeit
 import argparse
 import utils
-import loaders
 import numpy as np
+import functools
 
 
 def get_files(dir, extension):
@@ -26,24 +24,31 @@ class AudioFolder(object):
     def __init__(
         self,
         root,
-        download=True,
-        extension='wav',
-        lib="librosa",
+        extension='wav'
     ):
         self.root = os.path.expanduser(root)
         self.data = []
         self.audio_files = get_files(dir=self.root, extension=extension)
-        self.loader_function = getattr(loaders, lib)
 
     def __getitem__(self, index):
-        return self.loader_function(self.audio_files[index])
+        return self.audio_files[index]
 
     def __len__(self):
         return len(self.audio_files)
 
 
+def test_np_loading(fp, lib):
+    import loaders
+    load_function = getattr(loaders, 'load_' + lib)
+    audio = load_function(fp)
+    if np.max(audio) > 0:
+        return True
+    else:
+        return False
+
+
 if __name__ == "__main__":
-    
+
     parser = argparse.ArgumentParser(description='Process some integers.')
     parser.add_argument('--ext', type=str, default="wav")
     args = parser.parse_args()
@@ -62,11 +67,10 @@ def __len__(self):
     libs = [
         'ar_gstreamer',
         'ar_ffmpeg',
-        'ar_mad',
         'aubio',
         'pydub',
-        'soundfile', 
-        'librosa', 
+        'soundfile',
+        'librosa',
         'scipy',
         'scipy_mmap'
     ]
@@ -75,29 +79,24 @@ def __len__(self):
         print("Testing: %s" % lib)
         for root, dirs, fnames in sorted(os.walk('AUDIO')):
             for audio_dir in dirs:
-                try:
-                    duration = int(audio_dir)
-                    dataset = AudioFolder(
-                            os.path.join(root, audio_dir), 
-                            lib='load_' + lib,
-                            extension=args.ext
-                    )
-
-
-                    start = time.time()
-
-                    for fp in dataset.audio_files:
-                        audio = dataset.loader_function(fp)
-                        np.max(audio)
-
-                    end = time.time()
-                    store.append(
-                        ext=args.ext,
-                        lib=lib,
-                        duration=duration,
-                        time=float(end-start) / len(dataset),
+                duration = int(audio_dir)
+                dataset = AudioFolder(
+                    os.path.join(root, audio_dir),
+                    extension=args.ext
+                )
+
+                # timeit returns the total time for all `number` runs of one file
+                for fp in dataset.audio_files:
+                    time = timeit.timeit(
+                        functools.partial(test_np_loading, fp, lib),
+                        number=10
                     )
-                except:
-                    continue
 
-    store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("np", args.ext))
+                store.append(
+                    ext=args.ext,
+                    lib=lib,
+                    duration=duration,
+                    time=time,
+                )
+
+    store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("np", args.ext))
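Review note on the timing loop in `benchmark_np.py`: as written in this diff, `time` is reassigned for every file and `store.append` sits outside the `for fp` loop, so it appears that only the last file's total for 10 runs is recorded per directory. Below is a minimal sketch of one way to keep a per-call average for every file. It uses only the standard library; `fake_load`, `time_per_call` and `benchmark_files` are hypothetical names for illustration and are not part of the repository.

```python
import functools
import statistics
import timeit


def fake_load(path, lib):
    """Hypothetical stand-in for test_np_loading(fp, lib)."""
    return sum(ord(c) for c in path + lib) > 0


def time_per_call(func, number=10):
    """Return the average seconds per call over `number` runs."""
    # timeit.timeit returns the total time for all `number` calls,
    # so divide to get a per-call figure.
    return timeit.timeit(func, number=number) / number


def benchmark_files(paths, lib, number=10):
    """Collect one per-call timing for every file, not just the last one."""
    return [
        time_per_call(functools.partial(fake_load, path, lib), number)
        for path in paths
    ]


timings = benchmark_files(["a.wav", "b.wav", "c.wav"], "soundfile")
mean_time = statistics.mean(timings)  # one value to store per directory
print(len(timings), mean_time >= 0.0)
```

Storing `mean_time` (or every entry of `timings`) would keep the result comparable to the previous per-file average `(end - start) / len(dataset)`.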
diff --git a/benchmark_pytorch.py b/benchmark_pytorch.py
@@ -28,7 +28,6 @@ class AudioFolder(torch.utils.data.Dataset):
     def __init__(
         self,
         root,
-        download=True,
         extension='wav',
         lib="librosa",
     ):
@@ -39,14 +38,14 @@ def __init__(
 
     def __getitem__(self, index):
         audio = self.loader_function(self.audio_files[index])
-        return torch.FloatTensor(audio).view(1, 1, -1)
+        return torch.from_numpy(audio).view(1, 1, -1)
 
     def __len__(self):
         return len(self.audio_files)
 
 
 if __name__ == "__main__":
-    
+
     parser = argparse.ArgumentParser(description='Process some integers.')
     parser.add_argument('--ext', type=str, default="wav")
     args = parser.parse_args()
@@ -65,28 +64,36 @@ def __len__(self):
     libs = [
         'ar_gstreamer',
         'ar_ffmpeg',
-        'ar_mad',
         'aubio',
         'pydub',
-        'soundfile', 
-        'librosa', 
+        'soundfile',
+        'librosa',
         'scipy',
         'scipy_mmap',
     ]
 
     if args.ext != "mp4":
-        libs.append('torchaudio')
+        libs.append('torchaudio_sox')
+        libs.append('torchaudio_soundfile')
 
     for lib in libs:
         print("Testing: %s" % lib)
+        if "torchaudio" in lib:
+            backend = lib.split("torchaudio_")[-1]
+            import torchaudio
+            torchaudio.set_audio_backend(backend)
+            call_fun = "load_torchaudio"
+        else:
+            call_fun = 'load_' + lib
+
         for root, dirs, fnames in sorted(os.walk('AUDIO')):
             for audio_dir in dirs:
                 try:
                     duration = int(audio_dir)
                     data = torch.utils.data.DataLoader(
                         AudioFolder(
-                            os.path.join(root, audio_dir), 
-                            lib='load_' + lib,
+                            os.path.join(root, audio_dir),
+                            lib=call_fun,
                             extension=args.ext
                         ),
                         batch_size=1,
@@ -106,7 +113,7 @@ def __len__(self):
                         time=float(end-start) / len(data),
                     )
                 except:
+                    print("Error but continue")
                     continue
 
-
-    store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("pytorch", args.ext))
+    store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("pytorch", args.ext))
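Review note on the `try`/`except` in the benchmark loops: a bare `except:` followed by `continue` hides which loader failed and why, and it also catches `KeyboardInterrupt`. The following is a minimal sketch of recording the failure instead, assuming a hypothetical `run_case` wrapper around one benchmark case; the names and the tuple layout are illustrative only.

```python
import traceback


def run_case(lib, audio_dir, results, errors):
    """Hypothetical wrapper around one benchmark case."""
    try:
        # Non-numeric folder names raise ValueError here.
        duration = int(audio_dir)
        results.append((lib, duration))
    except Exception as exc:
        # `except Exception` lets KeyboardInterrupt and SystemExit propagate.
        errors.append((lib, audio_dir, repr(exc)))
        traceback.print_exc()  # full traceback goes to stderr


results, errors = [], []
for audio_dir in ["5", "10", "not_a_duration"]:
    run_case("soundfile", audio_dir, results, errors)
```

With this shape, a failed library/format combination shows up in `errors` rather than as a silently missing row in the results.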
diff --git a/benchmark_tf.py b/benchmark_tf.py
@@ -26,13 +26,15 @@ def get_files(dir, extension):
 
 def _make_py_loader_function(func):
     def _py_loader_function(fp):
-        return func(fp.decode())
+        return func(fp.numpy().decode())
     return _py_loader_function
 
 
 if __name__ == "__main__":
 
-    parser = argparse.ArgumentParser(description='Benchmark audio loading in tensorflow')
+    parser = argparse.ArgumentParser(
+        description='Benchmark audio loading in tensorflow'
+    )
     parser.add_argument('--ext', type=str, default="wav")
     args = parser.parse_args()
 
@@ -48,14 +50,14 @@ def _py_loader_function(fp):
     libs = [
         'ar_gstreamer',
         'ar_ffmpeg',
-        'ar_mad',
         'aubio',
         'pydub',
         'soundfile',
         'librosa',
         'scipy',
         'scipy_mmap',
-        'tf_decode'
+        'tf_decode_wav',
+        'tfio_fromaudio',
     ]
 
     for lib in libs:
@@ -64,34 +66,47 @@ def _py_loader_function(fp):
             for audio_dir in dirs:
                 try:
                     duration = int(audio_dir)
-                    audio_files = get_files(dir=os.path.join(root, audio_dir), extension=args.ext)
+                    audio_files = get_files(
+                        dir=os.path.join(root, audio_dir),
+                        extension=args.ext
+                    )
 
                     dataset = tf.data.Dataset.from_tensor_slices(audio_files)
-                    if lib == "tf_decode":
-                        dataset = dataset.map(lambda x: loaders.load_tf_decode(x, args.ext))
+                    if lib in ["tf_decode_wav"]:
+                        dataset = dataset.map(
+                            lambda x: loaders.load_tf_decode_wav(x),
+                            num_parallel_calls=1
+                        )
+                    elif lib in ["tfio_fromaudio"]:
+                        dataset = dataset.map(
+                            lambda x: loaders.load_tfio_fromaudio(x),
+                            num_parallel_calls=1
+                        )
+                    elif lib in ["tfio_fromffmpeg"]:
+                        dataset = dataset.map(
+                            lambda x: loaders.load_tfio_fromffmpeg(x),
+                            num_parallel_calls=1
+                        )
                     else:
                         loader_function = getattr(loaders, 'load_' + lib)
                         dataset = dataset.map(
-                            lambda filename: tf.py_func(
-                                _make_py_loader_function(loader_function), 
-                                [filename], 
+                            lambda filename: tf.py_function(
+                                _make_py_loader_function(loader_function),
+                                [filename],
                                 [tf.float32]
-                            )
+                            ),
+                            num_parallel_calls=4
                         )
 
-                    dataset = dataset.batch(1)
+                    # dataset = dataset.apply(tf.data.experimental.ignore_errors())
+                    # dataset = dataset.batch(4)
                     start = time.time()
-                    iterator = dataset.make_one_shot_iterator()
-                    next_audio = iterator.get_next()
-                    with tf.Session() as sess:
-                        for i in range(len(audio_files)):
-                            try:
-                                value = sess.run(tf.reduce_max(next_audio))
-                            except tf.errors.OutOfRangeError:
-                                break
+
+                    for audio in dataset:
+                        value = tf.reduce_max(audio)
 
                     end = time.time()
-                    
+
                     store.append(
                         ext=args.ext,
                         lib=lib,
@@ -102,4 +117,3 @@ def _py_loader_function(fp):
                     continue
 
     store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("tf", args.ext))
-