First draft of timeit loading #13

Draft
wants to merge 25 commits into
base: master
Empty file removed AUDIO/.gitkeep
Empty file.
7 changes: 1 addition & 6 deletions Dockerfile
@@ -34,8 +34,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libavutil-dev \
libswresample-dev \
libfftw3-dev \
libmad0 \
libmad0-dev \
python-gst-1.0 \
python3-gst-1.0 \
libsndfile1 &&\
@@ -47,7 +45,4 @@ WORKDIR /app

# install requirements, starting with pycairo because it fails in a different order
RUN pip install pycairo
RUN pip install --requirement /app/requirements.txt

# install torchaudio from source
RUN git clone https://github.com/pytorch/audio.git pytorchaudio && cd pytorchaudio && python setup.py install
RUN pip install --requirement /app/requirements.txt
40 changes: 23 additions & 17 deletions README.md
@@ -4,28 +4,28 @@ The aim of this repository is to evaluate the loading performance of various audi

This is relevant for machine learning models, which today often process raw (time domain) audio and assemble batches on the fly. It is therefore important to load the audio as fast as possible. At the same time, a library should ideally support a variety of uncompressed and compressed audio formats and also be capable of loading only chunks of audio (seeking). The latter is especially important for models that cannot easily work with samples of variable length (convnets).

## Tested Libraries
## Tested Libraries

| Library | Version | Short-Name/Code | Out Type | Supported codecs | Excerpts/Seeking |
|-------------------------|---------|-----------------------|-------------------|-------------------| -----------------|
| [scipy.io.wavfile](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 0.14.0 | [`scipy`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L55) | Numpy | PCM (only 16 bit) | ❌ |
| [scipy.io.wavfile memmap](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 0.14.0 | [`scipy_mmap`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L61) | Numpy | PCM (only 16 bit) | ✅ |
| [soundfile](https://pysoundfile.readthedocs.io/en/0.9.0/) ([libsndfile](http://www.mega-nerd.com/libsndfile/)) | 0.9.0 | [`soundfile`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L50) | Numpy | PCM, Ogg, Flac | ✅ |
| [scipy.io.wavfile](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 1.4.1 | [`scipy`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L55) | Numpy | PCM (only 16 bit) | ❌ |
| [scipy.io.wavfile memmap](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html#scipy.io.wavfile.read) | 1.4.1 | [`scipy_mmap`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L61) | Numpy | PCM (only 16 bit) | ✅ |
| [soundfile](https://pysoundfile.readthedocs.io/en/0.9.0/) ([libsndfile](http://www.mega-nerd.com/libsndfile/)) | 0.9.0 | [`soundfile`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L50) | Numpy | PCM, Ogg, Flac | ✅ |
| [pydub](https://github.com/jiaaro/pydub) | 0.23.1 | [`pydub`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L97) | Python Array | PCM, MP3, OGG or other FFMPEG/libav supported codec | ❌ |
| [aubio](https://github.com/aubio/aubio) | 0.4.9 | [`aubio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L32) | Numpy Array | PCM, MP3, OGG or other avconv supported codec | ✅ |
| [audioread](https://github.com/beetbox/audioread) ([libmad](https://www.underbit.com/products/mad/)) | 2.1.6 | [`ar_mad`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L77) | Numpy Array | FFMPEG | ❌ |
| [audioread](https://github.com/beetbox/audioread) ([gstreamer](https://gstreamer.freedesktop.org/)) |2.1.6 | [`ar_gstreamer`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L67) |2.1.6 | Numpy Array | all of FFMPEG | ❌ |
| [audioread](https://github.com/beetbox/audioread) ([FFMPEG](https://www.ffmpeg.org/)) | 2.1.6 | [`ar_ffmpeg`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L87) | Numpy Array | all of FFMPEG | |
| [librosa](https://librosa.github.io/) | 0.6.2 | [`librosa`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L104) | Numpy Array | relies on audioread | |
| [tensorflow 1.13 `contrib.ffmpeg`](https://www.tensorflow.org/api_docs/python/tf/contrib/ffmpeg/decode_audio) | 1.13 | [`tf_decode`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L21) | Tensorflow Tensor | All codecs supported by FFMPEG | |
| [torchaudio](https://github.com/pytorch/audio) | 0.3.0 | [`torchaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L45) | PyTorch Tensor | all codecs supported by Sox | ✅ |

### Not tested
| [audioread](https://github.com/beetbox/audioread) ([gstreamer](https://gstreamer.freedesktop.org/)) | 2.1.8 | [`ar_gstreamer`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L67) | Numpy Array | all of FFMPEG | ❌ |
| [audioread](https://github.com/beetbox/audioread) ([FFMPEG](https://www.ffmpeg.org/)) | 2.1.8 | [`ar_ffmpeg`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L87) | Numpy Array | all of FFMPEG | ❌ |
| [librosa](https://librosa.github.io/) | 0.7.2 | [`librosa`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L104) | Numpy Array | relies on audioread | |
| [tensorflow `tf.io.audio.decode_wav`](https://www.tensorflow.org/api_docs/python/tf/contrib/ffmpeg/decode_audio) | 2.1.0 | [`tf_decode_wav`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L22) | Tensorflow Tensor | PCM (only 16 bit) | |
| [tensorflow-io `from_audio`](https://www.tensorflow.org/io/api_docs/python/tfio/v0/IOTensor#from_audio) | 0.11.0 | [`tfio_fromaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L22) | Tensorflow Tensor | PCM, Ogg, Flac | |
| [torchaudio](https://github.com/pytorch/audio) (sox) | 0.4.0 | [`torchaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L45) | PyTorch Tensor | all codecs supported by Sox | ✅ |
| [torchaudio](https://github.com/pytorch/audio) (soundfile) | 0.4.0| [`torchaudio`](https://github.com/faroit/python_audio_loading_benchmark/blob/master/loaders.py#L45) | PyTorch Tensor | all codecs supported by Sox | ✅ |
### Not included

* __[audioread (coreaudio)](https://github.com/beetbox/audioread/blob/master/audioread/macca.py)__: only available on macOS.
* __[madmom](https://github.com/CPJKU/madmom):__ same ffmpeg interface as `ar_ffmpeg`.
* __[tensorflow 2 `decode_wav`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/audio/decode_wav):__ Not released yet.
* __[python builtin `wave`](https://docs.python.org/3.7/library/wave.html)__: TODO
* __[madmom](https://github.com/CPJKU/madmom)__: same ffmpeg interface as `ar_ffmpeg`.
* __[pymad](https://github.com/jaqx0r/pymad)__: supports MP3 only and is also very slow.
* __[python builtin `wave`](https://docs.python.org/3.7/library/wave.html)__: TODO (open for PR)

## Results

@@ -73,7 +73,13 @@ Build the docker container using
docker build -t audio_benchmark .
```
It installs all the package requirements for all audio libraries.
Afterwards, mount the data directory into the docker container.
Afterwards, mount the data directory into the docker container and run `run.sh` inside the
container, e.g.:

```bash
docker run -v /home/user/repos/python_audio_loading_benchmark/:/app \
-it audio_benchmark:latest /bin/bash run.sh
```

### Setting up in a virtual environment

@@ -105,4 +111,4 @@ The data is generated by using a shell script. To generate the data in the folde

## Contribution

We encourage interested users to contribute to this repository in the issue section and via pull requests. Particularly interesting are notifications of new tools and new versions of existing packages. Since benchmarks are subjective, I (@faroit) will rerun the benchmark on our server again.
We encourage interested users to contribute to this repository in the issue section and via pull requests. Particularly interesting are notifications of new tools and new versions of existing packages. Since benchmarks are subjective, I (@faroit) will rerun the benchmark on our server again.
1 change: 0 additions & 1 deletion benchmark_metadata.py
@@ -61,7 +61,6 @@ def __len__(self):
'soundfile',
'sox',
'audioread',
# 'pydub', # too slow
]

for lib in libs:
73 changes: 36 additions & 37 deletions benchmark_np.py
@@ -1,13 +1,11 @@
import matplotlib
matplotlib.use('Agg')
import os
import os.path
import random
import time
import timeit
import argparse
import utils
import loaders
import numpy as np
import functools


def get_files(dir, extension):
@@ -26,24 +24,31 @@ class AudioFolder(object):
def __init__(
self,
root,
download=True,
extension='wav',
lib="librosa",
extension='wav'
):
self.root = os.path.expanduser(root)
self.data = []
self.audio_files = get_files(dir=self.root, extension=extension)
self.loader_function = getattr(loaders, lib)

def __getitem__(self, index):
return self.loader_function(self.audio_files[index])
return self.audio_files[index]

def __len__(self):
return len(self.audio_files)


def test_np_loading(fp, lib):
import loaders
load_function = getattr(loaders, 'load_' + lib)
audio = load_function(fp)
if np.max(audio) > 0:
return True
else:
return False
Owner Author

@bmcfee do you know a way to ignore measurements when using timeit?

Contributor

Hrm, never thought about it; not seeing an obvious workaround in the timeit API.

Owner Author

OK, so I guess the only alternative then is to do a dry run at the beginning, just to check whether the functions return valid output for a given test file...
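
The dry run proposed above could be sketched roughly as follows. This is only an illustration, not code from this PR: `dry_run`, the three stand-in loaders and the sample path are hypothetical names, and plain lists are used so the snippet runs without numpy (with the real loaders one would apply `np.max` as in `test_np_loading`).

```python
def dry_run(loaders_by_name, sample_path):
    """Call every loader once, untimed, and keep only those returning valid audio."""
    usable = {}
    for name, load in loaders_by_name.items():
        try:
            audio = load(sample_path)
        except Exception as exc:
            print("skipping %s: %s" % (name, exc))
            continue
        # Same validity criterion as test_np_loading: the peak must be positive.
        if len(audio) > 0 and max(audio) > 0:
            usable[name] = load
        else:
            print("skipping %s: empty or silent output" % name)
    return usable


# Hypothetical stand-ins for functions such as loaders.load_soundfile.
def load_ok(path):
    return [0.0, 0.5, -0.25]


def load_silent(path):
    return [0.0, 0.0]


def load_broken(path):
    raise IOError("cannot decode " + path)


usable = dry_run(
    {"ok": load_ok, "silent": load_silent, "broken": load_broken},
    "AUDIO/1/example.wav",  # assumed path, never opened by the stand-ins
)
print(sorted(usable))
```

Only the loaders that pass this check would then be handed to `timeit`, so the timed callable no longer needs to contain the validity test itself.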



if __name__ == "__main__":

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--ext', type=str, default="wav")
args = parser.parse_args()
@@ -62,11 +67,10 @@ def __len__(self):
libs = [
'ar_gstreamer',
'ar_ffmpeg',
'ar_mad',
'aubio',
'pydub',
'soundfile',
'librosa',
'soundfile',
'librosa',
'scipy',
'scipy_mmap'
]
@@ -75,29 +79,24 @@ def __len__(self):
print("Testing: %s" % lib)
for root, dirs, fnames in sorted(os.walk('AUDIO')):
for audio_dir in dirs:
try:
duration = int(audio_dir)
dataset = AudioFolder(
os.path.join(root, audio_dir),
lib='load_' + lib,
extension=args.ext
)


start = time.time()

for fp in dataset.audio_files:
audio = dataset.loader_function(fp)
np.max(audio)

end = time.time()
store.append(
ext=args.ext,
lib=lib,
duration=duration,
time=float(end-start) / len(dataset),
duration = int(audio_dir)
dataset = AudioFolder(
os.path.join(root, audio_dir),
extension=args.ext
)

# for fp in dataset.audio_files:
for fp in dataset.audio_files:
time = timeit.timeit(
functools.partial(test_np_loading, fp, lib),
number=10
Comment on lines +89 to +92

Now you're iterating over the files, loading each one 10 times in a row, then storing the time it took to load the last file 10 times. I think you'd want to divide the return value by 10, and accumulate the time over all files? Also you may want to use timeit.repeat() to run 3 repetitions and then keep the smallest one, if the idea was to factor out disk I/O.

Owner Author

Thanks Jan. Will have some time this weekend to finish this up
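
The suggestion in this thread (divide by `number`, accumulate over all files, and keep the fastest of several repetitions) might look roughly like the sketch below. It is not the code that ended up in this PR: `time_loader` and `fake_load` are hypothetical names, and `number=10`, `repeat=3` simply mirror the values mentioned in the comments.

```python
import functools
import timeit


def time_loader(load, paths, number=10, repeat=3):
    """Return the mean time in seconds of a single load, averaged over all files."""
    total = 0.0
    for path in paths:
        # timeit.repeat returns one total per repetition, each covering `number` calls.
        runs = timeit.repeat(
            functools.partial(load, path), number=number, repeat=repeat
        )
        # Keep the fastest repetition and normalise it to a single call.
        total += min(runs) / number
    return total / len(paths)


def fake_load(path):
    # Hypothetical stand-in for a loader such as loaders.load_soundfile.
    return sum(range(1000))


per_file = time_loader(fake_load, ["a.wav", "b.wav"])
print("%.6f s per load" % per_file)
```

In the benchmark loop, a value computed this way would be stored once per library and duration directory, instead of the total of ten loads of the last file.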

)
except:
continue

store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("np", args.ext))
store.append(
ext=args.ext,
lib=lib,
duration=duration,
time=time,
)

store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("np", args.ext))
29 changes: 18 additions & 11 deletions benchmark_pytorch.py
@@ -28,7 +28,6 @@ class AudioFolder(torch.utils.data.Dataset):
def __init__(
self,
root,
download=True,
extension='wav',
lib="librosa",
):
@@ -39,14 +38,14 @@ def __init__(

def __getitem__(self, index):
audio = self.loader_function(self.audio_files[index])
return torch.FloatTensor(audio).view(1, 1, -1)
return torch.from_numpy(audio).view(1, 1, -1)

def __len__(self):
return len(self.audio_files)


if __name__ == "__main__":

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--ext', type=str, default="wav")
args = parser.parse_args()
@@ -65,28 +64,36 @@ def __len__(self):
libs = [
'ar_gstreamer',
'ar_ffmpeg',
'ar_mad',
'aubio',
'pydub',
'soundfile',
'librosa',
'soundfile',
'librosa',
'scipy',
'scipy_mmap',
]

if args.ext != "mp4":
libs.append('torchaudio')
libs.append('torchaudio_sox')
libs.append('torchaudio_soundfile')

for lib in libs:
print("Testing: %s" % lib)
if "torchaudio" in lib:
backend = lib.split("torchaudio_")[-1]
import torchaudio
torchaudio.set_audio_backend(backend)
call_fun = "load_torchaudio"
else:
call_fun = 'load_' + lib

for root, dirs, fnames in sorted(os.walk('AUDIO')):
for audio_dir in dirs:
try:
duration = int(audio_dir)
data = torch.utils.data.DataLoader(
AudioFolder(
os.path.join(root, audio_dir),
lib='load_' + lib,
os.path.join(root, audio_dir),
lib=call_fun,
extension=args.ext
),
batch_size=1,
@@ -106,7 +113,7 @@ def __len__(self):
time=float(end-start) / len(data),
)
except:
"Error but continue"
continue


store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("pytorch", args.ext))
store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("pytorch", args.ext))
58 changes: 36 additions & 22 deletions benchmark_tf.py
@@ -26,13 +26,15 @@ def get_files(dir, extension):

def _make_py_loader_function(func):
def _py_loader_function(fp):
return func(fp.decode())
return func(fp.numpy().decode())
return _py_loader_function


if __name__ == "__main__":

parser = argparse.ArgumentParser(description='Benchmark audio loading in tensorflow')
parser = argparse.ArgumentParser(
description='Benchmark audio loading in tensorflow'
)
parser.add_argument('--ext', type=str, default="wav")
args = parser.parse_args()

@@ -48,14 +50,14 @@ def _py_loader_function(fp):
libs = [
'ar_gstreamer',
'ar_ffmpeg',
'ar_mad',
'aubio',
'pydub',
'soundfile',
'librosa',
'scipy',
'scipy_mmap',
'tf_decode'
'tf_decode_wav',
'tfio_fromaudio',
]

for lib in libs:
@@ -64,34 +66,47 @@ def _py_loader_function(fp):
for audio_dir in dirs:
try:
duration = int(audio_dir)
audio_files = get_files(dir=os.path.join(root, audio_dir), extension=args.ext)
audio_files = get_files(
dir=os.path.join(root, audio_dir),
extension=args.ext
)

dataset = tf.data.Dataset.from_tensor_slices(audio_files)
if lib == "tf_decode":
dataset = dataset.map(lambda x: loaders.load_tf_decode(x, args.ext))
if lib in ["tf_decode_wav"]:
dataset = dataset.map(
lambda x: loaders.load_tf_decode_wav(x),
num_parallel_calls=1
)
elif lib in ["tfio_fromaudio"]:
dataset = dataset.map(
lambda x: loaders.load_tfio_fromaudio(x),
num_parallel_calls=1
)
elif lib in ["tfio_fromffmpeg"]:
dataset = dataset.map(
lambda x: loaders.load_tfio_fromffmpeg(x),
num_parallel_calls=1
)
else:
loader_function = getattr(loaders, 'load_' + lib)
dataset = dataset.map(
lambda filename: tf.py_func(
_make_py_loader_function(loader_function),
[filename],
lambda filename: tf.py_function(
_make_py_loader_function(loader_function),
[filename],
[tf.float32]
)
),
num_parallel_calls=4
)

dataset = dataset.batch(1)
# dataset = dataset.apply(tf.data.experimental.ignore_errors())
# dataset = dataset.batch(4)
start = time.time()
iterator = dataset.make_one_shot_iterator()
next_audio = iterator.get_next()
with tf.Session() as sess:
for i in range(len(audio_files)):
try:
value = sess.run(tf.reduce_max(next_audio))
except tf.errors.OutOfRangeError:
break

for audio in dataset:
value = tf.reduce_max(audio)

end = time.time()

store.append(
ext=args.ext,
lib=lib,
@@ -102,4 +117,3 @@ def _py_loader_function(fp):
continue

store.df.to_pickle("results/benchmark_%s_%s.pickle" % ("tf", args.ext))
