Computing features for each segment is very slow. After some debugging, we found that it appears to use

    ffmpeg -i xxx.m4a -f s16le -

to read the whole file `xxx.m4a`, which is unexpected since we only need to extract a segment.

Also, our `*.m4a` files usually contain two channels. We would like it to read only the requested channel from the file, rather than reading all channels and discarding the unused ones afterwards.
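The behavior we are hoping for can be sketched as follows. This is only an illustration, not Lhotse's implementation: `build_ffmpeg_cmd` and `read_segment` are hypothetical helper names, and the 16 kHz sampling rate is an assumed default. The sketch relies on the standard `ffmpeg` options `-ss` and `-t` placed before `-i` (so that they apply to the input) and on the `pan` audio filter to keep a single channel.

```python
import array
import subprocess
import sys


def build_ffmpeg_cmd(path, offset, duration, channel=0, sampling_rate=16000):
    """Build an ffmpeg command that decodes one channel of one segment.

    Hypothetical helper for illustration; not part of Lhotse.
    """
    return [
        "ffmpeg",
        "-nostdin",
        "-loglevel", "error",
        # -ss and -t come before -i, so ffmpeg seeks in the input and stops
        # reading after `duration` seconds instead of decoding the whole file.
        "-ss", str(offset),
        "-t", str(duration),
        "-i", path,
        # Keep only the requested input channel (c0 is the first channel).
        "-af", f"pan=mono|c0=c{channel}",
        "-ar", str(sampling_rate),
        # Raw signed 16-bit little-endian samples written to stdout.
        "-f", "s16le",
        "-",
    ]


def read_segment(path, offset, duration, channel=0, sampling_rate=16000):
    """Run ffmpeg and return the segment as a list of floats in [-1, 1)."""
    cmd = build_ffmpeg_cmd(path, offset, duration, channel, sampling_rate)
    raw = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # ffmpeg wrote little-endian samples
    return [s / 32768.0 for s in samples]
```

For example, `read_segment("xxx.m4a", offset=100, duration=5, channel=0)` would decode only 5 seconds of the first channel.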

    ## Test loading the whole file
    import time

    for i in range(10):
        start = time.time()
        audio = recording.load_audio()
        end = time.time()
        print(f'Iter {i}: {end-start} s')

The output is given below
Iter 0: 2.1797096729278564 s
Iter 1: 2.187424659729004 s
Iter 2: 1.7927534580230713 s
Iter 3: 2.0300517082214355 s
Iter 4: 2.191190242767334 s
Iter 5: 1.7366209030151367 s
Iter 6: 1.7066655158996582 s
Iter 7: 1.7146885395050049 s
Iter 8: 1.7510318756103516 s
Iter 9: 1.6823420524597168 s
You can see that it takes about 2 seconds to load the whole file (4320 seconds).

    ## Test loading 100 seconds
    import time

    for i in range(10):
        start = time.time()
        audio = recording.load_audio(offset=200, duration=100)
        end = time.time()
        print(f'Iter {i}: {end-start} s')

The output is given below
Iter 0: 1.2440290451049805 s
Iter 1: 1.5040011405944824 s
Iter 2: 1.0565969944000244 s
Iter 3: 1.477067232131958 s
Iter 4: 1.03183913230896 s
Iter 5: 1.0384526252746582 s
Iter 6: 0.9423067569732666 s
Iter 7: 1.135765790939331 s
Iter 8: 1.038952350616455 s
Iter 9: 0.9517149925231934 s
You can see that loading only 100 seconds of the file takes about 1 to 1.5 seconds, which is comparable to loading the whole file.
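The timing loops in these tests all follow the same pattern, so they can be factored into a small helper. This is a sketch with hypothetical names (`time_call`, `summarize`); it uses `time.perf_counter()` rather than `time.time()` because it is a monotonic clock intended for measuring short durations.

```python
import statistics
import time


def time_call(fn, repeats=10):
    """Call fn() `repeats` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, median) of a list of durations in seconds."""
    return statistics.mean(durations), statistics.median(durations)
```

With the `recording` object from the tests, a call such as `summarize(time_call(lambda: recording.load_audio(offset=200, duration=100)))` would give one pair of numbers per configuration, which makes the cases easier to compare.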

    ## Test loading 5 seconds
    import time

    for i in range(10):
        start = time.time()
        audio = recording.load_audio(offset=100, duration=5)
        end = time.time()
        print(f'Iter {i}: {end-start} s')

The output is
Iter 0: 1.0234184265136719 s
Iter 1: 1.2550442218780518 s
Iter 2: 1.5072486400604248 s
Iter 3: 1.1016292572021484 s
Iter 4: 1.038039207458496 s
Iter 5: 0.9318227767944336 s
Iter 6: 1.1400001049041748 s
Iter 7: 1.0508427619934082 s
Iter 8: 1.0651192665100098 s
Iter 9: 1.0305383205413818 s
Even worse, reading only 5 seconds of the file also takes about 1 second.
If we use `ffmpeg` directly to read part of a file, the output is
real 0m0.172s
user 0m0.140s
sys 0m0.108s
real 0m0.171s
user 0m0.137s
sys 0m0.099s
real 0m0.175s
user 0m0.149s
sys 0m0.099s
real 0m0.169s
user 0m0.143s
sys 0m0.100s
real 0m0.173s
user 0m0.140s
sys 0m0.101s
real 0m0.200s
user 0m0.159s
sys 0m0.099s
real 0m0.168s
user 0m0.138s
sys 0m0.098s
real 0m0.172s
user 0m0.143s
sys 0m0.098s
real 0m0.163s
user 0m0.135s
sys 0m0.097s
real 0m0.172s
user 0m0.140s
sys 0m0.100s
You can see it is much faster.
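To put a number on "much faster", the short calculation below compares the mean of the 5-second `load_audio` timings with the mean of the `real` times reported for `ffmpeg`. The values are copied from the two outputs above and rounded to three decimals; the result is roughly a 6x difference.

```python
import statistics

# Wall-clock times (seconds) for load_audio(offset=100, duration=5),
# rounded from the output above.
load_audio_5s = [1.023, 1.255, 1.507, 1.102, 1.038,
                 0.932, 1.140, 1.051, 1.065, 1.031]
# "real" times (seconds) reported when calling ffmpeg directly.
ffmpeg_real = [0.172, 0.171, 0.175, 0.169, 0.173,
               0.200, 0.168, 0.172, 0.163, 0.172]

speedup = statistics.mean(load_audio_5s) / statistics.mean(ffmpeg_real)
print(f"load_audio mean: {statistics.mean(load_audio_5s):.3f} s")
print(f"ffmpeg mean:     {statistics.mean(ffmpeg_real):.3f} s")
print(f"speedup:         {speedup:.1f}x")
```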
This commit, csukuangfj@2dd8125, changes the current audio backend to use the `ffmpeg` command line instead of `torchaudio.load()` to handle `m4a` files.
With the above commit, we have the following from the output of `htop` (screenshot omitted).
The RTF on CPU for extracting features is about 0.0003-0.0008.
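For reference, the sketch below assumes the usual definition of the real-time factor (RTF): processing time divided by the duration of the audio processed. Under that assumption, the reported range corresponds to roughly 1.3 to 3.5 seconds of processing for the 4320-second test file. The function names are illustrative only.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed."""
    return processing_seconds / audio_seconds


def processing_time(rtf, audio_seconds):
    """Invert the definition: expected processing time for a given RTF."""
    return rtf * audio_seconds


# For the 4320-second test file and the reported RTF range:
low = processing_time(0.0003, 4320)   # about 1.3 s
high = processing_time(0.0008, 4320)  # about 3.5 s
print(f"{low:.2f} s to {high:.2f} s")
```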
Thanks for the detailed description. I managed to reproduce your issue. You can select a different existing audio backend that is more performant for m4a.
Iter 0: 0.014747142791748047 s
Iter 1: 0.011191129684448242 s
Iter 2: 0.009851455688476562 s
Iter 3: 0.00884866714477539 s
Iter 4: 0.008603811264038086 s
Iter 5: 0.008523941040039062 s
Iter 6: 0.008649110794067383 s
Iter 7: 0.009306192398071289 s
Iter 8: 0.009340286254882812 s
Iter 9: 0.008769750595092773 s
Iter 0: 0.24323129653930664 s
Iter 1: 0.20929312705993652 s
Iter 2: 0.23014044761657715 s
Iter 3: 0.2314000129699707 s
Iter 4: 0.21734905242919922 s
Iter 5: 0.1870877742767334 s
Iter 6: 0.19929170608520508 s
Iter 7: 0.22307038307189941 s
Iter 8: 0.2071971893310547 s
Iter 9: 0.20948362350463867 s
It's also possible to globally override the audio backend with `lhotse.set_audio_backend` or with the `LHOTSE_AUDIO_BACKEND` env var (see this readme section).
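As a rough sketch of the environment-variable route: the helper below only sets `LHOTSE_AUDIO_BACKEND` from Python. The helper name is hypothetical, and no particular backend name is assumed; the valid names are whatever the Lhotse readme lists. Setting the variable before importing `lhotse` is the cautious choice, since it is not stated here when the variable is read.

```python
import os


def use_audio_backend_env(backend_name):
    """Request an audio backend through the LHOTSE_AUDIO_BACKEND variable.

    `backend_name` must be a name that Lhotse recognizes; no specific name
    is assumed here. Call this before importing lhotse to be safe.
    """
    os.environ["LHOTSE_AUDIO_BACKEND"] = backend_name
    return os.environ["LHOTSE_AUDIO_BACKEND"]
```

The same effect can be had by exporting the variable in the shell before starting Python.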
We have many `*.m4a` files, where each file is several hours long and contains multiple supervision segments. We have used `lhotse/lhotse/cut/set.py` (line 1002 in acbca24) to split a long cutset into smaller segments.
To help reproduce, we have prepared a Colab notebook at
https://colab.research.google.com/drive/1ZU2M_z9kY503UKPYMuYX0BQ5BLuVi6jt?usp=sharing
It first generates an `m4a` file of length 4320 seconds, then constructs a recording from it.