Add new augmentations: lowpass filtering, lossy compression (opus/mp3/vorbis) #1451

Open
wants to merge 14 commits into
base: master

Contributor

@racoiaws racoiaws commented Feb 5, 2025

This PR adds two operations on Cuts/Recordings, along with the corresponding randomized CutSet transforms:

  1. Lowpass filtering
  2. In-memory lossy compression (with immediate decoding back to raw waveform)

Both would be useful for training robust ASR models, speech enhancement models, etc.
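As a rough illustration of what a "randomized" transform means here, the sketch below applies a low-pass filter with probability p and a randomly drawn cutoff. Everything in it is an assumption for illustration: the names fft_lowpass and maybe_lowpass are hypothetical, and the crude FFT brick-wall filter is a stand-in chosen so the block only needs NumPy; it is not the filter implemented in this PR.

```python
import random
from typing import Optional, Tuple

import numpy as np


def fft_lowpass(samples: np.ndarray, sampling_rate: int, cutoff_hz: float) -> np.ndarray:
    """Crude brick-wall low-pass: zero every FFT bin above cutoff_hz.

    Stand-in filter for illustration only (hypothetical helper).
    `samples` has shape (channels, num_samples).
    """
    num_samples = samples.shape[-1]
    spectrum = np.fft.rfft(samples, axis=-1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sampling_rate)
    spectrum[..., freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=num_samples, axis=-1)


def maybe_lowpass(
    samples: np.ndarray,
    sampling_rate: int,
    p: float = 0.5,
    cutoff_range: Tuple[float, float] = (1000.0, 4000.0),
    rng: Optional[random.Random] = None,
) -> np.ndarray:
    """With probability p, low-pass the audio at a cutoff drawn uniformly
    from cutoff_range; otherwise return the input unchanged."""
    rng = rng or random.Random()
    if rng.random() >= p:
        return samples
    cutoff_hz = rng.uniform(*cutoff_range)
    return fft_lowpass(samples, sampling_rate, cutoff_hz)
```

Passing a seeded random.Random as rng makes the augmentation reproducible, which tends to be convenient in unit tests.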

TODO

  • tests
  • docstrings for new methods in lhotse.audio.recording.Recording

@anteju anteju requested review from pzelasko and anteju February 5, 2025 19:18
f.seek(0)
samples_compressed, rate_compressed = sf.read(
    f, always_2d=True
)  # TODO: handle possible sample rate change with the opus codec?
Collaborator

@pzelasko pzelasko Feb 11, 2025

When you write OPUS files with soundfile, it stores the original sampling rate in the file header. When you later load that file with soundfile, the audio is resampled from 48 kHz back to the original rate before the array is returned (unlike most other tools).
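The behaviour described in this comment can be checked with a small in-memory round trip. This is only a sketch under assumptions: compress_roundtrip is a hypothetical helper name (not the method added in this PR), the (channels, num_samples) layout is assumed, and it requires a libsndfile build with Ogg/Opus support.

```python
import io
from typing import Tuple

import numpy as np


def compress_roundtrip(
    samples: np.ndarray, sampling_rate: int, subtype: str = "OPUS"
) -> Tuple[np.ndarray, int]:
    """Encode audio into an in-memory Ogg container and decode it right back.

    Hypothetical helper for illustration. `samples` has shape
    (channels, num_samples); soundfile expects (frames, channels),
    hence the transposes.
    """
    import soundfile as sf  # local import, as requested elsewhere in this review

    with io.BytesIO() as f:
        # An explicit format is needed because a file-like object has no extension.
        sf.write(f, samples.T, sampling_rate, format="OGG", subtype=subtype)
        f.seek(0)
        decoded, rate = sf.read(f, always_2d=True)
    return decoded.T, rate
```

Calling compress_roundtrip(samples, 16000) on a 16 kHz signal should report a rate of 16000 rather than 48000 if the comment above holds for the installed libsndfile. The decoded array may differ slightly in length from the input, so a caller would likely want to trim or pad it.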

Contributor Author

Nice, did not know that

Contributor Author

Noted

from typing import Literal

import numpy as np
import scipy.signal
Collaborator

@pzelasko pzelasko Feb 11, 2025

Since scipy is an optional lhotse dependency, please move this import inside __call__ and add an import guard before it (is_module_available comes from lhotse.utils):

if not is_module_available("scipy"):
    raise ImportError("In order to use Lowpass transforms, run 'pip install scipy'")
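Put together, the guarded local import could look roughly like the sketch below. The class name LowpassSketch, its parameters, and the choice of a Butterworth filter are assumptions made for illustration and are not taken from the PR; is_module_available is re-implemented locally so the block runs without lhotse, and is assumed to behave like the helper in lhotse.utils.

```python
import importlib.util

import numpy as np


def is_module_available(name: str) -> bool:
    """Local stand-in for the lhotse.utils helper (assumed similar behaviour)."""
    return importlib.util.find_spec(name) is not None


class LowpassSketch:
    """Hypothetical low-pass transform; not the class added in this PR."""

    def __init__(self, cutoff_hz: float, order: int = 4) -> None:
        self.cutoff_hz = cutoff_hz
        self.order = order

    def __call__(self, samples: np.ndarray, sampling_rate: int) -> np.ndarray:
        # Guard first, so users without scipy get an actionable message.
        if not is_module_available("scipy"):
            raise ImportError(
                "In order to use Lowpass transforms, run 'pip install scipy'"
            )
        # Deferred import: scipy is only needed when the transform is applied.
        import scipy.signal

        # Butterworth design in second-order sections (assumed filter choice).
        sos = scipy.signal.butter(
            self.order, self.cutoff_hz, btype="low", fs=sampling_rate, output="sos"
        )
        # Zero-phase filtering along the time axis of (channels, num_samples).
        return scipy.signal.sosfiltfilt(sos, samples, axis=-1)
```

With this layout, importing the module never touches scipy; the dependency is only resolved when the transform is actually called.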

Contributor Author

Done


import numpy as np
import scipy.signal
import soundfile as sf
Collaborator

Can you also move all soundfile imports to local function scopes? IIRC importing this globally used to silently break documentation builds.

Contributor Author

Done

Collaborator

@pzelasko pzelasko left a comment

Great work! Please also add unit tests for calling the transform methods on Recording and Cut, and for calling cut_transforms on CutSet.

@racoiaws
Contributor Author

Thanks for the review, I've addressed the comments.

Will add tests a bit later

@pzelasko pzelasko added this to the v1.30 milestone Feb 11, 2025
2 participants