feat: add multi-storage-client backend for file open #1455
base: master
@@ -815,6 +815,82 @@ def handles_special_case(self, identifier: Pathlike) -> bool:
    def is_applicable(self, identifier: Pathlike) -> bool:
        return is_valid_url(identifier)
@lru_cache(1)
We can remove the `lru_cache` decorator since these environment lookups should be cheap; it looks like that would also simplify the tests.
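A minimal sketch of why the cache complicates tests, assuming the getters simply wrap `os.getenv` as in the diff: with `@lru_cache(1)` the first value read is memoized, so a test that changes the environment must also call `cache_clear()`. The function names below are hypothetical stand-ins for the getters in this PR.

```python
import os
from functools import lru_cache
from typing import Optional


@lru_cache(1)
def cached_override_protocols() -> Optional[str]:
    # Memoized: the first value read is returned until cache_clear() is called.
    return os.getenv("LHOTSE_MSC_OVERRIDE_PROTOCOLS", None)


def uncached_override_protocols() -> Optional[str]:
    # Plain lookup: os.environ is re-read on every call, which is cheap.
    return os.getenv("LHOTSE_MSC_OVERRIDE_PROTOCOLS", None)


os.environ["LHOTSE_MSC_OVERRIDE_PROTOCOLS"] = "s3"
print(cached_override_protocols(), uncached_override_protocols())  # s3 s3

os.environ["LHOTSE_MSC_OVERRIDE_PROTOCOLS"] = "s3,gs"
print(cached_override_protocols(), uncached_override_protocols())  # s3 s3,gs

cached_override_protocols.cache_clear()  # the extra step every test would need
print(cached_override_protocols())  # s3,gs
```

Without the decorator, a test can just set the variable (e.g. with pytest's `monkeypatch.setenv`) and call the getter.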
2. Override the profile/bucket name with the env var LHOTSE_MSC_PROFILE if provided: msc://profile/path/to/my/object2.
   If no override is provided, the MSC profile name is expected to match the bucket name.
"""
Please add an import guard here (`is_module_available` is imported from `lhotse.utils`):

    if not is_module_available("multistorageclient"):
        raise RuntimeError("Please run 'pip install multistorageclient' in order to use MSCIOBackend.")
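A sketch of the suggested guard, assuming it runs before `MSCIOBackend` touches `multistorageclient` (e.g. from its constructor). `is_module_available` here is a local stand-in that mirrors the `lhotse.utils` helper using `importlib.util.find_spec`; `require_multistorageclient` is a hypothetical name.

```python
import importlib.util


def is_module_available(*modules: str) -> bool:
    # Local stand-in: True only if every named top-level module can be
    # located on the import path (find_spec does not import it).
    return all(importlib.util.find_spec(m) is not None for m in modules)


def require_multistorageclient() -> None:
    # Fail early with an actionable message when the optional dependency is missing.
    if not is_module_available("multistorageclient"):
        raise RuntimeError(
            "Please run 'pip install multistorageclient' in order to use MSCIOBackend."
        )
```

Checking with `find_spec` rather than a bare `import` keeps the guard cheap and avoids importing the package just to test for it.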
class MSCIOBackend(IOBackend):
    """
    Uses the multi-storage client to download data from object stores.
Can you add a link to MSC here? It'd be good to add 1-2 sentences about how MSC is different and what its unique features are.
@lru_cache(1)
def get_lhotse_msc_override_protocols() -> Any:
    return os.getenv("LHOTSE_MSC_OVERRIDE_PROTOCOLS", None)
Please document these environment variables in Lhotse's top-level README.md, where all env vars that modify Lhotse behavior are listed.
This PR adds support for the Multi-Storage Client (MSC) backend to handle object storage access in Lhotse. The changes include:
Features
- `MSCIOBackend` for handling MSC protocol URLs
- `LHOTSE_MSC_OVERRIDE_PROTOCOLS` env for supported protocols, e.g. `s3://` -> `msc://`
- `LHOTSE_MSC_PROFILE` env for profile/bucket name overrides, e.g. `msc://my-bucket` -> `msc://my-profile`
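The two overrides listed above can be illustrated with a small URL-rewriting sketch. `rewrite_to_msc_url` is a hypothetical helper, not the function added in this PR; it assumes protocols are given without the `://` suffix and that, when no profile is set, the bucket name is used as the MSC profile name, as the docstring describes.

```python
from typing import Collection, Optional
from urllib.parse import urlparse


def rewrite_to_msc_url(
    url: str,
    override_protocols: Collection[str] = (),
    profile: Optional[str] = None,
) -> str:
    """Hypothetical helper: s3://bucket/key -> msc://bucket/key (if "s3" is
    overridden), then optionally msc://bucket/key -> msc://<profile>/key."""
    if "://" not in url:
        return url  # local path or malformed URL; nothing to rewrite
    parsed = urlparse(url)
    scheme = parsed.scheme
    if scheme in override_protocols:
        scheme = "msc"  # protocol override, e.g. from LHOTSE_MSC_OVERRIDE_PROTOCOLS
    if scheme != "msc":
        return url  # not routed through MSC; leave the URL untouched
    # Everything after "<scheme>://<bucket>", kept verbatim (path, query, ...).
    rest = url.split("://", 1)[1][len(parsed.netloc):]
    # Without a profile override, the bucket name doubles as the profile name.
    target = profile or parsed.netloc
    return f"msc://{target}{rest}"
```

For example, `rewrite_to_msc_url("s3://my-bucket/a.wav", ["s3"], profile="my-profile")` yields `"msc://my-profile/a.wav"`, while URLs with non-overridden schemes (and local paths) pass through unchanged.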
Implementation Details
Configuration
MSC behavior can be configured through environment variables:
- `LHOTSE_MSC_OVERRIDE_PROTOCOLS`: Comma-separated list of protocols to override (e.g., "s3,gs")
- `LHOTSE_MSC_PROFILE`: Profile name to use for bucket override

Dependencies
- Requires the `multistorageclient` package to be installed
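A sketch of how the two configuration variables might be read and normalized. The tolerance for whitespace, upper case, and a trailing `://` is an assumption for illustration, not documented behavior of this PR, and the function names are hypothetical. The variables are re-read on every call, in line with the suggestion to drop the cache.

```python
import os
from typing import FrozenSet, Optional, Tuple


def parse_override_protocols(raw: Optional[str]) -> FrozenSet[str]:
    """Hypothetical parser: "s3, gs" -> frozenset({"s3", "gs"})."""
    if not raw:
        return frozenset()  # unset or empty: no protocols are overridden
    protocols = set()
    for item in raw.split(","):
        name = item.strip().lower()
        if name.endswith("://"):
            name = name[: -len("://")]  # assumed convenience: accept "s3://"
        if name:
            protocols.add(name)  # skip empty items such as a trailing comma
    return frozenset(protocols)


def read_msc_config() -> Tuple[FrozenSet[str], Optional[str]]:
    """Read both variables; an unset or empty LHOTSE_MSC_PROFILE means no override."""
    protocols = parse_override_protocols(os.getenv("LHOTSE_MSC_OVERRIDE_PROTOCOLS"))
    profile = os.getenv("LHOTSE_MSC_PROFILE") or None
    return protocols, profile
```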