Stable interface to mlpp-workflows #14

Open
frazane opened this issue Oct 14, 2022 · 2 comments

Comments

@frazane
Collaborator

frazane commented Oct 14, 2022

Every time we change the way pipelines are called (e.g. by changing function arguments), we have to adapt the code in mlpp-workflows accordingly. It would be better to have a stable interface between the two libraries.

It could be in the form of an output xr.Dataset object.

This could be done by simply moving the following function (defined in mlpp-workflows) to this library.

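import logging
from datetime import datetime
from typing import Dict, List, Tuple

import mlpp_features
import xarray as xr

# Assumed logger setup; the original module defines LOGGER elsewhere.
LOGGER = logging.getLogger(__name__)


def extract_features(
    data: Dict[str, xr.Dataset],
    feature_list: List[str],
    points: Tuple[List],
    reftimes: List[datetime],
    leadtimes: List[int],
) -> xr.Dataset:
    """Extract features from a given source."""
    ds = xr.Dataset()
    for feature in feature_list:
        LOGGER.info(f"FEATURE: {feature}")
        try:
            # Look up the pipeline function by name in mlpp_features.
            output = getattr(mlpp_features, feature)(
                data, points, reftimes, leadtimes, ds=ds
            )
        except Exception:
            # Log and skip, so a failed pipeline never leaves `output` unset or stale.
            LOGGER.exception(f"{feature} pipeline failed!")
            continue
        ds[feature] = output.chunk("auto").persist()
    LOGGER.info(ds)
    return ds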

It will also be easier to document how the two libraries interact since it will be just one object.

@dnerini thoughts?

@frazane frazane added the enhancement New feature or request label Oct 14, 2022
@frazane frazane self-assigned this Oct 14, 2022
@dnerini
Member

dnerini commented Oct 17, 2022

Hi @frazane, thanks for the nice suggestion. Indeed, the interface to mlpp-features is defined as an xr.Dataset object (all pipelines return one). That said, I like the idea of moving the extract_features function to mlpp-features! Moreover, I think it could be interesting to refactor it as a class, say a FeatureStore class, and use it not only to return the feature dataset (as in the original function above), but also to discover and explore features, for example to retrieve the list of all the input parameters required by a given list of features. What do you think?

@frazane
Collaborator Author

frazane commented Oct 17, 2022

Nice idea! A class with two main methods: extract and discover? And discover could be used from mlpp-workflows.
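
For illustration, a minimal sketch of what such a class might look like. This is not the mlpp-features API: the `register` decorator, the `Pipeline` record, and the two example features (`wind_speed`, `temperature_celsius`) with their input names are all hypothetical, and a plain dict stands in for the xr.Dataset so the sketch runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Pipeline:
    """A feature pipeline plus the input parameters it reads (hypothetical)."""

    func: Callable[..., Any]
    inputs: Tuple[str, ...]


class FeatureStore:
    """Hypothetical sketch of the class discussed above."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, Pipeline] = {}

    def register(self, name: str, inputs: Sequence[str]) -> Callable:
        """Decorator that records a pipeline under `name` with its inputs."""

        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._pipelines[name] = Pipeline(func, tuple(inputs))
            return func

        return decorator

    def discover(self, feature_list: Sequence[str]) -> List[str]:
        """Return the sorted, de-duplicated inputs needed by the features."""
        params = set()
        for feature in feature_list:
            params.update(self._pipelines[feature].inputs)
        return sorted(params)

    def extract(
        self, data: Dict[str, Any], feature_list: Sequence[str]
    ) -> Dict[str, Any]:
        """Run each requested pipeline and collect the results by name.

        A dict is used here in place of the xr.Dataset a real
        implementation would return.
        """
        return {
            feature: self._pipelines[feature].func(data)
            for feature in feature_list
        }


store = FeatureStore()


# Hypothetical example features, for illustration only.
@store.register("wind_speed", inputs=["u_wind", "v_wind"])
def wind_speed(data: Dict[str, Any]) -> List[float]:
    return [
        (u**2 + v**2) ** 0.5
        for u, v in zip(data["u_wind"], data["v_wind"])
    ]


@store.register("temperature_celsius", inputs=["temperature_kelvin"])
def temperature_celsius(data: Dict[str, Any]) -> List[float]:
    return [t - 273.15 for t in data["temperature_kelvin"]]
```

With something along these lines, mlpp-workflows could call `store.discover(feature_list)` to find out which inputs to load before calling `store.extract(...)`, and would depend only on those two methods rather than on the signatures of the individual pipelines.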
