Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests and canonicalization for ConfigWorkflow #89

Merged
merged 11 commits into from
Jan 21, 2025
3 changes: 3 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ Changelog = "https://github.com/C2SM/Sirocco/blob/main/CHANGELOG.md"
[tool.pytest.ini_options]
# Configuration for [pytest](https://docs.pytest.org)
addopts = "--pdbcls=IPython.terminal.debugger:TerminalPdb"
norecursedirs = "tests/cases"

[tool.coverage.run]
# Configuration of [coverage.py](https://coverage.readthedocs.io)
Expand Down Expand Up @@ -75,6 +76,8 @@ path = "src/sirocco/__init__.py"
extra-dependencies = [
"ipdb"
]
default-args = []
extra-args = ["--doctest-modules"]

[[tool.hatch.envs.hatch-test.matrix]]
python = ["3.12"]
Expand Down
43 changes: 42 additions & 1 deletion src/sirocco/parsing/_yaml_data_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,15 @@

from isoduration import parse_duration
from isoduration.types import Duration # pydantic needs type # noqa: TCH002
from pydantic import BaseModel, ConfigDict, Discriminator, Field, Tag, field_validator, model_validator
from pydantic import (
BaseModel,
ConfigDict,
Discriminator,
Field,
Tag,
field_validator,
model_validator,
)

from sirocco.parsing._utils import TimeUtils

Expand Down Expand Up @@ -387,6 +395,39 @@ def get_plugin_from_named_base_model(data: dict) -> str:


class ConfigWorkflow(BaseModel):
"""
The root of the configuration tree.

Examples:

minimal yaml to generate:

>>> import textwrap
>>> import pydantic_yaml
>>> config = textwrap.dedent(
... '''
... cycles:
... - minimal_cycle:
... tasks:
... - task_a:
... tasks:
... - task_b:
... plugin: shell
... data:
... available:
... - foo:
... generated:
... - bar:
... '''
... )
>>> wf = pydantic_yaml.parse_yaml_raw_as(ConfigWorkflow, config)

minimum programmatically created instance

>>> empty_wf = ConfigWorkflow(cycles=[], tasks=[], data={})

"""

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: are we ok with this yaml snippet parsing without validation errors?

If so, should it fail with a clear message down the line or should it create a viable WorkGraph, which runs nothing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we agreed in the meeting to produce an error, maybe it is easier for merging to do this in a subsequent PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once I merge #90 into this, the error would be raised during canonicalization

name: str | None = None
rootdir: Path | None = None
cycles: list[ConfigCycle]
Expand Down
11 changes: 5 additions & 6 deletions src/sirocco/pretty_print.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ def as_block(self, header: str, body: str) -> str:

Example:

>>> print(PrettyPrinter().as_block("header", "foo\nbar"))
>>> print(PrettyPrinter().as_block("header", "foo\\nbar"))
header:
foo
bar
Expand All @@ -50,7 +50,7 @@ def as_item(self, content: str) -> str:
- foo

>>> pp = PrettyPrinter()
>>> print(pp.as_item(pp.as_block("header", "multiple\nlines\nof text")))
>>> print(pp.as_item(pp.as_block("header", "multiple\\nlines\\nof text")))
- header:
multiple
lines
Expand Down Expand Up @@ -87,14 +87,13 @@ def format_basic(self, obj: core.GraphItem) -> str:
>>> from datetime import datetime
>>> print(
... PrettyPrinter().format_basic(
... Task(
... name=foo,
... core.Task(
... name="foo",
... coordinates={"date": datetime(1000, 1, 1).date()},
... workflow=None,
... )
... )
... )
foo [1000-01-01]
foo [date: 1000-01-01]
"""
name = obj.name
if obj.coordinates:
Expand Down
Empty file added tests/unit_tests/__init__.py
Empty file.
Empty file.
23 changes: 23 additions & 0 deletions tests/unit_tests/core/test_workflow.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
from sirocco import pretty_print
from sirocco.core import workflow
from sirocco.parsing import _yaml_data_models as models


def test_minimal_workflow():
minimal_config = models.ConfigWorkflow(
cycles=[],
tasks=[{"some_task": {"plugin": "shell"}}],
data=models.ConfigData(
available=[models.ConfigAvailableData(foo={})],
generated=[models.ConfigGeneratedData(bar={})],
),
)

testee = workflow.Workflow(minimal_config)

pretty_print.PrettyPrinter().format(testee)

assert testee.name is None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: What does it mean to have a core.Workflow with name=None, rootdir=None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realistically this can not happen in normal usage, (as tested in test_yaml_data_models.py). However, downstream code has to deal with the possibility or negate the help of static type checking.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need to check if they are not None in the core.Workflow right? Should we make an issue to keep in mind?

Copy link
Contributor

@leclairm leclairm Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, for core objects, the name and rootdir are compulsory but the reason for None possibility in the user provided information is different for both of them.

  • name can be None and then defaults in the core object to what is derived from the yaml config path.
  • rootdir should actually NOT be given by the user and only inferred from the yaml config path.

So if you create a canonical class:

  • name would stay str | None in ConfigWorkflow defaulting to None and be non default str in the canonical class
  • rootdir would belong to the canonical class as non default str and disappear from ConfigWorkflow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will adapt this in #90

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to think about how to adapt the canonicalize function to allow passing in additional information though, because the additional information will be different per class if any (not sure if singledispatch is still the right match then).

assert len(list(testee.tasks)) == 0
assert len(list(testee.cycles)) == 0
assert testee.data[("foo", {})].available
Empty file.
41 changes: 41 additions & 0 deletions tests/unit_tests/parsing/test_yaml_data_models.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
import textwrap

from sirocco.parsing import _yaml_data_models as models


def test_workflow_test_internal_dicts():
testee = models.ConfigWorkflow(
cycles=[],
tasks=[{"some_task": {"plugin": "shell"}}],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is it a problem, that we can not currently (for testing purposes) pass actual ConfigXyzTask instances into the constructor of ConfigWorkflow, but have to go through this dictionary form?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes. As you pointed out in the last meeting, we need to decouple the components for effective testing so we need to test YAML <-> CONFIG and CONFIG <-> GRAPH separately which requires a creation of the config objects without a yaml string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of the drafted solution in #90, splitting into a "canonical" and a "yaml-parsed" class? Note that
a) Some of this would be alleviated by idempotent validators, which that approach would introduce gradually
b) The full benefits would only kick in, once all the config classes have undergone the same process. However, testability will increase incrementally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a comment there about the canonical workflow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the creation of canonical classes to remove the type ambiguities. Same as rootdir in WorkFlowConfig.

data=models.ConfigData(
available=[models.ConfigAvailableData(foo={})],
generated=[models.ConfigGeneratedData(bar={})],
),
)
assert testee.data_dict["foo"].name == "foo"
assert testee.data_dict["bar"].name == "bar"
assert testee.task_dict["some_task"].name == "some_task"


def test_load_workflow_config(tmp_path):
minimal_config = textwrap.dedent(
"""
cycles:
- minimal:
tasks:
- a:
tasks:
- b:
plugin: shell
data:
available:
- c:
generated:
- d:
"""
)
minimal = tmp_path / "minimal.yml"
minimal.write_text(minimal_config)
testee = models.load_workflow_config(str(minimal))
assert testee.name == "minimal"
assert testee.rootdir == tmp_path
Loading