feat(type-coverage-generation): Model type coverage batch generation #390

Merged
merged 42 commits into from
Nov 12, 2023


sam-or
Contributor

@sam-or sam-or commented Sep 27, 2023

Pull Request Checklist

  • New code has 100% test coverage
  • (If applicable) The prose documentation has been updated to reflect the changes introduced by this PR
  • (If applicable) The reference documentation has been updated to reflect the changes introduced by this PR
  • Pre-Commit Checks were run and passed
  • Tests were run and passed

Description

  • This PR implements an alternative batch generation process. The goal is to generate a minimal set of examples of a model that achieves full coverage of the forms that model can take.
    A very simple example:
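import pydantic
from polyfactory.factories.pydantic_factory import ModelFactory


class Model(pydantic.BaseModel):
    data: int | str  # PEP 604 union syntax, Python 3.10+


list(ModelFactory.create_factory(Model).coverage())
# >>> e.g. one instance per union member (actual values are random)
# [Model(data=1234), Model(data='abc123')]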
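The "forms" a field can take here are the member types of its union annotation. As a minimal standard-library sketch (not polyfactory's code), those members can be read with `typing.get_origin` and `typing.get_args`; the helper name `union_members` is hypothetical:

```python
import types
from typing import Optional, Union, get_args, get_origin

# typing.Union[...] has origin typing.Union; PEP 604 `X | Y` has origin
# types.UnionType on Python 3.10+, so accept both.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def union_members(tp):
    """Hypothetical helper: member types of a union, or (tp,) for a non-union type."""
    if get_origin(tp) in _UNION_ORIGINS:
        return get_args(tp)
    return (tp,)


print(union_members(Union[int, str]))  # (<class 'int'>, <class 'str'>)
print(union_members(Optional[int]))    # (<class 'int'>, <class 'NoneType'>)
print(union_members(int))              # (<class 'int'>,)
```

Note that `Optional[X]` is just `Union[X, None]`, so `None` counts as one more form to cover.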

Close Issue(s)

@JacobCoffee
Member

Please see the suggested Sourcery refactorings; it can't merge into forks for some reason: 61deae0

@sam-or
Contributor Author

sam-or commented Sep 27, 2023

> Please see the suggested Sourcery refactorings; it can't merge into forks for some reason: 61deae0

done

@guacs
Member

guacs commented Sep 30, 2023

@sam-or is this ready for review or are you still working on it?

@sam-or
Contributor Author

sam-or commented Sep 30, 2023

> @sam-or is this ready for review or are you still working on it?

There are still a couple of failing tests that I won’t get to until the middle of next week. Other than fixing those, there’s hopefully not much else to do, so it should be good for review.

@sam-or sam-or marked this pull request as ready for review September 30, 2023 07:58
@sam-or sam-or requested review from a team as code owners September 30, 2023 07:58
@sam-or
Contributor Author

sam-or commented Oct 2, 2023

tests seem to be passing now, should be good for review

@guacs
Member

guacs commented Oct 3, 2023

> tests seem to be passing now, should be good for review

Could you merge from main and run pdm run lint?

@guacs
Member

guacs commented Oct 4, 2023

@sam-or Sorry for the delay. I'll take a look this weekend :)

polyfactory/factories/base.py (outdated, resolved)
polyfactory/factories/base.py (resolved)
polyfactory/factories/base.py (resolved)
polyfactory/utils/model_coverage.py (outdated, resolved)
tests/test_type_coverage_generation.py (outdated, resolved)
@guacs
Member

guacs commented Oct 8, 2023

I did review the code and I think it's great, but to be honest, I'm not sure I see much benefit in this feature. That's because once #397 is done, Hypothesis will provide a form of this kind of coverage. Also, what is the expected output of the following?

from dataclasses import dataclass

from polyfactory.factories import DataclassFactory


@dataclass
class Bar:
    bar_val: int | str | bool


@dataclass
class Foo:
    foo_val: int | str
    bar: Bar


FooFactory = DataclassFactory.create_factory(Foo)
coverage = list(FooFactory.coverage())

print(len(coverage))  # current output = 3

Currently the output is 3, but I expected 6: bar.bar_val taking int, str, and bool while foo_val is an int, and then the same three again while foo_val is a str. For more complex models, the number of variations will grow very quickly. Or am I misunderstanding the intent of this feature?
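For comparison, the exhaustive interpretation described here is the Cartesian product of the union members. A small standalone sketch with `itertools.product`, assuming the member types are read directly off the `Foo`/`Bar` annotations above:

```python
from itertools import product

# Member types of each union field in the Foo/Bar example above.
foo_val_types = (int, str)
bar_val_types = (int, str, bool)

# Every permutation (Cartesian product) of the union members.
combinations = list(product(foo_val_types, bar_val_types))
print(len(combinations))  # 6
for foo_t, bar_t in combinations:
    print(foo_t.__name__, bar_t.__name__)
# int int, int str, int bool, str int, str str, str bool (one pair per line)
```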

@sam-or
Contributor Author

sam-or commented Oct 8, 2023

Thank you very much for your review. I realise I probably haven't explained the intention very well. The reason I want this feature rather than something like Hypothesis (which I'm currently using) is that I want every generated batch to contain examples of a model covering every option its types allow, every single time. The other goal is to achieve this with the minimum number of examples per batch. Neither is guaranteed by Hypothesis or by any other method based on randomness. Tools like Hypothesis are amazing, but I don't want to rely on a statistical approach to this "coverage", which is why I wanted to move away from fuzzing toward a more targeted way of generating the kind of test data that suits my use case best. Please let me know your thoughts; perhaps this feature is a bit niche, so I understand your reservations.

So, to answer your question about the expected output: we expect 3 examples, because the field with the most variants in your example is bar_val: int | str | bool (three types):

Foo(foo_val=123, bar=Bar(bar_val=321)) # foo_val: int, bar_val: int
Foo(foo_val="abc", bar=Bar(bar_val="def")) # foo_val: str, bar_val: str
Foo(foo_val=456, bar=Bar(bar_val=True)) # foo_val: int (wraps around), bar_val: bool

As you can see, it covers every member of the union types of foo_val and bar_val with the smallest possible number of examples, rather than every permutation of the union types, which would yield 6 examples (and yes, that grows very quickly for complex models).
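One way to get this behaviour is a round-robin over union members: example `i` gives every union-typed field its member `i` modulo that field's member count, recursing into nested models, and the batch size is the largest member count found anywhere in the model. The sketch below is a standalone illustration of that idea, not the PR's implementation; it assumes dataclass models whose unions contain only scalar types, and it uses a hypothetical `SAMPLES` table of fixed values instead of generated ones. It yields the same type pattern as the three examples above.

```python
import types
from dataclasses import dataclass, is_dataclass
from typing import Union, get_args, get_origin, get_type_hints

# Accept both typing.Union[...] and PEP 604 `X | Y` unions (types.UnionType, 3.10+).
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))

# Hypothetical fixed sample value per scalar type; polyfactory would generate values.
SAMPLES = {int: 123, str: "abc", bool: True}


@dataclass
class Bar:
    bar_val: Union[int, str, bool]


@dataclass
class Foo:
    foo_val: Union[int, str]
    bar: Bar


def _options(tp):
    """Member types of a union, or (tp,) for any other type."""
    return get_args(tp) if get_origin(tp) in _UNION_ORIGINS else (tp,)


def examples_needed(tp) -> int:
    """Largest number of union members found anywhere in tp, nested dataclasses included."""
    if is_dataclass(tp):
        return max((examples_needed(t) for t in get_type_hints(tp).values()), default=1)
    return len(_options(tp))


def build(tp, i: int):
    """Build the i-th example: every union field picks member i, wrapping around."""
    if is_dataclass(tp):
        return tp(**{name: build(t, i) for name, t in get_type_hints(tp).items()})
    options = _options(tp)
    return SAMPLES[options[i % len(options)]]


def round_robin_coverage(cls):
    """Yield the minimal batch in which every union member appears at least once."""
    for i in range(examples_needed(cls)):
        yield build(cls, i)


for example in round_robin_coverage(Foo):
    print(example)
# Foo(foo_val=123, bar=Bar(bar_val=123))
# Foo(foo_val='abc', bar=Bar(bar_val='abc'))
# Foo(foo_val=123, bar=Bar(bar_val=True))
```

Unions whose members are themselves models would need more care than a simple modulo, which is why the sketch restricts itself to scalar members.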

@guacs
Member

guacs commented Oct 8, 2023

Aah okay now I get what you were trying to do. I do think it's helpful, but also like you said it might be a bit too niche of a feature for us to merge and then maintain.

Thoughts, @litestar-org/members?

@sam-or
Contributor Author

sam-or commented Oct 8, 2023

To try to sell it a bit more, here are the benefits I see in this feature:

  • Testing speed: for larger, more complex models, generating a minimal number of examples that still achieves a high percentage of code coverage keeps test runs fast
  • Consistency in testing: I'm sure anyone who has used Hypothesis enough has run into flaky tests, and this also aims to resolve that
  • Useful for loading a test database with data to test things like searching and migrations, with more consistent coverage and confidence that data in every form has been tested (for some deeply nested models, it is very unlikely that Hypothesis will generate examples that cover all the branches in the nested types)
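The speed argument in the first point can be made concrete with a count. Under the round-robin scheme described earlier in this thread, a flat model whose union fields have k_1, …, k_n members needs max(k_i) examples, whereas covering every permutation needs the product k_1 × … × k_n. The field counts below are hypothetical:

```python
from math import prod

# Hypothetical model: five union fields with 2, 3, 4, 2 and 5 member types.
variant_counts = [2, 3, 4, 2, 5]

permutations = prod(variant_counts)  # every combination of members
coverage = max(variant_counts)       # every member at least once, round-robin

print(permutations, coverage)  # 240 5
```

Adding a field multiplies the permutation count but only raises the round-robin count if the new field has more members than any existing one.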

I do hope that others might find it as useful as I will.

I'm also more than happy to continue to spend time on this in the future, to help maintain and improve it

@guacs
Member

guacs commented Oct 18, 2023

@sam-or is this ready? If so, could you add documentation for this as well??

@sam-or
Contributor Author

sam-or commented Oct 18, 2023

Yes it is, I'll add documentation now

@guacs
Member

guacs commented Oct 21, 2023

@sam-or sorry for the delay! I wanted to take another look into this properly and I'll definitely do it in a few days :)

Member

@guacs guacs left a comment


@sam-or first of all, sorry for the delay! I have left a few comments and once those are resolved, I think this is good to merge :)

docs/examples/model_coverage/test_example_1.py (outdated, resolved)
docs/examples/model_coverage/test_example_2.py (outdated, resolved)
docs/usage/model_coverage.rst (outdated, resolved)
polyfactory/factories/base.py (outdated, resolved)
polyfactory/factories/base.py (outdated, resolved)
polyfactory/utils/model_coverage.py (resolved)
tests/test_type_coverage_generation.py (outdated, resolved)
tests/test_type_coverage_generation.py (outdated, resolved)
tests/test_type_coverage_generation.py (outdated, resolved)
tests/test_type_coverage_generation.py (resolved)
Member

@guacs guacs left a comment


Thanks for this! Just one small comment regarding adding docstrings. Also, it'd be great if you could just tag me or request another review once you've made the changes. If not, I might not know whether the PR is ready without manually looking to see if it's ready for another review.

polyfactory/utils/model_coverage.py (resolved)

Documentation preview will be available shortly at https://litestar-org.github.io/polyfactory-docs-preview/390

@sam-or
Contributor Author

sam-or commented Nov 12, 2023

> Thanks for this! Just one small comment regarding adding docstrings. Also, it'd be great if you could just tag me or request another review once you've made the changes. If not, I might not know whether the PR is ready without manually looking to see if it's ready for another review.

Ah yep no worries, will do. I've added that docstring to CoverageContainerCallable, hopefully it's good to go now?

Member

@guacs guacs left a comment


@sam-or sorry for the delay and thank you for the work you've done :)

@guacs guacs merged commit b1e8b5e into litestar-org:main Nov 12, 2023
18 checks passed
6 participants