Support user-defined `format` in string schemas #29

Zac-HD · 2019-12-19T01:16:44Z

Thanks to @Stranger6667 in schemathesis/schemathesis#337 (comment):

[kiwicom has] some custom formats, e.g. for payment card numbers - and we declare it in the schema + use validation for it via jsonschema + connexion:
 @draft4_format_checker.checks("card_number")
 def is_card_number(value):
     # Validation based on Luhn algorithm

This is definitely in-scope for hypothesis-jsonschema, because user-defined formats are explicitly allowed by the spec! The options for this are to either add an argument, so that the API looks like

def from_type(schema: Schema, *, formats: Dict[str, SearchStrategy[str]] = None) -> ...:

or to maintain a global registry of known formats and strategies. In either case I'd want to raise an error if we ever try to generate a format value for which jsonschema does not have a known validator.

At the moment I'm leaning towards global state, with validation and the caveat that users can't override formats defined in the spec, but feedback would be most welcome.

The text was updated successfully, but these errors were encountered:

Stranger6667 · 2020-04-11T16:05:29Z

Hi and sorry for a late reply :)

With a global registry, it might be easier to use by 3rd party tools like Schemathesis. E.g. users will be able to use hypothesis-jsonschema API directly instead of the wrapper we have now in Schemathesis (which is not necessarily a better idea). With an explicit argument, it feels cleaner to me (since there will be no mutation of internals) and Schemathesis may keep its interface for custom formats and call from_schema without mutating that format dictionary. In this case, there will be no backward-incompatible changes in Schemathesis + there will be no need to access the internals of hypothesis-jsonschema

In either case I'd want to raise an error if we ever try to generate a format value for which jsonschema does not have a known validator.

Hm, for Schemathesis it means, that if there is a custom format in the schema, then it will fail on any affected endpoint. Which, actually may help to find spots where we can generate data better suited than random strings. But I see a possible downside - there might be fewer errors discovered. We actually found a few flaws in our input validators on data that didn't fit the specified format. With the validation we'll have to write a strategy for data generation and the previously discovered cases will not be generated.

Also, it might slow down error discovery - e.g. if there are two fields A (without format) and B (with format) and there is an error that depends on A's value, then we won't discover this error until a proper format strategy will not be specified. It might be hard for users that are not experienced with Hypothesis and use Schemathesis as a zero-config tool.

But I do feel that it makes internal logic more explicit in both packages and the cases I mentioned above could be also handled explicitly on the Schemathesis side, e.g. by passing some "default" format strategy for unknown formats (this behavior may be configurable as well).

Also, since the scope of hypothesis-jsonschema is generating data that fit it also makes sense to have validation. Btw, I was experimenting with a package that generates data that doesn't fit - to widen Schemathesis's scope, but it is still WIP on the Schemathesis side.

So, in short I think passing format strategies explicitly + validation might be the most explicit and clean approach :)

Zac-HD · 2020-04-12T09:56:06Z

Hmm, some examples might help:

    # Imagine that we've registered a validator for the format "hello",
    # which the only valid value is "world", and no other formats.
>>> FORMATS: Dict[str, SearchStrategy[str]] = {"hello": st.just("world")}

    # If we pass a strategy for a custom format, it gets used
>>> from_schema({"type": "string", "format": "hello"}, custom_formats=FORMATS)
st.just("world")

    # If the schema contains an unknown format, ignore the format keyword (status quo)
>>> from_schema({"type": "string", "format": "something else"})
st.text()

    # It's an error to pass a strategy for a standard format
>>> from_schema(..., custom_formats={"date": ...})
Traceback: ...
InvalidArgument("Cannot pass a custom format for `date`, "
                "because it is part of the jsonschema standard.")

    # It's an error to pass a strategy for a custom format unless a
    # validator for that format is registered on the draft(4,6,7) validator
    # See https://python-jsonschema.readthedocs.io/en/stable/validate/#validating-formats
>>> from_schema(..., custom_formats={"foo": ...})
Traceback: ...
InvalidArgument("A strategy was passed for the custom format `foo`, "
                "but no checker is registered for this format.")

    # Error if generated value for custom format doesn't validate
>>> from_schema(
        {"type": "string", "format": "hello"},
        custom_formats={"hello": st.just("an invalid string")}
    ).example()
Traceback: ...
InvalidArgument("Got 'an invalid string' from custom_formats['hello']=..., "
                "but this value does not conform to the format checker.")

And this shouldn't actually be too hard to implement, so long as we're careful to thread the custom_formats dict through all our recursive calls to from_schema and the variants for arrays, objects, and strings.

Stranger6667 · 2020-04-12T11:59:59Z

Aha, so you propose to connect format strategies with jsonschema's registered checks, right? Sounds interesting to me, I didn't think about it initially :) I like that it will add some extra validation on user-defined strategies.

Thank you, Zac, your examples are very helpful and from my pov reflect a very clean and reliable approach :)

Zac-HD · 2020-04-12T12:29:35Z

Aha, so you propose to connect format strategies with jsonschema's registered checks, right?

Yep, that's right!

Zac-HD · 2020-04-14T14:27:35Z

Untested prototype: master...custom-formats - @Stranger6667 I'd love to know how this would work for you!

Stranger6667 · 2020-04-15T17:36:46Z

@Zac-HD Seems excellent to me :)

Zac-HD · 2020-07-15T12:46:23Z

@Stranger6667 - I've rebased, finished, and tested this 🎉

Can you pip install -U git+https://github.com/Zac-HD/hypothesis-jsonschema@custom-formats, try it out, and let me know how it goes? I don't really have a use-case, but if it works for you I'll ship it 😄

Stranger6667 · 2020-07-15T14:36:38Z

Hi @Zac-HD !

Thank you for working on this! I am thinking about a couple of use cases where additional jsonschema check might not be handy

Non-Python app testing. Usually, the user spins up the app and tests it via e.g. schemathesis.from_uri("http://app/schema.yaml") or CLI.

In this case, there is no dependency on jsonschema on the app side and to avoid InvalidArgument error it will require writing some extra check and wrapping it with e.g. draft4_format_checker.checks decorator. Also, depending on the original check code (which might be in e.g. Java) it might require a re-implementation in Python (as I see the _get_format_filter implementation) to get the validated output. And since such checks might be quite complex I am concerned that it might actually make usage of hypothesis-jsonschema / schemathesis harder, than before. I also assume that if the app under test is written in some other language, then Python might be not the language of primary expertise which also implies some additional complexity for the users.

Apps that don't use jsonschema.

Even though jsonschema is the most popular package for JSON Schema validation there are some alternatives. For example fastjsonschema (supports custom formats) and jsonschema-rs (doesn't support custom formats yet). My original case was tied to connexion which uses jsonschema, but for other frameworks, it is not the case (e.g. rororo plans to use jsonschema-rs). Bringing jsonschema checks will add a necessity for these users to implement additional code to run tests.

In general, I like the idea of extra validation, but it seems to me that it might negatively affect the experience of the users outside of connexion-like use case :( Maybe it could be optional somehow?

Zac-HD · 2020-07-15T14:44:29Z

That makes sense. Let's relax the requirement a little: still use the registered checker if there is one, but allow custom formats without a registered checker too.

Stranger6667 · 2020-07-15T14:45:33Z

Sounds awesome! :)

Zac-HD · 2020-07-16T02:59:42Z

Alright! I've added the feature in 61e0cdf, and shipped it as version 0.17.0 🎉

Stranger6667 · 2020-07-16T10:47:33Z

Great, thanks! I will take a look at integrating the new version into Schemathesis :)

Zac-HD mentioned this issue Dec 19, 2019

Change hypothesis_jsonschema private module name schemathesis/schemathesis#337

Merged

Zac-HD mentioned this issue Apr 11, 2020

Support or ignore extended formats schemathesis/schemathesis#481

Closed

Zac-HD closed this as completed Jul 16, 2020

Stranger6667 mentioned this issue Jul 26, 2020

Integrate new hypothesis-jsonschema API for user-defined strategies schemathesis/schemathesis#684

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support user-defined `format` in string schemas #29

Support user-defined `format` in string schemas #29

Zac-HD commented Dec 19, 2019

Stranger6667 commented Apr 11, 2020

Zac-HD commented Apr 12, 2020

Stranger6667 commented Apr 12, 2020

Zac-HD commented Apr 12, 2020

Zac-HD commented Apr 14, 2020 •

edited

Loading

Stranger6667 commented Apr 15, 2020

Zac-HD commented Jul 15, 2020

Stranger6667 commented Jul 15, 2020

Zac-HD commented Jul 15, 2020

Stranger6667 commented Jul 15, 2020

Zac-HD commented Jul 16, 2020

Stranger6667 commented Jul 16, 2020

Support user-defined format in string schemas #29

Support user-defined format in string schemas #29

Comments

Zac-HD commented Dec 19, 2019

Stranger6667 commented Apr 11, 2020

Zac-HD commented Apr 12, 2020

Stranger6667 commented Apr 12, 2020

Zac-HD commented Apr 12, 2020

Zac-HD commented Apr 14, 2020 • edited Loading

Stranger6667 commented Apr 15, 2020

Zac-HD commented Jul 15, 2020

Stranger6667 commented Jul 15, 2020

Zac-HD commented Jul 15, 2020

Stranger6667 commented Jul 15, 2020

Zac-HD commented Jul 16, 2020

Stranger6667 commented Jul 16, 2020

Support user-defined `format` in string schemas #29

Support user-defined `format` in string schemas #29

Zac-HD commented Apr 14, 2020 •

edited

Loading