Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic enum support. #82

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

tokarenko
Copy link

Please review this pull request to close the issue of Support constants in JSON schema #81

Copy link
Owner

@wolverdude wolverdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

We'll need to add docs for this behavior. I can polish them, but if you can at least get them started, that would be appreciated.

active_strategy = self._active_strategies[0]
return active_strategy
if kind == "schema":
for special_strategy in [Enum, Typeless]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaNode shouldn't need to be modified as Enum doesn't need to (in fact shouldn't) be a special strategy. Just add it to BASIC_SCHEMA_STRATEGIES instead at the head of the list.

This will mean that enum and type will never exist in the same schema even though they technically could, but can add a disclaimer for that in the docs. You will only end up with an enum schema if you start with (or add) one, so presumably most users who try this will know what they're doing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rolled back all the changes to node.py and added Enum into the BASIC_SCHEMA_STRATEGIES. Adding the Enum at the head of the list breaks the tests. All the tests pass if the Enum is added at the tail of the list.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I realize now that I misunderstood the complexity of this. Getting enum to behave the way that I said in my earlier comments would actually break backwards compatibility. This is because there's no distinction between matching when you do vs. don't have an existing schema, and this actually would require different behavior in each case.

So your original implementation is correct, including the throwing exceptions, and possibly even designating this as a "special" strategy, though I think it's still probably cleaner just to put it at the end of the list.

Copy link
Owner

@wolverdude wolverdude Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think there is a way, but it's a bit hacky:

Basically, you create a @classmethod match_object() that always returns False. But there's another instance method _instance_match_object() that always returns True. In __init__(), you reassign self.match_object = self._instance_match_object().

That way you can get different behavior if the SchemaStrategy already exists vs. not and so there's no way an Enum will get created unless it was explicitly asked for by a schema, and this behavior is not order- or type-dependent, so you can give enum preference over other strategies and raise an error if its misused.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Done.

genson/schema/strategies/enum.py Show resolved Hide resolved
@@ -64,3 +64,11 @@ def test_keeps_unrecognized_properties(self):
'properties': {'a': {'type': 'boolean'}},
'patternProperties': {r'^\d$': {'type': 'integer'}},
'required': ['a']})

def test_enum(self):
self.add_schema({'type': 'object',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this test by making enum the top-level schema.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -64,3 +64,11 @@ def test_keeps_unrecognized_properties(self):
'properties': {'a': {'type': 'boolean'}},
'patternProperties': {r'^\d$': {'type': 'integer'}},
'required': ['a']})

def test_enum(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to add another test that checks what happens when complex objects are added.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

genson/schema/strategies/enum.py Show resolved Hide resolved
genson/schema/strategies/enum.py Outdated Show resolved Hide resolved
@wolverdude
Copy link
Owner

Okay, extra credit idea. You don't have to do this, but I might add it after you're done.

A problem with this implementation is that if given a schema that contains enum and a type, one or the other will be discarded.

This could be fixed by having the SchemaStrategy hold another SchemaNode if it's given a typed schema. Then it would remove enum from the schema and pass it into that subnode and pass on any objects it receives as well. Then when it assembles its output schema, it grabs the output of that subnode and adds its enum key to it.

This might not be a good idea if we expect there to be several ways to combine different schemas like this, but as far as I can tell, this is the only such keyword. Any other combining keywords are specific to one or two schema types and don't cut across all of them like enum.

@tokarenko
Copy link
Author

It seems that I am done with all of the comments received so far. I would prefer if you could implement "enum and type" case.

Copy link
Owner

@wolverdude wolverdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! I've left a few comments. They're all small and hopefully don't seem too nitpicky, but there are a couple of things that look like bugs that I wanted to point out.

test/sort.py Outdated
@@ -0,0 +1,39 @@
from sys import intern

class Py2Key:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does "Py2" come from in the name? I think it's best to use something more descriptive such as "MultiTypeSortKey"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the name as suggested.

@@ -137,4 +137,4 @@ def _get_strategy_for_(self, kind, schema_or_obj):
# no match found, raise an error
raise SchemaGenerationError(
'Could not find matching schema type for {0}: {1!r}'.format(
kind, schema_or_obj))
kind, schema_or_obj))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codebase uses the default flake8 styles, which includes a newline at EOF. Please run flake8 to find and fix formatting issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed linter errors.

# and this behavior is not order- or type-dependent.
self.match_object = self._instance_match_object

@classmethod
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also be @staticmethod (or match_schema be @classmethod) for consistency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to @staticmethod

genson/schema/strategies/enum.py Outdated Show resolved Hide resolved
def __lt__(self, other):
try:
return self.value < other.value
except TypeError:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case that self and other are of the same type, but that type is not sortable? Maybe we don't expect to encounter that (at least not in tests)? If so, please document this behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment as suggested.

return self.typestr < other.typestr


def sort_lists_in_schema(schema, sorted_key):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only used in one place and always uses the same sorted_key, why take it as a parameter? Why not just hardcode it and make the interface simpler?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the parameter as suggested.

if item_type in [bool, str, int, float]:
self._enum.add(item)
elif item is None:
self._enum.add("null")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this adds a string "null", which is not the same as the original object. JSON.dumps will handle the conversion from "None" to "null" for you. Just add this to the scalars list and remove this if case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to scalars as suggested and refactored this code.


def test_enum_scalar_list(self):
self.add_schema({"enum": []})
self.add_object(["123", 1, 1.2, True, None])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this raise an error since it's a list?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Refactored this code.

self.add_schema({"enum": []})
self.add_object(["123", 1, 1.2, True, None])
self.assertResult(
{"enum": ["123", 1, 1.2, "null"]},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get this result, you should have to either start with this schema or add each object individually. To properly test, you should probably do a combination of the two: start with 2 or 3 of the items and then add the rest, making sure you add at least one of each type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the test as suggested.

Copy link
Owner

@wolverdude wolverdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple suggestions left, but lgtm at this point.

Once this is merged, I'll update the readme and add a couple other fixes before releasing the next minor version.

@@ -64,3 +64,21 @@ def test_keeps_unrecognized_properties(self):
'properties': {'a': {'type': 'boolean'}},
'patternProperties': {r'^\d$': {'type': 'integer'}},
'required': ['a']})

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this didn't occur to me before, but there should be some test that adds the same item multiple times to make sure it gets deduped.

# Add only scalar types. Technically, the JSON-Schema spec allows
# any type in an enum list, but using objects and lists is a very
# rare use-case.
if obj is not None and type(obj) not in [bool, str, int, float]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified:

Suggested change
if obj is not None and type(obj) not in [bool, str, int, float]:
if not isinstance(obj, (bool, str, int, float, type(None)):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants