Extensions should coerce values to correct type #1044
Comments
I might be off in my assessment. I was looking at the https://staging-stac.delta-backend.com/collections/nceo_africa_2017/items/AGB_map_2017v0m_COG using a JSON prettifying extension and it looked like |
So the part about |
Using Python (httpx):

    import httpx
    r = httpx.get("https://staging-stac.delta-backend.com/collections/nceo_africa_2017/items/AGB_map_2017v0m_COG")
    r.json()["properties"]["proj:epsg"]
    >>> 4326.0

Using curl:

    curl https://staging-stac.delta-backend.com/collections/nceo_africa_2017/items/AGB_map_2017v0m_COG
    {...,"proj:epsg":4326.0,...}

When I pipe the same response through jq:

    curl https://staging-stac.delta-backend.com/collections/nceo_africa_2017/items/AGB_map_2017v0m_COG | jq -c '.properties."proj:epsg"'
    4326
Thanks, that is helpful context. From googling around a bit, it seems that JSON doesn't really have a concept of int vs. float, so it is probably unreliable to depend on how the data is stored.
JSON validation does not catch this either, because in JSON Schema numbers with a trailing .0 are considered valid integers: https://json-schema.org/understanding-json-schema/reference/numeric.html
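To make the point above concrete, here is a minimal sketch using only Python's standard `json` module (the sample payloads are made up for illustration, not taken from the VEDA item). It shows that the parser keeps whatever numeric form appears in the text, so the Python type depends on how the value was serialized rather than on what the field is meant to hold:

```python
import json

# The same EPSG code, serialized with and without a trailing ".0".
value = json.loads('{"proj:epsg": 4326.0}')["proj:epsg"]
as_int = json.loads('{"proj:epsg": 4326}')["proj:epsg"]

print(type(value).__name__)   # float
print(type(as_int).__name__)  # int

# The two compare equal numerically, even though the Python types differ.
print(value == as_int)        # True
```

This is consistent with the curl and jq outputs in the earlier comment: the stored text contains `4326.0`, and each tool decides for itself how to present or type that number.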
I was toying around with the idea of doing int coercion in the json_loads method itself. This is what it would look like in the native json library:

    import math
    import json

    def parse_maybe_int(obj):
        # parse_float receives the raw string of every JSON float
        result = float(obj)
        # math.modf returns (fractional part, integer part)
        decimal, whole = math.modf(result)
        if decimal == 0:
            result = int(whole)
        return result

    json.loads('{"foo": 1.0}', parse_float=parse_maybe_int)  # {'foo': 1}
    json.loads('{"foo": 1.1}', parse_float=parse_maybe_int)  # {'foo': 1.1}
    json.loads('{"foo": 1}', parse_float=parse_maybe_int)  # {'foo': 1}

But then I realized that there is an alternate library that is used:
Before I go down the path of implementing that I'd like to get some feedback on whether this approach seems too aggressive for pystac objects. |
Just to make this explicit. I think there are two very different approaches available:
|
My instinct is option 2; option 1 feels a little too magical to me. Typing and (de)serialization is a recurring problem in PySTAC (e.g. #1047), and I think forcing extensions to define the behavior that they want is the best call -- e.g. for projection, you may want to error if |
I agree that using a dedicated library seems like a better fit. Do you think we should close this since the preferred approach is captured in #1092? FWIW, VEDA is correcting the values within the pgstac database, and there are conversations going on about changing the type of |
Sure. I think an issue specifically focused on a single extension (e.g. "Proj extension should fail if EPSG is not an integer") would be more actionable. |
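As a rough illustration of what a strict, extension-specific check like the one suggested above could look like, here is a minimal sketch. The helper name `require_int_epsg` is hypothetical and is not part of PySTAC; it operates on a plain properties dict rather than on any PySTAC class:

```python
from typing import Any, Dict, Optional


def require_int_epsg(properties: Dict[str, Any]) -> Optional[int]:
    """Return proj:epsg if it is an int or missing/null, else raise.

    Hypothetical helper for illustration only; not a PySTAC API.
    """
    value = properties.get("proj:epsg")
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"proj:epsg must be an integer, got {value!r}")
    return value
```

With this approach, an item whose `proj:epsg` was stored as `4326.0` would fail loudly instead of being silently converted, which is the trade-off between the two options discussed above.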
Specifically, non-null values in "proj:epsg" and "proj:shape" should be coerced to int. Ideally this should happen on the properties field itself rather than only when directly getting the prop from the extension class.

Context
I was debugging an issue on VEDA where epsg was being read as a float and causing some issues.
I decided that maybe there was a mismatch between the STAC version and the extension version, so I pulled the file locally (using wget) and changed the proj ext schema version from v1.0.0 to v1.1.0.
This time the proj ext is understood, but in the process of pulling the file all my int fields were converted to floats. I think this is because JSON doesn't really understand the difference between ints and floats.
But since my STAC entry contains floats rather than ints, the value is always a float, even when getting it directly from the extension class.
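For reference, the coercion requested in this issue could be sketched roughly as follows. This is only an illustration on a plain properties dict: `coerce_proj_properties` and `_to_int` are hypothetical names, not PySTAC functions, and the sketch assumes that only whole-valued floats should be converted while anything else is left untouched.

```python
from typing import Any, Dict


def _to_int(value: Any) -> Any:
    # Convert whole-valued floats (e.g. 4326.0) to int; leave everything else as-is.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def coerce_proj_properties(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of properties with non-null proj:epsg / proj:shape coerced.

    Hypothetical helper for illustration only; not a PySTAC API.
    """
    out = dict(properties)
    if out.get("proj:epsg") is not None:
        out["proj:epsg"] = _to_int(out["proj:epsg"])
    if out.get("proj:shape") is not None:
        out["proj:shape"] = [_to_int(v) for v in out["proj:shape"]]
    return out


props = {"proj:epsg": 4326.0, "proj:shape": [100.0, 200.0]}
print(coerce_proj_properties(props))
```

Null values pass through unchanged, and a non-whole float such as `4326.5` is deliberately not truncated, so a later validation step can still flag it.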