-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a serialization library #1092
Comments
FYI, I started on https://github.com/TomAugspurger/msgspec-stac/blob/main/msgspec_stac.py a few weeks back but haven't done anything with it since (using msgspec). It seemed pretty straightforward, but I had a few questions:
In theory pydanic v2 is much faster (though still slower than msgspec last I saw). |
I did a dead-simple msgspec implementation myself just now using For more context, this is how it's done in Rust: https://github.com/gadomski/stac-rs/blob/dfaaabc00f581af3d6b948ee3de24f4b68e5acdd/stac/src/item.rs#L66-L68 |
There's a pydantic-stac implementation here https://github.com/stac-utils/stac-pydantic which doesn't seem really active, but I still made a PR to migrate it to pydantic v2 in the hope it'd be useful at some point. |
@thomas-maschler did some ad-hoc benchmarking in radiantearth/stac-spec#1252 (comment) and found pydantic to be slower than than pystac in the deserialization case. |
I would be very curious about where the time difference are coming from. All things being the same, I would definitely expect Pydantic should be faster since most of the work is happening in Rust and not Python. If I had to take a guess, it may be related to pystac validation being optional vs being built in with pydantic. There may be some ways around that as well as ways to improve performance vs those benchmarks. It could be very beneficial to go that route, if the performance is acceptable, as it would supersede |
bumping this! the code complexity for pystac is pretty high and when implementing the MLModel extension, pydantic v2 has felt more approachable. is pydantic 2 or another serialization library an option for pystac v2? |
Could be, however no one (that I know of) is currently working on a pystac v2 at the moment. |
so https://github.com/stac-utils/stac-pydantic already exists - but I am new and I know there are some functionality gaps. Would it be a high value / good first project for me to look at adding this? we could eventually support multiple serialization formats, defaulting to stac-pydantic - that would be powerful. |
You're of course welcome to poke at it, but I'd say this is a pretty complex first ticket -- you'll end up touching most of the codebase. I'll note too that some of the performance considerations discussed in this issue might be resolved by #1434. Zooming out, I think that we could pivot towards simply interoperability between pystac and stac-pydantic so folks can use both in the same ecosystem more seamlessly. |
In PySTAC, we do a lot of work converting STAC objects to and from JSON (
to_dict/from_dict
) and that process can be error prone and tricky to maintain. As discussed in #1047, the use of a (de)serialization could:We should explore what it would take to add a serialization library as a dependency and convert our data structures to use it if it makes sense.
Two options (there may be more):
We probably don't want to use pydantic unless it's gotten a lot faster since last checked.
Downsides
By adding a dependency on a serialization library, we move away from our "light, few/no dependencies" model that we've been operating in for v1. Hopefully we can reduce our own code complexity by offloading some work to the serialization library, but we should ensure that the juice is worth the squeeze.
The text was updated successfully, but these errors were encountered: