Issue with "Flight.split" method, likely related to "PyArrow-backed" DataFrame #466

Closed
iavrekh opened this issue Oct 23, 2024 · 5 comments

iavrekh commented Oct 23, 2024

Fresh install of the traffic library v2.10.2 in a separate conda environment (from conda-forge).
It contains:
Python 3.12.7, numpy 1.26.4, pandas 2.2.3, trino-python-client 0.330.0,
pyarrow and pyarrow-core 17.0.0

The "_split" method in flight.py doesn't work correctly:

diff = data.timestamp.diff().values

max_ = np.nanmax(diff)

The last statement returns max_ = NaT, most likely because the timestamp values have the "timestamp[ns, tz=UTC][pyarrow]" type
and, as a result, the "diff" values are of the "duration[ns][pyarrow]" type.
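
For readers following along, here is a minimal sketch of one way to compute the largest gap without relying on np.nanmax. It assumes only pandas (pyarrow is used if it happens to be installed); the helper name `max_gap` and the sample timestamps are made up for illustration, and this is not necessarily the fix adopted in the library.

```python
import pandas as pd


def max_gap(timestamps: pd.Series) -> pd.Timedelta:
    """Return the largest gap between consecutive timestamps.

    Hypothetical helper for illustration; not part of the traffic library.
    """
    # diff() leaves a missing value in the first position; Series.max()
    # skips missing values by default (skipna=True).
    return timestamps.diff().max()


# Made-up sample: two bursts of positions separated by 15 minutes.
timestamps = pd.Series(
    pd.to_datetime(
        [
            "2024-07-01 12:00:00",
            "2024-07-01 12:00:05",
            "2024-07-01 12:15:05",
            "2024-07-01 12:15:10",
        ],
        utc=True,
    )
)

try:
    # If pyarrow is available, switch to the dtype reported in this issue.
    import pyarrow as pa

    timestamps = timestamps.astype(pd.ArrowDtype(pa.timestamp("ns", tz="UTC")))
except ImportError:
    pass  # keep the default NumPy-backed dtype

largest = max_gap(timestamps)
print(timestamps.dtype, largest)
```

The point of the sketch is that the reduction is left to pandas, so the missing first element of the diff is skipped regardless of which backend holds the timestamps.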

iavrekh added the bug label Oct 23, 2024
xoolive (Owner) commented Oct 25, 2024

Could you please provide the code for a failing example?

iavrekh (Author) commented Oct 28, 2024

Code to reproduce the issue is attached:
SplitTestA.py uses a "pickled" Traffic object (file TwoFlightsFromOpenSky.pkl) containing two flights "excerpted"
from the set of flights in the Terminal Maneuvering Area (TMA) around KPWM (Portland International Jetport, Portland, Maine, USA)
on 2024-07-01, downloaded from the OpenSky repository. Both flights are overflights of the area by the same aircraft: they have the same icao24 value but different callsigns. The program manipulates the second flight so that both callsigns are the same.
After that, the Traffic.iterate method fails to separate them, despite a gap larger than 10 minutes.
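
To make the expected behaviour concrete, here is a rough pandas-only sketch of gap-based splitting. It is not the traffic implementation; the function name `split_on_gaps`, the 10-minute default, and the sample values (icao24 "abc123", callsign "TEST01") are assumptions made up for this illustration, and it uses the default NumPy-backed dtypes.

```python
import pandas as pd


def split_on_gaps(df: pd.DataFrame, threshold: str = "10 min") -> list[pd.DataFrame]:
    """Split rows into segments wherever consecutive timestamps are
    further apart than `threshold`.

    Hypothetical helper for illustration; not the traffic API.
    """
    gaps = df["timestamp"].diff()
    # The first row has no previous timestamp, so it never starts a new segment.
    starts_new = (gaps > pd.Timedelta(threshold)).fillna(False).astype(bool)
    # Each True increments the running segment number.
    segment_id = starts_new.cumsum()
    return [segment for _, segment in df.groupby(segment_id)]


# Made-up data: same aircraft, same callsign, two passes 30 minutes apart.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-07-01 12:00:00",
                "2024-07-01 12:00:05",
                "2024-07-01 12:00:10",
                "2024-07-01 12:30:00",
                "2024-07-01 12:30:05",
                "2024-07-01 12:30:10",
            ],
            utc=True,
        ),
        "icao24": "abc123",
        "callsign": "TEST01",
    }
)

segments = split_on_gaps(df)
print([len(segment) for segment in segments])
```

With identical icao24 and callsign values, the time gap is the only signal left to separate the two passes, which is why a broken maximum-gap computation makes them merge.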

SplitTestB.py is a "contrasting" example using the Belevingsvlucht dataset. The Flight.split method works fine in this case.
If you add debug printing for "max_" in the "_split" method, you'll see that the value is "NaT" in the first case but
meaningful in the second one.
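
Besides printing "max_", it may help to print the dtypes involved, since that is where the two cases are suspected to differ. The snippet below is a small sketch; the helper name `timestamp_backend` is hypothetical, and the sample series is NumPy-backed. For the failing data, the dtypes reported earlier in this issue were "timestamp[ns, tz=UTC][pyarrow]" and "duration[ns][pyarrow]".

```python
import pandas as pd


def timestamp_backend(series: pd.Series) -> dict:
    """Summarise the dtype of a timestamp column and of its diff.

    Hypothetical debugging helper; not part of the traffic library.
    """
    return {
        "timestamp_dtype": str(series.dtype),
        "diff_dtype": str(series.diff().dtype),
        # True when the column is stored in a PyArrow-backed extension array.
        "arrow_backed": isinstance(series.dtype, pd.ArrowDtype),
    }


# Made-up NumPy-backed sample.
numpy_backed = pd.Series(
    pd.to_datetime(["2024-07-01 12:00:00", "2024-07-01 12:00:05"], utc=True)
)

report = timestamp_backend(numpy_backed)
print(report)
```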

Split.zip

xoolive (Owner) commented Oct 28, 2024

Let me start a branch to fix all these pyarrow-related issues, with a related PR.
Let's check it there.

iavrekh (Author) commented Oct 28, 2024

Thanks! I hope these small pieces of code will be useful.
Providing my real "production" code would have been quite impractical due to its size and complexity, but the data in the small example provided is "real".

xoolive added a commit that referenced this issue Oct 28, 2024

xoolive (Owner) commented Oct 28, 2024

Let me close the issue; let's confirm in #468 that everything works.
Once confirmed, I will merge into master.

xoolive closed this as completed Oct 28, 2024