Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flight split based on condition #457

Merged
merged 2 commits into from
Jul 12, 2024
Merged

Flight split based on condition #457

merged 2 commits into from
Jul 12, 2024

Conversation

xoolive
Copy link
Owner

@xoolive xoolive commented Jul 9, 2024

No description provided.

@junzis
Copy link
Collaborator

junzis commented Jul 11, 2024

Can we get this method into LazyTraffic?

For example, the following code through error: Method 'split' not implemented on Traffic or LazyTraffic

t = (
    t.iterate_lazy(iterate_kw=dict(by="6 hours"))
    .assign_id(name="{self.callsign}_{idx:>04}")
    .filter()
    .longer_than("1 hour")
    .pipe(enough_data)
    .split("30min", condition=no_split_enroute)
    .pipe(full_flight)
)

@xoolive
Copy link
Owner Author

xoolive commented Jul 12, 2024

The proper way to add a method to LazyTraffic is to add a decorator to a method without an implementation (use the ellipsis ...) in the Traffic class, like everything below this line:

@lazy_evaluation()

Note that the decorator will look for a method in Flight with the same name:

if not hasattr(Flight, f.__name__):

Lazy methods

At the moment only two main categories of Flight methods are supported:

  • those returning a boolean: if True, the input flight is passed as is to the next method, if False, the processing for the current flight stops;
  • those returning an Optional[Flight]: if None, the processing for the current flight stops, if a Flight, the output flight is passed to the next method on the stack.
  • There is also summary() which returns a dictionary but it can only be used at the end of the stack: eval() will create a pd.DataFrame out of all the records. There is nothing in the code to ensure that no other method is stacked after it.

The most important category that is not supported as is are methods returning a FlightIterator: since there can be many segments returned, this would break the linearity of the stack.

Instead of

flight:
|_ method1 -> bool
|_ method2 -> None | Flight
|_ method3 -> None | Flight

It could create a tree of executions, with possible RecursionError associated, and without the possibility to merge branches later:

flight
|_ method1 -> bool
|_ method2 -> FlightIterator
   |_ [segment1]
   |  |_ method3 -> FlightIterator
   |     |_ ...
   |_ [segment2]
   |  |_ method3 -> FlightIterator
   |     |_ ...

A common use case with FlightIterator however is to use an aggregator right behind (i.e. a function FlightIterator -> None | bool | Flight)

That pattern is supported in LazyTraffic, with many aggregators such as has, next, or all, they also support methods being passed as a string to avoid passing lambdas which cannot be serialized (and multiprocessed), that's why instead of:

flight.aligned_on_ils("EHRD").next()  # None | Flight
traffic.aligned_on_ils("EHRD").next().eval()  # INVALID code
        ^__ returns a FlightIterator
                               ^__ brings the FlightIterator back to a Flight

You would write:

traffic.next('aligned_on_ils("EHRD")').eval()
        ^__ next returns an Optional[Flight]

On your example

t = (
    t.iterate_lazy(iterate_kw=dict(by="6 hours"))
    .assign_id(name="{self.callsign}_{idx:>04}")  # you assign a flight_id
    .filter()
    .longer_than("1 hour")
    .pipe(enough_data)
    .split("30min", condition=no_split_enroute)  # this returns a FlightIterator
    .pipe(full_flight)
)

The problem here that needs clarification is that when you split the trajectories, all the pieces will have the same flight_id. I am not sure if it is what you want. You may have wanted to put the assign_id() at the end of the stack? It feels error-prone though. I am also not sure what the full_flight method does.

That's the reason I am not very sure if I want to open the Pandora box with lazy flight iterators, because of this kind of situation.

Workaround

There is however an example of a workaround in the following example, with the all() aggregator, followed by a .drop(columns=['flight_id']) or here where a new flight_id in generated in the all() aggregator.

For your situation, I am not sure:

  • I am not sure how you want to apply the full_flight function? (are the types valid in the first place?)
  • I haven't checked whether .all('split("30min", condition=no_split_enroute)') would work. It would be probably relevant to implement it if it doesn't.

@junzis
Copy link
Collaborator

junzis commented Jul 12, 2024

This is what I came up with in the end. Not pretty, but it work well.

def no_split_enroute(f1: Flight, f2: Flight) -> bool:
    return f1.data.iloc[-1].altitude < 25000 or f2.data.iloc[0].altitude < 25000


def resample_filter_split(flight: Flight) -> None | Flight:
    flight_ = Traffic.from_flights(
        [f.resample("2s").filter() for f in flight.split("10min")]
    )[0]  # 🌶️  I would recommend writing it Flight(pd.concat([...])) to avoid confusion with the [0] here

    return flight_.split("30min", condition=no_split_enroute).all(
        "{self.flight_id}_{i}"
    )  # 🌶️ This would be a perfect use case for flight_i.all('split("30min", condition=no_split_en_route)', flight_id="{self.flight_id}_{i}")

def full_flight(flight: Flight) -> None | Flight:  # 🌶️  return a bool
    if (  # 🌶️ replace 'if' with 'return'
        flight.data.altitude.iloc[0] < 5000
        and flight.data.altitude.iloc[-1] < 5000
        and flight.altitude_max > 15000
    ):
        return flight  # 🌶️ remove
    return None  # 🌶️  remove

 t = (
      t.iterate_lazy(iterate_kw=dict(by="6 hours"))
      .assign_id(name="{self.callsign}_{idx:>04}")
      .filter()
      .longer_than("1 hour")
      .pipe(resample_filter_split)
      .pipe(full_flight)
      .eval(max_workers=8, desc="processing")
  )

@xoolive
Copy link
Owner Author

xoolive commented Jul 12, 2024

Commenting directly in your code above

@xoolive
Copy link
Owner Author

xoolive commented Jul 12, 2024

You can now replace

flight_ = Traffic.from_flights(
        [f.resample("2s").filter() for f in flight.split("10min")]
    )[0] 

with

flight_ = flight.split('10min').map(lambda f: f.resample('2s').filter()).all()

@junzis
Copy link
Collaborator

junzis commented Jul 12, 2024

Yes, it works! I think it is good to be merged 👍

@xoolive xoolive merged commit cd03e2f into master Jul 12, 2024
4 checks passed
@xoolive xoolive deleted the pr-split-condition branch July 12, 2024 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants