Implement session time chopping job #103

Open
ZavenArra opened this issue May 4, 2024 · 4 comments
@ZavenArra
Contributor

ZavenArra commented May 4, 2024

The session time chopping job computes the session_segment for each raw_capture in a session.

The algorithm is as follows:

  1. Get the next unprocessed session by querying for a session where processed_at IS NULL and created_at is more than 24 hours in the past.
  2. Order the session's raw captures by captured_at.
  3. Update the session to set session.start_time to the lowest value of captured_at in the set of raw captures.
  4. Iterate through the raw captures sequentially in captured_at order using the following algorithm, and update the session_segment table.

algorithm:

current_session_segment = session_segment.new()
first_raw_capture.session_segment_id = current_session_segment.id
current_session_segment.starts_at = first_raw_capture.captured_at
previous_raw_capture = first_raw_capture
for each raw_capture after first_raw_capture as current_raw_capture
      time_distance = current_raw_capture.captured_at - previous_raw_capture.captured_at
      location_distance = ST_DISTANCE(current_raw_capture.location, previous_raw_capture.location)
      if time_distance > 2 hours OR location_distance > 100m
             // chop the session: close the current segment and open a new one
             current_session_segment.ends_at = previous_raw_capture.captured_at
             current_session_segment = session_segment.new()
             current_session_segment.starts_at = current_raw_capture.captured_at
      current_raw_capture.session_segment_id = current_session_segment.id
      previous_raw_capture = current_raw_capture
// close the final segment; previous_raw_capture is now the last capture in the session
current_session_segment.ends_at = previous_raw_capture.captured_at

  5. When the process completes, session.processed_at should be updated to the current timestamp.
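
For reference, here is a minimal Python sketch of the chopping loop, operating on in-memory objects rather than database rows. The `RawCapture` and `SessionSegment` classes, the `lat`/`lon` fields, and the in-memory id counter are all hypothetical stand-ins, and the haversine function is only an assumed substitute for ST_DISTANCE; the real job would read and write the actual tables.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from itertools import count
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional

MAX_TIME_GAP = timedelta(hours=2)  # time threshold from the issue
MAX_DISTANCE_M = 100.0             # distance threshold from the issue

_segment_ids = count(1)  # hypothetical id generator; the database would assign ids


@dataclass
class RawCapture:  # hypothetical stand-in for a raw_capture row
    captured_at: datetime
    lat: float
    lon: float
    session_segment_id: Optional[int] = None


@dataclass
class SessionSegment:  # hypothetical stand-in for a session_segment row
    starts_at: datetime
    ends_at: Optional[datetime] = None
    id: int = field(default_factory=lambda: next(_segment_ids))


def distance_m(a: RawCapture, b: RawCapture) -> float:
    """Haversine distance in metres (assumed substitute for ST_DISTANCE)."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))


def chop_session(raw_captures: List[RawCapture]) -> List[SessionSegment]:
    """Assign each capture to a segment and return the segments in order."""
    captures = sorted(raw_captures, key=lambda c: c.captured_at)
    if not captures:
        return []
    current = SessionSegment(starts_at=captures[0].captured_at)
    segments = [current]
    captures[0].session_segment_id = current.id
    previous = captures[0]
    for capture in captures[1:]:
        time_gap = capture.captured_at - previous.captured_at
        if time_gap > MAX_TIME_GAP or distance_m(capture, previous) > MAX_DISTANCE_M:
            # Chop: close the current segment at the previous capture.
            current.ends_at = previous.captured_at
            current = SessionSegment(starts_at=capture.captured_at)
            segments.append(current)
        capture.session_segment_id = current.id
        previous = capture
    # Close the final segment at the last capture.
    current.ends_at = previous.captured_at
    return segments
```

Calling `chop_session(captures)` returns the segments in chronological order and sets `session_segment_id` on each capture in place. A segment containing a single capture ends up with `starts_at` equal to `ends_at`.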
@Kpoke
Collaborator

Kpoke commented Nov 20, 2024

@ZavenArra I noticed that processed_at does not exist on the session table, but it does on the session_segment table. Should I add a processed_at column to the session table, or should we modify step 1 above to read:

Get the next unprocessed session by querying for a session not linked to a session_segment and a created_at timestamp of more than 24 hours in the past
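
To make the two options concrete, here is a rough sketch of both selection queries, run against an in-memory SQLite database purely for illustration. The table layouts are assumptions reduced to the columns mentioned in this thread, the function names are hypothetical, and the project's real database and schema may differ.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    processed_at TEXT              -- assumed new column (option 1)
);
CREATE TABLE session_segment (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES session(id)
);
""")

now = datetime(2024, 11, 20, 12, 0, 0)  # fixed "current time" for the example


def iso(dt):
    return dt.isoformat()


conn.executemany(
    "INSERT INTO session (id, created_at, processed_at) VALUES (?, ?, ?)",
    [
        (1, iso(now - timedelta(hours=48)), iso(now - timedelta(hours=20))),  # already processed
        (2, iso(now - timedelta(hours=30)), None),  # old enough, not processed
        (3, iso(now - timedelta(hours=1)), None),   # too recent
    ],
)
conn.execute("INSERT INTO session_segment (id, session_id) VALUES (1, 1)")


def next_unprocessed_by_column(conn, now):
    """Option 1: rely on a processed_at column on session."""
    cutoff = iso(now - timedelta(hours=24))
    row = conn.execute(
        "SELECT id FROM session "
        "WHERE processed_at IS NULL AND created_at < ? "
        "ORDER BY created_at, id LIMIT 1",
        (cutoff,),
    ).fetchone()
    return row[0] if row else None


def next_unprocessed_without_segments(conn, now):
    """Option 2: pick a session that has no session_segment rows yet."""
    cutoff = iso(now - timedelta(hours=24))
    row = conn.execute(
        "SELECT s.id FROM session s "
        "WHERE s.created_at < ? AND NOT EXISTS ("
        "  SELECT 1 FROM session_segment sg WHERE sg.session_id = s.id) "
        "ORDER BY s.created_at, s.id LIMIT 1",
        (cutoff,),
    ).fetchone()
    return row[0] if row else None


print(next_unprocessed_by_column(conn, now))
print(next_unprocessed_without_segments(conn, now))
```

On this sample data both functions select the same session. They would differ for a session that already has some segments but whose processed_at is still NULL, for example if a run was interrupted partway through.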

@Kpoke
Collaborator

Kpoke commented Nov 20, 2024

session_id on the session_segment table should be unique, right?

@ZavenArra
Contributor Author

A session can have one or more session segments, so session_id will NOT be unique on the session_segment table (if I understand your question correctly). The idea is to chop a session into segments based on some criteria.

Yeah, processed_at should be added to session, you are right. Once a session has finished being processed into segments, this is set to the current time. Maybe it makes sense to be more specific though, in case we add other steps to the pipeline. Something like processed_segements_at.

@Kpoke
Collaborator

Kpoke commented Nov 20, 2024

Okay, noted. That's clear. Also, for step 3, should that be session_segment.start_time, or should I add start_time to the session table?
