feat: Automate location extraction and english translation #642
Merged
Changes from 21 commits
Commits
24 commits
- 355a442 feat: renaming functions (cka-y)
- 450e63d feat: extracted bb logic to separate module (cka-y)
- 62d63aa feat: db changes (cka-y)
- 3021072 feat: avoid overwriting the locations (cka-y)
- 1e61741 feat: n points extraction for location computation (cka-y)
- 9b5419b fix: infra script (cka-y)
- e8f4ce7 fix: infra script (cka-y)
- fa54c0d feat: added reverse geolocation (cka-y)
- e2fee2b feat: added location extraction (cka-y)
- 39b2d99 Merge branch 'main' into feat/618 (cka-y)
- e5a9cb7 feat: changing multiple countries logic (cka-y)
- d8bed0e feat: added location translation as part of the pipeline (cka-y)
- 4fbed1f feat: added english translation (cka-y)
- 58828cb fix: failing test (cka-y)
- daaa819 test: ui build with changes (cka-y)
- c6632ef fix: region bug + clean up (cka-y)
- 90cb372 Merge branch 'main' into feat/618 (cka-y)
- 593045d Merge branch 'main' into feat/618 (cka-y)
- 9ce9b7f fix: search feed only uses first location (cka-y)
- 4260fa8 fix: search feed only uses first location (cka-y)
- 243d9a2 fix: provider and feed name added back to document (cka-y)
- f3bce05 fix: commenting out location filtering integration tests (cka-y)
- 6c5b768 fix: docker compose issue (cka-y)
- aa3f12c fix: removed sql header (cka-y)
@@ -0,0 +1,26 @@
## Function Workflow

1. **Eventarc Trigger**: The original function is triggered by a `CloudEvent` indicating a GTFS dataset upload. It parses the event data to identify the dataset and calculates the bounding box and location information from the GTFS feed.

2. **Pub/Sub Triggered Function**: A new function is triggered by Pub/Sub messages. This allows for batch processing of dataset extractions, enabling multiple datasets to be processed in parallel without waiting for each one to complete sequentially.

3. **HTTP Triggered Batch Function**: Another function, triggered via HTTP request, identifies all latest datasets lacking bounding box or location information. It then publishes messages to the Pub/Sub topic to trigger the extraction process for these datasets.

4. **Data Parsing**: Extracts `stable_id`, `dataset_id`, and the GTFS feed `url` from the triggering event or message.

5. **GTFS Feed Processing**: Retrieves bounding box coordinates and other location-related information from the GTFS feed located at the provided URL.

6. **Database Update**: Updates the bounding box and location information for the dataset in the database.
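
The data parsing step can be sketched as follows. This is only an illustration, not the function's actual code: it assumes the message body is base64-encoded JSON (as Pub/Sub delivers message data) with `stable_id`, `dataset_id`, and `url` keys. The helper name `parse_dataset_message` and the example values are hypothetical.

```python
import base64
import json


def parse_dataset_message(data_b64: str) -> tuple[str, str, str]:
    """Decode a base64 JSON payload and return (stable_id, dataset_id, url).

    Assumption: the payload is a flat JSON object carrying the three keys
    named in the workflow; the real message schema is not shown in this PR.
    """
    payload = json.loads(base64.b64decode(data_b64).decode("utf-8"))
    required = ("stable_id", "dataset_id", "url")
    missing = [key for key in required if not payload.get(key)]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    return payload["stable_id"], payload["dataset_id"], payload["url"]


# Made-up example payload, encoded the way a publisher might send it.
example = base64.b64encode(
    json.dumps(
        {
            "stable_id": "mdb-1",
            "dataset_id": "mdb-1-202401010000",
            "url": "https://example.com/gtfs.zip",
        }
    ).encode("utf-8")
).decode("ascii")

print(parse_dataset_message(example))
```

Rejecting incomplete payloads up front keeps the later feed processing and database update steps from running against a dataset that cannot be identified.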

## Expected Behavior

- Bounding boxes and location information are extracted for the latest datasets that are missing them. Datasets can be processed individually (on upload) or in batches (via Pub/Sub), so a backlog does not have to be handled one dataset at a time.
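
As a rough sketch of the batch side of this behavior, the selection of datasets and the construction of one message per dataset might look like the code below. The `DatasetRecord` class, its fields, and `build_batch_messages` are hypothetical stand-ins; the real function queries the database and publishes to a Pub/Sub topic, neither of which is reproduced here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    # Hypothetical, simplified view of a dataset row.
    stable_id: str
    dataset_id: str
    url: str
    latest: bool
    bounding_box: Optional[str]
    has_location: bool


def build_batch_messages(records: list[DatasetRecord]) -> list[bytes]:
    """Return one JSON message per latest dataset lacking a bounding box or location."""
    return [
        json.dumps(
            {"stable_id": r.stable_id, "dataset_id": r.dataset_id, "url": r.url}
        ).encode("utf-8")
        for r in records
        if r.latest and (r.bounding_box is None or not r.has_location)
    ]


records = [
    DatasetRecord("mdb-1", "mdb-1-a", "https://example.com/1.zip", True, None, False),
    DatasetRecord("mdb-2", "mdb-2-a", "https://example.com/2.zip", True, "POLYGON((...))", True),
    DatasetRecord("mdb-3", "mdb-3-a", "https://example.com/3.zip", False, None, False),
]
messages = build_batch_messages(records)
print(len(messages))
```

Only the first record qualifies: the second already has both pieces of information, and the third is not the latest dataset for its feed.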

## Function Configuration

The functions rely on the following environment variables:
- `FEEDS_DATABASE_URL`: The database URL for connecting to the database containing GTFS datasets.
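
A minimal sketch of reading this variable at startup, assuming the function should fail fast when it is missing. The helper name `get_feeds_database_url` is hypothetical and not part of the PR.

```python
import os


def get_feeds_database_url() -> str:
    """Return FEEDS_DATABASE_URL, raising if it is unset or empty."""
    url = os.environ.get("FEEDS_DATABASE_URL")
    if not url:
        raise RuntimeError("FEEDS_DATABASE_URL is not set")
    return url
```

For local runs the variable can be exported in the shell before starting the function, for example with a placeholder value such as `postgresql://user:password@localhost:5432/feeds`.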

## Local Development

Local development of these functions should follow standard practices for GCP serverless functions. For general instructions on setting up the development environment, refer to the main [README.md](../README.md) file.
4 changes: 2 additions & 2 deletions
...ns-python/extract_bb/function_config.json → ...hon/extract_location/function_config.json
42 changes: 42 additions & 0 deletions
functions-python/extract_location/src/bounding_box/bounding_box_extractor.py
@@ -0,0 +1,42 @@
import numpy
from geoalchemy2 import WKTElement

from database_gen.sqlacodegen_models import Gtfsdataset


def create_polygon_wkt_element(bounds: numpy.ndarray) -> WKTElement:
    """
    Create a WKTElement polygon from bounding box coordinates.
    @:param bounds (numpy.ndarray): Bounding box coordinates, in the order
        (min_longitude, min_latitude, max_longitude, max_latitude).
    @:return WKTElement: The polygon representation of the bounding box.
    """
    min_longitude, min_latitude, max_longitude, max_latitude = bounds
    # The first point is repeated at the end to close the polygon ring.
    points = [
        (min_longitude, min_latitude),
        (min_longitude, max_latitude),
        (max_longitude, max_latitude),
        (max_longitude, min_latitude),
        (min_longitude, min_latitude),
    ]
    wkt_polygon = f"POLYGON(({', '.join(f'{lon} {lat}' for lon, lat in points)}))"
    # SRID 4326 is WGS 84 (longitude/latitude in degrees).
    return WKTElement(wkt_polygon, srid=4326)


def update_dataset_bounding_box(session, dataset_id, geometry_polygon):
    """
    Update the bounding box of a dataset in the database.
    @:param session (Session): The database session.
    @:param dataset_id (str): The ID of the dataset.
    @:param geometry_polygon (WKTElement): The polygon representing the bounding box.
    @:raises Exception: If the dataset is not found in the database.
    """
    # The dataset is looked up by its stable ID.
    dataset: Gtfsdataset | None = (
        session.query(Gtfsdataset)
        .filter(Gtfsdataset.stable_id == dataset_id)
        .one_or_none()
    )
    if dataset is None:
        raise Exception(f"Dataset {dataset_id} does not exist in the database.")
    dataset.bounding_box = geometry_polygon
    session.add(dataset)
    session.commit()
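
To make the expected `bounds` ordering and the resulting WKT concrete, here is a dependency-free sketch. It assumes the bounds come from the `stop_lat` and `stop_lon` columns of the feed's `stops.txt`; how the PR actually computes them is not shown in this diff. `compute_bounds` and `bounds_to_wkt` are hypothetical helpers that mirror the ring construction in `create_polygon_wkt_element` without `numpy` or `geoalchemy2`.

```python
import csv
import io


def compute_bounds(stops_csv: str) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) from stops.txt content."""
    rows = list(csv.DictReader(io.StringIO(stops_csv)))
    # Skip rows with empty coordinates (allowed for some location types).
    lons = [float(r["stop_lon"]) for r in rows if r.get("stop_lon")]
    lats = [float(r["stop_lat"]) for r in rows if r.get("stop_lat")]
    if not lons or not lats:
        raise ValueError("stops.txt has no usable coordinates")
    return min(lons), min(lats), max(lons), max(lats)


def bounds_to_wkt(bounds: tuple[float, float, float, float]) -> str:
    """Build the same closed five-point ring as create_polygon_wkt_element."""
    min_lon, min_lat, max_lon, max_lat = bounds
    ring = [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),  # repeat the first point to close the ring
    ]
    return "POLYGON((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"


stops = "stop_id,stop_lat,stop_lon\nA,45.5,-73.6\nB,45.6,-73.5\n"
print(bounds_to_wkt(compute_bounds(stops)))
# POLYGON((-73.6 45.5, -73.6 45.6, -73.5 45.6, -73.5 45.5, -73.6 45.5))
```

Note that WKT lists longitude before latitude, which is the reverse of the column order in `stops.txt`.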
This should be validated as part of #623