feat: Automate location extraction and english translation #642
Preview Firebase Hosting URL: https://mobility-feeds-dev--pr-642-u6p0n28u.web.app
@@ -117,6 +117,11 @@ def populate_location(self, feed, row, stable_id):
"""
Populate the location for the feed
"""
# TODO: validate behaviour for gtfs-rt feeds
This should be validated as part of #623
@cka-y From a QA perspective, this looks great! (It solves the core problem we were trying to fix: people can now input the full country name in either the original language or English, with or without accents.) Much better user behaviour that I look forward to telling our usability testers we incorporated! A couple of outstanding questions: I see a few countries where searching by the original-language name returns a different number of results than searching by the English name. Is this because the location data in the search UI is inaccurate? Or is there another reason? In these cases, it looks like the feeds are all based in the same country, so it's not because text in the feed name or transit provider matches where the location does not.
@cka-y - Got it. If I understand this correctly, this is occurring either due to 1) issues with parsing the location because of the feed, whether it be missing data or size, or 2) changes we still need to make on the API side? I'm fine living with this as is for now.
I've done a deeper dive to understand the differences in search results. For The problem arose with feeds containing multiple locations (e.g., feeds covering several countries including I've now fixed this issue! Both
We should comment out the locations integration test, as (I believe) it will block the production content updates from the catalog repository.
@davidgamez @cka-y Does that mean that merging this is blocked until #622 is done?
Yes, this is why my suggestion is to comment out or ignore the integration test for locations until the follow-up issue is completed.
@davidgamez @emmambd I could comment out the location filtering integration tests as part of this PR, merge, and then generate/uncomment the tests as part of #622. Thoughts?
Great work; I added minor non-blocking comments
task_id=task_id,
index=f"{i + 1}/{len(country_codes)}",
)
# def test_filter_by_country_code(self):
picky: this can be disabled with
@pytest.mark.skip(reason="This test is expected to fail until API location issue is implemented")
def test_filter_by_country_code(self):
As we do not use pytest to run the integration tests, I'll leave this commented out for now, but I'll address the changes as part of #622.
@cka-y Is it time to put back these tests?
#662 needs to be addressed first
task_id=task_id,
index=f"{i + 1}/{len(municipalities)}",
)
# def test_filter_by_municipality(self):
The same as before, use pytest skip annotation.
task_id=task_id,
index=f"{i + 1}/{len(country_codes)}",
)
# def test_filter_by_country_code_gtfs(self):
The same as before, use pytest skip annotation.
task_id=task_id,
index=f"{i + 1}/{len(municipalities)}",
)
# def test_filter_by_municipality_gtfs(self):
The same as before, use pytest skip annotation.
task_id=task_id,
index=f"{i + 1}/{len(country_codes)}",
)
# def test_filter_by_country_code_gtfs_rt(self):
The same as before, use pytest skip annotation.
task_id=task_id,
index=f"{i + 1}/{len(municipalities)}",
)
# def test_filter_by_municipality_gtfs_rt(self):
The same as before, use pytest skip annotation.
Summary:
Closes #618, which includes adding translations for location names (country, subdivision, and municipality) to the FeedSearch materialized view and improving the functionality for geocoding locations from GTFS feeds.
Changes include:
Database Schema Changes:
- Location table: country column.
- Translation table: type, language_code, key, and value columns to store translations for various location elements.
- FeedSearch materialized view: incorporates translations for country, subdivision_name, and municipality into the searchable document field.
Backend Logic:
- GeocodedLocation class: includes methods for handling location extraction and translations.
- update_location function: integrates location translations into the database.
Testing:
Expected behavior:
The system should now support searching feeds using English-translated names for countries, subdivisions, and municipalities. When a feed has associated locations with translations available in the Translation table, these translations are included in the search index, so users can find feeds using either the original or the translated location names. This change aims to improve the searchability of feeds for users who search in different languages.
Feed locations are now also extracted automatically by reverse geocoding five points from the dataset: the extreme points (the stops with the minimum and maximum latitude and longitude, which gives up to four points, fewer when one stop is extreme in two directions) and the point in stops.txt closest to the center of the bounding box. If that yields fewer than five points, additional points are selected at random to reach five. The decision on the subdivision or municipality is based on majority voting. If there's no majority at the subdivision level, the country level is included, and multiple countries are included if necessary.
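For reviewers who want to reason about the extraction rules, here is a minimal Python sketch of the point selection and voting described in this section. It is not the code from this PR. The names select_sample_points, majority, and resolve_locations are hypothetical, distances are compared in raw degrees rather than on the sphere, "majority" is assumed to mean strictly more than half of the sampled points, and the reverse geocoding call itself is left out (the sketch starts from already geocoded tuples).

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)
# (country, subdivision, municipality); None marks a level that was not decided.
Place = Tuple[Optional[str], Optional[str], Optional[str]]


def select_sample_points(
    stops: Sequence[Point], count: int = 5, seed: Optional[int] = None
) -> List[Point]:
    """Pick the extreme stops, the stop nearest the bounding-box center,
    then random stops until `count` points are selected (hypothetical helper)."""
    unique = list(dict.fromkeys(stops))  # drop duplicates, keep order
    if not unique:
        return []
    extremes = [
        min(unique, key=lambda p: p[0]),  # southernmost
        max(unique, key=lambda p: p[0]),  # northernmost
        min(unique, key=lambda p: p[1]),  # westernmost
        max(unique, key=lambda p: p[1]),  # easternmost
    ]
    center = (
        (extremes[0][0] + extremes[1][0]) / 2,
        (extremes[2][1] + extremes[3][1]) / 2,
    )
    # Assumption: squared distance in degrees is good enough for a sketch.
    closest = min(
        unique, key=lambda p: (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2
    )
    # One stop can hold two extremes, so deduplicate before counting.
    selected = list(dict.fromkeys(extremes + [closest]))
    remaining = [p for p in unique if p not in selected]
    random.Random(seed).shuffle(remaining)
    selected.extend(remaining[: max(0, count - len(selected))])
    return selected


def majority(values: Sequence[tuple]) -> Optional[tuple]:
    """Return the value held by more than half of the entries, else None."""
    if not values:
        return None
    value, hits = Counter(values).most_common(1)[0]
    return value if hits > len(values) / 2 else None


def resolve_locations(geocoded: Sequence[Place]) -> List[Place]:
    """Vote on municipality, then subdivision; otherwise list every country."""
    for depth in (3, 2):  # 3 = down to municipality, 2 = down to subdivision
        winner = majority([place[:depth] for place in geocoded])
        if winner is not None:
            return [winner + (None,) * (3 - depth)]
    countries = list(dict.fromkeys(place[0] for place in geocoded))
    return [(country, None, None) for country in countries]
```

In this sketch, a feed whose sampled points mostly fall in one municipality resolves to that municipality, a feed spread across one subdivision resolves to the subdivision, and a feed spanning several countries resolves to one entry per country.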
Testing tips:
Use the PR preview URL to search for locations. Example tests:
Please make sure these boxes are checked before submitting your pull request - thanks!
Run ./scripts/api-tests.sh to make sure you didn't break anything