How do we handle address components? #173
Comments
First of all, I hope my summary was an accurate and unbiased one before I launch into debate mode. If not, please edit directly.
I firmly believe we want to be able to combine these; the ability to do so is particularly useful. For instance, if a particular dataset is known to be from a particular local government or restricted to a certain country (like most postal mail, which does not explicitly mention a country), being able to filter down to that level (even if the input text duplicates it) is exceptionally useful. It is very much a scoping of the data, though the individual elements of that scoping are up for interpretation: they must be searched for, unlike the other scope queries, which are explicit and cannot be misinterpreted. This is how CartoDB, with many of their queries, can direct its geocoder to more refined results by doing the coarse parsing already.
So it's scoping. But it's up for interpretation, since each element must be searched for, whereas the rest of the scope queries (extent geometries, country codes) are impossible to misinterpret.
At the same time, it's entirely possible to use this syntax to construct a completely valid search result (boiling down to a venue or address). So it's search.
I know this leaves us in a tough spot. It would mean addresses can scope the results like Categories (#128), but can also serve as the search query itself. It's also like an expansion of the Country Codes (pelias/pelias#101).
It's weird and challenging, but it can also serve as one of the most effective ways for batch geocoders and power users to fine-tune their results by removing ambiguity while still constructing a search query. I think it's one of those things that makes us deeply competitive (as I go and look for who it makes us competitive with, but that's for another comment).
UPDATE: Now being tracked in pelias/pelias#213
In some cases, a user of the API may have already segmented parts of the query. Examples include:
We agree that these address components should be given a common namespace so it's clearly identifiable where these separate elements are being used. This breaks down as:
addr.name | corporate or personal name of entity/venue
addr.number | house number (for lack of a better term; sometimes this may not be an integer, but it is the individually identifiable element of the address on the street)
addr.street | street!
addr.city | locality (colloquial admin2 / metro-area / sub-locality)
addr.postal_code | postal code / postcode
addr.state | region, province, state (colloquial admin1)
addr.country | country (colloquial admin0)
We would like to support users who have these components, but we are at an impasse as to how to consider them.
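To make the namespace concrete, here is a minimal sketch of how a client that has already segmented an address might serialize it. The helper name `build_query` and the choice to send the components as URL query parameters are assumptions for illustration only; this issue does not settle how the API would accept them.

```python
from urllib.parse import urlencode

# Component names proposed in this issue, in the order listed above.
ADDR_KEYS = (
    "addr.name", "addr.number", "addr.street", "addr.city",
    "addr.postal_code", "addr.state", "addr.country",
)


def build_query(components, text=None):
    """Hypothetical helper: serialize known components (and optional text)."""
    unknown = set(components) - set(ADDR_KEYS)
    if unknown:
        raise ValueError(f"unknown address components: {sorted(unknown)}")
    params = {}
    if text:
        params["text"] = text
    # Emit components in a stable order and skip empty values.
    for key in ADDR_KEYS:
        value = components.get(key)
        if value:
            params[key] = value
    return urlencode(params)


query = build_query({"addr.street": "west 26th street", "addr.country": "US"})
```

Rejecting unknown keys up front is one possible design choice; it keeps typos such as addr.postcode from being silently ignored.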
One option is to consider a componentized address in lieu of the text parameter in the SCORE, so that if you have a componentized input, you would use that in lieu of free-form text, overriding any address parsing that we would do. Each of the components would still have to be searched for individually, and thus would directly impact the score of the results.
One other option is to consider the address elements alongside a user's text input, enabling a user to blend a known address component with a free-form text query so that the known components may be semantically expressed (overriding the address parser) while the unstructured components (from the user) go through the traditional search pipeline.
The first option recognizes that by explicitly stating an address's components AND free-form text, there's an opportunity for one parameter to contradict the other (for example, text=mapzen 30 west 20th st, New York, NY and addr.street=west 26th street). It'll probably happen a lot.
The second option recognizes that often certain elements are known about a dataset (through a column, geoparsing, or prior knowledge) but that there might be some semi-unstructured parts of a search that the user wants us to handle on their behalf.
These should be weighed to decide how this is handled.
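The difference between the two options can be sketched in a few lines. Everything here is hypothetical: the function names are invented for illustration, and `parsed_text` is a hand-written stand-in for whatever the address parser would produce from the text parameter, not real parser output.

```python
def resolve_option_one(parsed_text, components):
    """Option 1: explicit components replace the text input entirely."""
    return dict(components) if components else dict(parsed_text)


def resolve_option_two(parsed_text, components):
    """Option 2: blend both; explicit components override parsed fields.

    Also returns the fields where the two inputs disagree, so a caller
    could warn about or log the contradiction.
    """
    merged = dict(parsed_text)
    conflicts = {}
    for key, value in components.items():
        parsed = parsed_text.get(key)
        # Naive comparison: case and surrounding whitespace are ignored,
        # but "st" vs "street" would still be reported as a conflict.
        if parsed is not None and parsed.strip().lower() != value.strip().lower():
            conflicts[key] = (parsed, value)
        merged[key] = value
    return merged, conflicts


# Assumed parse of: text=mapzen 30 west 20th st, New York, NY
parsed_text = {
    "addr.name": "mapzen",
    "addr.number": "30",
    "addr.street": "west 20th st",
    "addr.city": "New York",
    "addr.state": "NY",
}
components = {"addr.street": "west 26th street"}

only_components = resolve_option_one(parsed_text, components)
merged, conflicts = resolve_option_two(parsed_text, components)
```

With the contradicting example from this issue, the first function keeps only the street and discards everything parsed from text, while the second keeps the parsed name, number, city, and state, replaces the street, and reports addr.street as a conflict.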