How do we handle address components? #173

Closed
riordan opened this issue Jul 28, 2015 · 1 comment

Comments

@riordan
Contributor

riordan commented Jul 28, 2015

UPDATE: Now being tracked in pelias/pelias#213

In some cases, a user of the API may have already segmented parts of the query. Examples include:

  • pre-structured form fields
  • geoparsing to detect available components
  • known scope of a particular dataset

We agree that these address components should be given a common namespace so it's clearly identifiable where these separate elements are being used. This breaks down as:

  • addr.name | corporate or personal name of entity/venue
  • addr.number | house number (for lack of a better term; it may not always be an integer, but it is the element that identifies an individual address along the street)
  • addr.street | street!
  • addr.city | locality (colloquial admin2 / metro-area / sub-locality)
  • addr.postal_code | postal code / postcode
  • addr.state | region, province, state (colloquial admin1)
  • addr.country | country (colloquial admin0)
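As a rough illustration of the namespace above, here is a minimal sketch of how `addr.*` query parameters might be collected into a single structure. The `AddressComponents` class and `parse_addr_params` function are hypothetical names made up for this example, not part of any existing API, and rejecting unknown `addr.*` keys is an assumption rather than an agreed behavior.

```python
from dataclasses import dataclass, fields
from typing import Mapping, Optional

ADDR_PREFIX = "addr."


@dataclass(frozen=True)
class AddressComponents:
    """Hypothetical container for the proposed addr.* namespace."""
    name: Optional[str] = None
    number: Optional[str] = None  # kept as a string: "30A" is not an integer
    street: Optional[str] = None
    city: Optional[str] = None
    postal_code: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None


def parse_addr_params(params: Mapping[str, str]) -> AddressComponents:
    """Pick the addr.* keys out of a flat query-parameter mapping."""
    allowed = {f.name for f in fields(AddressComponents)}
    found = {}
    for key, value in params.items():
        if not key.startswith(ADDR_PREFIX):
            continue  # e.g. "text" is handled elsewhere
        field_name = key[len(ADDR_PREFIX):]
        if field_name not in allowed:
            # Assumption: unknown components are an error, not silently dropped.
            raise ValueError(f"unknown address component: {key}")
        value = value.strip()
        if value:  # ignore empty components
            found[field_name] = value
    return AddressComponents(**found)
```

Keeping `number` as a string reflects the note above that a house number is not always an integer.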

We would like to support users who already have these components, but we are at an impasse as to how to treat them.

One option is to treat a componentized address as a replacement for text in the SCORE: if you supply componentized input, it is used in lieu of free-form text and overrides any address parsing that we would otherwise do. Each component would still have to be searched for individually, so each one directly affects the score of the results.
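A minimal sketch of this "components replace text" behavior, assuming the components arrive as a plain dict keyed by the names above. The function name `build_clauses_override` and the `(field, value)` clause shape are hypothetical, chosen only to make the idea concrete.

```python
from typing import Dict, List, Optional, Tuple


def build_clauses_override(
    text: Optional[str], components: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Components, when present, replace free-form text entirely."""
    if components:
        # One scoring clause per component; the text is ignored and
        # no address parsing takes place.
        return [(f"addr.{key}", value) for key, value in sorted(components.items())]
    if text:
        # No components: fall back to the normal free-text pipeline.
        return [("text", text)]
    return []
```

Under this sketch, supplying both `text` and any component silently discards the text, which is the cost of ruling out contradictions.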

The other option is to consider the address elements alongside a user's text input, enabling a user to blend known address components with a free-form text query, so that the known components are expressed semantically (overriding the address parser) while the unstructured components (from the user) go through the traditional search pipeline.
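A minimal sketch of the blending idea, assuming some address parser has already turned the free-form text into a dict of fields. `blend_components` is a hypothetical name, and "known component wins" is one possible precedence rule, not a settled decision.

```python
from typing import Dict


def blend_components(parsed: Dict[str, str], known: Dict[str, str]) -> Dict[str, str]:
    """Merge parser output with caller-supplied components.

    Fields the caller states explicitly override whatever the parser
    guessed; everything else from the parser is kept as-is.
    """
    merged = dict(parsed)  # copy so the parser output is not mutated
    merged.update(known)   # explicit components take precedence
    return merged
```

For example, if the parser extracts a house number and street from the text, and the caller supplies only a street, the result keeps the parsed number and uses the caller's street.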

The first option recognizes that explicitly stating both an address's components AND free-form text creates an opportunity for one parameter to contradict the other (for example, text=mapzen 30 west 20th st, New York, NY and addr.street=west 26th street). It'll probably happen a lot.
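To make the contradiction concrete, here is a minimal sketch that compares what a parser extracted from the text with what the caller stated explicitly. `find_conflicts` is a hypothetical helper; the normalization is deliberately naive (case and whitespace only), so it would also flag "st" versus "street" as a conflict.

```python
from typing import Dict, Tuple


def _normalize(value: str) -> str:
    # Lowercase and collapse runs of whitespace; no abbreviation handling.
    return " ".join(value.lower().split())


def find_conflicts(
    parsed: Dict[str, str], known: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return fields where the parsed text and the explicit component disagree."""
    conflicts = {}
    for field, known_value in known.items():
        parsed_value = parsed.get(field)
        if parsed_value is not None and _normalize(parsed_value) != _normalize(known_value):
            conflicts[field] = (parsed_value, known_value)
    return conflicts
```

What to do with a detected conflict (reject the request, prefer one side, or search both) is exactly the open question here.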

The second option recognizes that often certain elements are known about a dataset (through a column, geoparsing, prior knowledge) but that there might be some semi-unstructured parts of a search that the user wants us to handle on their behalf.

These two considerations should be weighed against each other to decide how componentized input is handled.

@riordan riordan added this to the Pelias v1.0.0 milestone Jul 28, 2015
@riordan
Contributor Author

riordan commented Jul 28, 2015

First of all, I hope my summary was an accurate and unbiased one before I launch into debate mode. If not, please edit directly.

I firmly believe we want to be able to combine addr components AND text inputs with addr components capable of affecting both the scope AND score.

The ability to combine these is particularly useful.

For instance, if a particular dataset is known to come from a particular local government, or to be restricted to a certain country (like most postal mail, which does not explicitly mention a country), being able to filter down to that level is exceptionally useful, even if the input text duplicates it. That is very much a scoping of the data. This is how CartoDB, with many of its queries, can direct its geocoder to more refined results: the coarse parsing has already been done. But unlike the other scope parameters (extent geometries, country codes), which are explicit and impossible to misinterpret, each element of an address scope must itself be searched for, so it is open to interpretation.
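A toy sketch of the scope-versus-score distinction, using an in-memory list of candidate records instead of a real search index. The `search` function, its `scope` and `score_on` parameters, and the record layout are all hypothetical; exact string matching stands in for the fuzzy lookup a real geocoder would need.

```python
from typing import Dict, List


def _norm(value: str) -> str:
    return " ".join(value.lower().split())


def search(
    candidates: List[Dict[str, str]],
    scope: Dict[str, str],
    score_on: Dict[str, str],
) -> List[Dict[str, str]]:
    """Filter by scope components, then rank by scoring components."""
    # Scope: a hard filter. Records that fail any scope field are dropped.
    in_scope = [
        c for c in candidates
        if all(_norm(c.get(k, "")) == _norm(v) for k, v in scope.items())
    ]

    # Score: a soft signal. Each matching field adds one point.
    def score(c: Dict[str, str]) -> int:
        return sum(1 for k, v in score_on.items() if _norm(c.get(k, "")) == _norm(v))

    return sorted(in_scope, key=score, reverse=True)
```

The same component (say, `country`) could be passed in either role, which is the ambiguity being described: it narrows the result set like a scope, but it still has to be matched like a search term.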

At the same time, it's entirely possible to use this syntax to construct a completely valid search result (boiling down to a venue or address). So it's search.

I know this leaves us in a tough spot. It would mean addresses can scope the results like Categories (#128), but can also serve as the search query itself. It's also like an expansion of the Country Codes (pelias/pelias#101).

It's weird and challenging, but it can also serve as one of the most effective ways for batch geocoders and power users to fine-tune their results by removing ambiguity while still constructing a search query. I think it's one of those things that makes us deeply competitive (as I go and look for who it makes us competitive with, but that's for another comment).

@riordan riordan added the v1 label Aug 13, 2015
@riordan riordan removed this from the Pelias v1.0.0 milestone Mar 10, 2016
@dianashk dianashk removed the v1 label Mar 10, 2016
@riordan riordan closed this as completed Mar 10, 2016

2 participants