How do we handle address components? #173

riordan · 2015-07-28T19:06:09Z

UPDATE: Now being tracked in pelias/pelias#213

In some cases, a user of the API may have already segmented parts of the query. Examples include:

pre-structured form fields
geoparsing to detect available components
known scope of a particular dataset

We agree that these address components should be given a common namespace so it's clearly identifiable where these separate elements are being used. This breaks down as:

addr.name | corporate or personal name of entity/venue
addr.number | house number (for lack of a better term; it may not be an integer, but it is the element that identifies the individual address on the street)
addr.street | street!
addr.city | locality (colloquial admin2 / metro-area / sub-locality)
addr.postal_code | postal code / postcode
addr.state | region, province, state (colloquial admin1)
addr.country | country (colloquial admin0)
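
To make the namespace concrete, here is a minimal sketch of how a client might turn pre-segmented fields into `addr.*` query parameters. The component names come from the list above; the helper `build_addr_params` and the validation behaviour are assumptions for illustration, not an existing Pelias API.

```python
from urllib.parse import urlencode

# Component names taken from the list above; everything else is hypothetical.
ADDR_FIELDS = ("name", "number", "street", "city", "postal_code", "state", "country")


def build_addr_params(components):
    """Prefix known address components with the shared `addr.` namespace."""
    unknown = set(components) - set(ADDR_FIELDS)
    if unknown:
        raise ValueError(f"unknown address components: {sorted(unknown)}")
    # Keep a stable field order and skip empty values.
    return {f"addr.{k}": components[k] for k in ADDR_FIELDS if components.get(k)}


params = build_addr_params(
    {"number": "30", "street": "West 26th Street", "city": "New York"}
)
query_string = urlencode(params)
```

Rejecting unknown keys up front is one possible design choice; silently ignoring them would be another.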

We would like to support users who already have these components, but we are at an impasse as to how to consider them.

One option is to treat a componentized address as a replacement for free-form text in the SCORE: if componentized input is supplied, it is used in lieu of the text and overrides any address parsing that we would do. Each of the components would still have to be searched for individually, so each one directly affects the score of the results.

The other option is to consider the address elements alongside a user's text input, enabling a user to blend known address components with a freeform text query, so that the known components may be semantically expressed (overriding the address parser) while the unstructured components (from the user) go through the traditional search pipeline.
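
The difference between the two options can be sketched as a small dispatch function. This is only an illustration of the decision being discussed: `plan_query`, the mode names `replace` and `blend`, and the returned dictionary shape are all hypothetical and do not describe how Pelias is implemented.

```python
def plan_query(text, addr, mode):
    """Decide what is sent to the address parser and what is taken as given.

    mode="replace": option one, components stand in for free-form text.
    mode="blend":   option two, components override the parser while the
                    text still goes through the normal search pipeline.
    """
    if mode == "replace":
        if addr:
            # Components win outright; the free-form text is not parsed.
            return {"parse_text": None, "components": dict(addr)}
        return {"parse_text": text, "components": {}}
    if mode == "blend":
        return {"parse_text": text, "components": dict(addr)}
    raise ValueError(f"unknown mode: {mode}")


addr = {"addr.country": "US"}
replaced = plan_query("30 west 26th st", addr, "replace")
blended = plan_query("30 west 26th st", addr, "blend")
```

Under `replace` the text is dropped whenever any component is present; under `blend` both inputs survive, which is where contradictions between them become possible.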

The first option recognizes that by explicitly stating an address's components AND freeform text, there's an opportunity for one parameter to contradict the other (for example, text=mapzen 30 west 20th st, New York, NY together with addr.street=west 26th street). It'll probably happen a lot.
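
A naive way to spot the kind of contradiction in that example is to check whether every token of a component also shows up in the free-form text. The sketch below is an assumption-laden heuristic (the abbreviation table and the function names are made up for illustration), not a description of any existing parser.

```python
import re

# Tiny, illustrative abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def tokens(value):
    """Lowercase, split on non-alphanumerics, and expand a few abbreviations."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return [ABBREVIATIONS.get(w, w) for w in words]


def missing_tokens(text, component):
    """Return the component tokens that never appear in the free-form text."""
    seen = set(tokens(text))
    return [t for t in tokens(component) if t not in seen]


text = "mapzen 30 west 20th st, New York, NY"
conflict = missing_tokens(text, "west 26th street")
agreement = missing_tokens(text, "west 20th street")
```

A non-empty result only hints at a possible conflict; the component may simply be absent from the text, which is a legitimate use of blended input.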

The second option recognizes that often certain elements are known about a dataset (through a column, geoparsing, prior knowledge) but that there might be some semi-unstructured parts of a search that the user wants us to handle on their behalf.

These should be weighed to decide how this is handled.

riordan · 2015-07-28T20:32:12Z

First of all, I hope my summary was an accurate and unbiased one before I launch into debate mode. If not, please edit directly.

I firmly believe we want to be able to combine addr components AND text inputs, with addr components capable of affecting both the scope AND the score.

The ability to combine these is particularly useful.

For instance, if a particular dataset is known to come from a particular local government or to be restricted to a certain country (like most postal mail, which does not explicitly mention a country), being able to filter down to that level is exceptionally useful, even if the input text duplicates it. That is very much a scoping of the data. This is how CartoDB, with many of their queries, can direct its geocoder to more refined results by doing the coarse parsing already. So it's scoping. But the individual elements of that scoping are up for interpretation, since each one must be searched for, whereas the rest of the scope queries (extent geometries, country codes) are explicit and impossible to misinterpret.

At the same time, it's entirely possible to use this syntax to construct a completely valid search result (boiling down to a venue or address). So it's search.
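
The dual role described above, components that narrow the candidate set and also feed the ranking, can be sketched in a few lines. Everything here is hypothetical: the candidate records, the exact-match scoping (the thread notes that real components would have to be searched for, not matched exactly), and the token-overlap score are simplifying assumptions.

```python
def scope_then_score(candidates, scope, text):
    """Filter candidates by known components, then rank the rest by text overlap."""

    def in_scope(candidate):
        # Simplification: exact, case-insensitive match on each scoped field.
        return all(candidate.get(k, "").lower() == v.lower() for k, v in scope.items())

    wanted = set(text.lower().split())

    def score(candidate):
        # Count how many query words appear in the candidate's name.
        return len(wanted & set(candidate["name"].lower().split()))

    scoped = [c for c in candidates if in_scope(c)]
    return sorted(scoped, key=score, reverse=True)


candidates = [
    {"name": "Springfield Library", "country": "US"},
    {"name": "Springfield Town Hall", "country": "US"},
    {"name": "Springfield Town Hall", "country": "AU"},
]
ranked = scope_then_score(candidates, {"country": "US"}, "springfield town hall")
```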

I know this leaves us in a tough spot. It would mean addresses can scope the results like Categories (#128), but can also serve as the search query itself. It's also like an expansion of the Country Codes (pelias/pelias#101).

It's weird and challenging, but can also serve as one of the most effective ways for batch geocoders and power users to fine-tune their results by removing ambiguity while still constructing a search query. I think it's one of those things that makes us deeply competitive (as I go and look for who it makes us competitive with, but that's for another comment).

riordan added this to the Pelias v1.0.0 milestone Jul 28, 2015

riordan added the question label Jul 28, 2015

riordan added the v1 label Aug 13, 2015

riordan removed this from the Pelias v1.0.0 milestone Mar 10, 2016

dianashk removed the v1 label Mar 10, 2016

riordan added the duplicate label Mar 10, 2016

riordan closed this as completed Mar 10, 2016
