Releases: pelias/api
v3.1.0
v3.0.2
v3.0.1
Address Parsing
Address parsing is huge for an address geocoder, and this release takes a first crack at it using the AddressIt module. AddressIt is a freeform street address parser designed to take a piece of text and convert it into a structured address that can be processed in different systems.
> var addressit = require('addressit')
> addressit('123 main st new york ny 10010 usa')
{ text: '123 main st new york ny 10010 usa',
parts: [],
unit: undefined,
number: 123,
street: 'main st',
state: 'NY',
country: 'USA',
postalcode: 10010,
regions: [ 'new york' ] }
Before the Pelias API calls addressit for address parsing, it runs some basic checks on the query so that we don't slow things down drastically when parsing is unnecessary. For example, the following are cases where we don't need address parsing:
- input=a or input=au or input=aus
  - if the input has three or fewer characters, we can assume it's not a fully formed address. In fact, we can go one step further and target only admin layers, because results such as austin, australia, etc. should be relevant but, more importantly, fast.
- input=boston or input=frankfurt or input=somereallybigname or input=new york
  - if the input is just one or even two tokens and does not contain a number, we can get away with targeting just the admin and poi layers.
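The checks above can be sketched roughly as follows. This is a minimal illustration, not the code in helper/query_parser.js: the function name planQuery and its return shape are hypothetical, and only the layer names admin and poi come from these notes.

```javascript
// Hypothetical helper (name and return shape are assumptions): decide whether
// the address parser is worth running and which layers to target.
function planQuery(input) {
  const text = input.trim();
  const tokens = text.split(/\s+/).filter(Boolean);
  const hasNumber = /\d/.test(text);

  // Three or fewer characters: not a fully formed address, admin layers only.
  if (text.length <= 3) {
    return { parseAddress: false, layers: ['admin'] };
  }

  // One or two tokens without a number: admin and poi layers are enough.
  if (tokens.length <= 2 && !hasNumber) {
    return { parseAddress: false, layers: ['admin', 'poi'] };
  }

  // Everything else goes through address parsing; null here means
  // "no layer restriction" in this sketch.
  return { parseAddress: true, layers: null };
}
```

For example, planQuery('new york') would skip parsing and target admin and poi, while planQuery('123 main st new york ny') would be sent to the parser.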
In all other cases, we do address parsing and use the address parts to query the ES index. Here's a sample mapping:
number + street -> name.default
number -> address.number
street -> address.street
postalcode -> address.zip
state -> admin1_abbr
country -> alpha3
regions -> admin2
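As a rough sketch of that mapping, the function below turns an addressit-style result into a map of index field to query text. The function name toFieldMatches is hypothetical, and joining regions with a space before matching against admin2 is an assumption; the notes do not say how multiple regions are handled.

```javascript
// Hypothetical helper: map parsed address parts onto the index fields listed
// in the sample mapping. Parts that are missing are simply skipped.
function toFieldMatches(parsed) {
  const matches = {};

  if (parsed.number !== undefined && parsed.street) {
    matches['name.default'] = parsed.number + ' ' + parsed.street;
  }
  if (parsed.number !== undefined) {
    matches['address.number'] = parsed.number;
  }
  if (parsed.street) {
    matches['address.street'] = parsed.street;
  }
  if (parsed.postalcode !== undefined) {
    matches['address.zip'] = parsed.postalcode;
  }
  if (parsed.state) {
    matches['admin1_abbr'] = parsed.state;
  }
  if (parsed.country) {
    matches['alpha3'] = parsed.country;
  }
  if (parsed.regions && parsed.regions.length > 0) {
    // Assumption: all regions are joined into one admin2 query string.
    matches['admin2'] = parsed.regions.join(' ');
  }

  return matches;
}
```

Fed the parsed result for '123 main st new york ny 10010 usa' shown earlier, this would produce name.default set to '123 main st', admin1_abbr set to 'NY', and so on.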
Sometimes the address parser comes back empty-handed:
> addressit('123 chelsea, london')
{ text: '123 chelsea, london',
parts: [],
unit: undefined,
state: undefined,
country: undefined,
postalcode: undefined,
regions: [ '123 chelsea', 'london' ] }
In this case, we fall back to the naive approach we implemented months ago: we split the address on a comma, assume everything that follows the comma is an admin part, and add a match block to the should array. So we query name.default with 123 chelsea, and the should array in the query tries to match london against all 5 admin fields:
- admin0
- admin1
- admin1_abbr
- admin2
- alpha3
All of this logic lives in helper/query_parser.js and is well documented with in-line comments. The query changes can be seen in query/search.js.
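The comma fallback might look something like the sketch below. It is illustrative only: buildFallbackQuery is a hypothetical name, and placing the name.default match in a must clause is an assumption; the notes only state that the admin part is matched through the should array. The real query is built in query/search.js.

```javascript
// The five admin fields named in these notes.
const ADMIN_FIELDS = ['admin0', 'admin1', 'admin1_abbr', 'admin2', 'alpha3'];

// Hypothetical helper: split on the first comma, query name.default with the
// left side, and add one match block per admin field for the right side.
function buildFallbackQuery(text) {
  const idx = text.indexOf(',');
  const name = (idx === -1 ? text : text.slice(0, idx)).trim();
  const admin = idx === -1 ? '' : text.slice(idx + 1).trim();

  const query = {
    bool: {
      // Assumption: the name match is required, the admin matches are optional.
      must: [{ match: { 'name.default': name } }],
      should: []
    }
  };

  if (admin) {
    for (const field of ADMIN_FIELDS) {
      query.bool.should.push({ match: { [field]: admin } });
    }
  }

  return query;
}
```

With '123 chelsea, london' as input, the should array holds five match blocks, each trying london against one admin field.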
An additional 104 test cases were written to test out all the above mentioned logic and to test the query building - bringing the grand total of unit tests for the API to 708!
Deleting code is so much fun
Code cleanup - deleted all suggester related code (843 deletions) FTW!
Tech Debt - Better 408/500 error handling
Minor cleanup -> minor speedup
Minor cleanup, minor speedup and minor performance improvement - brought to you by:
- removed exact_match script
- increased search radius to 500 km
NGRAMS
This release is a big one: we are using ngrams to analyze/tokenize and are officially moving away from the context suggester, which is memory intensive and wasn't letting us build an autocomplete suggester on a global scale.
Some major features:
- partial matching using the ngrams approach ftw! https://www.elastic.co/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html
- better support for geohashes https://github.com/pelias/schema/blob/ngram/mappings/partial/centroid.js
- explicit definitions of how field data is to be stored
- improved punctuation https://github.com/pelias/schema/blob/ngram/punctuation.js
- improved synonyms: https://github.com/pelias/schema/blob/ngram/street_suffix.js
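To give a feel for what ngram tokenization does to a token, here is a small standalone sketch. It is not the analyzer configuration from pelias/schema; the function name and the gram sizes used below are assumptions chosen purely for illustration.

```javascript
// Illustrative only: emit every substring of `token` whose length is between
// minGram and maxGram, shortest grams first.
function ngrams(token, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= maxGram; n++) {
    for (let i = 0; i + n <= token.length; i++) {
      grams.push(token.slice(i, i + n));
    }
  }
  return grams;
}

// ngrams('bos', 1, 2) returns ['b', 'o', 's', 'bo', 'os']
```

Indexing grams like these is what lets a partial input match a longer indexed name without relying on a suggester.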
category scoring, bbox format change, fix JSONification bug
Better index page and details parameter
This release includes the following improvements:
- All endpoints support a details parameter. When details=true all available properties are returned. When details=false only id, layer, and text are returned.
- /reverse endpoint supports a parameter called categories, which allows reverse searching for POIs of specified types. The categories feature is in ALPHA and will probably change in the near future.
- API index page shows API documentation for all available endpoints when requested with Accepts:html header.
- /suggest/coarse endpoint has been fixed to only use admin layers.
- Cleaner runtime error reporting and a few other minor things.
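The effect of the details parameter on a single result can be sketched as below. The function name filterProperties and the sample property names other than id, layer, and text are hypothetical; this is not the API's actual implementation.

```javascript
// The properties returned when details=false, per the notes above.
const BASIC_PROPERTIES = ['id', 'layer', 'text'];

// Hypothetical helper: return all properties when details is true,
// otherwise only the basic ones that are present.
function filterProperties(properties, details) {
  if (details) {
    return Object.assign({}, properties);
  }
  const out = {};
  for (const key of BASIC_PROPERTIES) {
    if (key in properties) {
      out[key] = properties[key];
    }
  }
  return out;
}
```

Called with details set to false, a result carrying extra fields such as admin0 would come back with just id, layer, and text.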
General Enhancements and Bug fixes
This release fixes a few bugs and adds a couple of new features/enhancements:
- New: Ability to search/filter on localities and local_admin layers (e.g. &layers=locality)
- New: API now uses the optimized bbox query which cuts down the response times by one third.
- Bug Fix: Filtering by layer is now working on all endpoints (/suggest, /search, /reverse)
- Bug Fix: No more internal server errors when you pass in lat=0 and lon=0