Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect altnames that are a substring of name.default #548

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 21 additions & 16 deletions stream/tag_mapper.js
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,27 @@ module.exports = function(){
return next( null, doc );
}

// Unfortunately we need to iterate over every tag,
// so we only do the iteration once to save CPU.
// handle the most likely source of name.default first
const trimmed_name = trim(tags.name);
if (trimmed_name) {
doc.setName('default', trimmed_name);
}

// check the other tags that might go into name.default second
Object.entries(NAME_SCHEMA).forEach(([key, value]) => {
if (value === 'default' && key !== 'name') {
const trimmed_value = trim(tags[key]);
if (trimmed_value) {
if (!trimmed_name ) {
doc.setName('default', trim( tags[key]));
} else if(!trimmed_name.includes(trimmed_value)) {
doc.setNameAlias('default', trim( tags[key]));
}
}
}
});

// iterate through all tags, catching any address/localized names
_.each(tags, (value, key) => {

// Map localized names which begin with 'name:'
Expand All @@ -44,20 +63,6 @@ module.exports = function(){
}
}

// Map name data from our name mapping schema
else if( _.has(NAME_SCHEMA, key) ){
var val2 = trim( value );
if( val2 ){
if( key === NAME_SCHEMA._primary ){
doc.setName( NAME_SCHEMA[key], val2 );
} else if ( 'default' === NAME_SCHEMA[key] ) {
doc.setNameAlias( NAME_SCHEMA[key], val2 );
} else {
doc.setName( NAME_SCHEMA[key], val2 );
}
}
}

// Map address data from our address mapping schema
else if( _.has(ADDRESS_SCHEMA, key) ){
var val3 = trim( value );
Expand Down
25 changes: 6 additions & 19 deletions test/fixtures/combined_vancouver_queens.json
Original file line number Diff line number Diff line change
Expand Up @@ -856355,16 +856355,10 @@
"_type": "_doc",
"data": {
"name": {
"default": [
"Mountain Equipment Co-op (MEC)",
"MEC"
]
"default": "Mountain Equipment Co-op (MEC)"
},
"phrase": {
"default": [
"Mountain Equipment Co-op (MEC)",
"MEC"
]
"default": "Mountain Equipment Co-op (MEC)"
},
"address_parts": {
"number": "212",
Expand Down Expand Up @@ -892394,8 +892388,7 @@
"_type": "_doc",
"data": {
"name": {
"default": "IPOH Asian House"
},
"default": "IPOH Asian House" },
"phrase": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-issue: this is weirdly indented

Copy link
Member Author

@orangejulius orangejulius Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, oops! My fault! I had a try at editing the fixture manually since there were only a few changes.

Should have left it to the machines.

Copy link
Member

@missinglink missinglink Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, it's really not an issue, I just noticed it a few times in the PR, like the trim( tags[key])) catches my eye.

the Pelias code styling used to be more hit-and-miss, I've been using autoformatting in my editor for a while now which hopefully fixes a bunch of the little ones.

at some point I'd still love to fully adopt standardJS.

"default": "IPOH Asian House"
},
Expand Down Expand Up @@ -892429,16 +892422,10 @@
"_type": "_doc",
"data": {
"name": {
"default": [
"On Lok Restaurant & Wun Tun House",
"On Lok"
]
"default": "On Lok Restaurant & Wun Tun House"
},
"phrase": {
"default": [
"On Lok Restaurant & Wun Tun House",
"On Lok"
]
"default": "On Lok Restaurant & Wun Tun House"
},
"address_parts": {
"number": "2010",
Expand Down Expand Up @@ -956784,4 +956771,4 @@
"bounding_box": "{\"min_lat\":49.2174915,\"max_lat\":49.2194865,\"min_lon\":-123.2018987,\"max_lon\":-123.1991481}"
}
}
]
]
32 changes: 32 additions & 0 deletions test/stream/tag_mapper.js
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,38 @@ module.exports.tests.osm_names = function(test, common) {
});
};

// Reject alt names that are a substring of the main name
module.exports.tests.substring_alt_name = function(test, common) {
var doc = new Document('a','b',1);
doc.setMeta('tags', { 'name': 'test place', 'alt_name': 'test pl' });

test('rejects - substring alt name', function(t) {
var stream = mapper();
stream.pipe( through.obj( function( doc, enc, next ){
t.deepEqual(doc.name, { default: 'test place' }, 'substring name removed');
t.end(); // test will fail if not called (or called twice).
next();
}));
stream.write(doc);
});
};

// Reject alt names that are a substring of the main name, even if they appear before the name in list of tags
module.exports.tests.substring_alt_name2 = function(test, common) {
var doc = new Document('a','b',1);
doc.setMeta('tags', { 'alt_name': 'test pl', name: 'test place'});

test('rejects - substring alt name', function(t) {
var stream = mapper();
stream.pipe( through.obj( function( doc, enc, next ){
t.deepEqual(doc.name, { default: 'test place' }, 'substring name removed');
t.end(); // test will fail if not called (or called twice).
next();
}));
stream.write(doc);
});
};

// Cover the case of a tag key being 'name:' eg. { 'name:': 'foo' }
// Not to be confused with { 'name': 'foo' } (note the extraneous colon)
module.exports.tests.extraneous_colon = function(test, common) {
Expand Down