feat: assess if fuzzyjoin may simplify/enhance the implementation of match_name
#302
Very cool.
A word of caution: faster is not always better. The first example in the docs for
Also, to be fair, this is likely no worse than what this package currently does. But it's likely not much better either, even if it's faster.

library(tidyverse)
library(zoomerjoin)
options(width = 130)
corpus_1 <- dime_data %>% # dime data is packaged with zoomerjoin
head(500)
names(corpus_1) <- c("a", "field")
corpus_2 <- dime_data %>% # dime data is packaged with zoomerjoin
tail(500)
names(corpus_2) <- c("b", "field")
# Keep pairs whose estimated Jaccard similarity on 6-character n-grams is at least 0.8
jaccard_inner_join(corpus_1, corpus_2,
  by = "field", n_gram_width = 6,
  n_bands = 20, band_width = 6, threshold = .8
)
#> # A tibble: 8 × 4
#> a field.x b field.y
#> <dbl> <chr> <dbl> <chr>
#> 1 302 americans for good government inc 910 americans for good government
#> 2 230 pipefitters local union 524 998 pipefitters local union 533
#> 3 292 bill bradley for u s senate '84 913 bill bradley for u s senate '90
#> 4 378 guarini for congress 1982 606 guarini for congress 1984
#> 5 378 guarini for congress 1982 883 guarini for congress 1986
#> 6 238 4th congressional district democratic party 518 16th congressional district democratic party
#> 7 88 scheuer for congress 1980 667 scheuer for congress 1984
#> 8 319 7th congressional district democratic party of wisconsin 792 8th congressional district democratic party of wisconsin
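Why do pairs such as the two Guarini committees clear a 0.8 threshold? The rough sketch below computes exact Jaccard similarity on sets of character 6-grams, mirroring `n_gram_width = 6`. The helpers `char_ngrams()` and `jaccard_similarity()` are hypothetical and assume only that the join compares sets of character n-grams. zoomerjoin itself estimates similarity with locality-sensitive hashing (hence `n_bands` and `band_width`), so its internal tokenization and scores may differ somewhat.

```r
# Hypothetical helper (not zoomerjoin's internal code): the distinct
# character n-grams ("shingles") of a single string.
char_ngrams <- function(x, n) {
  if (nchar(x) < n) return(x)
  starts <- seq_len(nchar(x) - n + 1)
  unique(substring(x, starts, starts + n - 1))
}

# Hypothetical helper: exact Jaccard similarity |A intersect B| / |A union B|
# between the n-gram sets of two strings.
jaccard_similarity <- function(a, b, n = 6) {
  sa <- char_ngrams(a, n)
  sb <- char_ngrams(b, n)
  length(intersect(sa, sb)) / length(union(sa, sb))
}

sim <- jaccard_similarity("guarini for congress 1982", "guarini for congress 1984")
sim        # 19 shared 6-grams out of 21 distinct ones: 19/21 = 0.9047619
sim > 0.8  # TRUE, so this pair passes a 0.8 threshold
```

Only the final 6-gram differs between the two strings, so a one-digit change in the year barely moves the score. A similarity threshold alone cannot distinguish a different election cycle from a typo.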
Fair enough. Code speed isn't really the main blocker with this package; the bottleneck is the time it takes to verify matches manually. Still neat to look into!
match_name
https://cran.r-project.org/web/packages/fuzzyjoin/
AB#10180