Support for equivalence classes #626

Closed
albfan opened this issue Oct 7, 2017 · 4 comments
Labels
enhancement An enhancement to the functionality of the software. icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon.

Comments

albfan commented Oct 7, 2017

Is there any chance to support equivalence classes?

https://www.regular-expressions.info/posixbrackets.html#eq

Here is an SSCCE test case:

$ cat >test <<EOF
Búsqueda global
EOF
$ grep -RiH "B[[=u=]]squeda global" *
test:Búsqueda global
$ ag -i "b[[=u=]]squeda global"
ERR: Bad regex! pcre_compile() failed at position 2: POSIX collating elements are not supported
If you meant to search for a literal string, run ag with -Q
BurntSushi added the enhancement and icebox labels Oct 7, 2017

BurntSushi (Owner) commented

I'm not in principle against it, but this enhancement requires enhancements to the regex engine itself. If this ever happens, it will not be for a very long time.

Anywho, this issue isn't really for ripgrep but for the regex engine, so I've filed the issue there instead: rust-lang/regex#404

albfan commented Oct 7, 2017

Fair enough. Thanks for taking the time to file that. Let's see if I can provide a patch for it.

BurntSushi commented Oct 8, 2017 via email

albfan commented Nov 5, 2017

As a summary for anyone looking at this:

Rust tries to be agnostic about charsets, so the concept of equivalence classes is not available. The only thing remotely similar is Unicode normalization: `ò` can be decomposed into `o` and a combining grave accent, so a search for equivalents of `o` could match `ò`. That implies preprocessing each file. It can be done with https://crates.io/crates/unicode-normalization, but presumably it will slow down all searches.
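
To make the normalization idea concrete, here is a rough std-only sketch. It assumes the text has already been decomposed (NFD); the decomposition step itself is what the unicode-normalization crate would provide and is not reimplemented here. The helper names `is_combining_mark` and `strip_marks` are made up for illustration, and only the basic Combining Diacritical Marks block is handled.

```rust
/// True for code points in the Combining Diacritical Marks block
/// (U+0300..=U+036F). Other combining blocks are ignored in this sketch.
fn is_combining_mark(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Drop combining marks from text that is ALREADY decomposed (assumption).
fn strip_marks(decomposed: &str) -> String {
    decomposed.chars().filter(|c| !is_combining_mark(*c)).collect()
}

fn main() {
    let precomposed = "\u{00F2}"; // ò as a single code point
    let decomposed = "o\u{0300}"; // o followed by a combining grave accent

    // The two spellings are different strings, so a plain search for one
    // does not find the other.
    assert_ne!(precomposed, decomposed);

    // Once decomposed, removing the mark leaves the base letter.
    assert_eq!(strip_marks(decomposed), "o");

    // Precomposed input passes through unchanged, which is why a real tool
    // would have to normalize every file first.
    assert_eq!(strip_marks(precomposed), "\u{00F2}");
}
```

The last assertion is the cost mentioned above: every haystack would need a decomposition pass before matching.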

Providing a custom config for this

[equivalence]
a = aá
e = eé
i = ií
o = oó
u = uúü
n = nñ

seems a better option, but given #196 I don't see it as a useful contribution to ripgrep itself, but rather to the Rust regex crate.
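
One way such a config could be used, sketched with only the standard library: read the `key = members` pairs and rewrite each matching character of a search string into a bracket class before handing it to the regex engine. Nothing like this exists in ripgrep or the regex crate; the config format is the one proposed above, and `parse_equivalences` and `expand_equivalences` are hypothetical names. The expansion treats the input as literal text, so it would mangle escapes such as `\n` or characters already inside `[...]`.

```rust
use std::collections::HashMap;

/// Parse `key = members` lines into a lookup table.
/// Section headers, blank lines, and multi-character keys are skipped.
fn parse_equivalences(config: &str) -> HashMap<char, String> {
    let mut table = HashMap::new();
    for line in config.lines() {
        if let Some((key, members)) = line.split_once('=') {
            let mut key_chars = key.trim().chars();
            // Accept only keys that are exactly one character long.
            if let (Some(k), None) = (key_chars.next(), key_chars.next()) {
                table.insert(k, members.trim().to_string());
            }
        }
    }
    table
}

/// Replace every character that has an entry with a bracket class,
/// e.g. `u` becomes `[uúü]`. Naive: no awareness of regex syntax.
fn expand_equivalences(pattern: &str, table: &HashMap<char, String>) -> String {
    let mut out = String::new();
    for c in pattern.chars() {
        match table.get(&c) {
            Some(members) => {
                out.push('[');
                out.push_str(members);
                out.push(']');
            }
            None => out.push(c),
        }
    }
    out
}

fn main() {
    let config = "[equivalence]\na = aá\ne = eé\ni = ií\no = oó\nu = uúü\nn = nñ\n";
    let table = parse_equivalences(config);
    // Prints the expanded pattern for the test case from the first comment.
    println!("{}", expand_equivalences("busqueda global", &table));
}
```

The expanded string is an ordinary character-class pattern, so it needs no new engine support, but it has to be applied before the pattern is compiled.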
