-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code marked as comment due to Umlauts #134
Comments
I'm not sure, but perhaps problem comes from that tm4e doesn't support onigurama with UTF-16 (see ignored tests at https://github.com/eclipse/tm4e/blob/master/org.eclipse.tm4e.core.tests/src/main/java/org/eclipse/tm4e/core/grammar/GrammarSuiteTest.java#L44) I think we should try to implement ConvertUtf16OffsetToUtf8 https://github.com/atom/node-oniguruma/blob/master/src/onig-searcher.cc#L4 in Java at https://github.com/eclipse/tm4e/blob/master/org.eclipse.tm4e.core/src/main/java/org/eclipse/tm4e/core/internal/oniguruma/OnigSearcher.java Any PR are welcome! |
I already have some utilities for converting to/from bytes/multibyte I used in LiClipseText... (https://github.com/fabioz/LiClipseText/blob/master/plugins/org.brainwy.liclipsetext.editor/src/org/brainwy/liclipsetext/editor/partitioning/Utf8WithCharLen.java). I'll take a look on porting them to Maybe better would be just using always utf-8 internally and convert just the final result? Anyway, will provide a pull request for the conversions shortly, but will leave it up to you where to do the |
@NielsNet could you reinstall http://download.eclipse.org/tm4e/snapshots/ and tell me if it fixes your problem. Thanks! |
@angelozerr With this change I have the same behavior I had with the approach I was using which just used everything with utf-8 offsets internally and just converting the final result from utf-16 to utf-8. (i.e.: fabioz/LiClipseText@1d8f88c -- the interesting bit is Token.java and Grammar.java -- mainly, no conversion is needed internally and just the final result is converted). So not sure... I must say I'm more inclined toward the patch which keeps everything utf-8 internally and just changes offsets when returning to grammar callers (it should be a bit faster as it doesn't have to convert all the time internally which may happen more often and makes internal handling simpler as everything is utf-8 internally as oniguruma has to work on bytes anyways). If you want I can provide a patch for that... |
@fabioz I have tried to translate node onigurama code and it seems it fixes this issue and #136 (I have only tested in Windows OS). Im' waiting for users feedback to know if it fixes encoding problem with other OS.
I'm aware with any patch if it improves performance and fixes any problem. Please note that I would like to consume tokenizeLine2 #38 like VSCode does but for you it will change nothing if you wish to use again tokenizeLine |
@angelozerr Sorry I did not notice your comment. I tried to install the snapshot but it doesn't work. I get an |
@NielsNet please try to reinstall typescript.java http://oss.opensagres.fr/typescript.ide/snapshots/ which install last version of tm4e |
@angelozerr After installing the latest version of typescript.java, the problem fully is gone: |
Thanks, let's close then. |
There seems to bee an issue with Textmate and umlauts.
As shown in the issue angelozerr/typescript.java#203 as soon as two or more letters with an umlaut are used the following lines of code are marked as comment even though they should not.
The following screenshot shows different cases where the behaviour appears.
The text was updated successfully, but these errors were encountered: