Bengali / Khmer / Gujarati / Odia / Hindi regression? #126
Comments
Looks like 0.2.9 is where the change happened.
@masaccio which terminal?
#91 (comment) related issue and comment about the change
This was iTerm2 on a Mac with
This is the text I expect to see aligned - https://raw.githubusercontent.com/masaccio/compact-json/main/tests/data/test-issue-4.ref-1.json Though in my browser it's not aligned so I don't know what the right answer is.
I will say that I also use iTerm2, and that it is not a great indicator of multilanguage support. I have since authored a testing and reporting tool, ucs-detect, and have published results for ~27 terminals. The following terminals match this library's measurements for Hindi:
The other ~23 terminals, including iTerm2, do not. iTerm2 gets an overall "B" rating for its LANG score, while the ones listed above get A's. Some of the mismatches are systematic errors, and I may file bug reports with the respective projects. However, languages like Hindi, written in the Devanagari script, make heavy use of combining characters (category codes Mc and Mn), and strictly following the Unicode specifications, as these 4 terminals and this library do, may result in so much "squeezing" that the text becomes totally illegible!
Regarding your findings in the browser, I have found that browsers do not make the effort to align by column as a terminal is expected to (see screenshots in #123 (comment)).
I have authored a dummy "check" function to display a sequence where '|' should align:
    import wcwidth
    def check(n, phrase):
        print('|'+(' '*wcwidth.wcswidth(phrase))+'|'+'\n'+'|'+phrase+'|\n')
And these are the results for iTerm (left) and WezTerm (right).
I don't know Devanagari well enough to say for sure, but I would say that iTerm2 appears to fail to correctly combine characters of categories Mc and Mn, while WezTerm does combine them but also sometimes reduces the font size to accommodate their expected width, and some combining characters may also be poorly aligned.
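For anyone following along, the combining characters mentioned above can be listed with the standard library alone. This is only a rough sketch: the sample phrase (the word "Hindi" in Devanagari) and the helper name `combining_marks` are my own assumptions, not something from this thread or from wcwidth, and it only reports Unicode categories. It says nothing about how a given terminal or library turns them into column widths.

```python
import unicodedata

# Hypothetical sample phrase (not taken from the thread): the word "Hindi"
# in Devanagari, written with escapes so the code points are unambiguous.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"


def combining_marks(phrase):
    """Return (code point, category) pairs for characters in category Mc or Mn."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in phrase
        if unicodedata.category(ch) in ("Mc", "Mn")
    ]


if __name__ == "__main__":
    # Print each combining character with its category, then a summary count.
    for code_point, category in combining_marks(HINDI):
        print(code_point, category)
    print(len(HINDI), "code points,", len(combining_marks(HINDI)), "combining")
```

Comparing the number of code points with the number of combining marks gives a feel for how much of a Devanagari phrase a strict width calculation may treat differently from base letters.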
Thanks for the comprehensive debug. I can see I'm staring into a large rabbit hole of encodings I don't understand, so I'll step away! WezTerm does indeed agree with your library (though not editing in vim) and that is enough for me.
I recently updated from 0.2.6 to 0.2.13 and I have some tests breaking in a package that uses wcswidth. The following test fails every check in 0.2.13 but passes in 0.2.6:
Aligning some ASCII text in my terminal, I believe that the check lengths are correct: