Upgraded rivo/uniseg to latest version, switched StringWidth/Truncate to speedier version #63
Codecov Report
@@ Coverage Diff @@
## master #63 +/- ##
==========================================
+ Coverage 93.56% 93.82% +0.25%
==========================================
Files 3 3
Lines 171 178 +7
==========================================
+ Hits 160 167 +7
Misses 6 6
Partials 5 5
I upgraded again to the latest version. I also updated CI to use current Go versions. Everything passes now.
@@ -166,6 +167,16 @@ func emoji(out io.Writer, in io.Reader) error {
 	})
 }
 
+// We also want regional indicator symbols (flags) to be part of the Emoji
+// table. They are U+1F1E6..U+1F1FF.
Can you add a link to the documentation of the specification?
http://www.unicode.org/reports/tr51/#def_emoji_flag_sequence documents how flags are constructed using Regional Indicator code points.
Regional Indicator code points are not classified as Extended_Pictographic so they don't show up in your emoji table. But for the sake of calculating the width, they behave the same as other emojis. So the simplest solution is to add them to your emoji table. Alternatively, you could add the detection of Regional Indicators to all other parts of your code. That would be overkill, in my opinion, but it's up to you.
In any case, you'll want StringWidth("🇯🇵") == 2. That's what this is for.
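To illustrate how a flag is built from Regional Indicator code points, here is a minimal sketch using only the standard library. The helper names `isRegionalIndicator` and `flagFromCountryCode` are made up for this example and are not part of go-runewidth or uniseg.

```go
package main

import "fmt"

// regionalIndicatorA is U+1F1E6, REGIONAL INDICATOR SYMBOL LETTER A.
const regionalIndicatorA = 0x1F1E6

// isRegionalIndicator reports whether r lies in U+1F1E6..U+1F1FF.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagFromCountryCode (hypothetical helper) maps a two-letter uppercase
// code such as "JP" to the matching pair of regional indicators.
// It does not check that the code names an actual region.
func flagFromCountryCode(code string) (string, bool) {
	if len(code) != 2 {
		return "", false
	}
	out := make([]rune, 0, 2)
	for i := 0; i < 2; i++ {
		c := code[i]
		if c < 'A' || c > 'Z' {
			return "", false
		}
		out = append(out, regionalIndicatorA+rune(c-'A'))
	}
	return string(out), true
}

func main() {
	flag, ok := flagFromCountryCode("JP")
	fmt.Println(flag, ok)
	for _, r := range flag {
		fmt.Printf("U+%X regional indicator: %v\n", r, isRegionalIndicator(r))
	}
}
```

The flag is two code points that a terminal is expected to draw as one double-width glyph, which is why the pair should count as width 2 rather than being measured per code point.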
You'll find them in the same file (look for "Regional Indicator"):
go.mod
@@ -1,5 +1,5 @@
 module github.com/mattn/go-runewidth
 
-go 1.9
+go 1.18
I'd like to keep Go 1.16. Is there a problem with that?
Unfortunately, rivo/uniseg uses generics (introduced with Go 1.18) and the new //go:build build tag syntax (introduced with Go 1.17), so it needs Go 1.18.
I could probably downgrade it in go-runewidth to a previous version but that old version was much slower than the new version. If I do, Go 1.16 will work.
Let me know how you'd like to proceed.
Which version of uniseg can be built together with go-runewidth on Go 1.16?
v0.3.4. But if you use v0.3.4, you also need to make adjustments to your code.
Here's my suggestion: I prepare a second PR with the same output as this one, but based on uniseg v0.3.4 and the older Go version. We can leave this PR (#63) open and you can merge it once you're ready to switch to Go 1.18.
What do you think of this?
How about splitting this into separate files, runewidth_go118.go and runewidth_go117.go?
I just submitted a commit that does this but it gives me merge conflicts. It looks like you made/accepted other changes in the meantime. I'm not able to resolve these conflicts, only you can.
I think I was able to resolve the merge conflict. Please have a look.
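For readers unfamiliar with the split-file approach discussed in this thread: it relies on Go build constraints. Below is a rough sketch of what the Go 1.18 side of such a split could look like. It is not the code from this PR; `widthBackend` and `firstOf` are placeholder names invented for the example. A sibling file for older toolchains would carry the negated constraint (`//go:build !go1.18` plus `// +build !go1.18`) and define the same identifiers without generics. The legacy `// +build` line is included because Go 1.16 and earlier only understand that form.

```go
//go:build go1.18
// +build go1.18

// Sketch of a version-gated file in the spirit of runewidth_go118.go.
package main

import "fmt"

// widthBackend is a hypothetical marker naming which variant was compiled in.
const widthBackend = "go1.18+"

// firstOf uses a type parameter, which only compiles on Go 1.18 or newer.
// That is the kind of code that has to live behind the constraint above.
func firstOf[T any](items []T) (T, bool) {
	var zero T
	if len(items) == 0 {
		return zero, false
	}
	return items[0], true
}

func main() {
	v, ok := firstOf([]string{"a", "b"})
	fmt.Println(widthBackend, v, ok)
}
```

With this layout the toolchain picks exactly one of the two files, so callers see the same identifiers regardless of the Go version.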
Do you still need information or any action from me here?
👍 to getting this merged
@mattn is there anything blocking this from merging and cutting a new release?
@rivo Thanks for the patch. I can confirm that this fixes a display issue of my program.
cl, s, _, state = uniseg.FirstGraphemeClusterInString(s, state)
for index, r := range cl {
	if index == 0 && inTable(r, emoji) {
		chWidth = 2 // Not the optimal solution but it will work in most cases.
I noticed that this incorrectly reports the width of some characters, such as ▶ (0x25b6), which is in both the emoji and ambiguous tables.
Related: junegunn/fzf#3588
Well, it turns out that emojis can have multiple representations. There is a default for each emoji and your example or, say, ☺, defaults to "text presentation". And there are code points that can be added to force a specific presentation. To make things worse, many systems don't respect the Unicode specification and then there is the question of whether fonts will support the different representations.
go-runewidth does not go into this detail. This is also why the comment here says "in most cases". It's a simple approximation. If you want this emoji presentation flag to be considered, you can also use uniseg directly which has had string width calculation for about two years now. uniseg.StringWidth("\u25b6") will report as "1".
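To make the presentation issue concrete, here is a simplified sketch of how a default presentation and the two variation selectors could feed into a width decision. It is not how go-runewidth or uniseg are implemented. The three-entry table and the function name `clusterWidth` are assumptions made for illustration, and the sketch only handles a single base code point optionally followed by one selector.

```go
package main

import "fmt"

const (
	textSelector  = '\uFE0E' // VARIATION SELECTOR-15: requests text presentation
	emojiSelector = '\uFE0F' // VARIATION SELECTOR-16: requests emoji presentation
)

// defaultEmojiPresentation is a tiny illustrative table, not real Unicode data.
// true means the code point is shown as an emoji unless told otherwise.
var defaultEmojiPresentation = map[rune]bool{
	0x25B6:  false, // ▶ defaults to text presentation
	0x263A:  false, // ☺ defaults to text presentation
	0x1F600: true,  // 😀 defaults to emoji presentation
}

// clusterWidth (hypothetical) returns 2 for emoji presentation and 1 otherwise.
func clusterWidth(cluster string) int {
	runes := []rune(cluster)
	if len(runes) == 0 {
		return 0
	}
	emoji := defaultEmojiPresentation[runes[0]]
	if len(runes) > 1 {
		switch runes[1] {
		case textSelector:
			emoji = false
		case emojiSelector:
			emoji = true
		}
	}
	if emoji {
		return 2
	}
	return 1
}

func main() {
	fmt.Println(clusterWidth("\u25b6"))       // bare ▶
	fmt.Println(clusterWidth("\u25b6\ufe0f")) // ▶ with emoji selector
	fmt.Println(clusterWidth("\U0001F600"))   // 😀
}
```

A check that looks only at the first code point and an emoji table cannot tell the first two cases apart, which is the gap described in this comment.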
From what I can see, uniseg will be much more accurate than go-runewidth. However, go-runewidth is mostly in line with most terminals which also tend to use the simple wcwidth functionality. Depending on your application, there may be some benefit to "making the same mistakes" as the environment it is run in. For example, iTerm2 on macOS which is a very popular terminal application, does not render the rainbow flag correctly:

This flag obviously has a width of 2 but iTerm2 assigns a width of 1. If you need your application to be in line with iTerm2, it may be better to make that same mistake. go-runewidth is probably your best bet in that case. (Although I don't know what it reports for this specific example, it might not have the same issue as iTerm2.) But if you need something that is accurate, no matter what, I would suggest using uniseg to calculate rune/string widths.
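One reason implementations disagree on this flag is that the single visible glyph is a sequence of four code points. The standard-library sketch below only lists them; it makes no claim about how iTerm2 or any particular library arrives at its width. The `describe` helper is made up for this example.

```go
package main

import "fmt"

// rainbowFlag is the emoji ZWJ sequence for the rainbow flag.
const rainbowFlag = "\U0001F3F3\uFE0F\u200D\U0001F308"

// describe (hypothetical helper) names the code points used in this sketch.
func describe(r rune) string {
	switch r {
	case 0x1F3F3:
		return "WAVING WHITE FLAG"
	case 0xFE0F:
		return "VARIATION SELECTOR-16"
	case 0x200D:
		return "ZERO WIDTH JOINER"
	case 0x1F308:
		return "RAINBOW"
	}
	return "(not described in this sketch)"
}

func main() {
	for _, r := range rainbowFlag {
		fmt.Printf("U+%04X %s\n", r, describe(r))
	}
	fmt.Println("code points:", len([]rune(rainbowFlag)))
}
```

A width routine that works per code point has to decide what each of these four contributes, while a grapheme-cluster-based routine treats the whole sequence as one unit.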
As a final note, VS Code renders the flag correctly:

There is no official spec for these widths. Therefore, everyone rolls their own implementation. It's a mess.
Thanks for the detailed comment. Really appreciated. I'll see if I can use uniseg.StringWidth instead.
@jesseduffield I'm using this package but not this branch. After tons and tons of hours and effort getting into the details of Unicode and terminals, including writing my own version of a character width package (https://github.com/rivo/uniseg?tab=readme-ov-file#monospace-width), I realize that which package to use really depends on what you're trying to achieve. On the other hand, the Regarding this branch, I don't know what the plans are. This whole thing is a complicated topic and I totally understand if @mattn doesn't have the time to get into the details. Looks like there's not much happening in this project anymore anyway. Maybe I should add |
I got some errors in this branch.
@mattn I haven't spent time on this PR anymore as it seemed like you didn't want to follow up on it / merge it. I can fix these issues but only if you indicate that you will merge it. Otherwise, it would be a wasted effort. Please let me know. Also, your |
@rivo thanks for that detailed explanation, that's really helpful |
The rivo/uniseg package has received a major update which also includes methods for grapheme cluster parsing that are much faster than the previously used Graphemes class.
I've upgraded your package accordingly and updated the relevant code to use these faster methods. It would be great if you could merge these changes.
Thank you!
ps. I noticed that some automatic checks did not complete successfully because they are still running on Go 1.15. Would you like me to look into upgrading them to the current version (1.18)?