Switch parser to multi-byte processing #118
Performance seems objectively better. There might be other things that can be improved, but these are all the ideas I could come up with. This is obviously a breaking change, but I think that's fine considering how long VTE has been stable. I think intentional breakage is better than providing the old API in a way that would be slower; that way people are aware that they should switch to a buffer-based approach. It probably makes sense to resolve alacritty/vtebench#40 before merging this, but I don't expect it to impact this PR in any way.
Sync updates are broken: they don't actually start batching when you encounter them inside the batch unless you process the current buffer, which leads to state being partially applied. We somehow need to break out of `advance` and parse the rest into the sync buffer, which means that we basically have to exit and report how much we've parsed based on the state of the CSI. Transitioning to a state where we parse the rest as sync probably won't work as well, since we could have an interrupt in it, so aborting and telling the user to restart parsing sounds like the most sane thing to do in such a case. However, it also means that every routine that involves "please parse the rest inside the…
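A minimal sketch of the "abort and tell the caller how much was parsed" idea, assuming a hypothetical `advance_until_sync` that stops right after the synchronized update begin sequence (`CSI ? 2026 h`) and returns the number of bytes it consumed. The names and the plain substring search are illustrative stand-ins for real parser state, not VTE's API.

```rust
/// Synchronized update begin sequence (DEC private mode 2026 set).
const BSU: &[u8] = b"\x1b[?2026h";

/// Hypothetical parser step: applies bytes until it has just consumed a BSU
/// sequence, then stops and reports how many bytes it consumed.
fn advance_until_sync(bytes: &[u8], processed: &mut Vec<u8>) -> usize {
    let consumed = match bytes.windows(BSU.len()).position(|w| w == BSU) {
        Some(start) => start + BSU.len(),
        None => bytes.len(),
    };
    // Stand-in for "apply these bytes to the terminal state".
    processed.extend_from_slice(&bytes[..consumed]);
    consumed
}

/// Caller: everything after the BSU goes into the sync buffer instead of
/// being applied immediately, so state is never partially applied.
fn feed(bytes: &[u8], processed: &mut Vec<u8>, sync_buffer: &mut Vec<u8>) {
    let consumed = advance_until_sync(bytes, processed);
    sync_buffer.extend_from_slice(&bytes[consumed..]);
}
```

This sketch ignores a BSU that is split across two reads; a real parser would track that through its CSI state rather than a substring search.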
Copying my latest commit message here, since it provides some useful insights (imo):
This patch overhauls the `Parser::advance` API to operate on byte slices instead of individual bytes, which allows for additional performance optimizations.

VTE does not support C1 escapes, and C0 escapes always start with an escape character. This makes it possible to simplify processing if a byte stream is determined not to contain any escapes. The `memchr` crate provides a battle-tested implementation for SIMD-accelerated byte searches, which is why this implementation makes use of it.

VTE also only supports UTF-8 characters in the ground state, which means that the new non-escape parsing path is able to rely completely on std's `str::from_utf8`, since `memchr` gives us the full length of the plain text character buffer. This allows us to completely remove `utf8parse` and all related code.

We also make use of `memchr` in the synchronized escape handling in `ansi.rs`, since it relies heavily on scanning large amounts of text for the extension/termination escape sequences.
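A rough sketch of the ground-state fast path described above, using only std: `iter().position` stands in for `memchr::memchr(0x1b, bytes)`, and the `Perform` trait here is a stripped-down assumption rather than VTE's full trait. It finds the first ESC, decodes everything before it with `str::from_utf8`, replaces invalid sequences with U+FFFD, and leaves an incomplete trailing sequence unconsumed.

```rust
/// Stripped-down stand-in for VTE's `Perform` trait.
trait Perform {
    fn print(&mut self, c: char);
}

const ESC: u8 = 0x1b;

/// Hypothetical fast path: prints all plain text up to the next ESC and
/// returns the number of bytes consumed. Bytes from ESC onward would be
/// handed to the escape state machine by the caller.
fn advance_plain_text<P: Perform>(performer: &mut P, bytes: &[u8]) -> usize {
    // Without C1 escapes, every escape starts with ESC, so everything before
    // the first ESC is plain text. `memchr` would do this search with SIMD.
    let text_len = bytes.iter().position(|&b| b == ESC).unwrap_or(bytes.len());
    let mut text = &bytes[..text_len];

    loop {
        match std::str::from_utf8(text) {
            Ok(valid) => {
                valid.chars().for_each(|c| performer.print(c));
                return text_len;
            }
            Err(err) => {
                let (valid, rest) = text.split_at(err.valid_up_to());
                // The prefix up to `valid_up_to` is guaranteed to be valid UTF-8.
                std::str::from_utf8(valid).unwrap().chars().for_each(|c| performer.print(c));
                match err.error_len() {
                    // Invalid sequence: print U+FFFD and skip the bad bytes.
                    Some(len) => {
                        performer.print(char::REPLACEMENT_CHARACTER);
                        text = &rest[len..];
                    }
                    // Truncated sequence followed by ESC can never be completed.
                    None if text_len < bytes.len() => {
                        performer.print(char::REPLACEMENT_CHARACTER);
                        return text_len;
                    }
                    // Truncated at the end of the buffer: leave it for the next read.
                    None => return text_len - rest.len(),
                }
            }
        }
    }
}
```

A streaming caller would keep the unconsumed tail and prepend it to the next buffer, which is why the function reports how many bytes it consumed instead of always consuming everything.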
This patch is a rework of the partial processing patch in an attempt to provide an identical, clean API while still allowing arbitrary termination of the parser for partial synchronized update processing. Instead of returning values from the dispatch functions, a separate `Perform::terminated` function is added, which is queried whenever the new `advance_until_terminated` function is called. The normal `advance` function stays unchanged. While `advance` could be implemented using `advance_until_terminated`, this seems like it would just add unnecessary performance overhead. Since the function is pretty small, its contents are simply duplicated instead.
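The API shape described above can be sketched with a toy parser that dispatches every byte as a character. The trait, the `dispatch` helper, and the "stop after `|`" performer are hypothetical simplifications; in practice a performer would report termination after seeing a synchronized update sequence. Note how `advance` and `advance_until_terminated` duplicate the loop, so only the latter pays for the `terminated` check.

```rust
/// Stripped-down stand-in for VTE's `Perform` trait.
trait Perform {
    /// Toy dispatch: called once per byte (VTE decodes UTF-8 and escapes).
    fn print(&mut self, c: char);

    /// Queried only by `advance_until_terminated`; defaults to never stopping.
    fn terminated(&self) -> bool {
        false
    }
}

#[derive(Default)]
struct Parser;

impl Parser {
    /// Processes the whole buffer and never checks `terminated`.
    fn advance<P: Perform>(&mut self, performer: &mut P, bytes: &[u8]) {
        for &byte in bytes {
            self.dispatch(performer, byte);
        }
    }

    /// Processes bytes until `performer.terminated()` returns true and reports
    /// how many bytes were consumed, so the caller can handle the rest itself.
    fn advance_until_terminated<P: Perform>(&mut self, performer: &mut P, bytes: &[u8]) -> usize {
        for (i, &byte) in bytes.iter().enumerate() {
            self.dispatch(performer, byte);
            if performer.terminated() {
                return i + 1;
            }
        }
        bytes.len()
    }

    /// Hypothetical stand-in for the real state machine transition.
    fn dispatch<P: Perform>(&mut self, performer: &mut P, byte: u8) {
        performer.print(byte as char);
    }
}
```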