Switch parser to multi-byte processing #118

chrisduerr · 2024-12-20T02:39:13Z

This patch overhauls the Parser::advance API to operate on byte slices instead of individual bytes, which allows for additional performance optimizations.

VTE does not support C1 escapes and C0 escapes always start with an escape character. This makes it possible to simplify processing if a byte stream is determined to not contain any escapes. The memchr crate provides a battle-tested implementation for SIMD-accelerated byte searches, which is why this implementation makes use of it.
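As a rough illustration of the idea (not the code from this patch), the split between plain text and escape handling could look like the sketch below. `split_at_escape` is a hypothetical helper, and `Iterator::position` from std stands in for the `memchr::memchr(0x1B, bytes)` call so the snippet has no dependencies.

```rust
/// ESC (0x1B), the byte that introduces every escape sequence considered here.
const ESC: u8 = 0x1B;

/// Split `bytes` into the run before the first ESC and the remainder, which
/// starts at the ESC byte. Hypothetical helper, not part of VTE.
fn split_at_escape(bytes: &[u8]) -> (&[u8], &[u8]) {
    // With the memchr crate this search would be `memchr::memchr(ESC, bytes)`.
    let idx = bytes
        .iter()
        .position(|&byte| byte == ESC)
        .unwrap_or(bytes.len());
    bytes.split_at(idx)
}
```

Everything in the first half can then go through the fast path, and only the second half needs the state machine.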

VTE also only supports UTF8 characters in the ground state, which means that the new non-escape parsing path is able to rely completely on STD's str::from_utf8 since memchr gives us the full length of the plain text character buffer. This allows us to completely remove utf8parse and all related code.
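A minimal sketch of what relying on `str::from_utf8` for an escape-free run can look like, assuming the caller wants to tell apart truncated and invalid input. The `Decoded` enum and `decode_plain` function are made up for illustration and are not VTE's internals; only `str::from_utf8`, `Utf8Error::valid_up_to` and `Utf8Error::error_len` are real std APIs.

```rust
use std::str;

/// Outcome of decoding one run of plain (escape-free) bytes.
#[derive(Debug, PartialEq)]
enum Decoded<'a> {
    /// The whole run was valid UTF-8.
    Complete(&'a str),
    /// A valid prefix followed by a truncated character; the caller should
    /// keep `pending` and retry once more bytes arrive.
    Partial { valid: &'a str, pending: &'a [u8] },
    /// A valid prefix followed by `skip` bytes that can never be valid.
    Invalid { valid: &'a str, skip: usize },
}

fn decode_plain(bytes: &[u8]) -> Decoded<'_> {
    match str::from_utf8(bytes) {
        Ok(text) => Decoded::Complete(text),
        Err(err) => {
            let (valid, rest) = bytes.split_at(err.valid_up_to());
            // `valid_up_to` guarantees that this prefix is valid UTF-8.
            let valid = str::from_utf8(valid).expect("prefix is valid UTF-8");
            match err.error_len() {
                // `None` means the input ended in the middle of a character.
                None => Decoded::Partial { valid, pending: rest },
                Some(skip) => Decoded::Invalid { valid, skip },
            }
        }
    }
}
```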

We also make use of memchr in the synchronized escape handling in ansi.rs, since it relies heavily on scanning large amounts of text for the extension/termination escape sequences.
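For illustration only, scanning a buffered synchronized update for its terminating sequence might be sketched as follows. It assumes the end marker is `CSI ? 2026 l`; `find_sync_end` is a hypothetical name, and the std `windows` search stands in for a memchr-based one.

```rust
/// CSI sequence that ends a synchronized update (`CSI ? 2026 l`).
const SYNC_END: &[u8] = b"\x1b[?2026l";

/// Return the index just past the first `SYNC_END` in `buffer`, if any.
/// Hypothetical helper; a memchr-based search would replace `windows`.
fn find_sync_end(buffer: &[u8]) -> Option<usize> {
    buffer
        .windows(SYNC_END.len())
        .position(|window| window == SYNC_END)
        .map(|start| start + SYNC_END.len())
}
```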

chrisduerr · 2024-12-20T02:41:00Z

Performance seems objectively better. There might be other things that can be improved but these are all the ideas I could come up with.

Obviously a breaking change, but I think that's fine considering how long VTE has been stable for. I think intentional breakage is better than providing the old API in a way that would be slower, that way people are aware that they should switch to a buffer-based approach.

Probably makes sense to resolve alacritty/vtebench#40 before merging this, but I don't expect it to impact this PR in any way.

kchibisov · 2024-12-25T03:08:24Z

cat /dev/urandom

thread 'PTY reader' panicked at /home/kchibisov/.cargo/git/checkouts/vte-abf4426cf053d48c/fbe3273/src/lib.rs:384:29:
index out of bounds: the len is 4 but the index is 4
stack backtrace:
   0:     0x5612113d32aa - std::backtrace_rs::backtrace::libunwind::trace::h5a5b8284f2d0c266
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x5612113d32aa - std::backtrace_rs::backtrace::trace_unsynchronized::h76d4f1c9b0b875e3
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5612113d32aa - std::sys::backtrace::_print_fmt::hc4546b8364a537c6
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:66:9
   3:     0x5612113d32aa - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h5b6bd5631a6d1f6b
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:39:26
   4:     0x561211286753 - core::fmt::rt::Argument::fmt::h270f6602a2b96f62
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/fmt/rt.rs:177:76
   5:     0x561211286753 - core::fmt::write::h7550c97b06c86515
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/fmt/mod.rs:1186:21
   6:     0x5612113ceb13 - std::io::Write::write_fmt::h7b09c64fe0be9c84
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/io/mod.rs:1839:15
   7:     0x5612113d3102 - std::sys::backtrace::BacktraceLock::print::h2395ccd2c84ba3aa
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:42:9
   8:     0x5612113d4616 - std::panicking::default_hook::{{closure}}::he19d4c7230e07961
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:268:22
   9:     0x5612113d4460 - std::panicking::default_hook::hf614597d3c67bbdb
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:295:9
  10:     0x5612113d4bd7 - std::panicking::rust_panic_with_hook::h8942133a8b252070
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:801:13
  11:     0x5612113d4a7a - std::panicking::begin_panic_handler::{{closure}}::hb5f5963570096b29
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:674:13
  12:     0x5612113d3759 - std::sys::backtrace::__rust_end_short_backtrace::h6208cedc1922feda
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:170:18
  13:     0x5612113d471c - rust_begin_unwind
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
  14:     0x561211283d30 - core::panicking::panic_fmt::h0c3082644d1bf418
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
  15:     0x561211283f12 - core::panicking::panic_bounds_check::h8307ccead484a122
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:276:5
  16:     0x5612111209ec - vte::Parser<_>::advance::h64319247f68d1473
  17:     0x5612111209ec - vte::ansi::Processor<T>::advance::he1250df037376f89
                               at /home/kchibisov/.cargo/git/checkouts/vte-abf4426cf053d48c/fbe3273/src/ansi.rs:306:13
  18:     0x5612111209ec - alacritty_terminal::event_loop::EventLoop<T,U>::pty_read::hadf5178adaadf87d
                               at /home/kchibisov/src/rust/alacritty-workspace/fork/alacritty_terminal/src/event_loop.rs:154:13
  19:     0x561211123a75 - alacritty_terminal::event_loop::EventLoop<T,U>::spawn::{{closure}}::h0aa9c6f0142eeda3
                               at /home/kchibisov/src/rust/alacritty-workspace/fork/alacritty_terminal/src/event_loop.rs:283:51
  20:     0x561211123a75 - std::sys::backtrace::__rust_begin_short_backtrace::hed037cca6b0c47fe
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18
  21:     0x561210fcb98e - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hbdcb6ef5bb62af20
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/std/src/thread/mod.rs:538:17
  22:     0x561210fcb98e - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h535959ff600df3c4
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  23:     0x561210fcb98e - std::panicking::try::do_call::hf94b16b1c3374dfd
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  24:     0x561210fcb98e - std::panicking::try::h16dbbc65d2504a71
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  25:     0x561210fcb98e - std::panic::catch_unwind::he6dac2e511ab91d5
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  26:     0x561210fcb98e - std::thread::Builder::spawn_unchecked_::{{closure}}::hd5579a0b2eb51d33
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/std/src/thread/mod.rs:537:30
  27:     0x561210fcb98e - core::ops::function::FnOnce::call_once{{vtable.shim}}::h024e23de750216c2
                               at /opt/rust-bin-1.83.0/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  28:     0x5612113d8d3b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hf75717d9f28faebf
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/boxed.rs:2454:9
  29:     0x5612113d8d3b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7bd883a5f3c5f3c1
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/boxed.rs:2454:9
  30:     0x5612113d8d3b - std::sys::pal::unix::thread::Thread::new::thread_start::hcc78f3943333fa94
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/pal/unix/thread.rs:105:17
  31:     0x7fbd196576e2 - <unknown>
  32:     0x7fbd196d381c - <unknown>
  33:                0x0 - <unknown>
^C

kchibisov · 2024-12-30T04:52:55Z

Sync updates are broken: when the sync start arrives in the middle of a buffer, batching doesn't actually begin until the current buffer has been processed, so state ends up partially applied. We somehow need to break out of advance and make the rest parse into the sync buffer, which means that we basically have to exit and report how much we've parsed based on the state of the CSI. Something like advance() -> usize, where the user has to retry until everything is parsed. It also means that csi_dispatch needs to indicate that parsing should be aborted, and so on. Maybe some other indication will work, but one should be in place.

Transitioning to a state where we parse the rest as sync probably won't work either, since an interrupt could occur inside it, so aborting and telling the user to restart the parse sounds like the sanest thing to do in that case. However, it also means that every routine in ansi.rs that involves "please parse the rest" would need the same process of aborting everything possible and returning to the user.

chrisduerr · 2025-01-03T09:46:16Z

Copying my latest commit message here, since it provides some useful insights (imo):

Add `Perform::terminated` function

This patch is a rework of the partial processing patch in an attempt to
provide an identical clean API while still allowing for arbitrary
termination of the parser for partial synchronized update processing.

Instead of returning values using the dispatch functions, a separate
`Perform::terminated` function is added which is queried whenever the
new `advance_until_terminated` function is called. The normal `advance`
function stays unchanged.

While the `advance` function could be implemented using the
`advance_until_terminated` function, this seems like it would just add
an unnecessary performance overhead. So since the function is pretty
small its contents are just duplicated instead.
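
A toy sketch of the pattern described in that commit message, under the assumption that the terminating variant reports how many bytes it consumed so the caller can resume from there. The trait below is a simplified stand-in with a single hypothetical `handle` callback, not VTE's real `Perform` definition, and `StopAtBang` exists only for the example.

```rust
/// Toy stand-in for a performer; not VTE's actual `Perform` trait.
trait Perform {
    /// Called for every input byte in this sketch.
    fn handle(&mut self, byte: u8);

    /// Whether the parser should stop after the current dispatch.
    fn terminated(&self) -> bool {
        false
    }
}

/// Process every byte unconditionally.
fn advance<P: Perform>(performer: &mut P, bytes: &[u8]) {
    for &byte in bytes {
        performer.handle(byte);
    }
}

/// Process bytes until the performer asks to stop; returns bytes consumed.
fn advance_until_terminated<P: Perform>(performer: &mut P, bytes: &[u8]) -> usize {
    for (index, &byte) in bytes.iter().enumerate() {
        performer.handle(byte);
        if performer.terminated() {
            return index + 1;
        }
    }
    bytes.len()
}

/// Example performer that requests termination right after seeing `!`.
#[derive(Default)]
struct StopAtBang {
    seen: Vec<u8>,
    stop: bool,
}

impl Perform for StopAtBang {
    fn handle(&mut self, byte: u8) {
        self.seen.push(byte);
        self.stop = byte == b'!';
    }

    fn terminated(&self) -> bool {
        self.stop
    }
}
```

A caller can feed `&bytes[consumed..]` to a different code path (for example a sync buffer) or call the function again until the whole input is consumed.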

chrisduerr requested a review from kchibisov December 20, 2024 02:39

chrisduerr force-pushed the need_for_speed branch 2 times, most recently from 9503eaf to 6c3695b Compare December 20, 2024 02:46

nixpulvis reviewed Dec 20, 2024

nixpulvis reviewed Dec 20, 2024

chrisduerr mentioned this pull request Dec 20, 2024

Add support for custom parsing of APC, SOS and PM sequences. #115

Open

chrisduerr force-pushed the need_for_speed branch from 800c230 to 5611b1d Compare December 28, 2024 21:21

chrisduerr commented Jan 2, 2025

chrisduerr force-pushed the need_for_speed branch 5 times, most recently from 32ea120 to 27bc361 Compare January 3, 2025 07:00

chrisduerr force-pushed the need_for_speed branch from 67fd07f to 006a44c Compare January 3, 2025 10:02

kchibisov requested changes Jan 8, 2025

chrisduerr added 5 commits January 9, 2025 03:01

Fix partial UTF8 interrupted by ESC

130eeeb

Add sync buffer overflow test

ed7c714

Add mixed sync buffer test

0cc141d

chrisduerr force-pushed the need_for_speed branch from dfc77db to 1ad25bc Compare January 9, 2025 02:01

chrisduerr requested a review from kchibisov January 9, 2025 02:08

Fix review suggestions

d039177

chrisduerr force-pushed the need_for_speed branch 2 times, most recently from 2d79b0f to d039177 Compare January 9, 2025 06:03

kchibisov approved these changes Jan 9, 2025

chrisduerr merged commit 7321a44 into alacritty:master Jan 9, 2025
1 check passed

chrisduerr deleted the need_for_speed branch January 9, 2025 06:27
