scanner: compute token line/column lazily on errors #624

bluetech · 2025-01-28T22:49:02Z

The scanner functions are hot, and the line/column location tracking is quite expensive. We only use it for errors, which don't need to be fast, because we bail if there are too many; and for warnings, which are usually not shown by default. So only keep the token start pos, and compute the line/column lazily from that. This will also allow some further improvements ahead.

bench/rulescomp:

before: compiled 1000 keymaps in 1.669028s
after:  compiled 1000 keymaps in 1.550411s

bench/compose:

before: compiled 1000 compose tables in 2.145217s
after:  compiled 1000 compose tables in 2.016044s
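To illustrate the idea of deriving the line/column from a stored token start position, here is a minimal sketch. It is not the code from this PR: `struct location` and `location_from_pos` are hypothetical names, and it assumes the scanner keeps the whole input in one buffer, with columns counted in bytes.

```c
#include <stddef.h>

/* Hypothetical type for illustration; not the actual scanner API. */
struct location {
    size_t line;   /* 1-based line number */
    size_t column; /* 1-based column, counted in bytes (assumption) */
};

/*
 * Compute the line/column of byte offset `pos` by walking the buffer
 * from its start and counting newlines. This is O(pos), which is
 * acceptable when it only runs while reporting an error or a warning.
 */
static struct location
location_from_pos(const char *buf, size_t len, size_t pos)
{
    struct location loc = { 1, 1 };

    /* Clamp so an out-of-range offset maps to the end of the input. */
    if (pos > len)
        pos = len;

    for (size_t i = 0; i < pos; i++) {
        if (buf[i] == '\n') {
            loc.line++;
            loc.column = 1;
        } else {
            loc.column++;
        }
    }
    return loc;
}
```

With this shape, the hot scanning path only needs to record one offset per token, and the newline counting is paid for only when a message is actually emitted.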

wismill · 2025-01-29T04:04:54Z

Really nice! With faster position computation, we could also use it more frequently to get better logs.

This reminds me of this article about Megaparsec, a quite fast Haskell parser with really good error messages. One big improvement was computing the exact position (line, column) only on demand. But it also caches it, so the computation does not start from scratch each time. I do not think this is the case here, but it could maybe improve the perf further?

bluetech · 2025-01-29T08:55:09Z

Nice article, talks exactly about this. I like the caching idea, it should be easy and work well for us, will try it a bit later.

bluetech · 2025-01-29T17:53:46Z

I added the caching.
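For readers following along, the caching idea discussed above could look roughly like the sketch below. It is an assumption about the general technique, not the code from this PR: `struct loc_cache`, `loc_cache_init` and `location_from_pos_cached` are hypothetical names. The cache remembers the last resolved offset together with its location, so a later lookup at the same or a higher offset resumes from there instead of rescanning from the start of the buffer.

```c
#include <stddef.h>

/* Hypothetical types for illustration; not the actual scanner API. */
struct location {
    size_t line;   /* 1-based line number */
    size_t column; /* 1-based column, counted in bytes (assumption) */
};

struct loc_cache {
    size_t pos;          /* offset that the cached location refers to */
    struct location loc; /* line/column of `pos` */
};

static void
loc_cache_init(struct loc_cache *cache)
{
    cache->pos = 0;
    cache->loc = (struct location){ 1, 1 };
}

/*
 * Resolve `pos` to a line/column, resuming from the cached point when
 * the target is at or after it. A lookup before the cached point falls
 * back to scanning from the start of the buffer. The cache must only be
 * used with the buffer it was built from.
 */
static struct location
location_from_pos_cached(struct loc_cache *cache,
                         const char *buf, size_t len, size_t pos)
{
    size_t start = 0;
    struct location loc = { 1, 1 };

    if (pos > len)
        pos = len;

    if (pos >= cache->pos) {
        start = cache->pos;
        loc = cache->loc;
    }

    for (size_t i = start; i < pos; i++) {
        if (buf[i] == '\n') {
            loc.line++;
            loc.column = 1;
        } else {
            loc.column++;
        }
    }

    /* Remember this result for the next lookup. */
    cache->pos = pos;
    cache->loc = loc;
    return loc;
}
```

Since messages are normally emitted in increasing token order, most lookups in such a scheme would only scan the bytes between two consecutive reported tokens.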

wismill · 2025-01-29T20:17:29Z

Really nice! Using bench-compile-keymap --layout de --variant neo --stdev 0.3, I observed a 1.075x speedup without the cache and a 1.097x speedup with the cache. That’s a lot just for token position computation!

I would love to see some automated tests based on the stderr output. That would be test/log.c with a dummy keymap. It can be a follow-up.

wismill · 2025-01-29T20:18:45Z

Changing the milestone, I am sure this speedup is very much welcome!

wismill · 2025-01-30T04:44:39Z

I am going to write some tests in an independent MR.

wismill · 2025-01-30T07:21:13Z

(did not mean to close the PR, sorry for the noise)

See #630 for the tests.

Signed-off-by: Ran Benita <ran@unusedvar.com>

bluetech added this to the 1.9.0 milestone Jan 28, 2025

wismill added compose compile-keymap performance labels Jan 29, 2025

bluetech force-pushed the scanner-lazy-loc branch from 060f082 to 6d33392 Compare January 29, 2025 17:50

wismill modified the milestones: 1.9.0, 1.8.0 Jan 29, 2025

wismill closed this Jan 30, 2025

wismill reopened this Jan 30, 2025

bluetech added 2 commits January 30, 2025 10:44

scanner: speed up token position -> location using a cache

9a00d32

Signed-off-by: Ran Benita <ran@unusedvar.com>

bluetech force-pushed the scanner-lazy-loc branch from 6d33392 to 9a00d32 Compare January 30, 2025 08:45

bluetech merged commit 6e97f57 into xkbcommon:master Jan 30, 2025
5 checks passed

bluetech deleted the scanner-lazy-loc branch February 5, 2025 12:53
