Clean out the saw-core lexer #2195

sauclovian-g · 2025-01-23T00:36:50Z

Unwind confusing code, tidy up, improve modularity, add Unicode support to match Cryptol and saw-script. Also fixes the interaction of block and line comments, and closes #1225.

(fortunately they were unused) No functional change.

(yes, it lives in the parser's source file, that's one of the reasons to untangle it) No functional change intended.

The parser's monad has an ExceptT for doing that, but the lexer shouldn't be using it directly. It should either have its own, or just return a list of messages. For the time being we'll do the latter. No functional change intended.

Put it instead in the Either monad (with () on the unused error side) so as to not have to change all the do-notation. (These changes are small, but this code is delicate, not well tested, and confusingly written in continuation-passing style, so I want it to all be readily bisectable later.) No functional change intended.

…type. No functional change intended.

It no longer needs to live in the parser. No functional change intended.

No functional change.

No functional change intended.

…xer. No functional change intended.

There's now a call to "go Nothing" before each call to "read", because the calls to "read" were the continuations in the calls to "go". Now we can move the call to "go" into "read", and suddenly the logic looks a lot more comprehensible. No functional change intended.

…ken. No functional change intended.

…nToken No functional change intended.

Pass the current error list in when recursing instead of accumulating it on the way out. Makes things tail-recursive as well as tidier. No functional change intended.

No functional change intended.

…ents. No functional change intended.

No functional change intended.

No functional change.

At least some saw-core errors are coming out via default Show instances that apparently print the directory and filename separately, which defeats the existing logic for keeping the current directory name out of the test output. Extend the seddery to cope.

…nts. Add some tests. Fixes #1225. Note that the error messages are complete rubbish so far (they seem to arise from default Show instances), but that's a problem for later. We also could have done this earlier but now the change is obviously correct. (except that it prints the end-of-file position with the error message; we would like to print where the open comment is, but that requires rearranging the state machine, and that's coming next)

As noted in #1225, it should not be possible for a line-comment to comment out the end of a block comment. Instead of matching --.*, materialize tokens for -- and \n and run both those and the {- -} tokens through a single state machine to drop comments. (This is essentially the same treatment as given the saw-script lexer a few commits back.) Update the tests: the test1225 test for this specific point now works properly. Also the more complex state machine lets us track the start location of the current block comment so we can print that when it's unclosed, as mentioned would be desirable in the last commit. This updates the corelexer002 test output.

The normal way to do this is %lexer { lexer >>= }, not %lexer { lexer }. This allows "lexer" to have a reasonable signature. (Although it still lives in the parser monad, so it's still not just a direct call into the lexer itself.) Hat tip to Eric Mertens who pointed this out. No functional change intended.

This is basically the same treatment the saw-script lexer got a few commits back. Add comparable tests. Stick them under the umbrella of #2042 even though this technically isn't part of that issue.

When we get AlexError, collect the actual failing characters, not the classification bytes. Also add some comments; doing that was what prompted noticing the oversight. (I'm not sure AlexError is actually reachable in this lexer, but that's a different issue...)

sauclovian-g · 2025-01-23T00:37:15Z

This is one it probably makes more sense to review one commit at a time, and I'm afraid there's a lot of them...

RyanGlScott · 2025-01-23T12:23:55Z

saw-core/src/Verifier/SAW/Grammar.y

+  -- XXX: this can only actually throw one error. Fix this up when we
+  -- clean out the error printing infrastructure.
+  let issue (pos, msg) = case msg of


Perhaps this comment should be moved to be above the mapM issue errors line below rather than here. (It took me a while to understand why it was talking about "only actually throw[ing] one error" until I read the entirety of the function.)

Sure. Hopefully this logic won't last very much longer anyway...

sauclovian-g added 25 commits January 22, 2025 19:07

Don't re-export random things from the saw-core parser.

2cdce5d

(fortunately they were unused) No functional change.

Begin unwinding the tangle in the main loop of the saw-core lexer.

d9af5ea

(yes, it lives in the parser's source file, that's one of the reasons to untangle it) No functional change intended.

In the main loop of the saw-core lexer, don't use the parser's error …

6a1c3fa

…type. No functional change intended.

Move the main loop of the saw-core lexer to the lexer source.

9f01a70

It no longer needs to live in the parser. No functional change intended.

Tidy up the saw-core lexer's exports.

e69e08f

No functional change.

saw-core lexer: drop dead code

b2cfda5

saw-core lexer: move code around.

9a11b34

No functional change intended.

Unwind the innermost layer of continuation passing in the saw-core le…

a461aed

…xer. No functional change intended.

saw-core lexer: move "go Nothing" to its own function; call it scanTo…

db93f3d

…ken. No functional change intended.

saw-core lexer: remove unneeded do-notation and Either monad from sca…

bf1bcd9

…nToken No functional change intended.

saw-core lexer: tidy scanToken further.

d5054c6

Pass the current error list in when recursing instead of accumulating it on the way out. Makes things tail-recursive as well as tidier. No functional change intended.

saw-core lexer: apply same treatment to "read".

ec1d3da

No functional change intended.

saw-core lexer: move "read" to its own function, call it scanSkipComm…

233d082

…ents. No functional change intended.

saw-core lexer: remove remaining unneeded do-notation and Either ()

cd7eb46

No functional change intended.

saw-core lexer: a few final tidyups related to the previous few changes.

0df4366

No functional change intended.

saw-core lexer: adjust formatting.

b617f06

No functional change.

Add Unicode support to the saw-core lexer.

da23d3b

This is basically the same treatment the saw-script lexer got a few commits back. Add comparable tests. Stick them under the umbrella of #2042 even though this technically isn't part of that issue.

sauclovian-g requested review from RyanGlScott, mtullsen and mccleeary-galois January 23, 2025 00:36

RyanGlScott approved these changes Jan 23, 2025

View reviewed changes

saw-core lexer: Adjust confusing comment by request

c42eb2d

sauclovian-g merged commit ec8844e into master Jan 24, 2025
32 of 34 checks passed

sauclovian-g deleted the 1225-sawcore-lexer branch January 24, 2025 03:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean out the saw-core lexer #2195

Clean out the saw-core lexer #2195

sauclovian-g commented Jan 23, 2025

sauclovian-g commented Jan 23, 2025

RyanGlScott Jan 23, 2025

sauclovian-g Jan 24, 2025

Clean out the saw-core lexer #2195

Clean out the saw-core lexer #2195

Conversation

sauclovian-g commented Jan 23, 2025

sauclovian-g commented Jan 23, 2025

RyanGlScott Jan 23, 2025

Choose a reason for hiding this comment

sauclovian-g Jan 24, 2025

Choose a reason for hiding this comment