Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean out the saw-core lexer #2195

Merged
merged 26 commits into from
Jan 24, 2025
Merged

Clean out the saw-core lexer #2195

merged 26 commits into from
Jan 24, 2025

Conversation

sauclovian-g
Copy link
Contributor

Unwind confusing code, tidy up, improve modularity, add Unicode support to match Cryptol and saw-script. Also fixes the interaction of block and line comments, and closes #1225.

(fortunately they were unused)

No functional change.
(yes, it lives in the parser's source file, that's one of the
reasons to untangle it)

No functional change intended.
The parser's monad has an ExceptT for doing that, but the lexer
shouldn't be using it directly. It should either have its own,
or just return a list of messages. For the time being we'll do
the latter.

No functional change intended.
Put it instead in the Either monad (with () on the unused error side)
so as to not have to change all the do-notation.

(These changes are small, but this code is delicate, not well tested,
and confusingly written in continuation-passing style, so I want it to
all be readily bisectable later.)

No functional change intended.
It no longer needs to live in the parser.

No functional change intended.
No functional change intended.
There's now a call to "go Nothing" before each call to "read", because
the calls to "read" were the continuations in the calls to "go". Now
we can move the call to "go" into "read", and suddenly the logic looks
a lot more comprehensible.

No functional change intended.
Pass the current error list in when recursing instead of accumulating
it on the way out. Makes things tail-recursive as well as tidier.

No functional change intended.
No functional change intended.
No functional change.
At least some saw-core errors are coming out via default Show
instances that apparently print the directory and filename separately,
which defeats the existing logic for keeping the current directory
name out of the test output. Extend the seddery to cope.
…nts.

Add some tests.
Fixes #1225.

Note that the error messages are complete rubbish so far (they seem to
arise from default Show instances), but that's a problem for later.

We also could have done this earlier but now the change is obviously
correct.

(except that it prints the end-of-file position with the error message;
we would like to print where the open comment is, but that requires
rearranging the state machine, and that's coming next)
As noted in #1225, it should not be possible for a line-comment to
comment out the end of a block comment.

Instead of matching --.*, materialize tokens for -- and \n and run
both those and the {- -} tokens through a single state machine to drop
comments.

(This is essentially the same treatment as given the saw-script lexer
a few commits back.)

Update the tests: the test1225 test for this specific point now works
properly. Also the more complex state machine lets us track the start
location of the current block comment so we can print that when it's
unclosed, as mentioned would be desirable in the last commit. This
updates the corelexer002 test output.
The normal way to do this is %lexer { lexer >>= }, not %lexer { lexer }.
This allows "lexer" to have a reasonable signature.

(Although it still lives in the parser monad, so it's still not just a
direct call into the lexer itself.)

Hat tip to Eric Mertens who pointed this out.

No functional change intended.
This is basically the same treatment the saw-script lexer got a few
commits back.

Add comparable tests. Stick them under the umbrella of #2042 even
though this technically isn't part of that issue.
When we get AlexError, collect the actual failing characters, not the
classification bytes.

Also add some comments; doing that was what prompted noticing the
oversight.

(I'm not sure AlexError is actually reachable in this lexer, but
that's a different issue...)
@sauclovian-g
Copy link
Contributor Author

This is one it probably makes more sense to review one commit at a time, and I'm afraid there's a lot of them...

Comment on lines 271 to 273
-- XXX: this can only actually throw one error. Fix this up when we
-- clean out the error printing infrastructure.
let issue (pos, msg) = case msg of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this comment should be moved to be above the mapM issue errors line below rather than here. (It took me a while to understand why it was talking about "only actually throw[ing] one error" until I read the entirety of the function.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Hopefully this logic won't last very much longer anyway...

@sauclovian-g sauclovian-g merged commit ec8844e into master Jan 24, 2025
32 of 34 checks passed
@sauclovian-g sauclovian-g deleted the 1225-sawcore-lexer branch January 24, 2025 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SAWCore parser hangs on .sawcore files with unterminated block comments
2 participants