Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean out the saw-core lexer #2195

Merged
merged 26 commits into from
Jan 24, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
2cdce5d
Don't re-export random things from the saw-core parser.
sauclovian-g Jan 17, 2025
d9af5ea
Begin unwinding the tangle in the main loop of the saw-core lexer.
sauclovian-g Jan 18, 2025
ff754c0
Move error-throwing out of the main loop of the saw-core lexer.
sauclovian-g Jan 18, 2025
70d6319
Now move the main loop of the saw-core lexer out of the parser monad.
sauclovian-g Jan 18, 2025
6a1c3fa
In the main loop of the saw-core lexer, don't use the parser's error …
sauclovian-g Jan 18, 2025
9f01a70
Move the main loop of the saw-core lexer to the lexer source.
sauclovian-g Jan 18, 2025
e69e08f
Tidy up the saw-core lexer's exports.
sauclovian-g Jan 18, 2025
b2cfda5
saw-core lexer: drop dead code
sauclovian-g Jan 18, 2025
9a11b34
saw-core lexer: move code around.
sauclovian-g Jan 18, 2025
a461aed
Unwind the innermost layer of continuation passing in the saw-core le…
sauclovian-g Jan 18, 2025
99c5591
Now simplify the saw-core lexer further.
sauclovian-g Jan 18, 2025
db93f3d
saw-core lexer: move "go Nothing" to its own function; call it scanTo…
sauclovian-g Jan 18, 2025
bf1bcd9
saw-core lexer: remove unneeded do-notation and Either monad from sca…
sauclovian-g Jan 18, 2025
d5054c6
saw-core lexer: tidy scanToken further.
sauclovian-g Jan 18, 2025
ec1d3da
saw-core lexer: apply same treatment to "read".
sauclovian-g Jan 18, 2025
233d082
saw-core lexer: move "read" to its own function, call it scanSkipComm…
sauclovian-g Jan 18, 2025
cd7eb46
saw-core lexer: remove remaining unneeded do-notation and Either ()
sauclovian-g Jan 18, 2025
0df4366
saw-core lexer: a few final tidyups related to the previous few changes.
sauclovian-g Jan 18, 2025
b617f06
saw-core lexer: adjust formatting.
sauclovian-g Jan 18, 2025
fd6e8f8
Extend test-and-diff to handle saw-core errors.
sauclovian-g Jan 20, 2025
bacafa8
saw-core lexer: make comment state machine not hang on unclosed comme…
sauclovian-g Jan 18, 2025
bf27c91
saw-core lexer: fix the interaction of block and line comments.
sauclovian-g Jan 20, 2025
9eaa679
saw-core parser: simplify the lexer call
sauclovian-g Jan 20, 2025
da23d3b
Add Unicode support to the saw-core lexer.
sauclovian-g Jan 22, 2025
20a230e
saw-core lexer: fix oversight in previous
sauclovian-g Jan 23, 2025
c42eb2d
saw-core lexer: Adjust confusing comment by request
sauclovian-g Jan 24, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@
## Bug fixes

* SAW now accepts Unicode input beyond characters 0..255, including in
embedded Cryptol fragments, and allows the same Unicode code points
in identifiers as Cryptol.
embedded Cryptol fragments and saw-core modules, and allows the same
Unicode code points in identifiers as Cryptol.
It is thus now possible to refer to Cryptol objects whose names
include extended characters.
(#2042)
Expand Down
8 changes: 8 additions & 0 deletions intTests/support/test-and-diff.sh
Original file line number Diff line number Diff line change
Expand Up @@ -106,11 +106,19 @@ run-tests() {
# We also need to shorten pathnames that contain the current
# directory, because saw feels the need to expand pathnames
# (see also saw-script #2082).
#
# In addition to that we also need to remove the current
# directory when it appears in double quotes, at least for
# now, because many error prints at the saw-core level seem to
# use default Show instances and the saw-core positions carry
# the directory and filename separately. This becomes
# "interesting" from a quoting perspective...
sed < $TEST.rawlog > $TEST.log '
/^\[[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9]\] /{
s/^..............//
}
s,'"$CURDIR"'/,,g
s,"'"$CURDIR"'",".",g
'

# Check the output against the expected version.
Expand Down
5 changes: 5 additions & 0 deletions intTests/test1225/test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
set -e

! $SAW test1.saw
$SAW test2.saw
$SAW test3.saw
2 changes: 2 additions & 0 deletions intTests/test1225/test1.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
let x = parse_core_mod "Prelude" "True {- ";
print_term x;
2 changes: 2 additions & 0 deletions intTests/test1225/test2.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
let x = parse_core_mod "Prelude" " {- -} True";
print_term x;
2 changes: 2 additions & 0 deletions intTests/test1225/test3.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
let x = parse_core_mod "Prelude" " {--- ---} True";
print_term x;
4 changes: 4 additions & 0 deletions intTests/test2042/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,7 @@ $SAW test1.saw
! $SAW test2.saw
! $SAW test3.saw

$SAW test4.saw
! $SAW test5.saw
! $SAW test6.saw

2 changes: 2 additions & 0 deletions intTests/test2042/test4.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
enable_experimental;
load_sawcore_from_file "test4.sawcore";
5 changes: 5 additions & 0 deletions intTests/test2042/test4.sawcore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
-- We should now be able to have non-ascii identifiers in saw-core
module test4 where
import Prelude;
lá : Bool;
lá = True;
2 changes: 2 additions & 0 deletions intTests/test2042/test5.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
enable_experimental;
load_sawcore_from_file "test5.sawcore";
5 changes: 5 additions & 0 deletions intTests/test2042/test5.sawcore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
-- However, we shouldn't be able to use just anything as an identifier.
module test5 where
import Prelude;
😸 : Bool;
😸 = True;
2 changes: 2 additions & 0 deletions intTests/test2042/test6.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
enable_experimental;
load_sawcore_from_file "test6.sawcore";
7 changes: 7 additions & 0 deletions intTests/test2042/test6.sawcore
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
-- Nor should we be able to use just anything _in_ an identifier.
module test5 where
import Prelude;
thingy_hook : Bool;
thingy_hook = True;
thingy_🪝 : Bool;
thingy_🪝 = False;
7 changes: 7 additions & 0 deletions intTests/test_parse_errors/corelexer001.log.good
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Loading file "corelexer001.saw"
Stack trace:
"load_sawcore_from_file" (corelexer001.saw:2:1-2:23)
Module parsing failed:
PosPair {_pos = Pos {posBase = ".", posPath = "corelexer001.sawcore", posLine = 1, posCol = 46}, val = UnexpectedToken TEnd}

FAILED
2 changes: 2 additions & 0 deletions intTests/test_parse_errors/corelexer001.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
enable_experimental;
load_sawcore_from_file "corelexer001.sawcore";
1 change: 1 addition & 0 deletions intTests/test_parse_errors/corelexer001.sawcore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
-- line comment with no newline at end of file
7 changes: 7 additions & 0 deletions intTests/test_parse_errors/corelexer002.log.good
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Loading file "corelexer002.saw"
Stack trace:
"load_sawcore_from_file" (corelexer002.saw:2:1-2:23)
Module parsing failed:
PosPair {_pos = Pos {posBase = ".", posPath = "corelexer002.sawcore", posLine = 1, posCol = 0}, val = ParseError "Unclosed Comment"}

FAILED
2 changes: 2 additions & 0 deletions intTests/test_parse_errors/corelexer002.saw
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
enable_experimental;
load_sawcore_from_file "corelexer002.sawcore";
3 changes: 3 additions & 0 deletions intTests/test_parse_errors/corelexer002.sawcore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{-
- Unclosed comment
-
77 changes: 26 additions & 51 deletions saw-core/src/Verifier/SAW/Grammar.y
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,8 @@ Portability : non-portable (language extensions)
-}

module Verifier.SAW.Grammar
( Decl(..)
, Term(..)
, parseSAW
( parseSAW
, parseSAWTerm
, lexer
) where

import Control.Applicative ((<$>))
Expand All @@ -28,6 +25,7 @@ import qualified Data.ByteString.Lazy.UTF8 as B
import Data.Maybe (isJust)
import Data.Text (Text)
import qualified Data.Text as Text
import qualified Data.Text.Lazy as LText
import Data.Traversable
import Data.Word
import Numeric.Natural
Expand All @@ -46,7 +44,7 @@ import Verifier.SAW.Lexer

%tokentype { PosPair Token }
%monad { Parser }
%lexer { lexer } { PosPair _ TEnd }
%lexer { lexer >>= } { PosPair _ TEnd }
%error { parseError }
%expect 0

Expand Down Expand Up @@ -251,70 +249,47 @@ rlist1(p) :: { [p] }

{
data ParseError
= UnexpectedLex [Word8]
= UnexpectedLex Text
| UnexpectedToken Token
| ParseError String
deriving (Show)

newtype Parser a = Parser { _unParser :: ExceptT (PosPair ParseError) (State AlexInput) a }
newtype Parser a = Parser { _unParser :: ExceptT (PosPair ParseError) (State LexerState) a }
deriving (Applicative, Functor, Monad)

addError :: Pos -> ParseError -> Parser a
addError p err = Parser $ throwError (PosPair p err)

setInput :: AlexInput -> Parser ()
setInput inp = Parser $ put inp

parsePos :: Parser Pos
parsePos = Parser $ gets pos

lexer :: (PosPair Token -> Parser a) -> Parser a
lexer f = do
let go prevErr next = do
let addErrors =
case prevErr of
Nothing -> return ()
Just (po,l) -> addError po (UnexpectedLex (reverse l))
s <- Parser get
let inp@(PosPair p (Buffer _ b)) = s
end = addErrors >> next (PosPair p TEnd)
case alexScan inp 0 of
AlexEOF -> end
AlexError _ ->
case alexGetByte inp of
Just (w,inp') -> do
setInput inp'
case prevErr of
Nothing -> go (Just (p,[w])) next
Just (po,l) -> go (Just (po,w:l)) next
Nothing -> end
AlexSkip inp' _ -> addErrors >> setInput inp' >> go Nothing next
AlexToken inp' l act -> do
addErrors
setInput inp'
let v = act (B.toString (B.take (fromIntegral l) b))
next (PosPair p v)
let read i tkn =
case val tkn of
TCmntS -> go Nothing (read (i+1))
TCmntE | i > 0 -> go Nothing (read (i-1))
| otherwise -> do
addError (pos tkn) (UnexpectedLex (fmap (fromIntegral . fromEnum) "-}"))
go Nothing (read 0)
_ | i > 0 -> go Nothing (read i)
| otherwise -> f tkn
go Nothing (read (0::Integer))
lexer :: Parser (PosPair Token)
lexer = do
inp <- Parser get
let (inp', errors, result) = lexSAWCore inp
Parser $ put inp'
let issue (pos, msg) = case msg of
InvalidInput chars -> addError pos $ UnexpectedLex chars
UnclosedComment -> addError pos $ ParseError "Unclosed Comment"
MissingEOL ->
-- XXX: this should be a warning but we have no such ability here yet
--addWarning pos $ "No newline at end of file"
return ()
-- XXX: this can only actually throw one error. Fix this up when we
-- clean out the error printing infrastructure.
mapM issue errors
return result

-- | Run parser given a directory for the base (used for making pathname relative),
-- bytestring to parse, and parser to run.
runParser :: Parser a -> FilePath -> FilePath -> B.ByteString -> Either (PosPair ParseError) a
runParser (Parser m) base path b = evalState (runExceptT m) initState
where initState = initialAlexInput base path b
runParser :: Parser a -> FilePath -> FilePath -> LText.Text -> Either (PosPair ParseError) a
runParser (Parser m) base path txt = evalState (runExceptT m) initState
where initState = initialLexerState base path txt

parseSAW :: FilePath -> FilePath -> B.ByteString -> Either (PosPair ParseError) Module
parseSAW :: FilePath -> FilePath -> LText.Text -> Either (PosPair ParseError) Module
parseSAW = runParser parseSAW2

parseSAWTerm :: FilePath -> FilePath -> B.ByteString -> Either (PosPair ParseError) Term
parseSAWTerm :: FilePath -> FilePath -> LText.Text -> Either (PosPair ParseError) Term
parseSAWTerm = runParser parseSAWTerm2

parseError :: PosPair Token -> Parser a
Expand Down
Loading
Loading