perf(es/lexer): Use `logos` lexer as a sub-lexer #9807
base: main
`logos` for Text => RawToken => Token phase
`logos` for sub-lexer
`logos` for sub-lexer
`logos` lexer as a sub-lexer
This reverts commit 2cb094e.
@@ -372,7 +353,7 @@ impl Iterator for Lexer<'_> {
         }

         self.state.update(start, token.kind());
-        self.state.prev_hi = self.last_pos();
+        self.state.prev_hi = self.input.cur_pos();
We do not need to maintain the `cur_pos` information in parser.lexer. How about getting the span
from logos.lexer? It would be a better way to reduce the complexity of ecma.parser.lexer.
I agree that it can be a better design. You would only need to add start_pos
to those.
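
The suggestion in this thread can be sketched roughly as follows. A `logos::Lexer` reports spans as byte ranges relative to the slice it was given, so the outer lexer only has to shift them by the slice's start position. This is a minimal, standard-library-only sketch: `BytePos`, `AbsSpan`, and `to_absolute` are hypothetical names for illustration (they stand in for whatever position types the parser uses), and the relative range is passed in directly instead of being read from a real `logos` lexer.

```rust
use std::ops::Range;

/// Hypothetical absolute byte position in the whole source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BytePos(pub u32);

/// Hypothetical absolute span, as the parser would consume it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AbsSpan {
    pub lo: BytePos,
    pub hi: BytePos,
}

/// Converts a span that is relative to the lexed slice (the form a
/// sub-lexer reports) into an absolute span by adding `start_pos`,
/// the absolute position at which the slice begins.
pub fn to_absolute(start_pos: BytePos, relative: Range<usize>) -> AbsSpan {
    AbsSpan {
        lo: BytePos(start_pos.0 + relative.start as u32),
        hi: BytePos(start_pos.0 + relative.end as u32),
    }
}
```

With this shape, the outer lexer would not need its own cursor: something like `prev_hi` could be taken from the `hi` of the last converted span, assuming the sub-lexer's span is always available after each token.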
#[derive(Logos, Debug, Clone, Copy, PartialEq, Eq)]
#[logos(error = LogosError)]
pub enum JsxToken {}
`<` in JSX may be `BinOpToken::Lt` or `JSXTagStart`, and it is hard for the logos lexer to express two token kinds with a single kind enum. So I think we should not generate JSX tokens with the logos lexer.
We can think of the logos lexer as a basic lexer that generates basic token kinds such as `LtAngle` (`<`), `RtAngle` (`>`), ....
The Lexer then uses logos and its state to generate more specific Tokens.
Do you think that is a better way?
Yeap, that's the way I was thinking of.
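
The split agreed on in this thread might look something like the sketch below: a context-free sub-lexer emits only basic kinds, and the stateful lexer decides what they mean. Everything here is an assumption for illustration. The enum variants, the `State` fields, and the `refine` function are hypothetical and do not mirror the actual swc types, and the raw kinds are written by hand rather than derived with `logos`.

```rust
/// Basic token kinds, as a context-free sub-lexer could produce them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum RawToken {
    LtAngle, // `<`
    RtAngle, // `>`
}

/// More specific tokens that only the stateful lexer can decide on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token {
    BinOpLt,
    BinOpGt,
    JsxTagStart,
    JsxTagEnd,
}

/// Hypothetical subset of lexer state needed to disambiguate `<` and `>`.
#[derive(Debug, Clone, Copy, Default)]
pub struct State {
    pub jsx_enabled: bool,
    pub is_expr_allowed: bool,
    pub in_jsx_tag: bool,
}

/// Maps a basic token to a specific one using the current state.
/// `<` opens a JSX tag only where an expression may start; `>` closes
/// a tag only while the lexer is inside one.
pub fn refine(raw: RawToken, state: State) -> Token {
    match raw {
        RawToken::LtAngle if state.jsx_enabled && state.is_expr_allowed => Token::JsxTagStart,
        RawToken::LtAngle => Token::BinOpLt,
        RawToken::RtAngle if state.in_jsx_tag => Token::JsxTagEnd,
        RawToken::RtAngle => Token::BinOpGt,
    }
}
```

The point of the design is that the sub-lexer never needs to know about JSX at all; the ambiguity is resolved in one place, where the state already lives.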
Description:

- Define `RawToken`.
  - `logos` generates a deterministic finite state machine that consists of lookup tables and jump tables.
  - `RawToken` will be renamed to `Token` and used directly by the parser. I need to investigate the API much more.
  - `RawToken` should be single-byte-sized.
- Adjust the lexer to work on `RawToken` instead of `char`.
  - `logos` takes `&str` and generates `Iterator<Item = RawToken>`.
  - `logos` provides a callback API, and that's how we should handle ambiguous tokens.
  - The regex support of `logos` is a bit inferior to that of `regex`. In other words, even if the regex is valid, the logos lexer may not generate a matching `RawToken`. This is for performance, and it's documented here.
- Wrapper: `logos::Lexer` => `RawLexer` => `Lexer` => `Parser`
  - I added `RawLexer` as a sort of buffer (based on `peek_nth` from `itertools`), but I found that it's a good place to have various lexing methods, so I added `read_regexp`.
  - `read_regexp` uses another logos token definition, so it should be in the `swc_ecma_raw_lexer` crate.
- Fix tests
  - `Str` has the value of `\\\\` and the raw value of `\\\\` for input string `\\\\`. But the value field should be `\\` instead.

Related issue (if exists):