forked from ruby/lrama
Showing 3 changed files with 578 additions and 0 deletions.
@@ -0,0 +1,382 @@
# Compressed State Table

An LR parser generator produces two large tables: the action table and the GOTO table.
The action table is a matrix indexed by the current state and the lookahead token. Each cell indicates the next action (shift, reduce, accept or error).
The GOTO table is a matrix indexed by the current state and a nonterminal symbol. Each cell indicates the next state.

Both the action table and the GOTO table are sparse, so an LR parser generator compresses them into the following tables:

* `yypact` & `yypgoto`
* `yytable`
* `yycheck`
* `yydefact` & `yydefgoto`

See also: https://speakerdeck.com/yui_knk/what-is-expected?slide=52

## Introduction to major tables

### `yypact` & `yypgoto`

`yypact` specifies what to do in the current state.
Its value is accessed by `state`. For example,

```ruby
yyn = yypact[state]
```

If the value is `YYPACT_NINF` (Negative INFinity), the parser executes the default action, which is stored in `yydefact`.
Otherwise the value is an offset into `yytable`.

`yypgoto` plays the same role as `yypact`, but for the GOTO table.
In particular, `yypgoto` is consulted when a reduce happens.

```ruby
rule_for_reduce = rules[rule_id]

# lhs_id holds the id of the LHS nonterminal of the rule used for the reduce.
lhs_id = rule_for_reduce.lhs.id

# state is the state on top of the stack after the RHS has been popped.
yyn = yypgoto[lhs_id] + state

# Validate the access to yytable; fall back to the default GOTO otherwise.
if 0 <= yyn && yyn <= YYLAST && yycheck[yyn] == state
  next_state = yytable[yyn]
else
  next_state = yydefgoto[lhs_id]
end
```

### `yytable`

`yytable` specifies the actual action to take in the current state.

A positive number means shift and gives the next state.
For example, `yytable[yyn] == 1` means shift and move to State 1.

`YYTABLE_NINF` (Negative INFinity) means syntax error.
For example, `yytable[yyn] == YYTABLE_NINF` means syntax error.

Zero and any other negative number mean reduce by the rule whose number is the negation of the value.
For example, `yytable[yyn] == -1` means reduce with Rule 1.
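
The three cases above can be sketched as a small decoder. This is only an illustration: `decode_action` is a hypothetical helper, not part of lrama, and the sentinel `-100` is an assumed stand-in for whatever `YYTABLE_NINF` a generated parser actually defines.

```ruby
# Decode one yytable cell into a [kind, argument] pair.
# yytable_ninf is the parser's YYTABLE_NINF sentinel (passed in, since its
# real value depends on the generated tables).
def decode_action(value, yytable_ninf)
  if value > 0
    [:shift, value]    # argument is the next state
  elsif value == yytable_ninf
    [:error, nil]      # explicit syntax error entry
  else
    [:reduce, -value]  # argument is the rule number (0 or positive)
  end
end

yytable_ninf = -100 # hypothetical sentinel, for illustration only

p decode_action(1, yytable_ninf)             # => [:shift, 1]
p decode_action(-1, yytable_ninf)            # => [:reduce, 1]
p decode_action(yytable_ninf, yytable_ninf)  # => [:error, nil]
```

The positive check comes first, so the sentinel only has to be distinguished from the other non-positive values.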

### `yycheck`

`yycheck` validates accesses to `yytable`.

Every row of the action table and the GOTO table is packed into the single array `yytable`.
Consider an action table that has only two states.
If the second row is shifted to the right, the two rows can be merged into one array without conflict.

```ruby
[
  [ 'a', 'b',    ,    , 'e'], # State 0
  [    , 'B', 'C',    , 'E'], # State 1
]

# => Shift the second array to the right

[
  [ 'a', 'b',    ,    , 'e'],      # State 0
       [    , 'B', 'C',    , 'E'], # State 1
]

# => Merge them into single array

yytable = [
    'a', 'b', 'B', 'C', 'e', 'E'
]
```
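One way to find such a shift automatically is a first-fit search: try offsets 0, 1, 2, ... until no occupied cell collides. The sketch below assumes that strategy purely for illustration; `pack` is a hypothetical helper, it only tries non-negative offsets, and it is not claimed to be the algorithm lrama or Bison uses.

```ruby
# Rows of the toy table from the text; nil marks an empty cell.
rows = [
  ['a', 'b', nil, nil, 'e'], # State 0
  [nil, 'B', 'C', nil, 'E'], # State 1
]

# Pack sparse rows into one array with a first-fit search (assumed strategy).
# Returns the merged array and the offset chosen for each row.
def pack(rows)
  table = []
  offsets = rows.map do |row|
    used = row.each_index.select { |i| row[i] }
    # Smallest non-negative offset at which no used cell collides.
    offset = 0
    offset += 1 while used.any? { |i| table[offset + i] }
    used.each { |i| table[offset + i] = row[i] }
    offset
  end
  [table, offsets]
end

table, offsets = pack(rows)
p table   # => ["a", "b", "B", "C", "e", "E"]
p offsets # => [0, 1]
```

The returned offsets play the role of the per-state offsets used to index the merged array.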

`yypact` is an array holding the offset of each state.

```ruby
yypact = [0, 1]
```

We can access the value of `state1[2]` by consulting `yypact`.

```ruby
yytable[yypact[1] + 2]
# => yytable[1 + 2]
# => 'C'
```

However, this approach breaks down when accessing an empty (nil) cell such as `state1[3]`,
because the lookup lands on the cell that belongs to `state0[4]`.

```ruby
yytable[yypact[1] + 3]
# => yytable[1 + 3]
# => 'e'
```

This is why `yycheck` is needed.
`yycheck` stores the valid indexes of the original table.
In the current example:

* 0, 1 and 4 are the valid indexes of State 0
* 1, 2 and 4 are the valid indexes of State 1

`yycheck` stores these indexes at the same offsets as `yytable`.

```ruby
# yytable
[
  [ 'a', 'b',    ,    , 'e'],      # State 0
       [    , 'B', 'C',    , 'E'], # State 1
]

yytable = [
    'a', 'b', 'B', 'C', 'e', 'E'
]

# yycheck
[
  [   0,   1,    ,    ,   4],      # State 0
       [    ,   1,   2,    ,   4], # State 1
]

yycheck = [
      0,   1,   1,   2,   4,   4
]
```

We can validate accesses to `yytable` by consulting `yycheck`.
Because `yycheck` stores the valid indexes of the original rows, validation means comparing `yycheck[index_for_yytable]` with `index_for_the_state`.
The access is valid if both values are the same.

```ruby
# Validate an access to state1[2]
yycheck[yypact[1] + 2] == 2
# => yycheck[1 + 2] == 2
# => 2 == 2
# => true (valid)

# Validate an access to state1[3]
yycheck[yypact[1] + 3] == 3
# => yycheck[1 + 3] == 3
# => 4 == 3
# => false (invalid)
```
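Putting the offset and the check together gives a complete lookup for the toy tables. The helper below is a minimal sketch: `lookup` is a hypothetical name, and the explicit range check is added because a negative Ruby array index would silently wrap around to the end of the array.

```ruby
yytable = ['a', 'b', 'B', 'C', 'e', 'E']
yycheck = [0, 1, 1, 2, 4, 4]
yypact  = [0, 1]

# Return the cell of the original row, or nil when that cell was empty.
# offset is the row's offset (yypact[state]), index the column in that row.
def lookup(yytable, yycheck, offset, index)
  pos = offset + index
  # Out-of-range positions never hold a valid cell.
  return nil if pos < 0 || pos >= yytable.size
  yycheck[pos] == index ? yytable[pos] : nil
end

p lookup(yytable, yycheck, yypact[1], 2) # => "C"
p lookup(yytable, yycheck, yypact[1], 3) # => nil
p lookup(yytable, yycheck, yypact[0], 4) # => "e"
```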

### `yydefact` & `yydefgoto`

`yydefact` stores the default action for each state.

```ruby
rule_id = yydefact[state]
# => 0 means syntax error, any other number means reduce using Rule rule_id
```

`yydefgoto` stores the default GOTO for each nonterminal.

```ruby
next_state = yydefgoto[lhs_id]
```

## Example

### `yytable`

Take a look at the compressed tables of "compressed_state_table.y".
See "compressed_state_table.output" for detailed information about symbols and states.

The original action table and GOTO table look like this:

```ruby
# Action table is a matrix of terminals * states
[
# [   EOF, error, undef,    LF,   NUM,   '+',   '*',   '(',   ')']    (default reduce)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  0 (r1)
  [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  1 (r3)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  2 ()
  [    s6,      ,      ,      ,      ,      ,      ,      ,      ], # State  3 ()
  [      ,      ,      ,    s7,      ,    s8,    s9,      ,      ], # State  4 ()
  [      ,      ,      ,      ,      ,    s8,    s9,      ,   s10], # State  5 ()
  [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  6 (accept)
  [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  7 (r2)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  8 ()
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  9 ()
  [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State 10 (r6)
  [      ,      ,      ,      ,      ,      ,    s9,      ,      ], # State 11 (r4)
  [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State 12 (r5)
]

# GOTO table is a matrix of states * nonterminals
[
# [   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12]    (default goto)
  [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept (g0)
  [  g3,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program (g3)
  [  g4,    ,  g5,    ,    ,    ,    ,    , g11, g12,    ,    ,    ], # expr    (g4)
]

# => Remove default goto

[
# [   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12]    (default goto)
  [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept (g0)
  [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program (g3)
  [    ,    ,  g5,    ,    ,    ,    ,    , g11, g12,    ,    ,    ], # expr    (g4)
]
```

These are compressed into `yytable` as shown below.
If the offset equals `YYPACT_NINF`, the row contains only the default value and can be ignored (commented out in this example).

```ruby
[
  # Action table
  #                                                                   (offset, YYPACT_NINF = -4)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  0 ( 6)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  1 (-4)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  2 ( 6)
  [    s6,      ,      ,      ,      ,      ,      ,      ,      ], # State  3 ( 1)
  [      ,      ,      ,    s7,      ,    s8,    s9,      ,      ], # State  4 (-1)
  [      ,      ,      ,      ,      ,    s8,    s9,      ,   s10], # State  5 ( 3)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  6 (-4)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  7 (-4)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  8 ( 6)
  [      ,      ,      ,      ,    s1,      ,      ,    s2,      ], # State  9 ( 6)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State 10 (-4)
  [      ,      ,      ,      ,      ,      ,    s9,      ,      ], # State 11 (-3)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State 12 (-4)

  # GOTO table
# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept (-4)
# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program (-4)
  [    ,    ,  g5,    ,    ,    ,    ,    , g11, g12,    ,    ,    ], # expr    (-2)
]

# => compressed into single array
[    ,    ,    ,  g5,  s6,  s7,  s9,  s8,  s9, g11, g12,  s8,  s9,  s1, s10,    ,  s2,    ]

# => Cut blank cells on head and tail, remove 'g' and 's' prefix, fill blank with 0
# This is `yytable`
[                   5,   6,   7,   9,   8,   9,  11,  12,   8,   9,   1,  10,   0,   2]
```

### `yycheck`

```ruby
[
  # Action table valid indexes
  #                                                                   (offset, YYPACT_NINF = -4)
  [      ,      ,      ,      ,     4,      ,      ,     7,      ], # State  0 ( 6)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  1 (-4)
  [      ,      ,      ,      ,     4,      ,      ,     7,      ], # State  2 ( 6)
  [     0,      ,      ,      ,      ,      ,      ,      ,      ], # State  3 ( 1)
  [      ,      ,      ,     3,      ,     5,     6,      ,      ], # State  4 (-1)
  [      ,      ,      ,      ,      ,     5,     6,      ,     8], # State  5 ( 3)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  6 (-4)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State  7 (-4)
  [      ,      ,      ,      ,     4,      ,      ,     7,      ], # State  8 ( 6)
  [      ,      ,      ,      ,     4,      ,      ,     7,      ], # State  9 ( 6)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State 10 (-4)
  [      ,      ,      ,      ,      ,      ,     6,      ,      ], # State 11 (-3)
# [      ,      ,      ,      ,      ,      ,      ,      ,      ], # State 12 (-4)

  # GOTO table valid indexes
# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept (-4)
# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program (-4)
  [    ,    ,   2,    ,    ,    ,    ,    ,   8,   9,    ,    ,    ], # expr    (-2)
]

# => compressed into single array
[    ,    ,    ,   2,   0,   3,   6,   5,   6,   8,   9,   5,   6,   4,   8,    ,   7,    ]

# => Cut blank cells on head and tail, fill blank with -1 because no index can be -1 and comparison always fails
# This is `yycheck`
[                   2,   0,   3,   6,   5,   6,   8,   9,   5,   6,   4,   8,  -1,   7]
```
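As a sanity check, the sparse rows can be rebuilt from the two compressed arrays and the offsets listed in the comments above. The following is a sketch under assumptions: `decompress` is a hypothetical helper, the offsets are copied by hand from this example, and `-4` is treated as the "row skipped" marker for both the action rows and the GOTO rows, as the example shows.

```ruby
# Arrays and offsets copied from the example above.
yytable = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
yycheck = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
yypact  = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4] # per-state offsets
yypgoto = [-4, -4, -2]                                   # per-nonterminal offsets
ninf    = -4 # offset that marks a row holding only its default

# Rebuild one sparse row as {column => value}.
# columns is the width of the original row (9 terminals or 13 states here).
def decompress(yytable, yycheck, offset, columns, ninf)
  return {} if offset == ninf
  (0...columns).each_with_object({}) do |col, row|
    pos = offset + col
    next if pos < 0 || pos >= yytable.size
    row[col] = yytable[pos] if yycheck[pos] == col
  end
end

state4 = decompress(yytable, yycheck, yypact[4], 9, ninf)   # s7, s8, s9 on LF, '+', '*'
expr   = decompress(yytable, yycheck, yypgoto[2], 13, ninf) # g5, g11, g12 from States 2, 8, 9
p state4
p expr
```

Every cell that the check rejects falls back to the row's default, which is why the default entries could be removed before packing.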

### `yypact` & `yypgoto`

```ruby
# Offsets listed in the `yytable` example above
yypact  = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4]
yypgoto = [-4, -4, -2]
```

### `yydefact` & `yydefgoto`

```ruby

```

## How to use each table

Whole processes of ...

* 1. If `yypact[state]` equals `YYPACT_NINF`, the parser should execute the default action, so consult the `yydefact` table.
* 2. Otherwise determine what to do next: shift, reduce or error.
  * 2-1. Read the next token if `yychar` is empty.
  * 2-2. Check the current token (`yychar`).
    * 2-2-1. If `yychar` is less than or equal to the end-of-file symbol, it means end-of-file. Update `yychar` to `output.eof_symbol.id.s_value` and `yytoken` to `output.eof_symbol.enum_name`.
    * 2-2-2. If `yychar` is the error symbol, it means error. Update `yychar` to `output.undef_symbol.id.s_value` to avoid an infinite loop in the error handling process, and update `yytoken` to `output.error_symbol.enum_name`.
    * 2-2-3. Otherwise update `yytoken`. Because the type of `yychar` is `enum yytokentype` and the type of `yytoken` is `enum yysymbol_kind_t`, `yychar` must be converted to `enum yysymbol_kind_t` through the `yytranslate` table before it is assigned to the `yytoken` local variable.
  * 2-3. Consult the `yycheck` table to determine the next action. The index into `yycheck` is `yypact[yystate] + yytoken`.
    * 2-3-1. If the value in `yycheck` equals `yytoken`, consult `yytable`.
    * 2-3-2. Otherwise the parser should execute the default action.
  * 2-4. Consult `yytable` to determine the actual action (shift, reduce or error). The index into `yytable` is `yypact[yystate] + yytoken`, the same index used for `yycheck`.

```ruby
next_action = nil
yyn = yypact[yystate]

# Check if next action is default or not
if yyn == YYPACT_NINF
  next_action = :yydefault
  return
else
  if yychar == YYEMPTY
    # Read a token
    yychar = yylex()
  end

  if yychar <= TOKEN_END_OF_FILE
    # End of File
    yychar = eof_symbol.id.s_value
    yytoken = eof_symbol.enum_name
  elsif yychar == TOKEN_ERROR
    # Lexer returns YYerror token
    yychar = undef_symbol.id.s_value
    yytoken = error_symbol.enum_name
    next_action = :error
    return
  else
    # Convert the token number (yychar) into a symbol number (yytoken)
    yytoken = yytranslate[yychar]
  end

  # Add token offset to index of yycheck and yytable
  yyn += yytoken
  if yyn < 0 || YYLAST < yyn
    # Out of range of yycheck means default action
    next_action = :yydefault
    return
  end
  if yycheck[yyn] != yytoken
    # No need to consult yytable
    next_action = :yydefault
    return
  end

  yyn = yytable[yyn]
  if yyn == YYTABLE_NINF
    # Explicit error entry
    next_action = :error
    return
  elsif yyn <= 0
    # Reduce with the rule whose number is -yyn
    yyn = -yyn
    next_action = :yyreduce
    return
  end

  # Execute shift: yyn is the next state
  yystate = yyn
  yy_state_stack.push(yystate)
  yy_semantic_value_stack.push(yylval)
  yy_location_stack.push(yylloc)

  # Reset current token
  yychar = YYEMPTY

  next_action = :yynewstate
  return
end
```
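The table-driven part of these steps (1, 2-3 and 2-4) can be exercised against the example tables of "compressed_state_table.y". This is a sketch under stated assumptions: `next_action` is a hypothetical helper that takes an already translated `yytoken`, so reading and translating the token is left out; `yytable_ninf: -1` is an assumed value because the example does not state it; and the last index of `yytable` stands in for `YYLAST`.

```ruby
tables = {
  yytable: [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2],
  yycheck: [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7],
  yypact:  [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4],
  yypact_ninf:  -4,
  yytable_ninf: -1, # assumed; not given in the example
}

# Decide the action for a state and an already translated token.
def next_action(t, yystate, yytoken)
  yyn = t[:yypact][yystate]
  return [:yydefault] if yyn == t[:yypact_ninf]   # step 1

  yyn += yytoken
  yylast = t[:yytable].size - 1                   # stand-in for YYLAST
  return [:yydefault] if yyn < 0 || yylast < yyn  # out of range
  return [:yydefault] if t[:yycheck][yyn] != yytoken # step 2-3

  yyn = t[:yytable][yyn]                          # step 2-4
  return [:error] if yyn == t[:yytable_ninf]
  return [:yyreduce, -yyn] if yyn <= 0

  [:shift, yyn]
end

num, plus = 4, 5 # column numbers of NUM and '+' in the example's header row
p next_action(tables, 0, num)  # => [:shift, 1]
p next_action(tables, 4, plus) # => [:shift, 8]
p next_action(tables, 1, num)  # => [:yydefault]
p next_action(tables, 0, plus) # => [:yydefault]
```

Returning `[:yydefault]` instead of resolving it keeps the sketch independent of `yydefact`, whose values are not listed in this example.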

## `yytranslate`