diff --git a/doc/development/compressed_state_table.md b/doc/development/compressed_state_table.md
new file mode 100644
index 00000000..054bf201
--- /dev/null
+++ b/doc/development/compressed_state_table.md
@@ -0,0 +1,629 @@
+# Compressed State Table
+
+An LR parser generator produces two large tables, the action table and the GOTO table.
+The action table is a matrix of current state and token. Each cell indicates the next action (shift, reduce, accept or error).
+The GOTO table is a matrix of current state and nonterminal symbol. Each cell indicates the next state.
+
+Both the action table and the GOTO table are sparse. Therefore the LR parser generator compresses them into the following tables.
+
+* `yypact` & `yypgoto`
+* `yytable`
+* `yycheck`
+* `yydefact` & `yydefgoto`
+
+See also: https://speakerdeck.com/yui_knk/what-is-expected?slide=52
+
+## Introduction to major tables
+
+### `yypact` & `yypgoto`
+
+`yypact` specifies what to do in the current state.
+It is indexed by `state`. For example,
+
+```ruby
+yyn = yypact[state]
+```
+
+If the value is `YYPACT_NINF` (Negative INFinity), the default reduce action is executed.
+Otherwise the value is an offset into `yytable`.
+
+`yypgoto` plays the same role as `yypact`, but for the GOTO table.
+Its index is a nonterminal symbol id.
+`yypgoto` is used when a reduce happens.
+
+```ruby
+rule_for_reduce = rules[rule_id]
+
+# lhs_id holds LHS nonterminal id of the rule used for reduce.
+lhs_id = rule_for_reduce.lhs.id
+
+yyn = yypgoto[lhs_id]
+
+# Validate access to yytable
+if yycheck[yyn + state] == state
+  next_state = yytable[yyn + state]
+end
+```
+
+### `yytable`
+
+`yytable` specifies what to actually do in the current state.
+
+A positive number means shift and specifies the next state.
+For example, `yytable[yyn] == 1` means shift and the next state is State 1.
+
+`YYTABLE_NINF` (Negative INFinity) means syntax error.
+For example, `yytable[yyn] == YYTABLE_NINF` means syntax error.
+
+Zero and any other negative number mean reduce with the rule whose number is the negation of the value.
+For example, `yytable[yyn] == -1` means reduce with Rule 1.
+
+### `yycheck`
+
+`yycheck` validates accesses to `yytable`.
+
+Each row of the action table and the GOTO table is packed into the single array `yytable`.
+Consider the case where the action table has only two states.
+In this case, if the second row is shifted to the right, the two rows can be merged into one array without conflict.
+
+```ruby
+[
+  [ 'a', 'b',    ,    , 'e'],      # State 0
+  [    , 'B', 'C',    , 'E'],      # State 1
+]
+
+# => Shift the second array to the right
+
+[
+  [ 'a', 'b',    ,    , 'e'],      # State 0
+       [    , 'B', 'C',    , 'E'], # State 1
+]
+
+# => Merge them into single array
+
+yytable = [
+    'a', 'b', 'B', 'C', 'e', 'E'
+]
+```
+
+`yypact` is an array holding the offset of each state.
+
+```ruby
+yypact = [0, 1]
+```
+
+We can access the value of `state1[2]` by consulting `yypact`.
+
+```ruby
+yytable[yypact[1] + 2]
+# => yytable[1 + 2]
+# => 'C'
+```
+
+However, this approach breaks down when accessing an empty cell such as `state1[3]`,
+because the lookup lands on the cell that belongs to `state0[4]`.
+
+```ruby
+yytable[yypact[1] + 3]
+# => yytable[1 + 3]
+# => 'e'
+```
+
+This is why `yycheck` is needed.
+`yycheck` stores the valid indexes of the original table.
+In the current example:
+
+* 0, 1 and 4 are valid indexes of State 0
+* 1, 2 and 4 are valid indexes of State 1
+
+`yycheck` stores these indexes at the same offsets as `yytable`.
+
+```ruby
+# yytable
+[
+  [ 'a', 'b',    ,    , 'e'],      # State 0
+       [    , 'B', 'C',    , 'E'], # State 1
+]
+
+yytable = [
+    'a', 'b', 'B', 'C', 'e', 'E'
+]
+
+# yycheck
+[
+  [   0,   1,    ,    ,   4],      # State 0
+       [    ,   1,   2,    ,   4], # State 1
+]
+
+yycheck = [
+      0,   1,   1,   2,   4,   4
+]
+```
+
+We can validate accesses to `yytable` by consulting `yycheck`.
+Because `yycheck` stores the valid indexes of the original rows, validation compares `yycheck[index_for_yytable]` with `index_for_the_state`.
+The access is valid if both values are the same.
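The check described above can be tried out on the two-state toy tables. This is a minimal sketch, not generator code: the names `TOY_TABLE`, `TOY_CHECK`, `TOY_OFFSET` and `toy_lookup` are hypothetical and only stand in for `yytable`, `yycheck` and `yypact`.

```ruby
# Toy tables from the two-state example (letters are placeholders, not real parser actions).
TOY_TABLE  = ['a', 'b', 'B', 'C', 'e', 'E'] # plays the role of yytable
TOY_CHECK  = [0, 1, 1, 2, 4, 4]             # plays the role of yycheck
TOY_OFFSET = [0, 1]                         # plays the role of yypact

# Hypothetical helper: returns the cell of `state` at `column`,
# or nil when the original row has no value there.
def toy_lookup(state, column)
  idx = TOY_OFFSET[state] + column
  # Reject indexes outside the packed array.
  return nil if idx < 0 || idx >= TOY_TABLE.length
  # Reject cells that belong to another row.
  return nil unless TOY_CHECK[idx] == column

  TOY_TABLE[idx]
end

p toy_lookup(1, 2) # state1[2] is 'C'
p toy_lookup(1, 3) # state1[3] is empty, so the check rejects it
```

Without the `TOY_CHECK` comparison, the second call would return `'e'`, the value that belongs to State 0.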
+
+```ruby
+# Validate an access to state1[2]
+yycheck[yypact[1] + 2] == 2
+# => yycheck[1 + 2] == 2
+# => 2 == 2
+# => true (valid)
+
+# Validate an access to state1[3]
+yycheck[yypact[1] + 3] == 3
+# => yycheck[1 + 3] == 3
+# => 4 == 3
+# => false (invalid)
+```
+
+### `yydefact` & `yydefgoto`
+
+`yydefact` stores the rule id of the default action for each state.
+`0` means syntax error, any other number N means reduce using Rule N.
+
+```ruby
+rule_id = yydefact[state]
+# => 0 means syntax error, other number means reduce using Rule whose id is `rule_id`
+```
+
+`yydefgoto` stores the default GOTO for each nonterminal.
+The number is the next state.
+
+```ruby
+next_state = yydefgoto[lhs_id]
+# => Next state id is `next_state`
+```
+
+## Example
+
+Take a look at the compressed tables of "compressed_state_table.y".
+See "compressed_state_table.output" for detailed information on symbols and states.
+
+### `yytable`
+
+The original action table and GOTO table look like:
+
+```ruby
+# Action table is a matrix of terminals * states
+[
+# [ EOF, error, undef,  LF, NUM, '+', '*', '(', ')']    (default reduce)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  0 (r1)
+  [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  1 (r3)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  2 ()
+  [  s6,      ,      ,    ,    ,    ,    ,    ,    ], # State  3 ()
+  [    ,      ,      ,  s7,    ,  s8,  s9,    ,    ], # State  4 ()
+  [    ,      ,      ,    ,    ,  s8,  s9,    , s10], # State  5 ()
+  [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  6 (accept)
+  [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  7 (r2)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  8 ()
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  9 ()
+  [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State 10 (r6)
+  [    ,      ,      ,    ,    ,    ,  s9,    ,    ], # State 11 (r4)
+  [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State 12 (r5)
+]
+
+# GOTO table is a matrix of states * nonterminals
+[
+# [   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12]   State No (default goto)
+  [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept  (g0)
+  [  g3,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program  (g3)
+  [  g4,    ,  g5,    ,    ,    ,    ,    , g11, g12,    ,    ,    ], # expr     (g4)
+]
+
+# => Remove default goto
+
+[
+# [   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12]   State No (default goto)
+  [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept  (g0)
+  [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program  (g3)
+  [    ,    ,  g5,    ,    ,    ,    ,    , g11, g12,    ,    ,    ], # expr     (g4)
+]
+```
+
+These are compressed into `yytable` as shown below.
+If the offset equals `YYPACT_NINF`, the row contains only the default value, so the row can be ignored (commented out in this example).
+
+```ruby
+[
+# Action table
+#                                                       (offset, YYPACT_NINF = -4)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  0 ( 6)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  1 (-4)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  2 ( 6)
+  [  s6,      ,      ,    ,    ,    ,    ,    ,    ], # State  3 ( 1)
+  [    ,      ,      ,  s7,    ,  s8,  s9,    ,    ], # State  4 (-1)
+  [    ,      ,      ,    ,    ,  s8,  s9,    , s10], # State  5 ( 3)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  6 (-4)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  7 (-4)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  8 ( 6)
+  [    ,      ,      ,    ,  s1,    ,    ,  s2,    ], # State  9 ( 6)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State 10 (-4)
+  [    ,      ,      ,    ,    ,    ,  s9,    ,    ], # State 11 (-3)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State 12 (-4)
+
+# GOTO table
+# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept  (-4)
+# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program  (-4)
+  [    ,    ,  g5,    ,    ,    ,    ,    , g11, g12,    ,    ,    ], # expr     (-2)
+]
+
+# => compressed into single array
+[    ,    ,    ,  g5,  s6,  s7,  s9,  s8,  s9, g11, g12,  s8,  s9,  s1, s10,    ,  s2,    ]
+
+# => Cut blank cells on head and tail, remove 'g' and 's' prefix, fill blank with 0
+# This is `yytable`
+               [   5,   6,   7,   9,   8,   9,  11,  12,   8,   9,   1,  10,   0,   2]
+```
+
+`YYTABLE_NINF` is a negative number below every value stored in `yytable`.
+In this case, `0` is the smallest value in `yytable`, so `YYTABLE_NINF` is `-1`.
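The packing step above can be reproduced with a short script. This is only a sketch under stated assumptions: `pack_rows` is a hypothetical helper, the offsets are taken as given from the listing above (choosing them is a separate problem that is not shown), and cells that would land at a negative index are not handled.

```ruby
# Hypothetical helper: pack sparse rows into one array.
# `rows` is a list of [offset, { column => value }] pairs,
# `blank` is the filler for cells that no row occupies.
def pack_rows(rows, blank)
  cells = {}
  rows.each do |offset, row|
    row.each do |column, value|
      idx = offset + column
      # Two rows may share a cell only if they agree on its value.
      if cells.key?(idx) && cells[idx] != value
        raise ArgumentError, "conflict at index #{idx}"
      end
      cells[idx] = value
    end
  end
  (0..cells.keys.max).map { |idx| cells.fetch(idx, blank) }
end

shift_num_lparen = { 4 => 1, 7 => 2 } # NUM => s1, '(' => s2
rows = [
  [6,  shift_num_lparen],              # State 0
  [6,  shift_num_lparen],              # State 2
  [1,  { 0 => 6 }],                    # State 3
  [-1, { 3 => 7, 5 => 8, 6 => 9 }],    # State 4
  [3,  { 5 => 8, 6 => 9, 8 => 10 }],   # State 5
  [6,  shift_num_lparen],              # State 8
  [6,  shift_num_lparen],              # State 9
  [-3, { 6 => 9 }],                    # State 11
  [-2, { 2 => 5, 8 => 11, 9 => 12 }],  # expr (GOTO row)
]

p pack_rows(rows, 0)
```

States 0, 2, 8 and 9 have identical rows, which is why they can share offset 6 without conflict.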
+
+### `yycheck`
+
+```ruby
+[
+# Action table valid indexes
+#                                                       (offset, YYPACT_NINF = -4)
+  [    ,      ,      ,    ,   4,    ,    ,   7,    ], # State  0 ( 6)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  1 (-4)
+  [    ,      ,      ,    ,   4,    ,    ,   7,    ], # State  2 ( 6)
+  [   0,      ,      ,    ,    ,    ,    ,    ,    ], # State  3 ( 1)
+  [    ,      ,      ,   3,    ,   5,   6,    ,    ], # State  4 (-1)
+  [    ,      ,      ,    ,    ,   5,   6,    ,   8], # State  5 ( 3)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  6 (-4)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State  7 (-4)
+  [    ,      ,      ,    ,   4,    ,    ,   7,    ], # State  8 ( 6)
+  [    ,      ,      ,    ,   4,    ,    ,   7,    ], # State  9 ( 6)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State 10 (-4)
+  [    ,      ,      ,    ,    ,    ,   6,    ,    ], # State 11 (-3)
+# [    ,      ,      ,    ,    ,    ,    ,    ,    ], # State 12 (-4)
+
+# GOTO table valid indexes
+# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # $accept  (-4)
+# [    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ,    ], # program  (-4)
+  [    ,    ,   2,    ,    ,    ,    ,    ,   8,   9,    ,    ,    ], # expr     (-2)
+]
+
+# => compressed into single array
+[    ,    ,    ,   2,   0,   3,   6,   5,   6,   8,   9,   5,   6,   4,   8,    ,   7,    ]
+
+# => Cut blank cells on head and tail, fill blank with -1 because no index can be -1 and comparison always fails
+# This is `yycheck`
+               [   2,   0,   3,   6,   5,   6,   8,   9,   5,   6,   4,   8,  -1,   7]
+```
+
+### `yypact` & `yypgoto`
+
+`yypact` & `yypgoto` hold a mixture of offsets into `yytable` and `YYPACT_NINF` (default reduce action).
+The index in `yypact` is the state id, the index in `yypgoto` is the nonterminal symbol id.
+`YYPACT_NINF` is a negative number below every offset.
+In this case, `-3` is the smallest offset, so `YYPACT_NINF` is `-4`.
+
+```ruby
+YYPACT_NINF = -4
+
+yypact = [
+#  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12 (State No)
+   6, -4,  6,  1, -1,  3, -4, -4,  6,  6, -4, -3, -4
+]
+
+yypgoto = [
+# $accept, program, expr
+       -4,      -4,   -2
+]
+```
+
+### `yydefact` & `yydefgoto`
+
+`yydefact` & `yydefgoto` store default values.
+
+`yydefact` specifies the rule id of the default action of each state.
+Because `0` is reserved for syntax error, rule ids start at 1.
+
+```
+# In "compressed_state_table.output"
+Grammar
+
+    0 $accept: program "end of file"
+
+    1 program: ε
+    2        | expr LF
+
+    3 expr: NUM
+    4     | expr '+' expr
+    5     | expr '*' expr
+    6     | '(' expr ')'
+
+# =>
+
+# In `yydefact`
+Grammar
+
+    0 Syntax Error
+
+    1 $accept: program "end of file"
+
+    2 program: ε
+    3        | expr LF
+
+    4 expr: NUM
+    5     | expr '+' expr
+    6     | expr '*' expr
+    7     | '(' expr ')'
+```
+
+For example, the default action for state 1 is 4 (`yydefact[1] == 4`).
+This means Rule 3 (`3 expr: NUM`) in the "compressed_state_table.output" file.
+
+`yydefgoto` specifies the next state id for each nonterminal.
+
+```ruby
+yydefact = [
+#  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12 (State No)
+   2,  4,  0,  0,  0,  0,  1,  3,  0,  0,  7,  5,  6
+]
+
+yydefgoto = [
+# $accept, program, expr
+        0,       3,    4
+]
+```
+
+### `yyr1` & `yyr2`
+
+Both of them are rule tables.
+`yyr1` specifies the nonterminal symbol id of the rule's Left-Hand-Side.
+`yyr2` specifies the length of the rule, that is, the number of symbols on the rule's Right-Hand-Side.
+Index 0 is unused because rule id `0` is reserved for syntax error.
+
+```ruby
+yyr1 = [
+#       0,       1,       2,       3,    4,    5,    6,    7 (Rule id)
+# no rule, $accept, program, program, expr, expr, expr, expr (LHS symbol id)
+        0,       9,      10,      10,   11,   11,   11,   11
+]
+
+yyr2 = [
+# 0, 1, 2, 3, 4, 5, 6, 7 (Rule id)
+  0, 2, 0, 2, 1, 3, 3, 3
+]
+```
+
+## How to use tables
+
+```ruby
+YYNTOKENS = 9
+
+# The last index of yytable and yycheck
+# The lengths of yytable and yycheck are always the same
+YYLAST = 13
+YYTABLE_NINF = -1
+yytable = [  5,  6,  7,  9,  8,  9, 11, 12,  8,  9,  1, 10,  0,  2]
+yycheck = [  2,  0,  3,  6,  5,  6,  8,  9,  5,  6,  4,  8, -1,  7]
+
+YYPACT_NINF = -4
+yypact  = [  6, -4,  6,  1, -1,  3, -4, -4,  6,  6, -4, -3, -4]
+yypgoto = [ -4, -4, -2]
+
+yydefact  = [  2,  4,  0,  0,  0,  0,  1,  3,  0,  0,  7,  5,  6]
+yydefgoto = [  0,  3,  4]
+
+yyr1 = [  0,  9, 10, 10, 11, 11, 11, 11]
+yyr2 = [  0,  2,  0,  2,  1,  3,  3,  3]
+```
+
+### Determine what to do next
+
+Determine what to do next based on the current state (`state`) and the next token (`yytoken`).
+
+```ruby
+# Case 1: Only default reduce exists for the state
+#
+# State 7
+#
+#     2 program: expr LF •
+#
+#     $default  reduce using rule 2 (program)
+
+state = 7
+yytoken = nil # Do not use yytoken in this case
+
+offset = yypact[state] # -4
+if offset == YYPACT_NINF # true
+  next_action = :yydefault
+  return
+end
+```
+
+```ruby
+# Case 2: Both shift and default reduce exist for the state
+#
+# State 11
+#
+#     4 expr: expr • '+' expr
+#     4     | expr '+' expr •  [LF, '+', ')']
+#     5     | expr • '*' expr
+#
+#     '*'  shift, and go to state 9
+#
+#     $default  reduce using rule 4 (expr)
+
+# Next token is '*', so shift it
+state = 11
+yytoken = nil
+
+offset = yypact[state] # -3
+if offset == YYPACT_NINF # false
+  next_action = :yydefault
+  break
+end
+
+unless yytoken
+  yytoken = yylex() # yylex returns 6 ('*')
+end
+
+idx = yypact[state] + yytoken # -3 + 6 = 3
+if idx < 0 || YYLAST < idx # false
+  next_action = :yydefault
+  break
+end
+if yycheck[idx] != yytoken # false
+  next_action = :yydefault
+  break
+end
+
+act = yytable[idx] # 9
+if act == YYTABLE_NINF # false
+  next_action = :syntax_error
+  break
+end
+if act > 0 # true
+  # Shift
+  next_action = :yyshift
+  break
+else
+  # Reduce
+  next_action = :yyreduce
+  break
+end
+```
+
+### Execute (default) reduce
+
+```ruby
+# State 11
+#
+#     4 expr: expr • '+' expr
+#     4     | expr '+' expr •  [LF, '+', ')']
+#     5     | expr • '*' expr
+#
+#     '*'  shift, and go to state 9
+#
+#     $default  reduce using rule 4 (expr)
+
+# Input is "1 + 2 + 3" and next token is the second '+'.
+# Current state stack is `[0, 4, 8, 11]`.
+# What to do next is reduce as default action.
+state = 11
+yytoken = 5 # '+'
+
+rule = yydefact[state] # 5
+if rule == 0 # false
+  next_action = :syntax_error
+  break
+end
+
+rhs_length = yyr2[rule] # 3 because rule id 5 (Rule 4 in the .output file) is "expr: expr '+' expr"
+lhs_nterm = yyr1[rule] # 11 (expr)
+lhs_nterm_id = lhs_nterm - YYNTOKENS # 11 - 9 = 2
+
+case rule
+when 1
+  # Execute Rule 1 action
+when 2
+  # Execute Rule 2 action
+#...
+when 7
+  # Execute Rule 7 action
+end
+
+pop_stack(rhs_length) # state stack: `[0, 4, 8, 11]` -> `[0]`
+# state = 0
+
+offset = yypgoto[lhs_nterm_id] # -2
+if offset == YYPACT_NINF # false
+  state = yydefgoto[lhs_nterm_id]
+else
+  idx = offset + state # -2 + 0 = -2
+  # The index must be in range and must belong to this state (yycheck)
+  if idx < 0 || YYLAST < idx || yycheck[idx] != state # true (idx < 0)
+    state = yydefgoto[lhs_nterm_id] # 4
+  else
+    state = yytable[idx]
+  end
+end
+
+push_state(state, yyval, yyloc) # yyval = $$, yyloc = @$
+```
+
+###
+
+Whole processes of ...
+
+* 1. If `yypact[state]` is equal to `YYPACT_NINF`, execute the default action by consulting the `yydefact` table.
+* 2. Otherwise determine what to do next: shift, reduce or error.
+  * 2-1. Read the next token if `yychar` is empty.
+  * 2-2. Check the current token (`yychar`).
+    * 2-2-1. If `yychar` is the end-of-file symbol or less than the end-of-file symbol, it means end-of-file. Update `yychar` to `output.eof_symbol.id.s_value` and `yytoken` to `output.eof_symbol.enum_name`.
+    * 2-2-2. If `yychar` is the error symbol, it means error. Update `yychar` to `output.undef_symbol.id.s_value` to avoid an infinite loop in the error handling process and update `yytoken` to `output.error_symbol.enum_name`.
+    * 2-2-3. Otherwise update `yytoken`. Because the type of `yychar` is `enum yytokentype` and the type of `yytoken` is `enum yysymbol_kind_t`, `yychar` must be converted to `enum yysymbol_kind_t` with the `yytranslate` table before it is assigned to the `yytoken` local variable.
+  * 2-3. Consult the `yycheck` table to determine the next action. The index into `yycheck` is `yypact[yystate] + yytoken`.
+    * 2-3-1. If the value in `yycheck` equals `yytoken`, consult `yytable`.
+    * 2-3-2. Otherwise execute the default action.
+  * 2-4. Consult `yytable` to determine the next action. The index into `yytable` is `yypact[yystate] + yytoken`, the same index used for `yycheck`.
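Steps 1, 2-3 and 2-4, together with the GOTO lookup after a reduce, can be exercised against the example tables. The following is a minimal sketch, not the generated parser: the module name `ExampleTables` and the helpers `next_action`, `goto_state`, `default_action` and `valid_index?` are hypothetical, and `yytoken` is assumed to be a symbol id that has already been translated.

```ruby
# Example tables copied from this document, wrapped in a module to keep the names local.
module ExampleTables
  YYLAST       = 13
  YYTABLE_NINF = -1
  YYTABLE      = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
  YYCHECK      = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
  YYPACT_NINF  = -4
  YYPACT       = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4]
  YYPGOTO      = [-4, -4, -2]
  YYDEFACT     = [2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6]
  YYDEFGOTO    = [0, 3, 4]

  module_function

  # An index is usable only if it is in range and yycheck confirms its owner.
  def valid_index?(idx, key)
    idx >= 0 && idx <= YYLAST && YYCHECK[idx] == key
  end

  # Default action of a state; the rule id uses the yydefact numbering (0 = syntax error).
  def default_action(state)
    rule = YYDEFACT[state]
    rule.zero? ? [:syntax_error] : [:reduce, rule]
  end

  # Decide the next action for `state` given the lookahead symbol id `yytoken`.
  def next_action(state, yytoken)
    offset = YYPACT[state]
    return default_action(state) if offset == YYPACT_NINF

    idx = offset + yytoken
    return default_action(state) unless valid_index?(idx, yytoken)

    act = YYTABLE[idx]
    return [:syntax_error] if act == YYTABLE_NINF

    act > 0 ? [:shift, act] : [:reduce, -act]
  end

  # Next state after reducing to the nonterminal `lhs_nterm_id` with `state` on top of the stack.
  def goto_state(lhs_nterm_id, state)
    offset = YYPGOTO[lhs_nterm_id]
    return YYDEFGOTO[lhs_nterm_id] if offset == YYPACT_NINF

    idx = offset + state
    valid_index?(idx, state) ? YYTABLE[idx] : YYDEFGOTO[lhs_nterm_id]
  end
end

p ExampleTables.next_action(11, 6) # State 11 with '*' as lookahead
p ExampleTables.next_action(11, 5) # State 11 with '+' as lookahead
p ExampleTables.goto_state(2, 0)   # expr reduced with State 0 on top
```

In State 11 the lookahead `'*'` (id 6) passes the `yycheck` test and yields a shift, while `'+'` (id 5) fails it and falls back to the default reduce stored in `yydefact`.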
+
+
+```ruby
+next_action = nil
+yyn = yypact[state]
+
+# Check if next action is default or not
+if yyn == YYPACT_NINF
+  next_action = :yydefault
+  return
+else
+  if yychar == YYEMPTY
+    # Read a token
+    yychar = yylex()
+  end
+
+  if yychar <= TOKEN_END_OF_FILE
+    # End of File
+    yychar = eof_symbol.id.s_value
+    yytoken = eof_symbol.enum_name
+  elsif yychar == TOKEN_ERROR
+    # Lexer returns YYerror token
+    yychar = undef_symbol.id.s_value
+    yytoken = error_symbol.enum_name
+    next_action = :error
+    return
+  else
+    yytoken = yytranslate[yychar]
+  end
+
+  # Add token offset to index of yycheck and yytable
+  yyn += yytoken
+  if yyn < 0 || YYLAST < yyn
+    # Out of range of yycheck means default action
+    next_action = :yydefault
+    return
+  end
+  if yycheck[yyn] != yytoken
+    # No need to consult yytable
+    next_action = :yydefault
+    return
+  end
+
+  yyn = yytable[yyn]
+  if yyn == YYTABLE_NINF
+    # Syntax error
+    next_action = :error
+    return
+  elsif yyn <= 0
+    # Reduce with the rule whose number is -yyn
+    next_action = :yyreduce
+    return
+  end
+
+  # Execute shift
+  yy_state_stack.push(yystate)
+  yy_semantic_value_stack.push(yylval)
+  yy_location_stack.push(yylloc)
+
+  # Reset current token
+  yychar = YYEMPTY
+
+  next_action = :yynewstate
+  return
+end
+```
+
+
+## `yytranslate`
+
diff --git a/doc/development/compressed_state_table.output b/doc/development/compressed_state_table.output
new file mode 100644
index 00000000..02e8a2ef
--- /dev/null
+++ b/doc/development/compressed_state_table.output
@@ -0,0 +1,174 @@
+Symbol
+
+   -2 EMPTY
+    0 "end of file"
+    1 error
+    2 "invalid token" (undef)
+    3 LF
+    4 NUM
+    5 '+'
+    6 '*'
+    7 '('
+    8 ')'
+    9 $accept # Start of nonterminal
+   10 program
+   11 expr
+
+
+Grammar
+
+    0 $accept: program "end of file"
+
+    1 program: ε
+    2        | expr LF
+
+    3 expr: NUM
+    4     | expr '+' expr
+    5     | expr '*' expr
+    6     | '(' expr ')'
+
+
+State 0
+
+    0 $accept: • program "end of file"
+    1 program: ε •  ["end of file"]
+    2        | • expr LF
+    3 expr: • NUM
+    4     | • expr '+' expr
+    5     | • expr '*' expr
+    6     | • '(' expr ')'
+
+    NUM  shift, and go to state 1
+    '('  shift, and go to state 2
+
+    $default  reduce using rule 1 (program)
+
+    program  go to state 3
+    expr     go to state 4
+
+
+State 1
+
+    3 expr: NUM •
+
+    $default  reduce using rule 3 (expr)
+
+
+State 2
+
+    3 expr: • NUM
+    4     | • expr '+' expr
+    5     | • expr '*' expr
+    6     | • '(' expr ')'
+    6     | '(' • expr ')'
+
+    NUM  shift, and go to state 1
+    '('  shift, and go to state 2
+
+    expr  go to state 5
+
+
+State 3
+
+    0 $accept: program • "end of file"
+
+    "end of file"  shift, and go to state 6
+
+
+State 4
+
+    2 program: expr • LF
+    4 expr: expr • '+' expr
+    5     | expr • '*' expr
+
+    LF   shift, and go to state 7
+    '+'  shift, and go to state 8
+    '*'  shift, and go to state 9
+
+
+State 5
+
+    4 expr: expr • '+' expr
+    5     | expr • '*' expr
+    6     | '(' expr • ')'
+
+    '+'  shift, and go to state 8
+    '*'  shift, and go to state 9
+    ')'  shift, and go to state 10
+
+
+State 6
+
+    0 $accept: program "end of file" •
+
+    $default  accept
+
+
+State 7
+
+    2 program: expr LF •
+
+    $default  reduce using rule 2 (program)
+
+
+State 8
+
+    3 expr: • NUM
+    4     | • expr '+' expr
+    4     | expr '+' • expr
+    5     | • expr '*' expr
+    6     | • '(' expr ')'
+
+    NUM  shift, and go to state 1
+    '('  shift, and go to state 2
+
+    expr  go to state 11
+
+
+State 9
+
+    3 expr: • NUM
+    4     | • expr '+' expr
+    5     | • expr '*' expr
+    5     | expr '*' • expr
+    6     | • '(' expr ')'
+
+    NUM  shift, and go to state 1
+    '('  shift, and go to state 2
+
+    expr  go to state 12
+
+
+State 10
+
+    6 expr: '(' expr ')' •
+
+    $default  reduce using rule 6 (expr)
+
+
+State 11
+
+    4 expr: expr • '+' expr
+    4     | expr '+' expr •  [LF, '+', ')']
+    5     | expr • '*' expr
+
+    '*'  shift, and go to state 9
+
+    $default  reduce using rule 4 (expr)
+
+    Conflict between rule 4 and token '+' resolved as reduce (%left '+').
+    Conflict between rule 4 and token '*' resolved as shift ('+' < '*').
+
+
+State 12
+
+    4 expr: expr • '+' expr
+    5     | expr • '*' expr
+    5     | expr '*' expr •  [LF, '+', '*', ')']
+
+    $default  reduce using rule 5 (expr)
+
+    Conflict between rule 5 and token '+' resolved as reduce ('+' < '*').
+    Conflict between rule 5 and token '*' resolved as reduce (%left '*').
+
+
diff --git a/doc/development/compressed_state_table.y b/doc/development/compressed_state_table.y
new file mode 100644
index 00000000..9ed0d71f
--- /dev/null
+++ b/doc/development/compressed_state_table.y
@@ -0,0 +1,22 @@
+%union {
+    int val;
+}
+%token LF
+%token <val> NUM
+%type <val> expr
+%left '+'
+%left '*'
+
+%%
+
+program : /* empty */
+        | expr LF { printf("=> %d\n", $1); }
+        ;
+
+expr    : NUM
+        | expr '+' expr { $$ = $1 + $3; }
+        | expr '*' expr { $$ = $1 * $3; }
+        | '(' expr ')'  { $$ = $2; }
+        ;
+
+%%
diff --git a/doc/development/compressed_state_table_parser.rb b/doc/development/compressed_state_table_parser.rb
new file mode 100644
index 00000000..b0c4281e
--- /dev/null
+++ b/doc/development/compressed_state_table_parser.rb
@@ -0,0 +1,279 @@
+class Parser
+  YYNTOKENS = 9
+  YYLAST = 13
+  YYTABLE_NINF = -1
+  YYTABLE = [  5,  6,  7,  9,  8,  9, 11, 12,  8,  9,  1, 10,  0,  2]
+  YYCHECK = [  2,  0,  3,  6,  5,  6,  8,  9,  5,  6,  4,  8, -1,  7]
+
+  YYPACT_NINF = -4
+  YYPACT  = [  6, -4,  6,  1, -1,  3, -4, -4,  6,  6, -4, -3, -4]
+  YYPGOTO = [ -4, -4, -2]
+
+  YYDEFACT  = [  2,  4,  0,  0,  0,  0,  1,  3,  0,  0,  7,  5,  6]
+  YYDEFGOTO = [  0,  3,  4]
+
+  YYR1 = [  0,  9, 10, 10, 11, 11, 11, 11]
+  YYR2 = [  0,  2,  0,  2,  1,  3,  3,  3]
+
+  YYFINAL = 6
+
+  # Symbols
+  SYM_EMPTY   = -2
+  SYM_EOF     =  0 # "end of file"
+  SYM_ERROR   =  1 # error
+  SYM_UNDEF   =  2 # Invalid Token
+  SYM_LF      =  3 # LF
+  SYM_NUM     =  4 # NUM
+  SYM_PLUS    =  5 # '+'
+  SYM_ASTER   =  6 # '*'
+  SYM_LPAREN  =  7 # '('
+  SYM_RPAREN  =  8 # ')'
+  # Start of nonterminal
+  SYM_ACCEPT  =  9 # $accept
+  SYM_PROGRAM = 10 # program
+  SYM_EXPR    = 11 # expr
+
+  def initialize(debug: false)
+    @debug = debug
+  end
+
+  def parse(lexer)
+    state = 0
+    stack = []
+    yytoken = SYM_EMPTY
+    parser_action = :push_state
+    next_state = nil
+    rule = nil
+
+    while true
+      _parser_action = parser_action
+      parser_action = nil
+
+      case _parser_action
+      when :syntax_error
+        debug_print("Entering :syntax_error")
+
+        return 1
+      when :accept
+        debug_print("Entering :accept")
+
+        return 0
+      when :push_state
+        debug_print("Entering :push_state")
+
+        debug_print("Push state #{state}")
+        stack.push(state)
+        debug_print("Current stack #{stack}")
+
+        if state == YYFINAL
+          parser_action = :accept
+          next
+        end
+
+        parser_action = :decide_parser_action
+        next
+      when :decide_parser_action
+        debug_print("Entering :decide_parser_action")
+
+        offset = yypact[state]
+        if offset == YYPACT_NINF
+          parser_action = :yydefault
+          next
+        end
+
+        # Ensure next token
+        if yytoken == SYM_EMPTY
+          debug_print("Reading a token")
+
+          yytoken = lexer.next_token
+        end
+
+        case yytoken
+        when SYM_EOF
+          debug_print("Now at end of input.")
+        when SYM_ERROR
+          parser_action = :syntax_error
+          next
+        else
+          debug_print("Next token is #{yytoken}")
+        end
+
+        idx = yypact[state] + yytoken
+        if idx < 0 || YYLAST < idx
+          debug_print("Decide next parser action as :yydefault")
+
+          parser_action = :yydefault
+          next
+        end
+        if yycheck[idx] != yytoken
+          debug_print("Decide next parser action as :yydefault")
+
+          parser_action = :yydefault
+          next
+        end
+
+        action = yytable[idx]
+        if action == YYTABLE_NINF
+          parser_action = :syntax_error
+          next
+        end
+        if action > 0
+          # Shift
+          debug_print("Decide next parser action as :yyshift")
+
+          next_state = action
+          parser_action = :yyshift
+          next
+        else
+          # Reduce
+          debug_print("Decide next parser action as :yyreduce")
+
+          rule = -action
+          parser_action = :yyreduce
+          next
+        end
+      when :yyshift
+        # Precondition: `next_state` is set
+        debug_print("Entering :yyshift")
+        raise "next_state is not set" unless next_state
+
+        yytoken = SYM_EMPTY
+        state = next_state
+        next_state = nil
+        parser_action = :push_state
+        next
+      when :yydefault
+        debug_print("Entering :yydefault")
+
+        rule = yydefact[state]
+        if rule == 0
+          parser_action = :syntax_error
+          next
+        end
+
+        parser_action = :yyreduce
+        next
+      when :yyreduce
+        # Precondition: `rule`, used for reduce, is set
+        debug_print("Entering :yyreduce")
+        raise "rule is not set" unless rule
+
+        rhs_length = yyr2[rule]
+        lhs_nterm = yyr1[rule]
+        lhs_nterm_id = lhs_nterm - YYNTOKENS
+
+        text = "Execute action for Rule (#{rule}) "
+        case rule
+        when 1
+          text << "$accept: program \"end of file\""
+        when 2
+          text << "program: ε"
+        when 3
+          text << "program: expr LF"
+        when 4
+          text << "expr: NUM"
+        when 5
+          text << "expr: expr '+' expr"
+        when 6
+          text << "expr: expr '*' expr"
+        when 7
+          text << "expr: '(' expr ')'"
+        end
+        debug_print(text)
+
+        debug_print("Pop #{rhs_length} elements")
+        debug_print("Stack before pop: #{stack}")
+        stack.pop(rhs_length)
+        debug_print("Stack after pop: #{stack}")
+        state = stack[-1]
+
+        # "Shift" LHS nonterminal
+        offset = yypgoto[lhs_nterm_id]
+        if offset == YYPACT_NINF
+          state = yydefgoto[lhs_nterm_id]
+        else
+          idx = offset + state
+          if idx < 0 || YYLAST < idx || yycheck[idx] != state
+            state = yydefgoto[lhs_nterm_id]
+          else
+            state = yytable[idx]
+          end
+        end
+
+        rule = nil
+        parser_action = :push_state
+        next
+      else
+        raise "Unknown parser_action: #{_parser_action}"
+      end
+    end
+  end
+
+  private
+
+  def debug_print(str)
+    if @debug
+      $stderr.puts str
+    end
+  end
+
+  def yytable
+    YYTABLE
+  end
+
+  def yycheck
+    YYCHECK
+  end
+
+  def yypact
+    YYPACT
+  end
+
+  def yypgoto
+    YYPGOTO
+  end
+
+  def yydefact
+    YYDEFACT
+  end
+
+  def yydefgoto
+    YYDEFGOTO
+  end
+
+  def yyr1
+    YYR1
+  end
+
+  def yyr2
+    YYR2
+  end
+end
+
+class Lexer
+  def initialize(tokens)
+    @tokens = tokens
+    @index = 0
+  end
+
+  def next_token
+    if @tokens.length > @index
+      token = @tokens[@index]
+      @index += 1
+      return token
+    else
+      return Parser::SYM_EOF
+    end
+  end
+end
+
+lexer = Lexer.new([
+  # 1 + 2 + 3 LF
+  Parser::SYM_NUM,
+  Parser::SYM_PLUS,
+  Parser::SYM_NUM,
+  Parser::SYM_PLUS,
+  Parser::SYM_NUM,
+  Parser::SYM_LF,
+])
+Parser.new(debug: true).parse(lexer)