Write compressed_state_table.md
yui-knk committed Sep 19, 2024
1 parent ea5306c commit 5cd832c
File: `doc/development/compressed_state_table.md`

# Compressed State Table

An LR parser generator produces two large tables: the action table and the GOTO table.
The action table is a matrix indexed by the current state and the lookahead token. Each cell indicates the next action: shift, reduce, accept, or error.
The GOTO table is a matrix indexed by the current state and a nonterminal symbol. Each cell indicates the next state.

Both tables are sparse, so the LR parser generator compresses them into the following tables:

* `yypact` & `yypgoto`
* `yytable`
* `yycheck`
* `yydefact` & `yydefgoto`

See also: https://speakerdeck.com/yui_knk/what-is-expected?slide=52

## Introduction to major tables

### `yypact` & `yypgoto`

`yypact` specifies what to do in the current state.
It is indexed by `state`. For example,

```ruby
yyn = yypact[state]
```

If the value is `YYPACT_NINF` (Negative INFinity), the state has only a default action, so the parser executes it (see `yydefact`).
Otherwise the value is the offset of the state's row in `yytable`.

`yypgoto` plays the same role as `yypact`, but for the GOTO table.
It is indexed by nonterminal and is consulted after a reduction to find the next state.

```ruby
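rule_for_reduce = rules[rule_id]

# lhs_id holds the LHS nonterminal id of the rule used for the reduction.
lhs_id = rule_for_reduce.lhs.id

# state is the state exposed on top of the stack after popping the RHS.
yyn = yypgoto[lhs_id] + state

# Validate the access to yytable; otherwise fall back to the default GOTO
if 0 <= yyn && yyn <= YYLAST && yycheck[yyn] == state
  next_state = yytable[yyn]
else
  next_state = yydefgoto[lhs_id]
end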
```

### `yytable`

`yytable` specifies the actual action to take in the current state.

A positive number means shift and specifies the next state.
For example, `yytable[yyn] == 1` means shift and go to State 1.

`YYTABLE_NINF` (Negative INFinity) means syntax error.
For example, `yytable[yyn] == YYTABLE_NINF` means syntax error.

Any other negative number, or zero, means reduce with the rule whose number is the negation of the value.
For example, `yytable[yyn] == -1` means reduce with Rule 1.
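A minimal Ruby sketch of decoding one entry of the action part of `yytable` according to the three rules above. `decode_yytable_value` is a hypothetical helper, and the value of `YYTABLE_NINF` is an assumption for illustration only; a generated parser defines its own value.

```ruby
# Hypothetical value for illustration; the real one is emitted by the generator.
YYTABLE_NINF = -100

# Decode one yytable entry (action part) into an action.
def decode_yytable_value(value)
  if value == YYTABLE_NINF
    [:error]          # syntax error
  elsif value > 0
    [:shift, value]   # shift, then go to State `value`
  else
    [:reduce, -value] # reduce with Rule `-value`
  end
end

decode_yytable_value(1)  # => [:shift, 1]
decode_yytable_value(-1) # => [:reduce, 1]
```

The `YYTABLE_NINF` comparison comes first because it is itself a negative number and would otherwise be decoded as a reduction.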

### `yycheck`

`yycheck` validates accesses to `yytable`.

Each row of the action table and the GOTO table is placed into a single array, `yytable`.
Consider an action table with only two states.
If the second row is shifted one cell to the right, the two rows can be merged into one array without conflict: no two non-empty cells land on the same position.

```ruby
[
  ['a', 'b', nil, nil, 'e'], # State 0
  [nil, 'B', 'C', nil, 'E'], # State 1
]

# => Shift the second array one cell to the right

[
  ['a', 'b', nil, nil, 'e', nil], # State 0
  [nil, nil, 'B', 'C', nil, 'E'], # State 1
]

# => Merge them into a single array

yytable = [
  'a', 'b', 'B', 'C', 'e', 'E'
]
```
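The merge step can be sketched in Ruby as a simple first-fit search: each row is placed at the smallest offset where none of its non-empty cells collides with an occupied cell, and no two rows share an offset. This is an illustration under that assumption, not the packing algorithm Lrama or Bison actually uses. `pack_rows` is a hypothetical helper; alongside each value it records the cell's index in the original row, which is the information `yycheck` holds.

```ruby
# Pack sparse rows (nil = empty cell) into yytable/yycheck with a first-fit search.
# Returns [yytable, yycheck, offsets]; an all-empty row gets a nil offset.
def pack_rows(rows)
  yytable = []
  yycheck = []
  offsets = []

  rows.each do |row|
    cols = row.each_index.reject { |i| row[i].nil? }
    if cols.empty?
      offsets << nil # a real parser would use YYPACT_NINF here
      next
    end

    # Smallest offset that keeps every index non-negative
    offset = -cols.min
    # Skip offsets already used by another row or colliding with occupied cells
    offset += 1 while offsets.include?(offset) ||
                      cols.any? { |c| !yycheck[offset + c].nil? }

    cols.each do |c|
      yytable[offset + c] = row[c]
      yycheck[offset + c] = c # index of the cell in the original row
    end
    offsets << offset
  end

  [yytable, yycheck, offsets]
end

rows = [
  ['a', 'b', nil, nil, 'e'], # State 0
  [nil, 'B', 'C', nil, 'E'], # State 1
]
yytable, yycheck, yypact = pack_rows(rows)
yytable # => ["a", "b", "B", "C", "e", "E"]
yycheck # => [0, 1, 1, 2, 4, 4]
yypact  # => [0, 1]
```

Forbidding shared offsets is the conservative choice: two different rows at the same offset could not be told apart by `yycheck`. Gaps between rows stay `nil` in this sketch.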

`yypact` holds the offset of each state's row in `yytable`.

```ruby
yypact = [0, 1]
```

We can access the value of `state1[2]` by consulting `yypact`.

```ruby
yytable[yypact[1] + 2]
# => yytable[1 + 2]
# => 'C'
```

However, this approach breaks down when accessing an empty cell such as `state1[3]`,
because the access actually lands on `state0[4]`.

```ruby
yytable[yypact[1] + 3]
# => yytable[1 + 3]
# => 'e'
```

This is why `yycheck` is needed.
`yycheck` stores the valid indexes of the original table.
In the current example:

* 0, 1 and 4 are valid indexes of State 0
* 1, 2 and 4 are valid indexes of State 1

`yycheck` stores these indexes at the same positions as the corresponding values in `yytable`.

```ruby
# yytable
[
  ['a', 'b', nil, nil, 'e'], # State 0
  [nil, 'B', 'C', nil, 'E'], # State 1
]

yytable = [
  'a', 'b', 'B', 'C', 'e', 'E'
]

# yycheck
[
  [0, 1, nil, nil, 4], # State 0
  [nil, 1, 2, nil, 4], # State 1
]

yycheck = [
  0, 1, 1, 2, 4, 4
]
```

We can validate accesses to `yytable` by consulting `yycheck`.
Because `yycheck` stores the valid indexes of the original rows, validation compares `yycheck[index_for_yytable]` with `index_for_the_state`.
The access is valid only if both values are the same.

```ruby
# Validate an access to state1[2]
yycheck[yypact[1] + 2] == 2
# => yycheck[1 + 2] == 2
# => 2 == 2
# => true (valid)

# Validate an access to state1[3]
yycheck[yypact[1] + 3] == 3
# => yycheck[1 + 3] == 3
# => 4 == 3
# => false (invalid)
```
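Putting `yypact`, `yycheck` and `yytable` together, a checked lookup can be sketched as below. `checked_lookup` is a hypothetical helper. Besides the `yycheck` comparison it also rejects positions outside the array, since an offset plus an index can point past the end of `yycheck`. It returns `nil` for an empty cell, where a real parser would fall back to the default action.

```ruby
# Return row[index] for the row stored at `offset`,
# or nil if that cell was empty in the original table.
def checked_lookup(yytable, yycheck, offset, index)
  i = offset + index
  return nil if i < 0 || i >= yycheck.size # out of range
  return nil if yycheck[i] != index        # cell belongs to another row
  yytable[i]
end

yytable = ['a', 'b', 'B', 'C', 'e', 'E']
yycheck = [0, 1, 1, 2, 4, 4]
yypact  = [0, 1]

checked_lookup(yytable, yycheck, yypact[1], 2) # => "C"
checked_lookup(yytable, yycheck, yypact[1], 3) # => nil
checked_lookup(yytable, yycheck, yypact[1], 5) # => nil (out of range)
```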

### `yydefact` & `yydefgoto`

`yydefact` stores default actions for each state.

```ruby
rule_id = yydefact[state]
# => 0 means syntax error; any other number means reduce with Rule rule_id
```

`yydefgoto` stores default GOTOs for each nonterminal.

```ruby
next_state = yydefgoto[lhs_id]
```

## Example

### `yytable`

Take a look at the compressed tables of "compressed_state_table.y".
See "compressed_state_table.output" for detailed information on its symbols and states.

The original action table and GOTO table look like this:

```ruby
# Action table is a matrix of states (rows) * terminals (columns)
[
# [ EOF, error, undef, LF, NUM, '+', '*', '(', ')'] (default reduce)
[ , , , , s1, , , s2, ], # State 0 (r1)
[ , , , , , , , , ], # State 1 (r3)
[ , , , , s1, , , s2, ], # State 2 ()
[ s6, , , , , , , , ], # State 3 ()
[ , , , s7, , s8, s9, , ], # State 4 ()
[ , , , , , s8, s9, , s10], # State 5 ()
[ , , , , , , , , ], # State 6 (accept)
[ , , , , , , , , ], # State 7 (r2)
[ , , , , s1, , , s2, ], # State 8 ()
[ , , , , s1, , , s2, ], # State 9 ()
[ , , , , , , , , ], # State 10 (r6)
[ , , , , , , s9, , ], # State 11 (r4)
[ , , , , , , , , ], # State 12 (r5)
]

# GOTO table is a matrix of nonterminals (rows) * states (columns)
[
# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] (default goto)
[ , , , , , , , , , , , , ], # $accept (g0)
[ g3, , , , , , , , , , , , ], # program (g3)
[ g4, , g5, , , , , , g11, g12, , , ], # expr (g4)
]

# => Remove default goto

[
# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] (default goto)
[ , , , , , , , , , , , , ], # $accept (g0)
[ , , , , , , , , , , , , ], # program (g3)
[ , , g5, , , , , , g11, g12, , , ], # expr (g4)
]
```

These are compressed into `yytable` as shown below.
If a row's offset equals `YYPACT_NINF`, the row has only a default value, so it can be ignored (it is commented out in this example).

```ruby
[
# Action table
# (offset, YYPACT_NINF = -4)
[ , , , , s1, , , s2, ], # State 0 ( 6)
# [ , , , , , , , , ], # State 1 (-4)
[ , , , , s1, , , s2, ], # State 2 ( 6)
[ s6, , , , , , , , ], # State 3 ( 1)
[ , , , s7, , s8, s9, , ], # State 4 (-1)
[ , , , , , s8, s9, , s10], # State 5 ( 3)
# [ , , , , , , , , ], # State 6 (-4)
# [ , , , , , , , , ], # State 7 (-4)
[ , , , , s1, , , s2, ], # State 8 ( 6)
[ , , , , s1, , , s2, ], # State 9 ( 6)
# [ , , , , , , , , ], # State 10 (-4)
[ , , , , , , s9, , ], # State 11 (-3)
# [ , , , , , , , , ], # State 12 (-4)

# GOTO table
# [ , , , , , , , , , , , , ], # $accept (-4)
# [ , , , , , , , , , , , , ], # program (-4)
[ , , g5, , , , , , g11, g12, , , ], # expr (-2)
]

# => compressed into single array
[ , , , g5, s6, s7, s9, s8, s9, g11, g12, s8, s9, s1, s10, , s2, ]

# => Cut blank cells at the head and tail, remove the 'g' and 's' prefixes, and fill the remaining blanks with 0
# This is `yytable`
[ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
```

### `yycheck`

```ruby
[
# Action table valid indexes
# (offset, YYPACT_NINF = -4)
[ , , , , 4, , , 7, ], # State 0 ( 6)
# [ , , , , , , , , ], # State 1 (-4)
[ , , , , 4, , , 7, ], # State 2 ( 6)
[ 0, , , , , , , , ], # State 3 ( 1)
[ , , , 3, , 5, 6, , ], # State 4 (-1)
[ , , , , , 5, 6, , 8], # State 5 ( 3)
# [ , , , , , , , , ], # State 6 (-4)
# [ , , , , , , , , ], # State 7 (-4)
[ , , , , 4, , , 7, ], # State 8 ( 6)
[ , , , , 4, , , 7, ], # State 9 ( 6)
# [ , , , , , , , , ], # State 10 (-4)
[ , , , , , , 6, , ], # State 11 (-3)
# [ , , , , , , , , ], # State 12 (-4)

# GOTO table valid indexes
# [ , , , , , , , , , , , , ], # $accept (-4)
# [ , , , , , , , , , , , , ], # program (-4)
[ , , 2, , , , , , 8, 9, , , ], # expr (-2)
]

# => compressed into single array
[ , , , 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, , 7, ]

# => Cut blank cells at the head and tail, and fill the remaining blanks with -1: no index can be -1, so the comparison always fails
# This is `yycheck`
[ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
```
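As a cross-check, the sketch below decodes the example tables in Ruby. The arrays are copied from this section: `yypact` and `yypgoto` are read off the offset comments, `yydefgoto` off the "(default goto)" annotations of the GOTO table, and `YYLAST` is the last index of `yytable`. The helper names `explicit_action` and `goto_state` are hypothetical. Token numbers follow the column order of the action table (EOF = 0, ..., `')'` = 8), and nonterminals are numbered `$accept` = 0, `program` = 1, `expr` = 2.

```ruby
YYPACT_NINF = -4
YYLAST = 13

YYPACT    = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4].freeze # State 0..12
YYPGOTO   = [-4, -4, -2].freeze                                    # $accept, program, expr
YYDEFGOTO = [0, 3, 4].freeze                                       # $accept, program, expr
YYTABLE   = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2].freeze
YYCHECK   = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7].freeze

# yytable value for (state, token), or nil when the default action
# (yydefact) applies instead.
def explicit_action(state, token)
  offset = YYPACT[state]
  return nil if offset == YYPACT_NINF

  i = offset + token
  return nil if i < 0 || YYLAST < i || YYCHECK[i] != token

  YYTABLE[i]
end

# Next state after reducing to nonterminal `nt` when `state` is on top of the stack.
def goto_state(nt, state)
  i = YYPGOTO[nt] + state
  return YYDEFGOTO[nt] if i < 0 || YYLAST < i || YYCHECK[i] != state

  YYTABLE[i]
end

explicit_action(0, 4)  # => 1   (State 0, NUM: shift to State 1)
explicit_action(11, 6) # => 9   (State 11, '*': shift to State 9)
explicit_action(11, 5) # => nil (State 11, '+': default action r4)
goto_state(2, 8)       # => 11  (expr after State 8)
goto_state(2, 0)       # => 4   (default goto for expr)
```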

### `yypact` & `yypgoto`

```ruby
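# Offsets from the comments in the yytable example above (YYPACT_NINF = -4)
yypact = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4] # State 0..12
yypgoto = [-4, -4, -2] # $accept, program, expr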
```


### `yydefact` & `yydefgoto`

```ruby

```


## How to use each table

The whole process of ...

* 1. If `yypact[yystate]` equals `YYPACT_NINF`, execute the default action by consulting the `yydefact` table.
* 2. Otherwise, determine what to do next: shift, reduce, or error.
* 2-1. Read the next token if `yychar` is empty.
* 2-2. Check the current token (`yychar`).
* 2-2-1. If `yychar` is the end-of-file symbol or less than it, it means end-of-file. Update `yychar` to `output.eof_symbol.id.s_value` and `yytoken` to `output.eof_symbol.enum_name`.
* 2-2-2. If `yychar` is the error symbol, it means the lexer reported an error. Update `yychar` to `output.undef_symbol.id.s_value` (to avoid an infinite loop in the error handling process) and `yytoken` to `output.error_symbol.enum_name`.
* 2-2-3. Otherwise, update `yytoken`. Because `yychar` has type `enum yytokentype` and `yytoken` has type `enum yysymbol_kind_t`, `yychar` must be converted to `enum yysymbol_kind_t` with the `yytranslate` table before it is assigned to the `yytoken` local variable.
* 2-3. Consult the `yycheck` table to determine the next action. The index of `yycheck` is `yypact[yystate] + yytoken`. If the index is out of range (less than 0 or greater than `YYLAST`), execute the default action.
* 2-3-1. If the value of `yycheck` equals `yytoken`, consult `yytable`.
* 2-3-2. Otherwise, execute the default action.
* 2-4. Consult `yytable` to determine the next action. The index of `yytable` is `yypact[yystate] + yytoken`, the same as for `yycheck`. `YYTABLE_NINF` means syntax error, a positive value means shift to that state, and any other value means reduce with the rule whose number is its negation.


```ruby
next_action = nil
yyn = yypact[yystate]

# Check if the next action is the default action or not
if yyn == YYPACT_NINF
  next_action = :yydefault
  return
end

if yychar == YYEMPTY
  # Read a token
  yychar = yylex()
end

if yychar <= TOKEN_END_OF_FILE
  # End of file
  yychar = eof_symbol.id.s_value
  yytoken = eof_symbol.enum_name
elsif yychar == TOKEN_ERROR
  # The lexer returned the YYerror token
  yychar = undef_symbol.id.s_value
  yytoken = error_symbol.enum_name
  next_action = :error
  return
else
  yytoken = yytranslate[yychar]
end

# Add the token offset to get the index into yycheck and yytable
yyn += yytoken
if yyn < 0 || YYLAST < yyn
  # Out of range of yycheck means the default action
  next_action = :yydefault
  return
end
if yycheck[yyn] != yytoken
  # The cell belongs to another row; no need to consult yytable
  next_action = :yydefault
  return
end

yyn = yytable[yyn]
if yyn == YYTABLE_NINF
  # Syntax error
  next_action = :error
  return
elsif yyn <= 0
  # Reduce with Rule -yyn
  yyn = -yyn
  next_action = :yyreduce
  return
end

# Execute shift: yyn is the next state
yystate = yyn
yy_state_stack.push(yystate)
yy_semantic_value_stack.push(yylval)
yy_location_stack.push(yylloc)

# Reset current token
yychar = YYEMPTY

next_action = :yynewstate
return
```


## `yytranslate`
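A minimal sketch of how a `yytranslate` table can perform the conversion in step 2-2-3, using the symbols of the example grammar. It mirrors Bison's `YYTRANSLATE` macro, which maps codes above `YYMAXUTOK` to the undefined-token symbol. The token codes are an assumption based on Bison's conventions (character literals use their character code, `YYerror` = 256, `YYUNDEF` = 257, named tokens from 258), so 258 for `LF` and 259 for `NUM` are not taken from this document; the symbol numbers are the column order of the example action table. `TOKEN_TO_SYMBOL` and `translate` are hypothetical names.

```ruby
# Largest token code in this example (assumed: NUM = 259)
YYMAXUTOK = 259
# Symbol number of the "undef" column in the example action table
YYSYMBOL_YYUNDEF = 2

# Token code (yychar) => symbol number (yytoken, the action table column)
TOKEN_TO_SYMBOL = {
  0       => 0, # end of file
  256     => 1, # error (assumed code)
  257     => 2, # undef (assumed code)
  258     => 3, # LF    (assumed code)
  259     => 4, # NUM   (assumed code)
  '+'.ord => 5,
  '*'.ord => 6,
  '('.ord => 7,
  ')'.ord => 8,
}.freeze

# Dense table indexed by token code; codes without an entry map to undef.
YYTRANSLATE = Array.new(YYMAXUTOK + 1, YYSYMBOL_YYUNDEF)
TOKEN_TO_SYMBOL.each { |code, symbol| YYTRANSLATE[code] = symbol }
YYTRANSLATE.freeze

def translate(yychar)
  if 0 <= yychar && yychar <= YYMAXUTOK
    YYTRANSLATE[yychar]
  else
    YYSYMBOL_YYUNDEF
  end
end

translate('+'.ord) # => 5
translate('a'.ord) # => 2 (unknown character maps to undef)
```

A dense array trades some memory for a constant-time lookup on every token the lexer returns.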
