Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor generation of proof hint "rule applied" event #862

Closed
wants to merge 6 commits into from

Conversation

Baltoli
Copy link
Contributor

@Baltoli Baltoli commented Oct 17, 2023

This PR packages up @gtrepta's initial work towards part (1) of #860; a refactoring of the emission of rewrite / rule match events in the proof trace.

At a high level, the code changes here cause the interpreter to emit "rule matched" events for all rules, not just top-level rewrite rules; in the trace below we can see function rules emitting these events. Currently, the term produced by a function event is not yet emitted into the trace, but it could be recovered by evaluating the function on its bound arguments, which are in the trace.

To evaluate the current state of the trace format, consider the following definition:

module TEST
  imports INT

  syntax Int ::= "inc" Int [function, total]
  rule inc I => I +Int 1

  syntax Foo ::= foo(Int) | bar(Int)
  rule foo(I) => bar(inc I)
endmodule

When run with --proof-output, the sequence of events that is generated in the trace is as follows (with lengthy KORE terms omitted here for my own sanity when extracting this information):

function: _|->_
function: .Map
function: _Map_
function: initGeneratedTopCell
rewrite :
  ordinal: 162 (rule initGeneratedTopCell(Init) => ...)
  VarInit -> ...
function: initKCell
rewrite:
  ordinal: 163 (rule initKCell(Init) => ...)
  VarInit -> ...
function: Map:lookup
function: project:KItem
rewrite:
  ordinal: 207 (rule project:KItem(K) => ...)
  VarK -> ...
function: initGeneratedCounterCell
rewrite:
  ordinal: 161 (rule initGeneratedCounterCell() => ...)
<initial configuration>
rewrite:
  ordinal: 126 (rule foo(I) => ...)
  _DotVar0 -> ...
  _DotVar1 -> ...
  I -> ...
function: inc__TEST_Int_Int
rewrite:
  ordinal: 160 (rule inc__TEST_Int_Int(I) => ...)
  I -> ...
function: _+Int_
<next configuration>

The things lacking from this trace are currently:

Additionally, the parser will need to be updated to reflect this new format as previously.

I would also like to clean up the code a little now that I better understand the trace format and the events being emitted.

@Baltoli Baltoli marked this pull request as ready for review November 2, 2023 14:06
Comment on lines +668 to 673
auto Call = llvm::CallInst::Create(applyRule, args, "", d->CurrentBlock);

// How do you emit the resulting config from here?

setDebugLoc(Call);
Call->setCallingConv(llvm::CallingConv::Tail);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be implemented. The call instruction is meant to be tail recursive, I tried changing that but from what I remember it wasn't that simple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this is implemented, the call that's left in CreateTerm to emit the config on rewrites can be removed, right? i.e. that it's there and not here is why we have configs and not function returns in the trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, yes.

@Baltoli Baltoli closed this Nov 9, 2023
rv-jenkins pushed a commit that referenced this pull request Nov 9, 2023
As discussed previously, this PR is an initial cleanup and refactoring
effort for the existing proof-hint generation code implemented for
rewrite, function and hook events. The changes in this PR don't add any
new features to the traces; they just reorganise the existing code.

Most of the changes made are documentation and identifying duplicated
code across different call-sites that can be merged together.

I plan to port the reorganisation in #862 over to the infrastructure in
this PR once it is merged.

The trace format is not tested in the backend currently (this is future
work), but I have verified that a proof trace (that uses all event
types) generated using this branch is byte-for-byte identical to one
generated from the current master branch.
rv-jenkins pushed a commit that referenced this pull request Nov 13, 2023
This PR is a reworking of the changes initially made by @gtrepta in
#862, to include the cleaned-up proof infrastructure in #876.

The changes here are pretty minimal; we need to split the "rewrite"
event into a pre and post part so that the arity / ordinal and resulting
configuration can be emitted separately. Other than that, it's just a
question of reusing some of the boilerplate in #862 that assembles the
substitution information in the event.

~~There is one major issue with the change here, which @gtrepta also
identified in his initial version of this change. We can only emit
resulting configurations for top-level rewrites (that is, we cannot
print the resulting _term_ from calling a function). Doing so breaks TCO
for the generated interpreter badly enough that some backend tests
crash; I have not dug into this in detail as I'm reasonably sure that's
the reason things break.~~
* @nishantjr confirms that we can safely omit the result of function
calls from the trace.

## Outstanding Issues

I am working my way through the example below to ensure that the format
being generated is precisely what we expect to see from the
documentation. The issues I have identified are as follows:
* [x] Duplicated KORE comments below are caused by function events not
being bracketed as rewrites and hooks are; we need to pull the header
out to the front, allocate args, then do the existing event.
* [x] We need to handle binary KORE delimitation more consistently; at
the moment it's wavy whether we emit header / footer or not. Should
probably standardise on using the 1.2.0 self-limiting format.

## Worked Example

Similarly to #862, I have generated a trace for the program `foo(0)` in
the following definition:
```k
module TEST
  imports INT

  syntax Int ::= "inc" Int [function, total]
  rule inc I => I +Int 1

  syntax Foo ::= foo(Int) | bar(Int)
  rule foo(I) => bar(inc I)
endmodule
```

The trace looks like:
```
hook: MAP.element
  function: Lbl'UndsPipe'-'-GT-Unds'{} (0)
    arg: kore[60]
    arg: kore[109]
hook result: kore[207]
hook: MAP.unit
  function: Lbl'Stop'Map{} (0)
hook result: kore[51]
hook: MAP.concat
  function: Lbl'Unds'Map'Unds'{} (0)
    arg: kore[51]
    arg: kore[207]
hook result: kore[207]
function: LblinitGeneratedTopCell{} (0)
  arg: kore[207]
rule: 162 1
  VarInit = kore[207]
function: LblinitKCell{} (0:0)
  arg: kore[207]
rule: 163 1
  VarInit = kore[207]
function: Lblproject'Coln'KItem{} (0:0:0)
  hook: MAP.lookup
    function: LblMap'Coln'lookup{} (0:0:0:0:0)
      arg: kore[207]
      arg: kore[60]
  hook result: kore[109]
  arg: kore[129]
rule: 207 1
  VarK = kore[109]
function: LblinitGeneratedCounterCell{} (0:1)
rule: 161 0
config: kore[237]
rule: 126 3
  Var'Unds'DotVar0 = kore[61]
  VarI = kore[50]
  Var'Unds'DotVar1 = kore[10]
function: Lblinc'UndsUnds'TEST'Unds'Int'Unds'Int{} (0:0:0:0:0:0)
  arg: kore[50]
rule: 160 1
  VarI = kore[50]
hook: INT.add
  function: Lbl'UndsPlus'Int'Unds'{} (0)
    arg: kore[50]
    arg: kore[50]
hook result: kore[50]
config: kore[237]
```

And was generated with the following (really horrible, quick and dirty)
Python recogniser:
<details><summary>Recogniser</summary>

```python
#!/usr/bin/env python3

import sys
import struct

def load(fn):
    with open(fn, 'rb') as f:
        return f.read()

def word(byte):
    return byte * 8

class Parser:
    def __init__(self, data):
        self.input = data
        self.depth = 0

    def print(self, *args):
        print(f'{"  " * self.depth}', end='')
        print(*args)

    def peek(self, cst):
        return self.input[:len(cst)] == cst

    def eof(self):
        return len(self.input) == 0

    def skip(self, n):
        self.input = self.input[n:]

    def read_constant(self, cst):
        if self.input[:len(cst)] != cst:
            print(self.input[:40])
            assert False
        self.input = self.input[len(cst):]

    def read_word(self, byte):
        assert len(byte) == 1
        self.read_constant(byte * 8)

    def read_until(self, constant):
        index = self.input.find(constant)
        ret = self.input[:index]
        self.input = self.input[index:]
        return ret
    
    def read_string(self):
        data = self.read_until(b'\x00')
        self.read_constant(b'\x00')
        return str(data, 'ascii')

    def read_uint64(self):
        index = 8
        raw = self.input[:index]
        self.input = self.input[index:]
        little_endian_long_long = '<Q'
        return struct.unpack(little_endian_long_long, raw)[0]

    def read_arg(self):
        self.depth += 1
        if self.peek(b'\x7F'):
            l = self.read_kore()
            self.print(f'arg: kore[{l}]')
            self.depth -= 1
            return True
        elif self.peek(word(b'\xDD')):
            self.read_function()
            self.depth -= 1
            return True
        elif self.peek(word(b'\xAA')):
            self.read_hook()
            self.depth -= 1
            return True
        elif self.peek(word(b'\x11')) or self.peek(word(b'\xBB')):
            self.depth -= 1
            return False
        else:
            self.read_rule()
            self.depth -= 1
            return True

    def read_hook(self):
        self.read_word(b'\xAA')
        name = self.read_string()
        self.print(f'hook: {name}')
        while True:
            if not self.read_arg():
                break
        self.read_word(b'\xBB')
        result = self.read_kore()
        self.print(f'hook result: kore[{result}]')

    def read_function(self):
        self.read_word(b'\xDD')
        label = self.read_string()
        loc = self.read_string()
        self.print(f'function: {label} ({loc})')
        while True:
            if not self.read_arg():
                break
        self.read_word(b'\x11')

    def read_kore(self):
        self.read_constant(b'\x7FKORE')
        self.skip(6)
        pattern_len = self.read_uint64()
        self.skip(pattern_len)
        return pattern_len

    def read_rule(self):
        ordinal = self.read_uint64()
        arity = self.read_uint64()
        self.print(f'rule: {ordinal} {arity}')
        for i in range(arity):
            self.read_variable()

    def read_variable(self):
        name = self.read_string()
        kore_len = self.read_kore()
        self.print(f'  {name} = kore[{kore_len}]')
        self.read_word(b'\xCC')

    def read_config(self):
        self.read_word(b'\xFF')
        l = self.read_kore()
        self.print(f'config: kore[{l}]')
        self.read_word(b'\xCC')

    def read_trace(self):
        while not self.eof():
            if self.peek(word(b'\xAA')):
                self.read_hook()
            elif self.peek(word(b'\xDD')):
                self.read_function()
            elif self.peek(word(b'\xFF')):
                self.read_config()
            else:
                self.read_rule()
            
if __name__ == '__main__':
    data = load(sys.argv[1])
    parser = Parser(data)
    parser.read_trace()
```
</details>

I have verified by hand that this recogniser accepts all the proof
traces from the proof generation repo's master branch, if they are
regenerated with the LLVM backend pointed at this PR (`kup install k
--override llvm-backend hint_emitPlace_2`).
@Baltoli Baltoli deleted the hint_emitPlace branch December 1, 2023 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants