Generating the ast #8

Open
CAD97 opened this issue Nov 10, 2018 · 7 comments
Labels
enhancement New feature or request fun An interesting potential

Comments

@CAD97
Collaborator

CAD97 commented Nov 10, 2018

This is a fun one.

Even with #[derive(FromPest)], the initial creation of the ast structures is fairly rote.

It would be really cool if we could take a .pest file and create a working (though not necessarily ideal) module of ast structures.

Basic shapes
  • rule = { a ~ b ~ c }

    becomes

    #[derive(Debug, FromPest)]
    #[pest_ast(rule(Rule::rule))]
    pub struct rule {
      pub a: a,
      pub b: b,
      pub c: c,
    }

  • rule = { a | b | c }

    becomes

    #[derive(Debug, FromPest)]
    #[pest_ast(rule(Rule::rule))]
    pub enum rule {
      a(a),
      b(b),
      c(c),
    }

  • a*

    becomes

    Vec<a>

  • a+

    becomes

    Vec<a>

  • a?

    becomes

    Option<a>
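
To make the mapping above concrete, here is a minimal sketch of the text-generation step. It assumes the grammar has already been parsed into a small expression tree; the `Expr` type and the `field_type`, `field_name`, and `generate` functions are hypothetical names invented for this illustration and are not part of pest or from-pest. Nested groups such as `a ~ (b | c)` are deliberately left unsupported.

```rust
/// A tiny, hypothetical model of a grammar expression (not pest's own types).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Reference to another rule, e.g. `a`.
    Ident(String),
    /// `a ~ b ~ c`
    Seq(Vec<Expr>),
    /// `a | b | c`
    Choice(Vec<Expr>),
    /// `a*` or `a+` (both map to `Vec`)
    Repeat(Box<Expr>),
    /// `a?`
    Opt(Box<Expr>),
}

/// Type text for an expression used as a field or variant payload.
/// Returns `None` for nested sequences/choices, which this sketch skips.
pub fn field_type(expr: &Expr) -> Option<String> {
    match expr {
        Expr::Ident(name) => Some(name.clone()),
        Expr::Repeat(inner) => field_type(inner).map(|t| format!("Vec<{}>", t)),
        Expr::Opt(inner) => field_type(inner).map(|t| format!("Option<{}>", t)),
        Expr::Seq(_) | Expr::Choice(_) => None,
    }
}

/// Field or variant name: the innermost referenced rule.
fn field_name(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::Ident(name) => Some(name.as_str()),
        Expr::Repeat(inner) | Expr::Opt(inner) => field_name(inner),
        Expr::Seq(_) | Expr::Choice(_) => None,
    }
}

/// Emit the "obvious" struct or enum for one rule as source text.
pub fn generate(rule: &str, body: &Expr) -> Option<String> {
    let mut out = String::from("#[derive(Debug, FromPest)]\n");
    out.push_str(&format!("#[pest_ast(rule(Rule::{}))]\n", rule));
    match body {
        Expr::Choice(alternatives) => {
            out.push_str(&format!("pub enum {} {{\n", rule));
            for alt in alternatives {
                out.push_str(&format!("  {}({}),\n", field_name(alt)?, field_type(alt)?));
            }
        }
        Expr::Seq(items) => {
            out.push_str(&format!("pub struct {} {{\n", rule));
            for item in items {
                out.push_str(&format!("  pub {}: {},\n", field_name(item)?, field_type(item)?));
            }
        }
        // A lone `a`, `a*`, or `a?` is treated as a one-field struct.
        single => {
            out.push_str(&format!("pub struct {} {{\n", rule));
            out.push_str(&format!("  pub {}: {},\n", field_name(single)?, field_type(single)?));
        }
    }
    out.push_str("}\n");
    Some(out)
}
```

Note that a sequence such as `a ~ a` would produce two fields with the same name; a real tool would need a naming scheme for that case.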

Please, ping me on Gitter or on Discord if you're interested in attacking this. It'll be fun, but somewhat involved.

@CAD97 CAD97 added enhancement New feature or request fun An interesting potential labels Nov 10, 2018
@CAD97
Collaborator Author

CAD97 commented Nov 10, 2018

I think it's best to structure this in a way that it's expected that the user take the output and commit it to their repository to be maintained by hand in the future. I don't think the goal should ever be to make a good, semantic AST, but rather to make the "obvious" translation and provide a quick starting point for projects that have designed their grammar already.

@loewenheim

What are your thoughts on translating rule? to an Option and rule* to a Vec?

@CAD97
Collaborator Author

CAD97 commented Nov 17, 2018

This should already work manually, and is definitely what I'd consider the "canonical" translation that such a tool should generate. They're powered solely by provided implementations in from-pest:
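
As a rough illustration of how "provided implementations" can make `Option` and `Vec` work for any node type, here is a std-only sketch. The `FromTokens` trait, the token stream of `char`s, and the `A` node are all hypothetical stand-ins invented for this example; they are not from-pest's actual trait or signatures.

```rust
use std::iter::Peekable;
use std::slice::Iter;

/// Hypothetical conversion trait; NOT from-pest's real API.
pub trait FromTokens: Sized {
    /// Try to build `Self` from the front of the stream.
    /// Tokens are consumed only when the conversion succeeds.
    fn from_tokens(tokens: &mut Peekable<Iter<'_, char>>) -> Option<Self>;
}

/// `rule?`: zero or one occurrence, so the conversion itself never fails.
impl<T: FromTokens> FromTokens for Option<T> {
    fn from_tokens(tokens: &mut Peekable<Iter<'_, char>>) -> Option<Self> {
        Some(T::from_tokens(tokens))
    }
}

/// `rule*`: collect occurrences until the inner conversion stops matching.
/// (If `T` itself never fails, e.g. `Vec<Option<X>>`, this loops forever.)
impl<T: FromTokens> FromTokens for Vec<T> {
    fn from_tokens(tokens: &mut Peekable<Iter<'_, char>>) -> Option<Self> {
        let mut items = Vec::new();
        while let Some(item) = T::from_tokens(tokens) {
            items.push(item);
        }
        Some(items)
    }
}

/// Example leaf node: matches a single 'a' token.
#[derive(Debug, PartialEq)]
pub struct A;

impl FromTokens for A {
    fn from_tokens(tokens: &mut Peekable<Iter<'_, char>>) -> Option<Self> {
        if tokens.peek().map_or(false, |c| **c == 'a') {
            tokens.next();
            Some(A)
        } else {
            None
        }
    }
}
```

With blanket implementations like these, a generated field of type `Vec<A>` or `Option<A>` needs no extra code beyond the implementation for `A` itself.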

@loewenheim

I have more questions :)

  • What is the most basic structure that a rule could desugar to? My assumption is a wrapper around a single char, like this:

    one_char = { ANY }

    becomes

    struct one_char {
        content: char,
    }

    Am I right about this?

  • Based on the previous point, a rule

    number = { ASCII_DIGIT* }

    would turn into a struct

    struct number {
        digit: Vec<char>,
    }

    I probably intended number to be parsed as an actual number, say, a u16. Would I first parse it to the AST and then convert it to the actual type I want later on?

@CAD97
Collaborator Author

CAD97 commented Nov 28, 2018

Generally, I think that a sensible default for atoms would be to store a pest::Span, or potentially a custom type implementing From<pest::Span> (which could own its data instead of borrowing it). This is one of the questions I'm not sure how to answer currently.

The problem with

// number = @{ ASCII_DIGIT* }
struct number(u16);

is that the rule matches numbers far too large for a u16, so converting it means either panicking or manually plumbing a FatalError through. Alternatively, you could use BigInteger.
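
A minimal sketch of the non-panicking route, using only the standard library: the matched text is converted with `str::parse`, and the overflow case surfaces as an `Err` that the caller has to handle. The `Number` type and the `number_from_text` function are hypothetical names for this example; how the error would be plumbed into pest or from-pest is left open here.

```rust
use std::num::ParseIntError;

/// Hypothetical AST node holding the converted value.
#[derive(Debug, PartialEq)]
pub struct Number(pub u16);

/// Convert the text matched by `number = @{ ASCII_DIGIT* }` into a `Number`.
/// Fails instead of panicking when the digits do not fit in a `u16`,
/// or when the match is empty (which `*` allows).
pub fn number_from_text(text: &str) -> Result<Number, ParseIntError> {
    text.parse::<u16>().map(Number)
}
```

For instance, `number_from_text("42")` succeeds, while `number_from_text("65536")` and `number_from_text("")` both return an error.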

So, this is a lot less "this is the obvious way to handle it" than the other proposals for the generated AST, but a potential sketch:

  • All AST nodes take a lifetime <'pest>.
  • All AST nodes have a member #[pest_ast(outer)] span: pest::Span<'pest>
  • Other members are added based on presence of non-builtin productions in the rule definition.

Here's a potential by-hand translation of a small number of rules:

Grammar:

a = { "a" }
b = { "b" }
c = { "c" }
number = @{ ASCII_DIGIT* }
any = { ANY }
seq = { a ~ b ~ c }
choice = { a | b | c }
compound_seq = { a ~ (b | c) }
compound_choice = { (a ~ b) | (b ~ c) }
assign = { (a|b|c) ~ "=" ~ number }
assigns = { (assign ~ ",")* ~ assign ~ ","? }

AST:
(tuple structs entirely to sidestep the issue of generating member names)
(and abusing fake nested struct syntax)

struct a<'pest>(
  #[pest_ast(outer)] Span<'pest>,
);
struct b<'pest>(
  #[pest_ast(outer)] Span<'pest>,
);
struct c<'pest>(
  #[pest_ast(outer)] Span<'pest>,
);
struct number<'pest>(
  #[pest_ast(outer)] Span<'pest>,
);
struct any<'pest>(
  #[pest_ast(outer)] Span<'pest>,
);
struct seq<'pest>(
  #[pest_ast(outer)] Span<'pest>,
  a<'pest>,
  b<'pest>,
  c<'pest>,
);
enum choice<'pest>{
  struct _1(a<'pest>),
  struct _2(b<'pest>),
  struct _3(c<'pest>),
}
struct compound_seq<'pest>(
  #[pest_ast(outer)] Span<'pest>,
  a<'pest>,
  enum _2 {
    struct _1(b<'pest>),
    struct _2(c<'pest>),
  },
);
enum compound_choice<'pest>{
  struct _1(
    #[pest_ast(outer)] Span<'pest>,
    a<'pest>,
    b<'pest>,
  ),
  struct _2(
    #[pest_ast(outer)] Span<'pest>,
    b<'pest>,
    c<'pest>,
  ),
}
struct assign<'pest>(
  #[pest_ast(outer)] Span<'pest>,
  enum _1 {
    struct _1(a<'pest>),
    struct _2(b<'pest>),
    struct _3(c<'pest>),
  },
  number<'pest>,
);
struct assigns<'pest>(
  #[pest_ast(outer)] Span<'pest>,
  Vec<struct _1(assign<'pest>)>,
  assign<'pest>,
);

There may be some transcription errors, since this was written entirely without putting anything in a compiler. I'm also not entirely sure I got right which nodes should contain a Span; basically, any node that corresponds to an actual rule should. Feel free to ask for further examples to be translated.

The key point is that this is supposed to be neither the only nor the ideal transformation. It's only meant to be the "obviously correct" mechanical translation that users can then iterate on.
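
Since the nested struct/enum syntax in the sketch above is not valid Rust, here is one way the `compound_seq` rule could be written so that it compiles: the anonymous `(b | c)` group is lifted into a named helper enum. This is only a sketch under assumptions. The `Span` type is a std-only stand-in for `pest::Span<'pest>`, the helper name `compound_seq_2` is invented, and the `#[derive(FromPest)]` and `#[pest_ast(...)]` attributes are omitted so the block has no external dependencies.

```rust
/// Stand-in for `pest::Span<'pest>`: the borrowed input plus byte offsets.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span<'pest> {
    pub input: &'pest str,
    pub start: usize,
    pub end: usize,
}

impl<'pest> Span<'pest> {
    /// The slice of the input covered by this span.
    pub fn as_str(&self) -> &'pest str {
        &self.input[self.start..self.end]
    }
}

// Leaf rules: a = { "a" }, b = { "b" }, c = { "c" }.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub struct a<'pest>(pub Span<'pest>);

#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub struct b<'pest>(pub Span<'pest>);

#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub struct c<'pest>(pub Span<'pest>);

/// Named helper for the anonymous `(b | c)` group (hypothetical name).
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub enum compound_seq_2<'pest> {
    _1(b<'pest>),
    _2(c<'pest>),
}

/// compound_seq = { a ~ (b | c) }
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub struct compound_seq<'pest>(
    pub Span<'pest>, // outer span of the whole rule
    pub a<'pest>,
    pub compound_seq_2<'pest>,
);
```

Lifting each anonymous group into its own named type keeps the translation mechanical; the cost is a set of generated names that the user will probably want to rename by hand.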

@dragostis

I think having the AST take a lifetime at all is a flawed approach, since it complicates handling the AST considerably. The first version of pest was designed with an owned Rc<*some input*> shared between Spans and Pairs and was easier to work with. I also don't really have a good alternative. One could have two separate APIs for owned and borrowed inputs, but this would put a pretty big burden on the maintainers of the project.
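
To illustrate the trade-off, here is a small std-only sketch of the owned alternative: a span that shares the input through `Rc<str>` so AST nodes need no lifetime parameter. `OwnedSpan`, `Number`, and `parse_owned` are hypothetical names for this example, and the code does not reproduce how the first version of pest was actually implemented.

```rust
use std::rc::Rc;

/// Hypothetical owned span: shares the input via `Rc` instead of borrowing it.
#[derive(Debug, Clone)]
pub struct OwnedSpan {
    input: Rc<str>,
    start: usize,
    end: usize,
}

impl OwnedSpan {
    /// Returns `None` if the range is out of bounds, reversed,
    /// or does not fall on UTF-8 character boundaries.
    pub fn new(input: Rc<str>, start: usize, end: usize) -> Option<Self> {
        if input.get(start..end).is_some() {
            Some(OwnedSpan { input, start, end })
        } else {
            None
        }
    }

    /// The slice of the shared input covered by this span.
    pub fn as_str(&self) -> &str {
        &self.input[self.start..self.end]
    }
}

/// An AST node without a lifetime parameter.
#[derive(Debug, Clone)]
pub struct Number {
    pub span: OwnedSpan,
}

/// Copies `text` into shared storage once and returns a node covering all of it.
pub fn parse_owned(text: &str) -> Number {
    let input: Rc<str> = Rc::from(text);
    let end = input.len();
    let span = OwnedSpan::new(input, 0, end).expect("the full range is always valid");
    Number { span }
}
```

The node can outlive the original `&str`, at the cost of one copy of the input and a reference-count update for every span that is cloned.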

@killercup

Hey folks, I started https://github.com/killercup/pest-ast-generator for fun a few days ago and just now saw this issue. The approach I took is very simple: the goal was to get rid of a bunch of structs I needed to write, not to support every edge case.
