Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto generate ast expression nodes #16285

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
155 changes: 123 additions & 32 deletions crates/ruff_python_ast/ast.toml
Original file line number Diff line number Diff line change
Expand Up @@ -81,38 +81,129 @@ anynode_is_label = "expression"
rustdoc = "/// See also [expr](https://docs.python.org/3/library/ast.html#ast.expr)"

[Expr.nodes]
ExprBoolOp = {}
ExprNamed = {}
ExprBinOp = {}
ExprUnaryOp = {}
ExprLambda = {}
ExprIf = {}
ExprDict = {}
ExprSet = {}
ExprListComp = {}
ExprSetComp = {}
ExprDictComp = {}
ExprGenerator = {}
ExprAwait = {}
ExprYield = {}
ExprYieldFrom = {}
ExprCompare = {}
ExprCall = {}
ExprFString = {}
ExprStringLiteral = {}
ExprBytesLiteral = {}
ExprNumberLiteral = {}
ExprBooleanLiteral = {}
ExprNoneLiteral = {}
ExprEllipsisLiteral = {}
ExprAttribute = {}
ExprSubscript = {}
ExprStarred = {}
ExprName = {}
ExprList = {}
ExprTuple = {}
ExprSlice = {}
ExprIpyEscapeCommand = {}
ExprBoolOp = { fields = [
{ name = "op", type = "BoolOp" },
{ name = "values", type = "Expr", seq = true }
]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the AST definition easy to understand and maintain?

Given this concern, this syntax does start to look a bit unwieldy.

Is it easy to parse in our code generator?

But on the other hand, continuing to use TOML still makes it easier to iterate on the code generator itself.

So given all of this, I'd still lean towards using this TOML syntax. We can always revisit the syntax later if we feel there's something that e.g. ungrammar would give us that the TOML doesn't.

ExprNamed = { fields = [
{ name = "target", type = "Expr" },
{ name = "value", type = "Expr" }
]}
ExprBinOp = { fields = [
{ name = "left", type = "Expr" },
{ name = "op", type = "Operator" },
{ name = "right", type = "Expr" }
]}
ExprUnaryOp = { fields = [
{ name = "op", type = "UnaryOp" },
{ name = "operand", type = "Expr" }
]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a style nit, the following are equivalent TOML (i.e., it shouldn't require changes to the generator) that I think might be a bit easier to read:

[Expr.nodes.ExprUnaryOp]
fields = [
  { name = "op", type = "UnaryOp" },
  { name = "operand", type = "Expr" }
]

or

[[Expr.nodes.ExprUnaryOp.fields]]
name = "op"
type = "UnaryOp"

[[Expr.nodes.ExprUnaryOp.fields]]
name = "operand"
type = "Expr"

I have a slight preference for the last one, since it eliminates the nested maps and lists from the TOML. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

ExprLambda = { fields = [
{ name = "parameters", type = "Box<crate::Parameters>", optional = true },
{ name = "body", type = "Expr" }
]}
ExprIf = { fields = [
{ name = "test", type = "Expr" },
{ name = "body", type = "Expr" },
{ name = "orelse", type = "Expr" }
]}
ExprDict = { fields = [
{ name = "items", type = "DictItem", seq = true },
]}
ExprSet = { fields = [
{ name = "elts", type = "Expr", seq = true }
]}
ExprListComp = { fields = [
{ name = "elt", type = "Expr" },
{ name = "generators", type = "Comprehension", seq = true }
]}
ExprSetComp = { fields = [
{ name = "elt", type = "Expr" },
{ name = "generators", type = "Comprehension", seq = true }
]}
ExprDictComp = { fields = [
{ name = "key", type = "Expr" },
{ name = "value", type = "Expr" },
{ name = "generators", type = "Comprehension", seq = true }
]}
ExprGenerator = { fields = [
{ name = "elt", type = "Expr" },
{ name = "generators", type = "Comprehension", seq = true },
{ name = "parenthesized", type = "bool" }
]}
ExprAwait = { fields = [
{ name = "value", type = "Expr" }
]}
ExprYield = { fields = [
{ name = "value", type = "Expr", optional = true }
]}
ExprYieldFrom = { fields = [
{ name = "value", type = "Expr" }
]}
ExprCompare = { fields = [
{ name = "left", type = "Expr" },
# TODO: We don't want this field to be a vec hence the seq is not enabled.
{ name = "ops", type = "Box<[crate::CmpOp]>"},
{ name = "comparators", type = "Box<[Expr]>"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought only knowing a field is a sequence or not would be enough to insert it in a Vec but I found in ExprCompare one of the types is Box<[crate::CmpOp]>

Doing this as a hard-coded type without a seq is just fine as a special case. We might also check whether they need to be a box-of-slice instead of a vec. The only benefit I can see is that we save one word of storage since we only need pointer+length instead of pointer+length+capacity. And if that's worth doing, why aren't we doing that for all of the other vecs? But we don't have to tackle that in this PR, this is fine as you have it.

]}
ExprCall = { fields = [
{ name = "func", type = "Expr" },
{ name = "arguments", type = "Arguments"}
]}
ExprFString = { fields = [
{ name = "value", type = "FStringValue" }
]}
ExprStringLiteral = { fields = [
{ name = "value", type = "StringLiteralValue" }
]}
ExprBytesLiteral = { fields = [
{ name = "value", type = "BytesLiteralValue" }
]}
ExprNumberLiteral = { fields = [
{ name = "value", type = "Number" }
]}
ExprBooleanLiteral = { fields = [
{ name = "value", type = "bool" }
], derives = ["Default"]}
ExprNoneLiteral = { fields = [], derives = ["Default"]}
ExprEllipsisLiteral = { fields = [], derives = ["Default"]}
ExprAttribute = { fields = [
{ name = "value", type = "Expr" },
{ name = "attr", type = "Identifier" },
{ name = "ctx", type = "ExprContext" }
]}
ExprSubscript = { fields = [
{ name = "value", type = "Expr" },
{ name = "slice", type = "Expr" },
{ name = "ctx", type = "ExprContext" }
]}
ExprStarred = { fields = [
{ name = "value", type = "Expr" },
{ name = "ctx", type = "ExprContext" }
]}
ExprName = { fields = [
{ name = "id", type = "name::Name" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we refer to Name just by it's name and import it in the file or like this:

{ name = "id", type = "name::Name" },

I think it's worth ensuring that we can use just Name here in the TOML, and importing it at the top of the generated Rust file is a perfectly fine way to do that

Copy link
Contributor Author

@Glyphack Glyphack Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined the imports at the top of the generate.py.

One additional improvement is to define the imports in a way we can check if a name is imported and not use crate prefix for it.

For now I just used the global list of types we don't want to prefix with crate(the other comment) instead. Because we only have one import and it might be too soon to generalize.

{ name = "ctx", type = "ExprContext" }
]}
ExprList = { fields = [
{ name = "elts", type = "Expr", seq = true },
{ name = "ctx", type = "ExprContext" }
]}
ExprTuple = { fields = [
{ name = "elts", type = "Expr", seq = true },
{ name = "ctx", type = "ExprContext" },
{ name = "parenthesized", type = "bool" }
]}
ExprSlice = { fields = [
{ name = "lower", type = "Expr", optional = true },
{ name = "upper", type = "Expr", optional = true },
{ name = "step", type = "Expr", optional = true }
]}
ExprIpyEscapeCommand = { fields = [
{ name = "kind", type = "IpyEscapeKind" },
{ name = "value", type = "Box<str>" }

]}

[ExceptHandler]
rustdoc = "/// See also [excepthandler](https://docs.python.org/3/library/ast.html#ast.excepthandler)"
Expand Down
57 changes: 57 additions & 0 deletions crates/ruff_python_ast/generate.py
Original file line number Diff line number Diff line change
Expand Up @@ -90,11 +90,32 @@ class Node:
name: str
variant: str
ty: str
fields: list[Field] | None
derives: list[str] | None

def __init__(self, group: Group, node_name: str, node: dict[str, Any]) -> None:
self.name = node_name
self.variant = node.get("variant", node_name.removeprefix(group.name))
self.ty = f"crate::{node_name}"
self.fields = None
fields = node.get("fields")
if fields is not None:
self.fields = [Field(f) for f in fields]
self.derives = node.get("derives", [])


@dataclass
class Field:
name: str
ty: str
seq: bool
optional: bool

def __init__(self, field: dict[str, Any]) -> None:
self.name = field["name"]
self.ty = field["type"]
self.seq = field.get("seq", False)
self.optional = field.get("optional", False)


# ------------------------------------------------------------------------------
Expand Down Expand Up @@ -547,6 +568,41 @@ def write_nodekind(out: list[str], ast: Ast) -> None:
""")


# ------------------------------------------------------------------------------


def write_node(out: list[str], ast: Ast) -> None:
for group in ast.groups:
for node in group.nodes:
if node.fields is None:
continue
out.append(
"#[derive(Clone, Debug, PartialEq, "
+ ", ".join(node.derives or [])
+ ")]"
)
name = node.name
out.append(f"pub struct {name} {{")
out.append("pub range: ruff_text_size::TextRange,")
for field in node.fields:
field_str = f"pub {field.name}: "
inner = f"crate::{field.ty}"
if field.ty == "bool" or field.ty.startswith("Box"):
inner = field.ty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your suggestion of adding a hard-coded list of types (defined near the top of this file) that don't require a crate:: qualifier

if field.ty == group.name and field.seq is False:
inner = f"Box<{inner}>"

if field.seq:
field_str += f"Vec<{inner}>,"
elif field.optional:
field_str += f"Option<{inner}>,"
else:
field_str += f"{inner},"
out.append(field_str)
out.append("}")
out.append("")


# ------------------------------------------------------------------------------
# Format and write output

Expand All @@ -558,6 +614,7 @@ def generate(ast: Ast) -> list[str]:
write_ref_enum(out, ast)
write_anynoderef(out, ast)
write_nodekind(out, ast)
write_node(out, ast)
return out


Expand Down
Loading
Loading