Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Separate 'infer_expression_type' query #15942

Closed

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Feb 4, 2025

Summary

As a follow-up to the discussion here, this changeset adds a new salsa query

fn infer_expression_type<'db>(db: &'db dyn Db, expression: Expression<'db>) -> Type<'db>

which is similar to infer_expression_types (plural), but returns the type of the expression directly instead of returning a TypeInference object. We can use this in a few places. Notably, it can't be used here:

let scope = self.scope();
let inference = infer_expression_types(self.db, cls);
let ty = inference
.expression_type(cls.node_ref(self.db).scoped_expression_id(self.db, scope))
.to_instance(self.db);

because that uses self.scope(), not cls.scope().

Test Plan

@sharkdp sharkdp added the red-knot Multi-file analysis & type inference label Feb 4, 2025
@@ -42,8 +42,7 @@ impl<'db> Unpacker<'db> {
"Unpacking target must be a list or tuple expression"
);

let mut value_ty = infer_expression_types(self.db(), value.expression())
.expression_type(value.scoped_expression_id(self.db(), self.scope));
Copy link
Contributor Author

@sharkdp sharkdp Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that this is not semantically equivalent (uses self.scope, not value.expression().scope()). Will check if that's okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that this should be the same? But someone familiar with this code should probably double-check.

I added a

assert_eq!(value.expression().scope(self.db()), self.scope);

assertion locally and all tests passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhruvmanila probably has the most context on this module?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to use infer_expression_type here because value always belongs to the same file (Unpacker never does any cross-module type inference).

Copy link
Contributor

@carljm carljm Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, an expression node can only be part of one scope, and that scope is the only possible correct scope to pass to scoped_expression_id for that expression. So I believe that in any case where we are doing this pattern, either you could use infer_expression_types instead with no change in behavior, or the existing code was buggy. I think the only reason existing code wouldn't use the scope of the expression itself was just because that code happened to already have that same scope stored in a more convenient place. I think this is also true for the narrowing code you linked in the PR summary.

(Note this comment is just about semantic correctness, and independent of the whole question of whether we should introduce another layer of Salsa query in all of these places.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation (as is on main) could be incorrect when this will be used to unpack in comprehensions. The reason being that the value expression of the first generator comes from the outer scope and not the comprehension scope where the unpacking would belong to once implemented (#15369).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhruvmanila I could be remembering wrong, but I would think we should give that "value expression of the first generator" an expression ID in the scope where it should be evaluated, maintaining the invariant that every expression belongs to exactly one scope, which I think would mean the current implementation of this helper is correct?

Sounds like this case deserves some investigation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be remembering wrong, but I would think we should give that "value expression of the first generator" an expression ID in the scope where it should be evaluated, maintaining the invariant that every expression belongs to exactly one scope, which I think would mean the current implementation of this helper is correct?

Oh, I think you're correct. This means the previous implementation would've been probably incorrect for comprehensions because the unpacking would happen in the comprehension scope while the expression is in the outer scope but the previous implementation would try to get the expression from the comprehension scope where it doesn't exist.

We should be giving the expression of the first generator in the scope where it should be evaluated:

// The `iter` of the first generator is evaluated in the outer scope, while all subsequent
// nodes are evaluated in the inner scope.
self.add_standalone_expression(&generator.iter);
self.visit_expr(&generator.iter);
self.push_scope(scope);

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely!

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether we want to use infer_expression_type or not is slightly more subtle. The way you changed it in this PR isn't incorrect but it means that we pay for an extra salsa queries even in cases where the isolation isn't necessary.

We should only use infer_expression_type if it isn't guaranteed that expression belongs to the same file as the enclosing query. This, for example, isn't the case in the TypeInferenceBuilder or Unpacker where all ast nodes are guaranteed to be from the same file. We do want the isolation in Type::own_member because there's no guarantee from which file (if any) the method is called.

// Similar to `infer_expression_types` (with the same restrictions). Directly returns the
// type of the overall expression. This is a salsa query because it accesses `node_ref`,
// which is sensitive to changes in the AST. Making it a query allows downstream queries
// to short-circuit if the result type has not changed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to add a note that it isn't necessary to use this query in TypeInferenceBuilder or anywhere else where it's known that the expression belongs to the current file.

@@ -42,8 +42,7 @@ impl<'db> Unpacker<'db> {
"Unpacking target must be a list or tuple expression"
);

let mut value_ty = infer_expression_types(self.db(), value.expression())
.expression_type(value.scoped_expression_id(self.db(), self.scope));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to use infer_expression_type here because value always belongs to the same file (Unpacker never does any cross-module type inference).

let scope = test_expr.scope(db);
let ty = inference
.expression_type(test_expr.node_ref(db).scoped_expression_id(db, scope));
let ty = infer_expression_type(db, test_expr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% if we should use infer_expression_type here. We should only use it if analyze_single can be reached from another module than where test_expr is defined.

Copy link
Member

@AlexWaygood AlexWaygood Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the new API is much more convenient and ergonomic. If we expect it to be slower in many cases, it feels like we're adding a footgun (making something that we should only do in very specific cases easy to do in all cases)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have two different methods for it but it's important that we use the right method in the right context. I do think infer_expression_type as a query will be useful in other situations too, but it's not a "fits all" solution.

Copy link
Contributor

@carljm carljm Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that this is sufficiently useful on ergonomic grounds, that it might even be worth having two variants of it, a Salsa-cached one and a non-Salsa-cached one. (The Salsa-cached one could just call the non-Salsa-cached one.)

I think it would also be very valuable if we could implement (via some combination of types, visibility, and code organization) a clearer distinction between "code that can be called across files" (mostly I think this code should live in types.rs and on Type today? Because Type is how type information travels between files) and "code that exclusively operates on only one file", and ensure that the former code can never see an AST directly. This might be a non-trivial refactor, but I think it would be worth it. Maybe worth creating an issue for, at least?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #15949 to follow up on the latter.

@MichaReiser
Copy link
Member

I tried to incorporate this change into #16268

@sharkdp sharkdp closed this Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants