Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Equals/GetHashCode methods in SqlConstantExpression #35372

Closed
ranma42 opened this issue Dec 22, 2024 · 6 comments · Fixed by #35487
Closed

Inconsistent Equals/GetHashCode methods in SqlConstantExpression #35372

ranma42 opened this issue Dec 22, 2024 · 6 comments · Fixed by #35487
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. community-contribution customer-reported type-bug
Milestone

Comments

@ranma42
Copy link
Contributor

ranma42 commented Dec 22, 2024

In EFCore, the implementation of SqlConstantExpression does not respect the requirements of Equals/GetHashCode since ff99dc3, specifically

If two objects compare as equal, the GetHashCode() method for each object must return the same value.

(from https://learn.microsoft.com/en-us/dotnet/api/system.object.gethashcode)

// @nuget: Microsoft.EntityFrameworkCore.Relational -Version 3.0.0
using System;
using System.Linq;
using System.Linq.Expressions;
using Microsoft.EntityFrameworkCore.Query.SqlExpressions;

int[] x = [1, 2, 3];

var a = new SqlConstantExpression(Expression.Constant(x.ToList()), null);
var b = new SqlConstantExpression(Expression.Constant(x.ToList()), null);

Console.WriteLine(a.GetHashCode());
Console.WriteLine(b.GetHashCode());
Console.WriteLine(a.GetHashCode() == b.GetHashCode());
Console.WriteLine(a.Equals(b));

This snippet emits

-765869390
-1092809484
False
False

on EFCore 3.0.0 and

-1656061180
971732864
False
True

on EFCore 3.1, 6, 8.

Note that because of #35355 the snippet is meaningless on EFCore 9.
The corresponding one is

// @nuget: Microsoft.EntityFrameworkCore.Relational -Version 9.0.0
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore.Query.SqlExpressions;

int[] x = [1, 2, 3];

var a = new SqlConstantExpression(x.ToList(), null);
var b = new SqlConstantExpression(x.ToList(), null);

Console.WriteLine(a.GetHashCode());
Console.WriteLine(b.GetHashCode());
Console.WriteLine(a.GetHashCode() == b.GetHashCode());
Console.WriteLine(a.Equals(b));

which emits

-3729430
1003242459
False
True

This inconsistency is caused by the "deep" check for IList in SqlConstantExpression.ValueEquals.
Assuming that is intentional and needed, a corresponding hash construction should probably also be performed.

@ranma42
Copy link
Contributor Author

ranma42 commented Dec 22, 2024

Note that this has some implications regarding what options are legitimate when caching hash codes as suggested in #34149

@roji
Copy link
Member

roji commented Dec 22, 2024

Related: #29006

@roji
Copy link
Member

roji commented Dec 22, 2024

As suggested in #29006, we could consider simply not including the value in the hash code (i.e. have a fixed hash code for all SqlConstantExpressions with a given Type)... Let's think about this further.

Of course, the discrepancy between the equals and hash code behavior is indeed a bug that will need to be fixed.

@ranma42
Copy link
Contributor Author

ranma42 commented Dec 23, 2024

Skipping the Value when hashing would obviously fix this issue, but it would mean that hash codes of SqlExpressions only include "structure" information rather than values (for example x == null and x == 0 would hash to the same value).

I believe this would not be much of an issue if this only occurred while placing expressions in dictionaries, but it would hurt #34149 effectiveness quite a bit.

A slightly more costly option that would not involve deep hashing of the Value field would be combining the Value.GetHashCode() only if the value is not IList. This would match the Equals behavior and would still distinguish between most scalar values.

Finally, the highest-cost option is to actually hash the content of the Value (and implement a hash function that matches Equals). This would incur a significantly higher computation cost, but might still be acceptable under the assumption that:

  • hash codes are cached
  • big deeply-nested constants are rare and recommended against (for example if the expected IList literal is an array of a few enum values, opposed to tens/hundreds of dynamically-computed entries)

@roji If this makes sense to you I can push a PR with the approach you prefer. I would personally go for hashing Value only if not IList, with the assumption that it might be reconsidered afterwards, when caching changes the effective cost of (re-)computing hashes.

(Note that if the assumption is that IList values are rare holds true, in a way both extra collisions and extra hashing costs are likely to be negligible)

@ranma42
Copy link
Contributor Author

ranma42 commented Dec 23, 2024

As a (very biased) data point, running EFCore tests against Sqlite (Microsoft.EntityFrameworkCore.Sqlite.FunctionalTests), only 45/37084 use an IList as Value for a SqlConstantExpression.

(I understand that this proportion is unfortunately not representative of any actual real-world scenarios)

@roji
Copy link
Member

roji commented Jan 16, 2025

@ranma42 coming back to this after a while...

If this makes sense to you I can push a PR with the approach you prefer. I would personally go for hashing Value only if not IList, with the assumption that it might be reconsidered afterwards, when caching changes the effective cost of (re-)computing hashes.

Yeah, that makes sense to me. An alternative (suggested by @maumar) is also to hash IList.Count only - this would be computationally trivial (just like the hashing we do on non-IList values), and still provide a bit of hash distribution for IList constants. We'd definitely appreciate a PR which does this - thanks!

ranma42 added a commit to ranma42/efcore that referenced this issue Jan 16, 2025
Since the `Equals` method performs deep comparison of `IList` values, using
their `GetHashCode` directly is generally not valid. Instead, their `Count` is a
safe approximation.

Fixes dotnet#35372
@roji roji closed this as completed in 707fb5b Jan 16, 2025
@roji roji added this to the 10.0.0 milestone Jan 16, 2025
@roji roji added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug community-contribution labels Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. community-contribution customer-reported type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants