Prefer equality in boolean comparisons #34166

ranma42 · 2024-07-05T10:31:55Z

In most databases, equality comparisons can take advantage of indexing and inequalities cannot.

roji · 2024-07-11T16:42:36Z

test/EFCore.Sqlite.FunctionalTests/Query/NullSemanticsQuerySqliteTest.cs

@@ -839,7 +839,7 @@ public override async Task Rewrite_compare_bool_with_bool(bool async)
            """
 SELECT "e"."Id"
 FROM "Entities1" AS "e"
-WHERE "e"."BoolA" <> "e"."NullableBoolB"
+WHERE "e"."BoolA" = (NOT ("e"."NullableBoolB"))


Is this a positive change? I doubt any database out there will use index with equality and NOT, more than it would for inequality, no? Should we make this change more targeted, so that it doesn't do this specific transformation?

Is this a positive change?

It is a positive change, at least as long as we do not analyze the provenance of the columns.
In this specific case, it is basically irrelevant: the whole table is going to be scanned linearly (just once).
If the left column and the right column came from different tables, it would be much more efficient.
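The different-tables case can be tried out with a small sketch along these lines. It uses Python's built-in `sqlite3` module; the table names (`LeftEntities`, `RightEntities`) and the index are hypothetical and not taken from the EF Core test suite. It assumes boolean columns only hold 0, 1 or NULL, which is what makes the two predicates interchangeable. The plan text depends on the SQLite version, so it is only printed here, not asserted.

```python
import sqlite3

# Hypothetical schema: the two boolean columns live in different tables,
# and only the left one is indexed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE LeftEntities (Id INTEGER PRIMARY KEY, BoolA INTEGER NOT NULL);
    CREATE TABLE RightEntities (Id INTEGER PRIMARY KEY, NullableBoolB INTEGER);
    CREATE INDEX IX_LeftEntities_BoolA ON LeftEntities (BoolA);
    """
)
conn.executemany("INSERT INTO LeftEntities (BoolA) VALUES (?)", [(0,), (1,), (1,)])
conn.executemany(
    "INSERT INTO RightEntities (NullableBoolB) VALUES (?)", [(0,), (1,), (None,)]
)

QUERY = """
    SELECT l.Id, r.Id
    FROM LeftEntities AS l
    JOIN RightEntities AS r ON {predicate}
    ORDER BY l.Id, r.Id
"""
INEQUALITY = QUERY.format(predicate="l.BoolA <> r.NullableBoolB")
EQUALITY = QUERY.format(predicate="l.BoolA = (NOT (r.NullableBoolB))")


def rows(sql):
    """Run the query and return all result rows."""
    return conn.execute(sql).fetchall()


def plan(sql):
    """Return the readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Both forms drop the NULL row and agree on everything else.
assert rows(INEQUALITY) == rows(EQUALITY)

for label, sql in (("<>", INEQUALITY), ("= NOT", EQUALITY)):
    print(label, plan(sql))
```

Comparing the two printed plans shows whether the planner picks up `IX_LeftEntities_BoolA` for either form on the SQLite build at hand.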

I doubt any database out there will use index with equality and NOT, more than it would for inequality, no?

At least Sqlite does. This is basically an instance of #34048 (comment)

Should we make this change more targeted, so that it doesn't do this specific transformation?

Do you expect worse plans on some db?

I'll add some examples in the issue #34164.

Added Sqlite and Postgres examples 🚀

So between a <> b and a = NOT(b), the former certainly seems more natural, and what I'd expect a standard SQL query to look like. If these perform the same, I'd definitely prefer the first, at least for readability etc. (and we do generally care about that).

I could also imagine a database where the planner optimizes to use an index on b (effectively transforming the inequality into a NOT on a), where this change would cause a regression. Of course, this is entirely speculative, and I haven't actually looked into which databases do which optimizations.

I'd prefer us to do a bit more cross-database research before merging a change like this, which at the very least makes our SQL less readable/standard/expected (and that does tend to have some correlation sometimes with performance). If this addresses a very specific Sqlite behavior, where equality is always better, we always have the option of doing this change for Sqlite only.

OK, I wrote the above comment before noticing you posted data on other databases… some remarks:

For the Sqlite case, have you confirmed that the second option, where an index is built, is actually faster than the first?

For the SQL Server case, the total subtree cost is actually higher with the second method.

I will perform some measurements, but I do not have real code or an actual database that uses this filter; I will try some synthetic examples covering a few interesting cases, though a real-world case would definitely be more relevant.
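A synthetic measurement of this kind could look roughly like the following sketch. Everything in it is an assumption made for illustration: the `Lhs`/`Rhs` tables, the row counts, the random data and the single-run timing. It is not the benchmark used for the numbers posted in #34164, and one run on an in-memory database says little about a real workload.

```python
import random
import sqlite3
import time


def build_db(rows_per_table=1000, seed=0):
    """Create two hypothetical tables of random booleans; only Lhs is indexed."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Lhs (Id INTEGER PRIMARY KEY, Flag INTEGER NOT NULL);
        CREATE TABLE Rhs (Id INTEGER PRIMARY KEY, Flag INTEGER);
        CREATE INDEX IX_Lhs_Flag ON Lhs (Flag);
        """
    )
    conn.executemany(
        "INSERT INTO Lhs (Flag) VALUES (?)",
        [(rng.randint(0, 1),) for _ in range(rows_per_table)],
    )
    conn.executemany(
        "INSERT INTO Rhs (Flag) VALUES (?)",
        [(rng.choice([0, 1, None]),) for _ in range(rows_per_table)],
    )
    return conn


def timed_count(conn, predicate):
    """Count joined rows for the predicate and return (count, elapsed seconds)."""
    sql = f"SELECT COUNT(*) FROM Lhs JOIN Rhs ON {predicate}"
    start = time.perf_counter()
    (count,) = conn.execute(sql).fetchone()
    return count, time.perf_counter() - start


conn = build_db()
ne_count, ne_seconds = timed_count(conn, "Lhs.Flag <> Rhs.Flag")
eq_count, eq_seconds = timed_count(conn, "Lhs.Flag = (NOT (Rhs.Flag))")

# The two predicates must select the same rows before timings are compared.
assert ne_count == eq_count
print(f"<>    : {ne_count} rows in {ne_seconds:.4f}s")
print(f"= NOT : {eq_count} rows in {eq_seconds:.4f}s")
```

For anything beyond a rough impression, the queries would need repeated runs, larger tables and a file-backed database.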

ranma42 · 2024-07-12T12:14:33Z

Maybe I should have explained this in advance: this is not a new/different approach to translate boolean (in)equalities; it is just making the code more consistent in choosing = over <> (as per the issue).

Ideally there should be no duplication of this code, hence it should be "inevitably" consistent.

ranma42 · 2024-07-13T17:46:41Z

I cleaned up the code a little (the same equality conversion logic is now shared between OptimizeComparison and RewriteNullSemantics).
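As an illustration of what a single shared conversion might do, here is a minimal sketch over a toy expression tree. It is not the EF Core implementation: the `Column`, `Not` and `Compare` types and the `prefer_equality` helper are hypothetical names invented for this example, and the rewrite is only valid when both operands are boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Not:
    operand: object


@dataclass(frozen=True)
class Compare:
    op: str  # "=" or "<>"
    left: object
    right: object


def negate(expr):
    """Return NOT expr, cancelling a double negation."""
    return expr.operand if isinstance(expr, Not) else Not(expr)


def prefer_equality(cmp):
    """Rewrite a boolean `a <> b` as an equality; leave `=` untouched."""
    if cmp.op != "<>":
        return cmp
    # If the left side is already negated, negating it again cancels the NOT.
    if isinstance(cmp.left, Not):
        return Compare("=", negate(cmp.left), cmp.right)
    return Compare("=", cmp.left, negate(cmp.right))


def to_sql(expr):
    """Render the toy tree as SQL text."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Not):
        return f"NOT ({to_sql(expr.operand)})"

    def operand(side):
        text = to_sql(side)
        return f"({text})" if isinstance(side, Not) else text

    return f"{operand(expr.left)} {expr.op} {operand(expr.right)}"


rewritten = prefer_equality(Compare("<>", Column("BoolA"), Column("NullableBoolB")))
print(to_sql(rewritten))
```

Keeping the conversion in one helper means every caller produces the same shape of SQL, which is the consistency argument made above.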

ranma42 · 2024-07-29T19:32:47Z

Rebased to resolve conflicts

ranma42 force-pushed the prefer-equal branch from 785b752 to 80a7450 Compare July 9, 2024 22:30

roji reviewed Jul 11, 2024

ranma42 mentioned this pull request Jul 12, 2024

Prefer equality when comparing values #34164

Open

ranma42 force-pushed the prefer-equal branch 2 times, most recently from 7f57ec3 to 32d05b3 Compare July 13, 2024 17:45

ranma42 force-pushed the prefer-equal branch from 32d05b3 to 3a4ff39 Compare July 13, 2024 17:55

ranma42 mentioned this pull request Jul 27, 2024

Avoid duplicating complex expression in comparisons #34172

Open

Prefer equality in boolean comparisons

3b90d81

They can take advantage of indexing.

ranma42 force-pushed the prefer-equal branch from 3a4ff39 to a16fd4f Compare July 29, 2024 19:32

Update baselines

b79642e

ranma42 force-pushed the prefer-equal branch from a16fd4f to b79642e Compare July 29, 2024 21:10
