Skip to content

Commit

Permalink
[WIP] port generic query plans to 24.1, 23.2 (#18803)
Browse files Browse the repository at this point in the history
* port generic query plan docs to 24.1, 23.2
  • Loading branch information
taroface authored Jan 15, 2025
1 parent b4b3161 commit 6fcf900
Show file tree
Hide file tree
Showing 14 changed files with 182 additions and 26 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
- Because [generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-cache) use lookup joins instead of the scans and revscans used by custom query plans, generic query plans do not perform as well as custom query plans in some cases. [#128916](https://github.com/cockroachdb/cockroach/issues/128916)
- [Generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type) are not included in the [plan cache]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-cache). This means a generic query plan built and optimized for a prepared statement in one session cannot be used by another session. To reuse generic query plans for maximum performance, a prepared statement should be executed multiple times instead of prepared and executed once. [#128911](https://github.com/cockroachdb/cockroach/issues/128911)
1 change: 1 addition & 0 deletions src/current/_includes/v23.2/misc/enterprise-features.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ Feature | Description
[Multi-Region Capabilities]({% link {{ page.version.version }}/multiregion-overview.md %}) | Row-level control over where your data is stored to help you reduce read and write latency and meet regulatory requirements.
[PL/pgSQL]({% link {{ page.version.version }}/plpgsql.md %}) | Use a procedural language in [user-defined functions]({% link {{ page.version.version }}/user-defined-functions.md %}) and [stored procedures]({% link {{ page.version.version }}/stored-procedures.md %}) to improve performance and enable more complex queries.
[Node Map]({% link {{ page.version.version }}/enable-node-map.md %}) | Visualize the geographical distribution of a cluster by plotting its node localities on a world map.
[Generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type) | Improve performance for prepared statements by enabling generic plans that eliminate most of the query latency attributed to planning.

## Recovery and streaming

Expand Down
1 change: 1 addition & 0 deletions src/current/_includes/v23.2/misc/session-vars.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@
| <a id="optimizer-use-multicol-stats"></a> `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes |
| <a id="optimizer-use-not-visible-indexes"></a> `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes |
| <a id="pg_trgm_similarity_threshold"></a> `pg_trgm.similarity_threshold` | The threshold above which a [`%`]({% link {{ page.version.version }}/functions-and-operators.md %}#operators) string comparison returns `true`. The value must be between `0` and `1`. For more information, see [Trigram Indexes]({% link {{ page.version.version }}/trigram-indexes.md %}). | `0.3` | Yes | Yes |
| <a id="plan-cache-mode"></a> `plan_cache_mode` | The type of plan that is cached in the [query plan cache]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-cache): `auto`, `force_generic_plan`, or `force_custom_plan`.<br><br>For more information, refer to [Query plan type]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type). | `force_custom_plan` | Yes | Yes
| <a id="prefer-lookup-joins-for-fks"></a> `prefer_lookup_joins_for_fks` | If `on`, the optimizer prefers [`lookup joins`]({% link {{ page.version.version }}/joins.md %}#lookup-joins) to [`merge joins`]({% link {{ page.version.version }}/joins.md %}#merge-joins) when performing [`foreign key`]({% link {{ page.version.version }}/foreign-key.md %}) checks. | `off` | Yes | Yes |
| <a id="reorder-joins-limit"></a> `reorder_joins_limit` | Maximum number of joins that the optimizer will attempt to reorder when searching for an optimal query execution plan. <br/><br/>For more information, see [Join reordering]({% link {{ page.version.version }}/cost-based-optimizer.md %}#join-reordering). | `8` | Yes | Yes |
| <a id="require-explicit-primary-keys"></a> `require_explicit_primary_keys` | If `on`, CockroachDB throws an error for all tables created without an explicit primary key defined. | `off` | Yes | Yes |
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
- Because [generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-cache) use lookup joins instead of the scans and revscans used by custom query plans, generic query plans do not perform as well as custom query plans in some cases. [#128916](https://github.com/cockroachdb/cockroach/issues/128916)
- [Generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type) are not included in the [plan cache]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-cache). This means a generic query plan built and optimized for a prepared statement in one session cannot be used by another session. To reuse generic query plans for maximum performance, a prepared statement should be executed multiple times instead of prepared and executed once. [#128911](https://github.com/cockroachdb/cockroach/issues/128911)
1 change: 1 addition & 0 deletions src/current/_includes/v24.1/misc/enterprise-features.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ Feature | Description
[Multi-Region Capabilities]({% link {{ page.version.version }}/multiregion-overview.md %}) | Row-level control over where your data is stored to help you reduce read and write latency and meet regulatory requirements.
[PL/pgSQL]({% link {{ page.version.version }}/plpgsql.md %}) | Use a procedural language in [user-defined functions]({% link {{ page.version.version }}/user-defined-functions.md %}) and [stored procedures]({% link {{ page.version.version }}/stored-procedures.md %}) to improve performance and enable more complex queries.
[Node Map]({% link {{ page.version.version }}/enable-node-map.md %}) | Visualize the geographical distribution of a cluster by plotting its node localities on a world map.
[Generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type) | Improve performance for prepared statements by enabling generic plans that eliminate most of the query latency attributed to planning.

## Recovery and streaming

Expand Down
5 changes: 3 additions & 2 deletions src/current/_includes/v24.1/misc/session-vars.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,9 +56,10 @@
| <a id="optimizer-use-lock-op-for-serializable"></a> `optimizer_use_lock_op_for_serializable` | If `on`, the optimizer uses a `Lock` operator to construct query plans for `SELECT` statements using the [`FOR UPDATE` and `FOR SHARE`]({% link {{ page.version.version }}/select-for-update.md %}) clauses. This setting only affects `SERIALIZABLE` transactions. `READ COMMITTED` transactions are evaluated with the `Lock` operator regardless of the setting. | `off` | Yes | Yes |
| <a id="optimizer-use-multicol-stats"></a> `optimizer_use_multicol_stats` | If `on`, the optimizer uses collected multi-column statistics for cardinality estimation. | `on` | No | Yes |
| <a id="optimizer-use-not-visible-indexes"></a> `optimizer_use_not_visible_indexes` | If `on`, the optimizer uses not visible indexes for planning. | `off` | No | Yes |
| <a id="optimizer-use-virtual-computed-column-stats"></a> `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes |
| <a id="plpgsql-use-strict-into"></a> `plpgsql_use_strict_into` | If `on`, PL/pgSQL [`SELECT ... INTO` and `RETURNING ... INTO` statements]({% link {{ page.version.version }}/plpgsql.md %}#assign-a-result-to-a-variable) behave as though the `STRICT` option is specified. This causes the SQL statement to error if it does not return exactly one row. | `off` | Yes | Yes |
| <a id="optimizer-use-virtual-computed-column-stats"></a> `optimizer_use_virtual_computed_column_stats` | If `on`, the optimizer uses table statistics on [virtual computed columns]({% link {{ page.version.version }}/computed-columns.md %}#virtual-computed-columns). | `on` | Yes | Yes
| <a id="pg_trgm_similarity_threshold"></a> `pg_trgm.similarity_threshold` | The threshold above which a [`%`]({% link {{ page.version.version }}/functions-and-operators.md %}#operators) string comparison returns `true`. The value must be between `0` and `1`. For more information, see [Trigram Indexes]({% link {{ page.version.version }}/trigram-indexes.md %}). | `0.3` | Yes | Yes |
| <a id="plan-cache-mode"></a> `plan_cache_mode` | The type of plan that is cached in the [query plan cache]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-cache): `auto`, `force_generic_plan`, or `force_custom_plan`.<br><br>For more information, refer to [Query plan type]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type). | `force_custom_plan` | Yes | Yes
| <a id="plpgsql-use-strict-into"></a> `plpgsql_use_strict_into` | If `on`, PL/pgSQL [`SELECT ... INTO` and `RETURNING ... INTO` statements]({% link {{ page.version.version }}/plpgsql.md %}#assign-a-result-to-a-variable) behave as though the `STRICT` option is specified. This causes the SQL statement to error if it does not return exactly one row. | `off` | Yes | Yes |
| <a id="prefer-lookup-joins-for-fks"></a> `prefer_lookup_joins_for_fks` | If `on`, the optimizer prefers [`lookup joins`]({% link {{ page.version.version }}/joins.md %}#lookup-joins) to [`merge joins`]({% link {{ page.version.version }}/joins.md %}#merge-joins) when performing [`foreign key`]({% link {{ page.version.version }}/foreign-key.md %}) checks. | `off` | Yes | Yes |
| <a id="reorder-joins-limit"></a> `reorder_joins_limit` | Maximum number of joins that the optimizer will attempt to reorder when searching for an optimal query execution plan. <br/><br/>For more information, see [Join reordering]({% link {{ page.version.version }}/cost-based-optimizer.md %}#join-reordering). | `8` | Yes | Yes |
| <a id="require-explicit-primary-keys"></a> `require_explicit_primary_keys` | If `on`, CockroachDB throws an error for all tables created without an explicit primary key defined. | `off` | Yes | Yes |
Expand Down
4 changes: 4 additions & 0 deletions src/current/v23.2/cockroachdb-feature-availability.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ Any feature made available in a phase prior to GA is provided without any warran
**The following features are in preview** and are subject to change. To share feedback and/or issues, contact [Support](https://support.cockroachlabs.com/hc).
{{site.data.alerts.end}}

### Generic query plans

[Generic query plans]({% link {{ page.version.version }}/cost-based-optimizer.md %}#query-plan-type) are generated and optimized once without considering specific placeholder values, and are not regenerated on subsequent executions, unless the plan becomes stale due to [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) or new [table statistics]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics) and must be re-optimized. This approach eliminates most of the query latency attributed to planning.

### CockroachDB Cloud Folders

[Organizing CockroachDB {{ site.data.products.cloud }} clusters using folders]({% link cockroachcloud/folders.md %}) is in preview. Folders allow you to organize and manage access to your clusters according to your organization's requirements. For example, you can create top-level folders for each business unit in your organization, and within those folders, organize clusters by geographic location and then by level of maturity, such as production, staging, and testing.
Expand Down
85 changes: 73 additions & 12 deletions src/current/v23.2/cost-based-optimizer.md
Original file line number Diff line number Diff line change
Expand Up @@ -277,27 +277,87 @@ Only tables with `ZONE` [survivability]({% link {{ page.version.version }}/multi

## Query plan cache

CockroachDB uses a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan each time a query is executed.
CockroachDB caches some of the query plans generated by the optimizer. The query plan cache is used for the following types of statements:

- Prepared statements.
- Non-prepared statements using identical constant values.

Caching query plans leads to faster query execution: rather than generating a new plan each time a query is executed, CockroachDB reuses a query plan that was previously generated.

The query plan cache is enabled by default. To disable it, execute the following statement:

{% include_cached copy-clipboard.html %}
~~~ sql
> SET CLUSTER SETTING sql.query_cache.enabled = false;
SET CLUSTER SETTING sql.query_cache.enabled = false;
~~~

Only the following statements use the plan cache:
The following statements can use the plan cache: [`SELECT`]({% link {{ page.version.version }}/select-clause.md %}), [`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), and [`DELETE`]({% link {{ page.version.version }}/delete.md %}).

- [`SELECT`]({% link {{ page.version.version }}/select-clause.md %})
- [`INSERT`]({% link {{ page.version.version }}/insert.md %})
- [`UPDATE`]({% link {{ page.version.version }}/update.md %})
- [`UPSERT`]({% link {{ page.version.version }}/upsert.md %})
- [`DELETE`]({% link {{ page.version.version }}/delete.md %})
Two types of plans can be cached: custom and generic. Refer to [Query plan type](#query-plan-type).

The optimizer can use cached plans if they are:
### Query plan type

- Prepared statements.
- Non-prepared statements using identical constant values.
Two types of plans can be cached:

- *Custom* query plans are generated for a given query structure and optimized for specific placeholder values, and are re-optimized on subsequent executions. By default, the optimizer uses custom plans. Custom plans are included in the [plan cache](#query-plan-cache).
- {% include_cached new-in.html version="v23.2.12" %} *Generic* query plans are generated and optimized once without considering specific placeholder values, and are **not** regenerated on subsequent executions, unless the plan becomes stale due to [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) or new [table statistics](#table-statistics) and must be re-optimized. This approach eliminates most of the query latency attributed to planning. For example:

~~~ sql
PREPARE p AS SELECT ...;
SET plan_cache_mode = force_generic_plan;
EXECUTE p; -- The query plan is generated and optimized.
EXECUTE p; -- The query plan is reused without re-optimizing.
EXECUTE p; -- The query plan is reused without re-optimizing.
~~~

Generic plans are **not** included in the plan cache, but are cached per session. This means that they must still be re-optimized each time a session prepares a statement using a generic plan. To reuse generic query plans for maximum performance, a prepared statement should be executed multiple times instead of prepared and executed once.

This feature is in [preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}) and is subject to change.

{{site.data.alerts.callout_success}}
Generic query plans will only benefit workloads that use prepared statements, which are issued via explicit `PREPARE` statements or by client libraries using the [PostgreSQL extended wire protocol](https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY). Generic query plans are most beneficial for queries with high planning times, such as queries with many [joins]({% link {{ page.version.version }}/joins.md %}). For more information on reducing planning time for such queries, refer to [Reduce planning time for queries with many joins](#reduce-planning-time-for-queries-with-many-joins).
{{site.data.alerts.end}}

To change the type of plan that is cached, use the [`plan_cache_mode`]({% link {{ page.version.version }}/session-variables.md %}#plan-cache-mode) session setting. This setting applies when a statement is executed, not when it is prepared. Statements are therefore not associated with a specific query plan type when they are prepared.

The following modes can be set:

- `force_custom_plan` (default): Force the use of custom plans.
- `force_generic_plan`: Force the use of generic plans.
- `auto`: Automatically determine whether to use custom or generic query plans for prepared statements. Custom plans are used for the first five statement executions. Subsequent executions use a generic plan if its estimated cost is not significantly higher than the average cost of the preceding custom plans.

{{site.data.alerts.callout_info}}
Generic plans are always used for non-prepared statements that do not contain placeholders or [stable functions]({% link {{ page.version.version }}/functions-and-operators.md %}#function-volatility), regardless of the `plan_cache_mode` setting.
{{site.data.alerts.end}}

In some cases, generic query plans are less efficient than custom plans. For this reason, Cockroach Labs recommends setting `plan_cache_mode` to `auto` instead of `force_generic_plan`. Under the `auto` setting, the optimizer avoids bad generic plans by falling back to custom plans. For example:

Set `plan_cache_mode` to `auto` at the session level:

{% include_cached copy-clipboard.html %}
~~~ sql
SET plan_cache_mode = auto
~~~

At the [database level]({% link {{ page.version.version }}/alter-database.md %}#set-session-variable):

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER DATABASE db SET plan_cache_mode = auto;
~~~

At the [role level]({% link {{ page.version.version }}/alter-role.md %}#set-default-session-variable-values-for-a-role):

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER ROLE db_user SET plan_cache_mode = auto;
~~~

To verify the plan type used by a query, check the [`EXPLAIN ANALYZE`]({% link {{ page.version.version }}/explain-analyze.md %}) output for the query.

- If a generic query plan is optimized for the current execution, the `plan type` in the output is `generic, re-optimized`.
- If a generic query plan is reused for the current execution without performing optimization, the `plan type` in the output is `generic, reused`.
- If a custom query plan is used for the current execution, the `plan type` in the output is `custom`.

## Join reordering

Expand All @@ -309,7 +369,7 @@ To change this setting, which is controlled by the `reorder_joins_limit` [sessio

{% include_cached copy-clipboard.html %}
~~~ sql
> SET reorder_joins_limit = 0;
SET reorder_joins_limit = 0;
~~~

To disable this feature, set the variable to `0`. You can configure the default `reorder_joins_limit` session setting with the [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.defaults.reorder_joins_limit`, which has a default value of `8`.
Expand All @@ -328,6 +388,7 @@ The cost-based optimizer explores multiple join orderings to find the lowest-cos

- To limit the size of the subtree that can be reordered, set the `reorder_joins_limit` [session variable]({% link {{ page.version.version }}/set-vars.md %}) to a lower value, for example:

{% include_cached copy-clipboard.html %}
~~~ sql
SET reorder_joins_limit = 2;
~~~
Expand Down
Loading

0 comments on commit 6fcf900

Please sign in to comment.