Add zone config troubleshooting guide #19283

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster]({% link {{ page.version.version }}/backup.md %}#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
@@ -0,0 +1 @@
To troubleshoot replication zones that may be misconfigured, see [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version }}/troubleshoot-replication-zones.md %}).
6 changes: 6 additions & 0 deletions src/current/_includes/v24.3/sidebar-data/troubleshooting.json
@@ -56,6 +56,12 @@
"/${VERSION}/query-replication-reports.html"
]
},
{
"title": "Troubleshoot Replication Zones",
"urls": [
"/${VERSION}/troubleshoot-replication-zones.html"
]
},
{
"title": "Benchmarking",
"items": [
@@ -0,0 +1,3 @@
Cockroach Labs {% if page.name != "configure-replication-zones.md" %} [does not recommend modifying zone configurations manually]({% link {{ page.version.version }}/configure-replication-zones.md %}#why-manual-zone-config-management-is-not-recommended). {% else %} [does not recommend modifying zone configurations manually](#why-manual-zone-config-management-is-not-recommended). {% endif %}

Most users should use [Multi-region SQL statements]({% link {{ page.version.version }}/multiregion-overview.md %}) instead; if additional control is needed, [Zone config extensions]({% link {{ page.version.version }}/zone-config-extensions.md %}) can be used to augment the multi-region SQL statements.

i have a similar comment here as the outline-style diagram, but also:

i would expect a direct arrow from index (on the left) to the partitions -- right now, it looks like the inheritance has table as the parent and index, partition, and subpartitions as the direct children

I think the intent was to show table / index level partitions? Naming the individual examples would probably help. I did a quick thing with PlantUML to split it off, but we need to call both of them Partition

image

19 changes: 19 additions & 0 deletions src/current/v24.3/alter-database.md
@@ -169,6 +169,10 @@ For usage, see [Synopsis](#synopsis).
If you directly change a database's zone configuration with `ALTER DATABASE ... CONFIGURE ZONE`, CockroachDB will block all [`ALTER DATABASE ... SET PRIMARY REGION`](#set-primary-region) statements on the database.
{{site.data.alerts.end}}

{{site.data.alerts.callout_danger}}
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
{{site.data.alerts.end}}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

For examples, see [Replication Controls](#configure-replication-zones).
@@ -689,6 +693,10 @@ HINT: you must first drop super region usa before you can drop the region us-wes

### Configure replication zones

{{site.data.alerts.callout_danger}}
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
{{site.data.alerts.end}}

{% include {{ page.version.version }}/sql/movr-statements-geo-partitioned-replicas.md %}

#### Create a replication zone for a database
@@ -715,6 +723,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER DATABASE movr CONFIGURE ZONE DISCARD;
~~~

### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Use Zone Config Extensions

The following examples show:
@@ -1078,6 +1090,12 @@ When you discard a zone configuration, the objects it was applied to will then i
However, this statement will not remove any configuration created by the [multi-region abstractions]({% link {{ page.version.version }}/multiregion-overview.md %}).
{{site.data.alerts.end}}

#### Troubleshoot Zone Config Extensions

The process for troubleshooting Zone Config Extensions is the same as troubleshooting any other changes to zone configs.

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Change database owner

{% include {{page.version.version}}/sql/movr-statements.md %}
@@ -1283,3 +1301,4 @@ For more information about the region survival goal, see [Surviving region failu
- [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %})
- [Online Schema Changes]({% link {{ page.version.version }}/online-schema-changes.md %})
- [SQL Statements]({% link {{ page.version.version }}/sql-statements.md %})
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
8 changes: 6 additions & 2 deletions src/current/v24.3/alter-index.md
@@ -47,12 +47,12 @@ Subcommand | Description |

`ALTER INDEX ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for an index. To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).



You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

For examples, see [Replication Controls](#configure-replication-zones).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
@@ -225,6 +225,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER INDEX vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE DISCARD;
~~~

#### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Define partitions

#### Define a list partition on an index
10 changes: 10 additions & 0 deletions src/current/v24.3/alter-partition.md
@@ -9,6 +9,8 @@ docs_area: reference.sql

To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.


@@ -44,3 +46,11 @@ The user must have the [`CREATE`]({% link {{ page.version.version }}/grant.md %}
### Create a replication zone for a partition

{% include {{ page.version.version }}/zone-configs/create-a-replication-zone-for-a-table-partition.md hide-enterprise-warning="true" %}

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

## See also

- [Table partitioning]({% link {{page.version.version}}/partitioning.md %})
- [`SHOW PARTITIONS`]({% link {{page.version.version}}/show-partitions.md %})
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
16 changes: 9 additions & 7 deletions src/current/v24.3/alter-range.md
@@ -34,9 +36,11 @@ Additional parameters are documented for the respective [subcommands](#subcomman

### `CONFIGURE ZONE`

`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).
`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove [replication zones]({% link {{ page.version.version }}/configure-replication-zones.md %}) for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
You can use replication zones to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

@@ -121,7 +123,7 @@ For example, to get all range IDs, leaseholder store IDs, and leaseholder locali

{% include_cached copy-clipboard.html %}
~~~ sql
WITH user_info AS (SHOW RANGES FROM TABLE users) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
WITH user_info AS (SHOW RANGES FROM TABLE users WITH DETAILS) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
~~~

~~~
@@ -163,7 +165,7 @@ To move the leases for all data in the [`movr.users`]({% link {{ page.version.ve

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users'
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -205,7 +207,7 @@ To move the replicas for all data in the [`movr.users`]({% link {{ page.version.

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -231,7 +233,7 @@ To move all of a range's voting replicas from one store to another store:

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -261,7 +263,7 @@ This statement will only have an effect on clusters that have non-voting replica

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
4 changes: 2 additions & 2 deletions src/current/v24.3/alter-table.md
@@ -223,6 +223,8 @@ You can use *replication zones* to control the number and location of replicas f

For examples, see [Replication Controls](#configure-replication-zones).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
@@ -358,8 +360,6 @@ For usage, see [Synopsis](#synopsis).

`ALTER TABLE ... PARTITION BY` is used to partition, re-partition, or un-partition a table. After defining partitions, [`CONFIGURE ZONE`](#configure-zone) is used to control the replication and placement of partitions.



For examples, see [Define partitions](#define-partitions).

#### Parameters
3 changes: 2 additions & 1 deletion src/current/v24.3/backup.md
@@ -33,10 +33,10 @@ To view the contents of a backup created with the `BACKUP` statement, use [`SHO
## Considerations

- [Full cluster backups](#back-up-a-cluster) include [license keys]({% link {{ page.version.version }}/licensing-faqs.md %}#set-a-license). When you [restore]({% link {{ page.version.version }}/restore.md %}) a full cluster backup that includes a license, the license is also restored.
- [Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster](#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
- You cannot restore a backup of a multi-region database into a single-region database.
- Exclude a table's row data from a backup using the [`exclude_data_from_backup`]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#exclude-a-tables-data-from-backups) parameter.
- `BACKUP` is a blocking statement. To run a backup job asynchronously, use the `DETACHED` option. See the [options](#options) below.
- {% include {{ page.version.version }}/backups/zone-configs-overwritten-during-restore.md %}

### Storage considerations

@@ -378,3 +378,4 @@ To use an external connection URI to back up to cloud storage with an associated
- [`CREATE SCHEDULE FOR BACKUP`]({% link {{ page.version.version }}/create-schedule-for-backup.md %})
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
- [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %})
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
2 changes: 1 addition & 1 deletion src/current/v24.3/cluster-api.md
@@ -21,7 +21,7 @@ Endpoint | Name | Description | Support
[`/databases/{database}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseDetails) | Get database details | Get the descriptor ID of a specified database. | Stable
[`/databases/{database}/grants`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseGrants) | List database grants | List all [privileges](security-reference/authorization.html#managing-privileges) granted to users for a specified database. | Stable
[`/databases/{database}/tables`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseTables) | List database tables | List all tables in a specified database. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and zone configuration. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}). | Stable
[`/events`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listEvents) | List events | List the latest [events](eventlog.html) on the cluster, in descending order. | Unstable
[`/health`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/health) | Check node health | Determine if the node is running and ready to accept SQL connections. | Stable
[`/nodes`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listNodes) | List nodes | Get details on all nodes in the cluster, including node IDs, software versions, and hardware. | Stable
24 changes: 12 additions & 12 deletions src/current/v24.3/cluster-setup-troubleshooting.md
@@ -587,6 +587,18 @@ If you still see under-replicated/unavailable ranges on the Cluster Overview pag
1. To view the **Range Report** for a range, click on the range number in the **Under-replicated (or slow)** table or **Unavailable** table.
1. On the Range Report page, scroll down to the **Simulated Allocator Output** section. The table contains an error message which explains the reason for the under-replicated range. Follow the guidance in the message to resolve the issue. If you need help understanding the error or the guidance, [file an issue]({% link {{ page.version.version }}/file-an-issue.md %}). Please be sure to include the full Range Report and error message when you submit the issue.

#### Check for under-replicated or unavailable data

To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Critical nodes endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint).

#### Check for replication zone constraint violations

To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Check for critical localities](#check-for-critical-localities).

#### Check for critical localities

To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in the [Critical nodes endpoint documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint). A locality is "critical" for a range if the range would become unavailable when all of the nodes in that locality become [unreachable](#node-liveness-issues). In other words, the locality contains a majority of the range's replicas.
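
As a rough illustration of the definition above, the following Python sketch flags the localities that hold a majority of one range's replicas. The function name `critical_localities` and the input shape (one locality string per replica) are assumptions made for this example; they are not part of CockroachDB or of the critical nodes endpoint.

```python
from collections import Counter


def critical_localities(replica_localities):
    """Return the localities holding a majority of a range's replicas.

    `replica_localities` is a hypothetical input: one locality name per
    replica of a single range, e.g. ["us-east", "us-east", "us-west"].
    """
    total = len(replica_localities)
    counts = Counter(replica_localities)
    # A locality is flagged when it holds strictly more than half of the
    # replicas, which is the "majority" rule stated in the text above.
    return sorted(loc for loc, n in counts.items() if 2 * n > total)


if __name__ == "__main__":
    # Two of three replicas live in us-east, so us-east is flagged.
    print(critical_localities(["us-east", "us-east", "us-west"]))
    # Replicas are spread evenly, so no locality is flagged.
    print(critical_localities(["us-east", "us-west", "eu-west"]))
```

In this sketch, a range with two replicas in `us-east` and one in `us-west` reports `us-east` as critical, while a range spread evenly across three localities reports none.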

## Node liveness issues

"Node liveness" refers to whether a node in your cluster has been determined to be "dead" or "alive" by the rest of the cluster. This is achieved using checks that ensure that each node connected to the cluster is updating its liveness record. This information is shared with the rest of the cluster using an internal gossip protocol.
@@ -633,18 +645,6 @@ If your cluster is in a partially-available state due to a recent node or networ

Even with `server.eventlog.enabled` set to `false`, notable log events are still sent to configured [log sinks]({% link {{ page.version.version }}/configure-logs.md %}#configure-log-sinks) as usual.

## Check for under-replicated or unavailable data

To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).

## Check for replication zone constraint violations

To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).

## Check for critical localities

To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.

## Something else?

If we do not have a solution here, you can try using our other [support resources]({% link {{ page.version.version }}/support-resources.md %}), including: