The cost-based optimizer seeks the lowest cost for a query, usually related to time.
How is cost calculated?
A given SQL query can have thousands of equivalent query plans with vastly different execution times. The cost-based optimizer enumerates these plans and chooses the lowest cost plan.
Cost is roughly calculated by:
- Estimating how much time each node in the query plan will use to process all results
- Modeling how data flows through the query plan
The most important factor in determining the quality of a plan is cardinality (i.e., the number of rows); the fewer rows each SQL operator needs to process, the faster the query will run.
Table statistics
The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.
By default, CockroachDB generates table statistics automatically when tables are created, and as they are updated. It does this using a background job that automatically determines which columns to get statistics on — specifically, it chooses:
- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.
Schema changes trigger automatic statistics collection for the affected table(s).
Controlling automatic statistics
For best query performance, most users should leave automatic statistics enabled with the default settings. The information provided in this section is useful for troubleshooting or performance tuning by advanced users.
Controlling statistics refresh rate
Statistics are refreshed in the following cases:
- When there are no statistics.
- When it's been a long time since the last refresh, where "long time" is defined according to a moving average of the time across the last several refreshes.
- After each mutation operation (
INSERT
,UPDATE
, orDELETE
), the probability of a refresh is calculated using a formula that takes the cluster settings shown below as inputs. These settings define the target number of rows in a table that should be stale before statistics on that table are refreshed. Increasing either setting will reduce the frequency of refreshes. In particular,min_stale_rows
impacts the frequency of refreshes for small tables, whilefraction_stale_rows
has more of an impact on larger tables.
Setting | Default Value | Details |
---|---|---|
sql.stats.automatic_collection.fraction_stale_rows |
0.2 | Target fraction of stale rows per table that will trigger a statistics refresh |
sql.stats.automatic_collection.min_stale_rows |
500 | Target minimum number of stale rows per table that will trigger a statistics refresh |
Because the formula for statistics refreshes is probabilistic, you should not expect to see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated.
Turning off statistics
If you need to turn off automatic statistics collection, follow the steps below:
Run the following statement to disable the automatic statistics cluster setting:
> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
Use the
SHOW STATISTICS
statement to view automatically generated statistics.Delete the automatically generated statistics using the following statement:
> DELETE FROM system.table_statistics WHERE true;
Restart the nodes in your cluster to clear the statistics caches.
For instructions showing how to manually generate statistics, see the examples in the CREATE STATISTICS
documentation.
Controlling histogram collection
New in v19.2: By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of CREATE STATISTICS
, a histogram will be collected, regardless of whether or not the column is part of an index.
If you are an advanced user and need to disable histogram collection for troubleshooting or performance tuning reasons, change the sql.stats.histogram_collection.enabled
cluster setting by running SET CLUSTER SETTING
as follows:
SET CLUSTER SETTING sql.stats.histogram_collection.enabled = false;
When sql.stats.histogram_collection.enabled
is set to false
, histograms are never collected, either as part of automatic statistics collection or by manual invocation of CREATE STATISTICS
.
Query plan cache
CockroachDB uses a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan each time a query is executed.
The query plan cache is enabled by default. To disable it, execute the following statement:
> SET CLUSTER SETTING sql.query_cache.enabled = false;
Finally, note that only the following statements use the plan cache:
Join reordering
The cost-based optimizer will explore additional join orderings in an attempt to find the lowest-cost execution plan for a query involving multiple joins, which can lead to significantly better performance in some cases.
Because this process leads to an exponential increase in the number of possible execution plans for such queries, it's only used to reorder subtrees containing 4 or fewer joins by default.
To change this setting, which is controlled by the reorder_joins_limit
session variable, run the statement shown below. To disable this feature, set the variable to 0
.
> SET reorder_joins_limit = 6;
We strongly recommend not setting this value higher than 8 to avoid performance degradation. If set too high, the cost of generating and costing execution plans can end up dominating the total execution time of the query.
For more information about the difficulty of selecting an optimal join ordering, see our blog post An Introduction to Join Ordering.
Join hints
The optimizer supports hint syntax to force the use of a specific join algorithm. The algorithm is specified between the join type (INNER
, LEFT
, etc.) and the JOIN
keyword, for example:
INNER HASH JOIN
OUTER MERGE JOIN
LEFT LOOKUP JOIN
CROSS HASH JOIN
Note that the hint cannot be specified with a bare hint keyword (e.g., MERGE
) - in that case, the INNER
keyword must be added. For example, a INNER MERGE JOIN b
will work, but a MERGE JOIN b
will not work.
Join hints cannot be specified with a bare hint keyword (e.g., MERGE
) due to SQL's implicit AS
syntax. If you're not careful, you can make MERGE
be an alias for a table; for example, a MERGE JOIN b
will be interpreted as having an implicit AS
and be executed as a AS MERGE JOIN b
, which is just a long way of saying a JOIN b
. Because the resulting query might execute without returning any hint-related error (because it is valid SQL), it will seem like the join hint "worked", but actually it didn't affect which join algorithm was used. In this case, the correct syntax is a INNER MERGE JOIN b
.
Supported join algorithms
HASH
: Forces a hash join; in other words, it disables merge and lookup joins. A hash join is always possible, even if there are no equality columns - CockroachDB considers the nested loop join with no index a degenerate case of the hash join (i.e., a hash table with one bucket).MERGE
: Forces a merge join, even if it requires re-sorting both sides of the join.LOOKUP
: Forces a lookup join into the right side; the right side must be a table with a suitable index. Note thatLOOKUP
can only be used withINNER
andLEFT
joins.
If it is not possible to use the algorithm specified in the hint, an error is signaled.
Additional considerations
This syntax is consistent with the SQL Server syntax for join hints, except that:
- SQL Server uses
LOOP
instead ofLOOKUP
. - CockroachDB does not support
LOOP
and instead supportsLOOKUP
for the specific case of nested loop joins with an index.
- SQL Server uses
When a join hint is specified, the two tables will not be reordered by the optimizer. The reordering behavior has the following characteristics, which can be affected by hints:
- Given
a JOIN b
, CockroachDB will not try to commute tob JOIN a
. This means that you will need to pay attention to this ordering, which is especially important for lookup joins. Without a hint,a JOIN b
might be executed asb INNER LOOKUP JOIN a
using an index intoa
, whereasa INNER LOOKUP JOIN b
requires an index intob
. (a JOIN b) JOIN c
might be changed toa JOIN (b JOIN c)
, but this does not happen ifa JOIN b
uses a hint; the hint forces that particular join to happen as written in the query.
- Given
Hint usage should be reconsidered with each new release of CockroachDB. Due to improvements in the optimizer, hints specified to work with an older version may cause decreased performance in a newer version.
Preferring the nearest index
Given multiple identical indexes that have different locality constraints using replication zones, the optimizer will prefer the index that is closest to the gateway node that is planning the query. In a properly configured geo-distributed cluster, this can lead to performance improvements due to improved data locality and reduced network traffic.
This feature is only available to users with an enterprise license. For insight into how to use this feature to get low latency, consistent reads in multi-region deployments, see the Duplicate Indexes topology pattern.
This feature enables scenarios such as:
- Reference data such as a table of postal codes that can be replicated to different regions, and queries will use the copy in the same region. See Example - zone constraints for more details.
- Optimizing for local reads (potentially at the expense of writes) by adding leaseholder preferences to your zone configuration. See Example - leaseholder preferences for more details.
To take advantage of this feature, you need to:
- Have an enterprise license.
- Determine which data consists of reference tables that are rarely updated (such as postal codes) and can therefore be easily replicated to different regions.
- Create multiple secondary indexes on the reference tables. Note that these indexes must include (in key or using
STORED
) every column that you wish to query. For example, if you runSELECT * from db.table
and not every column ofdb.table
is in the set of secondary indexes you created, the optimizer will have no choice but to fall back to the primary index. - Create replication zones for each index.
With the above pieces in place, the optimizer will automatically choose the index nearest the gateway node that is planning the query.
The optimizer does not actually understand geographic locations, i.e., the relative closeness of the gateway node to other nodes that are located to its "east" or "west". It is matching against the node locality constraints you provided when you configured your replication zones.
Examples
Zone constraints
We can demonstrate the necessary configuration steps using a local cluster. The instructions below assume that you are already familiar with:
- How to start a local cluster.
- The syntax for assigning node locality when configuring replication zones.
- Using the built-in SQL client.
First, start 3 local nodes as shown below. Use the --locality
flag to put them each in a different region as denoted by region=usa
, region=eu
, etc.
$ cockroach start --locality=region=usa --insecure --store=/tmp/node0 --listen-addr=localhost:26257 \
--http-port=8888 --join=localhost:26257,localhost:26258,localhost:26259 --background
$ cockroach start --locality=region=eu --insecure --store=/tmp/node1 --listen-addr=localhost:26258 \
--http-port=8889 --join=localhost:26257,localhost:26258,localhost:26259 --background
$ cockroach start --locality=region=apac --insecure --store=/tmp/node2 --listen-addr=localhost:26259 \
--http-port=8890 --join=localhost:26257,localhost:26258,localhost:26259 --background
$ cockroach init --insecure --host=localhost --port=26257
Next, from the SQL client, add your organization name and enterprise license:
$ cockroach sql --insecure --host=localhost --port=26257
> SET CLUSTER SETTING cluster.organization = 'FooCorp - Local Testing';
> SET CLUSTER SETTING enterprise.license = 'xxxxx';
Create a test database and table. The table will have 3 indexes into the same data. Later, we'll configure the cluster to associate each of these indexes with a different datacenter using replication zones.
> CREATE DATABASE IF NOT EXISTS test;
> USE test;
CREATE TABLE postal_codes (
id INT PRIMARY KEY,
code STRING,
INDEX idx_eu (id) STORING (code),
INDEX idx_apac (id) STORING (code)
);
Next, we modify the replication zone configuration via SQL so that:
- Nodes in the USA will use the primary key index.
- Nodes in the EU will use the
postal_codes@idx_eu
index (which is identical to the primary key index). - Nodes in APAC will use the
postal_codes@idx_apac
index (which is also identical to the primary key index).
ALTER TABLE postal_codes CONFIGURE ZONE USING constraints='["+region=usa"]';
ALTER INDEX postal_codes@idx_eu CONFIGURE ZONE USING constraints='["+region=eu"]';
ALTER INDEX postal_codes@idx_apac CONFIGURE ZONE USING constraints='["+region=apac"]';
To verify this feature is working as expected, we'll query the database from each of our local nodes as shown below. Each node has been configured to be in a different region, and it should now be using the index pinned to that region.
In a geo-distributed scenario with a cluster that spans multiple datacenters, it may take time for the optimizer to fetch schemas from other nodes the first time a query is planned; thereafter, the schema should be cached locally.
For example, if you have 11 nodes, you may see 11 queries with high latency due to schema cache misses. Once all nodes have cached the schema locally, the latencies will drop.
This behavior may also cause the Statements page of the Web UI to show misleadingly high latencies until schemas are cached locally.
As expected, the node in the USA region uses the primary key index.
$ cockroach sql --insecure --host=localhost --port=26257 --database=test -e 'EXPLAIN SELECT * FROM postal_codes WHERE id=1;'
tree | field | description
+------+-------+----------------------+
scan | |
| table | postal_codes@primary
| spans | /1-/1/#
(3 rows)
As expected, the node in the EU uses the idx_eu
index.
$ cockroach sql --insecure --host=localhost --port=26258 --database=test -e 'EXPLAIN SELECT * FROM postal_codes WHERE id=1;'
tree | field | description
+------+-------+---------------------+
scan | |
| table | postal_codes@idx_eu
| spans | /1-/2
(3 rows)
As expected, the node in APAC uses the idx_apac
index.
$ cockroach sql --insecure --host=localhost --port=26259 --database=test -e 'EXPLAIN SELECT * FROM postal_codes WHERE id=1;'
tree | field | description
+------+-------+-----------------------+
scan | |
| table | postal_codes@idx_apac
| spans | /1-/2
(3 rows)
You'll need to make changes to the above configuration to reflect your production environment, but the concepts will be the same.
Leaseholder preferences
If you provide leaseholder preferences in addition to replication zone constraints, the optimizer will attempt to take your leaseholder preferences into account as well when selecting an index for your query. There are several factors to keep in mind:
Zone constraints are always respected (hard constraint), whereas lease preferences are taken into account as "additional information" -- as long as they do not contradict the zone constraints.
The optimizer does not consider the real-time location of leaseholders when selecting an index; it is pattern matching on the text values passed in the configuration (e.g., the
ALTER INDEX
statements shown below). For the same reason, the optimizer only matches against the first locality in yourlease_preferences
array.The optimizer may use an index that satisfies your leaseholder preferences even though that index has moved to a different node/region due to leaseholder rebalancing. This can cause slower performance than you expected. Therefore, you should only use this feature if you’re confident you know where the leaseholders will end up based on your cluster's usage patterns. We recommend thoroughly testing your configuration to ensure the optimizer is selecting the index(es) you expect.
In this example, we'll set up an authentication service using the access token / refresh token pattern from OAuth 2. To support fast local reads in our geo-distributed use case, we will have 3 indexes into the same authentication data: one for each region of our cluster. We configure each index using zone configurations and lease preferences so that the optimizer will use the local index for better performance.
The instructions below assume that you are already familiar with:
- How to start a local cluster.
- The syntax for assigning node locality when configuring replication zones.
- Using the built-in SQL client.
First, start 3 local nodes as shown below. Use the --locality
flag to put them each in a different region.
$ cockroach start --locality=region=us-east --insecure --store=/tmp/node0 --listen-addr=localhost:26257 \
--http-port=8888 --join=localhost:26257,localhost:26258,localhost:26259 --background
$ cockroach start --locality=region=us-central --insecure --store=/tmp/node1 --listen-addr=localhost:26258 \
--http-port=8889 --join=localhost:26257,localhost:26258,localhost:26259 --background
$ cockroach start --locality=region=us-west --insecure --store=/tmp/node2 --listen-addr=localhost:26259 \
--http-port=8890 --join=localhost:26257,localhost:26258,localhost:26259 --background
$ cockroach init --insecure --host=localhost --port=26257
From the SQL client, add your organization name and enterprise license:
$ cockroach sql --insecure --host=localhost --port=26257
> SET CLUSTER SETTING cluster.organization = 'FooCorp - Local Testing';
> SET CLUSTER SETTING enterprise.license = 'xxxxx';
Create an authentication database and table:
> CREATE DATABASE if NOT EXISTS auth;
> USE auth;
> CREATE TABLE token (
token_id VARCHAR(100) NULL,
access_token VARCHAR(4000) NULL,
refresh_token VARCHAR(4000) NULL
);
Create the indexes for each region:
> CREATE INDEX token_id_west_idx ON token (token_id) STORING (access_token, refresh_token);
> CREATE INDEX token_id_central_idx ON token (token_id) STORING (access_token, refresh_token);
> CREATE INDEX token_id_east_idx ON token (token_id) STORING (access_token, refresh_token);
Enter zone configurations to distribute replicas across the cluster as follows:
- For the "East" index, store 2 replicas in the East, 2 in Central, and 1 in the West. Further, prefer that the leaseholders for that index live in the East or, failing that, in the Central region.
- Follow the same replica and leaseholder patterns for each of the Central and West regions.
The idea is that, for example, token_id_east_idx
will have sufficient replicas (2/5) so that even if one replica goes down, the leaseholder will stay in the East region. That way, if a query comes in that accesses the columns covered by that index from the East gateway node, the optimizer will select token_id_east_idx
for fast reads.
The ALTER TABLE
statement below is not required since it's later made redundant by the token_id_west_idx
index. In production, you might go with the ALTER TABLE
to put your table's lease preferences in the West, and then create only 2 indexes (for East and Central); however, the use of 3 indexes makes the example easier to understand.
> ALTER TABLE token CONFIGURE ZONE USING
num_replicas = 5, constraints = '{+region=us-east: 1, +region=us-central: 2, +region=us-west: 2}', lease_preferences = '[[+region=us-west], [+region=us-central]]';
> ALTER INDEX token_id_east_idx CONFIGURE ZONE USING num_replicas = 5,
constraints = '{+region=us-east: 2, +region=us-central: 2, +region=us-west: 1}', lease_preferences = '[[+region=us-east], [+region=us-central]]';
> ALTER INDEX token_id_central_idx CONFIGURE ZONE USING num_replicas = 5,
constraints = '{+region=us-east: 2, +region=us-central: 2, +region=us-west: 1}', lease_preferences = '[[+region=us-central], [+region=us-east]]';
> ALTER INDEX token_id_west_idx CONFIGURE ZONE USING num_replicas = 5,
constraints = '{+region=us-west: 2, +region=us-central: 2, +region=us-east: 1}', lease_preferences = '[[+region=us-west], [+region=us-central]]';
Next let's check our zone configurations to make sure they match our expectation:
> SHOW ZONE CONFIGURATIONS;
The output should include the following:
TABLE auth.public.token | ALTER TABLE auth.public.token CONFIGURE ZONE USING
| num_replicas = 5,
| constraints = '{+region=us-central: 2, +region=us-east: 1, +region=us-west: 2}',
| lease_preferences = '[[+region=us-west], [+region=us-central]]'
INDEX auth.public.token@token_id_east_idx | ALTER INDEX auth.public.token@token_id_east_idx CONFIGURE ZONE USING
| num_replicas = 5,
| constraints = '{+region=us-central: 2, +region=us-east: 2, +region=us-west: 1}',
| lease_preferences = '[[+region=us-east], [+region=us-central]]'
INDEX auth.public.token@token_id_central_idx | ALTER INDEX auth.public.token@token_id_central_idx CONFIGURE ZONE USING
| num_replicas = 5,
| constraints = '{+region=us-central: 2, +region=us-east: 2, +region=us-west: 1}',
| lease_preferences = '[[+region=us-central], [+region=us-east]]'
INDEX auth.public.token@token_id_west_idx | ALTER INDEX auth.public.token@token_id_west_idx CONFIGURE ZONE USING
| num_replicas = 5,
| constraints = '{+region=us-central: 2, +region=us-east: 1, +region=us-west: 2}',
| lease_preferences = '[[+region=us-west], [+region=us-central]]'
Now that we've set up our indexes the way we want them, we need to insert some data. The first statement below inserts 10,000 rows of placeholder data; the second inserts a row with a specific UUID string that we'll later query against to check which index is used.
On a freshly created cluster like this one, you may need to wait a moment after adding the data to give automatic statistics time to update. Then, the optimizer can generate a query plan that uses the expected index.
> INSERT
INTO
token (token_id, access_token, refresh_token)
SELECT
gen_random_uuid()::STRING,
gen_random_uuid()::STRING,
gen_random_uuid()::STRING
FROM
generate_series(1, 10000);
> INSERT
INTO
token (token_id, access_token, refresh_token)
VALUES
(
'2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9',
'49E36152-6152-11E9-8CDC-3682F23211D9',
'4E0E91B6-6152-11E9-BAC1-3782F23211D9'
);
Finally, we EXPLAIN
a selection query from each node to verify which index is being queried against. For example, when running the query shown below against the us-west
node, we expect it to use the token_id_west_idx
index.
$ cockroach sql --insecure --host=localhost --port=26259 --database=auth # "West" node
> EXPLAIN
SELECT
access_token, refresh_token
FROM
token
WHERE
token_id = '2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9';
tree | field | description
+-----------+-------+-------------------------------------------------------------------------------------------+
render | |
└── scan | |
| table | token@token_id_west_idx
| spans | /"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"-/"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"/PrefixEnd
(4 rows)
Time: 787µs
Similarly, queries from the us-east
node should use the token_id_east_idx
index (and the same should be true for us-central
).
$ cockroach sql --insecure --host=localhost --port=26257 --database=auth # "East" node
> EXPLAIN
SELECT
access_token, refresh_token
FROM
token
WHERE
token_id = '2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9';
tree | field | description
+-----------+-------+-------------------------------------------------------------------------------------------+
render | |
└── scan | |
| table | token@token_id_east_idx
| spans | /"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"-/"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"/PrefixEnd
(4 rows)
Time: 619µs
You'll need to make changes to the above configuration to reflect your production environment, but the concepts will be the same.