Cost-Based Optimizer

On this page Carat arrow pointing down
Warning:
As of November 12, 2021, CockroachDB v20.1 is no longer supported. For more details, refer to the Release Support Policy.

The cost-based optimizer seeks the lowest cost for a query, usually related to time.

How is cost calculated?

A given SQL query can have thousands of equivalent query plans with vastly different execution times. The cost-based optimizer enumerates these plans and chooses the lowest cost plan.

Cost is roughly calculated by:

  • Estimating how much time each node in the query plan will use to process all results
  • Modeling how data flows through the query plan

The most important factor in determining the quality of a plan is cardinality (i.e., the number of rows); the fewer rows each SQL operator needs to process, the faster the query will run.

Table statistics

The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.

By default, CockroachDB generates table statistics automatically when tables are created, and as they are updated. It does this using a background job that automatically determines which columns to get statistics on — specifically, it chooses:

  • Columns that are part of the primary key or an index (in other words, all indexed columns).
  • Up to 100 non-indexed columns.
Note:

Schema changes trigger automatic statistics collection for the affected table(s).

Controlling automatic statistics

For best query performance, most users should leave automatic statistics enabled with the default settings. The information provided in this section is useful for troubleshooting or performance tuning by advanced users.

Controlling statistics refresh rate

Statistics are refreshed in the following cases:

  1. When there are no statistics.
  2. When it's been a long time since the last refresh, where "long time" is defined according to a moving average of the time across the last several refreshes.
  3. After each mutation operation (INSERT, UPDATE, or DELETE), the probability of a refresh is calculated using a formula that takes the cluster settings shown below as inputs. These settings define the target number of rows in a table that should be stale before statistics on that table are refreshed. Increasing either setting will reduce the frequency of refreshes. In particular, min_stale_rows impacts the frequency of refreshes for small tables, while fraction_stale_rows has more of an impact on larger tables.
Setting Default Value Details
sql.stats.automatic_collection.fraction_stale_rows 0.2 Target fraction of stale rows per table that will trigger a statistics refresh
sql.stats.automatic_collection.min_stale_rows 500 Target minimum number of stale rows per table that will trigger a statistics refresh
Note:

Because the formula for statistics refreshes is probabilistic, you should not expect to see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated.

Turning off statistics

If you need to turn off automatic statistics collection, follow the steps below:

  1. Run the following statement to disable the automatic statistics cluster setting:

    icon/buttons/copy
    > SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
    
  2. Use the SHOW STATISTICS statement to view automatically generated statistics.

  3. Delete the automatically generated statistics using the following statement:

    icon/buttons/copy
    > DELETE FROM system.table_statistics WHERE true;
    
  4. Restart the nodes in your cluster to clear the statistics caches.

For instructions showing how to manually generate statistics, see the examples in the CREATE STATISTICS documentation.

Controlling histogram collection

By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of CREATE STATISTICS, a histogram will be collected, regardless of whether or not the column is part of an index.

Note:

CockroachDB does not support histograms on ARRAY-typed columns. As a result, statistics created on ARRAY-typed columns do not include histograms.

If you are an advanced user and need to disable histogram collection for troubleshooting or performance tuning reasons, change the sql.stats.histogram_collection.enabled cluster setting by running SET CLUSTER SETTING as follows:

icon/buttons/copy
SET CLUSTER SETTING sql.stats.histogram_collection.enabled = false;
Note:

When sql.stats.histogram_collection.enabled is set to false, histograms are never collected, either as part of automatic statistics collection or by manual invocation of CREATE STATISTICS.

Query plan cache

CockroachDB uses a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan each time a query is executed.

The query plan cache is enabled by default. To disable it, execute the following statement:

icon/buttons/copy
> SET CLUSTER SETTING sql.query_cache.enabled = false;

Finally, note that only the following statements use the plan cache:

Join reordering

The cost-based optimizer will explore additional join orderings in an attempt to find the lowest-cost execution plan for a query involving multiple joins, which can lead to significantly better performance in some cases.

Because this process leads to an exponential increase in the number of possible execution plans for such queries, it's only used to reorder subtrees containing 4 or fewer joins by default.

To change this setting, which is controlled by the reorder_joins_limit session variable, run the statement shown below. To disable this feature, set the variable to 0.

icon/buttons/copy
> SET reorder_joins_limit = 6;
Warning:

We strongly recommend not setting this value higher than 8 to avoid performance degradation. If set too high, the cost of generating and costing execution plans can end up dominating the total execution time of the query.

For more information about the difficulty of selecting an optimal join ordering, see our blog post An Introduction to Join Ordering.

Join hints

The optimizer supports hint syntax to force the use of a specific join algorithm. The algorithm is specified between the join type (INNER, LEFT, etc.) and the JOIN keyword, for example:

  • INNER HASH JOIN
  • OUTER MERGE JOIN
  • LEFT LOOKUP JOIN
  • CROSS HASH JOIN

Note that the hint cannot be specified with a bare hint keyword (e.g., MERGE) - in that case, the INNER keyword must be added. For example, a INNER MERGE JOIN b will work, but a MERGE JOIN b will not work.

Note:

Join hints cannot be specified with a bare hint keyword (e.g., MERGE) due to SQL's implicit AS syntax. If you're not careful, you can make MERGE be an alias for a table; for example, a MERGE JOIN b will be interpreted as having an implicit AS and be executed as a AS MERGE JOIN b, which is just a long way of saying a JOIN b. Because the resulting query might execute without returning any hint-related error (because it is valid SQL), it will seem like the join hint "worked", but actually it didn't affect which join algorithm was used. In this case, the correct syntax is a INNER MERGE JOIN b.

Supported join algorithms

  • HASH: Forces a hash join; in other words, it disables merge and lookup joins. A hash join is always possible, even if there are no equality columns - CockroachDB considers the nested loop join with no index a degenerate case of the hash join (i.e., a hash table with one bucket).

  • MERGE: Forces a merge join, even if it requires re-sorting both sides of the join.

  • LOOKUP: Forces a lookup join into the right side; the right side must be a table with a suitable index. Note that LOOKUP can only be used with INNER and LEFT joins.

If it is not possible to use the algorithm specified in the hint, an error is signaled.

Additional considerations

  • This syntax is consistent with the SQL Server syntax for join hints, except that:

    • SQL Server uses LOOP instead of LOOKUP.
    • CockroachDB does not support LOOP and instead supports LOOKUP for the specific case of nested loop joins with an index.
  • When a join hint is specified, the two tables will not be reordered by the optimizer. The reordering behavior has the following characteristics, which can be affected by hints:

    • Given a JOIN b, CockroachDB will not try to commute to b JOIN a. This means that you will need to pay attention to this ordering, which is especially important for lookup joins. Without a hint, a JOIN b might be executed as b INNER LOOKUP JOIN a using an index into a, whereas a INNER LOOKUP JOIN b requires an index into b.
    • (a JOIN b) JOIN c might be changed to a JOIN (b JOIN c), but this does not happen if a JOIN b uses a hint; the hint forces that particular join to happen as written in the query.
  • Hint usage should be reconsidered with each new release of CockroachDB. Due to improvements in the optimizer, hints specified to work with an older version may cause decreased performance in a newer version.

Preferring the nearest index

Given multiple identical indexes that have different locality constraints using replication zones, the optimizer will prefer the index that is closest to the gateway node that is planning the query. In a properly configured geo-distributed cluster, this can lead to performance improvements due to improved data locality and reduced network traffic.

Note:

This feature is only available to users with an enterprise license. For insight into how to use this feature to get low latency, consistent reads in multi-region deployments, see the Duplicate Indexes topology pattern.

This feature enables scenarios such as:

  • Reference data such as a table of postal codes that can be replicated to different regions, and queries will use the copy in the same region. See Example - zone constraints for more details.
  • Optimizing for local reads (potentially at the expense of writes) by adding leaseholder preferences to your zone configuration. See Example - leaseholder preferences for more details.

To take advantage of this feature, you need to:

  1. Have an enterprise license.
  2. Determine which data consists of reference tables that are rarely updated (such as postal codes) and can therefore be easily replicated to different regions.
  3. Create multiple secondary indexes on the reference tables. Note that these indexes must include (in key or using STORED) every column that you wish to query. For example, if you run SELECT * from db.table and not every column of db.table is in the set of secondary indexes you created, the optimizer will have no choice but to fall back to the primary index.
  4. Create replication zones for each index.

With the above pieces in place, the optimizer will automatically choose the index nearest the gateway node that is planning the query.

Note:

The optimizer does not actually understand geographic locations, i.e., the relative closeness of the gateway node to other nodes that are located to its "east" or "west". It is matching against the node locality constraints you provided when you configured your replication zones.

Examples

Zone constraints

We can demonstrate the necessary configuration steps using a local cluster. The instructions below assume that you are already familiar with:

First, start 3 local nodes as shown below. Use the --locality flag to put them each in a different region as denoted by region=usa, region=eu, etc.

icon/buttons/copy
$ cockroach start --locality=region=usa  --insecure --store=/tmp/node0 --listen-addr=localhost:26257 \
  --http-port=8888  --join=localhost:26257,localhost:26258,localhost:26259 --background
icon/buttons/copy
$ cockroach start --locality=region=eu   --insecure --store=/tmp/node1 --listen-addr=localhost:26258 \
  --http-port=8889  --join=localhost:26257,localhost:26258,localhost:26259 --background
icon/buttons/copy
$ cockroach start --locality=region=apac --insecure --store=/tmp/node2 --listen-addr=localhost:26259 \
  --http-port=8890  --join=localhost:26257,localhost:26258,localhost:26259 --background
icon/buttons/copy
$ cockroach init --insecure --host=localhost --port=26257

Next, from the SQL client, add your organization name and enterprise license:

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26257
icon/buttons/copy
> SET CLUSTER SETTING cluster.organization = 'FooCorp - Local Testing';
icon/buttons/copy
> SET CLUSTER SETTING enterprise.license = 'xxxxx';

Create a test database and table. The table will have 3 indexes into the same data. Later, we'll configure the cluster to associate each of these indexes with a different datacenter using replication zones.

icon/buttons/copy
> CREATE DATABASE IF NOT EXISTS test;
icon/buttons/copy
> USE test;
icon/buttons/copy
CREATE TABLE postal_codes (
    id INT PRIMARY KEY,
    code STRING,
    INDEX idx_eu (id) STORING (code),
    INDEX idx_apac (id) STORING (code)
);

Next, we modify the replication zone configuration via SQL so that:

  • Nodes in the USA will use the primary key index.
  • Nodes in the EU will use the postal_codes@idx_eu index (which is identical to the primary key index).
  • Nodes in APAC will use the postal_codes@idx_apac index (which is also identical to the primary key index).
icon/buttons/copy
ALTER TABLE postal_codes CONFIGURE ZONE USING constraints='["+region=usa"]';
icon/buttons/copy
ALTER INDEX postal_codes@idx_eu CONFIGURE ZONE USING constraints='["+region=eu"]';
icon/buttons/copy
ALTER INDEX postal_codes@idx_apac CONFIGURE ZONE USING constraints='["+region=apac"]';

To verify this feature is working as expected, we'll query the database from each of our local nodes as shown below. Each node has been configured to be in a different region, and it should now be using the index pinned to that region.

Note:

In a geo-distributed scenario with a cluster that spans multiple datacenters, it may take time for the optimizer to fetch schemas from other nodes the first time a query is planned; thereafter, the schema should be cached locally.

For example, if you have 11 nodes, you may see 11 queries with high latency due to schema cache misses. Once all nodes have cached the schema locally, the latencies will drop.

This behavior may also cause the Statements page of the Web UI to show misleadingly high latencies until schemas are cached locally.

As expected, the node in the USA region uses the primary key index.

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26257 --database=test -e 'EXPLAIN SELECT * FROM postal_codes WHERE id=1;'
  tree |    field    |     description
-------+-------------+-----------------------
       | distributed | false
       | vectorized  | false
  scan |             |
       | table       | postal_codes@primary
       | spans       | /1-/1/#
(5 rows)

As expected, the node in the EU uses the idx_eu index.

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26258 --database=test -e 'EXPLAIN SELECT * FROM postal_codes WHERE id=1;'
  tree |    field    |     description
-------+-------------+----------------------
       | distributed | false
       | vectorized  | false
  scan |             |
       | table       | postal_codes@idx_eu
       | spans       | /1-/2
(5 rows)

As expected, the node in APAC uses the idx_apac index.

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26259 --database=test -e 'EXPLAIN SELECT * FROM postal_codes WHERE id=1;'
  tree |    field    |      description
-------+-------------+------------------------
       | distributed | false
       | vectorized  | false
  scan |             |
       | table       | postal_codes@idx_apac
       | spans       | /1-/2
(5 rows)

You'll need to make changes to the above configuration to reflect your production environment, but the concepts will be the same.

Leaseholder preferences

If you provide leaseholder preferences in addition to replication zone constraints, the optimizer will attempt to take your leaseholder preferences into account as well when selecting an index for your query. There are several factors to keep in mind:

  • Zone constraints are always respected (hard constraint), whereas lease preferences are taken into account as "additional information" -- as long as they do not contradict the zone constraints.

  • The optimizer does not consider the real-time location of leaseholders when selecting an index; it is pattern matching on the text values passed in the configuration (e.g., the ALTER INDEX statements shown below). For the same reason, the optimizer only matches against the first locality in your lease_preferences array.

  • The optimizer may use an index that satisfies your leaseholder preferences even though that index has moved to a different node/region due to leaseholder rebalancing. This can cause slower performance than you expected. Therefore, you should only use this feature if you’re confident you know where the leaseholders will end up based on your cluster's usage patterns. We recommend thoroughly testing your configuration to ensure the optimizer is selecting the index(es) you expect.

In this example, we'll set up an authentication service using the access token / refresh token pattern from OAuth 2. To support fast local reads in our geo-distributed use case, we will have 3 indexes into the same authentication data: one for each region of our cluster. We configure each index using zone configurations and lease preferences so that the optimizer will use the local index for better performance.

The instructions below assume that you are already familiar with:

First, start 3 local nodes as shown below. Use the --locality flag to put them each in a different region.

icon/buttons/copy
$ cockroach start --locality=region=us-east  --insecure --store=/tmp/node0 --listen-addr=localhost:26257 \
  --http-port=8888  --join=localhost:26257,localhost:26258,localhost:26259 --background
icon/buttons/copy
$ cockroach start --locality=region=us-central   --insecure --store=/tmp/node1 --listen-addr=localhost:26258 \
  --http-port=8889  --join=localhost:26257,localhost:26258,localhost:26259 --background
icon/buttons/copy
$ cockroach start --locality=region=us-west --insecure --store=/tmp/node2 --listen-addr=localhost:26259 \
  --http-port=8890  --join=localhost:26257,localhost:26258,localhost:26259 --background
icon/buttons/copy
$ cockroach init --insecure --host=localhost --port=26257

From the SQL client, add your organization name and enterprise license:

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26257
icon/buttons/copy
> SET CLUSTER SETTING cluster.organization = 'FooCorp - Local Testing';
icon/buttons/copy
> SET CLUSTER SETTING enterprise.license = 'xxxxx';

Create an authentication database and table:

icon/buttons/copy
> CREATE DATABASE if NOT EXISTS auth;
icon/buttons/copy
> USE auth;
icon/buttons/copy
> CREATE TABLE token (
    token_id VARCHAR(100) NULL,
    access_token VARCHAR(4000) NULL,
    refresh_token VARCHAR(4000) NULL
  );

Create the indexes for each region:

icon/buttons/copy
> CREATE INDEX token_id_west_idx ON token (token_id) STORING (access_token, refresh_token);
icon/buttons/copy
> CREATE INDEX token_id_central_idx ON token (token_id) STORING (access_token, refresh_token);
icon/buttons/copy
> CREATE INDEX token_id_east_idx ON token (token_id) STORING (access_token, refresh_token);

Enter zone configurations to distribute replicas across the cluster as follows:

  • For the "East" index, store 2 replicas in the East, 2 in Central, and 1 in the West. Further, prefer that the leaseholders for that index live in the East or, failing that, in the Central region.
  • Follow the same replica and leaseholder patterns for each of the Central and West regions.

The idea is that, for example, token_id_east_idx will have sufficient replicas (2/5) so that even if one replica goes down, the leaseholder will stay in the East region. That way, if a query comes in that accesses the columns covered by that index from the East gateway node, the optimizer will select token_id_east_idx for fast reads.

Note:

The ALTER TABLE statement below is not required since it's later made redundant by the token_id_west_idx index. In production, you might go with the ALTER TABLE to put your table's lease preferences in the West, and then create only 2 indexes (for East and Central); however, the use of 3 indexes makes the example easier to understand.

icon/buttons/copy
> ALTER TABLE token CONFIGURE ZONE USING
        num_replicas = 5, constraints = '{+region=us-east: 1, +region=us-central: 2, +region=us-west: 2}', lease_preferences = '[[+region=us-west], [+region=us-central]]';
icon/buttons/copy
> ALTER INDEX token_id_east_idx CONFIGURE ZONE USING num_replicas = 5,
        constraints = '{+region=us-east: 2, +region=us-central: 2, +region=us-west: 1}', lease_preferences = '[[+region=us-east], [+region=us-central]]';
icon/buttons/copy
> ALTER INDEX token_id_central_idx CONFIGURE ZONE USING num_replicas = 5,
        constraints = '{+region=us-east: 2, +region=us-central: 2, +region=us-west: 1}', lease_preferences = '[[+region=us-central], [+region=us-east]]';
icon/buttons/copy
> ALTER INDEX token_id_west_idx CONFIGURE ZONE USING num_replicas = 5,
        constraints = '{+region=us-west: 2, +region=us-central: 2, +region=us-east: 1}', lease_preferences = '[[+region=us-west], [+region=us-central]]';

Next let's check our zone configurations to make sure they match our expectation:

icon/buttons/copy
> SHOW ZONE CONFIGURATIONS;

The output should include the following:

TABLE auth.public.token                      | ALTER TABLE auth.public.token CONFIGURE ZONE USING
                                             |     num_replicas = 5,
                                             |     constraints = '{+region=us-central: 2, +region=us-east: 1, +region=us-west: 2}',
                                             |     lease_preferences = '[[+region=us-west], [+region=us-central]]'
INDEX auth.public.token@token_id_east_idx    | ALTER INDEX auth.public.token@token_id_east_idx CONFIGURE ZONE USING
                                             |     num_replicas = 5,
                                             |     constraints = '{+region=us-central: 2, +region=us-east: 2, +region=us-west: 1}',
                                             |     lease_preferences = '[[+region=us-east], [+region=us-central]]'
INDEX auth.public.token@token_id_central_idx | ALTER INDEX auth.public.token@token_id_central_idx CONFIGURE ZONE USING
                                             |     num_replicas = 5,
                                             |     constraints = '{+region=us-central: 2, +region=us-east: 2, +region=us-west: 1}',
                                             |     lease_preferences = '[[+region=us-central], [+region=us-east]]'
INDEX auth.public.token@token_id_west_idx    | ALTER INDEX auth.public.token@token_id_west_idx CONFIGURE ZONE USING
                                             |     num_replicas = 5,
                                             |     constraints = '{+region=us-central: 2, +region=us-east: 1, +region=us-west: 2}',
                                             |     lease_preferences = '[[+region=us-west], [+region=us-central]]'

Now that we've set up our indexes the way we want them, we need to insert some data. The first statement below inserts 10,000 rows of placeholder data; the second inserts a row with a specific UUID string that we'll later query against to check which index is used.

Note:

On a freshly created cluster like this one, you may need to wait a moment after adding the data to give automatic statistics time to update. Then, the optimizer can generate a query plan that uses the expected index.

icon/buttons/copy
> INSERT
  INTO
      token (token_id, access_token, refresh_token)
  SELECT
      gen_random_uuid()::STRING,
      gen_random_uuid()::STRING,
      gen_random_uuid()::STRING
  FROM
      generate_series(1, 10000);
icon/buttons/copy
> INSERT
  INTO
      token (token_id, access_token, refresh_token)
  VALUES
      (
          '2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9',
          '49E36152-6152-11E9-8CDC-3682F23211D9',
          '4E0E91B6-6152-11E9-BAC1-3782F23211D9'
      );

Finally, we EXPLAIN a selection query from each node to verify which index is being queried against. For example, when running the query shown below against the us-west node, we expect it to use the token_id_west_idx index.

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26259 --database=auth # "West" node
icon/buttons/copy
> EXPLAIN
      SELECT
          access_token, refresh_token
      FROM
          token
      WHERE
          token_id = '2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9';
    tree    |    field    |                                        description
------------+-------------+--------------------------------------------------------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | token@token_id_east_idx
            | spans       | /"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"-/"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"/PrefixEnd
(6 rows)

Similarly, queries from the us-east node should use the token_id_east_idx index (and the same should be true for us-central).

icon/buttons/copy
$ cockroach sql --insecure --host=localhost --port=26257 --database=auth # "East" node
icon/buttons/copy
> EXPLAIN
      SELECT
          access_token, refresh_token
      FROM
          token
      WHERE
          token_id = '2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9';
    tree    |    field    |                                        description
------------+-------------+--------------------------------------------------------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | token@token_id_east_idx
            | spans       | /"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"-/"2E1B5BFE-6152-11E9-B9FD-A7E0F13211D9"/PrefixEnd
(6 rows)

You'll need to make changes to the above configuration to reflect your production environment, but the concepts will be the same.

See also


Yes No
On this page

Yes No