Use the CREATE STATISTICS
statement to generate table statistics for the cost-based optimizer to use.
Once you create a table and load data into it (e.g., INSERT
, IMPORT
), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs.
For compatibility with PostgreSQL, CockroachDB supports the ANALYZE
/ANALYSE
statement as an alias for CREATE STATISTICS
. For syntax, see below.
By default, CockroachDB automatically generates statistics on all indexed columns, and up to 100 non-indexed columns. As a result, most users do not need to issue CREATE STATISTICS
statements directly.
CockroachDB also automatically collects multi-column statistics on columns that prefix each index.
Syntax
Parameters
Parameter | Description |
---|---|
statistics_name |
The name of the set of statistics you are creating. |
opt_stats_columns |
The name of the column(s) you want to create statistics for. |
create_stats_target |
The name of the table you want to create statistics for. |
opt_as_of_clause |
Used to create historical stats using the AS OF SYSTEM TIME clause. For instructions, see Create statistics as of a given time. |
Required privileges
The user must have the CREATE
privilege on the parent database.
Aliases
For PostgreSQL compatibility, CockroachDB supports ANALYZE
and ANALYSE
as aliases for CREATE STATISTICS
.
Alias syntax
Alias parameters
Parameter | Description |
---|---|
analyze_target |
The name of the table for which you want to create statistics. |
Examples
Setup
The following examples use MovR, a fictional vehicle-sharing application, to demonstrate CockroachDB SQL statements. For more information about the MovR example application and dataset, see MovR: A Global Vehicle-sharing App.
To follow along, run cockroach demo
to start a temporary, in-memory cluster with the movr
dataset preloaded:
$ cockroach demo
Create statistics on a single column
> CREATE STATISTICS revenue_stats ON revenue FROM rides;
> SHOW STATISTICS FOR TABLE rides;
statistics_name | column_names | created | row_count | distinct_count | null_count | histogram_id
------------------+---------------------------+----------------------------------+-----------+----------------+------------+---------------------
__auto__ | {city} | 2020-08-26 16:55:24.725089+00:00 | 500 | 9 | 0 | 584550071425531905
__auto__ | {id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | 584550071432740865
__auto__ | {city,id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | NULL
__auto__ | {rider_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 50 | 0 | 584550071446732801
__auto__ | {city,rider_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 50 | 0 | NULL
__auto__ | {vehicle_city} | 2020-08-26 16:55:24.725089+00:00 | 500 | 9 | 0 | 584550071461019649
__auto__ | {vehicle_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 15 | 0 | 584550071467966465
__auto__ | {vehicle_city,vehicle_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 15 | 0 | NULL
__auto__ | {start_address} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | 584550071482122241
__auto__ | {end_address} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | 584550071489167361
__auto__ | {start_time} | 2020-08-26 16:55:24.725089+00:00 | 500 | 30 | 0 | 584550071496671233
__auto__ | {end_time} | 2020-08-26 16:55:24.725089+00:00 | 500 | 367 | 0 | 584550071504437249
__auto__ | {revenue} | 2020-08-26 16:55:24.725089+00:00 | 500 | 100 | 0 | 584550071512137729
revenue_stats | {revenue} | 2020-08-26 16:55:33.986698+00:00 | 500 | 100 | 0 | 584550101775384577
(14 rows)
Statistics are automatically collected for all columns in the rides
table, making the revenue_stats
statistics a duplicate of the statistics automatically collected on the revenue
column.
Create statistics on multiple columns
> CREATE STATISTICS city_revenue_stats ON city, revenue FROM rides;
> SHOW STATISTICS FOR TABLE rides;
statistics_name | column_names | created | row_count | distinct_count | null_count | histogram_id
---------------------+---------------------------+----------------------------------+-----------+----------------+------------+---------------------
__auto__ | {city} | 2020-08-26 16:55:24.725089+00:00 | 500 | 9 | 0 | 584550071425531905
__auto__ | {id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | 584550071432740865
__auto__ | {city,id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | NULL
__auto__ | {rider_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 50 | 0 | 584550071446732801
__auto__ | {city,rider_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 50 | 0 | NULL
__auto__ | {vehicle_city} | 2020-08-26 16:55:24.725089+00:00 | 500 | 9 | 0 | 584550071461019649
__auto__ | {vehicle_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 15 | 0 | 584550071467966465
__auto__ | {vehicle_city,vehicle_id} | 2020-08-26 16:55:24.725089+00:00 | 500 | 15 | 0 | NULL
__auto__ | {start_address} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | 584550071482122241
__auto__ | {end_address} | 2020-08-26 16:55:24.725089+00:00 | 500 | 500 | 0 | 584550071489167361
__auto__ | {start_time} | 2020-08-26 16:55:24.725089+00:00 | 500 | 30 | 0 | 584550071496671233
__auto__ | {end_time} | 2020-08-26 16:55:24.725089+00:00 | 500 | 367 | 0 | 584550071504437249
__auto__ | {revenue} | 2020-08-26 16:55:24.725089+00:00 | 500 | 100 | 0 | 584550071512137729
revenue_stats | {revenue} | 2020-08-26 16:55:33.986698+00:00 | 500 | 100 | 0 | 584550101775384577
city_revenue_stats | {city,revenue} | 2020-08-26 16:55:52.539795+00:00 | 500 | 372 | 0 | NULL
(15 rows)
Multi-column statistics are automatically collected for all columns that prefix an index. In this example, city
and revenue
are not an index prefix, making the city_revenue_stats
statistics unique for the table.
Create statistics on a default set of columns
The CREATE STATISTICS
statement shown below automatically figures out which columns to get statistics on.
> CREATE STATISTICS users_stats FROM users;
This statement creates statistics identical to the statistics that CockroachDB creates automatically.
> SHOW STATISTICS FOR TABLE users;
statistics_name | column_names | created | row_count | distinct_count | null_count | histogram_id
------------------+---------------+----------------------------------+-----------+----------------+------------+---------------------
__auto__ | {city} | 2020-08-26 16:55:24.765331+00:00 | 50 | 9 | 0 | 584550071556964353
__auto__ | {id} | 2020-08-26 16:55:24.765331+00:00 | 50 | 50 | 0 | 584550071563976705
__auto__ | {city,id} | 2020-08-26 16:55:24.765331+00:00 | 50 | 50 | 0 | NULL
__auto__ | {name} | 2020-08-26 16:55:24.765331+00:00 | 50 | 49 | 0 | 584550071577477121
__auto__ | {address} | 2020-08-26 16:55:24.765331+00:00 | 50 | 50 | 0 | 584550071583997953
__auto__ | {credit_card} | 2020-08-26 16:55:24.765331+00:00 | 50 | 50 | 0 | 584550071591141377
users_stats | {city} | 2020-08-26 16:56:12.802308+00:00 | 50 | 9 | 0 | 584550228973027329
users_stats | {id} | 2020-08-26 16:56:12.802308+00:00 | 50 | 50 | 0 | 584550228985905153
users_stats | {city,id} | 2020-08-26 16:56:12.802308+00:00 | 50 | 50 | 0 | NULL
users_stats | {name} | 2020-08-26 16:56:12.802308+00:00 | 50 | 49 | 0 | 584550229015625729
users_stats | {address} | 2020-08-26 16:56:12.802308+00:00 | 50 | 50 | 0 | 584550229028765697
users_stats | {credit_card} | 2020-08-26 16:56:12.802308+00:00 | 50 | 50 | 0 | 584550229043937281
(12 rows)
Create statistics as of a given time
To create statistics as of a given time (in this example, 1 minute ago to avoid interfering with the production workload), run a statement like the following:
> CREATE STATISTICS vehicle_stats_1 FROM vehicles AS OF SYSTEM TIME '-1m';
For more information about how the AS OF SYSTEM TIME
clause works, including supported time formats, see AS OF SYSTEM TIME
.
Delete statistics
To delete statistics for all tables in all databases:
DELETE FROM system.table_statistics WHERE true;
To delete a named set of statistics (e.g, one named "users_stats"), run a query like the following:
DELETE FROM system.table_statistics WHERE name = 'users_stats';
For more information about the DELETE
statement, see DELETE
.
View statistics jobs
Every time the CREATE STATISTICS
statement is executed, it kicks off a background job. This is true for queries issued by your application as well as queries issued by the automatic stats feature.
To view statistics jobs, there are two options:
Use
SHOW JOBS
to see all statistics jobs that were created by user queries (i.e., someone enteringCREATE STATISTICS
at the SQL prompt or via application code):> SELECT * FROM [SHOW JOBS] WHERE job_type LIKE '%CREATE STATS%';
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id ---------------------+--------------+--------------------------------------------------------------------------------------------------+-----------+-----------+-----------+----------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+--------------------+-------+----------------- 584550101732950017 | CREATE STATS | CREATE STATISTICS revenue_stats ON revenue FROM movr.public.rides | | root | succeeded | NULL | 2020-08-26 16:55:33.976113+00:00 | 2020-08-26 16:55:33.979043+00:00 | 2020-08-26 16:55:33.990197+00:00 | 2020-08-26 16:55:33.989405+00:00 | 1 | | NULL 584550162508382209 | CREATE STATS | CREATE STATISTICS city_revenue_stats ON city, revenue FROM movr.public.rides | | root | succeeded | NULL | 2020-08-26 16:55:52.523299+00:00 | 2020-08-26 16:55:52.527194+00:00 | 2020-08-26 16:55:52.544301+00:00 | 2020-08-26 16:55:52.543148+00:00 | 1 | | NULL 584550228891500545 | CREATE STATS | CREATE STATISTICS users_stats FROM movr.public.users | | root | succeeded | NULL | 2020-08-26 16:56:12.781808+00:00 | 2020-08-26 16:56:12.789111+00:00 | 2020-08-26 16:56:12.830659+00:00 | 2020-08-26 16:56:12.82907+00:00 | 1 | | NULL 584550307147874305 | CREATE STATS | CREATE STATISTICS vehicle_stats_1 FROM movr.public.vehicles WITH OPTIONS AS OF SYSTEM TIME '-1m' | | root | succeeded | NULL | 2020-08-26 16:56:36.663773+00:00 | 2020-08-26 16:56:36.668101+00:00 | 2020-08-26 16:56:36.705743+00:00 | 2020-08-26 16:56:36.704696+00:00 | 1 | | NULL (5 rows)
Use
SHOW AUTOMATIC JOBS
to see statistics jobs that were created by the automatic statistics feature:> SELECT * FROM [SHOW AUTOMATIC JOBS] WHERE job_type LIKE '%CREATE STATS%';
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id ---------------------+-------------------+---------------------------------------------------------------------+-------------------------------------------------------------------------------------------+-----------+-----------+----------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+--------------------+-------+----------------- 584550071026876417 | AUTO CREATE STATS | Table statistics refresh for movr.public.user_promo_codes | CREATE STATISTICS __auto__ FROM [58] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2020-08-26 16:55:24.605364+00:00 | 2020-08-26 16:55:24.608296+00:00 | 2020-08-26 16:55:24.632626+00:00 | 2020-08-26 16:55:24.631635+00:00 | 1 | | NULL 584550071124131841 | AUTO CREATE STATS | Table statistics refresh for movr.public.vehicle_location_histories | CREATE STATISTICS __auto__ FROM [56] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2020-08-26 16:55:24.635051+00:00 | 2020-08-26 16:55:24.636861+00:00 | 2020-08-26 16:55:24.672699+00:00 | 2020-08-26 16:55:24.671777+00:00 | 1 | | NULL 584550071255498753 | AUTO CREATE STATS | Table statistics refresh for movr.public.promo_codes | CREATE STATISTICS __auto__ FROM [57] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2020-08-26 16:55:24.675136+00:00 | 2020-08-26 16:55:24.677263+00:00 | 2020-08-26 16:55:24.709619+00:00 | 2020-08-26 16:55:24.708881+00:00 | 1 | | NULL 584550071376281601 | AUTO CREATE STATS | Table statistics refresh for movr.public.rides | CREATE STATISTICS __auto__ FROM [55] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2020-08-26 16:55:24.712003+00:00 | 2020-08-26 16:55:24.713674+00:00 | 2020-08-26 16:55:24.754449+00:00 | 2020-08-26 16:55:24.753735+00:00 | 1 | | NULL 584550071523082241 | AUTO CREATE STATS | Table statistics refresh for movr.public.users | CREATE STATISTICS __auto__ FROM [53] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2020-08-26 16:55:24.756802+00:00 | 2020-08-26 16:55:24.758638+00:00 | 2020-08-26 16:55:24.77889+00:00 | 2020-08-26 16:55:24.777899+00:00 | 1 | | NULL 584550071604314113 | AUTO CREATE STATS | Table statistics refresh for movr.public.vehicles | CREATE STATISTICS __auto__ FROM [54] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2020-08-26 16:55:24.781594+00:00 | 2020-08-26 16:55:24.783519+00:00 | 2020-08-26 16:55:24.815104+00:00 | 2020-08-26 16:55:24.814103+00:00 | 1 | | NULL (6 rows)