Indexes improve your database's performance by helping SQL locate data without having to look through every row of a table.
How do indexes work?
When you create an index, CockroachDB "indexes" the columns you specify, which creates a copy of the columns and then sorts their values (without sorting the values in the table itself).
After a column is indexed, SQL can easily filter its values using the index instead of scanning each row one-by-one. On large tables, this greatly reduces the number of rows SQL has to use, executing queries exponentially faster.
For example, if you index an INT
column and then filter it WHERE <indexed column> = 10
, SQL can use the index to find values starting at 10 but less than 11. In contrast, without an index, SQL would have to evaluate every row in the table for values equaling 10. This is also known as a "full table scan", and it can be very bad for query performance.
You can also create an index on a subset of rows. This type of index is called a partial index. For more information, see Partial indexes.
To index spatial data, CockroachDB uses spatial indexes. For more information about spatial indexes, see Spatial Indexes.
New in v21.1:
In multi-region deployments, most users should use REGIONAL BY ROW
tables instead of explicit index partitioning. When you add an index to a REGIONAL BY ROW
table, it is automatically partitioned on the crdb_region
column. Explicit index partitioning is not required.
While CockroachDB process an ADD REGION
or DROP REGION
statement on a particular database, creating or modifying an index will throw an error. Similarly, all ADD REGION
and DROP REGION
statements will be blocked while an index is being modified on a REGIONAL BY ROW
table within the same database.
Creation
Each table automatically has an index created called primary
, which indexes either its primary key or—if there is no primary key—a unique value for each row known as rowid
. We recommend always defining a primary key because the index it creates provides much better performance than letting CockroachDB use rowid
.
The primary
index helps filter a table's primary key but doesn't help SQL find values in any other columns. However, you can use secondary indexes to improve the performance of queries using columns not in a table's primary key. You can create them:
- At the same time as the table with the
INDEX
clause ofCREATE TABLE
. In addition to explicitly defined indexes, CockroachDB automatically creates secondary indexes for columns with theUNIQUE
constraint. - For existing tables with
CREATE INDEX
. - By applying the
UNIQUE
constraint to columns withALTER TABLE
, which automatically creates an index of the constrained columns.
To create the most useful secondary indexes, you should also check out our best practices.
Selection
Because each query can use only a single index, CockroachDB selects the index it calculates will scan the fewest rows (i.e., the fastest). For more detail, check out our blog post Index Selection in CockroachDB, which will show you how to use the EXPLAIN
statement for your query to see which index is being used.
To override CockroachDB's index selection, you can also force queries to use a specific index (also known as "index hinting"). Index hinting is supported for SELECT
, DELETE
, and UPDATE
statements.
Storage
CockroachDB stores indexes directly in your key-value store. You can find more information in our blog post Mapping Table Data to Key-Value Storage.
Locking
Tables are not locked during index creation thanks to CockroachDB's schema change procedure.
Performance
Indexes create a trade-off: they greatly improve the speed of queries, but may slightly slow down writes to an affected column (because new values have to be written for both the table and the index).
To maximize your indexes' performance, we recommend following the secondary index best practices.
For more information about how to tune CockroachDB's performance, see SQL Performance Best Practices.
Storing columns
The STORING
clause specifies columns which are not part of the index key but should be stored in the index. This optimizes queries which retrieve those columns without filtering on them, because it prevents the need to read the primary index.
An index that stores all the columns needed by a query is also known as a covering index for that query. When a query has a covering index, CockroachDB can use that index directly instead of doing a join with the primary key, which is likely to be slower.
Note that the synonym COVERING
is also supported.
For example, say we have a table with three columns, two of which are indexed:
> CREATE TABLE tbl (col1 INT, col2 INT, col3 INT, INDEX (col1, col2));
If we filter on the indexed columns but retrieve the unindexed column, this requires reading col3
from the primary index via an "index join."
> EXPLAIN SELECT col3 FROM tbl WHERE col1 = 10 AND col2 > 1;
tree | field | description
-------------+---------------+------------------------
| distribution | local
| vectorized | false
index join | |
│ | table | tbl@primary
└── scan | |
| missing stats |
| table | tbl@tbl_col1_col2_idx
| spans | [/10/2 - /10]
(8 rows)
However, if we store col3
in the index, the index join is no longer necessary. This means our query only needs to read from the secondary index, so it will be more efficient.
> CREATE TABLE tbl (col1 INT, col2 INT, col3 INT, INDEX (col1, col2) STORING (col3));
> EXPLAIN SELECT col3 FROM tbl WHERE col1 = 10 AND col2 > 1;
tree | field | description
-------+---------------+------------------------
| distribution | local
| vectorized | false
scan | |
| missing stats |
| table | tbl@tbl_col1_col2_idx
| spans | [/10/2 - /10]
(6 rows)
Best practices
For best practices, see Add a Secondary Index: Best Practices
Indexes on REGIONAL BY ROW
tables in multi-region databases
New in v21.1:
In multi-region deployments, most users should use REGIONAL BY ROW
tables instead of explicit index partitioning. When you add an index to a REGIONAL BY ROW
table, it is automatically partitioned on the crdb_region
column. Explicit index partitioning is not required.
While CockroachDB process an ADD REGION
or DROP REGION
statement on a particular database, creating or modifying an index will throw an error. Similarly, all ADD REGION
and DROP REGION
statements will be blocked while an index is being modified on a REGIONAL BY ROW
table within the same database.
Note that the behavior described above also applies to GIN indexes.
For an example that uses unique indexes but applies to all indexes on REGIONAL BY ROW
tables, see Add a unique index to a REGIONAL BY ROW
table.