Indexes improve your database's performance by helping SQL locate data without having to look through every row of a table.
How Do Indexes Work?
When you create an index, CockroachDB "indexes" the columns you specify, which creates a copy of the columns and then sorts their values (without sorting the values in the table itself).
After a column is indexed, SQL can easily filter its values using the index instead of scanning each row one-by-one. On large tables, this greatly reduces the number of rows SQL has to use, executing queries exponentially faster.
For example, if you index an INT
column and then filter it WHERE <indexed column> = 10
, SQL can use the index to find values starting at 10 but less than 11. In contrast, without an index, SQL would have to evaluate every row in the column for values equaling 10.
Creation
Each table automatically has an index created called primary
, which indexes either its primary key or—if there is no primary key—a unique value for each row known as rowid
. We recommend always defining a primary key because the index it creates provides much better performance than letting CockroachDB use rowid
.
The primary
index helps filter a table's primary key but doesn't help SQL find values in any other columns. However, you can improve the performance of queries using columns besides the primary key with secondary indexes. You can create them:
- At the same time as the table with the
INDEX
clause ofCREATE TABLE
. In addition to explicitly defined indexes, CockroachDB automatically creates secondary indexes for columns with the Unique constraint. - For existing tables with
CREATE INDEX
. - By applying the Unique constraint to columns with
ALTER TABLE
, which automatically creates an index of the constrained columns.
To create the most useful secondary indexes, you should also check out our best practices.
Selection
Because each query can use only a single index, CockroachDB selects the index it calculates will scan the fewest rows (i.e., the fastest). For more detail, check out our blog post Index Selection in CockroachDB.
To override CockroachDB's index selection, you can also force queries to use a specific index (also known as "index hinting").
Storage
CockroachDB stores indexes directly in your key-value store. You can find more information in our blog post Mapping Table Data to Key-Value Storage.
Locking
Tables are not locked during index creation thanks to CockroachDB's schema change procedure.
Performance
Indexes create a trade-off: they greatly improve the speed of queries, but slightly slow down writes (because new values have to be copied and sorted). The first index you create has the largest impact, but additional indexes only introduce marginal overhead.
To maximize your indexes' performance, we recommend following a few best practices.
Best Practices
We recommend creating indexes for all of your common queries. To design the most useful indexes, look at each query's WHERE
and SELECT
clauses, and create indexes that:
- Index all columns in the
WHERE
clause. - Store columns that are only in the
SELECT
clause.
Indexing Columns
When designing indexes, it's important to consider which columns you index and the order you list them. Here are a few guidelines to help you make the best choices:
- Each table's primary key (which we recommend always defining) is automatically indexed. The index it creates (called
primary
) cannot be changed, nor can you change the primary key of a table after it's been created, so this is a critical decision for every table. - Queries can benefit from an index even if they only filter a prefix of its columns. For example, if you create an index of columns
(A, B, C)
, queries filtering(A)
or(A, B)
can still use the index. However, queries that do not filter(A)
will not benefit from the index.
This feature also lets you avoid using single-column indexes. Instead, use the column as the first column in a multiple-column index, which is useful to more queries. - Columns filtered in the
WHERE
clause with the equality operators (=
orIN
) should come first in the index, before those referenced with inequality operators (<
,>
). - Indexes of the same columns in different orders can produce different results for each query. For more information, see our blog post on index selection—specifically the section "Restricting the search space."
Storing Columns
The STORING
clause specifies columns which are not part of the index key but should be stored in the index. This optimizes queries which retrieve those columns without filtering on them, because it prevents the need to read the primary index.
Example
Say we have a table with three columns, two of which are indexed:
> CREATE TABLE tbl (col1 INT, col2 INT, col3 INT, INDEX (col1, col2));
If we filter on the indexed columns but retrieve the unindexed column, this requires reading col3
from the primary index via an "index join."
> EXPLAIN SELECT col3 FROM tbl WHERE col1 = 10 AND col2 > 1;
tree | field | description
+-----------------+-------------+-----------------------+
render | |
└── index-join | |
│ | table | tbl@primary
│ | key columns | rowid
└── scan | |
| table | tbl@tbl_col1_col2_idx
| spans | /10/2-/11
However, if we store col3
in the index, the index join is no longer necessary. This means our query only needs to read from the secondary index, so it will be more efficient.
> CREATE TABLE tbl (col1 INT, col2 INT, col3 INT, INDEX (col1, col2) STORING (col3));
> EXPLAIN SELECT col3 FROM tbl WHERE col1 = 10 AND col2 > 1;
tree | field | description
+-----------+-------------+-------------------+
render | |
└── scan | |
| table | tbl@tbl_col1_col2_idx
| spans | /10/2-/11