This page provides best-practice guidance on creating indexes, with a simple example based on Cockroach Labs' fictional vehicle-sharing company, MovR.
For detailed reference documentation on the CREATE INDEX
statement, including additional examples, see the CREATE INDEX
syntax page.
Before you begin
Before reading this page, do the following:
- Install CockroachDB.
- Start a local cluster, or create a CockroachDB Cloud cluster.
- Review the database schema objects.
- Create a database.
- Create a user-defined schema.
- Create a table.
Create a secondary index
Indexes are logical objects in a cluster that help CockroachDB queries find data more efficiently. When you create an index, CockroachDB creates a copy of the columns selected for the index, and then sorts the rows of data by indexed column values, without sorting the values in the table itself.
CockroachDB automatically creates an index on the table's primary key columns. This index is called the primary index. The primary index helps CockroachDB more efficiently scan rows, as sorted by the table's primary key columns, but it does not help find values as identified by any other columns.
Secondary indexes (i.e., all indexes that are not the primary index) improve the performance of queries that identify rows with columns that are not in a table's primary key. CockroachDB automatically creates secondary indexes for columns with a UNIQUE
constraint.
To add a secondary index to a table, do one of the following, following the best practices listed below:
Add an
INDEX
clause to the end of aCREATE TABLE
statement.INDEX
clauses generally take the following form:INDEX {index_name} ({column_names});
Parameter Description {index_name}
The name of the index. {column_names}
The name of the column to index, or a comma-separated list of names of the columns to index. Use a
CREATE INDEX
statement.CREATE INDEX
statements generally take the following form:CREATE INDEX {index_name} ON {table_name} ({column_names});
Parameter Description {index_name}
The name of the index. {table_name}
The name of the table. {column_names}
The name of the column to index, or a comma-separated list of names of the columns to index.
For an example, see below.
If you do not specify a name for an index, CockroachDB will generate a name.
After creation, the notation for referring to indexes in CockroachDB is {table_name}@{index_name}
.
Best practices
Here are some best practices for creating indexes:
Index all columns that you plan to use for sorting or filtering data.
An index that stores all the columns needed by a query is also known as a covering index for that query. When a query has a covering index, CockroachDB can use that index directly instead of doing a join with the primary key, which is likely to be slower.
Note that columns listed in a filtering
WHERE
clause with the equality operators (=
orIN
) should come first in the index, before those referenced with inequality operators (<
,>
).Columns with a higher cardinality (higher number of distinct values) should be placed in the index before columns with a lower cardinality. If the cardinality of the columns you wish to add to the index are similar, test multiple column arrangements in a non-production environment to determine the most performant arrangement.
Avoid indexing on sequential values.
Writes to indexes with sequential keys can result in range hotspots that negatively affect performance. Instead, use randomly generated unique IDs, or multi-column keys.
If you are working with a table that must be indexed on sequential keys, use hash-sharded indexes. For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads blog post.
Avoid creating secondary indexes that you do not need, and drop unused indexes whenever possible. Secondary indexes can slow down write performance and take up node memory.
Note that queries can benefit from an index even if they only filter a prefix of its columns. For example, if you create an index of columns
(A, B, C)
, queries filtering(A)
or(A, B)
can still use the index. However, queries that do not filter(A)
will not benefit from the index. This feature also lets you avoid using single-column indexes. Instead, use the column as the first column in a multiple-column index, which is useful to more queries.Also note that
ALTER PRIMARY KEY
creates a secondary index from the old primary key. If you need to change a primary key, and you do not plan to filter queries on the old primary key column(s), do not useALTER PRIMARY KEY
. Instead, useDROP CONSTRAINT ... PRIMARY KEY
/ADD CONSTRAINT ... PRIMARY KEY
, which does not create a secondary index.Limit creation and deletion of secondary indexes to off-peak hours. Performance impacts are likely if done during peak business hours.
Use a
STORING
clause to store columns of data that you want returned by common queries, but that you do not plan to use in query filters. Note that the synonymCOVERING
is also supported.The
STORING
clause specifies columns that are not part of the index key but should be stored in the index, without being sorted. If a column is specified in a query, and the column is neither indexed nor stored in an index, CockroachDB will perform a full scan of the table, which can result in poor performance. For an example, see below.Review the specialized indexes that CockroachDB supports, and decide if you need to create a specialized index instead of a standard index.
Do not create indexes as the
root
user. Instead, create indexes as a different user, with fewer privileges, following authorization best practices. This will likely be the same user that created the table to which the index belongs.As a general best practice, we discourage the use of client libraries to execute database schema changes. Instead, use a database schema migration tool, or the CockroachDB SQL client.
Review the limitations of online schema changes in CockroachDB. Note that CockroachDB has limited support for schema changes within the same explicit transaction.
We recommend doing schema changes outside explicit transactions, where possible. When a database schema management tool manages transactions on your behalf, we recommend only including one schema change operation per transaction.
Example
Suppose you want the MovR application to display all of the bikes available to the users of the MovR platform.
Recall that the vehicles
table that you created in Create a Table stores rows of data for each vehicle registered with MovR. Your application will need to read any data about vehicles into the application's persistence layer from this table. To display available bikes, the reads will need to filter on the available
and type
columns.
Open max_init.sql
, and, under the CREATE TABLE
statement for the vehicles
table, add a CREATE INDEX
statement for an index on the type
and available
columns of the vehicles
table:
CREATE INDEX type_available_idx ON movr.vehicles (type, available));
This statement creates a secondary index named type_available_idx
, on the vehicles
table.
The MovR app might also need to display the vehicle's location and ID, but the app will not be filtering or sorting on those values. If any of the columns referenced in or returned by a query are not in a primary or secondary index key, CockroachDB will need to perform a full scan of the table to find the value. Full table scans can be costly, and should be avoided whenever possible.
To help avoid unnecessary full table scans, add a STORING
clause to the index:
CREATE INDEX type_available_idx ON movr.vehicles (type, available) STORING (last_location);
The index will now store the values in last_location
, which will improve the performance of reads from the vehicles
table that return type
, available
, id
, and last_location
values and do not filter or sort on the last_location
column.
The max_init.sql
file should now look similar to the following:
CREATE TABLE movr.max_schema.users (
first_name STRING,
last_name STRING,
email STRING UNIQUE,
CONSTRAINT "primary" PRIMARY KEY (first_name, last_name)
);
CREATE TYPE movr.max_schema.vtype AS ENUM ('bike', 'scooter', 'skateboard');
CREATE TABLE movr.max_schema.vehicles (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
type movr.max_schema.vtype,
creation_time TIMESTAMPTZ DEFAULT now(),
available BOOL,
last_location STRING
);
CREATE INDEX type_available_idx ON movr.max_schema.vehicles (type, available) STORING (last_location);
CREATE TABLE movr.max_schema.rides (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
vehicle_id UUID REFERENCES movr.max_schema.vehicles(id),
start_address STRING,
end_address STRING,
start_time TIMESTAMPTZ DEFAULT now(),
end_time TIMESTAMPTZ
);
If you executed this file when following the Create a Table example, then all of these objects already exist. To clear the database and re-initialize the schemas, first execute the statements in the dbinit.sql
file as the root
user:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=root \
-f dbinit.sql
Then, execute the statements in the max_init.sql
and abbey_init.sql
files:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=max \
--database=movr \
-f max_init.sql
$ cockroach sql \
--certs-dir={certs-directory} \
--user=abbey \
--database=movr \
-f abbey_init.sql
After the statements have been executed, you can see the new index in the CockroachDB SQL shell.
Open the SQL shell to your cluster:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=max \
--database=movr
To view the indexes in the vehicles
table, issue a SHOW INDEXES
statement:
> SHOW INDEXES FROM movr.max_schema.vehicles;
table_name | index_name | non_unique | seq_in_index | column_name | direction | storing | implicit
-------------+--------------------+------------+--------------+---------------+-----------+---------+-----------
vehicles | primary | false | 1 | id | ASC | false | false
vehicles | type_available_idx | true | 1 | type | ASC | false | false
vehicles | type_available_idx | true | 2 | available | ASC | false | false
vehicles | type_available_idx | true | 3 | last_location | N/A | true | false
vehicles | type_available_idx | true | 4 | id | ASC | false | true
(5 rows)
The output from this SHOW
statement displays the names and columns of the two indexes on the table (i.e., primary
and type_available_idx
).
Note that the last_location
column's storing
value is true
in the type_available_idx
index, and is therefore not sorted. Also note that the primary key column id
is implicit in the index, meaning the id
column is implicitly indexed in type_available_idx
.
To see an index definition, use a SHOW CREATE
statement on the table that contains the index:
> SHOW CREATE TABLE movr.max_schema.vehicles;
table_name | create_statement
---------------------------+----------------------------------------------------------------------------------
movr.max_schema.vehicles | CREATE TABLE max_schema.vehicles (
| id UUID NOT NULL DEFAULT gen_random_uuid(),
| type max_schema.vtype NULL,
| creation_time TIMESTAMPTZ NULL DEFAULT now():::TIMESTAMPTZ,
| available BOOL NULL,
| last_location STRING NULL,
| CONSTRAINT "primary" PRIMARY KEY (id ASC),
| INDEX type_available_idx (available ASC, type ASC) STORING (last_location),
| FAMILY "primary" (id, type, creation_time, available, last_location)
| )
(1 row)
After creating a database, a user-defined schema, some tables, and secondary indexes, the database schema should be ready for your application to write and read data.
It's likely that you will need to update your database schema at some point. For an overview on how to update a database schema, see Change and Remove Objects in a Database Schema. We also recommend reading about how online schema changes work in CockroachDB.
What's next?
- Change and Remove Objects in a Database Schema
- Read about how schema changes work
- Insert Data
- Query Data
You might also be interested in the following pages: