If you are working with a table that must be indexed on sequential keys, you should use hash-sharded indexes. Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hot spots and improving write performance on sequentially-keyed indexes at a small cost to read performance.
This is an experimental feature. The interface and output are subject to change.
How hash-sharded indexes work
CockroachDB automatically splits ranges of data in the key-value store based on the size of the range, and on the load streaming to the range. To split a range based on load, the system looks for a point in the range that evenly divides incoming traffic. If the range is indexed on a column of data that is sequential in nature (e.g., an ordered sequence, or a series of increasing, non-repeating TIMESTAMP
s), then all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. As a result, the system cannot find a point in the range that evenly divides the traffic, and the range cannot benefit from load-based splitting, creating a hot spot at the single range.
Hash-sharded indexes solve this problem by distributing sequential data across multiple nodes within your cluster, eliminating hotspots. The trade-off to this, however, is a small performance impact on reading sequential data or ranges of data, as it's not guaranteed that sequentially close values will be on the same node.
Hash-sharded indexes create a physical STORED
computed column, known as a shard column. CockroachDB uses this shard column, as opposed to the sequential column in the index, to control the distribution of values across the index. The shard column is hidden by default but can be seen with SHOW COLUMNS
.
When dropping a hash-sharded index, the shard column will also be dropped. This will require a rewrite of the table.
For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads blog post.
Create a hash-sharded index
To create a hash-sharded index, set the experimental_enable_hash_sharded_indexes
session variable to on
. Then, add the optional USING HASH WITH BUCKET_COUNT = n_buckets
clause to a CREATE INDEX
statement, to an INDEX
definition in a CREATE TABLE
statement, or to an ALTER PRIMARY KEY
statement. When this clause is used, CockroachDB creates n_buckets
computed columns, shards the index into n_buckets
shards, and then stores each index shard in the underlying key-value store with one of the computed column's hash as its prefix.
To change the bucket size of an existing hash-sharded primary key index, use an ALTER PRIMARY KEY
statement with a USING HASH WITH BUCKET_COUNT = n_buckets
clause that specifies the new bucket size and the existing primary key columns.
Examples
For an example of a hash-sharded index, see Create a hash-sharded secondary index.