New in v2.0 CockroachDB allows you to define table partitions, thus giving you row-level control of how and where your data is stored. Partitioning enables you to reduce latencies and costs and can assist in meeting regulatory requirements for your data.
Why Use Table Partitioning
Table partitioning helps you reduce latency and cost:
- Geo-partitioning allows you to keep user data close to the user, which reduces the distance that the data needs to travel, thereby reducing latency. To geo-partition a table, define location-based partitions while creating a table, create location-specific zone configurations, and apply the zone configurations to the corresponding partitions.
- Archival-partitioning allows you to store infrequently-accessed data on slower and cheaper storage, thereby reducing costs. To archival-partition a table, define frequency-based partitions while creating a table, create frequency-specific zone configurations with appropriate storage devices constraints, and apply the zone configurations to the corresponding partitions.
How It Works
Table partitioning involves a combination of CockroachDB features:
Node Attributes
To store partitions in specific locations (e.g., geo-partitioning), or on machines with specific attributes (e.g., archival-partitioning), the nodes of your cluster must be started with the relevant flags:
- Use the
--locality
flag to assign key-value pairs that describe the location of a node, for example,--locality=region=east,datacenter=us-east-1
. - Use the
--attrs
flag to specify node capability, which might include specialized hardware or number of cores, for example,--attrs=ram:64gb
. - Use the
attrs
field of the--store
flag to specify disk type or capability, for example,--store=path=/mnt/ssd01,attrs=ssd
.
For more details about these flags, see the cockroach start
documentation.
Enterprise License
You must have a valid enterprise license to use table partitioning features. For details about requesting and setting a trial or full enterprise license, see Enterprise Licensing.
Note that the following features do not work with an expired license:
- Creating new table partitions or adding new zone configurations for partitions
- Changing the partitioning scheme on any table or index
- Changing the zone config for a partition
However, the following features continue to work even with an expired enterprise license:
- Querying a partitioned table (for example,
SELECT foo PARTITION
) - Inserting or updating data in a partitioned table
- Dropping a partitioned table
- Unpartitioning a partitioned table
- Making non-partitioning changes to a partitioned table (for example, adding a column/index/foreign key/check constraint)
Table Creation
You can define partitions and subpartitions over one or more columns of a table. During table creation, you declare which values belong to each partition in one of two ways:
- List partitioning: Enumerate all possible values for each partition. List partitioning is a good choice when the number of possible values is small. List partitioning is well-suited for geo-partitioning.
- Range partitioning: Specify a contiguous range of values for each partition by specifying lower and upper bounds. Range partitioning is a good choice when the number of possible values is too large to explicitly list out. Range partitioning is well-suited for archival-partitioning.
Partition by List
PARTITION BY LIST
lets you map one or more tuples to a partition.
To partition a table by list, use the PARTITION BY LIST
syntax while creating the table. While defining a list partition, you can also set the DEFAULT
partition that acts as a catch-all if none of the rows match the requirements for the defined partitions.
See Partition by List example below for more details.
Partition by Range
PARTITION BY RANGE
lets you map ranges of tuples to a partition.
To define a table partition by range, use the PARTITION BY RANGE
syntax while creating the table. While defining a range partition, you can use CockroachDB-defined MINVALUE
and MAXVALUE
parameters to define the lower and upper bounds of the ranges respectively.
NULL
is considered less than any other data, which is consistent with our key encoding ordering and ORDER BY
behavior.Partition values can be any SQL expression, but it’s only evaluated once. If you create a partition with value < (now() - '1d')
on 2017-01-30, it would be contain all values less than 2017-01-29. It would not update the next day, it would continue to contain values less than 2017-01-29.
See Partition by Range example below for more details.
Partition using Primary Key
The primary key required for partitioning is different from the conventional primary key. To define the primary key for partitioning, prefix the unique identifier(s) in the primary key with all columns you want to partition and subpartition the table on, in the order in which you want to nest your subpartitions.
For instance, consider the database of a global online learning portal that has a table for students of all the courses across the world. If you want to geo-partition the table based on the countries of the students, then the primary key needs to be defined as:
> CREATE TABLE students (
id INT DEFAULT unique_rowid(),
name STRING,
email STRING,
country STRING,
expected_graduation_date DATE,
PRIMARY KEY (country, id));
Primary Key Considerations
- For v2.0, you cannot change the primary key after you create the table. Provision for all future subpartitions by including those columns in the primary key. In the example of the online learning portal, if you think you might want to subpartition based on
expected_graduation_date
in the future, define the primary key as(country, expected_graduation_date, id)
. v2.1 will allow you to change the primary key. - The order in which the columns are defined in the primary key is important. The partitions and subpartitions need to follow that order. In the example of the online learning portal, if you define the primary key as
(country, expected_graduation_date, id)
, the primary partition is bycountry
, and then subpartition is byexpected_graduation_date
. You can’t skipcountry
and partition byexpected_graduation_date
.
Partition using Secondary Index
The primary key discussed above has two drawbacks:
- It does not enforce that the identifier column is globally unique.
- It does not provide fast lookups on the identifier.
To ensure uniqueness or fast lookups, create a unique, unpartitioned secondary index on the identifier.
Indexes can also be partitioned, but are not required to be. Each partition is required to have a name that is unique among all partitions on that table and its indexes. For example, the following CREATE INDEX
scenario will fail because it reuses the name of a partition of the primary key:
CREATE TABLE foo (a STRING PRIMARY KEY, b STRING) PARTITION BY LIST (a) (
PARTITION bar VALUES IN ('bar'),
PARTITION default VALUES IN (DEFAULT)
);
CREATE INDEX foo_b_idx ON foo (b) PARTITION BY LIST (b) (
PARTITION baz VALUES IN ('baz'),
PARTITION default VALUES IN (DEFAULT)
);
Consider using a naming scheme that uses the index name to avoid conflicts. For example, the partitions above could be named primary_idx_bar
, primary_idx_default
, b_idx_baz
, b_idx_default
.
Define Partitions on Interleaved Tables
For interleaved tables, partitions can be defined only on the root table of the interleave hierarchy, while children are interleaved the same as their parents.
Replication Zones
On their own, partitions are inert and simply apply a label to the rows of the table that satisfy the criteria of the defined partitions. Applying functionality to a partition requires creating and applying replication zone to the corresponding partitions.
CockroachDB uses the most granular zone config available. Zone configs that target a partition are considered more granular than those that target a table or index, which in turn are considered more granular than those that target a database.
Examples
Define Table Partitions by List
Consider a global online learning portal, RoachLearn, that has a database containing a table of students across the world. Suppose we have two datacenters: one in the United States and another in Australia. To reduce latency, we want to keep the students' data closer to their locations:
- We want to keep the data of the students located in the United States and Canada in the United States datacenter.
- We want to keep the data of students located in Australia and New Zealand in the Australian datacenter.
Step 1. Identify the partitioning method
We want to geo-partition the table to keep the students' data closer to their locations. We can achieve this by partitioning on country
and using the PARTITION BY LIST
syntax.
Step 2. Start each node with its datacenter location specified in the --locality
flag
# Start the node in the US datacenter:
$ cockroach start --insecure \
--locality=datacenter=us1 \
--store=node1 \
--host=<node1 hostname> \
--port=26257 \
--http-port=8080 \
--join=<node1 hostname>:26257,<node2 hostname>:26258
# Start the node in the AUS datacenter:
$ cockroach start --insecure \
--locality=datacenter=au1 \
--store=node2 \
--host=<node2 hostname> \
--port=26258 \
--http-port=8081 \
--join=<node1 hostname>:26257,<node2 hostname>:26258
Step 3. Set the enterprise license
To set the enterprise license, see Set the Trial or Enterprise License Key.
Step 4. Create a table with the appropriate partitions
> CREATE TABLE students_by_list (
id INT DEFAULT unique_rowid(),
name STRING,
email STRING,
country STRING,
expected_graduation_date DATE,
PRIMARY KEY (country, id))
PARTITION BY LIST (country)
(PARTITION north_america VALUES IN ('CA','US'),
PARTITION australia VALUES IN ('AU','NZ'),
PARTITION DEFAULT VALUES IN (default));
Step 5. Create and apply corresponding zone configurations
Create appropriate zone configurations:
$ cat > north_america.zone.yml
constraints: [+datacenter=us1]
$ cat > australia.zone.yml
constraints: [+datacenter=au1]
Apply zone configurations to corresponding partitions:
$ cockroach zone set roachlearn.students_by_list.north_america --insecure -f north_america.zone.yml
$ cockroach zone set roachlearn.students_by_list.australia --insecure -f australia.zone.yml
Step 6. Verify table partitions
> SHOW EXPERIMENTAL_RANGES FROM TABLE students_by_list;
You should see the following output:
+-----------------+-----------------+----------+----------+--------------+
| Start Key | End Key | Range ID | Replicas | Lease Holder |
+-----------------+-----------------+----------+----------+--------------+
| NULL | /"AU" | 251 | {1,2,3} | 1 |
| /"AU" | /"AU"/PrefixEnd | 257 | {1,2,3} | 1 |
| /"AU"/PrefixEnd | /"CA" | 258 | {1,2,3} | 1 |
| /"CA" | /"CA"/PrefixEnd | 252 | {1,2,3} | 1 |
| /"CA"/PrefixEnd | /"NZ" | 253 | {1,2,3} | 1 |
| /"NZ" | /"NZ"/PrefixEnd | 256 | {1,2,3} | 1 |
| /"NZ"/PrefixEnd | /"US" | 259 | {1,2,3} | 1 |
| /"US" | /"US"/PrefixEnd | 254 | {1,2,3} | 1 |
| /"US"/PrefixEnd | NULL | 255 | {1,2,3} | 1 |
+-----------------+-----------------+----------+----------+--------------+
(9 rows)
Time: 7.209032ms
Define Table Partitions by Range
Suppose we want to store the data of current students on fast and expensive storage devices (e.g., SSD) and store the data of the graduated students on slower, cheaper storage devices (e.g., HDD).
Step 1. Identify the partitioning method
We want to archival-partition the table to keep newer data on faster devices and older data on slower devices. We can achieve this by partitioning the table by date and using the PARTITION BY RANGE
syntax.
Step 2. Set the enterprise license
To set the enterprise license, see Set the Trial or Enterprise License Key.
Step 3. Start each node with the appropriate storage device specified in the --store
flag
$ cockroach start --insecure \
--store=path=/mnt/1,attrs=ssd \
--host=<node1 hostname> \
--port=26257 \
--http-port=8080 \
--join=<node1 hostname>:26257,<node2 hostname>:26258
$ cockroach start --insecure \
--store=path=/mnt/2,attrs=hdd \
--host=<node2 hostname> \
--port=26258 \
--http-port=8081 \
--join=<node1 hostname>:26257,<node2 hostname>:26258
Step 4. Create a table with the appropriate partitions
> CREATE TABLE students_by_range (
id INT DEFAULT unique_rowid(),
name STRING,
email STRING,
country STRING,
expected_graduation_date DATE,
PRIMARY KEY (expected_graduation_date, id))
PARTITION BY RANGE (expected_graduation_date)
(PARTITION graduated VALUES FROM (MINVALUE) TO ('2017-08-15'),
PARTITION current VALUES FROM ('2017-08-15') TO (MAXVALUE));
Step 5. Create and apply corresponding zone configurations
Create appropriate zone configurations:
$ cat > current.zone.yml
constraints: [+ssd]
$ cat > graduated.zone.yml
constraints: [+hdd]
Apply zone configurations to corresponding partitions:
$ cockroach zone set roachlearn.students_by_range.current --insecure -f current.zone.yml
$ cockroach zone set roachlearn.students_by_range.graduated --insecure -f graduated.zone.yml
Step 6. Verify table partitions
> SHOW EXPERIMENTAL_RANGES FROM TABLE students_by_range;
You should see the following output:
+-----------+---------+----------+----------+--------------+
| Start Key | End Key | Range ID | Replicas | Lease Holder |
+-----------+---------+----------+----------+--------------+
| NULL | /17393 | 244 | {1,2,3} | 1 |
| /17393 | NULL | 242 | {1,2,3} | 1 |
+-----------+---------+----------+----------+--------------+
(2 rows)
Time: 5.850903ms
Define Subpartitions on a Table
A list partition can itself be partitioned, forming a subpartition. There is no limit on the number of levels of subpartitioning; that is, list partitions can be infinitely nested.
Going back to RoachLearn's scenario, suppose we want to do all of the following:
- Keep the students' data close to their location.
- Store the current students' data on faster storage devices.
- Store the graduated students' data on slower, cheaper storage devices (example: HDD).
Step 1. Identify the Partitioning method
We want to geo-partition as well as archival-partition the table. We can achieve this by partitioning the table first by location and then by date.
Step 2. Start each node with the appropriate storage device specified in the --store
flag
Start a node in the US datacenter:
$ cockroach start --insecure \
--host=<node1 hostname> \
--locality=datacenter=us1 \
--store=path=/mnt/1,attrs=ssd \
--store=path=/mnt/2,attrs=hdd \
Start a node in the AUS datacenter:
$ cockroach start --insecure \
--host=<node2 hostname> \
--locality=datacenter=au1 \
--store=path=/mnt/3,attrs=ssd \
--store=path=/mnt/4,attrs=hdd \
--join=<node1 hostname>:26257
Initialize the cluster:
$ cockroach init --insecure --host=<node1 hostname>
Step 3. Set the enterprise license
To set the enterprise license, see Set the Trial or Enterprise License Key.
Step 4. Create a table with the appropriate partitions
> CREATE TABLE students (
id INT DEFAULT unique_rowid(),
name STRING,
email STRING,
country STRING,
expected_graduation_date DATE,
PRIMARY KEY (country, expected_graduation_date, id))
PARTITION BY LIST (country)(
PARTITION australia VALUES IN ('AU','NZ') PARTITION BY RANGE (expected_graduation_date)(PARTITION graduated_au VALUES FROM (MINVALUE) TO ('2017-08-15'), PARTITION current_au VALUES FROM ('2017-08-15') TO (MAXVALUE)),
PARTITION north_america VALUES IN ('US','CA') PARTITION BY RANGE (expected_graduation_date)(PARTITION graduated_us VALUES FROM (MINVALUE) TO ('2017-08-15'), PARTITION current_us VALUES FROM ('2017-08-15') TO (MAXVALUE))
);
Subpartition names must be unique within a table. In our example, even though graduated
and current
are sub-partitions of distinct partitions, they still need to be uniquely named. Hence the names graduated_au
, graduated_us
, and current_au
and current_us
.
Step 5. Create and apply corresponding zone configurations
Create appropriate zone configurations:
$ cat > current_us.zone.yml
constraints: [+ssd,+datacenter=us1]
$ cat > graduated_us.zone.yml
constraints: [+hdd,+datacenter=us1]
$ cat > current_au.zone.yml
constraints: [+ssd,+datacenter=au1]
$ cat > graduated_au.zone.yml
constraints: [+hdd,+datacenter=au1]
Apply zone configurations to corresponding partitions:
$ cockroach zone set roachlearn.students.current_us --insecure -f current_us.zone.yml
$ cockroach zone set roachlearn.students.graduated_us --insecure -f graduated_us.zone.yml
$ cockroach zone set roachlearn.students.current_au --insecure -f current_au.zone.yml
$ cockroach zone set roachlearn.students.graduated_au --insecure -f graduated_au.zone.yml
Step 6. Verify table partitions
> SHOW EXPERIMENTAL_RANGES FROM TABLE students;
You should see the following output:
+-----------------+-----------------+----------+----------+--------------+
| Start Key | End Key | Range ID | Replicas | Lease Holder |
+-----------------+-----------------+----------+----------+--------------+
| NULL | /"AU" | 260 | {1,2,3} | 1 |
| /"AU" | /"AU"/17393 | 268 | {1,2,3} | 1 |
| /"AU"/17393 | /"AU"/PrefixEnd | 266 | {1,2,3} | 1 |
| /"AU"/PrefixEnd | /"CA" | 267 | {1,2,3} | 1 |
| /"CA" | /"CA"/17393 | 265 | {1,2,3} | 1 |
| /"CA"/17393 | /"CA"/PrefixEnd | 261 | {1,2,3} | 1 |
| /"CA"/PrefixEnd | /"NZ" | 262 | {1,2,3} | 3 |
| /"NZ" | /"NZ"/17393 | 284 | {1,2,3} | 3 |
| /"NZ"/17393 | /"NZ"/PrefixEnd | 282 | {1,2,3} | 3 |
| /"NZ"/PrefixEnd | /"US" | 283 | {1,2,3} | 3 |
| /"US" | /"US"/17393 | 281 | {1,2,3} | 3 |
| /"US"/17393 | /"US"/PrefixEnd | 263 | {1,2,3} | 1 |
| /"US"/PrefixEnd | NULL | 264 | {1,2,3} | 1 |
+-----------------+-----------------+----------+----------+--------------+
(13 rows)
Time: 11.586626ms
Repartition a Table
Consider the partitioned table of students of RoachLearn. Suppose the table has been partitioned on range to store the current students on fast and expensive storage devices (example: SSD) and store the data of the graduated students on slower, cheaper storage devices(example: HDD). Now suppose we want to change the date after which the students will be considered current to 2018-08-15
. We can achieve this by using the PARTITION BY
subcommand of the ALTER TABLE
command.
> ALTER TABLE students_by_range PARTITION BY RANGE (expected_graduation_date) (
PARTITION graduated VALUES FROM (MINVALUE) TO ('2018-08-15'),
PARTITION current VALUES FROM ('2018-08-15') TO (MAXVALUE));
Unpartition a Table
You can remove the partitions on a table by using the PARTITION BY NOTHING
syntax:
> ALTER TABLE students PARTITION BY NOTHING;
Locality–Resilience Tradeoff
There is a tradeoff between making reads/writes fast and surviving failures. Consider a partition with three replicas of roachlearn.students
for Australian students.
- If only one replica is pinned to an Australian datacenter, then reads may be fast (via leases follow the workload) but writes will be slow.
- If two replicas are pinned to an Australian datacenter, then reads and writes will be fast (as long as the cross-ocean link has enough bandwidth that the third replica doesn’t fall behind). If those two replicas are in the same datacenter, then the loss of one datacenter can lead to data unavailability, so some deployments may want two separate Australian datacenters.
- If all three replicas are in Australian datacenters, then three Australian datacenters are needed to be resilient to a datacenter loss.
How CockroachDB's Partitioning Differs from Other Databases
Other databases use partitioning for three additional use cases: secondary indexes, sharding, and bulk loading/deleting. CockroachDB addresses these use-cases not by using partitioning, but in the following ways:
- Changes to secondary indexes: CockroachDB solves these changes through online schema changes. Online schema changes are a superior feature to partitioning because they require zero-downtime and eliminate the potential for consistency problems.
- Sharding: CockroachDB automatically shards data as a part of its distributed database architecture.
- Bulk Loading & Deleting: CockroachDB does not have a feature that supports this use case as of now.
Known Limitations
When defining a table partition, either during table creation or table alteration, it is not possible to use placeholders in the PARTITION BY
clause.