The PRIMARY KEY
constraint specifies that the constrained columns' values must uniquely identify each row.
Unlike other constraints which have very specific uses, the PRIMARY KEY
constraint must be used for every table because it provides an intrinsic structure to the table's data.
A table's primary key should be explicitly defined in the CREATE TABLE
statement. Tables can only have one primary key.
A table's primary key can only be specified in the CREATE TABLE
statement. It cannot be changed later using ALTER TABLE
, though it is possible to go through a process to create a new table with the new primary key you want and then migrate the data.
Syntax
PRIMARY KEY
constraints can be defined at the table level. However, if you only want the constraint to apply to a single column, it can be applied at the column level.
Column level
Parameter | Description |
---|---|
table_name |
The name of the table you're creating. |
column_name |
The name of the Primary Key column. |
column_type |
The Primary Key column's data type. |
column_constraints |
Any other column-level constraints you want to apply to this column. |
column_def |
Definitions for any other columns in the table. |
table_constraints |
Any table-level constraints you want to apply. |
Example
> CREATE TABLE orders (
order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
order_date TIMESTAMP NOT NULL,
order_mode STRING(8),
customer_id INT,
order_status INT
);
Table level
Parameter | Description |
---|---|
table_name |
The name of the table you're creating. |
column_def |
Definitions for any other columns in the table. |
name |
The name you want to use for the constraint, which must be unique to its table and follow these identifier rules. |
column_name |
The name of the column you want to use as the PRIMARY KEY .The order in which you list columns here affects the structure of the primary index. |
table_constraints |
Any other table-level constraints you want to apply. |
Example
> CREATE TABLE IF NOT EXISTS inventories (
product_id INT,
warehouse_id INT,
quantity_on_hand INT NOT NULL,
PRIMARY KEY (product_id, warehouse_id)
);
Details
The columns in the PRIMARY KEY
constraint are used to create its primary
index, which CockroachDB uses by default to access the table's data. This index does not take up additional disk space (unlike secondary indexes, which do) because CockroachDB uses the primary
index to structure the table's data in the key-value layer. For more information, see our blog post SQL in CockroachDB: Mapping Table Data to Key-Value Storage.
To ensure each row has a unique identifier, the PRIMARY KEY
constraint combines the properties of both the UNIQUE
and NOT NULL
constraints. The properties of both constraints are necessary to make sure each row's primary key columns contain distinct sets of values. The properties of the UNIQUE
constraint ensure that each value is distinct from all other values. However, because NULL values never equal other NULL values, the UNIQUE
constraint is not enough (two rows can appear the same if one of the values is NULL). To prevent the appearance of duplicated values, the PRIMARY KEY
constraint also enforces the properties of the NOT NULL
constraint.
Performance considerations
When defining a primary key constraint, it's important to consider:
How the data in the primary key column(s) is distributed across a cluster.
Non-uniform data distributions can lead to hotspots on a single range, or cause transaction contention.
The data type of the primary key column(s).
A primary key column's data type can determine where its row data is stored on a cluster. For example, some data types are sequential in nature (e.g.,
TIMESTAMP
). Defining primary keys on columns of sequential data can result in data being concentrated in a smaller number of ranges, which can negatively affect performance.
For optimal performance, we recommend that you do the following:
Define a primary key for every table.
If you create a table without defining a primary key, CockroachDB will automatically create a primary key over a hidden,
INT
-typed column namedrowid
. By default, sequential, unique identifiers are generated for each row in therowid
column with theunique_rowid()
function. The sequential nature of therowid
values can lead to a poor distribution of the data across a cluster, which can negatively affect performance. Furthermore, because you cannot meaningfully use therowid
column to filter table data, the primary key index onrowid
does not offer any performance optimization. This means you will always have improved performance by defining a primary key for a table. For more information, see our blog post Index Selection in CockroachDB.Define primary key constraints over multiple columns (i.e., use composite primary keys).
When defining composite primary keys, make sure the data in the first column of the primary key prefix is well-distributed across the nodes in the cluster. To improve the performance of ordered queries, you can add monotonically increasing primary key columns after the first column of the primary key prefix. For an example, see Use multi-column primary keys on the SQL Performance Best Practices page.
For single-column primary keys, use
UUID
-typed columns with randomly-generated default values.Randomly generating
UUID
values ensures that the primary key values will be unique and well-distributed across a cluster.Avoid defining primary keys over a single column of sequential data.
Querying a table with a primary key on a single sequential column (e.g., an auto-incrementing
INT
column) can result in single-range hotspots that negatively affect performance. Instead, use a composite key with a non-sequential prefix, or use aUUID
-typed column.
Example
> CREATE TABLE IF NOT EXISTS inventories (
product_id INT,
warehouse_id INT,
quantity_on_hand INT NOT NULL,
PRIMARY KEY (product_id, warehouse_id)
);
> INSERT INTO inventories VALUES (1, 1, 100);
> INSERT INTO inventories VALUES (1, 1, 200);
pq: duplicate key value (product_id,warehouse_id)=(1,1) violates unique constraint "primary"
> INSERT INTO inventories VALUES (1, NULL, 100);
pq: null value in column "warehouse_id" violates not-null constraint