This page provides best-practice guidance on creating tables, with some simple examples based on Cockroach Labs' fictional vehicle-sharing company, MovR.
For detailed reference documentation on the CREATE TABLE
statement, including additional examples, see the CREATE TABLE
syntax page.
Before you begin
Before reading this page, do the following:
- Create a CockroachDB Serverless cluster or start a local cluster.
- Review the database schema objects.
- Create a database.
- Create a user-defined schema.
Create a table
Tables are the logical objects in a cluster that store data sent from your application's persistence layer. Tables organize records of data in rows and columns.
To create a table, use a CREATE TABLE
statement, following the best practices that we have listed in the following sections:
- Name a table
- Define columns
- Select primary key columns
- Add additional constraints
- Execute
CREATE TABLE
statements
After reviewing the best practices in each section, see the example provided in that section.
Name a table
Naming a table is the first step in table creation.
CREATE TABLE
statements generally take the form:
CREATE TABLE {schema_name}.{table_name} (
{elements}
);
Parameter | Description |
---|---|
{schema_name} |
The name of the user-defined schema. |
{table_name} |
The name of the table. |
{elements} |
A comma-separated list of table elements, such as column definitions. |
For an example, see below.
Table naming best practices
Here are some best practices to follow when naming tables:
Use a fully-qualified name (i.e.,
CREATE TABLE database_name.schema_name.table_name
). If you do not specify the database name, CockroachDB will use the SQL session's current database (defaultdb
, by default). If you do not specify the user-defined schema in the table name, CockroachDB will create the table in the preloadedpublic
schema.Use a table name that reflects the contents of the table. For example, for a table containing information about orders, you could use the name
orders
(as opposed to naming the table something liketable1
).
Table naming example
Suppose you want to create a table to store information about users of the MovR platform, and you want the SQL user max
to manage that table.
Create an empty .sql
file for max
:
$ touch max_init.sql
This file will initialize the objects in the max_schema
user-defined schema that you created in Create a Schema, starting with a users
table.
In a text editor, open max_init.sql
, and add an empty CREATE TABLE
statement for the users
table:
CREATE TABLE movr.max_schema.users (
);
Next, define the table's columns.
Define columns
Column definitions give structure to a table by separating the values in each row into columns of a single data type.
Column definitions generally take the following form:
{column_name} {DATA_TYPE} {column_qualification}
Parameter | Description |
---|---|
{column_name} |
The name of the column. |
{DATA_TYPE} |
The data type of the row values in the column. |
{column_qualification} |
Some column qualification, such as a column-level constraint, or a computed column clause. |
For examples, see below.
Column definition best practices
Here are some best practices to follow when defining table columns:
Review the supported column data types, and select the appropriate type for the data you plan to store in a column, following the best practices listed on the data type's reference page.
Use column data types with a fixed size limit, or set a maximum size limit on column data types of variable size (e.g.,
VARBIT(n)
). Values exceeding 1MB can lead to write amplification and cause significant performance degradation.Review the primary key best practices and examples, decide if you need to define any dedicated primary key columns.
Review the best practices and examples for adding additional constraints, and decide if you need to add any additional constraints to your columns.
We strongly recommend adding size limits to all indexed columns, which includes columns in primary keys.
Values exceeding 1 MiB can lead to storage layer write amplification and cause significant performance degradation or even crashes due to OOMs (out of memory errors).
To add a size limit using CREATE TABLE
:
CREATE TABLE name (first STRING(100), last STRING(100));
To add a size limit using ALTER TABLE ... ALTER COLUMN
:
SET enable_experimental_alter_column_type_general = true;
ALTER TABLE name ALTER first TYPE STRING(99);
Column definition examples
In the max_init.sql
file, add a few column definitions to the users
table's CREATE TABLE
statement, for user names and email addresses:
CREATE TABLE movr.max_schema.users (
first_name STRING,
last_name STRING,
email STRING
);
All of the columns shown above use the STRING
data type, meaning that any value in any of the columns must be of the data type STRING
.
CockroachDB supports a number of other column data types, including DECIMAL
, INT
, TIMESTAMP
, UUID
, and enumerated data types and spatial data types. We recommend that you review the supported types, and create columns with data types that correspond to the types of data that you intend to persist to the cluster from your application.
Let's add another example table to our max_schema
schema, with more column data types.
As a vehicle-sharing platform, MovR needs to store data about its vehicles. In max_init.sql
, add a CREATE TABLE
statement for a vehicles
table, under the CREATE TABLE
statement for users
. This table should probably include information about the type of vehicle, when it was created, what its availability is, and where it is located:
CREATE TABLE movr.max_schema.vehicles (
id UUID,
type STRING,
creation_time TIMESTAMPTZ,
available BOOL,
last_location STRING
);
This table includes a few more data types than the users
table:
UUID
, which we recommend for columns with values that uniquely identify rows (like an "id" column).TIMESTAMPTZ
, which we recommend for timestamp values.BOOL
, which we recommend for columns that will only take one of two possible values.
The rest of the columns are STRING
-typed.
Note that values in the type
column will likely only be STRING
values from a fixed list of values. Specifically, the vehicle type can only be one of the vehicle types supported by the MovR platform (e.g., a bike
, a scooter
, or a skateboard
). For values like this, we recommend using a user-defined, enumerated type.
To create a user-defined type, use a CREATE TYPE
statement. For example, above the CREATE TABLE
statement for the vehicles
table, add the following statements:
CREATE TYPE movr.max_schema.vtype AS ENUM ('bike', 'scooter', 'skateboard');
For detailed reference documentation on the CREATE TYPE
statement, including additional examples, see the CREATE TYPE
syntax page.
For detailed reference documentation on enumerated data types, including additional examples, see ENUM
.
You can then use movr.max_schema.vtype
as the type
column's data type:
CREATE TABLE movr.max_schema.vehicles (
id UUID,
type movr.max_schema.vtype,
creation_time TIMESTAMPTZ,
available BOOL,
last_location STRING
);
Only values in the set of movr.max_schema.vtype
values will be allowed in the type
column.
The users
and vehicles
tables now have syntactically valid column definitions. As a best practice, you should explicitly select primary key columns and add any additional constraints before executing the CREATE TABLE
statements.
Select primary key columns
A primary key is a column, or set of columns, whose values uniquely identify rows of data. Every table requires a primary key.
Primary keys are defined in CREATE TABLE
statements with the PRIMARY KEY
column constraint. The PRIMARY KEY
constraint requires that all the constrained column(s) contain only unique and non-NULL
values.
When a table is created, CockroachDB creates an index (called the primary
index) on the column(s) constrained by the PRIMARY KEY
constraint. CockroachDB uses this index to find rows in a table more efficiently.
To add a single column to a primary key, add the PRIMARY KEY
keyword to the end of the column definition. To add multiple columns to a primary key (i.e., to create a composite primary key), add a separate CONSTRAINT "primary" PRIMARY KEY
clause after the column definitions in the CREATE TABLE
statement.
For examples, see below.
For detailed reference documentation on the PRIMARY KEY
constraint, including additional examples, see the PRIMARY KEY
constraint page.
Primary key best practices
Here are some best practices to follow when selecting primary key columns:
Avoid defining primary keys over a single column of sequential data.
Querying a table with a primary key on a single sequential column (e.g., an auto-incrementing
INT
column, or aTIMESTAMP
value) can result in single-range hot spots that negatively affect performance, or cause transaction contention.If you are working with a table that must be indexed on sequential keys, use hash-sharded indexes. For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads blog post.
Define a primary key for every table.
If you create a table without defining a primary key, CockroachDB will automatically create a primary key over a hidden,
INT
-typed column namedrowid
. By default, sequential, unique identifiers are generated for each row in therowid
column with theunique_rowid()
function. The sequential nature of therowid
values can lead to a poor distribution of the data across a cluster, which can negatively affect performance. Furthermore, because you cannot meaningfully use therowid
column to filter table data, the primary key index onrowid
does not offer any performance optimization. This means you will always have improved performance by defining a primary key for a table.New in v21.2: To require an explicitly defined primary key for all tables created in your cluster, set the
sql.defaults.require_explicit_primary_keys.enabled
cluster setting totrue
.When possible, define primary key constraints over multiple columns (i.e., use composite primary keys).
When defining composite primary keys, make sure the data in the first column of the primary key prefix is well-distributed across the nodes in the cluster. To improve the performance of ordered queries, you can add monotonically increasing primary key columns after the first column of the primary key prefix. For an example, see below.
For single-column primary keys, use
UUID
-typed columns with default values randomly-generated, using thegen_random_uuid()
SQL function.Randomly generating
UUID
values ensures that the primary key values will be unique and well-distributed across a cluster. For an example, see below.
We strongly recommend adding size limits to all indexed columns, which includes columns in primary keys.
Values exceeding 1 MiB can lead to storage layer write amplification and cause significant performance degradation or even crashes due to OOMs (out of memory errors).
To add a size limit using CREATE TABLE
:
CREATE TABLE name (first STRING(100), last STRING(100));
To add a size limit using ALTER TABLE ... ALTER COLUMN
:
SET enable_experimental_alter_column_type_general = true;
ALTER TABLE name ALTER first TYPE STRING(99);
Primary key examples
To follow a primary key best practice, the CREATE TABLE
statements in max_init.sql
for the users
and vehicles
tables need to explicitly define a primary key.
In the max_init.sql
file, add a composite primary key on the first_name
and last_name
columns of the users
table:
CREATE TABLE movr.max_schema.users (
first_name STRING,
last_name STRING,
email STRING,
CONSTRAINT "primary" PRIMARY KEY (first_name, last_name)
);
This primary key will uniquely identify rows of user data.
Because PRIMARY KEY
constraints imply UNIQUE
indexes, only one unique combination of first_name
and last_name
will be allowed in rows of the users
table. Additionally, because PRIMARY KEY
implies the NOT NULL
constraint, all rows of data inserted into the users
table must include values for first_name
and last_name
.
Primary key columns can also be single columns, if those columns are guaranteed to uniquely identify the row. Their values should also be well-distributed across the cluster.
In the vehicles
table definition, add a PRIMARY KEY
constraint on the id
column:
CREATE TABLE movr.max_schema.vehicles (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
type movr.max_schema.vtype,
creation_time TIMESTAMPTZ,
available BOOL,
last_location STRING
);
Note that, in addition to the PRIMARY KEY
constraint, the id
column has a DEFAULT
constraint. This constraint sets a default value for the column to a value generated by the gen_random_uuid()
function, following UUID
best practices. The values generated by this function are guaranteed to be unique and well-distributed across the cluster. We discuss the DEFAULT
constraint more below.
Add additional constraints
In addition to the PRIMARY KEY
constraint, CockroachDB supports a number of other column-level constraints, including CHECK
, DEFAULT
, FOREIGN KEY
, UNIQUE
, and NOT NULL
. Using constraints can simplify table queries, improve query performance, and ensure that data remains semantically valid.
To constrain a single column, add a constraint keyword to the column's definition, as shown in the single-column PRIMARY KEY
example above. To constrain more than one column, add the entire constraint's definition after the list of columns in the CREATE TABLE
statement, also shown in the composite PRIMARY KEY
example above.
For guidance and examples for each constraint, see the sections below.
For detailed reference documentation for each supported constraint, see the constraint's syntax page.
Populate with default values
To set default values on columns, use the DEFAULT
constraint. Default values enable you to write queries without the need to specify values for every column.
When combined with supported SQL functions, default values can save resources in your application's persistence layer by offloading computation onto CockroachDB. For example, rather than using an application library to generate unique UUID
values, you can set a default value to be an automatically-generated UUID
value with the gen_random_uuid()
SQL function. Similarly, you could use a default value to populate a TIMESTAMP
column with the current time of day, using the now()
function.
For example, in the vehicles
table definition in max_init.sql
, you added a DEFAULT gen_random_uuid()
clause to the id
column definition. This set the default value to a generated UUID
value. Now, add a default value to the creation_time
column:
CREATE TABLE movr.max_schema.vehicles (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
type movr.max_schema.vtype,
creation_time TIMESTAMPTZ DEFAULT now(),
available BOOL,
last_location STRING
);
When a row is inserted into the vehicles
table, CockroachDB generates a random default value for the vehicle id
, and uses the current timestamp for the vehicle's creation_time
. Rows inserted into the vehicles
table do not need to include an explicit value for id
or creation_time
.
For detailed reference documentation on the DEFAULT
constraint, including additional examples, see the DEFAULT
syntax page.
Reference other tables
To reference values in another table, use a FOREIGN KEY
constraint. FOREIGN KEY
constraints enforce referential integrity, which means that a column can only refer to an existing column.
For example, suppose you want to add a new table that contains data about the rides that MovR users are taking on vehicles. This table should probably include information about the location and duration of the ride, as well as information about the vehicle used for the ride.
In max_init.sql
, under the CREATE TABLE
statement for vehicles
, add a definition for a rides
table, with a foreign key dependency on the vehicles
table. To define a foreign key constraint, use the REFERENCES
keyword:
CREATE TABLE movr.max_schema.rides (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
vehicle_id UUID REFERENCES movr.max_schema.vehicles(id),
start_address STRING,
end_address STRING,
start_time TIMESTAMPTZ DEFAULT now(),
end_time TIMESTAMPTZ
);
The vehicle_id
column will be identical to the id
column in the vehicles
table. Any queries that insert a vehicle_id
that does not exist in the id
column of the vehicles
table will return an error.
Foreign keys cannot reference tables in a different database. They can, however reference tables in a different schema.
Suppose that you want to introduce promotional codes for users on the MovR platform, but you want the user promo code data to be managed by the abbey
user that you created in Create a Database.
Create an empty .sql
initialization file for abbey
.
$ touch abbey_init.sql
This file will initialize the objects in the abbey_schema
user-defined schema that you created in Create a Schema.
In a text editor, open abbey_init.sql
, and add a CREATE TABLE
statement for a table called user_promo_codes
:
CREATE TABLE movr.abbey_schema.user_promo_codes (
code STRING,
user_email STRING REFERENCES movr.max_schema.users(email),
valid BOOL,
CONSTRAINT "primary" PRIMARY KEY (code, user_email)
);
This new table references the email
column of the users
table in max_schema
. Because the user_promo_codes
table depends on the users
table, you'll need to execute max_init.sql
before abbey_init.sql
.
Foreign key dependencies can significantly impact query performance, as queries involving tables with foreign keys, or tables referenced by foreign keys, require CockroachDB to check two separate tables. We recommend using them sparingly.
For detailed reference documentation on the FOREIGN KEY
constraint, including additional examples, see the FOREIGN KEY
syntax page.
Prevent duplicates
To prevent duplicate values in a column, use the UNIQUE
constraint.
For example, suppose that you want to ensure that the email addresses of all users are different, to prevent users from registering for two accounts with the same email address. Add a UNIQUE
constraint to the email
column of the users
table:
CREATE TABLE movr.max_schema.users (
first_name STRING,
last_name STRING,
email STRING UNIQUE,
CONSTRAINT "primary" PRIMARY KEY (first_name, last_name)
);
Attempting to insert email
values that already exist in the users
table will return an error.
When you add a UNIQUE
constraint to a column, CockroachDB creates a secondary index on that column, to help speed up checks on a column value's uniqueness.
Also note that the UNIQUE
constraint is implied by the PRIMARY KEY
constraint.
For detailed reference documentation on the UNIQUE
constraint, including additional examples, see the UNIQUE
syntax page.
Prevent NULL
values
To prevent NULL
values in a column, use the NOT NULL
constraint. If you specify a NOT NULL
constraint, all queries against the table with that constraint must specify a value for that column, or have a default value specified with a DEFAULT
constraint.
For example, if you require all users of the MovR platform to have an email on file, you can add a NOT NULL
constraint to the email
column of the users
table:
CREATE TABLE movr.max_schema.users (
first_name STRING,
last_name STRING,
email STRING UNIQUE NOT NULL,
CONSTRAINT "primary" PRIMARY KEY (first_name, last_name)
);
Note that the NOT NULL
constraint is implied by the PRIMARY KEY
constraint.
For detailed reference documentation on the NOT NULL
constraint, including additional examples, see the NOT NULL
syntax page.
Execute CREATE TABLE
statements
After you have defined CREATE TABLE
statements for your tables, you can execute the statements.
CREATE TABLE
execution best practices
Here are some general best practices to follow when executing CREATE TABLE
statements:
Do not create tables as the
root
user. Instead, create tables as a different user, with fewer privileges, following authorization best practices. The user that creates an object becomes that object's owner.As a general best practice, we discourage the use of client libraries to execute database schema changes. Instead, use a database schema migration tool, or the CockroachDB SQL client.
Review the limitations of online schema changes in CockroachDB. Note that CockroachDB has limited support for schema changes within the same explicit transaction.
We recommend doing schema changes outside explicit transactions, where possible. When a database schema management tool manages transactions on your behalf, we recommend only including one schema change operation per transaction.
Execute the example CREATE TABLE
statements
After following the examples provided in the sections above, the max_init.sql
file should look similar to the following:
CREATE TABLE movr.max_schema.users (
first_name STRING,
last_name STRING,
email STRING UNIQUE NOT NULL,
CONSTRAINT "primary" PRIMARY KEY (first_name, last_name)
);
CREATE TYPE movr.max_schema.vtype AS ENUM ('bike', 'scooter', 'skateboard');
CREATE TABLE movr.max_schema.vehicles (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
type movr.max_schema.vtype,
creation_time TIMESTAMPTZ DEFAULT now(),
available BOOL,
last_location STRING
);
CREATE TABLE movr.max_schema.rides (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
vehicle_id UUID REFERENCES movr.max_schema.vehicles(id),
start_address STRING,
end_address STRING,
start_time TIMESTAMPTZ DEFAULT now(),
end_time TIMESTAMPTZ
);
To execute the statements in the max_init.sql
file, run the following command:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=max \
--database=movr \
-f max_init.sql
The SQL client will execute any statements in max_init.sql
, with movr
as the database and max
as the user. max
is now the owner of all objects created by the statements in the max_init.sql
file.
After the statements have been executed, you can see the tables in the CockroachDB SQL shell.
Open the SQL shell to your cluster, with movr
as the database and max
as the user:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=max \
--database=movr
To view the tables in the max_schema
user-defined schema, issue a SHOW TABLES
statement:
> SHOW TABLES FROM max_schema;
schema_name | table_name | type | owner | estimated_row_count
--------------+------------+-------+-------+----------------------
max_schema | rides | table | max | 0
max_schema | users | table | max | 0
max_schema | vehicles | table | max | 0
(3 rows)
To see the individual CREATE TABLE
statements for each table, use a SHOW CREATE
statement. For example, to see the vehicles
CREATE TABLE
statement:
> SHOW CREATE TABLE movr.max_schema.vehicles;
table_name | create_statement
---------------------------+---------------------------------------------------------------------------
movr.max_schema.vehicles | CREATE TABLE max_schema.vehicles (
| id UUID NOT NULL DEFAULT gen_random_uuid(),
| type max_schema.vtype NULL,
| creation_time TIMESTAMPTZ NULL DEFAULT now():::TIMESTAMPTZ,
| available BOOL NULL,
| last_location STRING NULL,
| CONSTRAINT "primary" PRIMARY KEY (id ASC),
| FAMILY "primary" (id, type, creation_time, available, last_location)
| )
(1 row)
After following the examples provided in the sections above, the abbey_init.sql
file should look similar to the following:
CREATE TABLE movr.abbey_schema.user_promo_codes (
code STRING,
user_email STRING REFERENCES movr.max_schema.users(email),
valid BOOL,
CONSTRAINT "primary" PRIMARY KEY (code, user_email)
);
To execute the statement in the abbey_init.sql
file, run the following command:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=abbey \
--database=movr \
-f abbey_init.sql
After the statements have been executed, you can see the table in the CockroachDB SQL shell.
Open the SQL shell to your cluster, with movr
as the database and abbey
as the user, and view the table:
$ cockroach sql \
--certs-dir={certs-directory} \
--user=abbey \
--database=movr
> SHOW TABLES FROM abbey_schema;
schema_name | table_name | type | owner | estimated_row_count
---------------+------------------+-------+-------+----------------------
abbey_schema | user_promo_codes | table | abbey | 0
(1 row
> SHOW CREATE TABLE movr.abbey_schema.user_promo_codes;
table_name | create_statement
-------------------------------------+------------------------------------------------------------------------------------------------------
movr.abbey_schema.user_promo_codes | CREATE TABLE abbey_schema.user_promo_codes (
| code STRING NOT NULL,
| user_email STRING NOT NULL,
| valid BOOL NULL,
| CONSTRAINT "primary" PRIMARY KEY (code ASC, user_email ASC),
| CONSTRAINT fk_user_email_ref_users FOREIGN KEY (user_email) REFERENCES max_schema.users(email),
| FAMILY "primary" (code, user_email, usage_count)
| )
(1 row)
Note that none of the tables that you have created thus far have secondary indexes. For guidance on adding secondary indexes, see at Add Secondary Indexes.
What's next?
You might also be interested in the following pages: