This page walks you through creating a database schema for an example multi-region application. It is the second section of the Develop and Deploy a Multi-Region Web Application tutorial.
CockroachDB versions v21.1 and above support new multi-region capabilities, with different SQL syntax.
For the latest version of the application and database schema built on v21.1 multi-region features, see the movr-flask
repository.
For the latest version of the tutorial, see the v21.1 docs.
Before you begin
Before you begin this section, complete the previous section of the tutorial, MovR: An Example Multi-Region Use-Case.
The movr
database
The example application is built on a multi-region deployment of CockroachDB, loaded with the movr
database. This database contains the following tables:
users
vehicles
rides
These tables store information about the users and vehicles registered with MovR, and the rides associated with those users and vehicles.
Here's a diagram of the database schema, generated with DBeaver:
Initialization statements for movr
are defined in dbinit.sql
, a SQL file that you use later in this tutorial to load the database to a running cluster.
This database is a slightly simplified version of the movr
database that is built into the cockroach
binary. Although the two databases are similar, they have different schemas.
Geo-partition the movr
database
Distributed CockroachDB deployments consist of multiple regional deployments of CockroachDB nodes that communicate as a single, logical database. In CockroachDB terminology, these nodes comprise a cluster. CockroachDB splits rows of table data into ranges, and then replicates the ranges and distributes them to the individual nodes of the cluster. You can control where ranges are replicated and distributed with CockroachDB metadata objects known as replication zones.
At startup, each node in a cluster is assigned a locality. You can assign nodes to the same replication zone based on their locality. When you partition data, you break up tables into segments of rows, based on a common value or characteristic. To geo-partition the data, you constrain a partition to a specific replication zone.
For example, suppose that the movr
database is loaded to a multi-region CockroachDB cluster, with each node assigned a cloud provider region
locality at startup.
Each table in the movr
database contains a city
column, which signals a location for each row of data. If a user is registered in New York, their row in the users
table will have a city
value of new york
. If that user takes a ride in Seattle, that ride's row in the rides
table has a city
value of seattle
.
You can partition the tables of the movr
database, based on the row's city
value.
For example:
> PARTITION BY LIST (city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
);
After you define a partition, you can constrain it to a replication zone, using a zone constraint on the region
locality. For the users
table, this looks like:
> ALTER PARTITION europe_west OF INDEX movr.public.users@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.users@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.users@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]';
For full partitioning statements for each table and secondary index, see dbinit.sql
.
See below for the CREATE TABLE
statements for each table in the database.
The users
table
Here is the CREATE TABLE
statement for the users
table:
>
CREATE TABLE IF NOT EXISTS users (
id UUID NOT NULL DEFAULT gen_random_uuid(),
city STRING NOT NULL,
first_name STRING NULL,
last_name STRING NULL,
email STRING NULL,
username STRING NULL,
password_hash STRING NULL,
is_owner BOOL NULL,
CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
CONSTRAINT check_city CHECK (city IN ('amsterdam','boston','los angeles','new york','paris','rome','san francisco','seattle','washington dc')),
UNIQUE INDEX users_username_key (username ASC),
FAMILY "primary" (id, city, first_name, last_name, email, username, password_hash, is_owner)
) PARTITION BY LIST (city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
);
ALTER PARTITION europe_west OF INDEX movr.public.users@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.users@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.users@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]'
;
Note the following:
- We want to partition this table on the
city
column. In order to partition on a column value, the column must be indexed. This table's composite primary key index is on thecity
andid
columns. Note that primary keys also imply unique andNOT NULL
constraints on the constrained column pair. - To optimize queries on partitioned data, the
city
column precedes theid
column in the primary index. This guarantees that scans on theusers
table evaluate each row'scity
first, and then itsid
. - To improve the performance of filtered query scans, it is a best practice to index columns in a
WHERE
clause.. The composite primary key ensures that queries on partitioned data and on rows filtered byid
(e.g., single-row look-ups) are optimized. To optimize queries filtered on theusername
column, there is a secondary index on theusername
column. Although explicitly stated here, CockroachDB automatically applies a unique constraint to columns that are indexed. - There is a check constraint on the
city
column, which verifies that the value of thecity
column is valid. When querying partitions, check constraints also optimize queries filtered on the constrained columns.
The vehicles
table
>
CREATE TABLE IF NOT EXISTS vehicles (
id UUID NOT NULL DEFAULT gen_random_uuid(),
city STRING NOT NULL,
type STRING NULL,
owner_id UUID NULL,
date_added DATE NULL,
status STRING NULL,
last_location STRING NULL,
color STRING NULL,
brand STRING NULL,
CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
CONSTRAINT check_city CHECK (city IN ('amsterdam','boston','los angeles','new york','paris','rome','san francisco','seattle','washington dc')),
CONSTRAINT fk_city_ref_users FOREIGN KEY (city, owner_id) REFERENCES users(city, id),
INDEX vehicles_auto_index_fk_city_ref_users (city ASC, owner_id ASC, status ASC) PARTITION BY LIST (city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
),
FAMILY "primary" (id, city, type, owner_id, date_added, status, last_location, color, brand)
) PARTITION BY LIST (city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
);
ALTER PARTITION europe_west OF INDEX movr.public.vehicles@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.vehicles@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.vehicles@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]';
ALTER PARTITION europe_west OF INDEX movr.public.vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]'
;
Note the following:
- Like the
users
table, thevehicles
table has a composite primary key oncity
andid
. - The
vehicles
table has a foreign key constraint on theusers
table, for thecity
andowner_id
columns. This guarantees that a vehicle is registered to a particular user (i.e., an "owner") in the city where that user is registered. - The table has a secondary index (
vehicles_auto_index_fk_city_ref_users
) on thecity
,owner_id
, andstatus
. By default, CockroachDB creates secondary indexes for all foreign key constraints. This optimizes scans made on the foreign key columns, for foreign key enforcement. Here, we addstatus
to the secondary index, because reading and writing the status of a vehicle is a common query for the application. As mentioned in theusers
table section, it is a best practice to index columns in aWHERE
clause. We include a
PARTITION BY
statement for thevehicles_auto_index_fk_city_ref_users
index. When geo-partitioning a database, it's important to geo-partition all indexes containing partition columns. In this case, the index includescity
, so we should partition the index.After defining this partition, you also need to add a zone constraint to the partition, so that it is truly geo-partitioned. For example, for the index's
us_east
partition, we use the following statement to configure the zone:> ALTER PARTITION us_east OF INDEX movr.public.vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE USING constraints = '[+region=gcp-us-east1]';
See
dbinit.sql
for full zone configuration statements for all partitioned indexes.Like
users
, thevehicles
table also has aCHECK
constraint on thecity
row, to optimize table scans in filtered queries.
The rides
table
>
CREATE TABLE rides (
id UUID NOT NULL DEFAULT gen_random_uuid(),
city STRING NOT NULL,
vehicle_id UUID NULL,
rider_id UUID NULL,
rider_city STRING NOT NULL,
start_location STRING NULL,
end_location STRING NULL,
start_time TIMESTAMPTZ NULL,
end_time TIMESTAMPTZ NULL,
length INTERVAL NULL,
CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
CONSTRAINT check_city CHECK (city IN ('amsterdam','boston','los angeles','new york','paris','rome','san francisco','seattle','washington dc')),
CONSTRAINT fk_city_ref_users FOREIGN KEY (rider_city, rider_id) REFERENCES users(city, id),
CONSTRAINT fk_vehicle_city_ref_vehicles FOREIGN KEY (city, vehicle_id) REFERENCES vehicles(city, id),
INDEX rides_auto_index_fk_city_ref_users (rider_city ASC, rider_id ASC) PARTITION BY LIST (rider_city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
),
INDEX rides_auto_index_fk_vehicle_city_ref_vehicles (city ASC, vehicle_id ASC) PARTITION BY LIST (city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
),
FAMILY "primary" (id, city, rider_id, rider_city, vehicle_id, start_location, end_location, start_time, end_time, length)
) PARTITION BY LIST (city) (
PARTITION us_west VALUES IN (('seattle'), ('san francisco'), ('los angeles')),
PARTITION us_east VALUES IN (('new york'), ('boston'), ('washington dc')),
PARTITION europe_west VALUES IN (('amsterdam'), ('paris'), ('rome'))
);
ALTER PARTITION europe_west OF INDEX movr.public.rides@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.rides@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.rides@primary CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]';
ALTER PARTITION europe_west OF INDEX movr.public.rides@rides_auto_index_fk_city_ref_users CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.rides@rides_auto_index_fk_city_ref_users CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.rides@rides_auto_index_fk_city_ref_users CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]';
ALTER PARTITION europe_west OF INDEX movr.public.rides@rides_auto_index_fk_vehicle_city_ref_vehicles CONFIGURE ZONE USING
constraints = '[+region=gcp-europe-west1]';
ALTER PARTITION us_east OF INDEX movr.public.rides@rides_auto_index_fk_vehicle_city_ref_vehicles CONFIGURE ZONE USING
constraints = '[+region=gcp-us-east1]';
ALTER PARTITION us_west OF INDEX movr.public.rides@rides_auto_index_fk_vehicle_city_ref_vehicles CONFIGURE ZONE USING
constraints = '[+region=gcp-us-west1]'
;
Note the following:
- Like the
users
andvehicles
tables, therides
table has a composite primary key oncity
andid
. - Like the
vehicles
table, therides
table has foreign key constraints. These constraints are on theusers
and thevehicles
tables. - The foreign key indexes are partitioned. These partitions need to be zone-constrained like the other partitions. For full zone configuration statements, see
dbinit.sql
. - Like
users
andvehicles
, the table has aCHECK
constraint on thecity
row, to optimize filtered queries.
Next steps
Now that you are familiar with the movr
schema, set up a development environment for a multi-region application.