Geo-Partitioned Leaseholders Topology

Warning:
As of May 10, 2022, CockroachDB v20.2 is no longer supported. For more details, refer to the Release Support Policy.

In a multi-region deployment, the geo-partitioned leaseholders topology is a good choice for tables with the following requirements:

  • Read latency must be low, but write latency can be higher.
  • Reads must be up-to-date for business reasons or because the table is referenced by foreign keys.
  • Rows in the table, and all latency-sensitive queries, can be tied to specific geographies, e.g., city, state, region.
  • Table data must remain available during a region failure.
Tip:

See It In Action - Read about how a large telecom provider with millions of customers across the United States is using the Geo-Partitioned Leaseholders topology in production for strong resiliency and performance.

Prerequisites

Fundamentals

  • Multi-region topology patterns are almost always table-specific. If you haven't already, review the full range of patterns to ensure you choose the right one for each of your tables.
  • Review how data is replicated and distributed across a cluster, and how this affects performance. It is especially important to understand the concept of the "leaseholder". For a summary, see Reads and Writes in CockroachDB. For a deeper dive, see the CockroachDB Architecture documentation.
  • Review the concept of locality, which makes CockroachDB aware of the location of nodes and able to intelligently place and balance data based on how you define replication controls.
  • Review the recommendations and requirements in our Production Checklist.
  • This topology doesn't account for hardware specifications, so be sure to follow our hardware recommendations and perform a POC to size hardware for your use case.
  • Adopt relevant SQL Best Practices to ensure optimal performance.

Cluster setup

Each multi-region topology pattern assumes the following setup:

Multi-region hardware setup

Hardware

  • 3 regions

  • Per region, 3+ AZs with 3+ VMs evenly distributed across them

  • Region-specific app instances and load balancers

    • Each load balancer redirects to CockroachDB nodes in its region.
    • When CockroachDB nodes are unavailable in a region, the load balancer redirects to nodes in other regions.

Cluster

Each node is started with the --locality flag specifying its region and AZ combination. For example, the following command starts a node in the west1 AZ of the us-west region:

$ cockroach start \
--locality=region=us-west,zone=west1 \
--certs-dir=certs \
--advertise-addr=<node1 internal address> \
--join=<node1 internal address>:26257,<node2 internal address>:26257,<node3 internal address>:26257 \        
--cache=.25 \
--max-sql-memory=.25 \
--background
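
Once the nodes are running, you can spot-check that each node reports the locality you expect. The following is a minimal sketch, assuming you are connected with the built-in SQL shell; the crdb_internal.gossip_nodes table and its columns are internal and can change between versions:

-- Locality of the node you are connected to (the gateway node):
> SHOW LOCALITY;

-- Localities reported by every node in the cluster:
> SELECT node_id, locality FROM crdb_internal.gossip_nodes;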

Configuration

Note:

Geo-partitioning requires an Enterprise license.

Summary

Using this pattern, you design your table schema to allow for partitioning, with a column identifying geography as the first column in the table's compound primary key (e.g., city/id). You tell CockroachDB to partition the table and all of its secondary indexes by that geography column, each partition becoming its own range of 3 replicas. You then tell CockroachDB to put the leaseholder for each partition in the relevant region (e.g., LA partitions in us-west, NY partitions in us-east). The other replicas of a partition remain balanced across the other regions. This means that reads in each region will access local leaseholders and, therefore, will have low, intra-region latencies. Writes, however, will leave the region to get consensus and, therefore, will have higher, cross-region latencies.

Geo-partitioned leaseholders topology

Steps

Assuming you have a cluster deployed across three regions and a table and secondary index like the following:

> CREATE TABLE users (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    city STRING NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    address STRING NOT NULL,
    PRIMARY KEY (city ASC, id ASC)
);
> CREATE INDEX users_last_name_index ON users (city, last_name);
  1. If you do not already have one, request a trial Enterprise license.

  2. Partition the table by city. For example, assuming there are three possible city values, los angeles, chicago, and new york:

    > ALTER TABLE users PARTITION BY LIST (city) (
        PARTITION la VALUES IN ('los angeles'),
        PARTITION chicago VALUES IN ('chicago'),
        PARTITION ny VALUES IN ('new york')
    );
    

    This creates distinct ranges for each partition of the table.

  3. Partition the secondary index by city as well:

    > ALTER INDEX users_last_name_index PARTITION BY LIST (city) (
        PARTITION la VALUES IN ('los angeles'),
        PARTITION chicago VALUES IN ('chicago'),
        PARTITION ny VALUES IN ('new york')
    );
    

    This creates distinct ranges for each partition of the secondary index.

  4. For each partition of the table and its secondary index, create a replication zone that tells CockroachDB to put the partition's leaseholder in the relevant region:

    > ALTER PARTITION la OF INDEX users@*
        CONFIGURE ZONE USING
          num_replicas = 3,
          constraints = '{"+region=us-west":1}',
          lease_preferences = '[[+region=us-west]]';
      ALTER PARTITION chicago OF INDEX users@*
        CONFIGURE ZONE USING
          num_replicas = 3,
          constraints = '{"+region=us-central":1}',
          lease_preferences = '[[+region=us-central]]';
      ALTER PARTITION ny OF INDEX users@*
        CONFIGURE ZONE USING
          num_replicas = 3,
          constraints = '{"+region=us-east":1}',
          lease_preferences = '[[+region=us-east]]';
    
  5. To confirm that partitions are in effect, you can use the SHOW CREATE TABLE or SHOW PARTITIONS statement:

    > SHOW CREATE TABLE users;
    
      table_name |                                          create_statement
    +------------+----------------------------------------------------------------------------------------------------+
      users      | CREATE TABLE users (
                 |     id UUID NOT NULL DEFAULT gen_random_uuid(),
                 |     city STRING NOT NULL,
                 |     first_name STRING NOT NULL,
                 |     last_name STRING NOT NULL,
                 |     address STRING NOT NULL,
                 |     CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
                 |     INDEX users_last_name_index (city ASC, last_name ASC) PARTITION BY LIST (city) (
                 |         PARTITION la VALUES IN (('los angeles')),
                 |         PARTITION chicago VALUES IN (('chicago')),
                 |         PARTITION ny VALUES IN (('new york'))
                 |     ),
                 |     FAMILY "primary" (id, city, first_name, last_name, address)
                 | ) PARTITION BY LIST (city) (
                 |     PARTITION la VALUES IN (('los angeles')),
                 |     PARTITION chicago VALUES IN (('chicago')),
                 |     PARTITION ny VALUES IN (('new york'))
                 | );
                 | ALTER PARTITION chicago OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
                 |     num_replicas = 3,
                 |     constraints = '{+region=us-central: 1}',
                 |     lease_preferences = '[[+region=us-central]]';
                 | ALTER PARTITION la OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
                 |     num_replicas = 3,
                 |     constraints = '{+region=us-west: 1}',
                 |     lease_preferences = '[[+region=us-west]]';
                 | ALTER PARTITION ny OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
                 |     num_replicas = 3,
                 |     constraints = '{+region=us-east: 1}',
                 |     lease_preferences = '[[+region=us-east]]';
                 | ALTER PARTITION chicago OF INDEX defaultdb.public.users@users_last_name_index CONFIGURE ZONE USING
                 |     num_replicas = 3,
                 |     constraints = '{+region=us-central: 1}',
                 |     lease_preferences = '[[+region=us-central]]';
                 | ALTER PARTITION la OF INDEX defaultdb.public.users@users_last_name_index CONFIGURE ZONE USING
                 |     num_replicas = 3,
                 |     constraints = '{+region=us-west: 1}',
                 |     lease_preferences = '[[+region=us-west]]';
                 | ALTER PARTITION ny OF INDEX defaultdb.public.users@users_last_name_index CONFIGURE ZONE USING
                 |     num_replicas = 3,
                 |     constraints = '{+region=us-east: 1}',
                 |     lease_preferences = '[[+region=us-east]]'
    (1 row)
    
    > SHOW PARTITIONS FROM TABLE users;
    
      database_name | table_name | partition_name | parent_partition | column_names |         index_name          | partition_value |                  zone_config
    +---------------+------------+----------------+------------------+--------------+-----------------------------+-----------------+-----------------------------------------------+
      defaultdb     | users      | la             | NULL             | city         | users@primary               | ('los angeles') | num_replicas = 3,
                    |            |                |                  |              |                             |                 | constraints = '{+region=us-west: 1}',
                    |            |                |                  |              |                             |                 | lease_preferences = '[[+region=us-west]]'
      defaultdb     | users      | chicago        | NULL             | city         | users@primary               | ('chicago')     | num_replicas = 3,
                    |            |                |                  |              |                             |                 | constraints = '{+region=us-central: 1}',
                    |            |                |                  |              |                             |                 | lease_preferences = '[[+region=us-central]]'
      defaultdb     | users      | ny             | NULL             | city         | users@primary               | ('new york')    | num_replicas = 3,
                    |            |                |                  |              |                             |                 | constraints = '{+region=us-east: 1}',
                    |            |                |                  |              |                             |                 | lease_preferences = '[[+region=us-east]]'
      defaultdb     | users      | la             | NULL             | city         | users@users_last_name_index | ('los angeles') | num_replicas = 3,
                    |            |                |                  |              |                             |                 | constraints = '{+region=us-west: 1}',
                    |            |                |                  |              |                             |                 | lease_preferences = '[[+region=us-west]]'
      defaultdb     | users      | chicago        | NULL             | city         | users@users_last_name_index | ('chicago')     | num_replicas = 3,
                    |            |                |                  |              |                             |                 | constraints = '{+region=us-central: 1}',
                    |            |                |                  |              |                             |                 | lease_preferences = '[[+region=us-central]]'
      defaultdb     | users      | ny             | NULL             | city         | users@users_last_name_index | ('new york')    | num_replicas = 3,
                    |            |                |                  |              |                             |                 | constraints = '{+region=us-east: 1}',
                    |            |                |                  |              |                             |                 | lease_preferences = '[[+region=us-east]]'
    (6 rows)
    
Tip:

As you scale and add more cities, you can repeat steps 2 and 3 with the new complete list of cities to re-partition the table and its secondary indexes, and then repeat step 4 to create replication zones for the new partitions.
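
For example, if you later added a hypothetical fourth city, seattle, served from the us-west region (both the city value and its region assignment are illustrative only), the re-partitioning and the new partition's zone configuration would look like the following sketch:

> ALTER TABLE users PARTITION BY LIST (city) (
    PARTITION la VALUES IN ('los angeles'),
    PARTITION chicago VALUES IN ('chicago'),
    PARTITION ny VALUES IN ('new york'),
    PARTITION seattle VALUES IN ('seattle')
);

> ALTER INDEX users_last_name_index PARTITION BY LIST (city) (
    PARTITION la VALUES IN ('los angeles'),
    PARTITION chicago VALUES IN ('chicago'),
    PARTITION ny VALUES IN ('new york'),
    PARTITION seattle VALUES IN ('seattle')
);

> ALTER PARTITION seattle OF INDEX users@*
    CONFIGURE ZONE USING
      num_replicas = 3,
      constraints = '{"+region=us-west":1}',
      lease_preferences = '[[+region=us-west]]';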

Tip:

In testing, scripting, and other programmatic environments, we recommend querying the crdb_internal.partitions internal table for partition information instead of using the SHOW PARTITIONS statement. For more information, see Querying partitions programmatically.
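
A minimal sketch of such a query follows. In practice you would select only the columns you need and filter on the table of interest; note that crdb_internal is an internal schema whose contents can change between versions:

-- Lists every partition the cluster knows about, including partitions of secondary indexes:
> SELECT * FROM crdb_internal.partitions;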

Characteristics

Latency

Reads

Because each partition's leaseholder is constrained to the relevant region (e.g., the la partitions' leaseholders are located in the us-west region), reads that specify the local region key access the relevant leaseholder locally. This makes read latency very low, with the exception of reads that do not specify a region key or that refer to a partition in another region.
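
Concretely, the difference comes down to whether the query includes the partitioning column. The following sketch uses placeholder values and assumes it is issued from an application instance in us-west:

-- Served by the local leaseholder for the la partition, so latency stays intra-region:
> SELECT * FROM users WHERE city = 'los angeles' AND id = '<uuid>';

-- No city is specified, so the query may need to consult leaseholders in other regions:
> SELECT * FROM users WHERE id = '<uuid>';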

For example, in the animation below:

  1. The read request in us-west reaches the regional load balancer.
  2. The load balancer routes the request to a gateway node.
  3. The gateway node routes the request to the leaseholder for the relevant partition.
  4. The leaseholder retrieves the results and returns to the gateway node.
  5. The gateway node returns the results to the client.

Geo-partitioned leaseholders topology

Writes

Just like for reads, because each partition's leaseholder is constrained to the relevant region (e.g., the la partitions' leaseholders are located in the us-west region), writes that specify the local region key access the relevant leaseholder replicas locally. However, a partition's other replicas are spread across the other regions, so writes involve multiple network hops across regions to achieve consensus. This increases write latency significantly.
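
Concretely, the statement itself looks no different from a single-region write; the extra latency comes from the cross-region consensus round trip. A sketch with placeholder values, issued from an application instance in us-west:

-- The row belongs to the la partition, whose leaseholder is local, but the commit
-- still waits for acknowledgement from a follower replica in another region:
> INSERT INTO users (city, first_name, last_name, address)
  VALUES ('los angeles', '<first name>', '<last name>', '<address>');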

For example, in the animation below:

  1. The write request in us-west reaches the regional load balancer.
  2. The load balancer routes the request to a gateway node.
  3. The gateway node routes the request to the leaseholder replicas for the relevant table and secondary index partitions.
  4. While each leaseholder appends the write to its Raft log, it notifies its follower replicas, which are in the other regions.
  5. In each case, as soon as one follower has appended the write to its Raft log (and thus a majority of replicas agree based on identical Raft logs), it notifies the leaseholder and the write is committed on the agreeing replicas.
  6. The leaseholders then return acknowledgement of the commit to the gateway node.
  7. The gateway node returns the acknowledgement to the client.

Geo-partitioned leaseholders topology

Resiliency

Because this pattern balances the replicas for each partition across regions, one entire region can fail without interrupting access to any partitions. In this case, if any range loses its leaseholder in the region-wide outage, CockroachDB makes one of the range's other replicas the leaseholder:

Geo-partitioning topology
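
To verify this placement yourself, one option is to inspect the table's ranges. The following is a sketch; the exact output columns (for example, the replica localities and the leaseholder's locality) depend on your CockroachDB version:

-- Each range should show replicas spread across all three regions, with the
-- leaseholder in the region named by its partition's lease preference:
> SHOW RANGES FROM TABLE users;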

Alternatives

  • If reads from a table can be historical (4.8 seconds or more in the past), consider the Follower Reads pattern.
  • If rows in the table, and all latency-sensitive queries, cannot be tied to specific geographies, consider the Duplicate Indexes pattern.

See also

