This page shows you how to reproduce CockroachDB's TPC-C performance benchmarking results on commodity AWS hardware. Across all scales, CockroachDB can process tpmC (new order transactions per minute) at near maximum efficiency. Start by choosing the scale you're interested in:
Warehouses | Data size | Cluster size |
---|---|---|
10 | 2GB | 3 nodes on your laptop |
1000 | 80GB | 3 nodes on c5d.4xlarge machines |
10,000 | 800GB | 15 nodes on c5d.4xlarge machines |
100,000 | 8TB | 81 nodes on c5d.9xlarge machines |
Before you begin
Review TPC-C concepts
TPC-C provides the most realistic and objective measure for OLTP performance at various scale factors. Before you get started, consider reviewing what TPC-C is and how it is measured.
Request a trial license
Reproducing CockroachDB's 100,000 warehouse TPC-C results involves using CockroachDB's partitioning feature to ensure replicas for any given section of data are located on the same nodes that will be queried by the load generator for that section of data. Partitioning helps distribute the workload evenly across the cluster.
The partitioning feature requires an Enterprise license, so request a 30-day trial license before you get started.
You should receive your trial license via email within a few minutes. You'll enable your license once your cluster is up-and-running.
Step 1. Set up the environment
Provision VMs
Create 86 VM instances, 81 for CockroachDB nodes and 5 for the TPC-C workload.
- Create all instances in the same region and the same security group.
- Use the
c5d.9xlarge
machine type. - Use local SSD instance store volumes. Local SSDs are low latency disks attached to each VM, which maximizes performance. This configuration best resembles what a bare metal deployment would look like, with machines directly connected to one physical disk each. We do not recommend using network-attached block storage.
Note the internal IP address of each instance. You'll need these addresses when starting the CockroachDB nodes.
This configuration is intended for performance benchmarking only. For production deployments, there are other important considerations, such as security, load balancing, and data location techniques to minimize network latency. For more details, see the Production Checklist.
Configure your network
CockroachDB requires TCP communication on two ports:
26257
for inter-node communication (i.e., working as a cluster) and for the TPC-C workload to connect to nodes8080
for exposing your Admin UI
Create inbound rules for your security group:
Inter-node and TPCC-to-node communication
Field | Recommended Value |
---|---|
Type | Custom TCP Rule |
Protocol | TCP |
Port Range | 26257 |
Source | The name of your security group (e.g., sg-07ab277a) |
Admin UI
Field | Recommended Value |
---|---|
Type | Custom TCP Rule |
Protocol | TCP |
Port Range | 8080 |
Source | Your network's IP ranges |
Step 2. Start CockroachDB
SSH to the first VM where you want to run a CockroachDB node.
Download the CockroachDB archive for Linux, extract the binary, and copy it into the
PATH
:$ curl https://binaries.cockroachdb.com/cockroach-v19.2.12.linux-amd64.tgz \ | tar -xz
$ cp -i cockroach-v19.2.12.linux-amd64/cockroach /usr/local/bin/
If you get a permissions error, prefix the command with
sudo
.Run the
cockroach start
command:$ cockroach start \ --insecure \ --advertise-addr=<node1 internal address> \ --join=<node1 internal address>,<node2 internal address>,<node3 internal address> \ --cache=.25 \ --max-sql-memory=.25 \ --locality=rack=0 \ --background
Each node will start with a locality that includes an artificial "rack number" (e.g.,
--locality=rack=0
). Use 27 racks for 81 nodes so that 3 nodes will be assigned to each rack.Repeat steps 1 - 3 for the other 80 VMs for CockroachDB nodes. Each time, be sure to:
- Adjust the
--advertise-addr
flag. - Set the
--locality
flag to the appropriate "rack number", as described above.
- Adjust the
On any of the VMs with the
cockroach
binary, run the one-timecockroach init
command to join the first nodes into a cluster:$ cockroach init --insecure --host=<address of any node>
Step 3. Configure the cluster
You'll be importing a large TPC-C data set. To speed that up, you can temporarily disable replication and tweak some cluster settings. You'll also need to enable the enterprise license you requested earlier.
SSH to any VM with the
cockroach
binary.Launch the built-in SQL shell:
$ cockroach sql --insecure --host=<address of any node>
Disable replication:
> ALTER RANGE default CONFIGURE ZONE USING num_replicas = 1;
Adjust some cluster settings:
> SET CLUSTER SETTING rocksdb.ingest_backpressure.l0_file_count_threshold = 100; SET CLUSTER SETTING rocksdb.ingest_backpressure.pending_compaction_threshold = '5 GiB'; SET CLUSTER SETTING schemachanger.backfiller.max_buffer_size = '5 GiB'; SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '128 MiB'; SET CLUSTER SETTING rocksdb.min_wal_sync_interval = '500us';
Enable the trial license you requested earlier:
> SET CLUSTER SETTING cluster.organization = '<your organization>';
> SET CLUSTER SETTING enterprise.license = '<your license key>';
Exit the SQL shell:
> \q
Step 4. Import the TPC-C dataset
CockroachDB offers a pre-built workload
binary for Linux that includes the TPC-C benchmark. You'll need to put the binary on the VMs for importing the dataset and running TPC-C.
SSH to one of the VMs where you want to run TPC-C.
Download the
workload
binary for Linux and make it executable:$ wget https://edge-binaries.cockroachdb.com/cockroach/workload.LATEST -O workload; chmod 755 workload
Repeat steps 1 and 2 for the other 4 VMs where you'll run TPC-C.
On one of the VMs with the
workload
binary, import the TPC-C dataset:$ ./workload fixtures import tpcc \ --warehouses 100000 \ "postgres://root@<address of any CockroachDB node>:26257?sslmode=disable"
This will load 8TB of data for 100,000 "warehouses". This can take around 6 hours to complete.
You can monitor progress on the Jobs screen of the Admin UI. Open the Admin UI by pointing a browser to the address in the
admin
field in the standard output of any node on startup.
Step 5. Partition the database
Next, partition your database to divide all of the TPC-C tables and indexes into 27 partitions, one per rack, and then use zone configurations to pin those partitions to a particular rack.
Re-enable 3-way replication:
- SSH to any VM with the
cockroach
binary. Launch the built-in SQL shell:
$ cockroach sql --insecure --host=<address of any node>
Enable replication:
> ALTER RANGE default CONFIGURE ZONE USING num_replicas = 3;
Exit the SQL shell:
> \q
- SSH to any VM with the
On one of the VMs with the
workload
binary, briefly run TPC-C to set up partitioning:$ ulimit -n 200500 && ./workload run tpcc \ --partitions 27 \ --warehouses 100000 \ --duration 1m \ --ramp 1ms \ "postgres://root@<address of any CockroachDB node>:26257?sslmode=disable"
Wait for up-replication and partitioning to finish.
This will likely take 10s of minutes. To watch the progress, go to the Metrics > Queues > Replication Queue graph in the Admin UI. Once the Replication Queue gets to
0
for all actions and stays there, you can move on to the next step.
Step 6. Re-start CockroachDB
At the moment, running TPC-C against CockroachDB at this scale requires customizations that are not in the latest official release, so you'll need to stop the cluster, put a new binary with these patches on the 81 VMs for CockroachDB, and then restart the cluster.
SSH to the first VM where CockroachDB is running.
Stop the
cockroach
process:$ cockroach quit --insecure
Rename the old binary:
$ i="$(which cockroach)"; mv "$i" "$i"_old
Download the new binary, rename it, and copy it into the
PATH
:$ wget https://edge-binaries.cockroachdb.com/cockroach/cockroach.linux-gnu-amd64.9582884c143639d3acc85a7ba1a829fb5a5ae9dd ; chmod 755 cockroach.linux-gnu-amd64.9582884c143639d3acc85a7ba1a829fb5a5ae9dd
$ cp -i cockroach.linux-gnu-amd64.9582884c143639d3acc85a7ba1a829fb5a5ae9dd /usr/local/bin/cockroach
If you get a permissions error, prefix the command with
sudo
.Re-start the node, using the same
cockroach start
command you used the first time you started the node:$ cockroach start \ --insecure \ --advertise-addr=<node1 internal address> \ --join=<node1 internal address>,<node2 internal address>,<node3 internal address> \ --cache=.25 \ --max-sql-memory=.25 \ --locality=rack=0 \ --background
Repeat steps 1 - 5 for the other 80 VMs for CockroachDB nodes. Each time, be sure to use the
cockroach start
command you used the first time you started the node.
Step 7. Allocate partitions
Before running the benchmark, it's important to allocate partitions to workload binaries properly to ensure that the cluster is balanced.
Create an
addrs
file containing connection strings to all 81 CockroachDB nodes:postgres://root@<node 1 internal address>:26257?sslmode=disable postgres://root@<node 2 internal address>:26257?sslmode=disable postgres://root@<node 3 internal address>:26257?sslmode=disable postgres://root@<node 4 internal address>:26257?sslmode=disable ...
Upload the
addrs
file to the 5 VMs with theworkload
binary:$ scp addrs <username>@<workload instance 1 address>:.
$ scp addrs <username>@<workload instance 2 address>:.
$ scp addrs <username>@<workload instance 3 address>:.
$ scp addrs <username>@<workload instance 4 address>:.
$ scp addrs <username>@<workload instance 5 address>:.
SSH to each VM with
workload
and allocate partitions:$ ulimit -n 200500 && ./workload run tpcc \ --partitions 27 \ --warehouses 100000 \ --partition-affinity 0,5,10,15,20,25 \ --ramp 30m \ --duration 1ms \ $(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \ --partitions 27 \ --warehouses 100000 \ --partition-affinity 1,6,11,16,21,26 \ --ramp 30m \ --duration 1ms \ $(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \ --partitions 27 \ --warehouses 100000 \ --partition-affinity 2,7,12,17,22 \ --ramp 30m \ --duration 1ms \ $(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \ --partitions 27 \ --warehouses 100000 \ --partition-affinity 3,8,13,18,23 \ --ramp 30m \ --duration 1ms \ $(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \ --partitions 27 \ --warehouses 100000 \ --partition-affinity 4,9,14,19,24 \ --ramp 30m \ --duration 1ms \ $(cat addrs)
Step 8. Run the benchmark
Once the allocations finish, run TPC-C for 30 minutes on each VM with workload
:
It is critical to run the benchmark from the workload nodes in parallel, so start them as simultaneously as possible.
$ ulimit -n 200500 && ./workload run tpcc \
--partitions 27 \
--warehouses 100000 \
--partition-affinity 0,5,10,15,20,25 \
--ramp 1m \
--duration 30m \
--histograms workload1.histogram.ndjson \
$(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \
--partitions 27 \
--warehouses 100000 \
--partition-affinity 1,6,11,16,21,26 \
--ramp 1m \
--duration 30m \
--histograms workload2.histogram.ndjson \
$(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \
--partitions 27 \
--warehouses 100000 \
--partition-affinity 2,7,12,17,22 \
--ramp 1m \
--duration 30m \
--histograms workload3.histogram.ndjson \
$(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \
--partitions 27 \
--warehouses 100000 \
--partition-affinity 3,8,13,18,23 \
--ramp 1m \
--duration 30m \
--histograms workload4.histogram.ndjson \
$(cat addrs)
$ ulimit -n 200500 && ./workload run tpcc \
--partitions 27 \
--warehouses 100000 \
--partition-affinity 4,9,14,19,24 \
--ramp 1m \
--duration 30m \
--histograms workload5.histogram.ndjson \
$(cat addrs)
Step 9. Interpret the results
Collect the result files from each VM with
workload
:$ scp <username>@<workload instance 1 address>:workload1.histogram.ndjson .
$ scp <username>@<workload instance 2 address>:workload2.histogram.ndjson .
$ scp <username>@<workload instance 3 address>:workload3.histogram.ndjson .
$ scp <username>@<workload instance 4 address>:workload4.histogram.ndjson .
$ scp <username>@<workload instance 5 address>:workload5.histogram.ndjson .
Upload the result files to one of the VMs with the
workload
binary:Note:The commands below assume you're uploading to the VM with the
workload1.histogram.ndjson
file.$ scp workload2.histogram.ndjson <username>@<workload instance 2 address>:.
$ scp workload3.histogram.ndjson <username>@<workload instance 3 address>:.
$ scp workload4.histogram.ndjson <username>@<workload instance 4 address>:.
$ scp workload5.histogram.ndjson <username>@<workload instance 5 address>:.
SSH to the VM where you uploaded the results files.
Run the
workload debug tpcc-merge-results
command to synthesize the results:$ ./workload debug tpcc-merge-results \ --warehouses 100000 \ workload*.histogram.ndjson
You'll should see results similar to the following, with 1.2M tpmC, a nearly perfect score:
Duration: 1h0m0, Warehouses: 100000, Efficiency: 98.81, tpmC: 1245461.78 _elapsed___ops/sec(cum)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms) 3600.1s 2082.1 151.0 369.1 453.0 637.5 5100.3 delivery 3600.1s 20757.7 167.8 402.7 486.5 671.1 8321.5 newOrder 3600.1s 2083.2 9.4 62.9 92.3 159.4 1073.7 orderStatus 3600.1s 20829.5 100.7 251.7 318.8 469.8 7516.2 payment 3600.1s 2082.8 29.4 100.7 142.6 234.9 103079.2 stockLevel
See also
Hardware
CockroachDB works well on commodity hardware in public cloud, private cloud, on-prem, and hybrid environments. For hardware recommendations, see our Production Checklist.
Also note that CockroachDB creates a yearly cloud report focused on evaluating hardware performance. In November 2019, we will provide metrics on AWS, GCP, and Azure. In the meantime, you can read the 2018 Cloud Report that focuses on AWS and GCP.
Performance Tuning
For guidance on tuning a real workload's performance, see SQL Best Practices, and for guidance on data location techniques to minimize network latency, see Topology Patterns.