This page shows you how to orchestrate the deployment and management of a secure 3-node CockroachDB cluster with Kubernetes, using the StatefulSet feature.
If you are only testing CockroachDB, or you are not concerned with protecting network communication with TLS encryption, you can use an insecure cluster instead; see the insecure version of this tutorial for instructions.
Before You Begin
Before getting started, it's helpful to review some Kubernetes-specific terminology and current limitations.
Kubernetes Terminology
Term | Description |
---|---|
instance | A physical or virtual machine. In this tutorial, you'll create GCE or AWS instances and join them into a single Kubernetes cluster from your local workstation. |
pod | A pod is a group of one or more Docker containers. In this tutorial, each pod will run on a separate instance and include one Docker container running a single CockroachDB node. You'll start with 3 pods and grow to 4. |
StatefulSet | A StatefulSet is a group of pods treated as stateful units, where each pod has distinguishable network identity and always binds back to the same persistent storage on restart. StatefulSets are considered stable as of Kubernetes version 1.9 after reaching beta in version 1.5. |
persistent volume | A persistent volume is a piece of networked storage (Persistent Disk on GCE, Elastic Block Store on AWS) mounted into a pod. The lifetime of a persistent volume is decoupled from the lifetime of the pod that's using it, ensuring that each CockroachDB node binds back to the same storage on restart. This tutorial assumes that dynamic volume provisioning is available. When that is not the case, persistent volumes need to be created manually. |
CSR | A CSR, or Certificate Signing Request, is a request to have a TLS certificate signed by the Kubernetes cluster's built-in CA. As each pod is created, it issues a CSR for the CockroachDB node running in the pod, which must be manually checked and approved. The same is true for clients as they connect to the cluster. |
RBAC | RBAC, or Role-Based Access Control, is the system Kubernetes uses to manage permissions within the cluster. In order to take an action (e.g., get or create) on an API resource (e.g., a pod or CSR), the client must have a Role that allows it to do so. This tutorial creates the RBAC resources necessary for CockroachDB to create and access certificates. |
Limitations
Kubernetes version
Kubernetes 1.8 or higher is required in order to use our most up-to-date configuration files. Earlier Kubernetes releases do not support some of the options used in our configuration files. If you need to run on an older version of Kubernetes, configuration files that work on older releases are kept in the versioned subdirectories of https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes (e.g., v1.7).
Storage
At this time, orchestrations of CockroachDB with Kubernetes use external persistent volumes that are often replicated by the provider. Because CockroachDB already replicates data automatically, this additional layer of replication is unnecessary and can negatively impact performance. High-performance use cases on a private Kubernetes cluster may want to consider using local volumes.
Step 1. Choose your deployment environment
Choose whether you want to orchestrate CockroachDB with Kubernetes using the hosted Google Kubernetes Engine (GKE) service or manually on Google Compute Engine (GCE) or AWS. The instructions below will change slightly depending on your choice.
Step 2. Start Kubernetes
Hosted GKE
Complete the Before You Begin steps described in the Google Kubernetes Engine Quickstart documentation.
This includes installing gcloud, which is used to create and delete Kubernetes Engine clusters, and kubectl, which is the command-line tool used to manage Kubernetes from your workstation.
Tip: The documentation offers the choice of using Google's Cloud Shell product or using a local shell on your machine. Choose to use a local shell if you want to be able to view the CockroachDB Admin UI using the steps in this guide.
From your local workstation, start the Kubernetes cluster:
$ gcloud container clusters create cockroachdb
Creating cluster cockroachdb...done.
This creates GKE instances and joins them into a single Kubernetes cluster named cockroachdb. The process can take a few minutes, so do not move on to the next step until you see a Creating cluster cockroachdb...done message and details about your cluster.
Get the email address associated with your Google Cloud account:
$ gcloud info | grep Account
Account: [your.google.cloud.email@example.org]
Create the RBAC roles CockroachDB needs for running on GKE, using the address from the previous step:
$ kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org>
clusterrolebinding "cluster-admin-binding" created
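If you script this step, the account lookup and the role binding can be chained. The sketch below is a minimal illustration, assuming the "Account: [...]" line format shown in the gcloud info output above; gcloud_account and bind_cluster_admin are hypothetical helper names, not part of gcloud, kubectl, or CockroachDB.

```shell
# Hypothetical helper: read `gcloud info` output on stdin and print the
# address from the "Account: [...]" line.
gcloud_account() {
  sed -n 's/^[[:space:]]*Account: \[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: grant cluster-admin to the given account, exactly as
# the kubectl command above does.
bind_cluster_admin() {
  kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin --user="$1"
}
```

For example, run bind_cluster_admin "$(gcloud info | gcloud_account)". Because the binding grants cluster-admin, check that the extracted address is the one you expect before running it.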
Manual GCE
From your local workstation, install prerequisites and start a Kubernetes cluster as described in the Running Kubernetes on Google Compute Engine documentation.
The process includes:
- Creating a Google Cloud Platform account, installing gcloud, and other prerequisites.
- Downloading and installing the latest Kubernetes release.
- Creating GCE instances and joining them into a single Kubernetes cluster.
- Installing kubectl, the command-line tool used to manage Kubernetes from your workstation.
Manual AWS
From your local workstation, install prerequisites and start a Kubernetes cluster as described in the Running Kubernetes on AWS EC2 documentation.
Step 3. Start CockroachDB nodes
From your local workstation, use our cockroachdb-statefulset-secure.yaml file to create the StatefulSet that automatically creates 3 pods, each with a CockroachDB node running inside it:
$ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml
serviceaccount "cockroachdb" created
role "cockroachdb" created
clusterrole "cockroachdb" created
rolebinding "cockroachdb" created
clusterrolebinding "cockroachdb" created
service "cockroachdb-public" created
service "cockroachdb" created
poddisruptionbudget "cockroachdb-budget" created
statefulset "cockroachdb" created
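Because the StatefulSet gives each pod a stable, ordinal name, the pod names and per-pod DNS names (such as the cockroachdb-0.cockroachdb.default.svc.cluster.local entry that appears in each node's CSR) are predictable. This sketch assumes the tutorial's defaults: the StatefulSet and its governing service are both named cockroachdb, the namespace is default, and the cluster DNS domain is cluster.local. statefulset_pod_names is a hypothetical helper.

```shell
# Hypothetical helper: print "<pod-name> <pod-dns-name>" for each replica.
# Args: StatefulSet name, replica count, namespace (default: "default"),
# governing service name (default: same as the StatefulSet).
statefulset_pod_names() {
  name=$1 replicas=$2 namespace=${3:-default} service=${4:-$1}
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Assumes the default cluster DNS domain, cluster.local.
    printf '%s-%d %s-%d.%s.%s.svc.cluster.local\n' \
      "$name" "$i" "$name" "$i" "$service" "$namespace"
    i=$((i + 1))
  done
}
```

For example, statefulset_pod_names cockroachdb 3 prints one line per pod, starting with cockroachdb-0. These are the names to expect in kubectl get pods and in the DNS Names section of each node's CSR.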
Step 4. Approve node certificates
As each pod is created, it issues a Certificate Signing Request, or CSR, to have the node's certificate signed by the Kubernetes CA. You must manually check and approve each node's certificates, at which point the CockroachDB node is started in the pod.
Get the name of the Pending CSR for pod 1:
$ kubectl get csr
NAME                                                   AGE   REQUESTOR                               CONDITION
default.node.cockroachdb-0                             1m    system:serviceaccount:default:default   Pending
node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4   4m    kubelet                                 Approved,Issued
node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY   4m    kubelet                                 Approved,Issued
node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o   5m    kubelet                                 Approved,Issued
If you do not see a Pending CSR, wait a minute and try again.
Examine the CSR for pod 1:
$ kubectl describe csr default.node.cockroachdb-0
Name:               default.node.cockroachdb-0
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Thu, 09 Nov 2017 13:39:37 -0500
Requesting User:    system:serviceaccount:default:default
Status:             Pending
Subject:
  Common Name:    node
  Serial Number:
  Organization:   Cockroach
Subject Alternative Names:
  DNS Names:     localhost
                 cockroachdb-0.cockroachdb.default.svc.cluster.local
                 cockroachdb-public
  IP Addresses:  127.0.0.1
                 10.48.1.6
Events:  <none>
If everything looks correct, approve the CSR for pod 1:
$ kubectl certificate approve default.node.cockroachdb-0
certificatesigningrequest "default.node.cockroachdb-0" approved
Repeat these steps (get, examine, and approve the CSR) for the other 2 pods, cockroachdb-1 and cockroachdb-2.
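With several pods, you may prefer a loop that finds the Pending node CSRs for you. The sketch below assumes the kubectl get csr layout shown above (NAME in the first column, CONDITION in the last) and that only requests whose names start with default.node. belong to this tutorial's nodes; pending_node_csrs and approve_pending_node_csrs are hypothetical helpers. It still prints each request and asks before approving, so the manual review is preserved.

```shell
# Hypothetical helper: read `kubectl get csr` output on stdin and print the
# names of CockroachDB node CSRs whose CONDITION (last column) is Pending.
pending_node_csrs() {
  awk 'NR > 1 && $NF == "Pending" && substr($1, 1, 13) == "default.node." { print $1 }'
}

# Hypothetical helper: describe each pending node CSR, then approve it only
# if you answer "y" at the prompt.
approve_pending_node_csrs() {
  for csr in $(kubectl get csr | pending_node_csrs); do
    kubectl describe csr "$csr"
    printf 'Approve %s? [y/N] ' "$csr"
    read -r answer
    if [ "$answer" = "y" ]; then
      kubectl certificate approve "$csr"
    fi
  done
}
```

The client CSR (default.client.root) is deliberately not matched; it is approved separately during initialization.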
Step 5. Initialize the cluster
Confirm that three pods are Running successfully:
$ kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
cockroachdb-0   1/1     Running   0          2m
cockroachdb-1   1/1     Running   0          2m
cockroachdb-2   1/1     Running   0          2m
Confirm that the persistent volumes and corresponding claims were created successfully for all three pods:
$ kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM                           REASON   AGE
pvc-52f51ecf-8bd5-11e6-a4f4-42010a800002   1Gi        RWO           Delete          Bound    default/datadir-cockroachdb-0            26s
pvc-52fd3a39-8bd5-11e6-a4f4-42010a800002   1Gi        RWO           Delete          Bound    default/datadir-cockroachdb-1            27s
pvc-5315efda-8bd5-11e6-a4f4-42010a800002   1Gi        RWO           Delete          Bound    default/datadir-cockroachdb-2            27s
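Both checks amount to counting rows in a given state, which is easy to script. A minimal sketch, assuming the column positions in the sample output above (STATUS is the third column of kubectl get pods and the fifth of kubectl get persistentvolumes); newer kubectl releases may add or rename columns, so check your own output first. count_rows_with is a hypothetical helper.

```shell
# Hypothetical helper: read kubectl table output on stdin, skip the header,
# and count rows whose column number $1 equals $2 and whose first column
# starts with the optional prefix $3.
count_rows_with() {
  awk -v col="$1" -v want="$2" -v prefix="${3:-}" '
    NR > 1 && $col == want && substr($1, 1, length(prefix)) == prefix { n++ }
    END { print n + 0 }'
}
```

For example, on a cluster used only for this tutorial, kubectl get pods | count_rows_with 3 Running cockroachdb- and kubectl get persistentvolumes | count_rows_with 5 Bound should each print 3 at this point.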
Use our cluster-init-secure.yaml file to perform a one-time initialization that joins the nodes into a single cluster:
$ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cluster-init-secure.yaml
job "cluster-init-secure" created
Approve the CSR for the one-off pod from which cluster initialization happens:
$ kubectl certificate approve default.client.root
certificatesigningrequest "default.client.root" approved
Confirm that cluster initialization has completed successfully:
$ kubectl get job cluster-init-secure
NAME                  DESIRED   SUCCESSFUL   AGE
cluster-init-secure   1         1            19m
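Initialization can take a little while, so rather than re-running kubectl get job by hand you might poll for it. The sketch below is a generic retry loop plus a check that reads the Job's .status.succeeded field through kubectl's jsonpath output; wait_until and init_job_succeeded are hypothetical helpers, and the attempt count and delay are arbitrary choices.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeed as soon as the command succeeds.
wait_until() {
  attempts=$1 delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical check: succeeds once the init job reports at least one
# successful completion (an empty field makes the test fail quietly).
init_job_succeeded() {
  [ "$(kubectl get job cluster-init-secure -o jsonpath='{.status.succeeded}')" -ge 1 ] 2>/dev/null
}
```

For example, wait_until 30 5 init_job_succeeded keeps checking for about two and a half minutes before giving up.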
The StatefulSet configuration sets all CockroachDB nodes to log to stderr, so if you ever need access to a pod/node's logs to troubleshoot, use kubectl logs <podname> rather than checking the log on the persistent volume.
Step 6. Test the cluster
To use the built-in SQL client, you need to launch a pod that runs indefinitely with the cockroach binary inside it, get a shell into the pod, and then start the built-in SQL client.
From your local workstation, use our client-secure.yaml file to launch a pod and keep it running indefinitely:
$ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml
pod "cockroachdb-client-secure" created
The pod uses the root client certificate created earlier to initialize the cluster, so there's no CSR approval required.
Get a shell into the pod and start the CockroachDB built-in SQL client:
$ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
#
# Server version: CockroachDB CCL v1.1.2 (linux amd64, built 2017/11/02 19:32:03, go1.8.3) (same version as client)
# Cluster ID: 3292fe08-939f-4638-b8dd-848074611dba
#
# Enter \? for a brief introduction.
#
root@cockroachdb-public:26257/>
Run some basic CockroachDB SQL statements:
> CREATE DATABASE bank;
> CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
> INSERT INTO bank.accounts VALUES (1, 1000.50);
> SELECT * FROM bank.accounts;
+----+---------+
| id | balance |
+----+---------+
|  1 |  1000.5 |
+----+---------+
(1 row)
Exit the SQL shell and pod:
> \q
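Tip: This pod will continue running indefinitely, so any time you need to reopen the built-in SQL client or run any other cockroach client commands, such as cockroach node or cockroach zone, repeat the kubectl exec command above using the appropriate cockroach command. If you'd prefer to delete the pod and recreate it when needed, run kubectl delete pod cockroachdb-client-secure.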
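To run a single statement without opening an interactive shell, you can pass it to the SQL client's -e flag (the same flag this page uses when finalizing an upgrade). The sketch below wraps the kubectl exec command shown above; crdb_sql is a hypothetical helper, and -it is omitted on the assumption that no terminal is needed for a one-off statement.

```shell
# Hypothetical wrapper: run one SQL statement through the long-running
# client pod, reusing the certs directory and host from the command above.
crdb_sql() {
  kubectl exec cockroachdb-client-secure -- ./cockroach sql \
    --certs-dir=/cockroach-certs --host=cockroachdb-public -e "$1"
}
```

For example, crdb_sql "SELECT * FROM bank.accounts;" prints the same table as the interactive session.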
Step 7. Monitor the cluster
To access the cluster's Admin UI:
Port-forward from your local machine to one of the pods:
$ kubectl port-forward cockroachdb-0 8080
Forwarding from 127.0.0.1:8080 -> 8080
Note: The port-forward command must be run on the same machine as the web browser in which you want to view the Admin UI. If you have been running these commands from a cloud instance or other non-local shell, you will not be able to view the UI without configuring kubectl locally and running the above port-forward command on your local machine.
Go to https://localhost:8080.
In the UI, verify that the cluster is running as expected:
- Click View nodes list on the right to ensure that all nodes successfully joined the cluster.
- Click the Databases tab on the left to verify that bank is listed.
Step 8. Simulate node failure
Based on the replicas: 3 line in the StatefulSet configuration, Kubernetes ensures that three pods/nodes are running at all times. When a pod/node fails, Kubernetes automatically creates another pod/node with the same network identity and persistent storage.
To see this in action:
Terminate one of the CockroachDB nodes:
$ kubectl delete pod cockroachdb-2
pod "cockroachdb-2" deleted
In the Admin UI, the Summary panel will soon show one node as Suspect. As Kubernetes auto-restarts the node, watch how the node once again becomes healthy.
Back in the terminal, verify that the pod was automatically restarted:
$ kubectl get pod cockroachdb-2
NAME            READY   STATUS    RESTARTS   AGE
cockroachdb-2   1/1     Running   0          12s
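A restarted pod is usable only once it is Running and all of its containers are ready (READY shows n/n). The sketch below checks both conditions for one pod, assuming the column layout shown above; pod_ready is a hypothetical helper.

```shell
# Hypothetical helper: read `kubectl get pod` output on stdin and succeed
# only if pod $1 has STATUS Running and READY n/n with n > 0.
pod_ready() {
  awk -v pod="$1" '
    $1 == pod {
      split($2, r, "/")
      if ($3 == "Running" && r[1] + 0 == r[2] + 0 && r[2] + 0 > 0) ok = 1
    }
    END { exit (ok ? 0 : 1) }'
}
```

For example, kubectl get pod cockroachdb-2 | pod_ready cockroachdb-2 && echo ready. The same check is useful after scaling or upgrading, when a pod may briefly report 0/1.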
Step 9. Scale the cluster
The Kubernetes cluster we created contains 3 nodes that pods can be run on. To ensure that you do not have two pods on the same node (as recommended in our production best practices), you need to add a new node and then edit your StatefulSet configuration to add another pod.
Add a worker node:
- On GKE, resize your cluster.
- On GCE, resize your Managed Instance Group.
- On AWS, resize your Auto Scaling Group.
Use the kubectl scale command to add a pod to your StatefulSet:
$ kubectl scale statefulset cockroachdb --replicas=4
statefulset "cockroachdb" scaled
Get the name of the Pending CSR for the new pod:
$ kubectl get csr
NAME                                                   AGE   REQUESTOR                               CONDITION
default.client.root                                    1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-0                             1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-1                             1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-2                             1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-3                             2m    system:serviceaccount:default:default   Pending
node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4   1h    kubelet                                 Approved,Issued
node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY   1h    kubelet                                 Approved,Issued
node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o   1h    kubelet                                 Approved,Issued
If you do not see a Pending CSR, wait a minute and try again.
Examine the CSR for the new pod:
$ kubectl describe csr default.node.cockroachdb-3
Name:               default.node.cockroachdb-3
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Thu, 09 Nov 2017 13:39:37 -0500
Requesting User:    system:serviceaccount:default:default
Status:             Pending
Subject:
  Common Name:    node
  Serial Number:
  Organization:   Cockroach
Subject Alternative Names:
  DNS Names:     localhost
                 cockroachdb-3.cockroachdb.default.svc.cluster.local
                 cockroachdb-public
  IP Addresses:  127.0.0.1
                 10.48.1.6
Events:  <none>
If everything looks correct, approve the CSR for the new pod:
$ kubectl certificate approve default.node.cockroachdb-3
certificatesigningrequest "default.node.cockroachdb-3" approved
Verify that the new pod started successfully:
$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
cockroachdb-0               1/1     Running   0          51m
cockroachdb-1               1/1     Running   0          47m
cockroachdb-2               1/1     Running   0          3m
cockroachdb-3               1/1     Running   0          1m
cockroachdb-client-secure   1/1     Running   0          15m
Back in the Admin UI, click View nodes list on the right to ensure that the fourth node successfully joined the cluster.
Step 10. Upgrade the cluster
As new versions of CockroachDB are released, it's strongly recommended to upgrade to newer versions in order to pick up bug fixes, performance improvements, and new features. The general CockroachDB upgrade documentation provides best practices for how to prepare for and execute upgrades of CockroachDB clusters, but the mechanism of actually stopping and restarting processes in Kubernetes is somewhat special.
Kubernetes knows how to carry out a safe rolling upgrade process of the CockroachDB nodes. When you tell it to change the Docker image used in the CockroachDB StatefulSet, Kubernetes will go one-by-one, stopping a node, restarting it with the new image, and waiting for it to be ready to receive client requests before moving on to the next one. For more information, see the Kubernetes documentation.
All that it takes to kick off this process is changing the desired Docker image. To do so, pick the version that you want to upgrade to, then run the following command, replacing "VERSION" with your desired new version:
$ kubectl patch statefulset cockroachdb --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:VERSION"}]'
statefulset "cockroachdb" patched
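Typing the JSON patch by hand is error-prone, so you might generate it from the version tag. This sketch assumes tags of the form vX.Y.Z (for example, v1.1.9) and that the CockroachDB container is the first container in the pod template, which is the same assumption the path /spec/template/spec/containers/0/image makes; image_patch is a hypothetical helper.

```shell
# Hypothetical helper: print the JSON patch that replaces the image of the
# first container in the StatefulSet's pod template with the given tag.
image_patch() {
  case "$1" in
    v[0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "expected a tag such as v1.1.9, got: $1" >&2; return 1 ;;
  esac
  printf '[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "cockroachdb/cockroach:%s"}]\n' "$1"
}
```

For example, kubectl patch statefulset cockroachdb --type='json' -p="$(image_patch v1.1.9)". The tag check only guards against obvious typos; it does not confirm that the image exists.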
If you then check the status of your cluster's pods, you should see one of them being restarted:
$ kubectl get pods
NAME            READY   STATUS        RESTARTS   AGE
cockroachdb-0   1/1     Running       0          2m
cockroachdb-1   1/1     Running       0          2m
cockroachdb-2   1/1     Running       0          2m
cockroachdb-3   0/1     Terminating   0          1m
This will continue until all of the pods have restarted and are running the new image. To check the image of each pod to determine whether they've all been upgraded, run:
$ kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
cockroachdb-0   cockroachdb/cockroach:v1.1.9
cockroachdb-1   cockroachdb/cockroach:v1.1.9
cockroachdb-2   cockroachdb/cockroach:v1.1.9
cockroachdb-3   cockroachdb/cockroach:v1.1.9
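Rather than reading the list by eye, you can check that every CockroachDB pod reports the target image. A minimal sketch, assuming each line holds a pod name and an image separated by whitespace, as in the output above; all_pods_on_image is a hypothetical helper. It only considers pods named cockroachdb-N, so the cockroachdb-client-secure pod, which may still run an older image, does not cause a false failure.

```shell
# Hypothetical helper: read "<pod> <image>" lines on stdin and succeed only
# if at least one cockroachdb-N pod is listed and all of them use image $1.
all_pods_on_image() {
  awk -v want="$1" '
    $1 ~ /^cockroachdb-[0-9]+$/ { seen++; if ($2 != want) bad++ }
    END { exit ((seen > 0 && bad == 0) ? 0 : 1) }'
}
```

For example, pipe the jsonpath command above into all_pods_on_image cockroachdb/cockroach:v1.1.9 and check the exit status.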
If this was an upgrade between minor or major versions (e.g., between v1.0.x and v1.1.y or between v1.1.y and v2.0.z), then you'll want to finalize the upgrade if you're happy with the new version. Assuming you upgraded to the v1.1 minor version, you'd run:
$ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public -e "SET CLUSTER SETTING version = '1.1';"
SET CLUSTER SETTING
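The cluster setting takes only the major and minor version, so an image tag such as v1.1.9 corresponds to '1.1'. The sketch below derives that value from the tag and runs the same SET CLUSTER SETTING statement shown above; cluster_version_from_tag and finalize_upgrade are hypothetical helpers, and they do not check whether the upgrade actually crossed a minor or major version boundary.

```shell
# Hypothetical helper: turn a tag such as v1.1.9 into "1.1"; prints nothing
# if the tag does not look like vX.Y.Z.
cluster_version_from_tag() {
  printf '%s\n' "$1" | sed -n 's/^v\([0-9][0-9]*\)\.\([0-9][0-9]*\)\..*$/\1.\2/p'
}

# Hypothetical helper: finalize the upgrade to the given tag's minor version
# using the client pod, as in the command above.
finalize_upgrade() {
  v=$(cluster_version_from_tag "$1")
  if [ -z "$v" ]; then
    echo "cannot parse version from tag: $1" >&2
    return 1
  fi
  kubectl exec cockroachdb-client-secure -- ./cockroach sql \
    --certs-dir=/cockroach-certs --host=cockroachdb-public \
    -e "SET CLUSTER SETTING version = '$v';"
}
```

For example, finalize_upgrade v1.1.9 runs SET CLUSTER SETTING version = '1.1';. Run it only once you are happy with the new version.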
Step 11. Stop the cluster
To shut down the CockroachDB cluster:
Delete all of the resources associated with the cockroachdb label, including the logs and remote persistent volumes:
$ kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes,poddisruptionbudget,jobs,rolebinding,clusterrolebinding,role,clusterrole,serviceaccount -l app=cockroachdb
pod "cockroachdb-0" deleted
pod "cockroachdb-1" deleted
pod "cockroachdb-2" deleted
pod "cockroachdb-3" deleted
statefulset "cockroachdb" deleted
service "cockroachdb" deleted
service "cockroachdb-public" deleted
persistentvolumeclaim "datadir-cockroachdb-0" deleted
persistentvolumeclaim "datadir-cockroachdb-1" deleted
persistentvolumeclaim "datadir-cockroachdb-2" deleted
persistentvolumeclaim "datadir-cockroachdb-3" deleted
poddisruptionbudget "cockroachdb-budget" deleted
job "cluster-init-secure" deleted
rolebinding "cockroachdb" deleted
clusterrolebinding "cockroachdb" deleted
role "cockroachdb" deleted
clusterrole "cockroachdb" deleted
serviceaccount "cockroachdb" deleted
Delete the pod created for cockroach client commands, if you didn't do so earlier:
$ kubectl delete pod cockroachdb-client-secure
pod "cockroachdb-client-secure" deleted
Get the names of the CSRs for the cluster:
$ kubectl get csr
NAME                                                   AGE   REQUESTOR                               CONDITION
default.client.root                                    1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-0                             1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-1                             1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-2                             1h    system:serviceaccount:default:default   Approved,Issued
default.node.cockroachdb-3                             12m   system:serviceaccount:default:default   Approved,Issued
node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4   1h    kubelet                                 Approved,Issued
node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY   1h    kubelet                                 Approved,Issued
node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o   1h    kubelet                                 Approved,Issued
Delete the CSRs that you created:
$ kubectl delete csr default.client.root default.node.cockroachdb-0 default.node.cockroachdb-1 default.node.cockroachdb-2 default.node.cockroachdb-3
certificatesigningrequest "default.client.root" deleted
certificatesigningrequest "default.node.cockroachdb-0" deleted
certificatesigningrequest "default.node.cockroachdb-1" deleted
certificatesigningrequest "default.node.cockroachdb-2" deleted
certificatesigningrequest "default.node.cockroachdb-3" deleted
Get the names of the secrets for the cluster:
$ kubectl get secrets
NAME                         TYPE                                  DATA   AGE
default-token-f3b4d          kubernetes.io/service-account-token   3      1h
default.client.root          Opaque                                2      1h
default.node.cockroachdb-0   Opaque                                2      1h
default.node.cockroachdb-1   Opaque                                2      1h
default.node.cockroachdb-2   Opaque                                2      1h
default.node.cockroachdb-3   Opaque                                2      16m
Delete the secrets that you created:
$ kubectl delete secrets default.client.root default.node.cockroachdb-0 default.node.cockroachdb-1 default.node.cockroachdb-2 default.node.cockroachdb-3
secret "default.client.root" deleted
secret "default.node.cockroachdb-0" deleted
secret "default.node.cockroachdb-1" deleted
secret "default.node.cockroachdb-2" deleted
secret "default.node.cockroachdb-3" deleted
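On a cluster used only for this tutorial, the CSRs and secrets to remove are exactly those whose names start with "default." (the kubelet's node-csr- requests and the default-token- service-account secret do not match). A minimal sketch that selects them, assuming that naming and the column layout shown above; tutorial_object_names is a hypothetical helper. On a shared cluster, review the list before deleting anything.

```shell
# Hypothetical helper: read `kubectl get csr` or `kubectl get secrets`
# output on stdin and print the names starting with "default." (8 chars).
tutorial_object_names() {
  awk 'NR > 1 && substr($1, 1, 8) == "default." { print $1 }'
}
```

For example, kubectl delete csr $(kubectl get csr | tutorial_object_names) and kubectl delete secrets $(kubectl get secrets | tutorial_object_names). The command substitution is left unquoted on purpose so that each name becomes a separate argument.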
Stop Kubernetes:
On hosted GKE:
$ gcloud container clusters delete cockroachdb
On GCE:
$ cluster/kube-down.sh
On AWS:
$ cluster/kube-down.sh
Warning: If you stop Kubernetes without first deleting the persistent volumes, they will still exist in your cloud project.