Because CockroachDB is designed with high fault tolerance, backups are primarily needed for disaster recovery (i.e., if your cluster loses a majority of its nodes). Isolated issues (such as small-scale node outages) do not require any intervention. However, as an operational best practice, we recommend taking regular backups of your data.
There are two main types of backups:
You can use the BACKUP
statement to efficiently back up your cluster's schemas and data to popular cloud services such as AWS S3, Google Cloud Storage, or NFS, and the RESTORE
statement to efficiently restore schema and data as necessary. For more information, see Use Cloud Storage for Bulk Operations.
The BACKUP ... TO
and RESTORE ... FROM
syntax is deprecated as of v22.1 and will be removed in a future release.
We recommend using the BACKUP ... INTO {collection}
syntax, which creates or adds to a backup collection in your storage location. For restoring backups, we recommend using RESTORE FROM {backup} IN {collection}
with {backup}
being LATEST
or a specific subdirectory.
For guidance on the syntax for backups and restores, see the BACKUP
and RESTORE
examples.
You can create schedules for periodic backups in CockroachDB. We recommend using scheduled backups to automate daily backups of your cluster.
Backup collections
A backup collection defines a set of backups and their metadata. The collection can contain multiple full backups and their subsequent incremental backups. The path to a backup is created using a date-based naming scheme and stored at the URI passed with the BACKUP
statement.
In the following example, a user has taken weekly full backups and nightly incremental backups to their collectionURI
:
Collection:
|—— 2022
|—— 02
|—— 09-155340.13/
|—— Full backup files
|—— 20220210/
|—— 155530.50/
|—— Incremental backup files
|—— 20220211/
|—— 155628.07/
|—— Incremental backup files
[...]
|—— 16-143018.72/
|—— Full backup files
|—— 20220217/
|—— 155530.50/
|—— Incremental backup files
|—— 20220218/
|—— 155628.07/
|—— Incremental backup files
[...]
SHOW BACKUPS IN {collectionURI}
will display a list of the full backup subdirectories in the collection's storage location.
A locality-aware backup is a specific case where part of the collection data is stored at a different URI. The backup collection will be stored according to the URIs passed with the BACKUP
statement: BACKUP INTO LATEST IN {collectionURI}, {localityURI}, {localityURI}
. Here, the collectionURI
represents the default locality.
In the examples on this page, {collectionURI}
is a placeholder for the storage location that will contain the example backup.
Full backups
Full backups are now available to both core and Enterprise users.
Full backups contain an un-replicated copy of your data and can always be used to restore your cluster. These files are roughly the size of your data and require greater resources to produce than incremental backups. You can take full backups as of a given timestamp. Optionally, you can include the available revision history in the backup.
In most cases, it's recommended to take nightly full backups of your cluster. A cluster backup allows you to do the following:
- Restore table(s) from the cluster
- Restore database(s) from the cluster
- Restore a full cluster
Backups will export Enterprise license keys during a full cluster backup. When you restore a full cluster with an Enterprise license, it will restore the Enterprise license of the cluster you are restoring from.
To set a target for the amount of backup data written to each backup file, use the bulkio.backup.file_size
cluster setting.
See the SET CLUSTER SETTING
page for more details on using cluster settings.
Take a full backup
To perform a full cluster backup, use the BACKUP
statement:
> BACKUP INTO '{collectionURI}';
To restore a backup, use the RESTORE
statement, specifying what you want to restore as well as the collection's URI:
To restore the latest backup of a table:
> RESTORE TABLE bank.customers FROM LATEST IN '{collectionURI}';
To restore the latest backup of a database:
> RESTORE DATABASE bank FROM LATEST IN '{collectionURI}';
To restore the latest backup of your full cluster:
> RESTORE FROM LATEST IN '{collectionURI}';
Note:A full cluster restore can only be run on a target cluster that has never had user-created databases or tables.
To restore a backup from a specific subdirectory:
> RESTORE DATABASE bank FROM {subdirectory} IN '{collectionURI}';
To view the available backup subdirectories, use SHOW BACKUPS
.
Incremental backups
To take incremental backups, you need an Enterprise license.
If your cluster grows too large for nightly full backups, you can take less frequent full backups (e.g., weekly) with nightly incremental backups. Incremental backups are storage efficient and faster than full backups for larger clusters.
Incremental backups are smaller and faster to produce than full backups because they contain only the data that has changed since a base set of backups you specify (which must include one full backup, and can include many incremental backups). You can take incremental backups either as of a given timestamp or with full revision history.
Garbage collection and backups
Incremental backups with revision history are created by finding what data has been created, deleted, or modified since the timestamp of the last backup in the chain of backups. For the first incremental backup in a chain, this timestamp corresponds to the timestamp of the base (full) backup. For subsequent incremental backups, this timestamp is the timestamp of the previous incremental backup in the chain.
Garbage collection Time to Live (GC TTL) determines the period for which CockroachDB retains revisions of a key. If the GC TTL of the backup's target is shorter than the frequency at which you take incremental backups with revision history, then the revisions become susceptible to garbage collection before you have backed them up. This will cause the incremental backup with revision history to fail.
We recommend configuring the garbage collection period to be at least the frequency of incremental backups and ideally with a buffer to account for slowdowns. You can configure garbage collection periods using the ttlseconds
replication zone setting.
If an incremental backup is created outside of the garbage collection period, you will receive a protected ts verification error…
. To resolve this issue, see the Common Errors page.
Take an incremental backup
Periodically run the BACKUP
command to take a full backup of your cluster:
> BACKUP INTO '{collectionURI}';
Then, create nightly incremental backups based off of the full backups you've already created. To append an incremental backup to the most recent full backup created in the given destination, use LATEST
:
> BACKUP INTO LATEST IN '{collectionURI}';
For an example on how to specify the destination of an incremental backup, see Incremental backups with explicitly specified destinations.
If it's ever necessary, you can then use the RESTORE
command to restore your cluster, database(s), and/or table(s). Restoring from incremental backups requires previous full and incremental backups.
To restore from the most recent backup, run the following:
> RESTORE FROM LATEST IN '{collectionURI}';
To restore from a specific backup, run RESTORE
with the backup's subdirectory:
> RESTORE FROM '{subdirectory}' IN '{collectionURI}';
When you restore from an incremental backup, you're restoring the entire table, database, or cluster. CockroachDB uses both the latest (or a specific) incremental backup and the full backup during this process. You cannot restore an incremental backup without a full backup. Furthermore, it is not possible to restore over a table, database, or cluster with existing data. See Restore types for detail on the types of backups you can restore.
RESTORE
will re-validate indexes when incremental backups are created from an older version (v20.2.2 and earlier or v20.1.4 and earlier), but restored by a newer version (v21.1.0+). These earlier releases may have included incomplete data for indexes that were in the process of being created.
Incremental backups with explicitly specified destinations
To explicitly control where your incremental backups go, use the INTO {subdirectory} IN (destination)
syntax:
> BACKUP DATABASE bank INTO '{subdirectory}' IN '{collectionURI}' \
AS OF SYSTEM TIME '-10s' \
WITH revision_history;
A full backup must be present in the destination
for an incremental backup to be stored in a subdirectory
. If there isn't a full backup present in the destination
when taking an incremental backup, one will be taken and stored in the destination
.
To take incremental backups, you need an Enterprise license.
Examples
The following examples make use of:
- Amazon S3 connection strings. For guidance on connecting to other storage options or using other authentication parameters instead, read Use Cloud Storage for Bulk Operations.
- The default
AUTH=specified
parameter. For guidance on usingAUTH=implicit
authentication with Amazon S3 buckets instead, read Use Cloud Storage for Bulk Operations — Authentication.
Automated full backups
Both core and Enterprise users can use backup scheduling for full backups of clusters, databases, or tables. To create schedules that only take full backups, include the FULL BACKUP ALWAYS
clause. For example, to create a schedule for taking full cluster backups:
> CREATE SCHEDULE core_schedule_label
FOR BACKUP INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
RECURRING '@daily'
FULL BACKUP ALWAYS
WITH SCHEDULE OPTIONS first_run = 'now';
schedule_id | name | status | first_run | schedule | backup_stmt
---------------------+---------------------+--------+---------------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
588799238330220545 | core_schedule_label | ACTIVE | 2020-09-11 00:00:00+00:00 | @daily | BACKUP INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' WITH detached
(1 row)
For more examples on how to schedule backups that take full and incremental backups, see CREATE SCHEDULE FOR BACKUP
.
Advanced examples
For examples of advanced BACKUP
and RESTORE
use cases, see:
- Incremental backups with a specified destination
- Backup with revision history and point-in-time restore
- Locality-aware backup and restore
- Encrypted backup and restore
- Restore into a different database
- Remove the foreign key before restore
- Restoring users from
system.users
backup
To take incremental backups, backups with revision history, locality-aware backups, and encrypted backups, you need an Enterprise license.