CockroachDB constructs a secure API call to the cloud storage specified in a URL passed to one of the following statements: BACKUP, RESTORE, IMPORT, EXPORT, and CREATE CHANGEFEED.
We strongly recommend using cloud/remote storage.
URL format
URLs for the files you want to import must use the format shown below. For examples, see Example file URLs.
[scheme]://[host]/[path]?[parameters]
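The following Go sketch shows how the four parts of this format map onto a concrete URL. It splits one of the example URLs with the standard net/url package. The storageURL type and splitStorageURL function are hypothetical names used only for illustration. They are not part of CockroachDB.

```go
package main

import (
	"fmt"
	"net/url"
)

// storageURL holds the parts of a [scheme]://[host]/[path]?[parameters] URL.
// Hypothetical type, for illustration only.
type storageURL struct {
	Scheme string     // e.g. s3, gs, azure, http, nodelocal
	Host   string     // bucket name, storage container, remote host, or nodeID/self
	Path   string     // path within the bucket or directory
	Params url.Values // decoded query parameters (AUTH, credentials, ...)
}

// splitStorageURL parses a storage URL into its components.
// Query values are decoded, so "%2B" comes back as "+".
func splitStorageURL(raw string) (storageURL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return storageURL{}, err
	}
	return storageURL{Scheme: u.Scheme, Host: u.Host, Path: u.Path, Params: u.Query()}, nil
}

func main() {
	s, err := splitStorageURL("s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Scheme, s.Host, s.Path, s.Params.Get("AWS_ACCESS_KEY_ID")) // s3 acme-co /employees 123
}
```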
Location | Scheme | Host | Parameters |
---|---|---|---|
Amazon | s3 | Bucket name | AUTH (optional; can be implicit or specified), AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN. For more information, see Authentication - Amazon S3. |
Azure | azure | Storage container | AZURE_ACCOUNT_KEY, AZURE_ACCOUNT_NAME |
Google Cloud | gs | Bucket name | AUTH (optional; can be default, implicit, or specified), CREDENTIALS. Deprecation notice: In v21.1, we suggest you do not use the cloudstorage.gs.default.key cluster setting, because the default behavior will change in v21.2. For more information, see Authentication - Google Cloud Storage. |
HTTP | http | Remote host | N/A. For more information, see Authentication - HTTP. |
NFS/Local [1] | nodelocal | nodeID or self [2] (see Example file URLs) | N/A |
S3-compatible services | s3 | Bucket name | Warning: Unlike Amazon S3, Google Cloud Storage, and Azure Storage options, the usage of S3-compatible services is not actively tested by Cockroach Labs. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION [3] (optional), AWS_ENDPOINT. For more information, see Authentication - S3-compatible services. |
The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's encodeURIComponent function or Go's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
You can disable the use of implicit credentials when accessing external cloud storage services for various bulk operations by using the --external-io-disable-implicit-credentials flag.
[1] The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set when starting the node. If the flag is set to disabled, imports from local directories and NFS drives are disabled.
[2] A host is required: either a nodeID or self. With a nodeID, the data files are in the extern directory of the specified node. In most cases (including single-node clusters), using nodelocal://1/<path> is sufficient. Use self if you do not want to specify a nodeID; the individual data files will then be in the extern directories of arbitrary nodes. For this to work correctly, each node's --external-io-dir flag must point to the same NFS mount or other network-backed, shared storage.
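The rules in footnotes 1 and 2 can be modeled with a short Go sketch. This is not CockroachDB's implementation. resolveNodelocal is a hypothetical helper, and /mnt/cockroach/extern is an assumed --external-io-dir value. The sketch only shows two things: the host selects the node (a nodeID or self), and the URL path is interpreted relative to the external I/O directory.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
	"strconv"
	"strings"
)

// resolveNodelocal (hypothetical) returns the target node and the file path
// under externalIODir that a nodelocal URL refers to.
func resolveNodelocal(raw, externalIODir string) (string, string, error) {
	if externalIODir == "disabled" {
		return "", "", errors.New("nodelocal storage is disabled (--external-io-dir=disabled)")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "nodelocal" {
		return "", "", fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	// The host must be "self" or a numeric node ID.
	if _, convErr := strconv.Atoi(u.Host); u.Host != "self" && convErr != nil {
		return "", "", errors.New("host must be a nodeID or self")
	}
	base := path.Clean(externalIODir)
	target := path.Join(base, u.Path)
	// Reject paths such as nodelocal://1/../x that would escape base.
	if target != base && !strings.HasPrefix(target, base+"/") {
		return "", "", errors.New("path escapes the external I/O directory")
	}
	return u.Host, target, nil
}

func main() {
	node, p, err := resolveNodelocal("nodelocal://1/path/employees", "/mnt/cockroach/extern")
	fmt.Println(node, p, err) // 1 /mnt/cockroach/extern/path/employees <nil>
}
```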
[3] The AWS_REGION parameter is optional because most S3-compatible services do not require it. Specify it only if your S3-compatible service requires it.
Example file URLs
Example URLs for BACKUP, RESTORE, EXPORT, or changefeeds, given a bucket or container name of acme-co and an employees subdirectory:
Location | Example |
---|---|
Amazon S3 | s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456 |
Azure | azure://acme-co/employees?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123 |
Google Cloud | gs://acme-co/employees?AUTH=specified&CREDENTIALS=encoded-123 |
NFS/Local | nodelocal://1/path/employees, nodelocal://self/nfsmount/backups/employees [2] |
For changefeeds, prefix the URL scheme with experimental- (for example, experimental-s3://acme-co/employees).
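Here is a minimal Go sketch of the prefixing rule, starting from one of the example URLs above. changefeedSinkURL is a hypothetical helper name. It leaves URLs that already carry the prefix unchanged.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// changefeedSinkURL (hypothetical) prefixes the scheme of a cloud storage URL
// with "experimental-" so it can be used as a changefeed sink.
func changefeedSinkURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if !strings.HasPrefix(u.Scheme, "experimental-") {
		u.Scheme = "experimental-" + u.Scheme
	}
	return u.String(), nil // the query string is kept as-is
}

func main() {
	sink, err := changefeedSinkURL("s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456")
	if err != nil {
		panic(err)
	}
	fmt.Println(sink) // experimental-s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
}
```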
Currently, cloud storage sinks for changefeeds only work with JSON and emit newline-delimited JSON files.
Example URLs for IMPORT, given a bucket or container name of acme-co and a filename of employees.sql:
Location | Example |
---|---|
Amazon S3 | s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456 |
Azure | azure://acme-co/employees.sql?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123 |
Google Cloud | gs://acme-co/employees.sql?AUTH=specified&CREDENTIALS=encoded-123 |
HTTP | http://localhost:8080/employees.sql |
NFS/Local | nodelocal://1/path/employees, nodelocal://self/nfsmount/backups/employees [2] |
HTTP storage can only be used for IMPORT.
Encryption
Transport Layer Security (TLS) is used for encryption in transit when transmitting data to or from Amazon S3, Google Cloud Storage, and Azure.
For encryption at rest, if your cloud provider offers transparent data encryption, you can use that to ensure that your backups are not stored on disk in cleartext.
CockroachDB also provides client-side encryption of backup data. For more information, see Take and Restore Encrypted Backups.
Authentication
When running bulk operations to and from a storage bucket, authentication setup can vary depending on the cloud provider. This section details the necessary steps to authenticate to each cloud provider.
Implicit authentication cannot be used to run bulk operations from CockroachDB Cloud clusters; instead, use AUTH=specified.
Amazon S3
The AUTH parameter passed to the file URL must be set to either specified or implicit. The following sections describe how to set up each authentication method.
Specified authentication
If the AUTH parameter is not provided, AWS connections default to specified, and the access keys must be provided in the URI parameters.
As an example:

```sql
BACKUP DATABASE <database> INTO 's3://{bucket name}/{path in bucket}/?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}';
```
Implicit authentication
If the AUTH parameter is implicit, the access keys can be omitted, and the credentials are loaded from the environment (that is, from the machines running the backup):

```sql
BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AUTH=implicit';
```
You can associate an EC2 instance with an IAM role to give the instance implicit access to the S3 storage that the role's policy allows. In the following command, the {instance example} EC2 instance is associated with the {example profile} instance profile. This gives the instance implicit access to any S3 buckets that the {example profile} role's policy allows.

```shell
aws ec2 associate-iam-instance-profile --iam-instance-profile Name={example profile} --region={us-east-2} --instance-id {instance example}
```
Google Cloud Storage
The AUTH parameter passed to the file URL must be set to either specified or implicit. The following sections describe how to set up each authentication method.
In v21.1 and earlier, if no AUTH parameter is provided with a Google Cloud Storage URI, authentication defaults to default. The connection then uses only the key provided in the cloudstorage.gs.default.key cluster setting, and returns an error if that setting is not present.
Deprecation notice: Currently, GCS connections default to the cloudstorage.gs.default.key cluster setting. This default behavior will no longer be supported in v21.2. If you rely on this default behavior, we recommend adjusting your queries and scripts to specify the AUTH parameter you want to use. Similarly, if you use the cloudstorage.gs.default.key cluster setting to authorize your GCS connection, we recommend switching to AUTH=specified or AUTH=implicit. AUTH=specified will be the default behavior in v21.2 and beyond.
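One way to act on this notice is to audit generated URLs and add an explicit AUTH parameter wherever it is missing. The Go sketch below assumes your scripts build URLs as strings. withExplicitAuth is a hypothetical helper, and you choose which AUTH value to pass.

```go
package main

import (
	"fmt"
	"net/url"
)

// withExplicitAuth (hypothetical) adds AUTH=<mode> to a URL that has no AUTH
// parameter and returns URLs that already have one unchanged.
func withExplicitAuth(raw, mode string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	if _, ok := q["AUTH"]; ok {
		return raw, nil
	}
	q.Set("AUTH", mode)
	// Encode re-encodes all parameters and sorts them by key.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	fixed, err := withExplicitAuth("gs://acme-co/employees", "implicit")
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed) // gs://acme-co/employees?AUTH=implicit
}
```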
Specified authentication
To access the storage bucket with specified credentials, you must create a service account and add the service account address to the permissions on the specific storage bucket.
The JSON credentials file for authentication can be downloaded from the Service Accounts page in the Google Cloud Console and then base64-encoded:

```shell
cat gcs_key.json | base64
```
Pass the encoded JSON object to the CREDENTIALS parameter:

```sql
BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=specified&CREDENTIALS={encoded key}';
```
Implicit authentication
For CockroachDB instances running within a Google Cloud environment, the service account's credentials can be obtained from the environment and used to implicitly access resources within the storage bucket.
For CockroachDB clusters running in other environments, implicit authentication access can still be set up manually with the following steps:
1. Create a service account and add the service account address to the permissions on the specific storage bucket.
2. Download the JSON credentials file from the Service Accounts page in the Google Cloud Console to the machines that CockroachDB is running on. (Because only the file's path is passed in an environment variable, the file does not need to be base64-encoded.) Ensure that the file is located in a path that CockroachDB can access.
3. Create an environment variable instructing CockroachDB where the credentials file is located. The environment variable must be exported on each CockroachDB node:

   ```shell
   export GOOGLE_APPLICATION_CREDENTIALS="/{cockroach}/gcs_key.json"
   ```

   Alternatively, to pass the credentials using systemd, use systemctl edit cockroach.service to add the environment variable Environment="GOOGLE_APPLICATION_CREDENTIALS=/{cockroach}/gcs_key.json" under [Service] in the cockroach.service unit file. Then, run systemctl daemon-reload to reload the systemd configuration. Restart the cockroach process on each of the cluster's nodes with systemctl restart cockroach, which will reload the configuration files.

   To pass the credentials using code, see Google's Authentication documentation.
4. Run a backup (or other bulk operation) to the storage bucket with the AUTH parameter set to implicit:

   ```sql
   BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit';
   ```
If implicit credentials are disabled with the --external-io-disable-implicit-credentials flag, bulk operations that use AUTH=implicit to access external cloud storage return an error.
Azure Storage
To access Azure storage containers, you may need to URL-encode the account key, because it is base64-encoded and can contain +, /, and = characters. For example:

```sql
BACKUP DATABASE <database> INTO 'azure://{container name}/{path}?AZURE_ACCOUNT_NAME={account name}&AZURE_ACCOUNT_KEY={url-encoded key}';
```
HTTP
If your environment requires an HTTP or HTTPS proxy server for outgoing connections, you can set the standard HTTP_PROXY and HTTPS_PROXY environment variables when starting CockroachDB. You can create your own HTTP server with NGINX. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.
If you cannot run a full proxy, you can disable external HTTP(S) access (as well as custom HTTP(S) endpoints) when importing by using the --external-io-disable-http flag.
S3-compatible services
Unlike Amazon S3, Google Cloud Storage, and Azure Storage options, the usage of S3-compatible services is not actively tested by Cockroach Labs.
A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.