New in v2.1: The EXPORT statement exports tabular data or the results of arbitrary SELECT statements to CSV files.
Using the CockroachDB distributed execution engine, EXPORT parallelizes CSV creation across all nodes in the cluster, making it possible to quickly get large sets of data out of CockroachDB in a format that can be ingested by downstream systems. If you do not need distributed exports, you can use the non-enterprise feature to export tabular data in CSV format.
This is an enterprise feature. It is also in beta and undergoing continued testing. Please file a GitHub issue if you identify a bug.
Export file location
You can use remote cloud storage (Amazon S3, Google Cloud Platform, etc.) to store the exported CSV data. Alternatively, you can use an HTTP server accessible from all nodes.
We strongly recommend using cloud/remote storage for the data you want to export. Local files are supported; however, they must be accessible identically from all nodes in the cluster.
Cancelling export
After the export has been initiated, you can cancel it with CANCEL QUERY.
Synopsis
Required privileges
Only members of the admin role can run EXPORT. By default, the root user belongs to the admin role.
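For example, an existing member of the admin role can grant admin membership to another user; the username maxroach below is a hypothetical example:

> GRANT admin TO maxroach;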
Parameters
Parameter | Description |
---|---|
file_location | Specify the URL of the file location where you want to store the exported CSV data. |
WITH kv_option | Control your export's behavior with these options. |
select_stmt | Specify the query whose result you want to export to CSV format. |
table_name | Specify the name of the table you want to export to CSV format. |
Export file URL
URLs for the file directory location you want to export to must use the following format:
[scheme]://[host]/[path]?[parameters]
Location | Scheme | Host | Parameters |
---|---|---|---|
Amazon S3 | s3 | Bucket name | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN |
Azure | azure | N/A (see Example file URLs) | AZURE_ACCOUNT_KEY, AZURE_ACCOUNT_NAME |
Google Cloud 1 | gs | Bucket name | AUTH (optional): can be default or implicit |
HTTP 2 | http | Remote host | N/A |
NFS/Local 3 | nodelocal | N/A (see Example file URLs) | N/A |
S3-compatible services 4 | s3 | Bucket name | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION 5 (optional), AWS_ENDPOINT |
The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's encodeURIComponent function or Go's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
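For example, if a secret access key contains / characters, each one must be encoded as %2F before being placed in the URL. The sketch below uses hypothetical credential values and the bank.customers table from the examples later on this page:

> EXPORT INTO CSV
    's3://acme-co/customer-export-data?AWS_ACCESS_KEY_ID=AKIAEXAMPLE&AWS_SECRET_ACCESS_KEY=secret%2Fkey%2Fpart'
  FROM TABLE bank.customers;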
1 If the AUTH parameter is implicit, all GCS connections use Google's default authentication strategy. If the AUTH parameter is default, the cloudstorage.gs.default.key cluster setting must be set to the contents of a service account file, which will be used during authentication. If the AUTH parameter is not specified, the cloudstorage.gs.default.key setting will be used if it is non-empty; otherwise, the implicit behavior is used.

2 You can create your own HTTP server with Caddy or nginx. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.

3 The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the flag is set to disabled, then imports from local directories and NFS drives are disabled.

4 A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.

5 The AWS_REGION parameter is optional since it is not a required parameter for most S3-compatible services. Specify the parameter only if your S3-compatible service requires it.
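As a sketch of the default option described in footnote 1, the cluster setting can be populated with the contents of the service account's JSON key file (the file contents below are a placeholder):

> SET CLUSTER SETTING cloudstorage.gs.default.key = '<json key file contents>';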
Example file URLs
Location | Example |
---|---|
Amazon S3 | s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456 |
Azure | azure://employees.sql?AZURE_ACCOUNT_KEY=123&AZURE_ACCOUNT_NAME=acme-co |
Google Cloud | gs://acme-co/employees.sql |
HTTP | http://localhost:8080/employees.sql |
NFS/Local | nodelocal:///path/employees |
You can specify the base directory where you want to store the exported .csv files. CockroachDB will create several files in the specified directory with programmatically generated names (e.g., n1.1.csv, n1.2.csv, n2.1.csv, ...).
Export options
You can control the EXPORT process's behavior using any of the following key-value pairs as a kv_option.
delimiter
If you do not want to use a comma as your column delimiter, you can specify another ASCII character as the delimiter.
Required? | No |
Key | delimiter |
Value | The ASCII character that delimits columns in your rows. |
Example | To use tab-delimited values: WITH delimiter = e'\t' |
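Putting this together, a complete tab-delimited export might look like the following; the storage URL is a hypothetical example, and the table matches the examples later on this page:

> EXPORT INTO CSV
    'gs://acme-co/customer-export-data'
  WITH delimiter = e'\t' FROM TABLE bank.customers;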
nullas
Convert SQL NULL values so they match the specified string.
Required? | No |
Key | nullas |
Value | The string that should be used to represent NULL values. To avoid collisions, it is important to pick a nullas value that does not appear in the exported data. |
Example | To use empty columns as NULL: WITH nullas = '' |
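For instance, to write each SQL NULL as the literal string NULL (assuming that string does not occur in the exported data); the storage URL is a hypothetical example:

> EXPORT INTO CSV
    'gs://acme-co/customer-export-data'
  WITH nullas = 'NULL' FROM TABLE bank.customers;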
Examples
Export a table
> EXPORT INTO CSV
'azure://acme-co/customer-export-data?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
WITH delimiter = '|' FROM TABLE bank.customers;
Export using a SELECT statement
> EXPORT INTO CSV
'azure://acme-co/customer-export-data?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
FROM SELECT * FROM bank.customers WHERE id >= 100;
Non-distributed export using the SQL shell
$ cockroach sql -e "SELECT * from bank.customers WHERE id>=100;" --format=csv > my.csv
View a running export
View running exports by using SHOW QUERIES:
> SHOW QUERIES;
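Because SHOW QUERIES lists all active queries, you can narrow the output to exports by using it as a subquery; this is a minimal sketch:

> SELECT query_id, query FROM [SHOW QUERIES] WHERE query LIKE 'EXPORT%';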
Cancel a running export
Use SHOW QUERIES to get a running export's query_id, which can be used to cancel the export:
> CANCEL QUERY '14dacc1f9a781e3d0000000000000001';
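The lookup and the cancellation can also be combined into a single statement; this sketch assumes exactly one export is currently running, since the subquery must return a single query_id:

> CANCEL QUERY (SELECT query_id FROM [SHOW QUERIES] WHERE query LIKE 'EXPORT%');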
Known limitation
EXPORT may fail with an error if the SQL statements are incompatible with DistSQL. In that case, use the non-enterprise feature to export tabular data in CSV format.