Cockroach Labs recommends using changefeeds to export data because they provide better performance for growing workloads. Additionally, changefeeds operate as jobs, which offer observability, scheduling, and job management.
The EXPORT
statement exports tabular data or the results of arbitrary SELECT
statements to the following:
- CSV files
- Parquet files
Using the CockroachDB distributed execution engine, EXPORT
parallelizes file creation across all nodes in the cluster, making it possible to quickly get large sets of data out of CockroachDB in a format that can be ingested by downstream systems.
If you do not need distributed exports, you can export tabular data in CSV format.
Cancelling export
After the export has been initiated, you can cancel it with CANCEL QUERY.
For detail on handling partially complete exports at the destination, refer to Export File URL.
Synopsis
The EXPORT
statement cannot be used within a transaction.
Required privileges
The user must have the SELECT
privilege on the table being exported, unless the destination URI requires admin
privileges.
Destination privileges
You can grant a user the EXTERNALIOIMPLICITACCESS
system-level privilege to interact with external resources that require implicit access.
Either the EXTERNALIOIMPLICITACCESS
system-level privilege or the admin
role is required for the following scenarios:
- Interacting with a cloud storage resource using IMPLICIT authentication.
- Using a custom endpoint on S3.
- Using the cockroach nodelocal upload command.
- Using HTTP or HTTPS.
No special privilege is required for:
- Interacting with Amazon S3 and Google Cloud Storage resources using SPECIFIED credentials. Azure Storage is SPECIFIED by default.
- Using Userfile storage.
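For example, a member of the admin role can grant the system-level privilege to another user who needs to write to destinations that require implicit access. A minimal sketch, assuming a placeholder user named maxroach:

```sql
-- Grant the system-level privilege needed for implicit-access destinations.
-- `maxroach` is a placeholder user for illustration.
GRANT SYSTEM EXTERNALIOIMPLICITACCESS TO maxroach;
```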
We recommend using cloud storage. You also need to ensure that the permissions at your storage destination are configured for the operation. See Storage Permissions for a list of the necessary permissions that each bulk operation requires.
While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).
Parameters
Parameter | Description
---|---
file_location | Specify the URL of the file location where you want to store the exported data. Note: It is best practice to use a unique destination for each export, to avoid mixing files from different exports in one directory.
opt_with_options | Control your export's behavior with these options.
select_stmt | Specify the query whose result you want to export.
table_name | Specify the name of the table you want to export.
Export file URL
You can specify the base directory where you want to store the exported files. CockroachDB will create the export file(s) in the specified directory with programmatically generated names (e.g., exportabc123-n1.1.csv, exportabc123-n1.2.csv, exportabc123-n2.1.csv, ...). Each export should use a unique destination directory to avoid collision with other exports.
The EXPORT
command returns the list of files to which the data was exported. You may wish to record these for use in subsequent imports.
If an export encounters some kind of failure or cancellation, it will leave any written files behind in the destination. To run a new export and avoid collision with previously written export files, consider doing the following:
- Change the destination (or destination prefix) that you are exporting to.
- Remove the partial results at the destination from any previously attempted exports.
- Use a changefeed export instead because it operates as a job, offering more observability into failures.
A hexadecimal hash code (abc123...
in the file names) uniquely identifies each export run; files sharing the same hash are part of the same export. If you see multiple hash codes within a single destination directory, then the directory contains multiple exports, which will likely cause confusion (duplication) on import. We recommend that you manually clean up the directory, to ensure that it contains only a single export run.
You can create an external connection to represent an external storage or sink URI. This allows you to specify the external connection's name in statements rather than the provider-specific URI. For detail on using external connections, see the CREATE EXTERNAL CONNECTION
page.
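For example, you could define a connection once and then reference it by name in EXPORT. A minimal sketch, reusing the placeholder Amazon S3 URI from the examples on this page; customer_exports is a hypothetical connection name:

```sql
-- Create a named external connection for the export destination
-- ({BUCKET NAME} and the credentials are placeholders).
CREATE EXTERNAL CONNECTION customer_exports AS
    's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}';

-- Reference the connection by name instead of the provider-specific URI.
EXPORT INTO CSV 'external://customer_exports' FROM TABLE bank.customers;
```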
Export options
You can control the EXPORT process's behavior using any of the following key-value pairs as a kv_option.
Key | Context | Value
---|---|---
delimiter | CSV DATA | The ASCII character that delimits columns in your rows. If you are not using a comma as your column delimiter, you can specify another ASCII character as the delimiter. Default: , (comma). To use tab-delimited values: WITH delimiter = e'\t'. See the example.
nullas | CSV DATA, DELIMITED DATA | The string that should be used to represent NULL values. To avoid collisions, it is important to pick a nullas value that does not appear in the exported data. To use empty columns as NULL: WITH nullas = ''. See the example.
compression | CSV DATA, PARQUET DATA | Instructs the export to write compressed files to the specified destination. For CSV DATA, gzip compression is supported. For PARQUET DATA, both gzip and snappy compression are supported. See the example.
chunk_rows | CSV DATA, PARQUET DATA | The number of rows to be converted and written to a single file. Default: 100000. For example, WITH chunk_rows = '5000' for a table with 10,000 rows would produce two files. Note: EXPORT stops and uploads the file when either the chunk_rows or the chunk_size limit is reached, whichever comes first.
chunk_size | CSV DATA, PARQUET DATA | A target size per file that you can specify during an EXPORT. Once the target size is reached, the file is uploaded before further rows are processed. Default: 32MB. For example, to set the size of each file uploaded during the export to 10MB: WITH chunk_size = '10MB'. Note: EXPORT stops and uploads the file when either the chunk_rows or the chunk_size limit is reached, whichever comes first.
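These options can be combined in a single WITH clause. A minimal sketch, reusing the placeholder Amazon S3 URI from the examples on this page:

```sql
-- Write gzip-compressed CSV files of at most 5,000 rows each
-- (the bucket and credentials are placeholders).
EXPORT INTO CSV
    's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
    WITH compression = 'gzip', chunk_rows = '5000'
    FROM TABLE bank.customers;
```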
Success responses
A successful EXPORT returns a table listing the (possibly multiple) files to which the data was exported:
Response | Description
---|---
filename | The file to which the data was exported.
rows | The number of rows exported to this file.
bytes | The file size in bytes.
Parquet types
CockroachDB types map to the Parquet types listed in the following table. All columns written to Parquet files are nullable; therefore, the Parquet repetition level is optional.
CockroachDB Type | Parquet Type | Parquet Logical Type
---|---|---
BOOL | BOOLEAN | nil
STRING | byte array | STRING
COLLATE | byte array | STRING
INET | byte array | STRING
JSONB | byte array | JSON
INT, INT8 | INT64 | nil
INT2, INT4 | INT32 | nil
FLOAT, FLOAT8 | FLOAT64 | nil
FLOAT4 | FLOAT32 | nil
DECIMAL | byte array | DECIMAL Note: scale and precision data are preserved in the Parquet file.
UUID | fixed_len_byte_array | nil
BYTES | byte array | nil
BIT | byte array | nil
ENUM | byte array | ENUM
Box2D | byte array | STRING
GEOGRAPHY | byte array | nil
GEOMETRY | byte array | nil
DATE | byte array | STRING
TIME | INT64 | TIME Note: microseconds after midnight; exported with microsecond precision.
TIMETZ | byte array | STRING Note: exported with microsecond precision.
INTERVAL | byte array | STRING Note: represented as ISO 8601.
TIMESTAMP | byte array | STRING Note: exported with microsecond precision.
TIMESTAMPTZ | byte array | STRING Note: exported with microsecond precision.
ARRAY | Encoded as a repeated field; each array value is encoded per the preceding types. | nil
Exports and AS OF SYSTEM TIME
The AS OF SYSTEM TIME
clause is not required in EXPORT
statements, even though they are long-running queries. If it is omitted, AS OF SYSTEM TIME
is implicitly set to the start of the statement's execution. The risk of contention is low because other transactions would need to have exactly the same transaction start time as the EXPORT
statement's start time.
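If you want to export data from an explicit historical timestamp instead, one option is to add the clause to an exported SELECT. A minimal sketch, reusing the placeholder Amazon S3 URI from the examples on this page:

```sql
-- Export the table's contents as they were 10 seconds ago
-- (the bucket and credentials are placeholders).
EXPORT INTO CSV
    's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
    FROM SELECT * FROM bank.customers AS OF SYSTEM TIME '-10s';
```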
Examples
The following examples make use of:
- Amazon S3 connection strings. For guidance on connecting to other storage options or using other authentication parameters instead, read Use Cloud Storage.
- The default AUTH=specified parameter. For guidance on using AUTH=implicit authentication with Amazon S3 buckets instead, read Cloud Storage Authentication.
Also, note the following features for connecting and authenticating to cloud storage:
- External connections, which allow you to represent an external storage or sink URI. You can then specify the external connection's name in statements rather than the provider-specific URI. For detail on using external connections, see the CREATE EXTERNAL CONNECTION page.
- Assume role authentication, which allows you to limit the control specific users have over your storage buckets. See Assume role authentication for more information.
Each of these examples uses the bank database and the customers table; customer-export-data is the demonstration path to which we export the customers' data.
Export a table into CSV
This example uses the delimiter
option to define the ASCII character that delimits columns in your rows:
> EXPORT INTO CSV
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
WITH delimiter = '|' FROM TABLE bank.customers;
This example uses the nullas option to define the string that represents NULL values:
> EXPORT INTO CSV
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
WITH nullas = '' FROM TABLE bank.customers;
Export a table into Parquet
> EXPORT INTO PARQUET
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
FROM TABLE bank.customers;
Export using a SELECT
statement
> EXPORT INTO CSV
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
FROM SELECT * FROM bank.customers WHERE id >= 100;
For more information, see selection queries.
Non-distributed export using the SQL client
$ cockroach sql -e "SELECT * from bank.customers WHERE id>=100;" --format=csv > my.csv
For more information about the SQL client, see cockroach sql
.
Export compressed files
gzip
compression is supported for both PARQUET
and CSV
file formats:
> EXPORT INTO CSV
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
WITH compression = 'gzip' FROM TABLE bank.customers;
filename | rows | bytes
---------------------------------------------------+------+--------
export16808a04292505c80000000000000001-n1.0.csv.gz | 17 | 824
(1 row)
PARQUET
data also supports snappy
compression:
> EXPORT INTO PARQUET
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
WITH compression = 'snappy' FROM TABLE bank.customers;
filename | rows | bytes
-----------------------------------------------------------+------+--------
export16808a04292505c80000000000000001-n1.0.parquet.snappy | 17 | 824
(1 row)
Export tabular data with an S3 storage class
To associate your export objects with a specific storage class in your Amazon S3 bucket, use the S3_STORAGE_CLASS
parameter with the class. For example, the following S3 connection URI specifies the INTELLIGENT_TIERING
storage class:
> EXPORT INTO CSV
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}&S3_STORAGE_CLASS=INTELLIGENT_TIERING'
WITH delimiter = '|' FROM TABLE bank.customers;
Use the parameter to set one of the storage classes listed in Amazon's documentation. For more general usage information, see Amazon's Using Amazon S3 storage classes documentation.
Export data out of CockroachDB Cloud
Using EXPORT with userfile is not recommended. You can export data either to cloud storage or to a local CSV file by using cockroach sql --execute:
The following example exports the customers
table from the bank
database into a local CSV file:
$ cockroach sql \
--url 'postgres://{username}:{password}@{host}:26257?sslmode=verify-full&sslrootcert={path/to/certs_dir}/cc-ca.crt' \
--execute "SELECT * FROM bank.customers" --format=csv > /Users/{username}/{path/to/file}/customers.csv
The following example exports the customers
table from the bank
database into a cloud storage bucket in CSV format:
EXPORT INTO CSV
's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
WITH delimiter = '|' FROM TABLE bank.customers;
View a running export
View running exports by using SHOW STATEMENTS
:
> SHOW STATEMENTS;
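If many statements are active, you can filter the output for exports. A minimal sketch, assuming the query_id and query columns returned by SHOW STATEMENTS:

```sql
-- Find the query_id of an in-flight export by filtering the active statements.
SELECT query_id, query
FROM [SHOW STATEMENTS]
WHERE query LIKE 'EXPORT%';
```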
Cancel a running export
Use SHOW STATEMENTS
to get a running export's query_id
, which can be used to cancel the export:
> CANCEL QUERY '14dacc1f9a781e3d0000000000000001';
For detail on handling partially complete exports at the destination, refer to Export File URL.
Known limitation
EXPORT
may fail with an error if the SQL statements are incompatible with DistSQL. In that case, export tabular data in CSV format.