The `IMPORT` statement imports tabular data (e.g., CSVs) into a single table.
Requirements
Before using `IMPORT`, you should have:
- The schema of the table you want to import.
- The tabular data you want to import (e.g., CSV), preferably hosted on cloud storage. This location must be accessible to all nodes using the same address; this means you cannot use a node's local file storage.
For ease of use, we recommend using cloud storage. However, if that isn't readily available to you, we also have a guide on easily creating your own file server.
Details
Import Targets
Imported tables must not exist and must be created in the `IMPORT` statement. If the table you want to import already exists, you must drop it with `DROP TABLE`.
You can only import a single table at a time.
You can specify the target database in the table name in the `IMPORT` statement. If it's not specified there, the active database in the SQL session is used.
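For example, a minimal sketch that drops a pre-existing table and imports into a specific database (the database name `mydb` is hypothetical; the file URL is reused from the examples below):

> DROP TABLE IF EXISTS mydb.customers;

> IMPORT TABLE mydb.customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;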
Create Table
Your `IMPORT` statement must include a `CREATE TABLE` statement (representing the schema of the data you want to import) using one of the following methods:
- A reference to a file that contains a `CREATE TABLE` statement
- An inline `CREATE TABLE` statement
We also recommend specifying all secondary indexes you want to use in the `CREATE TABLE` statement. It is possible to add secondary indexes later, but it is significantly faster to specify them during import.
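For the file-based method, the referenced file needs to contain only the `CREATE TABLE` statement itself. A minimal sketch of what a file such as `customer-create-table.sql` (used in the examples below) might contain:

CREATE TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT,
    INDEX name_idx (name)
);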
CSV Data
The tabular data you want to import must consist of valid CSV files, with the caveat that the comma delimiter can be set to another single character. In particular:
- Files must be UTF-8 encoded.
- If the delimiter (`,` by default), a double quote (`"`), newline (`\n`), or carriage return (`\r`) appears in a field, the field must be enclosed by double quotes.
- If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example: `"aaa","b""bb","ccc"`
CockroachDB-specific requirements:
- If a column is of type `BYTES`, it can either be a valid UTF-8 string or a string literal beginning with the two characters `\`, `x`. For example, a field whose value should be the bytes `1`, `2` would be written as `\x0102` (see the sample rows below).
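For instance, assuming a hypothetical two-column table (`name TEXT, data BYTES`), the following rows are valid: the first field of the first row is quoted because it contains double quotes (escaped by doubling them), and the `BYTES` values are written as `\x`-prefixed hex literals (the second is hex for the ASCII string `hello`):

"Anna ""Ann"" Smith",\x0102
Bob,\x68656c6c6f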
Object Dependencies
When importing tables, you must be mindful of the following rules because `IMPORT` only creates single tables which must not already exist:
- Objects that the imported table depends on must already exist
- Objects that depend on the imported table can only be created after the import completes (see the sketch below)
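As a sketch of this ordering (the `orders` table and its columns are hypothetical): import the table first, then add the dependent object once the job has finished.

> IMPORT TABLE customers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;

-- Only after the import completes:
> ALTER TABLE orders ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers (id);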
Available Storage Requirements
Each node in the cluster is assigned an equal part of the converted CSV data, and so must have enough temp space to store it. In addition, data is persisted as a normal table, and so there must also be enough space to hold the final, replicated data. The node's first-listed/default `store` directory must have enough available storage to hold its portion of the data.
On `cockroach start`, if you set `--max-disk-temp-storage`, it must also be greater than the portion of the data a node will store in temp space.
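For example, a node might be started with an explicit temp-storage cap; the values below are placeholders, not recommendations:

$ cockroach start --insecure --store=/mnt/ssd01/cockroach-data --max-disk-temp-storage=32GiB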
Import File Location
You can store the tabular data you want to import using remote cloud storage (Amazon S3, Google Cloud Platform, etc.). Alternatively, you can use an HTTP server accessible from all nodes.
For simplicity's sake, it's strongly recommended to use cloud/remote storage for the data you want to import. Local files are supported; however, they must be accessible identically from all nodes in the cluster.
Table Users and Privileges
Imported tables are treated as new tables, so you must `GRANT` privileges to them.
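For example, to give a hypothetical `maxroach` user read and write access to the newly imported table:

> GRANT SELECT, INSERT ON TABLE customers TO maxroach;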
Performance
All nodes are used during the conversion of tabular data into key-value data, which means all nodes' CPU and RAM will be partially consumed by the `IMPORT` task in addition to serving normal traffic.
Viewing and Controlling Import Jobs
After CockroachDB successfully initiates an import, it registers the import as a job, which you can view with `SHOW JOBS`.
After the import has been initiated, you can control it with `PAUSE JOB`, `RESUME JOB`, and `CANCEL JOB`.
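For example (the job ID below is a placeholder; use the ID reported by `SHOW JOBS`):

> SHOW JOBS;

> PAUSE JOB 27536791415282;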
Pausing and then resuming an `IMPORT` job will cause it to restart from the beginning.
Synopsis
Required Privileges
Only the `root` user can run `IMPORT`.
Parameters
Parameter | Description
---|---
`table_name` | The name of the table you want to import/create.
`create_table_file` | The URL of a plain text file containing the `CREATE TABLE` statement you want to use (see this example for syntax).
`table_elem_list` | The table definition you want to use (see this example for syntax).
`file_to_import` | The URL of the file you want to import.
`WITH kv_option` | Control your import's behavior with these options.
Import File URLs
URLs for the files you want to import must use the following format:
`[scheme]://[host]/[path]?[parameters]`
Location | scheme | host | parameters
---|---|---|---
Amazon S3 | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
Azure | `azure` | Container name | `AZURE_ACCOUNT_KEY`, `AZURE_ACCOUNT_NAME`
Google Cloud [1] | `gs` | Bucket name | `AUTH` (optional): can be `default` or `implicit`
HTTP [2] | `http` | Remote host | N/A
NFS/Local [3] | `nodelocal` | File system location | N/A
S3-compatible services [4] | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT`
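For example, an Amazon S3 import URL might look like the following; the bucket, path, and bracketed credential values are placeholders:

s3://acme-co/customer-import-data.csv?AWS_ACCESS_KEY_ID=[key ID]&AWS_SECRET_ACCESS_KEY=[secret key]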
Considerations
[1] If the `AUTH` parameter is `implicit`, all GCS connections use Google's default authentication strategy. If the `AUTH` parameter is `default`, the `cloudstorage.gs.default.key` cluster setting must be set to the contents of a service account file, which will be used during authentication. If the `AUTH` parameter is not specified, the `cloudstorage.gs.default.key` setting will be used if it is non-empty; otherwise, the `implicit` behavior is used.

[2] You can easily create your own HTTP server with Caddy or nginx. A custom root CA can be appended to the system's default CAs by setting the `cloudstorage.http.custom_ca` cluster setting, which will be used when verifying certificates from HTTPS URLs.

[3] The file system backup location on the NFS drive is relative to the path specified by the `--external-io-dir` flag set while starting the node. If the flag is set to `disabled`, then imports from local directories and NFS drives are disabled.

[4] A custom root CA can be appended to the system's default CAs by setting the `cloudstorage.http.custom_ca` cluster setting, which will be used when verifying certificates from an S3-compatible service.

The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's `encodeURIComponent` function or Go's `url.QueryEscape` function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
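As a concrete illustration (the key below is made up), a secret key containing `/`, `+`, and `=` must have those characters encoded as `%2F`, `%2B`, and `%3D` before it is placed in the URL:

Raw secret key:   abc/def+ghi=
URI-encoded form: abc%2Fdef%2Bghi%3D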
Import Options
You can control the `IMPORT` process's behavior using any of the following key-value pairs as a `kv_option`.
delimiter
If you are not using a comma as your column delimiter, you can specify another single Unicode character as the delimiter.
Required? | No
---|---
Key | `delimiter`
Value | The Unicode character that delimits columns in your rows
Example | To use tab-delimited values: `WITH delimiter = e'\t'`
comment
Do not import rows that begin with this character.
Required? | No
---|---
Key | `comment`
Value | The Unicode character that identifies rows to skip
Example | `WITH comment = '#'`
nullif
Convert values to SQL NULL if they match the specified string.
Required? | No
---|---
Key | `nullif`
Value | The string that should be converted to `NULL`
Example | To use empty columns as `NULL`: `WITH nullif = ''`
Examples
Use Create Table Statement from a File
> IMPORT TABLE customers
CREATE USING 'azure://acme-co/customer-create-table.sql?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;
Use Create Table Statement from a Statement
> IMPORT TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT,
INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;
Import a Tab-Separated File
> IMPORT TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT,
INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.tsv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH
delimiter = e'\t'
;
Skip Commented Lines
> IMPORT TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT,
INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH
comment = '#'
;
Use Blank Characters as NULL
> IMPORT TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT,
INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH
nullif = ''
;
Known Limitation
`IMPORT` can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the `kv.bulk_io_write.max_rate` cluster setting to a value below your max disk write speed. For example, to set it to 10MB/s, execute:
> SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';