Migrate from CSV

Warning:
As of November 24, 2023, CockroachDB v22.1 is no longer supported. For more details, refer to the Release Support Policy.

This page has instructions for migrating data from CSV files into CockroachDB using IMPORT INTO.

The examples pull real data from Amazon S3: the employees data set that is also used in the MySQL docs, dumped as a set of CSV files.

Tip:

For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.

Note:

As of v22.1, IMPORT TABLE statements that define the table schema inline are no longer supported. See Import — Considerations for more details. To import data into a new table, use CREATE TABLE followed by IMPORT INTO. For an example, read Import into a new table from a CSV file.

Step 1. Export data to CSV

Refer to your database's documentation for instructions on exporting data to CSV.

You will need to export one CSV file per table, with the following requirements:

  • Files must be in valid CSV format, with the caveat that the delimiter must be a single character. To use a character other than comma (such as a tab), set a custom delimiter using the delimiter option.
  • Files must be UTF-8 encoded.
  • If one of the following characters appears in a field, the field must be enclosed by double quotes:
    • delimiter (, by default)
    • double quote (")
    • newline (\n)
    • carriage return (\r)
  • If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc".
  • If a column is of type BYTES, it can either be a valid UTF-8 string or a hex-encoded byte literal beginning with \x. For example, a field whose value should be the bytes 1, 2 would be written as \x0102.
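Putting these rules together, a hypothetical CSV fragment that satisfies the requirements above might look like this, where the second field contains an escaped double quote, the third contains the delimiter, and the fourth is a hex-encoded BYTES value:

```
1,"b""bb","c,cc",\x0102
2,plain,unquoted,\x00
```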

Step 2. Host the files where the cluster can access them

Each node in the CockroachDB cluster needs access to the files being imported. There are several ways for the cluster to access the data; IMPORT INTO can pull from cloud storage (such as Amazon S3 or Google Cloud Storage), userfile storage, HTTP endpoints, and nodelocal storage.

Tip:

We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.
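As a sketch, a file in a private S3 bucket can be referenced by passing credentials as query parameters; the bucket name and credential values below are placeholders:

```sql
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       's3://{BUCKET NAME}/employees.csv?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET KEY}'
     );
```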

Step 3. Import the CSV

You will need to write a CREATE TABLE statement that matches the schema of the table data you're importing.

For example, to import the data from employees.csv into an employees table, issue the following statement to create the table:

CREATE TABLE employees (
  emp_no INT PRIMARY KEY,
  birth_date DATE NOT NULL,
  first_name STRING NOT NULL,
  last_name STRING NOT NULL,
  gender STRING NOT NULL,
  hire_date DATE NOT NULL
);

Next, use IMPORT INTO to import the data into the new table:

IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz'
     );

       job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
 381866942129111041 | succeeded |                  1 | 300024 |             0 |              0 | 13258389
(1 row)

Repeat this process for each CSV file you want to import.
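IMPORT INTO can also take multiple files in a single statement by listing each file in the CSV DATA clause. For example (the second file below is hypothetical):

```sql
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz',
       's3://{BUCKET NAME}/more-employees.csv'
     );
```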

Before importing CSV data, consider the following:

  • The column order in your schema must match the column order in the file being imported.
  • You will need to run ALTER TABLE ... ADD CONSTRAINT to add any foreign key relationships.
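For example, if a hypothetical salaries table references employees, the foreign key relationship could be added after both imports complete:

```sql
ALTER TABLE salaries
    ADD CONSTRAINT fk_emp_no
    FOREIGN KEY (emp_no) REFERENCES employees (emp_no);
```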

Configuration Options

The following options are available to IMPORT ... CSV:

Column delimiter

The delimiter option sets the Unicode character that marks where each column ends. Default: ,.

Example usage:

IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH delimiter = e'\t';

Comment syntax

The comment option sets the Unicode character that marks rows in the data to be skipped.

Example usage:

IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH comment = '#';

Skip header rows

The skip option determines the number of header rows to skip when importing a file.

Example usage:

IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH skip = '2';

Null strings

The nullif option defines which string should be converted to NULL.

Example usage:

IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH nullif = '';

File compression

The compress option defines which decompression codec should be used on the CSV file to be imported. Options include:

  • gzip: Uses the gzip algorithm to decompress the file.
  • bzip: Uses the bzip2 algorithm to decompress the file.
  • none: Disables decompression.
  • auto: Default. Guesses based on file extension ('none' for .csv, 'gzip' for .gz, 'bzip' for .bz and .bz2).

Example usage:

IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH compress = 'gzip';
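Multiple options can be combined in a single WITH clause, separated by commas. As a sketch:

```sql
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
      WITH skip = '1', nullif = '', compress = 'gzip';
```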
