This page has instructions for migrating data from Postgres to CockroachDB using IMPORT's support for reading pg_dump files.
The examples below pull real data from Amazon S3. They use the employees data set that is also used in the MySQL docs. The data was imported to Postgres using pgloader, and then modified for use here as explained below.
For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.
Step 1. Dump the Postgres database
There are several ways to dump data from Postgres to be imported into CockroachDB:
The import will fail if the dump file contains functions or type definitions. In addition to calling pg_dump as shown below, you may need to edit the dump file to remove functions and data types.
Also, note that CockroachDB's IMPORT does not support automatically importing data from Postgres' non-public schemas. As a workaround, you can edit the dump file to change the table and schema names in the CREATE TABLE statements.
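For example, a table defined in a non-public schema can be rewritten to live in the public schema before importing. This is a minimal sketch of that edit; the billing schema and invoices table below are hypothetical placeholders, not part of the employees data set:

-- Before: schema-qualified table, which IMPORT will not pick up automatically
CREATE TABLE billing.invoices (
    id INT PRIMARY KEY,
    amount DECIMAL NOT NULL
);

-- After: the same table renamed into the public schema
CREATE TABLE invoices (
    id INT PRIMARY KEY,
    amount DECIMAL NOT NULL
);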
Dump the entire database
Most users will want to import their entire Postgres database all at once, as shown below in Import a full database dump.
To dump the entire database, run the pg_dump command shown below.
$ pg_dump employees > /tmp/employees-full.sql
For this data set, the Postgres dump file required the following edits, which have already been performed on the files used in the examples below:
- The type of the employees.gender column in the CREATE TABLE statement had to be changed from employees.employees_gender to STRING since Postgres represented the employee's gender using a CREATE TYPE statement that is not supported by CockroachDB.
- A CREATE TYPE employee ... statement needed to be removed.
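As a rough sketch, the edits to the dump file looked like the following. The enum values and exact column types are illustrative rather than copied from the dump:

-- Removed: CockroachDB's IMPORT does not support CREATE TYPE
-- CREATE TYPE employees.employees_gender AS ENUM ('M', 'F');

-- Changed: the gender column uses STRING instead of the enum type
CREATE TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,     -- was employees.employees_gender
    hire_date DATE NOT NULL
);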
If you only want to import one table from a database dump, see Import a table from a full database dump below.
Dump one table at a time
To dump the employees table from a Postgres database also named employees, run the pg_dump command shown below. You can import this table using the instructions in Import a table from a table dump below.
$ pg_dump -t employees employees > /tmp/employees.sql
For this data set, the Postgres dump file required the following edit, which has already been performed on the files used in the examples below:
- The type of the employees.gender column in the CREATE TABLE statement had to be changed from employees.employees_gender to STRING since Postgres represented the employee's gender using a CREATE TYPE statement that is not supported by CockroachDB.
Step 2. Host the files where the cluster can access them
Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for more information on the types of storage IMPORT can pull from, see the IMPORT documentation.
We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.
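For example, the IMPORT statements later on this page read from a public S3 bucket over HTTPS. If your dump file lives in a private bucket, credentials can be passed as URL query parameters; a minimal sketch with a placeholder bucket name and keys:

> IMPORT PGDUMP 's3://your-external-storage/employees-full.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456';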
Step 3. Import the Postgres dump file
You can choose from several variants of the IMPORT statement, depending on whether you want to import a full database or a single table:
- Import a full database dump
- Import a table from a full database dump
- Import a table from a table dump
Note that all of the IMPORT statements in this section pull real data from Amazon S3 and will kick off background import jobs that you can monitor with SHOW JOBS.
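For example, once an import job is running, you can check its progress with a query like the one below. This is a sketch; the exact columns returned by SHOW JOBS vary by version:

> SELECT job_id, status, fraction_completed FROM [SHOW JOBS] WHERE job_type = 'IMPORT';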
As of v21.2, IMPORT TABLE will be deprecated. We recommend using CREATE TABLE followed by IMPORT INTO to import data into a new table. For an example, read Import into a new table from a CSV file.
To import data into an existing table, use IMPORT INTO.
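As a sketch of that recommended pattern, reusing the employees columns from elsewhere on this page and a placeholder CSV location:

> CREATE TABLE employees (
    emp_no INT PRIMARY KEY,
    birth_date DATE NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    gender STRING NOT NULL,
    hire_date DATE NOT NULL
);

> IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
    CSV DATA ('s3://your-external-storage/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456');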
Import a full database dump
This example assumes you dumped the entire database.
The IMPORT statement below reads the data and DDL statements (including existing foreign key relationships) from the full database dump file.
> IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
381845110403104769 | succeeded | 1 | 300024 | 0 | 0 | 11534293
(1 row)
Import a table from a full database dump
This example assumes you dumped the entire database.
IMPORT can import one table's data from a full database dump. It reads the data and applies any CREATE TABLE statements from the file.
> CREATE DATABASE IF NOT EXISTS employees;
> USE employees;
> IMPORT TABLE employees FROM PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
383839294913871873 | succeeded | 1 | 300024 | 0 | 0 | 11534293
(1 row)
Import a table from a table dump
The examples below assume you dumped one table.
The simplest way to import a table dump is to run IMPORT PGDUMP as shown below. It reads the table data and any CREATE TABLE statements from the file.
> CREATE DATABASE IF NOT EXISTS employees;
> USE employees;
> IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees.sql.gz';
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
383855569817436161 | succeeded | 1 | 300024 | 0 | 0 | 11534293
(1 row)
If you need to specify the table's columns for some reason, you can use an IMPORT TABLE statement like the one below, which will import data but ignore any CREATE TABLE statements in the file, instead relying on the columns you specify.
> IMPORT TABLE employees (
emp_no INT PRIMARY KEY,
birth_date DATE NOT NULL,
first_name STRING NOT NULL,
last_name STRING NOT NULL,
gender STRING NOT NULL,
hire_date DATE NOT NULL
)
PGDUMP DATA ('https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees.sql.gz');
Configuration options
The following options are available to IMPORT ... PGDUMP:
Max row size
The max_row_size option is used to override limits on line size (default: 0.5MB). This setting may need to be tweaked if your Postgres dump file has extremely long lines, for example as part of a COPY statement.
Example usage:
> IMPORT TABLE employees (
emp_no INT PRIMARY KEY,
birth_date DATE NOT NULL,
first_name STRING NOT NULL,
last_name STRING NOT NULL,
gender STRING NOT NULL,
hire_date DATE NOT NULL
)
PGDUMP DATA ('s3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456') WITH max_row_size = '5MB';
Skip foreign keys
By default, IMPORT ... PGDUMP supports foreign keys; the skip_foreign_keys option defaults to false. Add the skip_foreign_keys option to speed up data import by ignoring foreign key constraints in the dump file's DDL. It will also enable you to import individual tables that would otherwise fail due to dependencies on other tables.
The most common dependency issues are caused by unsatisfied foreign key relationships. You can avoid these issues by adding the skip_foreign_keys option to your IMPORT statement as needed. For more information, see the list of import options.
For example, if you get the error message pq: there is no unique constraint matching given keys for referenced table tablename, use IMPORT ... WITH skip_foreign_keys.
Example usage:
> IMPORT TABLE employees (
emp_no INTEGER PRIMARY KEY,
birth_date DATE NOT NULL,
first_name STRING NOT NULL,
last_name STRING NOT NULL,
gender STRING NOT NULL,
hire_date DATE NOT NULL
) PGDUMP DATA ('s3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456') WITH skip_foreign_keys;
Foreign key constraints can be added by using ALTER TABLE ... ADD CONSTRAINT commands after importing the data.
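For example, if you also imported a salaries table from this data set with skip_foreign_keys, the constraint back to employees could be restored afterward. The constraint name below is illustrative:

> ALTER TABLE salaries ADD CONSTRAINT fk_emp_no_ref_employees FOREIGN KEY (emp_no) REFERENCES employees (emp_no);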