This page has instructions for migrating data from Oracle into CockroachDB by importing CSV files. Note that IMPORT only works for creating new tables. For information on how to add CSV data to existing tables, see IMPORT INTO.
To illustrate this process, we use the following sample data and tools:
- Swingbench OrderEntry data set, which is based on the oe schema that ships with Oracle Database 11g and Oracle Database 12c.
- Oracle Data Pump, which enables the movement of data and metadata from one database to another, and comes with all Oracle installations.
- SQL*Plus, the interactive and batch query tool that comes with every Oracle Database installation.
For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.
Step 1. Export the Oracle schema
Using Oracle's Data Pump Export utility, export the schema:
$ expdp user/password directory=datapump dumpfile=oracle_example.dmp content=metadata_only logfile=example.log
The schema is stored in an Oracle-specific format (e.g., oracle_example.dmp).
Step 2. Convert the Oracle schema to SQL
Using Oracle's Data Pump Import utility, load the exported DMP file to convert it to a SQL file:
$ impdp user/password directory=datapump dumpfile=oracle_example.dmp sqlfile=example_sql.sql TRANSFORM=SEGMENT_ATTRIBUTES:N:table PARTITION_OPTIONS=MERGE
This SQL output will be used later, in Step 7.
Step 3. Export table data
You need to extract each table's data into a data list file (.lst). We wrote a simple SQL script (spool.sql) to do this:
$ cat spool.sql
SET ECHO OFF
SET TERMOUT OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET TRIMSPOOL ON
SET WRAP OFF
SET LINESIZE 30000
SET RECSEP OFF
SET VERIFY OFF
SET ARRAYSIZE 10000
SET COLSEP '|'
SPOOL '&1'
ALTER SESSION SET nls_date_format = 'YYYY-MM-DD HH24:MI:SS';
-- Used to set a properly formatted date for CockroachDB
SELECT * from &1;
SPOOL OFF
SET PAGESIZE 24
SET FEEDBACK ON
SET TERMOUT ON
In the example SQL script, | is used as a delimiter. Choose a delimiter that will not also occur in the rows themselves. For more information, see IMPORT.
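Before settling on a delimiter, you can sanity-check that it never appears in the exported data. The following is an illustrative sketch (the helper name and sample rows are ours, not part of any tool):

```python
def delimiter_is_safe(rows, delimiter):
    """Return True if the delimiter appears in no field of any row."""
    return not any(delimiter in str(field) for row in rows for field in row)

# Sample rows loosely resembling OrderEntry CUSTOMERS data:
rows = [
    (392, "Maurice", "Mahoney"),
    (393, "Kelly", "O'Brien|Smith"),  # a field that contains '|'
]
```

Running the check against these rows shows that | would corrupt the export for the second row, while a tab delimiter would be safe.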
To extract the data, we ran the script for each table in SQL*Plus:
$ sqlplus user/password
> @spool CUSTOMERS
@spool ADDRESSES
@spool CARD_DETAILS
@spool WAREHOUSES
@spool ORDER_ITEMS
@spool ORDERS
@spool INVENTORIES
@spool PRODUCT_INFORMATION
@spool LOGON
@spool PRODUCT_DESCRIPTIONS
@spool ORDERENTRY_METADATA
A data list file (.lst) with leading and trailing spaces is created for each table.
Exit SQL*Plus:
> EXIT
Step 4. Configure and convert the table data to CSV
Each table's data list file needs to be converted to CSV and formatted for CockroachDB. We wrote a simple Python script (fix-example.py) to do this:
$ cat fix-example.py
import csv
import string
import sys

for lstfile in sys.argv[1:]:
    filename = lstfile.split(".")[0]
    with open(lstfile) as f:
        reader = csv.reader(f, delimiter="|")
        with open(filename + ".csv", "w") as fo:
            writer = csv.writer(fo)
            for rec in reader:
                writer.writerow(map(string.strip, rec))
$ python2 fix-example.py CUSTOMERS.lst ADDRESSES.lst CARD_DETAILS.lst WAREHOUSES.lst ORDER_ITEMS.lst ORDERS.lst INVENTORIES.lst PRODUCT_INFORMATION.lst LOGON.lst PRODUCT_DESCRIPTIONS.lst ORDERENTRY_METADATA.lst
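Note that string.strip as a standalone function exists only in Python 2. If you do not have a Python 2 interpreter available, an equivalent Python 3 version might look like the following sketch (file-handling details may need adjusting for your export):

```python
import csv
import sys

def convert(lstfile):
    """Convert one pipe-delimited .lst file into a CSV with trimmed fields."""
    base = lstfile.rsplit(".", 1)[0]
    with open(lstfile, newline="") as f:
        reader = csv.reader(f, delimiter="|")
        with open(base + ".csv", "w", newline="") as out:
            writer = csv.writer(out)
            for rec in reader:
                writer.writerow([field.strip() for field in rec])

if __name__ == "__main__":
    for lstfile in sys.argv[1:]:
        convert(lstfile)
```

Invoke it the same way as the Python 2 script, passing each .lst file as an argument.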
Format the generated CSV files to meet CockroachDB's CSV requirements.
CSV requirements
You will need to export one CSV file per table, with the following requirements:
- Files must be in valid CSV format.
- Files must be UTF-8 encoded.
- If one of the following characters appears in a field, the field must be enclosed by double quotes:
  - delimiter (, by default)
  - double quote (")
  - newline (\n)
  - carriage return (\r)
- If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
- If a column is of type BYTES, it can either be a valid UTF-8 string or a hex-encoded byte literal beginning with \x. For example, a field whose value should be the bytes 1, 2 would be written as \x0102.
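Python's csv module applies the doubled-quote escaping shown above automatically, and a hex-encoded BYTES literal can be built with bytes.hex(). A quick illustration:

```python
import csv
import io

# csv.writer doubles an embedded quote and wraps the field in quotes,
# matching the escaping rules described above.
buf = io.StringIO()
csv.writer(buf).writerow(['aaa', 'b"bb', 'ccc'])

# A hex-encoded literal for a BYTES column: the bytes 1, 2 become \x0102.
literal = "\\x" + bytes([1, 2]).hex()
```

This means you rarely need to hand-escape quotes: emitting rows through a standard CSV writer produces output that IMPORT accepts.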
CSV configuration options
The following options are available to IMPORT ... CSV:
For usage examples, see Migrate from CSV - Configuration Options.
Step 5. Compress the CSV files
Compress the CSV files for a faster import:
$ gzip CUSTOMERS.csv ADDRESSES.csv CARD_DETAILS.csv WAREHOUSES.csv ORDER_ITEMS.csv ORDERS.csv INVENTORIES.csv PRODUCT_INFORMATION.csv LOGON.csv PRODUCT_DESCRIPTIONS.csv ORDERENTRY_METADATA.csv
These compressed CSV files will be used to import your data into CockroachDB.
Step 6. Host the files where the cluster can access them
Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for a complete list of the types of storage IMPORT can pull from, see Import File URLs.
We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.
Step 7. Map Oracle to CockroachDB data types
Using the SQL file created in Step 2, write IMPORT TABLE statements that match the schemas of the table data you're importing.
Remove all Oracle-specific attributes, remap all Oracle data types, and refactor all CREATE TABLE statements to include primary keys.
Data type mapping
Use the table below for data type mappings:
Oracle Data Type | CockroachDB Data Type |
---|---|
BLOB | BYTES 1 |
CHAR(n), CHARACTER(n) n < 256 | CHAR(n), CHARACTER(n) |
CLOB | STRING 1 |
DATE | DATE |
FLOAT(n) | DECIMAL(n) |
INTERVAL YEAR(p) TO MONTH | VARCHAR, INTERVAL |
INTERVAL DAY(p) TO SECOND(s) | VARCHAR, INTERVAL |
JSON | JSON 2 |
LONG | STRING |
LONG RAW | BYTES |
NCHAR(n) n < 256 | CHAR(n) |
NCHAR(n) n > 255 | VARCHAR, STRING |
NCLOB | STRING |
NUMBER(p,0), NUMBER(p) 1 <= p < 5 | INT2 3 |
NUMBER(p,0), NUMBER(p) 5 <= p < 9 | INT4 3 |
NUMBER(p,0), NUMBER(p) 9 <= p < 19 | INT8 3 |
NUMBER(p,0), NUMBER(p) 19 <= p <= 38 | DECIMAL(p) |
NUMBER(p,s) s > 0 | DECIMAL(p,s) |
NUMBER, NUMBER(*) | DECIMAL |
NVARCHAR2(n) | VARCHAR(n) |
RAW(n) | BYTES |
TIMESTAMP(p) | TIMESTAMP |
TIMESTAMP(p) WITH TIME ZONE | TIMESTAMP WITH TIME ZONE |
VARCHAR(n), VARCHAR2(n) | VARCHAR(n) |
XML | JSON 2 |
- 1 BLOB and CLOB should be converted to BYTES, or STRING where the size is variable, but it's recommended to keep values under 1 MB to ensure performance. Anything above 1 MB would require refactoring into an object store, with a pointer embedded in the table in place of the object.
- 2 JSON and XML types can be converted to JSONB using any XML-to-JSON conversion. XML must be converted to JSONB before importing into CockroachDB.
- 3 When converting NUMBER(p,0), consider that NUMBER types with Base-10 limits map to the Base-10 limits for the CockroachDB INT types. Optionally, NUMBER can be converted to DECIMAL.
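The NUMBER rows of the mapping above can be restated as a small helper. The function name is ours; it simply encodes the precision ranges from the table:

```python
def cockroachdb_type_for_number(p=None, s=0):
    """Map Oracle NUMBER(p,s) to a CockroachDB type per the table above."""
    if p is None:
        return "DECIMAL"              # NUMBER or NUMBER(*)
    if s > 0:
        return "DECIMAL({},{})".format(p, s)
    if 1 <= p < 5:
        return "INT2"
    if 5 <= p < 9:
        return "INT4"
    if 9 <= p < 19:
        return "INT8"
    return "DECIMAL({})".format(p)    # 19 <= p <= 38
```

For example, NUMBER(4) maps to INT2 and NUMBER(9,2) maps to DECIMAL(9,2); a helper like this is handy when rewriting many CREATE TABLE statements by script.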
When moving from Oracle to CockroachDB data types, consider the following:
- Schema changes within transactions
- Schema changes between executions of prepared statements
- If JSON columns are used only for payload, consider switching to BYTES.
- Max size of a single column family (512 MiB by default).
For more information, see Known Limitations, Online Schema Changes, and Transactions.
NULLs
For information on how CockroachDB handles NULLs, see NULL Handling and NOT NULL Constraint.
Primary key, constraints, and secondary indexes
CockroachDB distributes a table by the primary key, or by a default ROWID when a primary key is not provided. This requires the primary key to be created as part of the table creation. Using the above data type mapping, refactor each table's DDL to include the primary key, constraints, and secondary indexes.
For more information and examples, refer to the following:
Privileges for users and roles
The Oracle privileges for users and roles must be rewritten for CockroachDB. Once the CockroachDB cluster is secured, CockroachDB follows the same role-based access control methodology as Oracle.
Step 8. Import the CSV
For example, to import the data from CUSTOMERS.csv.gz into a new CUSTOMERS table, issue the following statement in the CockroachDB SQL shell:
> IMPORT TABLE customers (
customer_id DECIMAL NOT NULL PRIMARY KEY,
cust_first_name VARCHAR(40) NOT NULL,
cust_last_name VARCHAR(40) NOT NULL,
nls_language VARCHAR(3),
nls_territory VARCHAR(30),
credit_limit DECIMAL(9,2),
cust_email VARCHAR(100),
account_mgr_id DECIMAL,
customer_since DATE,
customer_class VARCHAR(40),
suggestions VARCHAR(40),
dob DATE,
mailshot VARCHAR(1),
partner_mailshot VARCHAR(1),
preferred_address DECIMAL,
preferred_card DECIMAL,
INDEX cust_email_ix (cust_email),
INDEX cust_dob_ix (dob),
INDEX cust_account_manager_ix (account_mgr_id)
)
CSV DATA (
'https://your-bucket-name.s3.us-east-2.amazonaws.com/CUSTOMERS.csv.gz'
)
WITH delimiter = e'\t',
"nullif" = '',
decompress = 'gzip';
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
381866942129111041 | succeeded | 1 | 300024 | 0 | 0 | 13258389
(1 row)
To import data into an existing table, use IMPORT INTO.
Then add the computed columns, constraints, and function-based indexes. For example:
> UPDATE CUSTOMERS SET credit_limit = 50000 WHERE credit_limit > 50000;
ALTER TABLE CUSTOMERS ADD CONSTRAINT CUSTOMER_CREDIT_LIMIT_MAX CHECK (credit_limit <= 50000);
ALTER TABLE CUSTOMERS ADD COLUMN LOW_CUST_LAST_NAME STRING AS (lower(CUST_LAST_NAME)) STORED;
ALTER TABLE CUSTOMERS ADD COLUMN LOW_CUST_FIRST_NAME STRING AS (lower(CUST_FIRST_NAME)) STORED;
CREATE INDEX CUST_FUNC_LOWER_NAME_IX on CUSTOMERS (LOW_CUST_LAST_NAME,CUST_FIRST_NAME);
Repeat the above for each CSV file you want to import.
Step 9. Refactor application SQL
The last phase of the migration process is to change the transactional behavior and SQL dialect of your application.
Transactions, locking, and concurrency control
Both Oracle and CockroachDB support multi-statement transactions, which are atomic and guarantee ACID semantics. However, CockroachDB operates in serializable isolation mode, while Oracle defaults to read committed, which allows both non-repeatable reads and phantom reads when a transaction reads data twice. It is typical for Oracle developers to use SELECT FOR UPDATE to work around read committed issues. In CockroachDB v20.1 and later, the SELECT FOR UPDATE statement is also supported.
Regarding locks, CockroachDB utilizes a lightweight latch to serialize access to common keys across concurrent transactions. Oracle and CockroachDB transaction control flows have only a few minor differences; for more details, refer to Transactions - SQL statements.
Because CockroachDB does not allow serializable anomalies, transactions may experience deadlocks or read/write contention. This is expected during concurrency on the same keys. These conflicts can be addressed with either automatic retries or client-side intervention techniques.
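CockroachDB reports a retryable conflict to the client as a serialization error (SQLSTATE 40001). A client-side retry loop can follow this driver-agnostic sketch, where SerializationError stands in for whatever exception your driver raises for 40001:

```python
import time

class SerializationError(Exception):
    """Placeholder for the driver exception raised on SQLSTATE 40001."""

def run_transaction(txn_fn, max_retries=5):
    """Run txn_fn, retrying with exponential backoff on serialization errors."""
    for attempt in range(max_retries):
        try:
            return txn_fn()  # txn_fn should begin, execute, and commit
        except SerializationError:
            time.sleep(0.01 * 2 ** attempt)  # back off before retrying
    raise RuntimeError("transaction did not commit after retries")
```

The key design point is that the entire transaction body is re-executed on each retry, since the prior attempt was rolled back by the database.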
SQL dialect
CockroachDB is ANSI SQL compliant with a PostgreSQL dialect, which allows you to use native PostgreSQL drivers to connect applications and ORMs to CockroachDB. CockroachDB's SQL layer supports a full relational schema and SQL (similar to Oracle).
You will have to refactor Oracle SQL and functions that do not comply with ANSI SQL-92 in order to work with CockroachDB. For more information about the CockroachDB SQL grammar and a SQL comparison, see below:
- SQL best practices
- Common table expressions (CTE)
DUAL table
Oracle requires use of the DUAL table, as Oracle requires a SELECT ... FROM. In CockroachDB, all references to the DUAL table should be eliminated.
- CockroachDB supports HASH, MERGE, and LOOKUP joins. Oracle uses the + operator for LEFT and RIGHT joins, but CockroachDB uses the ANSI join syntax.
- Sequences in CockroachDB do not require a trigger to self-increment; place the sequence in the table DDL:
> CREATE SEQUENCE customer_seq;
> CREATE TABLE customer_list (
    id INT PRIMARY KEY DEFAULT nextval('customer_seq'),
    customer STRING,
    address STRING
);
SYSDATE
CockroachDB does not support SYSDATE; however, it does support date and time with the following:
> SELECT transaction_timestamp(), clock_timestamp();
> SELECT current_timestamp;
> SELECT now();
- CockroachDB does not have a materialized view implementation.