How do I bulk insert data into CockroachDB?
Currently, you can bulk insert data with batches of INSERT
statements not exceeding a few MB. The size of your rows determines how many you can use, but 1,000 - 10,000 rows typically works best. For more details, see Import Data.
How do I auto-generate unique row IDs in CockroachDB?
To auto-generate unique row IDs, use the UUID
column with the gen_random_uuid()
function as the default value:
> CREATE TABLE t1 (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name STRING);
> INSERT INTO t1 (name) VALUES ('a'), ('b'), ('c');
> SELECT * FROM t1;
+--------------------------------------+------+
| id | name |
+--------------------------------------+------+
| 60853a85-681d-4620-9677-946bbfdc8fbc | c |
| 77c9bc2e-76a5-4ebc-80c3-7ad3159466a1 | b |
| bd3a56e1-c75e-476c-b221-0da9d74d66eb | a |
+--------------------------------------+------+
(3 rows)
Alternatively, you can use the BYTES
column with the uuid_v4()
function as the default value instead:
> CREATE TABLE t2 (id BYTES PRIMARY KEY DEFAULT uuid_v4(), name STRING);
> INSERT INTO t2 (name) VALUES ('a'), ('b'), ('c');
> SELECT * FROM t2;
+---------------------------------------------------+------+
| id | name |
+---------------------------------------------------+------+
| "\x9b\x10\xdc\x11\x9a\x9cGB\xbd\x8d\t\x8c\xf6@vP" | a |
| "\xd9s\xd7\x13\n_L*\xb0\x87c\xb6d\xe1\xd8@" | c |
| "\uac74\x1dd@B\x97\xac\x04N&\x9eBg\x86" | b |
+---------------------------------------------------+------+
(3 rows)
In either case, generated IDs will be 128-bit, large enough for there to be virtually no chance of generating non-unique values. Also, once the table grows beyond a single key-value range (more than 64MB by default), new IDs will be scattered across all of the table's ranges and, therefore, likely across different nodes. This means that multiple nodes will share in the load.
If it is important for generated IDs to be stored in the same key-value range, you can use an integer type with the unique_rowid()
function as the default value, either explicitly or via the SERIAL
pseudo-type:
> CREATE TABLE t3 (id INT PRIMARY KEY DEFAULT unique_rowid(), name STRING);
> INSERT INTO t3 (name) VALUES ('a'), ('b'), ('c');
> SELECT * FROM t3;
+--------------------+------+
| id | name |
+--------------------+------+
| 293807573840855041 | a |
| 293807573840887809 | b |
| 293807573840920577 | c |
+--------------------+------+
(3 rows)
Upon insert, the unique_rowid()
function generates a default value from the timestamp and ID of the node executing the insert. Such time-ordered values are likely to be globally unique except in cases where a very large number of IDs (100,000+) are generated per node per second.
How do I generate unique, slowly increasing sequential numbers in CockroachDB?
Sequential numbers can be generated in CockroachDB using the unique_rowid()
built-in function or using SQL sequences. However, note the following considerations:
- Unless you need roughly-ordered numbers, we recommend using
UUID
values instead. See the previous FAQ for details. - Sequences produce unique values. However, not all values are guaranteed to be produced (e.g., when a transaction is canceled after it consumes a value) and the values may be slightly reordered (e.g., when a transaction that consumes a lower sequence number commits after a transaction that consumes a higher number).
- For maximum performance, avoid using sequences or
unique_rowid()
to generate row IDs or indexed columns. Values generated in these ways are logically close to each other and can cause contention on few data ranges during inserts. Instead, preferUUID
identifiers.
What are the differences between UUID
, sequences, and unique_rowid()
?
Property | UUID generated with uuid_v4() |
INT generated with unique_rowid() |
Sequences |
---|---|---|---|
Size | 16 bytes | 8 bytes | 1 to 8 bytes |
Ordering properties | Unordered | Highly time-ordered | Highly time-ordered |
Performance cost at generation | Small, scalable | Small, scalable | Variable, can cause contention |
Value distribution | Uniformly distributed (128 bits) | Contains time and space (node ID) components | Dense, small values |
Data locality | Maximally distributed | Values generated close in time are co-located | Highly local |
INSERT latency when used as key |
Small, insensitive to concurrency | Small, but increases with concurrent INSERTs | Higher |
INSERT throughput when used as key |
Highest | Limited by max throughput on 1 node | Limited by max throughput on 1 node |
Read throughput when used as key | Highest (maximal parallelism) | Limited | Limited |
How do I order writes to a table to closely follow time in CockroachDB?
Most use cases that ask for a strong time-based write ordering can be solved with other, more distribution-friendly
solutions instead. For example, CockroachDB's time travel queries (AS OF SYSTEM
TIME
) support the following:
- Paginating through all the changes to a table or dataset
- Determining the order of changes to data over time
- Determining the state of data at some point in the past
- Determining the changes to data between two points of time
Consider also that the values generated by unique_rowid()
, described in the previous FAQ entries, also provide an approximate time ordering.
However, if your application absolutely requires strong time-based write ordering, it is possible to create a strictly monotonic counter in CockroachDB that increases over time as follows:
- Initially:
CREATE TABLE cnt(val INT PRIMARY KEY); INSERT INTO cnt(val) VALUES(1);
- In each transaction:
INSERT INTO cnt(val) SELECT max(val)+1 FROM cnt RETURNING val;
This will cause INSERT
transactions to conflict with each other and effectively force the transactions to commit one at a time throughout the cluster, which in turn guarantees the values generated in this way are strictly increasing over time without gaps. The caveat is that performance is severely limited as a result.
If you find yourself interested in this problem, please contact us and describe your situation. We would be glad to help you find alternative solutions and possibly extend CockroachDB to better match your needs.
How do I get the last ID/SERIAL value inserted into a table?
There’s no function in CockroachDB for returning last inserted values, but you can use the RETURNING
clause of the INSERT
statement.
For example, this is how you’d use RETURNING
to return a value auto-generated via unique_rowid()
or SERIAL
:
> CREATE TABLE users (id SERIAL, name STRING);
> INSERT INTO users (name) VALUES ('mike') RETURNING id;
What is transaction contention?
Transaction contention occurs when transactions issued from multiple clients at the same time operate on the same data. This can cause transactions to wait on each other and decrease performance, like when many people try to check out with the same cashier at a store.
For more information about contention, see Understanding and Avoiding Transaction Contention.
Does CockroachDB support JOIN
?
CockroachDB supports uncorrelated SQL joins.
At this time, LATERAL
(correlated) joins are not yet supported.
When should I use interleaved tables?
Interleaving tables improves query performance by optimizing the key-value structure of closely related tables, attempting to keep data on the same key-value range if it's likely to be read and written together.
You're most likely to benefit from interleaved tables when:
- Your tables form a hierarchy
- Queries maximize the benefits of interleaving
- Queries do not suffer too greatly from interleaving's tradeoffs
Does CockroachDB support JSON or Protobuf datatypes?
Yes, as of v2.0, the JSONB
data type is supported.
How do I know which index CockroachDB will select for a query?
To see which indexes CockroachDB is using for a given query, you can use the EXPLAIN
statement, which will print out the query plan, including any indexes that are being used:
> EXPLAIN SELECT col1 FROM tbl1;
If you'd like to tell the query planner which index to use, you can do so via some special syntax for index hints:
> SELECT col1 FROM tbl1@idx1;
How do I log SQL queries?
There are several ways to log SQL queries. The type of logging you use will depend on your requirements.
- For per-table audit logs, turn on SQL audit logs.
- For system troubleshooting and performance optimization, turn on cluster-wide execution logs.
- For local testing, turn on per-node execution logs.
SQL audit logs
This is an experimental feature. The interface and output are subject to change.
SQL audit logging is useful if you want to log all queries that are run against specific tables.
For a tutorial, see SQL Audit Logging.
For SQL reference documentation, see
ALTER TABLE ... EXPERIMENTAL_AUDIT
.
Cluster-wide execution logs
For production clusters, the best way to log all queries is to turn on the cluster-wide setting sql.trace.log_statement_execute
:
> SET CLUSTER SETTING sql.trace.log_statement_execute = true;
With this setting on, each node of the cluster writes all SQL queries it executes to its log file. When you no longer need to log queries, you can turn the setting back off:
> SET CLUSTER SETTING sql.trace.log_statement_execute = false;
Per-node execution logs
Alternatively, if you are testing CockroachDB locally and want to log queries executed just by a specific node, you can either pass a CLI flag at node startup, or execute a SQL function on a running node.
Using the CLI to start a new node, pass the --vmodule
flag to the cockroach start
command. For example, to start a single node locally and log all SQL queries it executes, you'd run:
$ cockroach start --insecure --host=localhost --vmodule=exec_log=2
From the SQL prompt on a running node, execute the crdb_internal.set_vmodule()
function:
> SELECT crdb_internal.set_vmodule('exec_log=2');
This will result in the following output:
+---------------------------+
| crdb_internal.set_vmodule |
+---------------------------+
| 0 |
+---------------------------+
(1 row)
Once the logging is enabled, all of the node's queries will be written to the CockroachDB log file as follows:
I180402 19:12:28.112957 394661 sql/exec_log.go:173 [n1,client=127.0.0.1:50155,user=root] exec "psql" {} "SELECT version()" {} 0.795 1 ""
Does CockroachDB support a UUID type?
Yes. For more details, see UUID
.
How does CockroachDB sort results when ORDER BY
is not used?
When an ORDER BY
clause is not used in a query, rows are processed or returned in a
non-deterministic order. "Non-deterministic" means that the actual order
can depend on the logical plan, the order of data on disk, the topology
of the CockroachDB cluster, and is generally variable over time.
Why are my INT
columns returned as strings in JavaScript?
In CockroachDB, all INT
s are represented with 64 bits of precision, but JavaScript numbers only have 53 bits of precision. This means that large integers stored in CockroachDB are not exactly representable as JavaScript numbers. For example, JavaScript will round the integer 235191684988928001
to the nearest representable value, 235191684988928000
. Notice that the last digit is different. This is particularly problematic when using the unique_rowid()
function, since unique_rowid()
nearly always returns integers that require more than 53 bits of precision to represent.
To avoid this loss of precision, Node's pg
driver will, by default, return all CockroachDB INT
s as strings.
// Schema: CREATE TABLE users (id INT DEFAULT unique_rowid(), name STRING);
pgClient.query("SELECT id FROM users WHERE name = 'Roach' LIMIT 1", function(err, res) {
var idString = res.rows[0].id;
// idString === '235191684988928001'
// typeof idString === 'string'
});
To perform another query using the value of idString
, you can simply use idString
directly, even where an INT
type is expected. The string will automatically be coerced into a CockroachDB INT
.
pgClient.query("UPDATE users SET name = 'Ms. Roach' WHERE id = $1", [idString], function(err, res) {
// All should be well!
});
If you instead need to perform arithmetic on INT
s in JavaScript, you will need to use a big integer library like Long.js. Do not use the built-in parseInt
function.
parseInt(idString, 10) + 1; // WRONG: returns 235191684988928000
require('long').fromString(idString).add(1).toString(); // GOOD: returns '235191684988928002'