This page describes newly identified limitations in the CockroachDB v19.2.12 release as well as unresolved limitations identified in earlier releases.
New limitations
Collation names that include upper-case or hyphens may cause errors
Using a collation name with upper-case letters or hyphens may result in errors.
For example, the following SQL will result in an error:
> CREATE TABLE nocase_strings (s STRING COLLATE "en-US-u-ks-level2");
> INSERT INTO nocase_strings VALUES ('Aaa' COLLATE "en-US-u-ks-level2"), ('Bbb' COLLATE "en-US-u-ks-level2");
> SELECT s FROM nocase_strings WHERE s = ('bbb' COLLATE "en-US-u-ks-level2");
ERROR: internal error: "$0" = 'bbb' COLLATE en_us_u_ks_level2: unsupported comparison operator: <collatedstring{en-US-u-ks-level2}> = <collatedstring{en_us_u_ks_level2}>
As a workaround, only use collation names that have lower-case letters and underscores.
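A minimal sketch of the workaround, assuming the same locale can be spelled as en_us_u_ks_level2 (the normalized form that appears in the error message above). That spelling is an assumption, so verify it against your release:

```sql
-- Same statements as above, but with the collation name written in
-- lower case with underscores (assumed equivalent to "en-US-u-ks-level2").
CREATE TABLE nocase_strings (s STRING COLLATE en_us_u_ks_level2);
INSERT INTO nocase_strings VALUES ('Aaa' COLLATE en_us_u_ks_level2), ('Bbb' COLLATE en_us_u_ks_level2);
SELECT s FROM nocase_strings WHERE s = ('bbb' COLLATE en_us_u_ks_level2);
```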
CHECK constraint validation for INSERT ON CONFLICT differs from PostgreSQL
CockroachDB validates CHECK constraints on the results of INSERT ON CONFLICT statements, preventing new or changed rows from violating the constraint. Unlike PostgreSQL, CockroachDB does not also validate CHECK constraints on the input rows of INSERT ON CONFLICT statements.
If this difference matters to your client, you can INSERT ON CONFLICT from a SELECT statement and check the inserted value as part of the SELECT. For example, instead of defining CHECK (x > 0) on t.x and using INSERT INTO t(x) VALUES (3) ON CONFLICT (x) DO UPDATE SET x = excluded.x, you could do the following:
> INSERT INTO t (x)
SELECT if (x <= 0, crdb_internal.force_error('23514', 'check constraint violated'), x)
FROM (values (3)) AS v(x)
ON CONFLICT (x)
DO UPDATE SET x = excluded.x;
An x value less than 1 would result in the following error:
pq: check constraint violated
Subqueries in SET statements
It is not currently possible to use a subquery in a SET or SET CLUSTER SETTING statement. For example:
> SET application_name = (SELECT 'a' || 'b');
*
* ERROR: [n1,client=127.0.0.1:53279,user=root] Reported as error fab91916eda440cb9d85b4b91d49d3b1
*
*
* ERROR: [n1,client=127.0.0.1:53279,user=root] Reported as error 271c8808b0e64bde95ba7e853fda9eb7
*
pq: internal error: invalid index 1 for "(SELECT 'ab')"
Unresolved limitations
Filtering by now() results in a full table scan
When filtering a query by now(), the cost-based optimizer currently cannot constrain an index on the filtered timestamp column. This results in a full table scan. For example:
> CREATE TABLE bydate (a TIMESTAMP NOT NULL, INDEX (a));
> EXPLAIN SELECT * FROM bydate WHERE a > (now() - '1h'::interval);
tree | field | description
-------+-------------+---------------------------
| distributed | true
| vectorized | false
scan | |
| table | bydate@primary
| spans | FULL SCAN
| filter | a > (now() - '01:00:00')
(6 rows)
As a workaround, pass the correct date into the query as a parameter to a prepared query with a placeholder, which will allow the optimizer to constrain the index correctly:
> PREPARE q AS SELECT * FROM bydate WHERE a > ($1::timestamp - '1h'::interval);
> EXECUTE q ('2020-05-12 00:00:00');
Enterprise BACKUP does not capture database/table/column comments
The COMMENT ON statement associates comments with databases, tables, or columns. However, the internal table (system.comments) in which these comments are stored is not captured by enterprise BACKUP.
As a workaround, alongside a BACKUP, run the cockroach dump command with --dump-mode=schema for each table in the backup. This will emit COMMENT ON statements alongside CREATE statements.
Adding stores to a node
After a node has initially joined a cluster, it is not possible to add additional stores to the node. Stopping the node and restarting it with additional stores causes the node to not reconnect to the cluster.
To work around this limitation, decommission the node, remove its data directory, and then run cockroach start to join the cluster again as a new node.
Cold starts of large clusters may require manual intervention
If a cluster contains a large amount of data (>500GiB per node), and all nodes are stopped and then started at the same time, the cluster can enter a state where it is unable to start up without manual intervention. In this state, logs fill up rapidly with messages like refusing gossip from node x; forwarding to node y, and data and metrics may become inaccessible.
To exit this state, you should:
- Stop all nodes.
- Set the following environment variables: COCKROACH_SCAN_INTERVAL=60m and COCKROACH_SCAN_MIN_IDLE_TIME=1s.
- Restart the cluster.
Once restarted, monitor the Replica Quiescence graph on the Replication Dashboard. When >90% of the replicas have become quiescent, conduct a rolling restart and remove the environment variables. Make sure that under-replicated ranges do not increase between restarts.
Once in a stable state, the risk of this issue recurring can be mitigated by increasing your range_max_bytes to 134217728 (128MiB). We always recommend testing changes to range_max_bytes in a development environment before making changes in production.
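As a rough sketch, one way to raise range_max_bytes is through the default replication zone. This assumes the change should apply to every range that does not have a more specific zone configuration:

```sql
-- Inspect the current default zone configuration first.
SHOW ZONE CONFIGURATION FOR RANGE default;

-- Raise the maximum range size to 128MiB (134217728 bytes).
ALTER RANGE default CONFIGURE ZONE USING range_max_bytes = 134217728;
```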
Requests to restarted node in need of snapshots may hang
When a node is offline, the Raft logs for the ranges on the node get truncated. When the node comes back online, it therefore often needs Raft snapshots to get many of its ranges back up-to-date. While in this state, requests to a range will hang until its snapshot has been applied, which can take a long time.
To work around this limitation, you can adjust the kv.snapshot_recovery.max_rate cluster setting to temporarily relax the throughput rate limiting applied to snapshots. For example, changing the rate limiting from the default 8 MB/s, at which 1 GB of snapshots takes at least 2 minutes, to 64 MB/s can result in an 8x speedup in snapshot transfers and, therefore, a much shorter interruption of requests to an impacted node:
> SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '64mb';
Before increasing this value, however, verify that you will not end up saturating your network interfaces, and once the problem has resolved, be sure to reset to the original value.
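Resetting the value afterwards could look like the following sketch. It assumes the original value was the default; if you had previously set a custom rate, set that value explicitly instead:

```sql
-- Check the value currently in effect.
SHOW CLUSTER SETTING kv.snapshot_recovery.max_rate;

-- Return the setting to its default once the node has caught up.
RESET CLUSTER SETTING kv.snapshot_recovery.max_rate;
```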
Location-based time zone names
When the machine running a CockroachDB node is missing time zone data, the node will be unable to resolve location-based time zone names.
To resolve this issue on Linux, install the tzdata library (sometimes called tz or zoneinfo).
To resolve this issue on Windows, download Go's official zoneinfo.zip and set the ZONEINFO environment variable to point to the zip file. For step-by-step guidance on setting environment variables on Windows, see this external article.
Make sure to do this across all nodes in the cluster and to keep this time zone data up-to-date.
Change data capture
Change data capture (CDC) provides efficient, distributed, row-level change feeds into Apache Kafka for downstream processing such as reporting, caching, or full-text indexing.
The following are limitations in the v19.2 release and will be addressed in the future:
- Changefeeds only work on tables with a single column family (which is the default for new tables).
- Changefeeds do not share internal buffers, so each running changefeed will increase total memory usage. To watch multiple tables, we recommend creating a changefeed with a comma-separated list of tables.
- Many DDL queries (including TRUNCATE and DROP TABLE) will cause errors on a changefeed watching the affected tables. You will need to start a new changefeed.
- Changefeeds cannot be backed up or restored.
- Partial or intermittent sink unavailability may impact changefeed stability; however, ordering guarantees will still hold for as long as a changefeed remains active.
- Changefeeds cannot be altered. To alter, cancel the changefeed and create a new one with updated settings from where it left off.
- Additional target options will be added, including partitions and ranges of primary key rows.
- Changefeeds do not pick up data ingested with the IMPORT INTO statement.
- Using a cloud storage sink only works with JSON and emits newline-delimited JSON files.
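To illustrate the recommendation above about watching several tables with one changefeed, here is a minimal sketch. The table names (orders, customers) and the Kafka address are hypothetical placeholders:

```sql
-- One changefeed covering two tables, rather than one changefeed per table,
-- so that only a single set of internal buffers is allocated.
CREATE CHANGEFEED FOR TABLE orders, customers INTO 'kafka://localhost:9092';
```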
Admin UI may become inaccessible for secure clusters
Accessing the Admin UI for a secure cluster now requires login information (i.e., username and password). This login information is stored in a system table that is replicated like other data in the cluster. If a majority of the nodes with the replicas of the system table data go down, users will be locked out of the Admin UI.
AS OF SYSTEM TIME in SELECT statements
AS OF SYSTEM TIME can only be used in a top-level SELECT statement. That is, we do not support statements like INSERT INTO t SELECT * FROM t2 AS OF SYSTEM TIME <time> or two subselects in the same statement with differing AS OF SYSTEM TIME arguments.
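For contrast, a sketch of the supported form, where the clause is attached to a top-level SELECT. The table t2 is the one named above, and the '-10s' offset is only an illustrative value:

```sql
-- Supported: a historical read on a top-level SELECT.
SELECT * FROM t2 AS OF SYSTEM TIME '-10s';
```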
Large index keys can impair performance
The use of tables with very large primary or secondary index keys (>32KB) can result in excessive memory usage. Specifically, if the primary or secondary index key is larger than 32KB, the default indexing scheme for RocksDB SSTables breaks down and causes the index to be excessively large. The index is pinned in memory by default for performance.
To work around this issue, we recommend limiting the size of primary and secondary keys to 4KB, which you must account for manually. Note that most columns are 8B (exceptions being STRING and JSON), which still allows for very complex key structures.
Admin UI: Statements page latency reports
The Statements page does not correctly report "mean latency" or "latency by phase" for statements that result in schema changes or other background jobs.
Using LIKE...ESCAPE in WHERE and HAVING constraints
CockroachDB tries to optimize most comparison operators in WHERE and HAVING clauses into constraints on SQL indexes by only accessing selected rows. This is done for LIKE clauses when a common prefix for all selected rows can be determined in the search pattern (e.g., ... LIKE 'Joe%'). However, this optimization is not yet available if the ESCAPE keyword is also used.
Using SQLAlchemy with CockroachDB
Users of the SQLAlchemy adapter provided by Cockroach Labs must upgrade the adapter to the latest release before upgrading to CockroachDB v19.2.
Admin UI: CPU percentage calculation
For multi-core systems, the user CPU percent can be greater than 100%. Full utilization of one core is considered as 100% CPU usage. If you have n cores, then the user CPU percent can range from 0% (indicating an idle system) to (n*100)% (indicating full utilization).
Admin UI: CPU count in containerized environments
When CockroachDB is run in a containerized environment (e.g., Kubernetes), the Admin UI does not detect CPU limits applied to a container. Instead, the UI displays the actual number of CPUs provisioned on a VM.
TRUNCATE does not behave like DELETE
TRUNCATE is not a DML statement, but instead works as a DDL statement. Its limitations are the same as other DDL statements, which are outlined in Online Schema Changes: Limitations.
Cannot DELETE multiple rows with self-referencing FKs
Because CockroachDB checks foreign keys eagerly (i.e., per row), it cannot trivially delete multiple rows from a table with a self-referencing foreign key.
To successfully delete multiple rows with self-referencing foreign keys, you need to ensure they're deleted in an order that doesn't violate the foreign key constraint.
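A small sketch of such an ordering, using a hypothetical employees table in which each row may reference its manager. Rows are deleted from the leaves upward, one statement at a time, so that no remaining row references a deleted one:

```sql
-- Hypothetical table with a self-referencing foreign key.
CREATE TABLE employees (
    id INT PRIMARY KEY,
    manager_id INT REFERENCES employees (id)
);
INSERT INTO employees VALUES (1, NULL);
INSERT INTO employees VALUES (2, 1);
INSERT INTO employees VALUES (3, 2);

-- Delete referencing rows before the rows they reference.
DELETE FROM employees WHERE id = 3;
DELETE FROM employees WHERE id = 2;
DELETE FROM employees WHERE id = 1;
```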
DISTINCT operations cannot operate over JSON values
CockroachDB does not currently key-encode JSON values, which prevents DISTINCT filters from working on them.
As a workaround, you can convert the JSON field's values to strings using the ->> operator (e.g., SELECT DISTINCT col->>'field'...).
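A fuller sketch of this workaround, with a hypothetical events table and a hypothetical 'type' field:

```sql
-- Hypothetical table holding JSON payloads.
CREATE TABLE events (id INT PRIMARY KEY, payload JSONB);
INSERT INTO events VALUES (1, '{"type": "click"}'), (2, '{"type": "click"}'), (3, '{"type": "view"}');

-- ->> extracts the field as a STRING, which DISTINCT can operate on.
SELECT DISTINCT payload->>'type' AS event_type FROM events;
```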
Current sequence value not checked when updating min/max value
Altering the minimum or maximum value of a sequence does not check the current value of the sequence. This means that it is possible to silently set the maximum to a value less than, or the minimum to a value greater than, the current value.
Using common table expressions in VALUES and UNION clauses
When the cost-based optimizer is disabled, or when it does not support a query, a common table expression defined outside of a VALUES or UNION clause will not be available inside it. For example: ...WITH a AS (...) SELECT ... FROM (VALUES(SELECT * FROM a)).
This limitation will be lifted when the cost-based optimizer covers all queries. Until then, applications can work around this limitation by including the entire CTE query in the place where it is used.
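As an illustration of that workaround, the sketch below repeats a trivial CTE body (SELECT 1 AS x, chosen only for the example) at the point of use instead of referencing it by name:

```sql
-- Form affected by the limitation (shown as a comment only):
--   WITH a AS (SELECT 1 AS x)
--   SELECT * FROM (VALUES ((SELECT x FROM a))) AS v (x);

-- Workaround: inline the CTE's query where it is needed.
SELECT * FROM (VALUES ((SELECT x FROM (SELECT 1 AS x) AS a))) AS v (x);
```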
Using default_int_size session variable in batch of statements
When setting the default_int_size session variable in a batch of statements such as SET default_int_size='int4'; SELECT 1::IN, the default_int_size variable will not take effect until the next statement. This happens because statement parsing takes place asynchronously from statement execution.
As a workaround, set default_int_size via your database driver, or ensure that SET default_int_size is in its own statement.
Importing data using the PostgreSQL COPY protocol
Currently, the built-in SQL shell provided with CockroachDB (cockroach sql / cockroach demo) does not support importing data using the COPY statement. Users can use the psql client command provided with PostgreSQL to load this data into CockroachDB instead.
Dumping a table with no user-visible columns
It is not currently possible to use cockroach dump to dump the schema and data of a table with no user-defined columns. See #35462 for more details.
Import with a high amount of disk contention
IMPORT can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the kv.bulk_io_write.max_rate cluster setting to a value below your max disk write speed. For example, to set it to 10MB/s, execute:
> SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';
Assigning latitude/longitude for the Node Map
You cannot assign latitude/longitude coordinates to localities if the components of your localities have the same name. For example, consider the following partial configuration:
Node | Region | Datacenter |
---|---|---|
Node1 | us-east | datacenter-1 |
Node2 | us-west | datacenter-1 |
In this case, if you try to set the latitude/longitude coordinates to the datacenter level of the localities, you will get the "primary key exists" error and the Node Map will not be displayed. You can, however, set the latitude/longitude coordinates to the region components of the localities, and the Node Map will be displayed.
Placeholders in PARTITION BY
When defining a table partition, either during table creation or table alteration, it is not possible to use placeholders in the PARTITION BY clause.
Adding a column with sequence-based DEFAULT values
It is currently not possible to add a column to a table when the column uses a sequence as the DEFAULT value. For example:
> CREATE TABLE t (x INT);
> INSERT INTO t(x) VALUES (1), (2), (3);
> CREATE SEQUENCE s;
> ALTER TABLE t ADD COLUMN y INT DEFAULT nextval('s');
ERROR: nextval(): unimplemented: cannot evaluate scalar expressions containing sequence operations in this context
SQLSTATE: 0A000
Available capacity metric in the Admin UI
If you are running multiple nodes on a single machine (not recommended in production) and didn't specify the maximum allocated storage capacity for each node using the --store flag, the capacity metrics in the Admin UI are incorrect. This is because when multiple nodes are running on a single machine, the machine's hard disk is treated as an available store for each node, while in reality, only one hard disk is available for all nodes. The total available capacity is then calculated as the hard disk size multiplied by the number of nodes on the machine.
Schema changes within transactions
Within a single transaction:
- DDL statements cannot be mixed with DML statements. As a workaround, you can split the statements into separate transactions. For more details, see examples of unsupported statements.
- A CREATE TABLE statement containing FOREIGN KEY or INTERLEAVE clauses cannot be followed by statements that reference the new table.
- A table name cannot be reused. For example, you cannot drop a table named a and then create (or rename) a different table with the name a. Similarly, you cannot rename a table named a to b and then create (or rename) a different table with the name a. As a workaround, split ALTER TABLE ... RENAME TO, DROP TABLE, and CREATE TABLE statements that reuse table names into separate transactions.
- Schema change DDL statements inside a multi-statement transaction can fail while other statements succeed.
As of version v2.1, you can run schema changes inside the same transaction as a CREATE TABLE statement. For more information, see this example. Also, as of v19.1, some schema changes can be used in combination in a single ALTER TABLE statement. For a list of commands that can be combined, see ALTER TABLE. For a demonstration, see Add and rename columns atomically.
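The table-name workaround from the list above can be sketched as follows. The table a and its columns are hypothetical; the point is only that the DROP TABLE and the CREATE TABLE reusing the name run in separate transactions:

```sql
-- Setup: an existing table named a.
CREATE TABLE a (x INT);

-- Transaction 1: drop the old table.
BEGIN;
DROP TABLE a;
COMMIT;

-- Transaction 2: reuse the name for a different table.
BEGIN;
CREATE TABLE a (id INT PRIMARY KEY);
COMMIT;
```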
Schema change DDL statements inside a multi-statement transaction can fail while other statements succeed
Schema change DDL statements that run inside a multi-statement transaction with non-DDL statements can fail at COMMIT time, even if other statements in the transaction succeed. This leaves such transactions in a "partially committed, partially aborted" state that may require manual intervention to determine whether the DDL statements succeeded.
New in v19.2: If such a failure occurs, CockroachDB will emit a new CockroachDB-specific error code, XXA00, and the following error message:
transaction committed but schema change aborted with error: <description of error>
HINT: Some of the non-DDL statements may have committed successfully, but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
This limitation exists in versions of CockroachDB prior to 19.2. In these older versions, CockroachDB returned the Postgres error code 40003, "statement completion unknown".
If you must execute schema change DDL statements inside a multi-statement transaction, we strongly recommend checking for this error code and handling it appropriately every time you execute such transactions.
This error will occur in various scenarios, including but not limited to:
- Creating a unique index fails because values aren't unique.
- The evaluation of a computed value fails.
- Adding a constraint (or a column with a constraint) fails because the constraint is violated for the default/computed values in the column.
To see an example of this error, start by creating the following table.
CREATE TABLE T(x INT);
INSERT INTO T(x) VALUES (1), (2), (3);
Then, enter the following multi-statement transaction, which will trigger the error.
BEGIN;
ALTER TABLE t ADD CONSTRAINT unique_x UNIQUE(x);
INSERT INTO T(x) VALUES (3);
COMMIT;
pq: transaction committed but schema change aborted with error: (23505): duplicate key value (x)=(3) violates unique constraint "unique_x"
HINT: Some of the non-DDL statements may have committed successfully, but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
In this example, the INSERT statement committed, but the ALTER TABLE statement adding a UNIQUE constraint failed. We can verify this by looking at the data in table t and seeing that the additional non-unique value 3 was successfully inserted.
SELECT * FROM t;
x
+---+
1
2
3
3
(4 rows)
Schema changes between executions of prepared statements
When the schema of a table targeted by a prepared statement changes after the prepared statement is created, future executions of the prepared statement could result in an error. For example, adding a column to a table referenced in a prepared statement with a SELECT * clause will result in an error:
CREATE TABLE users (id INT PRIMARY KEY);
PREPARE prep1 AS SELECT * FROM users;
ALTER TABLE users ADD COLUMN name STRING;
INSERT INTO users VALUES (1, 'Max Roach');
EXECUTE prep1;
ERROR: cached plan must not change result type
SQLSTATE: 0A000
It's therefore recommended to explicitly list result columns instead of using SELECT * in prepared statements, when possible.
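A sketch of the recommended form, reusing the users table from the example above with a hypothetical statement name prep2. Because the result columns are listed explicitly, the added column does not change the statement's result type:

```sql
CREATE TABLE users (id INT PRIMARY KEY);

-- List the result columns explicitly instead of using SELECT *.
PREPARE prep2 AS SELECT id FROM users;

ALTER TABLE users ADD COLUMN name STRING;
INSERT INTO users VALUES (1, 'Max Roach');

-- Still returns only the id column.
EXECUTE prep2;
```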
INSERT ON CONFLICT vs. UPSERT
When inserting/updating all columns of a table, and the table has no secondary indexes, we recommend using an UPSERT statement instead of the equivalent INSERT ON CONFLICT statement. Whereas INSERT ON CONFLICT always performs a read to determine the necessary writes, the UPSERT statement writes without reading, making it faster.
This issue is particularly relevant when using a simple SQL table of two columns to simulate direct KV access. In this case, be sure to use the UPSERT statement.
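For a two-column table like the one described, the two forms could look like this sketch (the table name kv and its columns are hypothetical):

```sql
-- A simple key-value table with no secondary indexes.
CREATE TABLE kv (k INT PRIMARY KEY, v STRING);

-- Preferred here: all columns are written and there are no secondary indexes.
UPSERT INTO kv (k, v) VALUES (1, 'a');

-- Equivalent result, but performs a read before writing.
INSERT INTO kv (k, v) VALUES (1, 'a')
ON CONFLICT (k) DO UPDATE SET v = excluded.v;
```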
Using \| to perform a large input in the SQL shell
In the built-in SQL shell, using the \| operator to perform a large number of inputs from a file can cause the server to close the connection. This is because \| sends the entire file as a single query to the server, which can exceed the upper bound on the size of a packet the server can accept from any client (16MB).
As a workaround, execute the file from the command line with cat data.sql | cockroach sql instead of from within the interactive shell.
New values generated by DEFAULT expressions during ALTER TABLE ADD COLUMN
When executing an ALTER TABLE ADD COLUMN statement with a DEFAULT expression, new values generated:
- use the default search path regardless of the search path configured in the current session via SET SEARCH_PATH.
- use the UTC time zone regardless of the time zone configured in the current session via SET TIME ZONE.
- have no default database regardless of the default database configured in the current session via SET DATABASE, so you must specify the database of any tables they reference.
- use the transaction timestamp for the statement_timestamp() function regardless of the time at which the ALTER statement was issued.
Load-based lease rebalancing in uneven latency deployments
When nodes are started with the --locality flag, CockroachDB attempts to place the replica lease holder (the replica that client requests are forwarded to) on the node closest to the source of the request. This means as client requests move geographically, so too does the replica lease holder.
However, you might see increased latency caused by a consistently high rate of lease transfers between datacenters in the following case:
- Your cluster runs in datacenters which are very different distances away from each other.
- Each node was started with a single tier of --locality, e.g., --locality=datacenter=a.
- Most client requests get sent to a single datacenter because that's where all your application traffic is.
To detect if this is happening, open the Admin UI, select the Queues dashboard, hover over the Replication Queue graph, and check the Leases Transferred / second data point. If the value is consistently larger than 0, you should consider stopping and restarting each node with additional tiers of locality to improve request latency.
For example, let's say that latency is 10ms from nodes in datacenter A to nodes in datacenter B but is 100ms from nodes in datacenter A to nodes in datacenter C. To ensure A's and B's relative proximity is factored into lease holder rebalancing, you could restart the nodes in datacenter A and B with a common region, --locality=region=foo,datacenter=a and --locality=region=foo,datacenter=b, while restarting nodes in datacenter C with a different region, --locality=region=bar,datacenter=c.
Overload resolution for collated strings
Many string operations are not properly overloaded for collated strings, for example:
> SELECT 'string1' || 'string2';
+------------------------+
| 'string1' || 'string2' |
+------------------------+
| string1string2 |
+------------------------+
(1 row)
> SELECT ('string1' collate en) || ('string2' collate en);
pq: unsupported binary operator: <collatedstring{en}> || <collatedstring{en}>
Max size of a single column family
When creating or updating a row, if the combined size of all values in a single column family exceeds the max range size (64MiB by default) for the table, the operation may fail, or cluster performance may suffer.
As a workaround, you can either manually split a table's columns into multiple column families, or you can create a table-specific zone configuration with an increased max range size.
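Both options are sketched below for a hypothetical docs table whose body column can grow large. The family names and the 128MiB value are illustrative assumptions, not recommendations:

```sql
-- Option 1: put the large column in its own column family so that its
-- size is not combined with that of the other columns.
CREATE TABLE docs (
    id INT PRIMARY KEY,
    title STRING,
    body BYTES,
    FAMILY f_meta (id, title),
    FAMILY f_body (body)
);

-- Option 2: raise the max range size for this table only.
ALTER TABLE docs CONFIGURE ZONE USING range_max_bytes = 134217728;
```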
Simultaneous client connections and running queries on a single node
When a node has both a high number of client connections and running queries, the node may crash due to memory exhaustion. This is due to CockroachDB not accurately limiting the number of clients and queries based on the amount of available RAM on the node.
To prevent memory exhaustion, monitor each node's memory usage and ensure there is some margin between maximum CockroachDB memory usage and available system RAM. For more details about memory usage in CockroachDB, see this blog post.
SQL subexpressions and memory usage
Many SQL subexpressions (e.g., ORDER BY, UNION/INTERSECT/EXCEPT, GROUP BY, subqueries) accumulate intermediate results in RAM on the node processing the query. If the operator attempts to process more rows than can fit into RAM, the node will either crash or report a memory capacity error. For more details about memory usage in CockroachDB, see this blog post.
Query planning for OR expressions
Given a query like SELECT * FROM foo WHERE a > 1 OR b > 2, even if there are appropriate indexes to satisfy both a > 1 and b > 2, the query planner performs a full table or index scan because it cannot use both conditions at once.
Privileges for DELETE and UPDATE
Every DELETE or UPDATE statement constructs a SELECT statement, even when no WHERE clause is involved. As a result, a user executing DELETE requires both the DELETE and SELECT privileges on the table, and a user executing UPDATE requires both the UPDATE and SELECT privileges.
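In practice this means granting SELECT together with the write privilege. A minimal sketch, assuming a hypothetical table accounts and an existing user maxroach:

```sql
-- Allow maxroach to run DELETE statements on accounts.
GRANT SELECT, DELETE ON TABLE accounts TO maxroach;

-- Allow maxroach to run UPDATE statements on accounts.
GRANT SELECT, UPDATE ON TABLE accounts TO maxroach;
```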