The UPSERT
statement inserts rows in cases where specified values do not violate uniqueness constraints and updates rows in cases where values do violate uniqueness constraints. UPSERT
considers uniqueness only for primary key columns.
To read more about upserts, see our Upsert in SQL: What is an Upsert, and When Should You Use One? blog post.
UPSERT
vs. INSERT ON CONFLICT
Assuming that columns a
and b
are the primary key, the following UPSERT
and INSERT ... ON CONFLICT
statements are equivalent:
UPSERT INTO t (a, b, c) VALUES (1, 2, 3);
INSERT INTO t (a, b, c)
VALUES (1, 2, 3)
ON CONFLICT (a, b)
DO UPDATE SET c = excluded.c;
However, UPSERT
does not let you specify columns to infer a unique constraint as an arbiter. An arbiter is a UNIQUE
constraint used to check for conflicts during execution of INSERT ... ON CONFLICT
. UPSERT
always uses the primary key as the arbiter. You must therefore use INSERT ... ON CONFLICT ... DO UPDATE
if your statement considers uniqueness for columns other than primary key columns. For an example, see Upsert that fails (conflict on non-primary key).
When inserting or updating columns on a table that does not have secondary indexes, Cockroach Labs recommends using an UPSERT
statement instead of INSERT ON CONFLICT DO UPDATE
. Whereas INSERT ON CONFLICT
always performs a read, the UPSERT
statement writes without reading, making it faster. This may be useful if you are using a simple SQL table of two columns to simulate direct KV access.
If the table has a secondary index, there is no performance difference between UPSERT
and INSERT ON CONFLICT
. However, INSERT
without an ON CONFLICT
clause may not scan the table for existing values. This can provide a performance improvement over UPSERT
.
To learn more about how to perform and when to use an upsert in CockroachDB, PostgreSQL, and MySQL, see Upsert in SQL: What is an Upsert, and When Should You Use One?.
Considerations
An
UPSERT
statement affecting a proper subset of columns behaves differently depending on whether or not you specify the target columns in the statement.- If you specify target columns (e.g.,
UPSERT INTO accounts (id, name) VALUES (2, 'b2');
), the values of columns that do not have new values in theUPSERT
statement will not be updated. - If you do not specify the target columns (e.g.,
UPSERT INTO accounts VALUES (2, 'b2');
), the value of columns that do not have new values in theUPSERT
statement will be updated to their default values.
For examples, see Upsert a proper subset of columns.
- If you specify target columns (e.g.,
A single multi-row
UPSERT
statement is faster than multiple single-rowUPSERT
statements. Whenever possible, use multi-rowUPSERT
instead of multiple single-rowUPSERT
statements.If the input data contains duplicates, see Import data containing duplicate rows using
DISTINCT ON
.
Required privileges
The user must have the INSERT
, SELECT
, and UPDATE
privileges on the table.
Synopsis
Parameters
Parameter | Description |
---|---|
common_table_expr |
See Common Table Expressions. |
table_name |
The name of the table. |
AS table_alias_name |
An alias for the table name. When an alias is provided, it completely hides the actual table name. |
column_name |
The name of a column to populate during the insert. |
select_stmt |
A selection query. Each value must match the data type of its column. Also, if column names are listed after INTO , values must be in corresponding order; otherwise, they must follow the declared order of the columns in the table. |
DEFAULT VALUES |
To fill all columns with their default values, use DEFAULT VALUES in place of select_stmt . To fill a specific column with its default value, leave the value out of the select_stmt or use DEFAULT at the appropriate position. |
RETURNING target_list |
Return values based on rows inserted, where target_list can be specific column names from the table, * for all columns, or computations using scalar expressions.Within a transaction, use RETURNING NOTHING to return nothing in the response, not even the number of rows affected. |
Examples
Upsert a row (no conflict)
In this example, the id
column is the primary key. Because the inserted id
value does not conflict with the id
value of any existing row, the UPSERT
statement inserts a new row into the table.
> SELECT * FROM accounts;
+----+----------+
| id | balance |
+----+----------+
| 1 | 10000.5 |
| 2 | 20000.75 |
+----+----------+
> UPSERT INTO accounts (id, balance) VALUES (3, 6325.20);
> SELECT * FROM accounts;
+----+----------+
| id | balance |
+----+----------+
| 1 | 10000.5 |
| 2 | 20000.75 |
| 3 | 6325.2 |
+----+----------+
Upsert multiple rows
In this example, the UPSERT
statement inserts multiple rows into the table.
> SELECT * FROM accounts;
+----+----------+
| id | balance |
+----+----------+
| 1 | 10000.5 |
| 2 | 20000.75 |
| 3 | 6325.2 |
+----+----------+
> UPSERT INTO accounts (id, balance) VALUES (4, 1970.4), (5, 2532.9), (6, 4473.0);
> SELECT * FROM accounts;
+----+----------+
| id | balance |
+----+----------+
| 1 | 10000.5 |
| 2 | 20000.75 |
| 3 | 6325.2 |
| 4 | 1970.4 |
| 5 | 2532.9 |
| 6 | 4473.0 |
+----+----------+
Upsert that updates a row (conflict on primary key)
In this example, the id
column is the primary key. Because the inserted id
value is not unique, the UPSERT
statement updates the row with the new balance
.
> SELECT * FROM accounts;
+----+----------+
| id | balance |
+----+----------+
| 1 | 10000.5 |
| 2 | 20000.75 |
| 3 | 6325.2 |
| 4 | 1970.4 |
| 5 | 2532.9 |
| 6 | 4473.0 |
+----+----------+
> UPSERT INTO accounts (id, balance) VALUES (3, 7500.83);
> SELECT * FROM accounts;
+----+----------+
| id | balance |
+----+----------+
| 1 | 10000.5 |
| 2 | 20000.75 |
| 3 | 7500.83 |
| 4 | 1970.4 |
| 5 | 2532.9 |
| 6 | 4473.0 |
+----+----------+
Upsert that fails (conflict on non-primary key)
UPSERT
will not update rows when the uniqueness conflict is on columns not in the primary key. In this example, the a
column is the primary key, but the b
column also has the UNIQUE
constraint. Because the inserted b
value is not unique, the UPSERT
fails.
> SELECT * FROM unique_test;
+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+---+---+
> UPSERT INTO unique_test VALUES (4, 1);
pq: duplicate key value (b)=(1) violates unique constraint "unique_test_b_key"
In such a case, you would need to use the INSERT ON CONFLICT
statement to specify the b
column as the column with the UNIQUE
constraint.
> INSERT INTO unique_test VALUES (4, 1) ON CONFLICT (b) DO UPDATE SET a = excluded.a;
> SELECT * FROM unique_test;
+---+---+
| a | b |
+---+---+
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
+---+---+
Upsert a proper subset of columns
> CREATE TABLE accounts (
id INT PRIMARY KEY,
name STRING,
balance DECIMAL(10, 2) DEFAULT 0
);
> INSERT INTO accounts (id, name, balance) VALUES
(1, 'a1', 10000.5),
(2, 'b1', 20000.75),
(3, 'c1', 6325.2);
> SELECT * FROM accounts;
+----+------+----------+
| id | name | balance |
+----+------+----------+
| 1 | a1 | 10000.50 |
| 2 | b1 | 20000.75 |
| 3 | c1 | 6325.20 |
+----+------+----------+
Upserting a proper subset of columns without specifying the column names will write the default values of the unspecified columns when there is a conflict on the primary key. The account with id
of 1
has a balance of 0
(the column's default value) after the UPSERT
:
> UPSERT INTO accounts VALUES (1, 'a2');
> SELECT * FROM accounts;
+----+------+----------+
| id | name | balance |
+----+------+----------+
| 1 | a2 | 0.00 |
| 2 | b1 | 20000.75 |
| 3 | c1 | 6325.20 |
+----+------+----------+
If the target column names are included in the UPSERT
, then the subset of columns without values will not change when there is a conflict on the primary key. The balance
of the account with id
of 2
is unchanged after the UPSERT
:
> UPSERT INTO accounts (id, name) VALUES (2, 'b2');
> SELECT * FROM accounts;
+----+------+----------+
| id | name | balance |
+----+------+----------+
| 1 | a2 | 0.00 |
| 2 | b2 | 20000.75 |
| 3 | c1 | 6325.20 |
+----+------+----------+
Import data containing duplicate rows using DISTINCT ON
If the input data to insert or update contains duplicate rows, you must
use DISTINCT ON
to
ensure there is only one row for each value of the primary key.
For example:
> WITH
-- the following data contains duplicates on the conflict column "id":
inputrows AS (VALUES (8, 130), (8, 140))
UPSERT INTO accounts (id, balance)
(SELECT DISTINCT ON(id) id, balance FROM inputrows); -- de-duplicate the input rows
The DISTINCT ON
clause does not guarantee which of the duplicates is
considered. To force the selection of a particular duplicate, use an
ORDER BY
clause:
> WITH
-- the following data contains duplicates on the conflict column "id":
inputrows AS (VALUES (8, 130), (8, 140))
UPSERT INTO accounts (id, balance)
(SELECT DISTINCT ON(id) id, balance
FROM inputrows
ORDER BY balance); -- pick the lowest balance as value to update in each account
Using DISTINCT ON
incurs a performance cost to search and eliminate duplicates.
For best performance, avoid using it when the input is known to not contain duplicates.
Limit the size of rows
To help you avoid failures arising from misbehaving applications that bloat the size of rows, you can specify the behavior when a row or individual column family larger than a specified size is written to the database. Use the cluster settings sql.guardrails.max_row_size_log
to discover large rows and sql.guardrails.max_row_size_err
to reject large rows.
When you write a row that exceeds sql.guardrails.max_row_size_log
:
INSERT
,UPSERT
,UPDATE
,CREATE TABLE AS
,CREATE INDEX
,ALTER TABLE
,ALTER INDEX
,IMPORT
, orRESTORE
statements will log aLargeRow
to theSQL_PERF
channel.SELECT
,DELETE
,TRUNCATE
, andDROP
are not affected.
When you write a row that exceeds sql.guardrails.max_row_size_err
:
INSERT
,UPSERT
, andUPDATE
statements will fail with a code54000 (program_limit_exceeded)
error.CREATE TABLE AS
,CREATE INDEX
,ALTER TABLE
,ALTER INDEX
,IMPORT
, andRESTORE
statements will log aLargeRowInternal
event to theSQL_INTERNAL_PERF
channel.SELECT
,DELETE
,TRUNCATE
, andDROP
are not affected.
You cannot update existing rows that violate the limit unless the update shrinks the size of the
row below the limit. You can select, delete, alter, back up, and restore such rows. We
recommend using the accompanying setting sql.guardrails.max_row_size_log
in conjunction with
SELECT pg_column_size()
queries to detect and fix any existing large rows before lowering
sql.guardrails.max_row_size_err
.