EXPERIMENTAL CHANGEFEED FOR

On this page Carat arrow pointing down
Warning:
As of May 16, 2023, CockroachDB v21.2 is no longer supported. For more details, refer to the Release Support Policy.
Note:

EXPERIMENTAL CHANGEFEED FOR is the core implementation of changefeeds. For the Enterprise-only version, see CREATE CHANGEFEED.

The EXPERIMENTAL CHANGEFEED FOR statement creates a new core changefeed, which streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled. A core changefeed can watch one table or multiple tables in a comma-separated list.

For more information, see Stream Data Out of CockroachDB Using Changefeeds.

Note:

This feature is experimental.This feature is subject to change. To share feedback and/or issues, contact Support.

Required privileges

Changefeeds can only be created by superusers, i.e., members of the admin role. The admin role exists by default with root as the member.

Considerations

  • Because core changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for core changefeeds. Core changefeeds also have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a CANCEL QUERY statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default.

    This cancellation behavior (i.e., close the underlying connection to cancel the changefeed) also extends to client driver usage; in particular, when a client driver calls Rows.Close() after encountering errors for a stream of rows. The pgwire protocol requires that the rows be consumed before the connection is again usable, but in the case of a core changefeed, the rows are never consumed. It is therefore critical that you close the connection, otherwise the application will be blocked forever on Rows.Close().

  • In most cases, each version of a row will be emitted once. However, some infrequent conditions (e.g., node failures, network partitions) will cause them to be repeated. This gives our changefeeds an at-least-once delivery guarantee. For more information, see Change Data Capture - Ordering Guarantees.

Synopsis

> EXPERIMENTAL CHANGEFEED FOR table_name [ WITH (option [= value] [, ...]) ];

Parameters

Parameter Description
table_name The name of the table (or tables in a comma separated list) to create a changefeed for.
option / value For a list of available options and their values, see Options below.

Options

Option Value Description
updated N/A Include updated timestamps with each row.
resolved INTERVAL Emits resolved timestamp events for the changefeed. Resolved timestamp events do not emit until all ranges in the changefeed have progressed to a specific point in time.

Set an optional minimal duration between emitting resolved timestamps. Example: resolved='10s'. This option will only emit a resolved timestamp event if the timestamp has advanced and at least the optional duration has elapsed. If unspecified, all resolved timestamps are emitted as the high-water mark advances.
envelope key_only / row Use key_only to emit only the key and no value, which is faster if you only want to know when the key changes.

Default: envelope=row
cursor Timestamp Emits any changes after the given timestamp, but does not output the current state of the table first. If cursor is not specified, the changefeed starts by doing a consistent scan of all the watched rows and emits the current value, then moves to emitting any changes that happen after the scan.

cursor can be used to start a new changefeed where a previous changefeed ended.

Example: CURSOR=1536242855577149065.0000000000
mvcc_timestamp N/A New in v21.2: Include the MVCC timestamp for each emitted row in a changefeed. With the mvcc_timestamp option, each emitted row will always contain its MVCC timestamp, even during the changefeed's initial backfill.
format json / avro Format of the emitted record. Currently, support for Avro is limited.

Default: format=json.
confluent_schema_registry Schema Registry address The Schema Registry address is required to use avro.

Avro limitations

Below are clarifications for particular SQL types and values for Avro changefeeds:

  • Decimals must have precision specified.
  • BYTES (or its aliases BYTEA and BLOB) are often used to store machine-readable data. When you stream these types through a changefeed with format=avro, CockroachDB does not encode or change the data. However, Avro clients can often include escape sequences to present the data in a printable format, which can interfere with deserialization. A potential solution is to hex-encode BYTES values when initially inserting them into CockroachDB. This will ensure that Avro clients will consistently decode the hexadecimal. Note that hex-encoding values at insertion process will increase record size.
  • BIT and VARBIT types are encoded as arrays of 64-bit integers.

    For efficiency, CockroachDB encodes BIT and VARBIT bitfield types as arrays of 64-bit integers. That is, base-2 (binary format) BIT and VARBIT data types are converted to base 10 and stored in arrays. Encoding in CockroachDB is big-endian, therefore the last value may have many trailing zeroes. For this reason, the first value of each array is the number of bits that are used in the last value of the array.

    For instance, if the bitfield is 129 bits long, there will be 4 integers in the array. The first integer will be 1; representing the number of bits in the last value, the second integer will be the first 64 bits, the third integer will be bits 65–128, and the last integer will either be 0 or 9223372036854775808 (i.e., the integer with only the first bit set, or 1000000000000000000000000000000000000000000000000000000000000000 when base 2).

    This example is base-10 encoded into an array as follows:

    {"array": [1, <first 64 bits>, <second 64 bits>, 0 or 9223372036854775808]}
    

    For downstream processing, it is necessary to base-2 encode every element in the array (except for the first element). The first number in the array gives you the number of bits to take from the last base-2 number — that is, the most significant bits. So, in the example above this would be 1. Finally, all the base-2 numbers can be appended together, which will result in the original number of bits, 129.

    In a different example of this process where the bitfield is 136 bits long, the array would be similar to the following when base-10 encoded:

    {"array": [8, 18293058736425533439, 18446744073709551615, 13690942867206307840]}
    

    To then work with this data, you would convert each of the elements in the array to base-2 numbers, besides the first element. For the above array, this would convert to:

    [8, 1111110111011011111111111111111111111111111111111111111111111111, 1111111111111111111111111111111111111111111111111111111111111111, 1011111000000000000000000000000000000000000000000000000000000000]
    

    Next, you use the first element in the array to take the number of bits from the last base-2 element, 10111110. Finally, you append each of the base-2 numbers together — in the above array, the second, third, and truncated last element. This results in 136 bits, the original number of bits.

Examples

Create a changefeed

In this example, you'll set up a core changefeed for a single-node cluster.

  1. In a terminal window, start cockroach:

    icon/buttons/copy
    $ cockroach start-single-node \
    --insecure \
    --listen-addr=localhost \
    --background
    
  2. As the root user, open the built-in SQL client:

    icon/buttons/copy
    $ cockroach sql \
    --url="postgresql://root@127.0.0.1:26257?sslmode=disable" \
    --format=csv
    
    Note:

    Because core changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for core changefeeds. Core changefeeds also have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a CANCEL QUERY statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default.

    Note:

    To determine how wide the columns need to be, the default table display format in cockroach sql buffers the results it receives from the server before printing them to the console. When consuming core changefeed data using cockroach sql, it's important to use a display format like csv that does not buffer its results. To set the display format, use the --format=csv flag when starting the built-in SQL client, or set the \set display_format=csv option once the SQL client is open.

  3. Enable the kv.rangefeed.enabled cluster setting:

    icon/buttons/copy
    > SET CLUSTER SETTING kv.rangefeed.enabled = true;
    
  4. Create table foo:

    icon/buttons/copy
    > CREATE TABLE foo (a INT PRIMARY KEY);
    
  5. Insert a row into the table:

    icon/buttons/copy
    > INSERT INTO foo VALUES (0);
    
  6. Start the core changefeed:

    icon/buttons/copy
    > EXPERIMENTAL CHANGEFEED FOR foo;
    
    table,key,value
    foo,[0],"{""after"": {""a"": 0}}"
    
  7. In a new terminal, add another row:

    icon/buttons/copy
    $ cockroach sql --insecure -e "INSERT INTO foo VALUES (1)"
    
  8. Back in the terminal where the core changefeed is streaming, the following output has appeared:

    foo,[1],"{""after"": {""a"": 1}}"
    

    Note that records may take a couple of seconds to display in the core changefeed.

  9. To stop streaming the changefeed, enter CTRL+C into the terminal where the changefeed is running.

  10. To stop cockroach:

    Get the process ID of the node:

    icon/buttons/copy
    ps -ef | grep cockroach | grep -v grep
    
      501 21766     1   0  6:21PM ttys001    0:00.89 cockroach start-single-node --insecure --listen-addr=localhost
    

    Gracefully shut down the node, specifying its process ID:

    icon/buttons/copy
    kill -TERM 21766
    
    initiating graceful shutdown of server
    server drained and shutdown completed
    

Create a changefeed with Avro

In this example, you'll set up a core changefeed for a single-node cluster that emits Avro records. CockroachDB's Avro binary encoding convention uses the Confluent Schema Registry to store Avro schemas.

  1. Use the cockroach start-single-node command to start a single-node cluster:

    icon/buttons/copy
    $ cockroach start-single-node \
    --insecure \
    --listen-addr=localhost \
    --background
    
  2. Download and extract the Confluent Open Source platform.

  3. Move into the extracted confluent-<version> directory and start Confluent:

    icon/buttons/copy
    $ ./bin/confluent local services start
    

    Only zookeeper, kafka, and schema-registry are needed. To troubleshoot Confluent, see their docs and the Quick Start Guide.

  4. As the root user, open the built-in SQL client:

    icon/buttons/copy
    $ cockroach sql --url="postgresql://root@127.0.0.1:26257?sslmode=disable" --format=csv
    
    Note:

    Because core changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for core changefeeds. Core changefeeds also have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a CANCEL QUERY statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default.

    Note:

    To determine how wide the columns need to be, the default table display format in cockroach sql buffers the results it receives from the server before printing them to the console. When consuming core changefeed data using cockroach sql, it's important to use a display format like csv that does not buffer its results. To set the display format, use the --format=csv flag when starting the built-in SQL client, or set the \set display_format=csv option once the SQL client is open.

  5. Enable the kv.rangefeed.enabled cluster setting:

    icon/buttons/copy
    > SET CLUSTER SETTING kv.rangefeed.enabled = true;
    
  6. Create table bar:

    icon/buttons/copy
    > CREATE TABLE bar (a INT PRIMARY KEY);
    
  7. Insert a row into the table:

    icon/buttons/copy
    > INSERT INTO bar VALUES (0);
    
  8. Start the core changefeed:

    icon/buttons/copy
    > EXPERIMENTAL CHANGEFEED FOR bar WITH format = avro, confluent_schema_registry = 'http://localhost:8081';
    
    table,key,value
    bar,\000\000\000\000\001\002\000,\000\000\000\000\002\002\002\000
    
  9. In a new terminal, add another row:

    icon/buttons/copy
    $ cockroach sql --insecure -e "INSERT INTO bar VALUES (1)"
    
  10. Back in the terminal where the core changefeed is streaming, the output will appear:

    bar,\000\000\000\000\001\002\002,\000\000\000\000\002\002\002\002
    

    Note that records may take a couple of seconds to display in the core changefeed.

  11. To stop streaming the changefeed, enter CTRL+C into the terminal where the changefeed is running.

  12. To stop cockroach:

    Get the process ID of the node:

    icon/buttons/copy
    ps -ef | grep cockroach | grep -v grep
    
      501 21766     1   0  6:21PM ttys001    0:00.89 cockroach start-single-node --insecure --listen-addr=localhost
    

    Gracefully shut down the node, specifying its process ID:

    icon/buttons/copy
    kill -TERM 21766
    
    initiating graceful shutdown of server
    server drained and shutdown completed
    
  13. To stop Confluent, move into the extracted confluent-<version> directory and stop Confluent:

    icon/buttons/copy
    $ ./bin/confluent local services stop
    

    To terminate all Confluent processes, use:

    icon/buttons/copy
    $ ./bin/confluent local destroy
    

See also


Yes No
On this page

Yes No