New in v23.1: A full-text search is used to perform natural-language searches on documents such as articles, websites, or other written formats.
This page describes how to perform full-text searches using the provided built-in functions.
Some PostgreSQL syntax and features are unsupported. For details, see Unsupported features.
How does full-text search work?
In the PostgreSQL terminology, a document is a natural-language text converted to a data type that is searchable using specially formatted queries. A document is typically stored within a single database row or concatenated from multiple fields.
A full-text search has the following advantages over pattern matching with LIKE
and ILIKE
:
- A full-text search can specify a text search configuration that enables language-specific searches.
- The results of a full-text search can be ranked.
- A full-text search can be accelerated using a full-text index.
LIKE
andILIKE
are only fast for prefix searches or when indexed with a trigram index.
Depending on your use case, you may prefer to use trigram indexes to do fuzzy string matching and pattern matching. For more information about use cases for trigram indexes that could make having full-text search unnecessary, see the 2022 blog post Use cases for trigram indexes (when not to use Full Text Search).
Process a document
To make a document searchable, convert it to the TSVECTOR
data type. A TSVECTOR
value consists of individual lexemes, which are normalized strings used for text matching. Each lexeme also includes a list of integer positions that indicate where the lexeme existed in the original document.
The to_tsvector()
built-in function converts a string input into a TSVECTOR
value:
SELECT to_tsvector('How do trees get on the internet?');
to_tsvector
---------------------------------
'get':4 'internet':7 'tree':3
This TSVECTOR
consists of the lexemes get
, internet
, and tree
. Normalization removes the following from the input:
- Derivatives of words, which are reduced using a stemming algorithm. In this example, "trees" is normalized to
tree
. - Stop words. These are words that are considered not useful for indexing and searching, based on the text search configuration. This example does not specify a configuration, and
english
is used by default. "How", "do", "on", and "the" are identified as stop words. - Punctuation and capitalization.
In the preceding output, the integers indicate that get
is in the fourth position, internet
is in the seventh position, and tree
is in the third position in the input.
Form a query
A full-text search attempts to match a query to a document. A full-text search query has the TSQUERY
data type. Like TSVECTOR
, a TSQUERY
value consists of individual lexemes, which are normalized strings used for text matching. Lexemes in a TSQUERY
are separated with any combination of &
(AND), |
(OR), <->
(FOLLOWED BY), or !
(NOT) operators.
The
to_tsquery()
built-in function normalizes aTSQUERY
input. The input must also be formatted as aTSQUERY
, or the statement will error.SELECT to_tsquery('How & do & trees & get & on & the & internet?');
to_tsquery ------------------------------- 'tree' & 'get' & 'internet'
The
plainto_tsquery()
built-in function converts a string input into aTSQUERY
value, and separates the lexemes with&
(AND):SELECT plainto_tsquery('How do trees get on the internet?');
plainto_tsquery ------------------------------- 'tree' & 'get' & 'internet'
The
phraseto_tsquery()
built-in function converts a string input into aTSQUERY
value, and separates the lexemes with<->
(FOLLOWED BY):SELECT phraseto_tsquery('How do trees get on the internet?');
phraseto_tsquery ----------------------------------- 'tree' <-> 'get' <3> 'internet'
In the preceding output,
<->
(equivalent to<1>
) indicates thatget
must followtree
in a matchingTSVECTOR
.<3>
further indicates thatget
andinternet
must be separated by two lexemes in a matchingTSVECTOR
. This resulted from converting the stop words "on" and "the" in the input.To match this query, a document must therefore contain phrases such as "get tree" and "get {word} {word} internet".
Queries and documents are matched using the @@
comparison operator. For usage examples, see Match queries to documents.
Rank search results
You can rank the results of a full-text search.
The ts_rank()
built-in function outputs a search rank based on the frequency of matching lexemes. In the following example, two lexemes match:
SELECT ts_rank(to_tsvector('How do trees get on the internet?'), plainto_tsquery('how to get internet'));
ts_rank
--------------
0.09735848
In this example, three lexemes match, resulting in a higher rank:
SELECT ts_rank(to_tsvector('How do trees get on the internet?'), plainto_tsquery('wow, do trees get internet?'));
ts_rank
--------------
0.26426345
Because a rank must be calculated for each matching document, ranking a full-text search can incur a performance overhead if there are many matching documents.
For more information about using ts_rank()
, see the PostgreSQL documentation.
Comparisons
Full-text searches support the following comparison operator:
- matching:
@@
. This operator is set between aTSQUERY
andTSVECTOR
, and returnstrue
if the lexemes match. TheTSQUERY
andTSVECTOR
can be specified in any order.
For usage examples, see Match queries to documents.
Full-text indexes
You can perform full-text searches without a full-text index. However, an index will drastically improve search performance when searching a large number of documents.
To create a full-text index, use the CREATE INDEX
syntax that defines an inverted index, specifying a TSVECTOR
column.
Using the PostgreSQL-compatible syntax:
CREATE INDEX {optional name} ON {table} USING GIN ({column});
Note:GIN and GiST indexes are implemented identically on CockroachDB.
GIN
andGIST
are therefore synonymous when defining a full-text index.Using
CREATE INVERTED INDEX
:CREATE INVERTED INDEX {optional name} ON {table} ({column});
For more ways to define full-text indexes, see Create a full-text index with an expression and Create a full-text index with a stored computed column.
Text search configuration
A text search configuration determines how inputs are parsed into TSVECTOR
and TSQUERY
values. This includes a dictionary that is used to identify derivatives of words, as well as stop words to exclude when normalizing documents and queries.
The supported dictionaries are English, Danish, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish, and Turkish. An additional simple
dictionary does not perform stemming or stopwording when normalizing documents or queries.
You can specify a text search configuration as the first parameter when calling any of the built-in functions to process a document or form a query. For example:
SELECT to_tsvector('swedish', 'Hur får träd tillgång till internet?');
to_tsvector
----------------------------------------------
'får':2 'internet':6 'tillgång':4 'träd':3
If you do not specify a configuration when calling the function, the value of the default_text_search_config
session variable is used. This defaults to english
and can be changed as follows:
SET default_text_search_config = swedish;
For more information about text search configurations, see the PostgreSQL documentation.
At this time, only the dictionary can be specified in a text search configuration. See Unsupported features.
Examples
Match queries to documents
Use the @@
operator to match a query (TSQUERY
) to a searchable document (TSVECTOR
). In the following example, because the TSQUERY
comprises the lexemes get
and internet
, which are both contained in the TSVECTOR
, the output will be true
:
SELECT to_tsvector('How do trees get on the internet?') @@ to_tsquery('How & to & get & internet?');
?column?
------------
t
Use the plainto_tsquery()
built-in function to match text to a searchable document. This search is equivalent to the preceding example:
SELECT to_tsvector('How do trees get on the internet?') @@ plainto_tsquery('How to get internet?');
?column?
------------
t
Use the phraseto_tsquery()
built-in function to match text phrases to a searchable document. Because phraseto_tsquery()
separates the lexemes get
and internet
with the <->
(FOLLOWED BY) operator, and the document does not contain a phrase like "get internet", the output will be false
:
SELECT to_tsvector('How do trees get on the internet?') @@ phraseto_tsquery('How to get internet?');
?column?
------------
f
For an example of how text matching is used on a table, see Perform a full-text search with ranked results.
Create a full-text index with an expression
You can create an expression index on a STRING
column, using to_tsvector()
to convert the value to TSVECTOR
.
Given the table:
CREATE TABLE t (a STRING);
Create an expression index that converts column a
to TSVECTOR
:
CREATE INDEX ON t USING GIN (to_tsvector('english', a));
When using a full-text search function in an expression index, you must specify a text search configuration. In the preceding example, the english
configuration is specified.
Create a full-text index with a stored computed column
You can create a full-text index on a stored computed column that has a TSVECTOR
data type.
Given the table:
CREATE TABLE t (a STRING);
Add a new TSVECTOR
column that is computed from a
using to_tsvector()
:
ALTER TABLE t ADD COLUMN b TSVECTOR
AS (to_tsvector('english', a)) STORED;
When using a full-text search function in a stored generated column, you must specify a text search configuration. In the preceding example, the english
configuration is specified.
View the columns on the table:
SHOW COLUMNS FROM t;
column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
--------------+-----------+-------------+----------------+---------------------------+---------------------+------------
a | STRING | t | NULL | | {t_pkey} | f
rowid | INT8 | f | unique_rowid() | | {t_expr_idx,t_pkey} | t
b | TSVECTOR | t | NULL | to_tsvector('english', a) | {t_pkey} | f
(3 rows)
Create an inverted index on the TSVECTOR
column:
CREATE INDEX ON t USING GIN (b);
Perform a full-text search with ranked results
Create a table with
STRING
columns:CREATE TABLE dadjokes (opener STRING, response STRING);
Populate the table with sample values. These are the documents that you will search:
INSERT INTO dadjokes (opener, response) VALUES ('How do trees get on the internet?', 'They log on.'), ('What do you call a pony with a sore throat?', 'A little horse.'), ('What would a bathroom for fancy cats be called?', 'The glitter box.'), ('Why did the scarecrow win an award?', 'It was outstanding in its field.'), ('What kind of tree fits in your hand?', 'A palm tree.'), ('What was a better invention than the first telephone?', 'The second one.'), ('Where do you learn to make banana splits?', 'At sundae school.'), ('How did the hipster burn the roof of his mouth?', 'He ate the pizza before it was cool.'), ('What did one wall say to the other wall?', 'Meet you at the corner.'), ('When does a joke become a dad joke?', 'When it becomes apparent.');
You can use
LIKE
orILIKE
to search for text. However, the results will be unranked:SELECT opener, response FROM dadjokes WHERE opener LIKE '%tree%' OR response LIKE '%tree%';
opener | response ---------------------------------------+--------------- How do trees get on the internet? | They log on. What kind of tree fits in your hand? | A palm tree. (2 rows)
Create a full-text index on the concatenation of both table columns, specifying a text search configuration (in this case,
english
), as is mandatory when defining an expression index:CREATE INDEX ON dadjokes USING GIN (to_tsvector('english', opener || response));
Note:Because inverted joins on
TSVECTOR
values are not yet supported, this index won't be used to accelerate the SQL queries in this example. See Unsupported features.Search the table for a query (in this case,
tree
), and rank the results.In the following statement,
to_tsvector()
makes the table values searchable,to_tsquery()
forms the query, andts_rank()
calculates the search rankings:SELECT opener, response, ts_rank(joke, query) AS rank FROM dadjokes, to_tsvector(opener || response) joke, to_tsquery('tree') query WHERE query @@ joke ORDER BY rank DESC LIMIT 10;
opener | response | rank ---------------------------------------+--------------+-------------- What kind of tree fits in your hand? | A palm tree. | 0.075990885 How do trees get on the internet? | They log on. | 0.06079271 (2 rows)
The frequency of the
tree
lexeme in each row determines the difference in the rankings.Search the table for the query
calling
, and rank the results:SELECT opener, response, ts_rank(joke, query) AS rank FROM dadjokes, to_tsvector(opener || response) joke, to_tsquery('calling') query WHERE query @@ joke ORDER BY rank DESC LIMIT 10;
opener | response | rank --------------------------------------------------+------------------+------------- What would a bathroom for fancy cats be called? | The glitter box. | 0.06079271 What do you call a pony with a sore throat? | A little horse. | 0.06079271 (2 rows)
Unlike pattern matching with
LIKE
andILIKE
, a full-text search forcalling
produced matches. This is becauseto_tsvector()
andto_tsquery()
each normalized derivatives of the word "call" in their respective inputs to the lexemecall
, using the defaultenglish
text search configuration.Use
plainto_tsquery()
to convert text input to a search query:SELECT opener, response, ts_rank(joke, query) AS rank FROM dadjokes, to_tsvector(opener || response) joke, plainto_tsquery('no more joking, dad') query WHERE query @@ joke ORDER BY rank DESC LIMIT 10;
opener | response | rank --------------------------------------+---------------------------+------------- When does a joke become a dad joke? | When it becomes apparent. | 0.18681315 (1 row)
Alternatively, use
phraseto_tsquery()
to search for matching text phrases (in this example, "joke dad"):SELECT opener, response, ts_rank(joke, query) AS rank FROM dadjokes, to_tsvector(opener || response) joke, phraseto_tsquery('no more joking, dad') query WHERE query @@ joke ORDER BY rank DESC LIMIT 10;
opener | response | rank ---------+----------+------- (0 rows)
Unsupported features
Some PostgreSQL syntax and features are unsupported. These include, but are not limited to:
- Aspects of text search configurations other than the specified dictionary.
websearch_to_tsquery()
built-in function.tsquery_phrase()
built-in function.ts_rank_cd()
built-in function.setweight()
built-in function.- Inverted joins on
TSVECTOR
values. tsvector || tsvector
comparisons.tsquery || tsquery
comparisons.tsquery && tsquery
comparisons.tsquery <-> tsquery
comparisons.!! tsquery
comparisons.tsquery @> tsquery
andtsquery <@ tsquery
comparisons.
For full details, see the tracking issue.
See also
- PostgreSQL documentation on Full Text Search
TSVECTOR
TSQUERY
- Inverted indexes
- Indexes
- SQL Statements