Text search

Learn how to search text data types in YSQL

Basic text search uses pattern matching, using the =, LIKE, and ILIKE operators to check if a given pattern is present in the document. While this is adequate for many scenarios, it has the following shortcomings:

  • Results aren't ordered.
  • No index support, so all rows must be scanned, resulting in slower performance.
  • Searching for multiple patterns is cumbersome and needs strict ordering of search terms.

Full-text search brings in a whole new paradigm of pre-processing of data, enabling parsing the data into tokens, converting tokens to lexemes (c.f. stemming) and storing the processed data in a manner that is optimal for search (for example, inverted indexes). It also adds new types of pattern matching, like prefix match, suffix match, fuzzy match, and so on.

YugabyteDB provides the following methods and features for searching text.

Pattern matching

Pattern matching is accomplished by specifying a pattern using the %, _, *, . characters and using the LIKE, ILIKE, and ~ operators. The patterns can range from basic prefix/suffix expressions to complex regular expressions. For example:

'abc' LIKE 'a%'             --> true
'abc' LIKE '_bc'            --> true
'abc' LIKE 'c'              --> false
'abc' SIMILAR TO '%(b|d)%'  --> true
'abc' ~ 'a.*c'              --> true

To learn more about pattern matching, see Pattern matching.

Similarity matching works by determining how similar two strings are by taking into account how many letters are different and how many occur together. Use similarity search when you don't know the exact spelling of your query term. Similarity search can be used to design spell checkers.

For example, the distance between warehoused and warehouse is 1, as it has one additional character (d) than warehouse.

To learn more about similarity search, see Similarity search.

Almost all of today's search engines use Inverted indexes extensively. An inverted index parses a document and stores the individual words (that is, tokens) and their corresponding position in the document. For example:

'The quick brown fox jumps jumps over the lazy dog'

would be parsed as follows:

 'brown':3 'dog':10 'fox':4 'jump':5,6 'lazi':9 'quick':2

This enables you to search for the document which has fox and quick or jumping dog.

To learn more about full-text search, see Full-text search.

In the case where you do not know the exact search term and want to find similar items or documents that sound similar to a specific term, fuzzy or phonetic search would come in handy. YugabyteDB supports fuzzy search like Soundex, Metaphone via PostgreSQL extensions. For example, to find words that sound like anapistagafi, you could execute,

select word from words where dmetaphone(word) = dmetaphone('anapistagafi') limit 5;

To learn more about pattern matching, see Phonetic Search

Learn more