Basic text search uses pattern matching, using the
ILIKE operators to check if a given pattern is present in the document. While this is adequate for many scenarios, it has the following shortcomings:
- Results aren't ordered.
- No index support, so all rows must be scanned, resulting in slower performance.
- Searching for multiple patterns is cumbersome and needs strict ordering of search terms.
Full-text search brings in a whole new paradigm of pre-processing of data, enabling parsing the data into tokens, converting tokens to lexemes (c.f. stemming) and storing the processed data in a manner that is optimal for search (for example, inverted indexes). It also adds new types of pattern matching, like prefix match, suffix match, fuzzy match, and so on.
YugabyteDB provides the following methods and features for searching text.
Pattern matching is accomplished by specifying a pattern using the
%, _, *, . characters and using the
~ operators. The patterns can range from basic prefix/suffix expressions to complex regular expressions. For example:
'abc' LIKE 'a%' --> true 'abc' LIKE '_bc' --> true 'abc' LIKE 'c' --> false 'abc' SIMILAR TO '%(b|d)%' --> true 'abc' ~ 'a.*c' --> true
Similarity matching works by determining how similar two strings are by taking into account how many letters are different and how many occur together. Use similarity search when you don't know the exact spelling of your query term. Similarity search can be used to design spell checkers.
For example, the distance between
1, as it has one additional character (
Almost all of today's search engines use Inverted indexes extensively. An inverted index parses a document and stores the individual words (that is, tokens) and their corresponding position in the document. For example:
'The quick brown fox jumps jumps over the lazy dog'
would be parsed as follows:
'brown':3 'dog':10 'fox':4 'jump':5,6 'lazi':9 'quick':2
This enables you to search for the document which has
In the case where you do not know the exact search term and want to find similar items or documents that sound similar to a specific term, fuzzy or phonetic search would come in handy. YugabyteDB supports fuzzy search like Soundex, Metaphone via PostgreSQL extensions. For example, to find words that sound like
anapistagafi, you could execute,
select word from words where dmetaphone(word) = dmetaphone('anapistagafi') limit 5;