Phonetic search

Learn how to do phonetic matching in YSQL

While looking up exact matches suffices for most scenarios, you need approximate matching in situations where you don't know the exact term you are searching for but may remember the phonetics of the term or parts of the term.

For phonetic matching, YugabyteDB natively supports the PostgreSQL extension fuzzystrmatch, which provides multiple functions - Soundex, Metaphone, and Double Metaphone - which you can use to determine phonetic similarities between text.

Setup

Setup

To set up a local universe, refer to Set up a local YugabyteDB universe.

Setup

To set up a cluster, refer to Set up a YugabyteDB Managed cluster.

Setup

To set up a universe, refer to Set up a YugabyteDB Anywhere universe.

Create the following table:

CREATE TABLE IF NOT EXISTS words (
    id SERIAL,
    word TEXT NOT NULL,
    PRIMARY KEY(id)
);

Load some sample words into the table as follows:

INSERT INTO words(word) VALUES 
  ('anopisthographic'),('ambassadorship'),('ambuscader'),('ambiguity'),('ampycides'),
  ('anapaestically'),('anapests'),('anapsidan'),('unpsychic'),('anapsid'),
  ('unpessimistic'),('unepistolary'),('inabstracted'),('anapaest'),('unobstinate'),
  ('amphigoric'),('amebic'),('amebous'),('ambassage'),('unpacified'),('unposing');

To enable fuzzy matching, activate the fuzzystrmatch extension as follows:

CREATE extension IF NOT EXISTS fuzzystrmatch;

Soundex

The Soundex system is a method of matching similar-sounding names by converting them to the same four letter code which can then be used to determine similarities between text.

For example, suppose you heard a word that sounded like anapistagafi. To find the actual word that sounds similar, you could use soundex method to find the closest-sounding word as follows:

SELECT word, soundex(word), difference(word,'anapistagafi') FROM words WHERE soundex(word) = soundex('anapistagafi') limit 5;
      word      | soundex | difference
----------------+---------+------------
 anapaestically | A512    |          4
 ampycides      | A512    |          4
 anapsid        | A512    |          4
 amebous        | A512    |          4
 anapaest       | A512    |          4

The difference method calculates how different one Soundex code is from another. The value ranges from 0-4 and can be used to rank the results. In this case, the difference is the same because the value is very naive.

Metaphone

The metaphone algorithm improves upon Soundex by taking into consideration the inconsistencies in English spelling and pronunciation to produce a more accurate encoding. For example:

SELECT word, metaphone(word,4) FROM words WHERE metaphone(word,4) = metaphone('anapistagafi',4) limit 5;
       word       | metaphone
------------------+-----------
 anapaestically   | ANPS
 anapsid          | ANPS
 anapaest         | ANPS
 anapsidan        | ANPS
 anopisthographic | ANPS

The metaphone function takes in an additional parameter of code output length that can be used for slightly modifying the matching errors.

Double Metaphone

The Double Metaphone makes a number of fundamental design improvements over the original Metaphone algorithm. It calculates two different codes that enable it to compare a wider range of pronunciations.

SELECT word, dmetaphone(word) FROM words WHERE dmetaphone(word) = dmetaphone('anapistagafi') limit 5;
      word      | dmetaphone
----------------+------------
 unpsychic      | ANPS
 anapaestically | ANPS
 inabstracted   | ANPS
 unobstinate    | ANPS
 unposing       | ANPS

Learn more