Netezza nzsql language support phonetic matching functions, that is, you can encode names into phonetic representations. These functions are useful to match names that are based on their pronunciation and reduce misses that might result from spelling variations. In this article, we will check types of Netezza phonetic matching functions and usage with some examples.
Netezza Phonetic Matching Functions
There are two types of Phonetic matching functions supported in Netezza:
- SoundEx NYSIIS algorithm
- Double Metaphone (dbl_mp) algorithm
There are three helper functions in Double Metaphone:
- Primary Metaohine (pri_mp)
- Secondary Metaphone (sec_mp)
- Score Metaphone (score_mp)
SoundEx NYSIIS
Soundex algorithm is widely used phonetic algorithm to index names based on the sound as it pronounced in English language. This function uses New York State Identification and Intelligence System (NYSIIS) variation of Soundex to convert a string into its SoundEx representation.
SoundEx NYSIIS Syntax
Below is the syntax for SoundEx NYSIIS:
nysiis(<str_expr>)
The function return results up-to 6 string that identifies the pronunciation of input string.
SoundEx NYSIIS Example
Below is the example that demonstrates SoundEx NYSIIS function:
select nysiis('Bangalore');
Above function will return ‘BANGAL’.
Double Metaphone (dbl_mp) Algorithms
Double metaphone is another algorithm that is used to index names based on its pronunciation. The dbl_mp function maps the 4-character keys to 16-bit numbers and returns a composite 32-bit value (Netezza type int4) that holds both the 16-bit primary and secondary keys.
Double Metaphone (dbl_mp) Algorithm Syntax
Below is the syntax for Double Metaphone (dbl_mp) algorithm:
dbl_mp(<str_expr>)
Double Metaphone (dbl_mp) Example
Below is the example that demonstrates Double Metaphone (dbl_mp) function:
select dbl_mp ('Bangalore');
Above function returns -1452824217 number for both primary and secondary keys.
Primary Metaphone
This is the double metaphone helper function, that takes the value returned by dbl_mp function and return 4-character primary metaphone string.
Syntax:
pri_mp(<int4 dbl_mp return value>)
For example,
select pri_mp(dbl_mp ('Bangalore'));
Above function return 4 – character primary string ‘PNKL’.
Secondary Metaphone
This is the double metaphone helper function, that takes the value returned by dbl_mp function and return 4-character secondary metaphone string.
Syntax:
sec_mp(<int4 dbl_mp return value>)
For example,
select sec_mp(dbl_mp ('Bangalore'));
Above function return 4 – character secondary string ‘PNKL’.
Score Metaphone
This function takes two values that are returned by the dbl_mp function and compares them to determine how closely they match.
Syntax:
score_mp(<int4 dbl_mp value 1>, <int4 dbl_mp value 1>, <int4 strong match value>, <int4 normal match value>, <int4 minor match value>, <int4 no match value>)
For the four match value arguments, you can specify values such as 1, 2, 3, and 4 (for strongest, normal, minimal, or no match). You can also use weighted values such as 100, 50, 25, and 0 to return more points for better match results.
for example,
select score_mp(dbl_mp ('Bangalore'),dbl_mp ('Bengaluru'),1,2,3,4);
Above example returns value 1, that means, strong match.
Read:
- Netezza Data Transformation Functions and Examples
- Netezza LIKE Statement and Pattern matching Examples