LE_DST
Tip: This function was added in Version 5.2.26 and is not available in earlier 5.2.x versions.
Return the difference between two character strings in terms of the number of changes required to convert the first string into the second. The comparison of the strings is case-sensitive. This function uses the Levenshtein edit distance algorithm to compute the "distance" between the strings. With this algorithm, a character transposition change such as 'ab'
to 'ba'
counts as two changes.
This function returns an integer.
See also DLE_DST.
Configuration
The behavior of the LE_DST
and DLE_DST
functions is configurable, using the following parameters:
max_levenshtein_string_size
: Maximum size (in bytes) of the two input arguments to the function. The default is255
.
Because these functions can be very CPU and memory intensive, you may have to wait for results when long strings are used as inputs. You may be able to improve performance by setting an upper bound on their length.
max_levenshtein_distance
: Maximum distance for which you expect an accurate answer. The default is-1
, which means there is no limit.
For example, you may only want to check if the distance between two strings is greater than 5 characters; you don’t need to know what the real distance value is, just whether or not it is bigger than 5. In this case, set max_levenshtein_distance
to 5
. If the real distance between the two strings is less than or equal to 5, the function returns the real value. Otherwise, it returns an answer for the first comparison that is greater than 5 and stops processing.
The ability to check that a distance between two strings is no greater than a known value is a common use case; setting this parameter may optimize performance in those situations.
Note: If one of the inputs is an empty string, the function simply returns the number of characters in the other argument, regardless of the current settings of these configuration parameters.
Examples
The following example requires three changes, one for the case of the letter C
, and two for the transposition of kc
:
yellowbrick=# select le_dst('Chekcing','checking') from sys.const;
le_dst
--------
3
(1 row)
Calculate the distance between the London team names and their nicknames in the team
table:
premdb=# select name, nickname, le_dst(name,nickname) ledst from team
where city='London' and name is not null order by 3 desc;
name | nickname | ledst
---------------------+------------+-------
Queens Park Rangers | Hoops | 18
Charlton Athletic | Addicks | 16
Tottenham Hotspur | Spurs | 15
West Ham United | Hammers | 12
Crystal Palace | Eagles | 12
Wimbledon | Dons | 8
Chelsea | Pensioners | 8
Fulham | Cottagers | 8
Arsenal | Gunners | 7
(9 rows)
Calculate the distance between two specific strings with both LE_DST
and DLE_DST
. Note that the results are different for these particular strings because the functions count character transpositions differently.
premdb=# select le_dst('Arsenal','Gunners'),dle_dst('Arsenal','Gunners') from sys.const;
le_dst | dle_dst
--------+---------
7 | 6
(1 row)
The following example shows the behavior of this function when max_levenshtein_string_size
is set to 10
. Because of the length of the specified strings, both queries return an expected error:
yellowbrick=# set max_levenshtein_string_size to 10;
SET
yellowbrick=# show max_levenshtein_string_size;
max_levenshtein_string_size
-----------------------------
10
(1 row)
yellowbrick=# create table abc(c1 varchar);
CREATE TABLE
yellowbrick=# insert into abc values('abcdefghijklmnopqrstuvwxyz');
INSERT 0 1
yellowbrick=# insert into abc values('abcdefghijklm');
INSERT 0 1
yellowbrick=# select le_dst(c1,'abcdefghijklm') from abc;
ERROR: LE_DST argument exceeds maximum length of 10 bytes
HINT: Change the max value with the 'max_levenshtein_string_size' guc
yellowbrick=# select le_dst('abcdefghijklm', c1) from abc;
ERROR: LE_DST argument exceeds maximum length of 10 bytes
HINT: Change the max value with the 'max_levenshtein_string_size' guc
Also note that if the empty string is passed as one of the arguments in these queries, the result is the length of the other string:
yellowbrick=# select le_dst('', c1) from abc;
le_dst
--------
13
26
(2 rows)
yellowbrick=# select le_dst(c1, '') from abc;
le_dst
--------
26
13
(2 rows)
The following example shows the behavior when the max_levenshtein_distance
parameter is set to 5
(and max_levenshtein_string_size
is reset to its default value of 255
):
yellowbrick=# reset max_levenshtein_string_size;
RESET
yellowbrick=# set max_levenshtein_distance to 5;
SET
yellowbrick=# select c1, le_dst('abcdefghijklm', c1) from abc;
c1 | le_dst
----------------------------+--------
abcdefghijklmnopqrstuvwxyz | 6
abcdefghijklm | 0
(2 rows)
The first row returns 6 (max_levenshtein_string_size
+ 1) for the first row and 0
for the second row. The function returns 6
for the first value that has a distance greater than 5
, then stops processing.
Parent topic:String Functions