Appearance
SIMILAR TO: Best Practices
To improve performance for SIMILAR TO
patterns, follow the tips in this section or consider using regular expressions directly. SIMILAR TO
expressions are converted to regular expressions.
The syntax SIMILAR TO 'pattern'
is converted to regex('^(?:pattern)$')
, which alternatively can be written as ~ '^(?:pattern)$'
For example, the following expression:
SIMILAR TO '(%one%|%two%)'
is converted to:
regex('^(?:.*one.*|.*two.*)$')
The wildcard %
sign is converted to .*
(any character repeated zero or more times) and _
is converted to .
(any single character). As an essential part of the conversion, regular expression special characters (such as ^
, $
, and .
) are properly escaped to preserve the original SIMILAR TO
pattern behavior.
For example, consider the following query with SIMILAR TO
, which returns true
(a match):
premdb=# select '123.456$' similar to '123.456$' from sys.const;
?column?
----------
t
(1 row)
To preserve the semantics of this expression when it is converted, the .
and $
characters must be escaped with the \
character:
premdb=# select '123.456$' ~ '^(?:123\.456\$)$' from sys.const;
?column?
----------
t
(1 row)
The following sections suggest some SIMILAR TO
patterns that you should consider rewriting (either as more optimal SIMILAR TO
expressions or as regular expressions).
Rewrite (%...%|%...%) as %(...|...)%
For example, rewrite this SIMILAR TO
expression:
name SIMILAR TO '(%Manchester United%|%Man Utd%|%Man United%|%Man U%)'
as follows:
name SIMILAR TO '%(Manchester United|Man Utd|Man United|Man U)%'
Alternatively, you could rewrite the same expression as the following regular expression:
name ~ '(Manchester United|Man Utd|Man United|Man U)'
Both expressions will return the same result. For example:
premdb=# select * from team
where name SIMILAR TO '%(Manchester United|Man Utd|Man United|Man U)%';
teamid | htid | atid | name | nickname | city | stadium | capacity | avg_att
--------+------+------+-------------------+------------+------------+--------------+----------+---------
25 | 26 | 75 | Manchester United | Red Devils | Manchester | Old Trafford | 75635 | 75.286
(1 row)
premdb=# select * from team
where name ~ '(Manchester United|Man Utd|Man United|Man U)';
teamid | htid | atid | name | nickname | city | stadium | capacity | avg_att
--------+------+------+-------------------+------------+------------+--------------+----------+---------
25 | 26 | 75 | Manchester United | Red Devils | Manchester | Old Trafford | 75635 | 75.286
(1 row)
Rewrite '(%lower case pattern%|% upper case pattern%)' as ~* '(lower case pattern)'
For example, you could rewrite this SIMILAR TO
expression:
c1 SIMILAR TO '(%trial%|%TRIAL%|%TRAIL%|%trail%|%Trail%|%Trial%)'
as follows:
c1 SIMILAR TO '%(trial|TRIAL|TRAIL|trail|Trail|Trial)%'
If you use a regular expression rewrite, you can also apply a case-insensitive match. For example:
c1 ~* '(?:trial|trail)'
However, keep in mind that case-insensitive matching in a regular expression may find more matches than an equivalent SIMILAR TO
expression, which may or may not be what you want. For example, note the different results from the following two queries:
premdb=# select distinct winners from season where winners ~* 'ch';
winners
-------------------
Manchester United
Chelsea
Manchester City
(3 rows)
premdb=# select distinct winners from season where winners similar to '%ch%';
winners
-------------------
Manchester United
Manchester City
(2 rows)
Rewrite (...%|...%|...%) as (..[.]|...)%
For example, rewrite this SIMILAR TO
expression:
c1 SIMILAR TO '(12%|13%|14%|15%|16%|17%|OB%|OS%|BU%|''|OY%|OZ%|VT%)'
as follows:
c1 SIMILAR TO '(12|13|14|15|16|17|OB|OS|BU|OY|OZ|VT)%' OR c1 = ''
Take the rewrite a step further by using character classes to reduce the number of optional sub-expressions. For example, a better rewrite in this case is as follows:
c1 SIMILAR TO '(1[234567]|O[BSYZ]|BU|VT)%' OR c1 = ''
This is an example where a SIMILAR TO
expression is still being specified but it contains regular expression constructs (character classes in this case).
Parent topic:SQL Operators and Pattern Matching Functions