MySQL REGEXP: Search Based On Regular Expressions

MySQL REGEXP: Search Based On Regular Expressions

 MySQL REGEXP: Search Based On Regular Expressions



Summaryin this tutorial, you will learn how to use the MySQL REGEXP operator to perform complex searches based on regular expressions.

Introduction to regular expressions

A regular expression is a special string that describes a search pattern. It is a powerful tool that gives you a concise and flexible way to identify strings of text e.g., characters, and words, based on patterns.

For example, you can use regular expressions to search for email, IP address, phone number, social security number, or anything that has a specific pattern.

A regular expression uses its own syntax that can be interpreted by a regular expression processor. A regular expression is widely used in almost platforms from programming languages to databases including MySQL.

The advantage of using regular expression is that you are not limited to search for a string based on a fixed pattern with the percent sign (%) and underscore (_) in the LIKE operator. The regular expressions have more meta-characters to construct flexible patterns.

The disadvantage of using regular expression is that it is quite difficult to understand and maintain such a complicated pattern. Therefore, you should describe the meaning of the regular expression in the comment of the SQL statement. In addition, the speed of data retrieval, in some cases, is decreased if you use complex patterns in a regular expression.

The abbreviation of regular expressions is regex or regexp.

MySQL REGEXP operator

MySQL adapts the regular expression implemented by Henry Spencer. MySQL allows you to match pattern right in the SQL statements by using REGEXP operator.

The following illustrates the syntax of the REGEXP operator in the  WHERE clause:

SELECT column_list FROM table_name WHERE string_column REGEXP pattern;

This statement performs a pattern match of a string_column against a pattern.

If a value in the string_column matches the pattern, the expression in the WHERE clause returns true, otherwise, it returns false.

If either string_column or pattern is NULL, the result is NULL.

In addition to the REGEXP operator, you can use the RLIKE operator, which is the synonym of the REGEXP operator.

The negation form of the REGEXP operator is NOT REGEXP.

MySQL REGEXP examples

Suppose you want to find all products whose last names start with characters A, B, or C. You can use a regular expression in the following SELECT statement:

SELECT productname FROM products WHERE productname REGEXP '^(A|B|C)' ORDER BY productname;

The pattern allows you to find the product whose name begins with A, B, or C.

  • The character ^ means to match from the beginning of the string.
  • The character | means to search for alternatives if one fails to match.

The following table illustrates some commonly used metacharacters and constructs in a regular expression.

MetacharacterBehavior
^matches the position at the beginning of the searched string
$matches the position at the end of the searched string
.matches any single character
[…]matches any character specified inside the square brackets
[^…]matches any character not specified inside the square brackets
p1|p2matches any of the patterns p1 or p2
*matches the preceding character zero or more times
+matches preceding character one or more times
{n}matches n number of instances of the preceding character
{m,n}matches from m to n number of instances of the preceding character

To find products whose names start with the character a, you use the metacharacter '^' to match at the beginning of the name:

SELECT productname FROM products WHERE productname REGEXP '^a';

If you want the REGEXP operator to compare strings in a case-sensitive fashion, you can use the BINARY operator to castcast a string to a binary string.

Because MySQL compares binary strings byte by byte rather than character by character. This allows the string comparison to be case-sensitive.

For example, the following statement matches the only uppercase "C" at the beginning of the product name.

SELECT productname FROM products WHERE productname REGEXP BINARY '^C';

To find the product whose name ends with f, you use 'f$' to match the end of a string.

SELECT productname FROM products WHERE productname REGEXP 'f$'

To find the product whose name contains the word "ford", you use the following query:

SELECT productname FROM products WHERE productname REGEXP 'ford';

To find the product whose name contains exactly 10 characters, you use ‘^' and ‘$ to match the beginning and end of the product name, and repeat {10} times of any character ‘.' in between as shown in the following query:

SELECT productname FROM products WHERE productname REGEXP '^.{10}$';

In this tutorial, you have learned how to query data using the MySQL REGEXP operator with regular expressions.

Reactions

Post a Comment

0 Comments

close