PHP regular expressions use the "Perl Compatible Regular Expressions" (PCRE) syntax. You can read more about PCRE in this Wikipedia article.
What this PCRE compatibility means for you is that once you learn the PHP regular expression syntax, you will recognize most of the regular expression syntax used in many other systems, such as JavaScript and ColdFusion. And if you already know regular expressions from one of these other systems, you should quickly become comfortable with PHP's regular expressions.
A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in the pattern, and match the corresponding characters in the subject. As a trivial example, the pattern The quick brown fox matches a portion of a subject string that is, not surprisingly, The quick brown fox.
A regular expression must be enclosed in delimiters. A very commonly-used delimiter for regular expressions is the forward slash (/).
Here are some samples of regular expressions:
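The following samples are illustrative only and are not taken from the original handout. They assume the forward slash delimiter described above, plus two alternative delimiters (# and ~) and the "i" modifier, all of which PCRE accepts; the subject strings are hypothetical.

```php
<?php
// Hypothetical sample patterns, each enclosed in a pair of delimiters.
$samples = [
    '/fox/'       => 'The quick brown fox',  // forward-slash delimiters
    '#^/usr/bin#' => '/usr/bin/php',         // "#" delimiters avoid escaping "/"
    '~brown~i'    => 'The Quick BROWN Fox',  // trailing "i" makes the match case-insensitive
];

foreach ($samples as $pattern => $subject) {
    // preg_match() returns 1 when the pattern is found in the subject.
    $matched = preg_match($pattern, $subject) === 1;
    echo $pattern, ' against "', $subject, '": ', $matched ? 'match' : 'no match', "\n";
}
```

Choosing a delimiter that does not occur in the pattern itself keeps the pattern free of extra backslashes.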
Meta-characters in a regular expression do not stand for themselves, but rather are interpreted in a special way.
Some of the most common meta-characters are:
| Character | Description |
|---|---|
| \ | escape character (See the next section regarding escape sequences.) |
| ^ | start of subject, or start of line in multiline mode |
| $ | end of subject, or end of line in multiline mode |
| . | matches any character except newline |
| [ | start of character class definition |
| ] | end of character class definition |
| | | alternative branch ("OR") |
| ( | start of subpattern |
| ) | end of subpattern |
| ? | 0 or 1 quantifier (counting meta-character) |
| * | 0 or more quantifier (counting meta-character) |
| + | 1 or more quantifier (counting meta-character) |
| { | start min/max quantifier |
| } | end min/max quantifier |
| ^ | in a character class, negates the class (but only if first character) |
| - | indicates a character range |
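
As a rough sketch of how several of these meta-characters behave, the snippet below runs a few small patterns through preg_match(). The subject strings and variable names are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical two-line subject used to show the effect of multiline mode.
$twoLines = "cat\ncot";

// ^ and $ anchor to the whole subject by default ...
$wholeSubject = preg_match('/^cot$/', $twoLines);      // 0
// ... and to each line when the "m" (multiline) modifier is added.
$perLine = preg_match('/^cot$/m', $twoLines);          // 1

// . matches any single character except a newline.
$dot = preg_match('/^c.t$/', 'cut');                   // 1

// [^ao] is a negated class: any character other than "a" or "o".
$negatedHit  = preg_match('/^c[^ao]t$/', 'cut');       // 1
$negatedMiss = preg_match('/^c[^ao]t$/', 'cat');       // 0

// ( | ) groups alternatives; ? makes the preceding "s" optional.
$alternation = preg_match('/^(cat|dog)s?$/', 'dogs');  // 1

// {2,3} is a min/max quantifier; [0-9] uses - to express a range.
$tooLong = preg_match('/^[0-9]{2,3}$/', '1234');       // 0
$inRange = preg_match('/^[0-9]{2,3}$/', '123');        // 1

var_dump($wholeSubject, $perLine, $dot, $negatedHit, $negatedMiss,
    $alternation, $tooLong, $inRange);
```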
The backslash character has several uses. First, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have. This use of backslash as an escape character applies both inside and outside character classes.
For example, if you want to match a * character, you write \* in the pattern. This rule applies whether or not the following character would otherwise be interpreted as a meta-character, so it is always safe to precede a non-alphanumeric character with \ to specify that it stands for itself. This consideration also means that if you want to match a backslash, you put \\ into the pattern.
In PHP, you also need to remember that quoted strings give their own special meaning to the backslash. To match a single \ character, the regular expression itself must contain \\, and to produce those two backslashes inside a PHP string you must write either "\\\\" or '\\\\' in the pattern.
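
The two layers of escaping can be checked with a small sketch like the one below. The Windows-style path is a hypothetical subject; strlen() is used only to show how many characters actually reach the regular expression engine.

```php
<?php
// Hypothetical subject containing exactly one literal backslash.
$subject = 'C:\temp';

// Four backslashes in the PHP source become two in the string,
// and PCRE reads those two as one literal backslash.
$pattern = '/\\\\/';

echo strlen($pattern), "\n";               // 4: the characters / \ \ /
echo preg_match($pattern, $subject), "\n"; // 1: the backslash was found

// Escaping a meta-character: \* matches a literal asterisk.
echo preg_match('/2 \* 3/', '2 * 3'), "\n"; // 1
```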
You can also use the backslash in regular expressions to encode character sequences in a compact format. Here are some of the most commonly-used character-sequence escape sequences:
| Escape Sequence | Description |
|---|---|
| \n | Newline |
| \r | Carriage Return |
| \t | Tab |
| \xhh | Character with hex code hh |
| \d | Any decimal digit |
| \D | Any character that is not a decimal digit |
| \s | Any whitespace character |
| \S | Any character that is not a whitespace character |
| \w | Any "word" character |
| \W | Any "non-word" character |
| \b | Word boundary |
| \B | Not a word boundary |
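
A short sketch of some of these escape sequences in use follows. The sample text is hypothetical; preg_match_all() and preg_split() are built-in PHP functions from the same family as preg_match().

```php
<?php
// Hypothetical sample text; "\t" in a double-quoted PHP string is a tab.
$text = "Order 66 shipped\ton 2011-11-03";

// \d+ collects every run of decimal digits.
preg_match_all('/\d+/', $text, $digits);
print_r($digits[0]);   // 66, 2011, 11, 03

// \s+ splits on any run of whitespace (here a space or a tab).
$words = preg_split('/\s+/', $text);
print_r($words);       // Order, 66, shipped, on, 2011-11-03

// \b requires a word boundary on each side, so "ship" alone is not found.
$partialWord = preg_match('/\bship\b/', $text);     // 0
$wholeWord   = preg_match('/\bshipped\b/', $text);  // 1

// \x41 is the character with hex code 41, the letter "A".
$hexEscape = preg_match('/\x41/', 'A');             // 1

var_dump($partialWord, $wholeWord, $hexEscape);
```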
You can use regular expressions to test whether a pattern is in a character (text) string.
One very handy thing that you can do with this text testing is to validate form input before you change the fields in a database table, for instance.
In fact, the e-Handout "PHP Form Validation" has at least one regular expression in its class definition for the FormValidator object. Look for the sub-section "isEmailAddress()", and notice the value that is assigned to the variable $pattern. That value is a regular expression which tests for a basic e-mail address format.
The regular expression that is assigned to the variable $pattern is this:
"/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/"
The sample code from the e-Handout section "isEmailAddress()" "tests" the text value that came to this page, probably from an HTML form. These lines of code do the testing (starting with the call to the function preg_match()):
if(preg_match($pattern, $value))
{
    return true;
}
else
{
    $this->_errorList[] = array("field" => $field,
        "value" => $value, "msg" => $msg);
    return false;
}
The various portions in the pattern in the above regular expression are explained here:
| Portion of the regular expression | Explanation |
|---|---|
| / | Begins the regular expression. This is the opening delimiter. |
| ^ | Tells the testing/comparing to start at the beginning of the text. |
| ([a-zA-Z0-9])+ | Tests for one or more characters in the set [a through z, A through Z, and 0 through 9]. |
| ([\.a-zA-Z0-9_-])* | Tests for zero or more characters in the set [period, a through z, A through Z, 0 through 9, underscore, and hyphen]. |
| @ | Tests for the presence of an @ sign. |
| ([a-zA-Z0-9_-])+ | Tests for one or more characters in the set [a through z, A through Z, 0 through 9, underscore, and hyphen]. |
| (\.[a-zA-Z0-9_-]+)+ | Tests for one or more groups, each consisting of a period followed by one or more characters in the set [a through z, A through Z, 0 through 9, underscore, and hyphen]. |
| / | Ends the regular expression. This is the closing delimiter. |
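
The pattern can also be tried outside the FormValidator class with a minimal sketch like the one below. The sample addresses and the $anchored variant are assumptions added for illustration and are not part of the e-Handout. Because the original pattern has no $ before the closing delimiter, it only checks that the text begins with something shaped like an e-mail address.

```php
<?php
// The pattern from the handout, written here in single quotes.
$pattern = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/';

// Assumed variant (not in the handout): a trailing $ rejects extra text at the end.
$anchored = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+$/';

var_dump(preg_match($pattern, 'user@example.com'));          // int(1)
var_dump(preg_match($pattern, 'user@localhost'));            // int(0): no ".something" part
var_dump(preg_match($pattern, 'user@example.com<script>'));  // int(1): trailing text is ignored
var_dump(preg_match($anchored, 'user@example.com<script>')); // int(0)
var_dump(preg_match($anchored, 'user@example.com'));         // int(1)
```

Whether the stricter variant is appropriate depends on how the validated value is used afterwards.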
Please note that the built-in PHP function preg_match() is only one of a set of functions which use regular expressions. You can find one list of these functions here at the PHP.net site.
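As a rough sketch of three other functions in that family, the snippet below uses preg_match_all(), preg_replace(), and preg_split(), all of which are built into PHP. The sample text is hypothetical.

```php
<?php
// Hypothetical sample text.
$text = 'Call 555-1234 or 555-9876';

// preg_match_all() finds every match and returns how many it found.
$count = preg_match_all('/\d{3}-\d{4}/', $text, $matches);
echo $count, "\n";        // 2
print_r($matches[0]);     // 555-1234, 555-9876

// preg_replace() substitutes every match with the replacement text.
$masked = preg_replace('/\d{4}/', 'XXXX', $text);
echo $masked, "\n";       // Call 555-XXXX or 555-XXXX

// preg_split() breaks the subject apart wherever the pattern matches.
$parts = preg_split('/\s+or\s+/', $text);
print_r($parts);          // Call 555-1234, 555-9876
```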
Many online sites publish ready-made regular expressions for specific tests.
Following is a table of some of the more common regular expressions which you may be able to use in your PHP pages:
| Type of Data | Possible Regular Expression | Notes |
|---|---|---|
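| Very Simple Username | /([a-zA-Z]{6,})/ | This pattern requires six or more consecutive letters. It is not anchored, so other characters may appear before or after them. |
| Very Simple Password | /^(\D+\d+\D*)$/ | This pattern requires one or more non-digit characters, then one or more digits, then zero or more non-digit characters. The digits must form a single run and cannot come first. |
| Another Simple Password | /(?=.{6,}).*/ | This pattern requires at least six characters (any characters except newline). |
| Medium-Strength Password | /^(?=.{7,})(((?=.*[A-Z])(?=.*[a-z]))|((?=.*[A-Z])(?=.*[0-9]))|((?=.*[a-z])(?=.*[0-9]))).*$/ | This pattern requires at least seven characters, including at least two of these three kinds: an uppercase letter, a lowercase letter, a digit. |
| Very Strong Password | /^(?=.{8,})(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*\W).*$/ | This pattern requires at least eight characters, including at least one uppercase letter, one lowercase letter, one digit, and one non-word character. |
| Name | /^[A-Z]+[a-zA-Z]*(\s+|([A-Z]+[a-zA-Z]*))*\.?(\s|[A-Z]+[a-zA-Z]*)*$/ | This pattern requires a first character that is an uppercase letter; only letters, whitespace, and at most one period; and an uppercase letter at the start of every word. |
| Address | /^\d+\s[A-Z]+[a-zA-Z]*$/ | This very simplistic pattern requires one or more digits, one whitespace character, and then a single word of letters that starts with an uppercase letter. |
| Zip Code | /^\d{5}-?(\d{4})?$/ | This pattern requires five digits, then an optional hyphen, then an optional group of four digits. Note that it also accepts a trailing hyphen with no digits after it. |
| Social Security Number (SSN) | /^\d{3}(\s|-)?\d{2}(\s|-)?\d{4}$/ | This pattern requires three digits, two digits, and four digits, with an optional whitespace character or hyphen between the groups. |
| Integer Numeric Info | /^\d+$/ | This pattern requires one or more digits, only, with no alphabetic characters |
| Telephone Number | /^\(?\d{3}\)?(\s|\.|-)?\d{3}(\s|-|\.)?\d{4}$/ | This phone number pattern requires a three-digit area code, optionally in parentheses; three digits; and four digits, with an optional whitespace character, period, or hyphen between the groups. |
| Fancier Telephone Number | /^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,5})|(\(?\d{2,6}\)?))(-| )?(\d{3,4})(-| )?(\d{4})(( x| ext)\d{1,5}){0,1}$/ | This more complete phone number pattern requires either an international prefix (a plus sign, a country code of one to three digits, one digit optionally in parentheses, and one to five more digits) or an area code of two to six digits, optionally in parentheses; then three or four digits; then four digits; then an optional extension written as " x" or " ext" followed by one to five digits. Groups may be separated by a hyphen or a space. |
| E-mail Address | /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*\.(\w{2}|(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum))$/ | This e-mail pattern requires a local part made of word characters, optionally separated by single periods or hyphens; an @ sign; a domain name built the same way; and a final period followed by either two word characters or one of the listed top-level domains. |
| Date, for example 11/03/2011 | /^\d{1,2}(\-|\/|\.)\d{1,2}\1\d{4}$/ | This date pattern requires one or two digits; a separator (hyphen, slash, or period); one or two digits; the same separator again; and four digits. It does not check that the month and day are in range. |
| Time, for example 08:15 p.m. | /^(([0][\d])|[1][0-2]|[\d])(:)([0-5])([\d])(\s)*[pPaA][\.]?[mM]?[\.]?$/ | This time pattern requires an hour written as 00 through 09, 10 through 12, or a single digit; a colon; minutes from 00 through 59; optional whitespace; the letter a or p in either case; and then an optional period, an optional m or M, and another optional period. |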
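
As a minimal sketch of how patterns from this table might be used together, the snippet below keeps three of them in an array and checks values by type. The isValid() helper, the array keys, and the sample values are hypothetical names introduced here for illustration.

```php
<?php
// Three patterns copied from the table, keyed by a hypothetical type name.
$patterns = [
    'zip'     => '/^\d{5}-?(\d{4})?$/',
    'integer' => '/^\d+$/',
    'date'    => '/^\d{1,2}(\-|\/|\.)\d{1,2}\1\d{4}$/',
];

// Hypothetical helper: true only when the type is known and the value matches.
function isValid(string $type, string $value, array $patterns): bool
{
    return isset($patterns[$type]) && preg_match($patterns[$type], $value) === 1;
}

var_dump(isValid('zip', '12345-6789', $patterns));  // bool(true)
var_dump(isValid('zip', '1234', $patterns));        // bool(false): too few digits
var_dump(isValid('integer', '42a', $patterns));     // bool(false): contains a letter
var_dump(isValid('date', '11/03/2011', $patterns)); // bool(true)
var_dump(isValid('date', '11/03-2011', $patterns)); // bool(false): \1 demands the same separator
var_dump(isValid('date', '99/99/9999', $patterns)); // bool(true): only the format is checked
```

The last call shows why a format check alone is not enough for dates; a function such as checkdate() could be used afterwards to confirm that the day and month are real.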