Getting Started
This section covers the very basics of understanding, creating, and using regular expressions.
Simple String Matching
The simplest regular expression is a string of ordinary characters. For a match, those characters must appear in the target string consecutively and in the same order as they appear in the regular expression.
The following table shows some simple string regular expressions and the matches for a target string:
Regex | Matches | Target String |
---|---|---|
a | a | "Maui no ka oi" |
"abc | "abc | "abcdef" |
Maui | Maui | "Maui no ka oi" |
ka oi | ka oi | "Maui no ka oi" |
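As a quick check of the table, here is a minimal sketch using Python's standard `re` module. Python is only our choice for illustration, and the helper `first_match` is a hypothetical name, not part of any library. `re.search` returns the leftmost match, and a literal pattern only matches when its characters appear consecutively and in order:

```python
import re

def first_match(pattern: str, target: str):
    """Hypothetical helper: return the leftmost match of pattern in target, or None."""
    m = re.search(pattern, target)
    return m.group() if m else None

target = "Maui no ka oi"
print(first_match("a", target))      # a  (the "a" in "Maui")
print(first_match("Maui", target))   # Maui
print(first_match("ka oi", target))  # ka oi

# Same characters, but not consecutive or not in order: no match.
print(first_match("kaoi", target))   # None
print(first_match("oi ka", target))  # None
```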
Note: Not all characters can be used "as is" in a match. Some characters, called metacharacters, have special meaning in regular expressions; to match one literally, precede it with a backslash (for example, \. matches a period). The metacharacters are:
{}[]()^$.|*+?\
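The effect of metacharacters is easy to see in a small sketch, again assuming Python's `re` module; the sample text is made up for illustration. Left unescaped, `$` acts as an end-of-string anchor and `.` matches any character, while escaping them (by hand, or with `re.escape`) makes them match literally:

```python
import re

text = "Total: $5.00 (approx.)"

# Unescaped, "$" is an end-of-string anchor, so nothing can follow it: no match.
print(re.search(r"$5.00", text))  # None

# Backslashes make "$" and "." match literally.
print(re.search(r"\$5\.00", text).group())  # $5.00

# re.escape inserts the backslashes automatically for text from elsewhere.
print(re.search(re.escape("(approx.)"), text).group())  # (approx.)
```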
Simple string matching works well, but the regular expression needs to be pretty specific for each target string. The next section will address this.
Using Character Classes
A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regex.
The following table shows some regular expressions with character classes and the matches for a target string:
Regex | Matches | Target String |
---|---|---|
[a-z] | a,b,c | "abcABC" |
[a-zA-Z] | a,b,c,A,B,C | "abcABC" |
[^0-9] | ",a,b,c,A,B,C," | "abcABC" |
[ch]at | cat, hat | "The cat and the hat" |
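The table rows can be reproduced with `re.findall` in Python, which returns every non-overlapping match from left to right. This is a sketch; for the `[^0-9]` row it assumes, as that row's listed matches suggest, that the quote characters are part of the target string:

```python
import re

print(re.findall(r"[a-z]", "abcABC"))               # ['a', 'b', 'c']
print(re.findall(r"[a-zA-Z]", "abcABC"))            # ['a', 'b', 'c', 'A', 'B', 'C']

# A negated class matches any character that is NOT listed,
# including the surrounding quote characters.
print(re.findall(r"[^0-9]", '"abcABC"'))

# A class can sit inside a longer pattern: "c" or "h", then "at".
print(re.findall(r"[ch]at", "The cat and the hat"))  # ['cat', 'hat']
```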
Matching This or That (Alternation)
The vertical bar '|' metacharacter separates alternatives, so the regex can match any one of several character strings. To match "cat" or "hat", use the regular expression "cat|hat". At each character position, the regular expression engine first tries to match "cat". If "cat" doesn't match, the engine tries the next alternative, "hat". If "hat" doesn't match either, the attempt at that position fails and the engine moves on to the next position in the string.
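It is important to remember that the regular expression engine will try to match the regex at the earliest possible point in the string. In traditional backtracking engines (such as those in Perl, Python, Java, and .NET), when more than one alternative could match at that point, the first alternative listed that succeeds wins; POSIX engines instead choose the longest match.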
The following table shows some regular expressions with alternation and the matches for a target string:
Regex | Matches | Target String |
---|---|---|
c|co|cow | c | "cows" |
cow|co|c | cow | "cows" |
cow|pig|chicken | cow, pig, chicken | "The farmer raises cows, pigs, and chickens" |
pig|chicken|cow | cow, pig, chicken | "The farmer raises cows, pigs, and chickens" |
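A short Python sketch (Python's `re` is a backtracking engine, so it follows the "first listed alternative wins" rule) shows both effects from the table: alternative order decides what matches at a given position, but the earliest position in the string takes priority over alternative order:

```python
import re

# Same starting position, different alternative order.
print(re.search(r"c|co|cow", "cows").group())  # c
print(re.search(r"cow|co|c", "cows").group())  # cow

# Position beats order: "cow" occurs first in the string,
# so it is found first even though it is listed last.
farm = "The farmer raises cows, pigs, and chickens"
print(re.findall(r"pig|chicken|cow", farm))  # ['cow', 'pig', 'chicken']
```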
Grouping and Capturing
In a regular expression, the '(' and ')' characters perform two functions: grouping and capturing.
Grouping
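A subpattern within parentheses is treated as a single unit. For example, in car(toon|pet) the alternation is confined to toon|pet instead of splitting the whole regex in two.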
The following table shows some regular expressions using grouping and the matches for a target string:
Regex | Matches | Target String |
---|---|---|
car(toon|pet) | carpet | "There is a spot on the carpet." |
car(toon|pet) | cartoon | "Scooby Doo is my favorite cartoon." |
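To see why the parentheses matter, compare the grouped pattern with an ungrouped one in this Python sketch. Without the group, '|' splits the entire regex, so the second pattern means "cartoon" or "pet":

```python
import re

grouped = r"car(toon|pet)"
print(re.search(grouped, "There is a spot on the carpet.").group())      # carpet
print(re.search(grouped, "Scooby Doo is my favorite cartoon.").group())  # cartoon

# Ungrouped: the alternatives are "cartoon" and "pet", so only "pet" matches here.
print(re.search(r"cartoon|pet", "There is a spot on the carpet.").group())  # pet
```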
Capturing
Any text matched by the pattern within parentheses is captured for later use. Captures are numbered from 1 by counting the opening parentheses '(' from the left.
Note: Captures can also be named by using the form (?<name>expression). The exact syntax varies by engine; Python, for example, uses (?P<name>expression).
If the regular expression engine supports backreferences, the captured text can be referred to later in the same expression with \1, \2, etc.
Most implementations also make the captured text available after a successful match, though the mechanism varies. Some, such as Perl, place the captures in special variables like $1, $2, etc.
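In Python, captures are read from the match object after a successful match. The sketch below uses a made-up date pattern to show the left-to-right numbering of opening parentheses, including a nested group, and then the same pattern with named groups in Python's (?P<name>...) spelling:

```python
import re

date = "Nov 5, 1955"

# Opening parentheses counted from the left give the group numbers:
#   1: (Nov)   2: ((\d+), (\d+))   3: first (\d+)   4: second (\d+)
m = re.search(r"(Nov) ((\d+), (\d+))", date)
print(m.group(1), "|", m.group(2), "|", m.group(3), "|", m.group(4))
# Nov | 5, 1955 | 5 | 1955

# Named groups (Python syntax): look captures up by name instead of number.
named = re.search(r"(?P<month>\w+) (?P<day>\d+), (?P<year>\d+)", date)
print(named.group("year"))  # 1955
```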
The following table shows some regular expressions with backreferences and the matches for a target string:
Regex | Matches | Target String |
---|---|---|
(\w)\1 | oo | "scooby" |
(?<ch1>\w)\k<ch1> | oo, oo, oo, oo (requires named-group support, e.g. .NET) | "scooby doooooo!" |
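Both rows can be checked in Python, with one caveat: this sketch writes the named version in Python's own syntax, (?P<ch1>...) for the group and (?P=ch1) for the backreference, rather than the \k<ch1> form in the table. `re.finditer` is used so that each whole match ("oo") is collected, not just the captured character:

```python
import re

# A word character followed by the same character again.
print(re.search(r"(\w)\1", "scooby").group())  # oo

# Named backreference, Python spelling; collect every whole match.
doubles = [m.group() for m in re.finditer(r"(?P<ch1>\w)(?P=ch1)", "scooby doooooo!")]
print(doubles)  # ['oo', 'oo', 'oo', 'oo']
```

The six o's in "doooooo" yield three matches because matches do not overlap.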
Note: To keep parentheses from capturing (i.e., to make a non-capturing group), use the form (?:expression).
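A non-capturing group matches exactly like a capturing one but records nothing. In Python this shows up directly in `groups()` and in `re.findall`, which returns the captured text when a pattern has groups and the whole match when it has none:

```python
import re

print(re.search(r"car(toon|pet)", "carpet").groups())    # ('pet',)
print(re.search(r"car(?:toon|pet)", "carpet").groups())  # ()

print(re.findall(r"car(toon|pet)", "cartoon carpet"))    # ['toon', 'pet']
print(re.findall(r"car(?:toon|pet)", "cartoon carpet"))  # ['cartoon', 'carpet']
```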
Quantifiers (Repetition)
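To specify that a portion of a regular expression repeats, use the quantifier metacharacters '*', '?', '+', and '{ }'. A quantifier applies to the item immediately before it: a single character, a character class, or a group. These metacharacters have the following meanings: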
- exp* = match exp 0 or more times
- exp? = match exp 0 or 1 times
- exp+ = match exp 1 or more times
- exp{n} = match exp exactly n times
- exp{n,} = match exp n or more times
- exp{n,m} = match exp at least n times, but not more than m times
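Each quantifier can be checked with `re.fullmatch`, which succeeds only if the pattern matches the entire string. In this sketch, `matches` is a hypothetical helper, and every pattern quantifies only the `b`, since a quantifier applies to the single item before it:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"ab*", "a"), matches(r"ab*", "abbb"))            # True True   (0 or more)
print(matches(r"ab?", "a"), matches(r"ab?", "abb"))             # True False  (0 or 1)
print(matches(r"ab+", "a"), matches(r"ab+", "abbb"))            # False True  (1 or more)
print(matches(r"ab{2}", "abb"), matches(r"ab{2}", "abbb"))      # True False  (exactly 2)
print(matches(r"ab{2,}", "ab"), matches(r"ab{2,}", "abbbb"))    # False True  (2 or more)
print(matches(r"ab{2,3}", "abbb"), matches(r"ab{2,3}", "abbbb"))  # True False (2 to 3)
```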
The following table shows some regular expressions with quantifiers and the matches for a target string:
Regex | Matches | Target String |
---|---|---|
[a-z]+ | he, farmer, raises, cows, pigs, and, chickens | "The farmer raises cows, pigs, and chickens" |
\w.*\w | Green Eggs and Ham | "Green Eggs and Ham" |
\d{4} | 1955 | "Nov 5, 1955" |
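A Python sketch of the table rows follows. Note that [a-z] is case-sensitive, so the uppercase "T" in "The" is skipped. In the second row, .* first runs to the end of the string and then gives back characters until the final \w can match, which is why the whole title is returned:

```python
import re

farm = "The farmer raises cows, pigs, and chickens"
print(re.findall(r"[a-z]+", farm))
# ['he', 'farmer', 'raises', 'cows', 'pigs', 'and', 'chickens']

print(re.search(r"\w.*\w", "Green Eggs and Ham").group())  # Green Eggs and Ham

print(re.search(r"\d{4}", "Nov 5, 1955").group())  # 1955
```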
Summary
This section has covered some of the basic and more commonly used regular expression features, but there is much more. Please refer to the specific section regarding each of these subjects along with additional sections covering more advanced features.