C# and .NET

Regular expressions have been a part of all .NET implementations. This reference covers the .NET regular expression syntax, .NET classes, and C# examples.

Version: Since 1.0

Engine Type: Traditional NFA

Supported Metacharacters

The following tables list the supported metacharacters:

Character Escapes

Sequence Description
\a Alarm (beep)
\b In a character class, matches a backspace character
\e Escape
\n Newline
\r Carriage Return
\f Form feed
\t Horizontal Tab
\v Vertical tab, \x0B
\0octal Character specified by a two-digit octal code
\xhex Character specified by a two-digit hexadecimal code
\uhex Character specified by a four-digit hexadecimal code
\cchar Named control character

Character Classes

Class Description
[chars] Matches any single character in chars (by default, the match is case-sensitive)
Pattern:
[ae]

matches:
'a' in "gray"
'a', 'e' in "lane"
[^chars] Negation: Matches any single character that is not in chars (by default, characters in chars are case-sensitive)
Pattern:
[^ae]

matches:
'r', 'i', 'g', 'n' in "reign"
[first-last] Character range: Matches any single character in the range from first to last
Pattern:
[A-Z]

matches:
'A', 'B' in "AB123"
. Wildcard: Matches any single character except \n
Pattern:
a.e

matches:
"ave" in "wave"
"ate" in "water"
\d Matches any decimal digit
[0-9]
Pattern:
\d

matches:
'4' in "4 = IV"
\D Matches any non-digit character
[^0-9]
Pattern:
\D

matches:
' ', '=', 'I', 'V' in "4 = IV"
\w Matches any word character
[a-zA-Z_0-9]
Pattern:
\w

matches:
'I', 'D', 'A', '1', '3' in "IDA1.3"
\W Matches any non-word character
[^a-zA-Z_0-9]
Pattern:
\W

matches:
' ', '.' in "ID A1.3"
\s Matches any white-space character
[ \f\n\r\t\v]
Pattern:
\s

matches:
' ' in "ID A1.3"
\S Matches any non-white-space character
[^ \f\n\r\t\v]
Pattern:
\S

matches:
'I','D','A','1','.','3' in "ID A1.3"
\p{name} Matches any single character in the Unicode general category or named block specified by name
\P{name} Matches any single character that is not in the Unicode general category or named block specified by name
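
The character class entries above can be checked with a short program. This is only a sketch: the class name CharClassDemo and the helper AllMatches are invented for the illustration, and the sample strings are taken from the table where possible.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    // Returns every match of pattern in input, joined with commas.
    public static string AllMatches(string input, string pattern)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            parts.Add(m.Value);
        return string.Join(",", parts);
    }

    static void Main()
    {
        Console.WriteLine(AllMatches("ID A1.3", @"\w"));      // I,D,A,1,3
        Console.WriteLine(AllMatches("4 = IV", @"\d"));       // 4
        Console.WriteLine(AllMatches("reign", "[^ae]"));      // r,i,g,n
        Console.WriteLine(AllMatches("New York", @"\p{Lu}")); // N,Y
    }
}
```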

Anchors

Assertion Description
^ The match must start at the beginning of the string or line

Note: In multiline mode, ^ matches after any newline
Pattern:
^\d{3}

matches:
"123" in "123-456"
$ The match must occur at the end of the string or before \n at the end of the line or string
Pattern:
\d{3}$

matches:
"456" in "123-456"
\A The match must occur at the start of the string

Pattern:
\A\d{3}

matches:
"123" in "123-456"
\Z The match must occur at the end of the string or before \n at the end of the string

Pattern:
\d{3}\Z

matches:
"456" in "123-456"
\z The match must occur at the end of the string
Pattern:
\d{3}\z

matches:
"456" in "123-456"
\b The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character
\B The match must not occur on a \b boundary
\G The match must occur at the point where the previous match ended
Pattern:
\G\(\d\)

matches:
"(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)"

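The differences between the anchors are easiest to see on input that contains line terminators. The following is a minimal sketch, assuming a two-line sample string that ends with a newline; AnchorDemo and Count are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Number of matches of pattern in input under the given options.
    public static int Count(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    static void Main()
    {
        string lines = "123-456\n789-012\n";

        // Without Multiline, ^ matches only at the start of the string.
        Console.WriteLine(Count(lines, @"^\d{3}", RegexOptions.None));      // 1
        // With Multiline, ^ also matches after each newline.
        Console.WriteLine(Count(lines, @"^\d{3}", RegexOptions.Multiline)); // 2

        // \Z tolerates one trailing newline; \z does not.
        Console.WriteLine(Regex.IsMatch(lines, @"\d{3}\Z")); // True
        Console.WriteLine(Regex.IsMatch(lines, @"\d{3}\z")); // False

        // \G requires each match to start where the previous one ended.
        Console.WriteLine(Count("(1)(3)(5)[7](9)", @"\G\(\d\)", RegexOptions.None)); // 3
    }
}
```
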
Modifiers

Modifier/sequence Mode character Description
Singleline s Dot (.) matches any character, including a line terminator
Multiline m ^ and $ match next to embedded line terminators
IgnorePatternWhitespace x Ignore whitespace and allow embedded comments starting with #
IgnoreCase i Case-insensitive match based on characters in the current culture
CultureInvariant (none) Culture-insensitive match
ExplicitCapture n Allow named capture groups, but treat unnamed parentheses as non-capturing groups
Compiled (none) Compile regular expression
RightToLeft (none) Search from right to left, starting to the left of the start position
ECMAScript (none) Enables ECMAScript-compliant behavior; can be combined only with IgnoreCase, Multiline, and Compiled
(?imnsx-imnsx) Turn match flags on or off for rest of pattern
(?imnsx-imnsx:...) Turn match flags on or off for the rest of the subexpression
(?#...) Treat substring as a comment
#... Treat rest of line as a comment in IgnorePatternWhitespace (x) mode
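
Modes can be set either through RegexOptions or inline in the pattern. The sketch below shows both forms side by side; ModifierDemo, GroupCount, and the sample patterns are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ModifierDemo
{
    // Number of groups (including group 0) that a pattern defines.
    public static int GroupCount(string pattern, RegexOptions options)
    {
        return new Regex(pattern, options).GetGroupNumbers().Length;
    }

    static void Main()
    {
        // The same case-insensitive match, as an option and as an inline flag.
        Console.WriteLine(Regex.IsMatch("SPIDER-MAN", "spider-man", RegexOptions.IgnoreCase)); // True
        Console.WriteLine(Regex.IsMatch("SPIDER-MAN", "(?i)spider-man"));                      // True

        // (?i:...) limits the flag to one subexpression.
        Console.WriteLine(Regex.IsMatch("SPIDER-MAN", "(?i:spider)-man")); // False

        // Singleline lets the dot match a newline.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                          // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline)); // True

        // ExplicitCapture keeps only group 0 and the named group.
        Console.WriteLine(GroupCount(@"(\d+)-(?<ext>\d+)", RegexOptions.None));            // 3
        Console.WriteLine(GroupCount(@"(\d+)-(?<ext>\d+)", RegexOptions.ExplicitCapture)); // 2

        // IgnorePatternWhitespace allows layout and # comments in the pattern.
        string phone = @"
            \d{3}   # prefix
            -       # separator
            \d{4}   # line number";
        Console.WriteLine(Regex.IsMatch("555-1234", phone, RegexOptions.IgnorePatternWhitespace)); // True
    }
}
```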

Grouping

Sequence Description
(...) Grouping. Submatches fill \1,\2,... and $1, $2,...
\n In a regular expression, match what was matched by the nth earlier submatch
$n In a replacement string, contains the nth earlier submatch
(?<name>...) Captures the matched substring into the group called name
(?:...) Grouping-only parentheses, no capturing
(?>...) Disallow backtracking for subpattern
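
A brief sketch of the grouping constructs in use follows. The names GroupingDemo and RepeatedWord, the date format, and the sample sentences are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupingDemo
{
    // Finds the first doubled word, using \1 to refer back to group 1.
    public static string RepeatedWord(string text)
    {
        return Regex.Match(text, @"\b(\w+)\s+\1\b").Value;
    }

    static void Main()
    {
        // Named groups are read by name from the Groups collection.
        Match m = Regex.Match("2016-03-15", @"(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)");
        Console.WriteLine(m.Groups["year"].Value);  // 2016
        Console.WriteLine(m.Groups["month"].Value); // 03

        Console.WriteLine(RepeatedWord("it was the the best")); // the the

        // An atomic group never gives characters back to the rest of the pattern.
        Console.WriteLine(Regex.IsMatch("aaab", "a+ab"));     // True
        Console.WriteLine(Regex.IsMatch("aaab", "(?>a+)ab")); // False
    }
}
```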

Quantifiers

Quantifier Description
* Matches the previous element zero or more times
+ Matches the previous element one or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n times, but no more than m times
*? Match 0 or more times, but as few times as possible
+? Match 1 or more times, but as few times as possible
?? Match 0 or 1 times, but as few times as possible
{n,}? Match at least n times, but as few times as possible
{n,m}? Match at least n times, no more than m times, but as few times as possible
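
The difference between greedy and lazy quantifiers shows up whenever more than one match length is possible. A minimal sketch, with QuantifierDemo and First as made-up names:

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Text of the first match of pattern in input.
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        // Greedy: takes as much as possible.
        Console.WriteLine(First("<b>bold</b>", "<.+>"));  // <b>bold</b>
        // Lazy: stops at the first position where the rest can match.
        Console.WriteLine(First("<b>bold</b>", "<.+?>")); // <b>

        Console.WriteLine(First("12345", @"\d{2,3}"));    // 123
        Console.WriteLine(First("12345", @"\d{2,3}?"));   // 12
    }
}
```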

Alternation

Alternation Construct Description
| Matches any one element separated by the vertical bar (|) character
(?( expression ) yes | no ) Matches yes part if expression matches; otherwise, matches the optional no part (expression is interpreted as a zero-width assertion)
(?( name ) yes | no ) Matches yes part if the named capture name has a match; otherwise, matches the optional no part
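
The named conditional can be used to require a closing delimiter only when the opening one was present. This sketch assumes a three-digit code with optional parentheses; AlternationDemo, IsAreaCode, and the group name open are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // The closing ")" is required only if the group "open" captured a "(".
    static readonly Regex AreaCode =
        new Regex(@"^(?<open>\()?\d{3}(?(open)\))$");

    public static bool IsAreaCode(string s)
    {
        return AreaCode.IsMatch(s);
    }

    static void Main()
    {
        // Plain alternation inside a group.
        Console.WriteLine(Regex.Match("grey", "gr(a|e)y").Value); // grey

        Console.WriteLine(IsAreaCode("(555)")); // True
        Console.WriteLine(IsAreaCode("555"));   // True
        Console.WriteLine(IsAreaCode("(555"));  // False
        Console.WriteLine(IsAreaCode("555)"));  // False
    }
}
```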

Replacement sequences

Sequence Description
$1, $2, .. Captured submatches
${name} Matched text of a named capture group
$` Text before match
$& Text of match
$' Text after match
$+ Last parenthesized match
$_ Copy of original input string
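
The replacement sequences can be tried out with Regex.Replace. The sketch below uses hypothetical names (ReplacementDemo, ToIsoDate) and sample inputs chosen for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplacementDemo
{
    // Reorders MM/DD/YYYY into YYYY-MM-DD using numbered groups.
    public static string ToIsoDate(string usDate)
    {
        return Regex.Replace(usDate, @"(\d\d)/(\d\d)/(\d{4})", "$3-$1-$2");
    }

    static void Main()
    {
        Console.WriteLine(ToIsoDate("12/30/1969")); // 1969-12-30

        // ${name} inserts the text of a named group.
        Console.WriteLine(Regex.Replace("John Smith",
            @"(?<first>\w+) (?<last>\w+)", "${last}, ${first}")); // Smith, John

        // $& is the whole match.
        Console.WriteLine(Regex.Replace("a b", @"\w", "[$&]"));   // [a] [b]

        // $` is the text before the match, $' the text after it.
        Console.WriteLine(Regex.Replace("AAbbCC", "bb", "$`"));   // AAAACC
        Console.WriteLine(Regex.Replace("AAbbCC", "bb", "$'"));   // AACCCC
    }
}
```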

Regular Expression Classes and Interfaces

.NET defines its regular expression support in the System.Text.RegularExpressions namespace. The Regex( ) constructor handles regular expression creation, and the rest of the Regex methods handle pattern matching. The Group and Match classes contain information about each match.

C#'s verbatim string syntax, @"...", allows you to define regular expression patterns without having to escape embedded backslashes.
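
As a small illustration, the sketch below writes one pattern both ways and confirms that the two literals are the same string. VerbatimDemo and the sample input are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimDemo
{
    // The same pattern as an @"..." literal and as a regular literal.
    public const string Verbatim = @"\d+\.\d+";
    public const string Escaped = "\\d+\\.\\d+";

    static void Main()
    {
        Console.WriteLine(Verbatim == Escaped);                   // True
        Console.WriteLine(Regex.Match("v 3.14", Verbatim).Value); // 3.14
    }
}
```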

Regex

This class handles the creation of regular expressions and pattern matching. Several static methods allow for pattern matching without creating a Regex object.

Methods

public Regex(string pattern)
public Regex(string pattern, RegexOptions options)

Return a regular expression object based on pattern and with the optional mode modifiers, options.

public static void CompileToAssembly(RegexCompilationInfo[ ] regexinfos, System.Reflection.AssemblyName assemblyname)
public static void CompileToAssembly(RegexCompilationInfo[ ] regexinfos, System.Reflection.AssemblyName assemblyname, System.Reflection.Emit.CustomAttributeBuilder[ ] attributes)
public static void CompileToAssembly(RegexCompilationInfo[ ] regexinfos, System.Reflection.AssemblyName assemblyname, System.Reflection.Emit.CustomAttributeBuilder[ ] attributes, string resourceFile)

Compile one or more Regex objects to an assembly. The regexinfos array describes the regular expressions to include. The assembly filename is assemblyname. The array attributes defines attributes for the assembly. resourceFile is the name of a Win32 resource file to include in the assembly.

public static string Escape(string str)

Return a string with all regular expression metacharacters, pound characters (#), and whitespace escaped.
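
Escape is mainly useful when literal text, such as user input, has to be searched for with the regex engine. A minimal sketch, assuming a hypothetical helper named ContainsLiteral:

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Searches text for literal, treating every character of literal as plain text.
    public static bool ContainsLiteral(string text, string literal)
    {
        return Regex.IsMatch(text, Regex.Escape(literal));
    }

    static void Main()
    {
        Console.WriteLine(Regex.Escape("$9.99"));                    // \$9\.99

        // Unescaped, "$" is an end anchor, so the pattern cannot match here.
        Console.WriteLine(Regex.IsMatch("cost is $9.99", "$9.99"));  // False
        Console.WriteLine(ContainsLiteral("cost is $9.99", "$9.99")); // True
    }
}
```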

public static bool IsMatch(string input, string pattern)
public static bool IsMatch(string input, string pattern, RegexOptions options)
public bool IsMatch(string input)
public bool IsMatch(string input, int startat)

Return the success of a single match against the input string input. Static versions of this method require the regular expression pattern. The options parameter allows for optional mode modifiers (OR'd together). The startat parameter defines a starting position in input to start matching.

public static Match Match(string input, string pattern)
public static Match Match(string input, string pattern, RegexOptions options)
public Match Match(string input)
public Match Match(string input, int startat)
public Match Match(string input, int startat, int length)

Perform a single match against the input string input and return information about the match in a Match object. Static versions of this method require the regular expression pattern. The options parameter allows for optional mode modifiers (OR'd together). The startat and length parameters define a starting position and the number of characters after the starting position to perform the match.
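
The static and instance forms of IsMatch and Match might be used as follows. This is a sketch only; MatchDemo, FirstFrom, and the sample input are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

static class MatchDemo
{
    static readonly Regex Digits = new Regex(@"\d+");

    // First run of digits found at or after startat, or null if there is none.
    public static string FirstFrom(string input, int startat)
    {
        Match m = Digits.Match(input, startat);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        string input = "abc 123 def 456";

        // Static form: pattern and options are passed on every call.
        Console.WriteLine(Regex.IsMatch(input, "DEF", RegexOptions.IgnoreCase)); // True

        // Instance form with a starting position.
        Console.WriteLine(FirstFrom(input, 0)); // 123
        Console.WriteLine(FirstFrom(input, 7)); // 456

        // Start position plus length restricts the search to "12".
        Console.WriteLine(Digits.Match(input, 4, 2).Value); // 12
    }
}
```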

public static MatchCollection Matches(string input, string pattern)
public static MatchCollection Matches(string input, string pattern, RegexOptions options)
public MatchCollection Matches(string input)
public MatchCollection Matches(string input, int startat)

Find all matches in the input string input, and return information about the matches in a MatchCollection object. Static versions of this method require the regular expression pattern. The options parameter allows for optional mode modifiers (OR'd together). The startat parameter defines a starting position in input to perform the match.
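
A MatchCollection is typically walked with foreach. The following sketch, with the made-up names MatchesDemo and Describe, records the index and text of each match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchesDemo
{
    // Builds "index:value" for every match of pattern in input.
    public static string Describe(string input, string pattern)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            parts.Add(m.Index + ":" + m.Value);
        return string.Join(" ", parts);
    }

    static void Main()
    {
        Console.WriteLine(Describe("abc 123 def 456", @"\d+")); // 4:123 12:456

        // Instance form: only matches at or after index 7 are returned.
        Console.WriteLine(new Regex(@"\d+").Matches("abc 123 def 456", 7).Count); // 1
    }
}
```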

public static string Replace(string input, string pattern, MatchEvaluator evaluator)
public static string Replace(string input, string pattern, MatchEvaluator evaluator, RegexOptions options)
public static string Replace(string input, string pattern, string replacement)
public static string Replace(string input, string pattern, string replacement, RegexOptions options)
public string Replace(string input, MatchEvaluator evaluator)
public string Replace(string input, MatchEvaluator evaluator, int count)
public string Replace(string input, MatchEvaluator evaluator, int count, int startat)
public string Replace(string input, string replacement)
public string Replace(string input, string replacement, int count)
public string Replace(string input, string replacement, int count, int startat)

Return a string in which each match in input is replaced with either the evaluation of the replacement string or a call to a MatchEvaluator object. The string replacement can contain backreferences to captured text with the $n or ${name} syntax.

The options parameter allows for optional mode modifiers (OR'd together). The count parameter limits the number of replacements. The startat parameter defines a starting position in input to start the replacement.
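
When the replacement has to be computed, a MatchEvaluator delegate (here written as a lambda) can be passed instead of a replacement string. A minimal sketch, with ReplaceDemo and DoubleNumbers as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Doubles every integer in the input; the lambda is converted to a MatchEvaluator.
    public static string DoubleNumbers(string input)
    {
        return Regex.Replace(input, @"\d+",
            m => (int.Parse(m.Value) * 2).ToString());
    }

    static void Main()
    {
        Console.WriteLine(DoubleNumbers("3 apples and 5 pears")); // 6 apples and 10 pears

        Regex digit = new Regex(@"\d");
        // count = 2: only the first two matches are replaced.
        Console.WriteLine(digit.Replace("1 2 3", "#", 2));    // # # 3
        // count = 1, startat = 2: the search begins at index 2.
        Console.WriteLine(digit.Replace("1 2 3", "#", 1, 2)); // 1 # 3
    }
}
```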

public static string[ ] Split(string input, string pattern)
public static string[ ] Split(string input, string pattern, RegexOptions options)
public string[ ] Split(string input)
public string[ ] Split(string input, int count)
public string[ ] Split(string input, int count, int startat)

Return an array of strings broken around matches of the regex pattern. If specified, no more than count strings are returned. You can specify a starting position in input with startat.
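
A short sketch of both Split forms follows. SplitDemo, SplitList, and the delimiter pattern are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Splits on commas or semicolons, discarding surrounding whitespace.
    public static string[] SplitList(string input)
    {
        return Regex.Split(input, @"\s*[,;]\s*");
    }

    static void Main()
    {
        Console.WriteLine(string.Join("|", SplitList("a, b;c ,d"))); // a|b|c|d

        // Instance form with count = 2: at most two strings are returned.
        Console.WriteLine(string.Join("|", new Regex(",").Split("a,b,c,d", 2))); // a|b,c,d
    }
}
```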

Match

Properties

public bool Success

Indicates whether the match was successful.

public string Value

Text of the match.

public int Length

Number of characters in the matched text.

public int Index

Zero-based character index of the start of the match.

public GroupCollection Groups

A GroupCollection object where Groups[0].Value contains the text of the entire match, and each additional Groups element contains the text matched by a capture group.

Methods

public Match NextMatch( )

Return a Match object for the next match of the regex in the input string.

public virtual string Result(string result)

Return result with special replacement sequences replaced by values from the previous match.

public static Match Synchronized(Match inner)

Return a Match object identical to inner, except also safe for multithreaded use.
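
NextMatch and Result might be used as in the sketch below. The names NextMatchDemo and Walk and the sample inputs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NextMatchDemo
{
    // Steps through the matches one at a time with NextMatch.
    public static List<string> Walk(string input, string pattern)
    {
        var values = new List<string>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
            values.Add(m.Value);
        return values;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Walk("a1b22c333", @"\d+"))); // 1,22,333

        // Result expands replacement sequences against a single match.
        Match date = Regex.Match("12/30/1969", @"(\d\d)/(\d\d)/(\d{4})");
        Console.WriteLine(date.Result("$3-$1-$2")); // 1969-12-30
    }
}
```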

Group

Properties

public bool Success

True if the group participated in the match.

public string Value

Text captured by this group.

public int Length

Number of characters captured by this group.

public int Index

Zero-based character index of the start of the text captured by this group.
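
The Group properties matter most when a group is optional, because a group that did not participate still appears in the Groups collection. A minimal sketch, with GroupDemo and Describe as invented names:

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Formats a group as "index+length:value", or notes that it did not participate.
    public static string Describe(Group g)
    {
        return g.Success ? g.Index + "+" + g.Length + ":" + g.Value : "(no capture)";
    }

    static void Main()
    {
        Match m = Regex.Match("ab", "(a)(x)?(b)");
        Console.WriteLine(Describe(m.Groups[0])); // 0+2:ab
        Console.WriteLine(Describe(m.Groups[1])); // 0+1:a
        Console.WriteLine(Describe(m.Groups[2])); // (no capture)
        Console.WriteLine(Describe(m.Groups[3])); // 1+1:b
    }
}
```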

Unicode Support

.NET provides built-in support for Unicode 3.1, including full support in the \w, \d, and \s sequences. The range of characters matched can be limited to ASCII characters by turning on ECMAScript mode. Case-insensitive matching is limited to the characters of the current language defined in Thread.CurrentCulture, unless the CultureInvariant option is set.

.NET supports the standard Unicode properties (see Table 1-2) and blocks. Only the short form of property names is supported. Block names require the Is prefix and must use the simple name form, without spaces or underscores.
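
The effect of ECMAScript mode and the property syntax can be seen in a short sketch. UnicodeDemo and IsDigit are hypothetical names, and the sample characters (an Arabic-Indic digit and a Greek letter) were chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnicodeDemo
{
    // True if s contains a character that \d matches under the given options.
    public static bool IsDigit(string s, RegexOptions options)
    {
        return Regex.IsMatch(s, @"\d", options);
    }

    static void Main()
    {
        string arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        string alpha = "\u03B1";       // GREEK SMALL LETTER ALPHA

        // By default \d covers Unicode decimal digits; ECMAScript mode limits it to 0-9.
        Console.WriteLine(IsDigit(arabicThree, RegexOptions.None));       // True
        Console.WriteLine(IsDigit(arabicThree, RegexOptions.ECMAScript)); // False

        // Short property name and a block name with the Is prefix.
        Console.WriteLine(Regex.IsMatch(alpha, @"\p{Ll}"));      // True
        Console.WriteLine(Regex.IsMatch(alpha, @"\p{IsGreek}")); // True
        Console.WriteLine(Regex.IsMatch("a", @"\P{L}"));         // False
    }
}
```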

Examples

Example: Simple match

//Match Spider-Man, Spiderman, SPIDER-MAN, etc.
namespace Regex_PocketRef {
  using System.Text.RegularExpressions;

  class SimpleMatchTest {
    static void Main(  )
    {
      string dailybugle = "Spider-Man Menaces City!";

      string regex = "spider[- ]?man";

      if (Regex.IsMatch(dailybugle, regex, RegexOptions.IgnoreCase)) {
        //do something
      }
    }
  }
}

Example 1-10. Match and capture group

//Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
using System.Text.RegularExpressions;

class MatchTest {
  static void Main(  )
  {
    string date = "12/30/1969";
    Regex r =
      new Regex( @"(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)" );

    Match m = r.Match(date);

    if (m.Success) {
      string month = m.Groups[1].Value;
      string day = m.Groups[2].Value;
      string year = m.Groups[3].Value;
    }
  }
}

Example 1-11. Simple substitution

//Convert <br> to <br /> for XHTML compliance
using System.Text.RegularExpressions;

class SimpleSubstitutionTest {
  static void Main(  )
  {
    string text = "Hello world. <br>";
    string regex = "<br>";
    string replacement = "<br />";

    string result =
      Regex.Replace(text, regex, replacement, RegexOptions.IgnoreCase);
  }
}

Example 1-12. Harder substitution

//urlify - turn URLs into HTML links
using System.Text.RegularExpressions;

public class Urlify {
  static void Main(  )
  {
    string text = "Check the website, http://www.oreilly.com/catalog/repr.";
    string regex =
      @"\b                         # start at word boundary
        (                          # capture to $1
        (https?|telnet|gopher|file|wais|ftp) :
                                   # resource and colon
        [\w/#~:.?+=&%@!\-] +?      # one or more valid
                                   # characters
                                   # but take as little as
                                   # possible
        )
        (?=                        # lookahead
        [.:?\-] *                  # for possible
                                   # punctuation
        (?: [^\w/#~:.?+=&%@!\-]    # invalid character
        | $ )                      # or end of string
        )";

    Regex r = new Regex(regex,
      RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    string result = r.Replace(text, "<a href=\"$1\">$1</a>");
  }
}


