Python



Python has included a rich, Perl-like regular expression syntax in the re module since version 1.5.

Version: Python 2.2

Engine Type: Traditional NFA

Supported Metacharacters

The following tables list the supported metacharacters:

Special Characters

Sequence Description
\a Alarm (beep)
\b In a character class, matches a backspace character
\f Form feed
\n Newline
\r Carriage Return
\t Horizontal Tab
\v Vertical tab, \x0B
\octal Character specified by up to three octal digits
\xhh Character specified by a two-digit hexadecimal code
\uhhhh Character specified by a four-digit hexadecimal code
\Uhhhhhhhh Character specified by an eight-digit hexadecimal code

Character Classes

Class Description
[...] Any character listed or contained within a listed range
[^...] Any character that is not listed and is not contained within a listed range
. Any character, except a newline (unless DOTALL mode)
\w Word character, [a-zA-Z0-9_] (unless LOCALE or UNICODE mode)
\W Non-word character, [^a-zA-Z0-9_] (unless LOCALE or UNICODE mode)
\d Digit character, [0-9]
\D Non-digit character, [^0-9]
\s Whitespace character, [ \t\n\r\f\v]
\S Nonwhitespace character, [^ \t\n\r\f\v]
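
As a quick illustration of the shorthand classes in this table, the sketch below applies \d, \w, \s, and \D to a made-up sample string. The string and the variable names are illustrative only and are not part of the re module.

```python
import re

sample = 'Room 42, floor_3\tEast'   # hypothetical input

digits = re.findall(r'\d+', sample)      # runs of [0-9]
words = re.findall(r'\w+', sample)       # runs of word characters
pieces = re.split(r'\s+', sample)        # split on runs of whitespace
only_digits = re.sub(r'\D', '', sample)  # delete every non-digit
```

Note that the underscore counts as a word character, so floor_3 stays in one piece.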

Anchors and zero-width tests

Sequence Description
^ Start of string, or after any newline if in MULTILINE match mode
\A Start of search string, in all match modes
$ End of search string or before a string-ending newline, or before any newline in MULTILINE match mode
\Z End of string, in any match mode
\b Word boundary
\B Not-word-boundary
(?=...) Positive lookahead
(?!...) Negative lookahead
(?<=...) Positive lookbehind
(?<!...) Negative lookbehind
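
The following sketch shows how a few of these zero-width tests behave. The sample strings are invented for the illustration.

```python
import re

log = 'error: disk full\nwarning: low memory\nerror: timeout'  # hypothetical

# Without MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r'^error', log)
# With MULTILINE, ^ also matches after every newline.
every_line = re.findall(r'^error', log, re.MULTILINE)

# \b keeps 'cat' from matching inside 'concatenate'.
whole_word = re.findall(r'\bcat\b', 'cat concatenate cat.')

# Lookbehind and lookahead test the context without consuming it.
price = re.search(r'(?<=\$)\d+(?=\.\d\d)', 'Total: $45.99')
```

Here price.group() contains only the digits between the dollar sign and the decimal point, because neither lookaround becomes part of the match.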

Modifiers

Modifier/sequence Mode character Description
I or IGNORECASE or (?i) i Case-insensitive matching
L or LOCALE or (?L) L Cause \w, \W, \b, and \B to use current locale's definition of alphanumeric
M or MULTILINE or (?m) m ^ and $ match next to embedded \n
S or DOTALL or (?s) s Dot (.) matches newline
U or UNICODE or (?u) u Cause \w, \W, \b, and \B to use Unicode definition of alphanumeric
X or VERBOSE or (?x) x Ignore whitespace and allow comments (#) in pattern
(?mode) Turn listed modes (iLmsux) on for the entire regular expression
(?#...) Treat substring as a comment
#... Treat rest of line as a comment in VERBOSE mode
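
A minimal sketch of the modifiers in use, both as flag constants and as inline mode sequences. The input text and the phone-number pattern are made up for the example.

```python
import re

text = 'First line\nSECOND line'   # illustrative two-line input

# Flag constants are combined with bitwise OR.
by_flag = re.search(r'^second', text, re.IGNORECASE | re.MULTILINE)

# The same two modes switched on from inside the pattern.
by_inline = re.search(r'(?im)^second', text)

# Without DOTALL the dot stops at the newline; with it the dot crosses it.
without_dotall = re.search(r'line.SECOND', text)
with_dotall = re.search(r'line.SECOND', text, re.DOTALL)

# VERBOSE ignores pattern whitespace and allows # comments.
phone = re.compile(r'''
    (\d{3})   # exchange
    -         # separator
    (\d{4})   # line number
    ''', re.VERBOSE)
parts = phone.match('555-1234').groups()
```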

Grouping, capturing, conditional, and control

Sequence Description
(...) Group subpattern and capture submatch into \1, \2, ...
(?P<name>...) Group subpattern and capture submatch into named capture group, name
(?P=name) Match text matched by earlier named capture group, name
\n Match the text captured by the nth earlier group (n is a group number)
(?:...) Groups subpattern, but does not capture submatch
...|... Try subpatterns in alternation
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{x,y} Match at least x times but no more than y times
*? Match 0 or more times, but as few times as possible
+? Match 1 or more times, but as few times as possible
?? Match 0 or 1 time, but as few times as possible
{x,y}? Match at least x times, no more than y times, and as few times as possible
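
To make the difference between greedy and minimal quantifiers concrete, and to show capturing groups and backreferences together, here is a short sketch. All input strings are invented for the illustration.

```python
import re

html = '<b>bold</b> and <b>more</b>'   # illustrative input

greedy = re.search(r'<b>.*</b>', html).group()    # runs through the last </b>
lazy = re.search(r'<b>.*?</b>', html).group()     # stops at the first </b>

# {x,y} takes as many as it can; {x,y}? takes as few as still match.
most = re.match(r'\d{2,4}', '123456').group()
least = re.match(r'\d{2,4}?', '123456').group()

# A named group and its backreference find a doubled word.
doubled = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'it was the the best')

# The numbered form \1 does the same job.
numbered = re.search(r'\b(\w+) \1\b', 'it was the the best')

# (?:...) groups for the + quantifier without creating a capture.
only_c = re.match(r'(?:ab)+(c)', 'ababc').groups()
```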

re Module Objects and Functions

The re module defines all regular expression functionality. Pattern matching is done directly through module functions, or patterns are compiled into regular expression objects that can be used for repeated pattern matching. Information about the match, including captured groups, is retrieved through match objects.

Python's raw string syntax, r'' or r"", allows you to specify regular expression patterns without having to escape embedded backslashes. The raw-string pattern, r'\n', is equivalent to the regular string pattern, '\\n'. Python also provides triple-quoted raw strings for multiline regular expressions: r'''text''' and r"""text""".
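
The equivalence described above can be checked directly. This sketch compares the two spellings and then uses a raw string to match a literal backslash; the sample strings are illustrative.

```python
import re

raw = r'\n'        # two characters: a backslash and the letter n
escaped = '\\n'    # the same two characters, written without the r prefix
same_pattern = (raw == escaped)

# The re module reads the two-character pattern \n as a newline.
hit = re.search(raw, 'one\ntwo')

# Matching one literal backslash takes r'\\' (or '\\\\' without the r).
backslash = re.search(r'\\', 'C:\\temp')
```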

Module Functions

The re module defines the following functions and one exception.

compile( pattern [, flags])

Return a regular expression object with the optional mode modifiers, flags.

match( pattern, string [, flags])

Search for pattern at starting position of string, and return a match object or None if no match.

search( pattern, string [, flags])

Search for pattern in string, and return a match object or None if no match.
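
The practical difference between match and search is where the pattern is allowed to begin. A minimal sketch, using an invented input line:

```python
import re

line = 'id=42; name=alice'   # hypothetical input

at_start = re.match(r'name', line)     # 'name' is not at position 0
anywhere = re.search(r'name', line)    # found later in the string

# A compiled object offers the same operations without repeating the pattern.
number = re.compile(r'\d+')
first_number = number.search(line).group()
```

Because both functions return None on failure, the result should be tested before calling methods on it.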

split( pattern, string [, maxsplit=0])

Split string on pattern. Limit the number of splits to maxsplit. Submatches from capturing parentheses are also returned.
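
A short sketch of split, including the effect of maxsplit and of capturing parentheses. The comma-separated input is made up.

```python
import re

parts = re.split(r',\s*', 'a, b,c,   d')
limited = re.split(r',\s*', 'a, b,c,   d', maxsplit=2)   # at most two splits

# With a capturing group, the separators are returned as well.
with_seps = re.split(r'(,)\s*', 'a, b,c')
```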

sub( pattern, repl, string [, count=0])

Return a string with all or up to count occurrences of pattern in string replaced with repl. repl may be either a string or a function that takes a match object argument.

subn( pattern, repl, string [, count=0])

Perform sub( ) but return a tuple of the new string and the number of replacements.
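
The sketch below exercises sub with a string replacement, with a function replacement, and with count, and then shows the tuple returned by subn. The helper name double and the sample strings are hypothetical.

```python
import re

def double(match):
    # A repl function receives the match object and returns the new text.
    return str(int(match.group(0)) * 2)

doubled = re.sub(r'\d+', double, '3 apples and 10 pears')
first_only = re.sub(r'\d+', '#', '1 2 3', count=1)

# Backreferences in a replacement string refer to captured groups.
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

# subn also reports how many replacements were made.
result, n = re.subn(r'\s+', ' ', 'a   b \t c')
```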

findall( pattern, string)

Return matches of pattern in string. If pattern has capturing groups, returns a list of submatches or a list of tuples of submatches.

finditer( pattern, string)

Return an iterator over matches of pattern in string. For each match, the iterator returns a match object.
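
The shape of the findall result depends on the number of capturing groups, which this sketch illustrates alongside finditer. The input text is invented.

```python
import re

text = 'x=1, y=22, z=333'   # hypothetical input

values = re.findall(r'\d+', text)         # no groups: list of whole matches
pairs = re.findall(r'(\w)=(\d+)', text)   # two groups: list of tuples

# finditer yields match objects, so positions are available too.
located = [(m.group(), m.start()) for m in re.finditer(r'\d+', text)]
```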

escape( string)

Return string with nonalphanumerics backslashed so that string can be matched literally.

exception error

Exception raised if an error occurs during compilation or matching. This is common if a string passed to a function is not a valid regular expression.
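
A minimal sketch of escape and of catching the error exception. The literal text and the deliberately broken pattern are made up, and the exact string returned by escape is not shown because it can differ between Python versions.

```python
import re

literal = '3.14 (approx)'   # hypothetical text to match literally

# Used as a pattern unchanged, '.' matches any character and '(...)' is a group.
loose = re.search(literal, '3x14 approx')

# After escaping, only the exact text matches.
safe = re.escape(literal)
strict_miss = re.search(safe, '3x14 approx')
strict_hit = re.search(safe, 'pi is 3.14 (approx)')

# An invalid pattern raises re.error at compile time.
try:
    re.compile(r'(unclosed')
    compiled_ok = True
except re.error:
    compiled_ok = False
```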

RegExp Objects

Regular expression objects are created with the re.compile function.

flags

Return the flags argument used when the object was compiled or 0.

groupindex

Return a dictionary that maps symbolic group names to group numbers.

pattern

Return the pattern string used when the object was compiled.
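
These three attributes can be inspected on any compiled object. In this sketch the date pattern is illustrative, and the flags value is tested with a bitwise AND because an interpreter may set additional flag bits of its own.

```python
import re

rx = re.compile(r'(?P<year>\d{4})-(?P<month>\d\d)', re.IGNORECASE)

source = rx.pattern                  # the original pattern string
names = dict(rx.groupindex)          # group name -> group number
ignores_case = bool(rx.flags & re.IGNORECASE)
```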

match( string [, pos [, endpos]])
search( string [, pos [, endpos]])
split( string [, maxsplit=0])
sub( repl, string [, count=0])
subn( repl, string [, count=0])
findall( string)

Same as the re module functions, except pattern is implied. pos and endpos give start and end string indexes for the match.
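
The pos and endpos arguments restrict which part of the string is examined. A short sketch with an invented input string:

```python
import re

rx = re.compile(r'\d+')
s = 'a1 b22 c333'   # hypothetical input

from_start = rx.search(s).group()       # whole string is searched
from_three = rx.search(s, 3).group()    # search begins at index 3
bounded = rx.search(s, 3, 5).group()    # only indexes 3 and 4 are visible

# For match, pos is the position at which the pattern must begin.
at_one = rx.match(s, 1).group()
```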

Match Objects

Match objects are created by the match and search functions.

pos
endpos

Value of pos or endpos passed to search or match.

re

The regular expression object whose match or search returned this object.

string

String passed to match or search.

group([ g1, g2, ...])

Return one or more submatches from capturing groups. Groups may be either numbers corresponding to capturing groups or strings corresponding to named capturing groups. Group zero corresponds to the entire match. If no arguments are provided, this function returns the entire match. Capturing groups that did not match have a result of None.

groups([ default])

Return a tuple of the results of all capturing groups. Groups that did not match have the value None or default.

groupdict([ default])

Return a dictionary of named capture groups, keyed by group name. Groups that did not match have the value None or default.
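
The sketch below compares group, groups, and groupdict on a pattern whose last group is optional and does not participate in the match. The address format and group names are invented for the example.

```python
import re

m = re.match(r'(?P<user>\w+)@(?P<host>\w+)(?::(?P<port>\d+))?',
             'alice@example')

whole = m.group()             # same as m.group(0)
user = m.group(1)             # by number
host = m.group('host')        # by name
both = m.group(1, 2)          # several groups at once give a tuple
port = m.group('port')        # the optional group did not match

all_groups = m.groups()       # unmatched groups appear as None
with_default = m.groups('80') # or as the supplied default
as_dict = m.groupdict('80')
```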

start([ group])

Index of start of substring matched by group (or start of entire matched string if no group).

end([ group])

Index of end of substring matched by group (or end of entire matched string if no group).

span([ group])

Return a tuple of starting and ending indexes of group (or matched string if no group).
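
The indexes returned by start, end, and span are ordinary slice positions, as this sketch with a made-up input shows.

```python
import re

s = 'range 10-25 ok'   # hypothetical input
m = re.search(r'(\d+)-(\d+)', s)

whole_span = m.span()          # (start, end) of the entire match
second_span = m.span(2)        # (start, end) of group 2
sliced = s[m.start():m.end()]  # slicing with the indexes recovers the match
```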

expand( template)

Return a string obtained by doing backslash substitution on template. Character escapes, numeric backreferences, and named backreferences are expanded.
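
A minimal sketch of expand with numeric and named backreferences. The name pattern and input are illustrative; \g<name> is the template syntax for a named group.

```python
import re

m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Ada Lovelace')

by_number = m.expand(r'\2, \1')
by_name = m.expand(r'\g<last>, \g<first>')
```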

lastgroup

Name of the last matching capture group, or None if no match or if the group had no name.

lastindex

Index of the last matching capture group, or None if no match.
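
One common use of lastgroup is to learn which branch of an alternation matched. The sketch below assumes a two-branch pattern with invented group names.

```python
import re

token = re.compile(r'(?P<word>[a-z]+)|(?P<number>\d+)')

m = token.match('123')
kind = m.lastgroup      # name of the group that matched
index = m.lastindex     # its number

# With unnamed groups, lastindex is still set but lastgroup is None.
plain = re.match(r'(\w+) (\w+)', 'a b')
```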

Unicode Support

re provides limited Unicode support. Strings may contain Unicode characters, and individual Unicode characters can be specified with \u. Additionally, the UNICODE flag causes \w, \W, \b, and \B to recognize all Unicode alphanumerics. However, re does not provide support for matching Unicode properties, blocks, or categories.
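
The following sketch shows the UNICODE flag applied to Unicode strings containing accented letters. The sample text is invented. Only the flagged behavior is shown, because the result without the flag differs between Python 2, where \w is ASCII-only by default, and Python 3, where text patterns use Unicode matching by default.

```python
import re

text = u'caf\u00e9 na\u00efve'   # hypothetical input with accented letters

# With UNICODE, \w treats the accented letters as word characters.
words = re.findall(u'\\w+', text, re.UNICODE)

# A single Unicode character can be written with \u in the pattern string.
e_acute = re.search(u'\u00e9', text)
```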

Examples

Example 1-13. Simple match

# Match Spider-Man, Spiderman, SPIDER-MAN, etc.
import re

dailybugle = 'Spider-Man Menaces City!'
pattern = r'spider[- ]?man.'

if re.match(pattern, dailybugle, re.IGNORECASE):
    print(dailybugle)

Example 1-14. Match and capture group

# Match dates formatted like MM/DD/YYYY, MM-DD-YY, ...
import re

date = '12/30/1969'
regex = re.compile(r'(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)')
match = regex.match(date)

if match:
    month = match.group(1)  # 12
    day = match.group(2)    # 30
    year = match.group(3)   # 1969

Example 1-15. Simple substitution

# Convert <br> to <br /> for XHTML compliance
import re

text = 'Hello world. <br>'
regex = re.compile(r'<br>', re.IGNORECASE)
repl = r'<br />'

result = regex.sub(repl, text)

Example 1-16. Harder substitution

# urlify - turn URLs into HTML links
import re

text = 'Check the website, http://www.oreilly.com/catalog/repr.'

pattern = r'''
    \b                          # start at word boundary
    (                           # capture to \1
    (https?|telnet|gopher|file|wais|ftp) :
                                # resource and colon
    [\w/#~:.?+=&%@!\-] +?       # one or more valid chars
                                # take little as possible
    )
    (?=                         # lookahead
    [.:?\-] *                   #  for possible punc
    (?: [^\w/#~:.?+=&%@!\-]     #  invalid character
    | $ )                       #  or end of string
    )'''

regex = re.compile(pattern, re.IGNORECASE | re.VERBOSE)

# The listing stops at the compile step; this substitution is an assumed
# final step that wraps each captured URL (\1) in an anchor tag.
result = regex.sub(r'<a href="\1">\1</a>', text)


License.
All information of this service is derived from the free sources and is provided solely in the form of quotations. This service provides information and interfaces solely for the familiarization (not ownership) and under the "as is" condition.
Copyright 2016 © ELTASK.COM. All rights reserved.