PCRE Library

The Perl Compatible Regular Expression (PCRE) C library is a free, open source, regular expression library developed by Philip Hazel. PCRE has been incorporated into PHP, Apache 2.0, KDE, Exim MTA, Analog, and Postfix.

Version: PCRE 4.0

Engine Type: Traditional NFA

Web Site: http://www.pcre.org

Supported Metacharacters

The following tables list the supported metacharacters:

Special Characters

Sequence Description
\a Alarm (beep)
\b In a character class, matches a backspace character
\e Escape
\f Form feed
\n Newline
\r Carriage Return
\t Horizontal Tab
\octal Character specified by a three-digit octal code
\xhex Character specified by a one- or two-digit hexadecimal code
\x{hex} Character specified by any hexadecimal code
\cchar Named control character

Character Classes

Class Description
[...] A single character listed or contained in a listed range
[^...] A single character not listed and not contained within a listed range
[:class:] POSIX-style character class valid only within a regex character class
. Any character except newline (unless single-line mode, /s)
\C One byte; however, this may corrupt a Unicode character stream
\w Word character, [a-zA-Z0-9_]
\W Non-word character, [^a-zA-Z0-9_]
\d Digit character, [0-9]
\D Non-digit character, [^0-9]
\s Whitespace character, [\n\r\f\t ]
\S Non-whitespace character, [^\n\r\f\t ]

Anchors and zero-width tests

Sequence Description
^ Start of string, or after any newline if in multiline match mode, /m
\A Start of search string, in all match modes
$ End of search string or before a string-ending newline, or before any newline if in multiline match mode, /m
\Z End of string or before a string-ending newline, in any match mode
\z End of string, in any match mode
\G Beginning of current search
\b Word boundary; position between a word character (\w) and either a non-word character (\W), the start of the string, or the end of the string
\B Not-word-boundary
(?=...) Positive lookahead
(?!...) Negative lookahead
(?<=...) Positive lookbehind
(?<!...) Negative lookbehind
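The \b entry in the table above can be spelled out in plain C. The following is only a sketch of the definition, not PCRE's own implementation: the helper names is_word_char and is_word_boundary are made up for illustration, and word characters are assumed to be [a-zA-Z0-9_] as in the default tables.

```c
#include <ctype.h>
#include <stddef.h>

/* Word character as listed for \w: [a-zA-Z0-9_] (default "C" locale assumed). */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Returns 1 if position pos (0..len) in subject is a \b word boundary.
   The positions before the start and after the end count as non-word,
   so a boundary exists exactly when one side is a word character
   and the other side is not. */
int is_word_boundary(const char *subject, size_t len, size_t pos)
{
    int before = (pos > 0)   && is_word_char((unsigned char)subject[pos - 1]);
    int after  = (pos < len) && is_word_char((unsigned char)subject[pos]);
    return before != after;
}
```

\B is simply the negation: it matches at every position where is_word_boundary( ) would return 0.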

Modifiers

Modifier/sequence      Mode character  Description
PCRE_CASELESS          i               Case-insensitive matching for characters with code point values less than 256
PCRE_MULTILINE         m               ^ and $ match next to embedded \n
PCRE_DOTALL            s               Dot (.) matches newline
PCRE_EXTENDED          x               Ignore whitespace and allow comments (#) in pattern
PCRE_UNGREEDY          U               Reverse greediness of all quantifiers: * becomes non-greedy and *? becomes greedy
PCRE_ANCHORED                          Force match to start at the first position searched
PCRE_DOLLAR_ENDONLY                    Force $ to match only at the very end of a string, not before a string-ending newline. Overridden by multiline mode
PCRE_NO_AUTO_CAPTURE                   Disable capturing function of parentheses
PCRE_UTF8                              Treat regular expression and subject strings as strings of multibyte UTF-8 characters
(?mode)                                Turn listed modes (imsxU) on for the rest of the subexpression
(?-mode)                               Turn listed modes (imsxU) off for the rest of the subexpression
(?mode:...)                            Turn listed modes (xsmi) on within parentheses
(?-mode:...)                           Turn listed modes (xsmi) off within parentheses
\Q                                     Quote all following regex metacharacters
\E                                     End a span started with \Q
(?#...)                                Treat substring as a comment
#...                                   Treat rest of line as a comment in PCRE_EXTENDED mode

Grouping, capturing, conditional, and control

Sequence Description
(...) Group subpattern and capture submatch into \1,\2,..
(?P<name>...) Group subpattern and capture submatch into named capture group, name
\n Contains the results of the nth earlier submatch from a parentheses capture group or a named capture group
(?:...) Group subpattern, but do not capture submatch
(?>...) Disallow backtracking for text matched by subpattern
...|.. Try subpatterns in alternation
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{x,y} Match at least x times, but no more than y times
*? Match 0 or more times, but as few times as possible
+? Match 1 or more times, but as few times as possible
?? Match 0 or 1 time, but as few times as possible
{n,}? Match at least n times, but as few times as possible
{x,y}? Match at least x times, no more than y times, and as few times as possible
*+ Match 0 or more times, and never backtrack
++ Match 1 or more times, and never backtrack
?+ Match 0 or 1 times, and never backtrack
{n}+ Match exactly n times, and never backtrack
{n,}+ Match at least n times, and never backtrack
{x,y}+ Match at least x times, no more than y times, and never backtrack
(?(condition)...|...) Match with if-then-else pattern. The condition can be either the number of a capture group or a lookahead or lookbehind construct
(?(condition)...) Match with if-then pattern. The condition can be either the number of a capture group or a lookahead or lookbehind construct

PCRE API

Applications using PCRE should look for the API prototypes in pcre.h and link against the library file, libpcre.a, by compiling with -lpcre.

Most functionality is contained in the functions pcre_compile( ), which prepares a regular expression data structure, and pcre_exec( ), which performs the pattern matching. You are responsible for freeing memory, although PCRE does provide pcre_free_substring( ) and pcre_free_substring_list( ) to help out.

PCRE API Synopsis

pcre *pcre_compile(const char * pattern, int options, const char ** errptr, int * erroffset, const unsigned char * tableptr)

Compile pattern with optional mode modifiers options and optional locale tables tableptr, which are created with pcre_maketables( ). Returns either a compiled regex or NULL with errptr pointing to an error message and erroffset pointing to the position in pattern where the error occurred.
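Because erroffset is a position within pattern, it can be turned into a visual marker under the offending character. The sketch below is one possible way to do that in plain C; make_error_marker and report_compile_error are hypothetical helper names, not part of the PCRE API, and the error and erroffset arguments are assumed to be the values pcre_compile( ) filled in when it returned NULL.

```c
#include <stdio.h>
#include <string.h>

/* Writes erroffset spaces followed by '^' into out, so that the caret lines up
   with the character at erroffset when printed under the pattern.
   Returns 0 on success, -1 if the offset is out of range or out is too small. */
int make_error_marker(const char *pattern, int erroffset, char *out, size_t outsize)
{
    size_t len = strlen(pattern);
    if (erroffset < 0 || (size_t)erroffset > len || outsize < (size_t)erroffset + 2)
        return -1;
    memset(out, ' ', (size_t)erroffset);
    out[erroffset] = '^';
    out[erroffset + 1] = '\0';
    return 0;
}

/* Hypothetical reporting helper: prints the message, then the pattern
   with a caret under the reported position. */
void report_compile_error(const char *pattern, const char *error, int erroffset)
{
    char marker[256];
    fprintf(stderr, "Compilation failed at offset %d: %s\n", erroffset, error);
    if (make_error_marker(pattern, erroffset, marker, sizeof marker) == 0)
        fprintf(stderr, "  %s\n  %s\n", pattern, marker);
}
```

An offset equal to the pattern length is accepted on purpose, since an error such as a missing closing parenthesis may be reported at the end of the pattern.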

int pcre_exec(const pcre * code, const pcre_extra *extra, const char * subject, int length, int startoffset, int options, int * ovector, int ovecsize)

Perform pattern matching with a compiled regular expression, code, and a supplied input string, subject, of length length. The results of a successful match are stored in ovector. The first and second elements of ovector contain the position of the first character in the overall match and the position of the character following the end of the overall match. Each additional pair of elements, up to two thirds the length of ovector, contains the position of the first character of a capture group submatch and the position of the character following that submatch. The optional parameter options contains mode modifiers, and extra contains the results of a call to pcre_study( ).
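The pair layout of ovector described above can be read with a small helper. This is a minimal sketch in plain C that needs no PCRE headers; group_span and max_reported_groups are illustrative names, and it assumes the usual pcre_exec( ) behavior in which a group that did not take part in the match is reported with offsets of -1.

```c
/* Number of (start, end) pairs that fit in the first two thirds
   of a vector of ovecsize ints. */
int max_reported_groups(int ovecsize)
{
    return ovecsize / 3;
}

/* Looks up capture group `group` (0 = overall match) in ovector.
   rc is the positive value returned by pcre_exec( ).
   On success stores the start offset and length of the submatch and returns 1;
   returns 0 if the group is out of range or did not participate in the match. */
int group_span(const int *ovector, int rc, int group, int *start, int *length)
{
    if (group < 0 || group >= rc)
        return 0;
    if (ovector[2 * group] < 0)   /* unset groups are assumed to hold -1 */
        return 0;
    *start  = ovector[2 * group];
    *length = ovector[2 * group + 1] - ovector[2 * group];
    return 1;
}
```

The start and length values can be passed straight to printf("%.*s", length, subject + start) to print a submatch without copying it.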

pcre_extra *pcre_study(const pcre * code, int options, const char ** errptr)

Return information to speed up calls to pcre_exec( ) with code. There are currently no options, so options should always be zero. If an error occurred, errptr points to an error message.

int pcre_copy_named_substring(const pcre * code, const char * subject, int * ovector, int stringcount, const char * stringname, char * buffer, int buffersize)

Copy the substring matched by the named capture group stringname into buffer. stringcount is the number of substrings placed into ovector, usually the result returned by pcre_exec( ).

int pcre_copy_substring(const char * subject, int * ovector, int stringcount, int stringnumber, char * buffer, int buffersize)

Copy the substring matched by the numbered capture group stringnumber into buffer. stringcount is the number of substrings placed into ovector, usually the result returned by pcre_exec( ).

int pcre_get_named_substring(const pcre * code, const char * subject, int * ovector, int stringcount, const char * stringname, const char ** stringptr)

Create a new string, pointed to by stringptr, containing the substring matched by the named capture group stringname. Returns the length of the substring. stringcount is the number of substrings placed into ovector, usually the result returned by pcre_exec( ).

int pcre_get_stringnumber(const pcre * code, const char * name)

Return the numbered capture group associated with the named capture group, name.

int pcre_get_substring(const char * subject, int * ovector, int stringcount, int stringnumber, const char ** stringptr)

Create a new string, pointed to by stringptr, containing the substring matched by the numbered capture group stringnumber. Returns the length of the substring. stringcount is the number of substrings placed into ovector, usually the result returned by pcre_exec( ).

int pcre_get_substring_list(const char * subject, int * ovector, int stringcount, const char *** listptr)

Return a list of pointers, listptr, to all captured substrings.

void pcre_free_substring(const char * stringptr)

Free memory pointed to by stringptr and allocated by pcre_get_substring( ) or pcre_get_named_substring( ).

void pcre_free_substring_list(const char ** stringptr)

Free memory pointed to by stringptr and allocated by pcre_get_substring_list( ).

const unsigned char *pcre_maketables(void)

Build character tables for the current locale.

int pcre_fullinfo(const pcre * code, const pcre_extra * extra, int what, void * where)

Place info on a regex specified by what into where. Available values for what are PCRE_INFO_BACKREFMAX, PCRE_INFO_CAPTURECOUNT, PCRE_INFO_FIRSTBYTE, PCRE_INFO_FIRSTTABLE, PCRE_INFO_LASTLITERAL, PCRE_INFO_NAMECOUNT, PCRE_INFO_NAMEENTRYSIZE, PCRE_INFO_NAMETABLE, PCRE_INFO_OPTIONS, PCRE_INFO_SIZE, and PCRE_INFO_STUDYSIZE.

int pcre_config(int what, void * where)

Place the value of build-time options specified by what into where. Available values for what are PCRE_CONFIG_UTF8, PCRE_CONFIG_NEWLINE, PCRE_CONFIG_LINK_SIZE, PCRE_CONFIG_POSIX_MALLOC_THRESHOLD, and PCRE_CONFIG_MATCH_LIMIT.

char *pcre_version(void)

Return a pointer to a string containing the PCRE version and release date.

void *(*pcre_malloc)(size_t)

Entry point PCRE uses for malloc( ) calls.

void (*pcre_free)(void *)

Entry point PCRE uses for free( ) calls.
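Since pcre_malloc and pcre_free are function pointers, an application can point them at its own allocator. The sketch below shows a pair of replacement functions with matching signatures that count outstanding allocations; the names counting_malloc, counting_free, and counting_live_blocks are made up for this illustration, and the block deliberately avoids including pcre.h so that it compiles on its own.

```c
#include <stdlib.h>

static long live_blocks = 0;   /* allocations not yet freed */

/* Same signature as the pcre_malloc entry point: void *(*)(size_t). */
void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Same signature as the pcre_free entry point: void (*)(void *). */
void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Reports how many blocks are currently outstanding. */
long counting_live_blocks(void)
{
    return live_blocks;
}

/* With <pcre.h> included, the hooks would be installed before any other
   PCRE call, for example:
       pcre_malloc = counting_malloc;
       pcre_free   = counting_free;
*/
```

A nonzero count at program exit would then suggest that a compiled pattern or study block was never released.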

int (*pcre_callout)(pcre_callout_block *)

Can be set to a callout function that will be called during matches.

Unicode Support

PCRE provides basic Unicode support. When a pattern is compiled with the PCRE_UTF8 flag, the pattern will run on Unicode text. However, PCRE has no capability to recognize any properties of characters whose values are greater than 255.

PCRE determines case and the property of being a letter or digit based on a set of default tables. You can supply an alternate set of tables based on a different locale. For example:

setlocale(LC_CTYPE, "fr");
tables = pcre_maketables( );
re = pcre_compile(..., tables);

Examples

Example 1-17 and Example 1-18 are adapted from an open source example written by Philip Hazel and copyright by the University of Cambridge, England.

Example 1-17. Simple match

#include <stdio.h>
#include <string.h>
#include <pcre.h>

#define CAPTUREVECTORSIZE 30 /* should be a multiple of 3 */

int main(int argc, char **argv)
{
    pcre *regex;
    const char *error;
    int erroffset;
    int capturevector[CAPTUREVECTORSIZE];
    int rc;

    const char *pattern = "spider[- ]?man";
    const char *text = "SPIDERMAN menaces city!";

    /* Compile Regex */
    regex = pcre_compile(
        pattern,
        PCRE_CASELESS,  /* OR'd mode modifiers */
        &error,         /* error message */
        &erroffset,     /* position in regex where error occurred */
        NULL);          /* use default locale */

    /* Handle Errors */
    if (regex == NULL)
    {
        printf("Compilation failed at offset %d: %s\n", erroffset, error);
        return 1;
    }

    /* Try Match */
    rc = pcre_exec(
        regex,              /* compiled regular expression */
        NULL,               /* optional results from pcre_study */
        text,               /* input string */
        (int)strlen(text),  /* length of input string */
        0,                  /* starting position in input string */
        0,                  /* OR'd options */
        capturevector,      /* holds results of capture groups */
        CAPTUREVECTORSIZE);

    /* Handle Errors */
    if (rc < 0)
    {
        switch (rc)
        {
        case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
        default: printf("Matching error %d\n", rc); break;
        }
        pcre_free(regex);   /* release the compiled pattern */
        return 1;
    }

    pcre_free(regex);       /* release the compiled pattern */
    return 0;
}

Example 1-18. Match and capture group

#include <stdio.h>
#include <string.h>
#include <pcre.h>

#define CAPTUREVECTORSIZE 30 /* should be a multiple of 3 */

int main(int argc, char **argv)
{
    pcre *regex;
    const char *error;
    int erroffset;
    int capturevector[CAPTUREVECTORSIZE];
    int rc, i;

    const char *pattern = "(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)";
    const char *text = "12/30/1969";

    /* Compile the Regex */
    regex = pcre_compile(
        pattern,
        PCRE_CASELESS,  /* OR'd mode modifiers */
        &error,         /* error message */
        &erroffset,     /* position in regex where error occurred */
        NULL);          /* use default locale */

    /* Handle compilation errors */
    if (regex == NULL)
    {
        printf("Compilation failed at offset %d: %s\n", erroffset, error);
        return 1;
    }

    rc = pcre_exec(
        regex,              /* compiled regular expression */
        NULL,               /* optional results from pcre_study */
        text,               /* input string */
        (int)strlen(text),  /* length of input string */
        0,                  /* starting position in input string */
        0,                  /* OR'd options */
        capturevector,      /* holds results of capture groups */
        CAPTUREVECTORSIZE);

    /* Handle Match Errors */
    if (rc < 0)
    {
        switch (rc)
        {
        case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
        /* Handle other special cases if you like */
        default: printf("Matching error %d\n", rc); break;
        }
        pcre_free(regex);   /* release the compiled pattern */
        return 1;
    }

    /* Match succeeded */
    printf("Match succeeded\n");

    /* Check whether the output vector had room for all capture groups */
    if (rc == 0)
    {
        rc = CAPTUREVECTORSIZE / 3;
        printf("ovector only has room for %d captured substrings\n", rc - 1);
    }

    /* Show capture groups */
    for (i = 0; i < rc; i++)
    {
        const char *substring_start = text + capturevector[2*i];
        int substring_length = capturevector[2*i+1] - capturevector[2*i];
        printf("%2d: %.*s\n", i, substring_length, substring_start);
    }

    pcre_free(regex);       /* release the compiled pattern */
    return 0;
}

