IT. Expert System.

REGEX

Character Classes

A character class defines a set of characters, any one of which can match at a single position in the input string. A character class always matches exactly one character.

Most regular expression engines support the following character classes, although support for POSIX classes and Unicode categories varies from engine to engine:

  • Positive Character Groups ([...])
  • Negative Character Groups ([^...])
  • Any character (.)
  • Word Character (\w)
  • Non-word Character (\W)
  • Whitespace Character (\s)
  • Non-whitespace Character (\S)
  • Decimal Digit (\d)
  • Non-decimal Digit (\D)
  • POSIX Character Class ([[:name:]])
  • General Unicode Category / Named Block (\p{name})

Positive Character Groups

A positive character group specifies a list of characters, any one of which may appear in an input string. This list of characters may be specified individually, as a range, or both.

The syntax for specifying a list of individual characters is as follows:

[chars]
Value       Description
chars       Specifies a list of the individual characters that can appear in the input string for a match to succeed

Note: chars can consist of any combination of one or more literal characters, escape characters, or character classes.

Examples:

[abc] will match either 'a', 'b', or 'c'
[bct]ar will match "bar", "car", and "tar"
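The two examples above can be tried with Python's built-in re module. This is a minimal sketch: Python is chosen only for illustration, and full_match is a hypothetical helper that requires the pattern to match the entire string.

```python
import re

def full_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

# [abc] matches exactly one character: 'a', 'b', or 'c'
print([c for c in "abcd" if full_match(r"[abc]", c)])  # ['a', 'b', 'c']

# [bct]ar: one character from the group, then the literal text "ar"
words = ["bar", "car", "tar", "far"]
print([w for w in words if full_match(r"[bct]ar", w)])  # ['bar', 'car', 'tar']

# The group consumes a single character, so "ab" is not a match
print(full_match(r"[abc]", "ab"))  # False
```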

The syntax for specifying a range of characters is as follows:

[firstchar-lastchar]
Value       Description
firstchar   Specifies the character that begins the range
lastchar    Specifies the character that ends the range

Note: The - character is treated as a literal hyphen if it is the first character within the brackets (after the ^, if present) or the last one, as in [-abc] and [abc-].

Note: Two or more character ranges can be concatenated. For example, to specify the range of decimal digits from "0" through "9", the range of lowercase letters from "a" through "f", and the range of uppercase letters from "A" through "F", use [0-9a-fA-F].

Examples:

[2-5] will match either '2', '3', '4', or '5'
[a-z] will match any lower-case letter from 'a' to 'z'
[-abc] will match either 'a', 'b', 'c', or '-'
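A minimal Python sketch of the range examples, assuming Python's re module (these simple bracket forms behave the same way in most engines). The variable name hex_digit is illustrative.

```python
import re

# [2-5]: any one character from '2' through '5'
print(re.findall(r"[2-5]", "0123456789"))  # ['2', '3', '4', '5']

# [a-z]: lower-case ASCII letters only
print(re.findall(r"[a-z]", "Hello"))  # ['e', 'l', 'l', 'o']

# [-abc]: a leading hyphen is a literal '-', not the start of a range
print(re.findall(r"[-abc]", "ab-z"))  # ['a', 'b', '-']

# Concatenated ranges: exactly one hexadecimal digit
hex_digit = re.compile(r"[0-9a-fA-F]")
print(bool(hex_digit.fullmatch("F")), bool(hex_digit.fullmatch("g")))  # True False
```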

Negative Character Groups

A negative character group specifies a list of characters that must not appear at the matched position for a match to occur; the group matches any single character that is not in the list. This list of characters may be specified individually, as a range, or both.

The syntax for specifying a list of individual characters is as follows:

[^chars]
Value       Description
chars       Specifies a list of the individual characters that cannot appear at the matched position for a match to succeed

Note: chars can consist of any combination of one or more literal characters, escape characters, or character classes.

Examples:

[^abc] will match any character except 'a', 'b', or 'c'
[^bct]ar will not match "bar", "car", or "tar"
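The negative-group examples above, sketched with Python's re module for illustration. The last line shows a point that is easy to miss: a negative group still consumes one character.

```python
import re

# [^abc]: any ONE character that is not 'a', 'b', or 'c'
print(re.findall(r"[^abc]", "abcxyz"))  # ['x', 'y', 'z']

# [^bct]ar rejects "bar", "car", and "tar" but accepts other first letters
for word in ["bar", "car", "tar", "far", "jar"]:
    print(word, re.fullmatch(r"[^bct]ar", word) is not None)

# A negative group still has to match one character, so "ar" alone fails
print(re.fullmatch(r"[^bct]ar", "ar"))  # None
```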

The syntax for specifying a range of characters is as follows:

[^firstchar-lastchar]
Value       Description
firstchar   Specifies the character that begins the range
lastchar    Specifies the character that ends the range

Examples:

[^2-5] will match any character except '2', '3', '4', or '5'
[^a-z] will match any character except a lower-case letter from 'a' to 'z'

Note: Two or more character ranges can be concatenated. For example, to exclude the decimal digits from "0" through "9", the lowercase letters from "a" through "f", and the uppercase letters from "A" through "F" (that is, to match any character that is not a hexadecimal digit), use [^0-9a-fA-F].
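A short Python sketch of negated ranges, assuming Python's re module; the inputs are illustrative.

```python
import re

# [^2-5]: any single character outside '2' through '5'
print(re.findall(r"[^2-5]", "123456"))  # ['1', '6']

# [^a-z]: anything that is not a lower-case ASCII letter (spaces included)
print(re.findall(r"[^a-z]", "aB1 c"))  # ['B', '1', ' ']

# Negated concatenated ranges: characters that are not hexadecimal digits
print(re.findall(r"[^0-9a-fA-F]", "0x1G"))  # ['x', 'G']
```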

Any character

The period character (.) matches any character except \n (the newline character, \u000A), with the following two qualifications:

  • In a positive or negative character group, a period is treated as a literal period character, and not as a character class
  • If the "singleline" option is turned on for the regular expression (called DOTALL in some engines, such as Java and Python), the period character also matches the newline character

Examples:

.at will match any single character (other than a newline) followed by "at"
[.,;] will match a period, comma, or semicolon
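In Python the option this section calls "singleline" is spelled re.DOTALL. A minimal sketch, assuming Python's re module; the variable text is an illustrative input.

```python
import re

text = "cat hat\nat"

# By default '.' does not match '\n', so the final "\nat" is skipped
print(re.findall(r".at", text))  # ['cat', 'hat']

# With re.DOTALL ("singleline"), '.' also matches the newline
print(re.findall(r".at", text, re.DOTALL))  # ['cat', 'hat', '\nat']

# Inside a character group the period is a literal '.'
print(re.findall(r"[.,;]", "a.b,c;d"))  # ['.', ',', ';']
```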

Decimal Digit

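\d matches a decimal digit character. In ASCII-only matching, a decimal digit is one of the standard digits 0-9, and \d is equivalent to [0-9]. Unicode-aware engines (for example .NET, and Python 3 for str patterns) also match decimal digits from other scripts unless an ASCII or ECMAScript option restricts \d to [0-9].

\D matches any character that is not a decimal digit; in ASCII-only matching it is equivalent to [^0-9].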

Examples:

\d% will match a decimal digit followed by a percent character
\D% will match any character other than a decimal digit, followed by a percent character
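A minimal Python sketch of \d and \D, assuming Python's re module. It also shows the Unicode behavior: in Python 3, \d on str patterns matches any Unicode decimal digit, and the re.ASCII flag narrows it to [0-9]. The variable names are illustrative.

```python
import re

sample = "5% x% 10%"
# \d%: one decimal digit followed by '%'
print(re.findall(r"\d%", sample))  # ['5%', '0%']
# \D%: one non-digit character followed by '%'
print(re.findall(r"\D%", sample))  # ['x%']

# U+0665 is ARABIC-INDIC DIGIT FIVE, a Unicode decimal digit
five = "\u0665"
print(re.fullmatch(r"\d", five) is not None)            # True
print(re.fullmatch(r"\d", five, re.ASCII) is not None)  # False
```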

Whitespace Character

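\s matches a whitespace character. In ASCII-only matching, a whitespace character is a space (' '), form feed ('\f'), newline ('\n'), carriage return ('\r'), tab ('\t'), or vertical tab ('\v') character, and \s is equivalent to [ \f\n\r\t\v]. Unicode-aware engines may also treat other Unicode space characters, such as the no-break space (U+00A0), as whitespace.

\S matches a non-whitespace character; in ASCII-only matching it is equivalent to [^ \f\n\r\t\v].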

Examples:

,\s will match a comma followed by a whitespace character
,\S will match a comma followed by a non-whitespace character
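A Python sketch of \s and \S, assuming Python's re module. It checks the six ASCII whitespace characters listed above and shows that, without re.ASCII, Python also counts the no-break space as whitespace.

```python
import re

sample = "a, b,c,\td"
# ,\s: a comma followed by one whitespace character (space or tab here)
print(re.findall(r",\s", sample))  # [', ', ',\t']
# ,\S: a comma followed by one non-whitespace character
print(re.findall(r",\S", sample))  # [',c']

# The six ASCII whitespace characters listed above all match \s
print(all(re.fullmatch(r"\s", ch) for ch in " \f\n\r\t\v"))  # True

# Without re.ASCII, the no-break space U+00A0 also matches \s
print(re.fullmatch(r"\s", "\u00a0") is not None)            # True
print(re.fullmatch(r"\s", "\u00a0", re.ASCII) is not None)  # False
```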

Word Character

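\w matches a word character. In ASCII-only matching, a word character is a letter, digit, or the underscore ('_') character, and \w is equivalent to [a-zA-Z_0-9]. Unicode-aware engines also count letters and digits from other scripts (for example 'é') as word characters.

\W matches a non-word character; in ASCII-only matching it is equivalent to [^a-zA-Z_0-9].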

Examples:

\w, will match any word character followed by a comma (',')
abc\W will match "abc" followed by any non-word character
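A minimal Python sketch of \w and \W, assuming Python's re module. Python 3's \w is Unicode-aware by default; passing re.ASCII gives the [a-zA-Z_0-9] behavior described above. The name e_acute is illustrative.

```python
import re

# \w,: one word character followed by a comma
print(re.findall(r"\w,", "a, _, -,"))  # ['a,', '_,']

# abc\W: "abc" followed by one non-word character
print(re.findall(r"abc\W", "abc! abcd abc "))  # ['abc!', 'abc ']

# Unicode letters count as word characters unless re.ASCII is given
e_acute = "\u00e9"  # 'é'
print(re.fullmatch(r"\w", e_acute) is not None)            # True
print(re.fullmatch(r"\w", e_acute, re.ASCII) is not None)  # False
```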

POSIX Character Class

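POSIX defines several named character classes, written as [:name:], that can be used only inside a bracket expression (a positive or negative character group), for example [[:lower:]] or [^[:digit:]]. Support varies by engine: POSIX tools and PCRE accept this syntax, while some engines, such as Python's re module, do not.

Note: Refer to the section POSIX Character Classes for a list of the available classes.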

Examples:

[[:lower:]] will match any lower-case character

General Unicode Category / Named Block

\p{name} matches any character that belongs to a Unicode general category or named block.

\P{name} matches any character that does not belong to a Unicode general category or named block.

Value       Description
name        Specifies the category abbreviation or named block name

Note: Refer to the section Unicode General Categories for a list of supported categories and the section Named Blocks for a list of supported named blocks.

Examples:

\p{IsGreek} will match a Greek character (in .NET, IsGreek names the Greek and Coptic block, U+0370 through U+03FF)
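Python's built-in re module does not accept \p{name}, so this minimal sketch approximates \p{IsGreek} and \P{IsGreek} with an explicit code-point range, assuming the .NET definition of the IsGreek block (U+0370 through U+03FF). A named block is not the same as a script: polytonic Greek letters in the Greek Extended block (U+1F00 through U+1FFF) fall outside this range. The names is_greek and not_greek are illustrative.

```python
import re

# Illustrative stand-ins for \p{IsGreek} and \P{IsGreek}, assuming the
# .NET definition of the block: code points U+0370 through U+03FF.
is_greek = re.compile(r"[\u0370-\u03FF]")
not_greek = re.compile(r"[^\u0370-\u03FF]")

print(is_greek.findall("alpha = \u03b1, beta = \u03b2"))  # ['α', 'β']
print(not_greek.fullmatch("a") is not None)              # True

# Greek Extended (U+1F00-U+1FFF) is a separate block, so this range misses it
print(is_greek.fullmatch("\u1f00") is None)              # True
```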

License.
All information of this service is derived from the free sources and is provided solely in the form of quotations. This service provides information and interfaces solely for the familiarization (not ownership) and under the "as is" condition.
Copyright 2016 © ELTASK.COM. All rights reserved.