IT. Expert System.

REGEX

Regex Engine Internals


Regex Engine Internals

Pattern matching is the process of finding a secion of text that is described by a regular expression. The underlying system that trys to match the text is known as a regular expression engine.

All regular expression engines follow these two rules:

  • leftmost matches are found first
    Target strings are searched starting at the first character and continues until the last - once a match is found, the engine is done

  • standard quantifiers are "greedy"
    Regular expression quantifiers direct the engine as to how many times something can be repeated - the standard quantifiers try to match as many times as possible

When working with Regular Expressions, it is important to know the engine type. There are two classes of engines:

  • Deterministic Finite Automaton (DFA)
  • Nondeterministic Finite Automaton (NFA)

In general, DFA engines are faster, but lack capturing, lookaround, and nongreedy quantifiers.

DFA Engines

DFA engines simply compares each character in the target to the regular expression. Without support for capture or lookaround, each character is only compared once making DFA engines the fastest.

Note: The alternation (|) metasequence always matches the longest option

NFA Engines

There are two types of NFA engines: Traditional and POSIX:

Traditional NFA Engines

Traditional NFA engines compare each element of the regex to the target string, keeping track of positions where it between two options in the regex. If any option fails, the engine works backwards to the last saved position.

For standard quantifiers, the engine chooses the greedy option first. If the option fails, the engine returns to the saved position and tries any less greedy options.

For alternation, the traditional NFA engine tries each option sequentially.

POSIX NFA Engines

POSIX NFA engines are similar to Traditional NFA engines with one exception: a POSIX engine always chooses the longest of the lestmost matches.



Content

Android Reference

Java basics

Java Enterprise Edition (EE)

Java Standard Edition (SE)

SQL

HTML

PHP

CSS

Java Script

MYSQL

JQUERY

VBS

REGEX

C

C++

C#

Design patterns

RFC (standard status)

RFC (proposed standard status)

RFC (draft standard status)

RFC (informational status)

RFC (experimental status)

RFC (best current practice status)

RFC (historic status)

RFC (unknown status)

IT dictionary

License.
All information of this service is derived from the free sources and is provided solely in the form of quotations. This service provides information and interfaces solely for the familiarization (not ownership) and under the "as is" condition.
Copyright 2016 © ELTASK.COM. All rights reserved.
Site is optimized for mobile devices.
Downloads: 1623 / . Delta: 0.02217 с