Regular Expressions as a Language

The regular expression language is designed and optimized to manipulate text. The language comprises two basic character types: literal (normal) text characters and metacharacters. The set of metacharacters gives regular expressions their processing power.
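The distinction between literal characters and metacharacters can be seen in a few lines of code. This sketch uses Python's standard `re` module rather than the .NET regular expression classes. That substitution is made only for illustration; the basic syntax shown here (literal digits, the `.` metacharacter, and backslash escaping) behaves the same way in both engines.

```python
import re

text = "Y2K: 2000, 20x0"

# A pattern made only of literal characters matches exactly that text.
literal_hits = re.findall(r"2000", text)

# "." is a metacharacter: it matches any single character except a newline,
# so "20.0" matches both "2000" and "20x0".
meta_hits = re.findall(r"20.0", text)

# A backslash turns a metacharacter back into a literal: "\." matches only
# a period, so "REPORTXDOC" is not matched.
escaped_hits = re.findall(r"\.DOC", "REPORT.DOC REPORTXDOC")

# re.escape() backslash-escapes every metacharacter in a string so that the
# whole string is matched literally.
quoted = re.escape("a.b*c")

print(literal_hits)  # ['2000']
print(meta_hits)     # ['2000', '20x0']
print(escaped_hits)  # ['.DOC']
print(quoted)        # a\.b\*c
```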

You are probably familiar with the ? and * metacharacters used in the DOS file system: ? represents any single character, and * represents any sequence of characters. The DOS command COPY *.DOC A: tells the file system to copy every file that has a .DOC file name extension to the disk in drive A. Here the metacharacter * stands in for any file name in front of the file name extension .DOC. Regular expressions extend this basic idea many times over. They provide a large set of metacharacters that make it possible to describe very complex text-matching patterns with relatively few characters.
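The wildcard and regular-expression forms can be compared side by side. This is a minimal sketch using Python's `fnmatch` and `re` modules, not the .NET classes, and the file names are invented for illustration. Unlike DOS, the sketch compares names case-sensitively. This keeps the result independent of the operating system.

```python
import fnmatch
import re

files = ["REPORT.DOC", "MEMO2000.DOC", "BUDGET.XLS", "README"]

# Wildcard matching, as in COPY *.DOC A:. fnmatchcase() compares case-sensitively
# on every platform, so the result does not depend on the operating system.
by_wildcard = [f for f in files if fnmatch.fnmatchcase(f, "*.DOC")]

# The same selection as a regular expression: ".*" plays the role of "*",
# and "\." is a literal period. fullmatch() requires the whole name to match.
by_regex = [f for f in files if re.fullmatch(r".*\.DOC", f)]

# Regular expressions can say much more than wildcards can, for example
# "one or more capital letters, then exactly four digits, then .DOC".
by_richer_regex = [f for f in files if re.fullmatch(r"[A-Z]+\d{4}\.DOC", f)]

print(by_wildcard)      # ['REPORT.DOC', 'MEMO2000.DOC']
print(by_regex)         # ['REPORT.DOC', 'MEMO2000.DOC']
print(by_richer_regex)  # ['MEMO2000.DOC']
```

The last pattern has no wildcard equivalent, because ? and * cannot require "exactly four digits".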

For example, the regular expression \s2000, when applied to a body of text, matches every occurrence of the string "2000" that is immediately preceded by a white-space character, such as a space or a tab. The preceding white-space character is included in the match.
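The behavior of \s2000 can be checked on a short sample string. This sketch uses Python's `re` module, where \s has the same meaning, instead of the .NET `Regex` class. The sample text is invented. The lookbehind variation at the end is an extra technique that this topic does not describe.

```python
import re

text = "Released in 2000.\tThe\t2000 edition replaced Y2000 and 1999-2000."

# \s matches one white-space character (space, tab, newline, ...), so each
# match is that character followed by the literal digits "2000".
matches = re.findall(r"\s2000", text)

# "Y2000" and "-2000" are skipped because "Y" and "-" are not white space.
print(matches)  # [' 2000', '\t2000']

# A lookbehind, (?<=\s), requires the preceding white space without
# including it in the match.
digits_only = re.findall(r"(?<=\s)2000", text)
print(digits_only)  # ['2000', '2000']
```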

Note

If you are using C++, C#, or JScript and write the pattern as an ordinary string literal, special character escapes such as \s must be preceded by an additional backslash (for example, "\\s2000"). The extra backslash signals that the backslash in the character escape is a literal character. Otherwise, the compiler interprets the backslash as the start of a string escape sequence before the pattern reaches the regular expression engine, so the engine never receives the \s character escape. You do not have to add the backslash if you are using Visual Basic 2005, because its string literals do not support backslash escape sequences. If you are using C#, you can instead use C# verbatim string literals, which are prefixed with @ and disable escaping (for example, @"\s2000").
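Python string literals have the same issue as C++, C#, and JScript, and Python raw strings play the role of C# verbatim strings. This sketch is an analogy in Python, not C# code. It shows that a doubled backslash and a raw string yield the same pattern. It also shows that an undoubled escape is consumed before the regular expression engine ever sees it.

```python
import re

# In an ordinary Python string literal, the backslash must be doubled so that
# the pattern contains the two characters "\" and "s".
doubled = "\\s2000"

# A raw string (prefix r) leaves backslashes untouched, much like a C#
# verbatim string prefixed with @.
raw = r"\s2000"

# Both literals produce the same six characters: \ s 2 0 0 0
print(doubled == raw, len(raw))  # True 6

# Without doubling or the r prefix, Python processes recognized escapes when
# it reads the literal: "\t" becomes one tab character, not "\" plus "t".
print(len("\t"), len(r"\t"))  # 1 2

print(re.findall(raw, "Y 2000"))  # [' 2000']
```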

Regular expressions can also perform more complex searches. For example, the regular expression (?<char>\w)\k<char> uses a named group and a backreference to find a word character that is immediately followed by the same character. When applied to the string "I'll have a small coffee", it finds matches in the words "I'll", "small", and "coffee". (For details on this regular expression, see Backreferences.)
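The backreference example can be reproduced in Python's `re` module, but the syntax differs from .NET. Python writes a named group as (?P<name>...) and a named backreference as (?P=name). The sketch therefore uses a translated pattern rather than the exact .NET string.

```python
import re

# Python writes the .NET pattern (?<char>\w)\k<char> as (?P<char>\w)(?P=char):
# the named group "char" captures one word character, and the named
# backreference (?P=char) requires that same character to appear next.
pattern = re.compile(r"(?P<char>\w)(?P=char)")

text = "I'll have a small coffee"

# finditer() yields a match object for each non-overlapping match;
# group(0) is the full two-character match.
pairs = [m.group(0) for m in pattern.finditer(text)]
print(pairs)  # ['ll', 'll', 'ff', 'ee']

# The named group holds the single repeated character.
repeated = [m.group("char") for m in pattern.finditer(text)]
print(repeated)  # ['l', 'l', 'f', 'e']
```

The word "coffee" produces two separate matches, "ff" and "ee". This is why the three words yield four matches in total.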

The following sections detail the set of metacharacters that define the .NET Framework regular expression language and show how to use the regular expression classes to implement regular expressions in your applications.

See Also

Reference

System.Text.RegularExpressions

Concepts

Regular Expression Classes

Other Resources

Details of Regular Expression Behavior

Regular Expression Examples

Regular Expression Language Elements