Learn Regular Expression (Regex) syntax with C# and .NET

What are Regular Expressions?
Regular Expressions are a powerful pattern-matching language that is part of many modern programming languages. Regular Expressions allow you to apply a pattern to an input string and return a list of the matches within the text. Regular Expressions also allow text to be replaced using replacement patterns. In effect, they are a more powerful form of find and replace.
There are two parts to learning Regular Expressions:
- Learning the Regex syntax
- Learning how to work with Regex in your programming language
This article introduces you to the Regular Expression syntax. Once you have learned the syntax, you can apply it in many different languages, because it is fairly similar across them.
Microsoft .NET contains a set of classes for working with Regular Expressions in the System.Text.RegularExpressions namespace.
The basics - Finding text
Regular Expressions are similar to find and replace in that ordinary characters match themselves. If you want to match the word “went,” the Regular Expression pattern would be went.
Text: Anna Jones and a friend went to lunch
Regex: went
Matches:
went
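The example above can be tried in C# with the Regex class from System.Text.RegularExpressions. The following is a minimal sketch, assuming .NET 6 or later (it uses top-level statements); the variable names and the replacement word "walked" are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Anna Jones and a friend went to lunch";

// Regex.Matches returns every non-overlapping match of the pattern in the input.
string[] found = Regex.Matches(text, "went")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", found));   // went

// Regex.Replace substitutes the replacement text for each match.
// "walked" is an arbitrary replacement chosen for this sketch.
string replaced = Regex.Replace(text, "went", "walked");
Console.WriteLine(replaced);                   // Anna Jones and a friend walked to lunch
```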
The following are special characters when working with Regular Expressions. They will be discussed throughout the article:
. $ ^ { [ ( | ) * + ? \
Matching any character with dot
The full stop or period character (.) is known as dot. It is a wildcard that will match any character except a new line (\n). For example, if you wanted to match the character ‘a’ followed by any two characters:
Text: abc def ant cow
Regex: a..
Matches:
abc
ant
If the Singleline option is enabled, a dot matches any character including the new line character.
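As a rough illustration of the dot and of the Singleline option, the sketch below runs the pattern from this section and then tests a two-line string. It assumes .NET 6 or later; the test string "a\nb" and the pattern a.b are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// 'a' followed by any two characters.
string[] dotMatches = Regex.Matches("abc def ant cow", "a..")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", dotMatches));   // abc, ant

// By default the dot does not match \n, so "a.b" cannot span the line break.
bool withoutSingleline = Regex.IsMatch("a\nb", "a.b");

// With RegexOptions.Singleline the dot also matches \n.
bool withSingleline = Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline);

Console.WriteLine($"{withoutSingleline} {withSingleline}");   // False True
```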
Matching word characters
Backslash and a lowercase ‘w’ (\w) is a character class that will match any word character, such as a letter, a digit, or an underscore. The following Regular Expression matches ‘a’ followed by two word characters:
Text: abc anaconda ant cow apple
Regex: a\w\w
Matches:
abc
ana
ant
app
Backslash and an uppercase ‘W’ (\W) will match any non-word character.
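The two classes can be compared side by side. This is a small sketch, assuming .NET 6 or later; it reuses the text from the example above and counts the non-word characters, which here are the spaces between the words.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "abc anaconda ant cow apple";

// 'a' followed by two word characters.
string[] wordMatches = Regex.Matches(text, @"a\w\w")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", wordMatches));   // abc, ana, ant, app

// \W matches each non-word character; in this text those are the four spaces.
int nonWordCount = Regex.Matches(text, @"\W").Count;
Console.WriteLine(nonWordCount);                     // 4
```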
Matching white-space
White-space can be matched using \s (backslash and ‘s’). The following Regular Expression matches the letter ‘a’ followed by two word characters then a white-space character:
Text: "abc anaconda ant"
Regex: a\w\w\s
Matches:
"abc "
Note that ant was not matched as it is not followed by a white-space character.
White-space includes the space character, new line (\n), form feed (\f), carriage return (\r), tab (\t), and vertical tab (\v). Be careful using \s, as it can lead to unexpected behavior by matching line breaks (\n and \r). Sometimes it is better to explicitly specify the characters to match instead of using \s. For example, to match Tab and Space, use [\t\x20].
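The difference between \s and an explicit set can be seen by counting matches. The sketch below assumes .NET 6 or later; the sample string is invented for the illustration, and the explicit set is written as [\t\x20] (tab plus the two-digit hexadecimal escape for space).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// 'a', two word characters, then one white-space character.
string[] matches = Regex.Matches("abc anaconda ant", @"a\w\w\s")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(matches.Length);          // 1 (the match is "abc " including the space)

// Sample input with a tab, a space, and a new line.
string sample = "one\ttwo three\nfour";

// \s matches all three white-space characters, including the line break.
int anyWhitespace = Regex.Matches(sample, @"\s").Count;

// The explicit set matches only the tab and the space.
int tabOrSpace = Regex.Matches(sample, @"[\t\x20]").Count;

Console.WriteLine($"{anyWhitespace} {tabOrSpace}");   // 3 2
```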
Matching digits
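The digits zero to nine can be matched using \d (backslash and lowercase ‘d’). In .NET, \d also matches decimal digits from other Unicode scripts unless the ECMAScript option is set. For example, the following Regular Expression matches any three digits in a row: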
Text: 123 12 843 8472
Regex: \d\d\d
Matches:
123
843
847
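A short sketch of the digit example follows, assuming .NET 6 or later. It also probes how \d treats non-ASCII digits in .NET: by default it matches any Unicode decimal digit, while RegexOptions.ECMAScript restricts it to 0 through 9. The Arabic-Indic test string is an illustrative choice, not part of the original example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Any three digits in a row.
string[] threeDigits = Regex.Matches("123 12 843 8472", @"\d\d\d")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", threeDigits));   // 123, 843, 847

// Arabic-Indic digits one, two, three (illustrative input).
string arabicIndic = "\u0661\u0662\u0663";

// Default behavior: \d matches any Unicode decimal digit.
bool unicodeDefault = Regex.IsMatch(arabicIndic, @"\d\d\d");

// ECMAScript behavior: \d is limited to the ASCII digits 0-9.
bool ecmaScript = Regex.IsMatch(arabicIndic, @"\d\d\d", RegexOptions.ECMAScript);

Console.WriteLine($"{unicodeDefault} {ecmaScript}");  // True False
```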
Matching sets of single characters
Square brackets are used to specify a set of single characters to match. Any single character within the set will match. For example, the following Regular Expression matches any three characters where the first character is either ‘d’ or ‘a’:
Text: abc def ant cow
Regex: [da]..
Matches:
abc
def
ant
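The set example can be reproduced in C# as follows. This is only a sketch, assuming .NET 6 or later; it also prints where each match starts, which makes it easy to see that the set consumes exactly one character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "abc def ant cow";

// [da] matches a single 'd' or 'a'; the two dots match the next two characters.
Match[] setMatches = Regex.Matches(text, "[da]..")
    .Cast<Match>()
    .ToArray();

foreach (Match m in setMatches)
{
    // Index is the zero-based position of the match in the input.
    Console.WriteLine($"{m.Index}: {m.Value}");
}
// 0: abc
// 4: def
// 8: ant
```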
Adding a caret (^) to the start of the set negates it: the set then matches any single character that is not listed. For example:
Text: abc def ant cow
Regex: [^da]..
Matches:
"bc "
"ef "
"nt "
"cow"
Matching ranges of characters
Ranges of characters can be matched using a hyphen (-). The following Regular Expression matches any three characters where the second character is either ‘a’, ‘b’, ‘c’, or ‘d’:
Text: abc pen nda uml
Regex: .[a-d].
Matches:
abc
nda
Ranges of characters can also be combined. For example:
Text: abc no 0aa i8i
Regex: [a-z0-9]\w\w
Matches:
abc
0aa
i8i
The character set in this pattern could also be written as [a-z\d].
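The negated set and the range examples above can be checked with a few lines of C#. The sketch assumes .NET 6 or later, and AllMatches is a hypothetical helper written for this illustration, not a .NET API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the text of every match into an array.
static string[] AllMatches(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Negated set: the first character must be neither 'd' nor 'a'.
string[] notDa = AllMatches("abc def ant cow", "[^da]..");
Console.WriteLine(string.Join("|", notDa));        // bc |ef |nt |cow

// Range: the second character must be between 'a' and 'd'.
string[] secondAtoD = AllMatches("abc pen nda uml", ".[a-d].");
Console.WriteLine(string.Join("|", secondAtoD));   // abc|nda

// Combined ranges, and the same set written with \d.
string[] combined = AllMatches("abc no 0aa i8i", @"[a-z0-9]\w\w");
string[] shorthand = AllMatches("abc no 0aa i8i", @"[a-z\d]\w\w");
Console.WriteLine(string.Join("|", combined));     // abc|0aa|i8i
Console.WriteLine(string.Join("|", shorthand));    // abc|0aa|i8i
```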
Specifying the number of times to match with Quantifiers
Quantifiers let you specify the number of times that an expression must match. The most frequently used quantifiers are:
- Asterisk (*)
- Plus sign (+)
Matching zero or more times with star (*)
The star tells the Regular Expression to match the character, group, or character class that immediately precedes it zero or more times. For example:
Text: Anna Jones and a friend owned an anaconda
Regex: a\w*
Options: IgnoreCase
Matches:
Anna
and
a
an
anaconda
Matching one or more times with plus (+)
The plus sign matches the character, group, or class one or more times. For example:
Text: Anna Jones and a friend owned an anaconda
Regex: a\w+
Options: IgnoreCase
Matches:
Anna
and
an
anaconda
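The difference between star and plus shows up when both patterns are run against the same text. The following sketch assumes .NET 6 or later; AllMatches is a hypothetical helper defined only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the text of every match, ignoring case.
static string[] AllMatches(string input, string pattern) =>
    Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

string text = "Anna Jones and a friend owned an anaconda";

// a\w* : 'a' followed by zero or more word characters, so the lone "a" matches.
string[] star = AllMatches(text, @"a\w*");
Console.WriteLine(string.Join(", ", star));   // Anna, and, a, an, anaconda

// a\w+ : 'a' followed by at least one word character, so the lone "a" is skipped.
string[] plus = AllMatches(text, @"a\w+");
Console.WriteLine(string.Join(", ", plus));   // Anna, and, an, anaconda
```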
Matching zero or one time with question mark (?)
The question mark matches the preceding character, group, or class zero or one time. For example:
Text: Anna Jones and a friend owned an anaconda
Regex: an?
Options: IgnoreCase
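Matches:
An
a
an
a
an
an
a
a
Because the n is optional, every a in the text matches, either alone or together with the n that follows it.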
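To see exactly what an optional character matches, it helps to print every match together with its position. This is a minimal sketch, assuming .NET 6 or later, that runs the pattern from this section with the IgnoreCase option.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Anna Jones and a friend owned an anaconda";

// an? : an 'a' optionally followed by one 'n', case-insensitive.
Match[] optional = Regex.Matches(text, "an?", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .ToArray();

foreach (Match m in optional)
{
    // Print the position and text of each match so the result can be checked by eye.
    Console.WriteLine($"{m.Index}: {m.Value}");
}

// Every 'a' or 'A' in the text starts a match, with or without a following 'n'.
Console.WriteLine(optional.Length);
```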
Matching the start and end of a string
To match the start of a string, use the caret (^):
Text: an anaconda ate Anna Jones
Regex: ^a
Matches:
a
The pattern matches only at the start of the string, so only the first a is matched; the other occurrences of a are ignored.
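To match the end of a string, use the dollar sign ($). For example:
Text: "an anaconda
ate Anna
Jones"
Regex: \w+$
Options: IgnoreCase
Matches:
Jones
If the Multiline option is enabled, $ also matches at the end of each line, so with lines separated by \n the same pattern matches anaconda, Anna, and Jones.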
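The anchors can be tried with and without RegexOptions.Multiline. The sketch below assumes .NET 6 or later and assumes the lines of the text are separated by \n only (with \r\n line endings the \r would sit between the word and the end of the line). AllMatches is a hypothetical helper for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the text of every match into an array.
static string[] AllMatches(string input, string pattern, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

// ^a matches only an 'a' at the very start of the string.
int startCount = Regex.Matches("an anaconda ate Anna Jones", "^a").Count;
Console.WriteLine(startCount);                        // 1

string text = "an anaconda\nate Anna\nJones";

// Without Multiline, $ matches only at the end of the whole string.
string[] endOfString = AllMatches(text, @"\w+$");
Console.WriteLine(string.Join(", ", endOfString));    // Jones

// With Multiline, $ also matches at the end of each line.
string[] endOfLine = AllMatches(text, @"\w+$", RegexOptions.Multiline);
Console.WriteLine(string.Join(", ", endOfLine));      // anaconda, Anna, Jones
```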
For more information, see Microsoft’s Regular Expression Syntax.