Learn Regular Expression (Regex) syntax with C# and .NET

What are Regular Expressions?

Regular Expressions are a powerful pattern-matching language supported by many modern programming languages. They let you apply a pattern to an input string and return a list of the matches within the text. They also allow text to be replaced using replacement patterns, which makes them a very powerful version of find and replace.

There are two parts to learning Regular Expressions:

Learning the Regex syntax
Learning how to work with Regex in your programming language

This article introduces you to the Regular Expression syntax. After learning the syntax for Regular Expressions, you can use it in many different languages as the syntax is fairly similar between languages.

Microsoft .NET contains a set of classes for working with Regular Expressions in the System.Text.RegularExpressions namespace.

The basics - Finding text

Regular Expressions are similar to find and replace in that ordinary characters match themselves. If you want to match the word “went,” the Regular Expression pattern would be went.

Text:    Anna Jones and a friend went to lunch  
Regex:   went  
Matches: 
  went
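
As a rough illustration of how a pattern like this is applied from C#, the sketch below uses Regex.Matches and Regex.Replace from System.Text.RegularExpressions. The class name FindTextSketch and its two helper methods are hypothetical names chosen for this example only.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class FindTextSketch
{
    // Returns every substring of input that matches pattern, in order.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    // Replaces every match of pattern in input with replacement.
    public static string ReplaceAll(string input, string pattern, string replacement) =>
        Regex.Replace(input, pattern, replacement);
}
```

Calling FindTextSketch.FindAll("Anna Jones and a friend went to lunch", "went") returns the single match went, and FindTextSketch.ReplaceAll with the same text, the pattern went, and the replacement walked returns "Anna Jones and a friend walked to lunch".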

The following are special characters when working with Regular Expressions. They will be discussed throughout the article:

. $ ^ { [ ( | ) * + ? \

Matching any character with dot

The full stop or period character (.) is known as dot. It is a wildcard that will match any character except a new line (\n). For example, if you wanted to match the character ‘a’ followed by any two characters:

Text:    abc def ant cow  
Regex:   a..  
Matches: 
  abc
  ant

If the Singleline option is enabled, a dot matches any character including the new line character.
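
The effect of the Singleline option can be checked with a small sketch such as the one below. The class name DotSketch and the test string "a\nb" are assumptions made for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class DotSketch
{
    // Tests whether the dot in "a.b" can match the new line in "a\nb"
    // under the supplied options.
    public static bool DotMatchesNewLine(RegexOptions options) =>
        Regex.IsMatch("a\nb", "a.b", options);
}
```

DotSketch.DotMatchesNewLine(RegexOptions.None) returns false, while DotSketch.DotMatchesNewLine(RegexOptions.Singleline) returns true.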

Matching word characters

Backslash and a lowercase ‘w’ (\w) is a character class that will match any word character: a letter, a digit, or a connector character such as the underscore. The following Regular Expression matches ‘a’ followed by two word characters:

Text:    abc anaconda ant cow apple  
Regex:   a\w\w  
Matches:
  abc
  ana
  ant
  app

Backslash and an uppercase ‘W’ (\W) will match any non-word character.
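
A minimal sketch of both classes in C# follows. The class WordCharacterSketch and its method names are hypothetical; the verbatim string prefix (@) keeps the backslashes from being treated as C# escape sequences.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class WordCharacterSketch
{
    // Returns each match of 'a' followed by two word characters.
    public static string[] ThreeCharacterRuns(string input) =>
        Regex.Matches(input, @"a\w\w").Cast<Match>().Select(m => m.Value).ToArray();

    // Removes every non-word character (\W), such as spaces and punctuation.
    public static string StripNonWord(string input) =>
        Regex.Replace(input, @"\W", "");
}
```

ThreeCharacterRuns("abc anaconda ant cow apple") returns abc, ana, ant, and app, and StripNonWord("went to lunch!") returns "wenttolunch".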

Matching white-space

White-space can be matched using \s (backslash and ‘s’). The following Regular Expression matches the letter ‘a’ followed by two word characters then a white-space character:

Text:    "abc anaconda ant"  
Regex:   a\w\w\s  
Matches:
  "abc "

Note that ant was not matched as it is not followed by a white-space character.

White-space is defined as the space character, new line (\n), form feed (\f), carriage return (\r), tab (\t), and vertical tab (\v); in .NET, \s also matches other Unicode separator characters. Be careful using \s, as it can lead to unexpected behavior by matching line breaks (\n and \r). Sometimes it is better to explicitly specify the characters to match instead of using \s. For example, to match only tab and space, use [\t\x20].
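
The difference between \s and an explicit set can be seen in a sketch like the following. The class WhiteSpaceSketch and its method names are hypothetical, and the set [ \t] (a literal space plus tab) is one possible way to write "space or tab".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class WhiteSpaceSketch
{
    // True when 'a' plus two word characters is followed by ANY white-space,
    // including line breaks.
    public static bool EndsWithAnyWhiteSpace(string input) =>
        Regex.IsMatch(input, @"a\w\w\s");

    // True only when the trailing character is a space or a tab.
    public static bool EndsWithSpaceOrTab(string input) =>
        Regex.IsMatch(input, @"a\w\w[ \t]");
}
```

For the input "abc\n", EndsWithAnyWhiteSpace returns true but EndsWithSpaceOrTab returns false; for "abc\t", both return true.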

Matching digits

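The digits zero to nine can be matched using \d (backslash and lowercase ‘d’). In .NET, \d also matches decimal digits from other Unicode scripts unless the ECMAScript option is set; use [0-9] to match only the ASCII digits. For example, the following Regular Expression matches any three digits in a row: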

Text:    123 12 843 8472  
Regex:   \d\d\d  
Matches:
  123
  843
  847
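
The sketch below runs this pattern from C# and also shows that, in .NET, \d is not limited to the ASCII digits unless RegexOptions.ECMAScript is used. The class DigitSketch, its method names, and the sample character U+0663 (ARABIC-INDIC DIGIT THREE) are choices made for this example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class DigitSketch
{
    // Returns each match of three digits in a row.
    public static string[] ThreeDigitRuns(string input) =>
        Regex.Matches(input, @"\d\d\d").Cast<Match>().Select(m => m.Value).ToArray();

    // Reports whether input contains a character that \d accepts
    // under the supplied options.
    public static bool HasDigit(string input, RegexOptions options) =>
        Regex.IsMatch(input, @"\d", options);
}
```

ThreeDigitRuns("123 12 843 8472") returns 123, 843, and 847. HasDigit("\u0663", RegexOptions.None) returns true, whereas HasDigit("\u0663", RegexOptions.ECMAScript) returns false.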

Matching sets of single characters

Square brackets are used to specify a set of single characters to match. Any single character within the set will match. For example, the following Regular Expression matches any three characters where the first character is either ‘d’ or ‘a’:

Text:    abc def ant cow  
Regex:   [da]..  
Matches:
  abc
  def
  ant

Adding a caret (^) to the start of the set of characters specifies that none of the characters in the set should match. For example:

Text:    abc def ant cow  
Regex:   [^da]..  
Matches:
  "bc "
  "ef "
  "nt "
  "cow"

Matching ranges of characters

Ranges of characters can be matched using a hyphen (-). The following Regular Expression matches any three characters where the second character is either ‘a’, ‘b’, ‘c’, or ‘d’:

Text:    abc pen nda uml  
Regex:   .[a-d].  
Matches:
  abc
  nda

Ranges of characters can also be combined. For example:

Text:    abc no 0aa i8i  
Regex:   [a-z0-9]\w\w  
Matches:
  abc
  0aa
  i8i

For input like this, the character set could also be written as [a-z\d], giving the full pattern [a-z\d]\w\w.
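
The set and range patterns from the last two sections can be tried out with a sketch along these lines. The class CharacterSetSketch and its FindAll method are hypothetical names for this example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class CharacterSetSketch
{
    // Returns every match of a set or range pattern, in order of appearance.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

FindAll("abc def ant cow", "[^da]..") returns "bc ", "ef ", "nt ", and "cow", while FindAll("abc no 0aa i8i", @"[a-z0-9]\w\w") returns abc, 0aa, and i8i.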

Specifying the number of times to match with Quantifiers

Quantifiers let you specify the number of times that an expression must match. The most frequently used quantifiers are:

Asterisk (*)
Plus sign (+)
Question mark (?)

Matching zero or more times with star (*)

The star tells the Regular Expression to match the character, group, or character class that immediately precedes it zero or more times. For example:

Text:    Anna Jones and a friend owned an anaconda  
Regex:   a\w*  
Options: IgnoreCase  
Matches:
  Anna
  and
  a
  an
  anaconda

Matching one or more times with plus (+)

The plus sign matches the character, group, or class one or more times. For example:

Text:    Anna Jones and a friend owned an anaconda  
Regex:   a\w+  
Options: IgnoreCase  
Matches:
  Anna
  and
  an
  anaconda

Matching zero or one time with question mark (?)

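The question mark matches the preceding character, group, or class zero or one time. For example:

Text:    Anna Jones and a friend owned an anaconda  
Regex:   an?  
Options: IgnoreCase  
Matches:
  An
  a
  an
  a
  an
  an
  a
  a

Because the n is optional, every a matches, whether or not an n follows it.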
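
The three quantifiers can be compared side by side with a sketch like the one below, which passes RegexOptions.IgnoreCase as in the examples. The class QuantifierSketch and its method name are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class QuantifierSketch
{
    // Returns every case-insensitive match of pattern in input.
    public static string[] FindAllIgnoreCase(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
             .Cast<Match>().Select(m => m.Value).ToArray();
}
```

With the text "Anna Jones and a friend owned an anaconda", the pattern a\w* yields Anna, and, a, an, anaconda; a\w+ yields Anna, and, an, anaconda; and an? yields An, a, an, a, an, an, a, a, because an? matches every a whether or not an n follows it.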

Matching the start and end of a string

To match the start of a string, use the caret (^):

Text:    an anaconda ate Anna Jones  
Regex:   ^a  
Matches:
  a

The pattern matches only the a at the very start of the string; no other a in the text matches.

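To match the end of a string, use the dollar sign ($). With the Multiline option, $ also matches at the end of each line, immediately before a new line (\n). For example:

Text:    "an anaconda
ate Anna
Jones"
Regex:   \w+$  
Options: Multiline, IgnoreCase  
Matches:
  anaconda
  Anna
  Jones

Without the Multiline option, only Jones matches. The same is true if the lines end with \r\n, because \r is not a word character and $ does not match before it.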
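
How the anchors interact with RegexOptions.Multiline can be checked with a sketch such as this one. The class AnchorSketch and its method names are hypothetical, and the sample text is assumed to use \n line endings.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for illustration.
public static class AnchorSketch
{
    // True when the input begins with a lowercase 'a'.
    public static bool StartsWithA(string input) =>
        Regex.IsMatch(input, "^a");

    // Returns the words that sit immediately before a position where $ matches.
    public static string[] WordsAtEnd(string input, RegexOptions options) =>
        Regex.Matches(input, @"\w+$", options).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For the text "an anaconda\nate Anna\nJones", WordsAtEnd with RegexOptions.None returns only Jones, while WordsAtEnd with RegexOptions.Multiline returns anaconda, Anna, and Jones. StartsWithA("an anaconda ate Anna Jones") returns true.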

For more information, see Microsoft’s Regular Expression Syntax.