Thursday, June 28, 2007

Using Regex.Replace for the first time

In VB6 I got really good at using Replace. But now I'm trying to avoid being replaced myself, so I decided I'd better learn to use the infinitely more powerful replace feature in the System.Text.RegularExpressions namespace.

I had read a little about the syntax for using regular expressions for validation and I'd used the match method for validation of email addresses. Of course, I am still at the stage where I have to borrow complex regular expressions from regexlib.com but today's adventure was simple enough that I didn't need a complex pattern.

All I wanted to do was take a string like "PaymentDueDate" and convert it to "Payment Due Date". This is pretty simple, so I decided it would be a great introduction. All I had to do was use Regex.Replace to find each uppercase character and put a space in front of it. After reading one of Scott Mitchell's posts on 4GuysFromRolla.com, I finally came across the nugget I needed to make this work.

Here is the code that worked for me:


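Imports System.Text.RegularExpressions

' Puts a space in front of every upper case letter, then trims the leading space.
Public Function AddSpaces(ByVal titleCasedText As String) As String
  Return Regex.Replace(titleCasedText, "([A-Z])", " $1").Trim()
End Function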
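
For anyone curious what that kind of test might look like, here is a rough sketch using the Visual Studio unit test attributes. The class name and test names are ones I made up for illustration, and the function is copied into the test class only so the sketch stands on its own.

```vbnet
Imports System.Text.RegularExpressions
Imports Microsoft.VisualStudio.TestTools.UnitTesting

' Hypothetical test class; the names here are illustrative only.
<TestClass()> _
Public Class AddSpacesTests

    ' Copy of the function from this post so the sketch is self-contained.
    Private Function AddSpaces(ByVal titleCasedText As String) As String
        Return Regex.Replace(titleCasedText, "([A-Z])", " $1").Trim()
    End Function

    ' The case this post is about: one space before each capital letter.
    <TestMethod()> _
    Public Sub AddSpaces_InsertsSpaceBeforeEachCapital()
        Assert.AreEqual("Payment Due Date", AddSpaces("PaymentDueDate"))
    End Sub

    ' A single word has only a leading capital, so Trim leaves it unchanged.
    <TestMethod()> _
    Public Sub AddSpaces_LeavesSingleWordUnchanged()
        Assert.AreEqual("Payment", AddSpaces("Payment"))
    End Sub
End Class
```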


I knew that [A-Z] would match uppercase characters, but I'd never tried to use replace before, so what to put in as the replacement text was a mystery. Thank God for the new unit test features in .NET, because they allowed me to quickly write a test and keep altering my function until I finally got it right. They always say, "Start by writing a test that will fail and then write the function that will allow it to pass." Well, today I definitely followed that rule! My first 3 or 4 attempts at writing this function were definitely failures. The key to making this work was enclosing the [A-Z] in parentheses. This allowed me to make a "back reference" to each match. Then I can refer to each back reference in the replacement text by using the $N syntax. Here is an excerpt from Scott Mitchell's post:

For example to format a 10 digit raw phone number use this pattern:
(\d{3})(\d{3})(\d{4})
And use this replacement pattern that takes the 3 back references & reformats them:
($1) $2-$3

Back references take the form $N, where N specifies what back reference you're interested in - the first back reference in the pattern is accessed via $1, the second accessed via $2, and so on. To specify that a particular portion of the pattern can be used as a back reference, simply surround that portion with parentheses in the pattern.
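
To see the excerpt's phone number example in action, here is a small console sketch. The module name and the sample phone number are my own assumptions for illustration; the pattern and the replacement text are the ones quoted above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module; the name and sample input are illustrative only.
Module BackReferenceDemo
    Sub Main()
        Dim rawPhone As String = "5551234567"

        ' Three pairs of parentheses give three back references: $1, $2 and $3.
        ' The parentheses in the replacement text are just literal characters.
        Dim formatted As String = _
            Regex.Replace(rawPhone, "(\d{3})(\d{3})(\d{4})", "($1) $2-$3")

        Console.WriteLine(formatted) ' prints (555) 123-4567
    End Sub
End Module
```

The same idea is at work in AddSpaces: the single pair of parentheses around [A-Z] makes each matched capital letter available as $1 in the replacement text.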

So I learned 2 things in this adventure.
1) How to use System.Text.RegularExpressions.Regex.Replace()
2) Whenever I do a Google search on a .NET topic, I should always include "Scott Mitchell" in the search criteria and then let Google's regular expressions handle it from there. For those of you who are .Newbies like me and have to search the Internet every day, you will soon learn that given 2 search results, you should always click on the one written by Scott Mitchell.

In our next adventure... I will learn the best way to write this function so I don't need the .Trim at the end. That was my kluge for not knowing (yet) how to skip the first character in the input string.
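
As a preview, here is one possible way to do it. This is only a sketch, not necessarily the best way, and the module name is made up for illustration. It assumes a negative lookbehind, (?<!^), placed in front of the group so that a capital letter sitting at the very start of the string is not matched at all.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical module; a sketch of one way to drop the .Trim call.
Module AddSpacesWithoutTrim

    Public Function AddSpaces(ByVal titleCasedText As String) As String
        ' (?<!^) only allows a match when the current position is NOT the
        ' start of the string, so the first capital never gets a space.
        Return Regex.Replace(titleCasedText, "(?<!^)([A-Z])", " $1")
    End Function

    Sub Main()
        Console.WriteLine(AddSpaces("PaymentDueDate")) ' prints Payment Due Date
    End Sub
End Module
```

One difference to keep in mind: the .Trim version also strips any whitespace that was already at the ends of the input, and this version does not.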

Thanks for joining me on my first adventure.
