posted on Thursday, January 12, 2006 11:31 PM by bknight

Script Transform Component to Cleanse Data

In yesterday's post, I showed you how to cleanse data using the Script Component acting as a transform. Let's take it a lot farther by doing some more advanced expressions. In this very typical scenario, you receive data that's all upper case from a mainframe. You want to cleanse the data to where the first letter of every word is upper case and everything else is lower case. To do this, drag over the Script Component and select Transform. Check the input column that you wish to send into the script. For my example, let's use the input name of InputName. I'm going to send into the pipeline a column called OutputValue. I add this column in the Inputs and Outputs page.

Once this is selected, you can select Design Script on the Script page. The script will look like something like the below script. In order to make things a bit simplier,  you'll want to import the System.Text.RegularExpressions namespace. As you can see, I'm using RegularExpressions to parse through the string and find when one word ends and another begins. You'll want to make sure you cleanse the input for Nulls before that chunk of the code executes. The MatchHandler function takes the two groups of strings and upper cases one and lower cases another.

All you would have to essentially do to use this code is change the InputValue and OutputValue columns in the code and you're ready to go.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Text.RegularExpressions

Public Class ScriptMain
   
Inherits UserComponent

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

   Dim myDelegate As New MatchEvaluator(AddressOf MatchHandler)
   Dim strpattern As String = "\b(\w)(\w+)?\b"

   Dim re As New Text.RegularExpressions.Regex( _
   strpattern, RegexOptions.Multiline
Or _
   RegexOptions.IgnoreCase _
   )

'Will check to see if there is a NULL value before trying the function.

If Row.InputName_IsNull = False Then
   
 Row.OutputName= re.Replace(Row.InputName, myDelegate)
End If

  
End Sub

Private Function MatchHandler(ByVal m As Match) As String
   Return m.Groups(1).Value.ToUpper() & m.Groups(2).Value.ToLower()
End Function

End
Class

It's important to note that I'm not a coder so I'm sure my code may not be the most prestine. :)

For more on how to use the Script Component as a transform, see yesterday's post.

Update: A much cleaner approach to doing proper case was sent to me by Toby Rogers and can be seen in this post.

-- Brian Knight

Comments

# Proper Case for Data in a SSIS Script Component Transform

Friday, January 13, 2006 2:07 PM by Brian Knight, SQL Server MVP

Well I feel like a college professor that shows the student a five page math problem and then follows...

# Proper Case for Data in a SSIS Script Component Transform

Tuesday, January 17, 2006 9:24 AM by Brian Knight, SQL Server MVP

Well I feel like a college professor that shows the student a five page math problem and then follows...

# re: Script Transform Component to Cleanse Data

Saturday, August 12, 2006 11:49 PM by Mike
This was exactly what I needed to look for 16 digit numeric sequence for audit. I simply replaced the strpattern variable to
Dim strpattern As String = "\b\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d"

Thanks Brian

# re: Script Transform Component to Cleanse Data

Thursday, September 27, 2007 1:51 AM by Michel
It works and it's clear well done

# re: Script Transform Component to Cleanse Data

Tuesday, March 18, 2008 7:22 AM by zxevil163
SVd9qH Hi from Russia!