posted on Thursday, January 12, 2006 11:31 PM
by
bknight
Script Transform Component to Cleanse Data
In yesterday's post, I showed you how to cleanse data using the Script Component acting as a transform. Let's take it a lot farther by doing some more advanced expressions. In this very typical scenario, you receive data that's all upper case from a mainframe. You want to cleanse the data to where the first letter of every word is upper case and everything else is lower case. To do this, drag over the Script Component and select Transform. Check the input column that you wish to send into the script. For my example, let's use the input name of InputName. I'm going to send into the pipeline a column called OutputValue. I add this column in the Inputs and Outputs page.
Once this is selected, you can select Design Script on the Script page. The script will look like something like the below script. In order to make things a bit simplier, you'll want to import the System.Text.RegularExpressions namespace. As you can see, I'm using RegularExpressions to parse through the string and find when one word ends and another begins. You'll want to make sure you cleanse the input for Nulls before that chunk of the code executes. The MatchHandler function takes the two groups of strings and upper cases one and lower cases another.
All you would have to essentially do to use this code is change the InputValue and OutputValue columns in the code and you're ready to go.
Imports
System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Text.RegularExpressions
Public
Class ScriptMain
Inherits UserComponent
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim myDelegate As New MatchEvaluator(AddressOf MatchHandler)
Dim strpattern As String = "\b(\w)(\w+)?\b"
Dim re As New Text.RegularExpressions.Regex( _
strpattern, RegexOptions.Multiline Or _
RegexOptions.IgnoreCase _
)
'Will check to see if there is a NULL value before trying the function.
If
Row.InputName_IsNull = False Then
Row.OutputName= re.Replace(Row.InputName, myDelegate)
End If
End Sub
Private Function MatchHandler(ByVal m As Match) As String
Return m.Groups(1).Value.ToUpper() & m.Groups(2).Value.ToLower()
End Function
End Class
It's important to note that I'm not a coder so I'm sure my code may not be the most prestine. :)
For more on how to use the Script Component as a transform, see yesterday's post.
Update: A much cleaner approach to doing proper case was sent to me by Toby Rogers and can be seen in this post.
-- Brian Knight