Building the Fn_Split() Function

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Features
Treat Yourself to Fn_Split()
This UDF lets you emulate VB's Split() function and parameter arrays

Burton Roberts

Hardly a week goes by in the newsgroups that a SQL Server developer doesn't ask how to pass a variable-length parameter array to a stored procedure--and learns that the technique isn't possible. You can use variable-length parameter arrays in Visual Basic (VB), but what if you prefer to program in T-SQL? Last year, I took advantage of SQL Server 2000's new support for user-defined functions (UDFs) to develop a handy UDF that emulates VB's Split() function. Later, I realized I could use the UDF to create a workaround that would let developers incorporate variable-length parameter arrays into their T-SQL programming. Let's look at why I needed the UDF, how I programmed it to solve my business problem, and how I applied it to emulate variable-length parameter arrays.

In "User-Defined Functions," November 2000, author Robert Vieira invited readers to email him new ways to apply UDFs. Shortly after reading Vieira's article, I began a project that involved parsing thousands of lines of ASCII text for a telemarketing firm. The rows of text represented INSERT queries, which were written in a proprietary language instead of standard SQL. My job was to transform these queries into SQL and run them to complete a complex data transformation. Some queries were as long as 1400 characters. Thinking aloud about how to attack my query problem, I said to a coworker, "I wish SQL Server had a function like the Visual Basic Split() function to help me parse these queries." My coworker replied, "Why don't you make your own SQL Server split function?" So, I created the fn_Split() multistatement table-valued function. (A multistatement table-valued function is essentially a stored procedure that returns a table that you can embed in a query.)

Before I describe fn_Split(), however, let's examine the Split() function. In VB, the Split() function fills a string array with values that it parses from a delimited string. Split() accepts the delimited string as its first parameter and the delimiter as its second parameter. In Split(), the default delimiter is a space. The array index is zero-based, so the first print statement in the following code prints the word brown to the Immediate window:

  Dim astr() as String
astr = Split("How now brown cow")
Debug.print astr(2)  'Prints "brown" using space delimiter
astr = Split("How now brown cow", "")
Debug.print astr(2)   'Prints "w" using zero-length string delimiter

Note that the code uses a space as the default delimiter. If the delimiter is a zero-length (empty) string, the function returns an array of single characters and spaces. For example, the second Split() function statement in the above code uses an empty string as its delimiter, so the second print statement prints the character w. The array for the second split has 17 elements, one for each character or space in the string.

Listing 1 shows the fn_Split() function. Because the Split() function has been so useful in my VB development work, I decided to make fn_Split() resemble it as much as possible. However, SQL Server can't pass an array of values between procedures. So instead of returning an array of values, as Split() does, fn_Split() returns a two-column SQL Server table variable.

Each element in Split()'s array is associated with a zero-based numeric index. For example, in astr(2) from the previous code snippet, 2 is the index of the third element in the array. To retain Split()'s index concept, I include a column, idx, that corresponds to the Split() array index and represents the row number starting with zero. The other column, value, corresponds to the values in the individual Split() array elements. The default delimiter for fn_Split() is a space, but for convenience and self-documentation, fn_Split() also lets you specify a space as the delimiter by using the string 'space' as the second parameter. Alternatively, you can use two single quotes with a space between them as the second parameter. You can specify an empty string as the delimiter by using the string 'empty' or two single quotes with no space between them.

Note that you must call UDFs under their owner designation, and you must use the database name when a function resides outside the current database. A sample database called SplitExamples, which contains the fn_Split() function, is available for download from http://www.sqlmag.com (see "More on the Web," page 58, for download instructions). After attaching the sample database to your server, you can invoke fn_Split() from a Query Analyzer window as follows:

  SELECT idx, value FROM SplitExamples.dbo.fn_Split
('How now brown cow', 'space')
GO

This statement returns the information in Table 1 to the Query Analyzer grid. The statement

  SELECT idx, value FROM SplitExamples.dbo.fn_Split
('How now','empty')
GO

returns the information in Table 2.

How Fn_Split() Works

Fn_Split() uses the CHARINDEX() function to identify the position of the first delimiter in the text and uses the SUBSTRING() function to set the value column for the first row, as the following statement shows:

  SET @value = SUBSTRING(@sText,1,CHARINDEX
(@sDelim,@sText)-1)

Then, fn_Split() inserts the value and an index into the return table variable @retArray:

  INSERT @retArray (idx,value)
VALUES (@idx,@value)

Fn_Split() trims the first value and its delimiter from the front of the text, increments the index, then loops to repeat the process until all the delimiters are gone. When fn_Split() can't find any more delimiters, it retrieves the remaining text for the last row and exits the loop.

In my project, a typical query string that runs against the telemarketing survey data looks like this:

  q105=8 and q106>3 and q106<9

which means "get all the IDs of people who replied to question 105 with answer 8 and who replied to question 106 with answers greater than 3 and less than 9." If I use the string 'and' as my delimiter, fn_Split() returns the information in Table 3.

Each row in this table represents a query that's smaller and simpler than the original. For this project, I created a second table-valued function called fn_GetIDs() to transform and run these smaller queries. Fn_GetIDs() accepts one of these smaller strings and returns a one-column table of interview IDs. For the first row, fn_GetIDs() generates a column of IDs that satisfies the following SELECT statement:

  SELECT IDs FROM SourceTable WHERE Question = 105 AND Answer = 8

Now, I can use a SQL cursor to process my fn_Split() table and create a dynamic SQL string that gives me the final query that I need. Listing 2, page 58, shows the procedure that uses a cursor to build the following dynamic SQL string from Table 2:

  INSERT TargetTable(IDs)
SELECT a.IDs
FROM dbo.fn_GetIDs('q105=8') a
JOIN dbo.fn_GetIDs('q106>3') T1
ON a.IDs = T1.IDs
JOIN dbo.fn_GetIDs('q106<9') T2
ON a.IDs = T2.Ids

You can also count substrings with fn_Split(). When you pass this sentence as the delimited string and the word the as the delimiter into the sample stored procedure that Listing 3 shows, the query returns the number 9 for nine occurrences of the word the.

How to Emulate Variable-Length Parameter Arrays

In VB, you can pass variable-length parameter arrays to VB Function and Sub procedures by using the Paramarray keyword. SQL Server has no corollary to Paramarray and, therefore, no native way to accept variable-length arrays as parameters for stored procedures. Sometimes, however, variable-length parameter arrays are exactly what you need. For example, suppose a user wants to view the titles of books written by selected authors in the Pubs database. Because you don't know ahead of time how many authors the user has selected, you can't create a parameter for each author name in your stored procedure. In this situation, an object like Paramarray would be useful in a SQL Server stored procedure to capture any number of author names.

As a workaround to the SQL Server limitation, you can pass a delimited string of author names as one parameter to a single varchar variable in your stored procedure. In the stored procedure, you can use the fn_Split() function to transform the varchar variable that contains the list of author names into a table of author names. Then, you can retrieve the list of titles you want by using a standard SQL query that joins the fn_Split() table of author names with a table or view from the Pubs database. The accompanying downloadable code includes a sample Microsoft Access project file named split.adp, which includes a form for selecting author last names from a multiselect list. When you click the command button on the form, a procedure concatenates the selections and passes them as one comma-delimited string to a stored procedure through an ADO connection. The stored procedure in Listing 4 uses fn_Split() to retrieve a list of titles, author names, and prices for the grid on the form.

Increase Your Value

I don't know how I would have tackled my telemarketing data problem without table-valued functions. I know the project would have taken a lot longer and would have been a lot messier without the fn_Split() UDF. The more UDFs I create, the more applications I find for them in my work. With UDFs in my development arsenal, I work faster and better. On my list of best new features in SQL Server 2000, user-defined functions command the top spot.

Bugs, comments, suggestions | Legal | Privacy | Advertising

Copyright © 2002 Penton Media, Inc. All rights reserved.