Data Points

SQL Server User-defined Functions

John Papa

Contents

Scalar UDFs
Common UDFs and Nesting
Table-valued UDFs
It's a Wrap

Auser-defined function (UDF) is a prepared code segment that can accept parameters, process some logic, and then return some data. According to SQL Server Books Online, UDFs in SQL Server™ 2000 can accept anywhere from 0 to 1024 parameters, although I must confess I have never tried to pass 1024 parameters into a UDF. Another key characteristic of UDFs is that they return a value. Depending on the type of UDF, the value can be used by the calling routine to continue processing its data. Thus, if a UDF returns a single value (a scalar value), the calling routine can use that value anywhere a standard variable or a literal value can be used. If a UDF returns a rowset, the calling routine can loop through the rowset, join to it, or simply select columns from it.

While most programming languages have supported functions for a while now, UDFs were only introduced with SQL Server 2000. Stored procedures and views have been available in SQL Server much longer than UDFs, but each of these objects has their niche in SQL Server development. Stored procedures are great for processing complex SQL logic, securing and controlling access to data, and returning a rowset to a calling routine whether that routine is a Visual Basic®-based program or another Transact-SQL (T-SQL) batch. Unlike views, stored procedures are compiled, making them ideal candidates to represent and process frequently run SQL statements. Views are great for controlling access to data, but they do it differently than stored procedures. Views are limited to only certain columns and rows from the underlying SELECT statement that generated the view. Thus a view is often used to represent a commonly used SELECT statement that may join several tables, employ a WHERE clause, and expose specific columns. Views are often found in the FROM clause of a SQL statement joined to other tables and views.

At their core, UDFs resemble both views and stored procedures. Like views, UDFs can return a rowset that can be used in a JOIN. Therefore, when a UDF returns a rowset and accepts parameters, it's like a stored procedure that you can join to, or a parameterized view. But, as I will demonstrate, UDFs can be this and much more.

There are two main types of UDFs: scalar value-returning UDFs and table value-returning UDFs. Within table value UDFs you'll find UDFs that return inline tables and multistatement tables (see Figure 1). In the following sections I'll take a look at each.

Figure 1 Main UDF Return Types

Return Type Main Features
Scalar Similar to functions in other languages. Returns a single value of a scalar datatype. Can return user-defined datatypes.
Inline table Returns a rowset of the SQL Server table datatype. Extrapolates the structure of the rowset from the SELECT statement itself. Can be SELECTed from or JOINed to. Allows multiple statements to define the UDF.
Multistatement table Explicitly defines the structure of the table to return. Defines column names and datatypes in the RETURNS clause. Able to house more complicated and numerous T-SQL logic blocks.

Scalar UDFs

Scalar value-returning UDFs are most similar to what many programming languages refer to as functions. They return a single value consisting of a scalar datatype such as integer, varchar(n), char(n), money, datetime, bit, and so on. UDFs can also return user-defined datatypes (UDDTs) if they are based on a scalar datatype. With UDFs that return either inline or multistatement tables, a rowset can be returned via the table datatype. However, not all datatypes can be returned from UDFs. For example, a UDF cannot return a value of any of these datatypes: text, ntext, image, cursor, or timestamp.

Scalar datatype-returning UDFs can be used in various situations to make the code more maintainable, reusable, and less complex. This can be very useful when the same segment of T-SQL code is used in several places, perhaps by several stored procedures and batch SQL statements. For example, let's say several parts of an application need to find whether a product must be reordered. In each of the places this is required, the code could check the reorder level and compare it to the units in stock plus the number of units on order. However, since this code is used in several places, a UDF could be used instead to reduce the code blocks and make it easier to maintain this function in case it ever needs to change. Such a UDF might look something like the code in Figure 2 and could be called with the following SQL statement:

SELECT    ProductID,
    ReorderLevel, 
    UnitsInStock, 
    UnitsOnOrder,
    dbo.fnNeedToReorder(ReorderLevel, UnitsInStock, UnitsOnOrder) 
AS sNeedToReorder
FROM    Products

Figure 2 Reorder Level UDF

CREATE FUNCTION fnNeedToReorder (@nReorderLevel INT, @nUnitsInStock INT, 
                                 @nUnitsOnOrder INT)
    RETURNS VARCHAR(3)
AS
BEGIN
    DECLARE @sReturnValue VARCHAR(3)

    IF ((@nUnitsInStock + @nUnitsOnOrder) - @nReorderLevel) < 0
        SET @sReturnValue = 'Yes'
    ELSE
        SET @sReturnValue = 'No'

    RETURN @sReturnValue
END
GO

In Figure 2, the fnNeedToReorder UDF performs the calculation and returns the appropriate value. This could have been accomplished via a CASE statement inside the SELECT clause, but the code is much more compact when a UDF is used instead. Plus it's easier to propagate to other places that may require the same logic. Assuming that there are several sections of an application that need to determine whether to reorder products, the UDF in Figure 2 really becomes valuable as it makes the application easier to maintain when the logic changes. For example, it doesn't make a lot of sense to reorder a product that has been discontinued. Thus, by changing the UDF in order to account for this business rule, the logic is changed in one place (see Figure 3) and can be run with the following code:

SELECT    ProductID,
    ReorderLevel, 
    UnitsInStock, 
    UnitsOnOrder,
    dbo.fnNeedToReorder(ReorderLevel, UnitsInStock, UnitsOnOrder, 
                        Discontinued) AS sNeedToReorder
FROM    Products

Figure 3 Reorder Level UDF, Version 2

CREATE FUNCTION fnNeedToReorder (@nReorderLevel INT, @nUnitsInStock INT, 
                                 @nUnitsOnOrder INT, @bDiscontinued BIT)
    RETURNS VARCHAR(3)
AS
BEGIN
    DECLARE @sReturnValue VARCHAR(3)

    IF @bDiscontinued = 1
        SET @sReturnValue = 'No'
    ELSE
        IF ((@nUnitsInStock + @nUnitsOnOrder) - @nReorderLevel) < 0
            SET @sReturnValue = 'Yes'
        ELSE
            SET @sReturnValue = 'No'

    RETURN @sReturnValue
END
GO

Notice that the UDF is called using the two-part name of object owner and object name. The object's owner is required when using a UDF that returns a scalar datatype value. Granted, by adding the fourth parameter (Discontinued) to the UDF, all of the places that call the UDF must also be changed. For easier maintenance, I could rewrite the UDF to retrieve the data itself using the ProductID for each row, as shown in Figure 4. This technique is easier to maintain because it does not require any of the calling routines to change how the UDF is called when the logic changes—as long as the data can be pulled in from the current Products table row. However, to gain this maintainability there is a performance trade-off. The UDF in Figure 4 has to retrieve a row from the Products table for every row that is returned from the calling routine. Since the calling routine is retrieving every row from the Products table already, if the table has 77 rows, the code will execute 77 SELECT statements, one for each row returned from the main SELECT statement. While each SELECT is selecting based on the primary key field (ProductID) and hence is very fast, performance can be adversely affected when the rowset is very large or the SELECT statement is less efficient. The code in Figure 4 can be called by using the following SQL snippet:

SELECT    ProductID,
    ReorderLevel, 
    UnitsInStock, 
    UnitsOnOrder,
    dbo.fnNeedToReorder(ProductId) AS sNeedToReorder
FROM    Products

Figure 4 Reorder Level UDF by ProductID

CREATE FUNCTION fnNeedToReorder (@nProductID INT)
    RETURNS VARCHAR(3)
AS
BEGIN
    DECLARE @sReturnValue VARCHAR(3)

    SELECT    @sReturnValue = 
        CASE Discontinued 
            WHEN 1
                THEN 'No'
            ELSE
                CASE 
                    WHEN ((UnitsInStock + UnitsOnOrder) - 
                           ReorderLevel) < 0
                        THEN 'Yes'
                    ELSE    'No'
                END
        END 
    FROM    Products 
    WHERE   ProductID = @nProductID

    RETURN @sReturnValue
END
GO

An alternative to using this function in a SELECT statement is to create a computed column in the Products table called NeedToReorder. This column would be defined not as a datatype but as the return value of the fnNeedToReorder UDF from Figure 3. To add this column, I can alter the Products table as follows to indicate that the column should be computed:

ALTER TABLE Products
    ADD  NeedToReorder AS dbo.fnNeedToReorder(ReorderLevel,   
    UnitsInStock, UnitsOnOrder, Discontinued)

This way the column is defined in the table itself, but is automatically calculated using the UDF.

Common UDFs and Nesting

Thus far I have shown several ways to tackle the same issue using a UDF that returns a scalar value. There are other useful applications of UDFs including functions that are not readily available in T-SQL. One example is a specialized formatting function. For instance, phone numbers are commonly stored without their formatting characters in char(10) columns that represent the area code and phone number (assuming it's a United States number). A UDF could be used to retrieve the phone number in a formatted structure (see Figure 5). Thus, retrieving and formatting a phone number is as simple as this:

SELECT dbo.fnCOM_FormatTelephoneNumber ('3335558888')

Figure 5 fnCOM_FormatTelephoneNumber

CREATE FUNCTION fnCOM_FormatTelephoneNumber (@sPhone CHAR(10)) 
    RETURNS VARCHAR(15)
AS
BEGIN
    DECLARE @sPhoneFormat VARCHAR(15) 

    IF LEN(@sPhone) < 10 
        SET @sPhoneFormat = @sPhone

    ELSE SET @sPhoneFormat =     '(' + LEFT(@sPhone, 3) + ') ' + 
                    SUBSTRING(@sPhone, 4, 3) + '-' + RIGHT(@sPhone, 4)
    RETURN @sPhoneFormat 
END

Any common function can be created using this technique to augment the regiment of functions available in SQL Server. Another example is a function that formats a date to the MM/DD/YYYY format with leading zeros:

CREATE FUNCTION fnCOM_StandardDate (@dtDate DATETIME)
    RETURNS VARCHAR(10)
AS
BEGIN
    RETURN 
        dbo.fnCOM_2Digits (CAST(MONTH(@dtDate) AS VARCHAR(2))) + '/' +
        dbo.fnCOM_2Digits (CAST(DAY(@dtDate) AS VARCHAR(2))) + '/' +
        CAST(YEAR(@dtDate) AS VARCHAR(4)) 
END

The fnCOM_StandardDate UDF accepts a datetime value and returns a varchar(10) in a MM/DD/YYYY format. Pretty simple, of course, and if your application requires a particular format often, this technique could make it easier to maintain. One key component to notice in the previous code is the use of a nested UDF. The fnCOM_StandardDate UDF calls the fnCOM_2Digits UDF (shown in the next sample) twice, both times to put a leading zero in front of days or months less than 10.

CREATE FUNCTION fnCOM_2Digits (@sValue VARCHAR(2))
    RETURNS VARCHAR(2)
AS
BEGIN
    IF (LEN(@sValue) < 2)
        SET @sValue = '0' + @sValue 

    RETURN @sValue
END

UDFs can be nested within each other as long as the inside UDF is created first. One catch with nesting functions is that built-in functions that are nondeterministic, such as the getdate function, cannot be nested inside of another UDF (otherwise, a SQL Server error is raised). A nondeterministic function is one which may not return the same result when called multiple times with exactly the same parameters. The getdate function falls into this category since every time it is called, it returns the new current date and time. Another commonly used nondeterministic built-in function is the NewID function. It is also nondeterministic as it always returns a unique GUID and, as such, the NewID function is not allowed to be nested within a UDF.

Table-valued UDFs

Within the category of table-valued UDFs there are two sub-types: inline table value-returning UDFs and multistatement table value-returning UDFs. UDFs that return inline tables return a rowset via the SQL Server table datatype. They are defined with a single SELECT statement making up the body of the function. Inline table value-returning UDFs cannot contain additional T-SQL logic outside of the SQL SELECT statement that defines the table it will return. However, they are simpler to create than UDFs that return multistatement tables since they do not have to define the exact table structure to be returned. UDFs that return inline tables extrapolate the structure of the rowset from the SELECT statement itself. Thus, the columns that the UDF will return are determined by the columns in the SELECT list. The following code shows the fnGetEmployeesByCity UDF, which accepts a city and returns a table containing all employees' first name, last name, and address:

CREATE FUNCTION fnGetEmployeesByCity (@sCity VARCHAR(30))
    RETURNS TABLE
AS
RETURN
    (
        SELECT    FirstName, LastName, Address
        FROM    Employees
        WHERE    City = @sCity 
    )
GO

This inline table value-returning UDF can be selected from or even joined to because it returns a rowset via the table datatype, as shown here:

SELECT * FROM dbo.fnGetEmployeesByCity('seattle')

Notice that the UDF is called using the two-part name of object owner and object name. However, the object's owner is not required (but is acceptable) when using a UDF that returns a table datatype value. Table-valued UDFs are quite flexible in that they can be used like a prepared and parameterized view (if one existed). In table-valued UDFs you can use parameters, achieve the performance of a prepared query, and join or select from the resulting rowset (or table in this case).

Although this type of UDF is compact, it is important to remember that if additional logic needs to be added to the UDF, it will have to be converted to a multistatement table value-returning UDF. Also, inline table value-returning UDFs cannot have an ORDER BY clause on the SELECT statement either (unless it is used in conjunction with the TOP clause).

A UDF that returns multistatement tables explicitly defines the structure of the table to return. It does so by defining the column names and datatypes right in the RETURNS clause. Thus it takes a bit more code to get it set up than an inline table value-returning UDF. However, it has several advantages over inline table value-returning UDFs including the ability to house more complicated, numerous T-SQL logic blocks. As their name suggests, multistatement table value-returning UDFs allow multiple statements to define the UDF. Thus statements such as control of flow, assignments, cursors, SELECTS, INSERTS, UPDATES, and DELETES are allowed and can all exist in a single UDF. So, as opposed to UDFs that return inline tables, their multistatement brethren are not limited to a single SELECT statement nor are they prohibited from ordering the returning rowset.

Figure 6 shows how to rewrite the inline table value-returning UDF from the code snippet I just showed as a multistatement table value-returning UDF. Thus the multistatement type can do anything the inline type can do. A more complicated use of a UDF that returns multistatement tables could involve retrieving all employees by city, but if no customers match the given city then a dummy row is returned where the Address field is filled with "No matching employees found in the specified city," as shown in Figure 7.

Figure 7 fnGetEmployeesByCity3

CREATE FUNCTION fnGetEmployeesByCity3 (@sCity VARCHAR(30))
    RETURNS @tblMyEmployees TABLE
    (
        FirstName VARCHAR(20),
        LastName VARCHAR(40),
        Address VARCHAR(120)
    )
AS
BEGIN
    INSERT    @tblMyEmployees
    SELECT   FirstName, LastName, Address
    FROM     Employees
    WHERE    City = @sCity 
    ORDER BY LastName

    IF NOT EXISTS (SELECT * FROM @tblMyEmployees)
        INSERT @tblMyEmployees (Address)
            VALUES ('No matching employees found in the specified city')
        
    RETURN
END
GO

Figure 6 fnGetEmployeesByCity2

CREATE FUNCTION fnGetEmployeesByCity2 (@sCity VARCHAR(30))
    RETURNS @tblMyEmployees TABLE
    (
        FirstName VARCHAR(20),
        LastName VARCHAR(40),
        Address VARCHAR(120)
    )
AS
BEGIN
    INSERT   @tblMyEmployees
    SELECT   FirstName, LastName, Address
    FROM     Employees
    WHERE    City = @sCity 
    ORDER BY LastName
    RETURN
END
GO

It's a Wrap

There are some other key factors that can help create a powerful UDF of any type, one of which is recursion. Recursion is supported with UDFs such that a UDF can call itself from within itself. Basically, recursion is just nesting a UDF except that the UDF you are nesting is the same one you are in. This can be very useful in certain situations including when you're creating a UDF that must compute a factorial or evaluate each character in a string. There is a limited depth of 32 levels of recursion in SQL Server 2000, after which an error is raised.

It is also important to point out that a UDF can be bound to the schema of the underlying objects to which it refers. To do this, the UDF must be created using the WITH SCHEMABINDING clause. If the UDF is created this way and someone attempts to alter one of the underlying objects' schema without first removing the schema binding, an error will be generated and raised. Using this option will help you ensure that no UDFs break inadvertently due to changes in an underlying object's schema.

When evaluating UDFs it is vital to consider the balance between performance and maintainability. While UDFs can reduce the amount of common code, be used as part of a common function library, can promote shorter code blocks, and are generally easier to maintain than additional versions of the same SQL logic, it would be reckless to use a UDF without first taking into consideration any of the drawbacks.

It would be a bad idea to use a UDF if performance suffers tremendously. For example, assume that there is a UDF that performs a SQL SELECT statement that takes one second to execute. If this UDF is used in a SELECT or a WHERE clause it will be executed for every row. Thus the time the main query takes to execute could increase drastically depending on such factors as the number of rows evaluated and returned and the types of indices in place. Before using a UDF in this type of situation, carefully weigh the options and do some performance testing. However, using a UDF that performs a calculation such as the one shown in Figure 3 barely affects the performance of query. As with any tool, when used properly and evaluated accordingly prior to going live, UDFs offer great convenience and maintainability.

Send your questions and comments for John to  mmdata@microsoft.com.

John Papa is a baseball fanatic who spends most of his summer nights rooting for the Yankees with his two little girls, wife, and faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive. You can reach him at mmdata@microsoft.com.