This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Dr. Tom's Workshop: Generating Sequence Numbers

Tom Moreau

Very often, developers need a table that has a sequence of numbers. You know the drill—the number of rows can vary, but they must be guaranteed to be consecutive. Ach, but what about gaps? Pull up a seat while Dr. Tom Moreau shows you how to get there.

Transact-SQL is a powerful language, but as I (and many others) have said before, its power comes from data retrieval and manipulation. Generating a series of sequential numbers isn't something it can do "out of the box." Of course, you want that in the form of a table, since that's how we SQL jocks like to see our data. No arrays for us, no sirree. Oh, and yes, you (and the developer and the users) also want to be able to determine in advance how long the sequence will be. That sounds like a multi-statement, user-defined function (UDF) to me.

I must confess that sometimes, I take the path of least resistance. Yes, even Dr. Tom has his lazy moments and resorts to creating a loop (see Listing 1), iterating as many times as needed.

Listing 1. Generating sequence numbers using INSERT VALUES.

  create function dbo.GenRows
(@NumRows int)
returns @t table
(RowNum int not null primary key)
as
begin
  declare
    @Count int
  set
    @Count = 1
  while @Count <= @NumRows
  begin
    insert @t values (@Count)
    set
      @Count = @Count + 1
  end
return
end
go

Row by row, minute by minute, hour by hour—well, you get the picture. Can you imagine doing this for a million rows? There has to be a better way.

Well, there is a better way—and it involves some set-level logic. No, you still have a loop, but far fewer iterations, as you'll soon see. In Listing 1, you saw the basic INSERT VALUES statement, which inserts only one row at a time. Another form of the INSERT statement is the INSERT SELECT. Here, you insert the results of a SELECT query into a table. It doesn't matter what the source table of that SELECT is; it can even include the target table itself. So what if you did something like the following?

  insert MyTable
select * from MyTable

Well, you just doubled the rows in MyTable—all with a simple INSERT SELECT. Do it again, and you've quadrupled the number of rows from what you originally had. Powerful stuff! Exponential growth—didn't we learn that in high school? Let's harness this little nuclear fission reactor to have it power your UDF. Take a look at Listing 2 to see what I mean.

Listing 2. Generating sequence numbers using INSERT SELECT.

  create function dbo.GenRows
(@NumRows int) 
returns @t table
(RowNum int not null primary key)
as
begin
  declare
  @Count int
  -- initialize
  set
    @Count = 1
  -- seed the table
  insert @t values (1)
  while @Count < @NumRows
  begin -- add rows
    insert @t select 
      RowNum + (select max (RowNum) from @t) from @t
    -- increment
    set @Count = @Count + @@ROWCOUNT
  end
  -- remove rows in excess of max
  delete @t where RowNum > @NumRows
  return
end
go

First, you seed the table, since you need to have at least one row to get things started. Then, you do your loop—but the INSERT SELECT inserts the same number of rows into @t as already existed in @t. However, you need the numbers to be unique and sequential, so you add the maximum row number to each RowNum to be inserted through a nested subquery. You keep track of the number of rows in @t by adding @@ROWCOUNT to @Count.

The job's not done, though. One side effect of the loop is that you can overshoot your target. For example, if you want 25 rows, you'll get 32 before the loop exits. So the next task is to pare off the excess, which you do with the DELETE. Now you have something that really rocks—particularly for very large row numbers. Can this be improved?

I probably wouldn't have asked that rhetorical question unless I knew the answer was "Yes," would I? Let's say you want 1,025 rows. On the penultimate INSERT, you'd have 1,024 rows. On the final INSERT, you'd have 2,048 rows. Your DELETE would then remove 1,023 of those newly inserted rows. What a waste!

What's needed is a check to see how many rows remain to be inserted, and then insert only those. By comparing the maximum RowNum with @NumRows, you can limit the number of rows inserted. Check out the modified code in Listing 3.

Listing 3. Removing the DELETE statement.

  create function dbo.GenRows
(@NumRows int)
returns @t table
(RowNum int not null primary key)
as
begin
  declare
    @Count int
  -- initialize
  set @Count = 1
  -- seed the table
  insert @t values (1) while @Count < @NumRows
  begin
    -- add rows
    insert @t select t.RowNum + x.MaxRowNum from @t t
    cross join
    (select max (RowNum) MaxRowNum from @t) x
    where t.RowNum <= @NumRows - x.MaxRowNum
    -- increment
    set @Count = @Count + @@ROWCOUNT
  end
  return
end
go

What pops out at you immediately is the CROSS JOIN on a derived table. However, the derived table returns only one row and one column. We use that first where the same computation was done in the SELECT list and second in the new WHERE clause. (I use this in order to avoid calculating the same thing twice—see my November 2001 column, "Breaks and CROSS JOINs," for more details.) That WHERE clause now caps the INSERT, so that you can't insert more than the required number of rows. As a result, we don't need the DELETE statement.

Ah, but can we clean this code up just a wee bit more? Sure! Let's eliminate the @Count variable and tap into @@ROWCOUNT right in the WHILE construct. How's that again? Well, let's think about it for a minute. The value of @@ROWCOUNT is updated after the execution of each T-SQL statement. You pick up its value when you go to increment @Count. Then, @Count is used in the WHILE construct to test whether you've reached the end of the line. If you incorporate @@ROWCOUNT directly into the WHILE construct, you won't need @Count—or the BEGIN ELSE construct—at all. Check out the final code in Listing 4.

Listing 4. Optimizing the code.

  create function dbo.GenRows
(@NumRows int) returns @t table
(RowNum int not null primary key)
as
begin
  -- seed the table
  insert @t values (1)
  while @@ROWCOUNT > 0
  insert @t select t.RowNum + x.MaxRowNum from @t t
  cross join
  (select max (RowNum) MaxRowNum from @t) x
  where
    t.RowNum <= @NumRows - x.MaxRowNum
  return
end
go

So how does this work? Once you've seeded the table, the value of @@ROWCOUNT will be 1. The WHILE construct will execute the INSERT SELECT, since @@ROWCOUNT will be greater than zero, but the INSERT SELECT will change the value of @@ROWCOUNT. This value will be non-zero until you've inserted no rows. At that point, you exit the loop.

Being the curious guy that I am, I ran some tests for various values of @NumRows for each of the listings. Basically, I just asked it to give me the maximum RowNum, in order to minimize network traffic. (It still had to generate all of the rows.) I ran 10 tests each, purging the cache each time. The platform was SQL Server 2000 on Windows 2000 Advanced Server with 4CPUs and 4GB of RAM. (The numbers didn't change dramatically when I ran the scripts on my workstation, but the 4CPU server's results are summarized in Table 1 for your reading pleasure.)

Table 1. Query durations (in milliseconds) of Listings 1, 2, 3, and 4 for various row counts.

Rows

1

2

3

4

1,000

125

63

64

61

10,000

887

481

264

263

100,000

8,510

3,358

2,320

2,306

1,000,000

85,447

24,643

23,243

22,196

Clearly, there's a performance gain to be had by going with an INSERT SELECT solution; all were faster than the INSERT VALUES loop. Eliminating the DELETE bought me a bit more speed, and trimming out @Count and the related code looks like it may have added yet a tiny bit more speed. That said, less code usually equates to fewer errors—and I like that.

As you may have noticed, the sequences that dbo.GenRows() generates always start at 1. You can easily modify it to add a parameter for the starting number. Oh, and one final note—if you're going to be using dbo.GenRows() to generate the same number of rows over and over in your code, consider running it once and populating a table variable. Then, use the table variable in the remainder of your code, now that the heavy lifting has been done.

Download TOM1103.SQL

  • that in addition to int, you can use the IDENTITY column property on whole number data types such as smallint, tiny int, and numeric (that is, decimal, with a scale of zero)?
  • that IDENTITYCOL automatically refers to the column in a table that has the IDENTITY property?
  • that @@IDENTITY is a system function that returns the last identity value used by the current connection?
  • that the IDENT_SEED function returns an IDENTITY column's starting number (seed) value—which defaults to 1?
  • that the IDENT_INCR function returns an IDENTITY column's increment value—which also defaults to 1?
  • that the IDENT_CURRENT function, which takes a table name as input, returns the last identity value inserted into that table?
  • that you can use the IDENTITY(data type, seed, increment) function in a SELECT INTO statement?
  • that, in an INSERT statement, you can't provide a value for a column that has the IDENTITY property (for example, to fill a gap created by a DELETE statement) unless you explicitly SET IDENTITY_INSERT ON?
  • that Access's AutoNumber data type maps to SQL Server's int IDENTITY and that its Replication ID (GUID) maps to SQL Server's uniqueidentifier?

To find out more about Microsoft SQL Server Professional and Pinnacle Publishing, visit their website at http://www.pinpub.com/

Note: This is not a Microsoft Corporation website. Microsoft is not responsible for its content.

This article is reproduced from the November 2003 issue of Microsoft SQL Server Professional. Copyright 2003, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Microsoft SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.