Article
11/03/2006

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Dr. Tom's Workshop: Performance Through Persistence

Tom Moreau

T-SQL allows you to solve the same problem many different ways. Sometimes, alternatives aren't that obvious but can give you a satisfying, pleasant surprise. Read on as Dr. Tom Moreau checks out some possibilities for a solution–and discovers some additional gems along the way.

I love it when people ask me to help them out with a gnarly "real-world" problem and I can suggest a decent solution. But what I like even more is when I end up providing them with one that's even better than what I had originally envisioned.

	I'll cast this one in the context of our old friend, the Northwind database. Imagine that you need to get the total value of each order–but only for those orders that do not contain an order detail for ProductID 59. An order's value is defined as the sum of the product of Quantity * UnitPrice. Listing 1 gives you a solution and Table 1 shows the results.

Listing 1. Finding the value of orders where there's no detail for ProductID 59–first try.

  select
  OrderID
, sum (Quantity * UnitPrice) Value
from
  [Order Details] o1
where
  ProductID <> 59
group by
  OrderID

Table 1. Results from Listing 1.

CustomerID	Value
10248	440.0000
10249	1863.4000
10250	1813.0000
10251	670.8000
10252	3730.0000
10253	1444.8000
10254	625.2000
10255	1170.5000
10256	517.8000
10257	1119.9000
…	…

	Easy enough, right? But I always like to QC my own results, and if you pick through the rows of the Order Details table and search for those orders where ProductID 59 exists (OrderID 10255, for example), you'll see their OrderIDs in the result set. Huh? What gives?

	Let's reverse-engineer the problem that was solved by this query. The WHERE clause simply says to exclude rows where ProductID is 59. It doesn't say anything about omitting the entire order. Let's rephrase the spec.

	Casting the requirement into a form of English from which a T-SQL solution becomes readily apparent, you want the value of each order where there does not exist any row for ProductID = 59.

	Did you catch the keywords–WHERE, NOT EXIST(S)? With that in mind, Listing 2 gives you the solution and Table 2 shows the results. Hey, would I lie? Don't answer that!

Listing 2. Finding the value of orders where there's no detail for ProductID 59–second try.

  select
  o1.OrderID
, sum (o1.Quantity * o1.UnitPrice) Value
from
  [Order Details] o1
where not exists
(
  select 
    *
  from
    [Order Details] o2
  where
    o2.OrderID   = o1.OrderID
  and o2.ProductID = 59
)
group by
  o1.OrderID

Table 2. Results from Listing 2.

OrderID	Value
10248	440.0000
10249	1863.4000
10250	1813.0000
10251	670.8000
10252	3730.0000
10253	1444.8000
10254	625.2000
10256	517.8000
…	…

	Well, that was painless enough. If you're not happy with using an EXISTS predicate, you can convert this to using a NOT IN predicate as shown in Listing 3.

Listing 3. Finding the value of orders where there's no detail for ProductID 59–an alternative.

  select
  o1.OrderID
, sum (o1.Quantity * o1.UnitPrice) Value
from
  [Order Details] o1
where 59 not in
(
  select
    ProductID
  from
    [Order Details] o2
  where
    o2.OrderID   = o1.OrderID
)
group by
  o1.OrderID

	Some of you will see that Listing 3 isn't as "safe" as Listing 2 if it were possible that ProductID could be NULL. However, since ProductID can never be NULL in this table, you'll get the same results as Listing 2.

	I was real proud of myself back in Listing 1, since I was able to conjure up what looked like the right query and one that involved a single pass through the table. This correlated subquery approach–though correct–makes you hit the same table twice. I wish I could do this in just one pass without having to hit the table twice. I don't know about you (and for the record, no, I'm not an Atkins diet fan), but I like having my cake and eating it, too. Hmm.

	If you've played around with cross-tab queries in T-SQL, you'll be familiar with using CASE constructs inside aggregate functions, such as SUM(). I wondered if you could apply this cool trick to the problem at hand. Indeed, you can! Check out Listing 4.

Listing 4. Finding the value of orders where there's no detail for ProductID 59, using a HAVING clause.

  select
  OrderID
, sum (Quantity * UnitPrice) Value
from
  [Order Details] o1
group by
  OrderID
having
  sum (case when ProductID = 59 then 1 else 0 end) = 0

	The HAVING clause acts as a filter for your groups. Here, the ruse is to put your rejection criterion inside a SUM(). The SUM() will be greater than zero for those orders where there's a ProductID of 59. Simply testing whether the SUM() is zero is all you need to accept the group in your result set. Check out the query plan and you'll see that it made only one pass. Sweet!

	There are other ways you can use this kind of trick. For example, what if you were running a marketing query to determine those customers who place orders through one–and only one–employee? One approach is to use the NOT EXISTS predicate, as you can see in Listing 5.

Listing 5. Finding customers who ordered from just one employee.

  select distinct
  o1.CustomerID
from
  Orders o1
where not exists
(
  select
    *
  from
    Orders o2
    where
      o2.CustomerID = o1.CustomerID
    and o2.EmployeeID <> o1.EmployeeID
)

	The result is just one CustomerID–CENTC. Basically, it searches for CustomerIDs where there doesn't exist for the same CustomerID, any EmployeeID that's not the same (another example of T-SQL English). Again, this can be replaced by a GROUP BY with a HAVING clause, as you can see in Listing 6.

Listing 6. Finding customers who ordered from just one employee using GROUP BY.

  select
  CustomerID
from
  Orders
group by
  CustomerID
having
  min (EmployeeID) = max (EmployeeID)

	An alternative solution is shown in Listing 7.

Listing 7. Finding customers who ordered from just one employee using GROUP BY–another way.

  select
  CustomerID
from
  Orders
group by
  CustomerID
having
  count (distinct EmployeeID) = 1

	Both Listings 6 and 7 have query costs that are less than that of Listing 5, and both have a Scan Count of one as opposed to two for Listing 5. Listing 6 has a slightly lower cost than Listing 7. However, one nice feature about Listing 7 is that it can be adapted to those cases where you want to find customers who deal with two, three, four, and so on employees.

	Using the HAVING clause to filter out group characteristics can eliminate having to tap the same table twice, leading to better performance–and you know how I feel about performance. Oh, and one more thing: The only time the word "cursor" was mentioned was in this sentence! [To the best of our knowledge, no animals were harmed in this exercise, either.–Ed.]

Download 408TOM.SQL

To find out more about SQL Server Professional and Pinnacle Publishing, visit their Web site at http://www.pinpub.com/

Note: This is not a Microsoft Corporation Web site. Microsoft is not responsible for its content.

This article is reproduced from the August 2004 issue of SQL Server Professional. Copyright 2004, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.

Dr. Tom's Workshop: Performance Through Persistence

Additional resources