Data Points

Standard Query Operators with LINQ

John Papa

Code download available at:DataPoints2008_03.exe(958 KB)

Contents

Operators and LINQ
Types of Operators
Lambda Expressions
First and Single
Aggregates, Hierarchies, and Projections
Projections and Ordering
Quantifiers and Conversion
Wrapping Up

Language Integrated Query (LINQ) allows developers to write SQL-like queries with Microsoft® .NET Framework 3.5 code using a strongly typed syntax. Different LINQ providers, such as LINQ to Objects (which allows you to write queries against object hierarchies) and LINQ to Entities (which allows you to write queries against the Entity Framework's conceptual model), are then able to effectively process these queries according to the nuances of their representative data stores.

Besides the strongly typed syntax, LINQ queries also have an arsenal of standard query operators to enhance their functionality. These standard query operators operate on sequences and allow you to perform operations such as determining if a value exists in the sequence and performing an aggregated function, such as a summation, over a sequence.

In this month's column I will perform practical queries and operations with LINQ, using both LINQ to Objects and LINQ to Entities. I'll query a single collection of entities and dive deeper into a hierarchical set of entities using their navigation properties. I'll also show how you can apply many of the standard query operators to arrays and collections. I will demonstrate how LINQ's standard query operators are enhanced using lambda expressions and how they can let you parse specific information from a sequence and perform complex logic over a sequence. All code examples are available in the download provided for this column (see msdn.microsoft.com/msdnmag/code08.aspx).

Operators and LINQ

LINQ is very powerful all on its own whether you are using LINQ to XML, LINQ to DataSets, LINQ to Entities, LINQ to Objects, or any other LINQ provider that comes along. The core power of LINQ is in its strongly typed query syntax, which can be used to operate over any of these providers. When LINQ is combined with one or more standard query operators, the result is an even more powerful toolset that can give you granular control over a set of data.

Standard query operators exist in the System.Core.dll assembly within the System.Linq namespace as extension methods on the static classes Enumerable and Queryable, and they can be used on objects that implement IEnumerable<T> or IQueryable<T>. This allows them to operate on a variety of types, from in-memory collections and arrays (sequences) to remote databases using providers like LINQ to Entities and LINQ to SQL.

It is easy to determine which set of operators you have at your disposal for a specific task. When you want to use an operator in a LINQ query, you can use an operator from the available list of extension methods in the Queryable static class. When you want to use an operator on a sequence that implements IEnumerable<T>, you can use one of the extension methods found in the Enumerable static class. Keep in mind, however, that not all of the operators found in the Queryable class are applicable to the underlying data store, and hence some may not be supported at run time.

Types of Operators

There are many different types of operators (all of which can be found using the object browser to look at the Enumerable and Queryable classes). Figure A shows a categorization of the different types of operators in alphabetical order. It offers you a glimpse at the functionality the operators provide. I will be demonstrating a subset of these operators using both LINQ to Objects and LINQ to Entities to show how they can be beneficial in a real-world application.

Figure A Categories of Operators

Operator Description
Aggregation  
Aggregate Performs a custom method over a sequence
Average Computes the average of a sequence of numeric values
Count Returns the number of the items in a sequence as an int
LongCount Returns the number of the items in a sequence as a long
Min Finds the minimum number of a sequence of numbers
Max Finds the maximum number of a sequence of numbers
Sum Sums the numbers in a sequence
Concatenation  
Concat Concatenates two sequences into one sequence
Conversion  
Cast Casts elements in a sequence to a given type
OfType Filters elements in a sequence of a given type
ToArray Returns an Array from a sequence
ToDictionary Returns a Dictionary from a sequence
ToList Returns a List from a sequence
ToLookup Returns a Lookup from a sequence
ToSequence Returns an IEnumerable sequence
Element  
DefaultIfEmpty Creates a default element for an empty sequence
ElementAt Returns the element at a given index in a sequence
ElementAtOrDefault Returns the element at a given index in a sequence or a default value if the index is out of range
First Returns the first element of a sequence
FirstOrDefault Returns the first element of a sequence or a default value if no element is found
Last Returns the last element of a sequence
LastOrDefault Returns the last element of a sequence or a default value if no element is found
Single Returns the single element of a sequence
SingleOrDefault Returns the single element of a sequence or a default value if no element is found
Equality  
SequenceEqual Compares two sequences to see if they are equivalent
Generation  
Empty Generates an empty sequence
Range Generates a sequence given a range
Repeat Generates a sequence by repeating an item a given number of times
Grouping  
GroupBy Groups items in a sequence by a given grouping
Joining  
GroupJoin Performs a grouped join on two sequences
Join Performs an inner join on two sequences
Ordering  
OrderBy Orders a sequence by value(s) in ascending order
OrderByDescending Orders a sequence by value(s) in descending order
ThenBy Orders an already-ordered sequence in ascending order
ThenByDescending Orders an already-ordered sequence in descending order
Reverse Reverses the order of the items in a sequence
Partitioning  
Skip Returns a sequence that skips a given number of items
SkipWhile Returns a sequence that skips items that do not meet an expression
Take Returns a sequence that takes a given number of items
TakeWhile Returns a sequence that takes items that meet an expression
Projection  
Select Creates a projection of parts of a sequence
SelectMany Creates a one-to-many projection of parts of a sequence
Quantifiers  
All Determines if all items in a sequence meet a condition
Any Determines if any items in a sequence meet a condition
Contains Determines if a sequence contains a given item
Restriction  
Where Filters the items in a sequence
Set  
Distinct Returns a sequence without duplicate items
Except Returns a sequence representing the difference between two sequences
Intersect Returns a sequence representing the intersection of two sequences
Union Returns a sequence representing the union of two sequences

Lambda Expressions

Many standard query operators use Func delegates to process individual elements as they operate over a sequence. Lambda expressions can work in concert here with the standard query operators to represent the delegates. A lambda expression is a shorthand form for creating an implementation of a delegate and can be used anywhere an anonymous delegate can be used. Lambda expressions are supported by both C# and Visual Basic® .NET. It is important to note, though, that because Visual Basic .NET does not yet support anonymous methods, lambda expressions may consist of only a single statement.

Let's see how the Single operator can be used on an array of integers. An array of integers, each element representing a power of 2 from 1 through 10, is created and then the Single operator is used to retrieve the single integer element that meets the condition specified in the lambda expression:

int[] nums = { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 }; int singleNum = nums.Single(x => x > 16 && x < 64); Console.WriteLine(singleNum.ToString());

Lambda expressions have a few key parts. The lambda expression starts by defining the variables passed into the delegate. In the previous code example, x (declared on the left-hand side of the => operator) is the argument that represents each element of the nums array that is passed to it. The remaining portion of the lambda expression represents the logic that is evaluated for each element in the array. The previous expression could have just as easily been rewritten using an anonymous delegate, as follows:

int singleNum = nums.Single<int>( delegate(int x) {return (x > 16 && x < 64); } ) ;

However, this code is less readable than using the lambda expression. C# 2.0 introduced anonymous delegates, which made passing delegates a little easier; however, lambda expressions took that to new heights of simplicity with its terse syntax.

First and Single

When it is necessary to get a single value from a sequence, the First, FirstOrDefault, Single, and SingleOrDefault operators are very useful. The First method returns the first element in a sequence. There is an overloaded method for First that allows you to pass in a lambda expression to represent a condition. For example, if you want to return the first element in a sequence of integers where the integer element is greater than 50, you could use the following code sample:

int[] nums = { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 }; int num1 = nums.First<int>(); int num2 = nums.First<int>(x => x > 50); int num3 = nums.FirstOrDefault<int>(x => x > 5000); Console.WriteLine( num1.ToString() + "-" + num2.ToString() + "-" + num3.ToString());

This code will find the first element (1), the first element that is greater than 50 (64), and the first element greater than 5,000. Because there is no element in the array that satisfies the third lambda expression (no integer in the array is greater than 5,000), an exception would be thrown if the code used the First operator instead of FirstOrDefault. By using the FirstOrDefault operator, 0 is returned if no element satisfies the lambda expression. The First operator can also be used in a LINQ to Entities query, as you see here:

using (Entities entities = new Entities()) { var query = (from c in entities.Customers select c).First(c => c.City.Equals("London")); Console.WriteLine(query.CompanyName); }

In this example, the first customer in the city of London will be returned. As you can see, the syntax for using the First method does not change when using it with different LINQ providers—in this case LINQ to Objects and LINQ to Entities.

The First operator is very useful in the context of LINQ to Entities, especially when you know a single record will be returned from a query. For example, you might have a query that always gets a customer record when given the CustomerID. In this situation you would always have 0 or 1 records returned, so a sequence does not help you nearly as much as the entity itself. In other words, you would rather just fetch the Customer entity instead of a sequence of 1 Customer entity. This is a case where the First method can help, as shown in the code snippet that follows. (Because the Entity Framework does not try to distribute the execution of a single query across client and server, the single method is not supported in LINQ to Entities, so using the First method is an easy alternative.)

using (Entities entities = new Entities()) { var query = (from c in entities.Customers where c.CustomerID.Equals("BOLID") select c).First(); Console.WriteLine(query.CompanyName); }

Aggregates, Hierarchies, and Projections

Using an aggregate operator such as Sum on a LINQ to Entities query can help simplify a query. For example, the following code retrieves a sequence of orders where the order total is more than $10,000:

using (Entities entities = new Entities()) { var query = from o in entities.Orders where o.OrderDetails.Sum( od => od.UnitPrice * od.Quantity) >= 10000 select o; foreach (Orders order in query) Console.WriteLine(order.OrderID); }

Because LINQ can query hierarchical entity collections, standard query operators can also be used to perform operations over the nested sequences of entities. This can be very useful when derived data must be calculated or interrogated. The derived data may only exist in its base form, such as in the case where the details of the customer orders have only values for unit price and quantity. In this case, aggregated data representing the total amount for an order is not provided at any point in the model. By applying the Sum operator to your LINQ query, however, you can still retrieve all customers who have spent more than $20,000, as shown here:

using (Entities entities = new Entities()) { var query = from c in entities.Customers where c.Orders.Sum( o => o.OrderDetails.Sum( od => od.UnitPrice * od.Quantity)) >= 25000 select c; foreach (Customers customer in query) Console.WriteLine(customer.CompanyName); }

This example demonstrates how you can apply the standard query operators at multiple levels of a LINQ query. The query eventually returns a sequence of Customers entities, but to get there it must first dive into each customer's orders and each order's order details to get the data it needs so that it can calculate each line item's price, sum the total of the line items for each order, and sum the order totals for each customer.

The Count operator is another aggregate standard query operator. You can determine how many customers spent more than $25,000 by using the following code:

using (Entities entities = new Entities()) { var query = (from c in entities.Customers where c.Orders.Sum( o => o.OrderDetails.Sum( od => od.UnitPrice * od.Quantity)) >= 25000 select c).Count(); Console.WriteLine(query); }

The Max operator can be used to determine the best customer. The following code sample will return the amount that the highest-spending Customer spent. It uses a combination of the Sum and the Max aggregators over multiple levels of a hierarchy:

using (Entities entities = new Entities()) { var query = (from c in entities.Customers select new { c.CustomerID, Total = c.Orders.Sum( o => o.OrderDetails.Sum(od => od.UnitPrice)) }).Max(c2 => c2.Total); Console.WriteLine(query); }

Projections and Ordering

You also may have noticed that I snuck in a projection in the previous example. Before using the Max operator, the LINQ query does not return a list of customers. Instead, it returns a projection, which creates a new entity that contains a property for the CustomerID and a property for the Total (the customer's entire spending amount). Projections are integral to LINQ, and when they are projected into sequences, as in the previous example just shown, they can be further processed using standard query operators.

Figure 1 shows how to create a projection of a new entity that contains a CustomerID and the sum of that customer's order totals (using the Sum operator discussed previously). Figure 1 also uses the OrderByDescending operator to order the sequence of projected entities by the calculated total. If two customers had the same total, an additional ordering operator could be used to further define the ordering. For example, the foreach statement in Figure 1 could be amended to use the following code to further qualify the ordering rules:

Figure 1 Aggregates, Projections, and Ordering

using (Entities entities = new Entities()) { var query = from c in entities.Customers where c.Orders.Sum( o => o.OrderDetails.Sum(od => od.UnitPrice)) > 0 select new { c.CustomerID, Total = c.Orders.Sum( o => o.OrderDetails.Sum(od => od.UnitPrice)) }; foreach (var item in query.OrderByDescending(x => x.Total)) Console.WriteLine(item.CustomerID + " == " + item.Total); }

foreach (var item in query.OrderByDescending(x => x.Total) .ThenBy(x => x.CustomerID)) { Console.WriteLine(item.CustomerID + " == " + item.Total); }

In this code snippet, I added the ThenBy operator and a lambda expression to indicate that the sequence should be ordered first in descending order by the Total property and then in ascending order by the CustomerID property of the projection.

Quantifiers and Conversion

When you need to determine if a value exists in a sequence, the standard query operator Any can be used. Quantifiers such as Any, All, and Contains search a sequence of elements and evaluate whether or not the sequence meets a lambda expression's condition. This can be very helpful when you need to examine a sequence to determine things like whether a customer from a specific address exists, if all customers are from the same country, or any other analytical deterministic question.

For example, the following LINQ query checks to see if all customers from the United Kingdom are located in London. This uses the quantifier All and passes it the lambda expression that simply evaluates if the city is London. If every element in the sequence meets this criterion and returns true from the lambda expression, then the All operator will return true:

using (Entities entities = new Entities()) { bool allUKCustomerAreFromLondon = (from c in entities.Customers where c.Country == "UK" select c).All( c => c.City.Equals("London")); Console.WriteLine(allUKCustomerAreFromLondon ? "Yes" : "No"); }

Another question that might need to be asked in this query is if there are any entities in the sequence from the city Cowes in the United Kingdom. For this question, you can use the Any quantifier to operate on the sequence, as shown here:

using (Entities entities = new Entities()) { bool isOneUKCustomerFromCowes = (from c in entities.Customers where c.Country == "UK" select c).Any( c => c.City.Equals("Cowes")); Console.WriteLine(isOneUKCustomerFromCowes? "Yes" : "No"); }

The Contains operator is similar to the Any operator in that it evaluates whether the sequence contains what you are looking for. The Any operator can determine if a value exists within an item in the sequence, but the Contains operator determines if a specific item instance exists in the sequence. For example, before you add an object to a sequence, you might want to make sure the sequence does not already contain that object. Figure 2 demonstrates how to check.

Figure 2 Using Contains and Conversion

using (Entities entities = new Entities()) { Customers customerBSBEV = (from c in entities.Customers where c.CustomerID == "BSBEV" select c).First(); var customersUK = from c in entities.Customers where c.Country == "UK" select c; bool isCustomerInSequence = customersUK.Contains(customerBSBEV); Console.WriteLine(isCustomerInSequence? "Yes" : "No"); }

Notice in Figure 2 that a Customers entity is retrieved for the BSBEV customer. Then the sequence of Customers entities where the customers are from the United Kingdom is retrieved. Finally, the Contains operator is used to check whether the sequence of customers contains the instance held by the customerBSBEV variable.

The implementation of the Contains operator shown in Figure 2 works in situations where you can confidently compare objects based on their actual instance. However, what if you need the Contains operator to test based on a logical identity? Fortunately, the Contains operator has an overload allowing you to pass in an object implementing the IEqualityComparer<T> interface. To use Contains based on CustomerID, the code in Figure 2 can be rewritten as follows

using (Entities entities = new Entities()) { ... bool isCustomerInSequence = customersUK.Contains(customerBSBEV, new CustomerComparer()); Console.WriteLine(isCustomerInSequence? "Yes" : "No"); }

where CustomerComparer is defined as

private class CustomerComparer : IEqualityComparer<Customers> { public bool Equals(Customers x, Customers y) { if (x == null || y == null) return false; return x.CustomerID.Equals(y.CustomerID); } ... }

Wrapping Up

There are many standard query operators that are defined as extension methods for the Enumerable and Queryable sequence classes. These operators can help extend the functionality of LINQ, as I've show you here. I've also demonstrated how many of the new enhancements to the .NET Framework 3.5 (including lambda expressions, LINQ, the Entity Framework, and implicitly typed variables) come together to make writing robust code and logic easier.

Send your questions and comments for John to mmdatat@microsoft.com.

John Papa is a senior .NET consultant with ASPSOFT (aspsoft.com) and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. John, a C# MVP and INETA speaker, has authored several books on data access technologies. He can often be found speaking at industry conferences or blogging at www.johnpapa.net.