Microsoft LINQ Books

Programming Microsoft LINQ & Introducing Microsoft LINQ
Welcome to Microsoft LINQ Books Sign in | Join | Help
in Search

Marco Russo

Use of Distinct and OrderBy in LINQ

A few days ago I found a bug in a program written using LINQ to SQL, which was caused by years of use of SQL. The requirement was something like: get the distinct values of (bla bla bla) sorted alphabetically. An example of the required query with Northwind would be the following one:

SELECT DISTINCT
        e.LastName
FROM    Orders o
LEFT JOIN [Employees] e
        ON e.[EmployeeID] = o.[EmployeeID]
ORDER BY e.LastName 

Fundamentally, we are using both a DISTINCT and an ORDER BY statement in SQL.

Now, if you create a NorthwindDataContext importing the Order and Employee tables, you can try to write a similar statement in LINQ to SQL. Unfortunately, the Distinct clause is not part of the query syntax and the most intuitive path could be the one of calling Distinct at the end of your statement, like in the following query:

var queryA =
    (from o in db.Orders
     orderby o.Employee.LastName
     select o.Employee.LastName)
     .Distinct();

 

However, the Distinct clause is removing the sort condition defined by the orderby keyword. In fact, the SQL statement sent to the database is the following one:

SELECT DISTINCT
        [t1].[LastName]
FROM    [dbo].[Orders] AS [t0]
LEFT OUTER JOIN [dbo].[Employees] AS [t1]
        ON [t1].[EmployeeID] = [t0].[EmployeeID]

 

This behavior might appear strange. The problem is that the Distinct operator does not grant that it will maintain the original order of values. Applied to LINQ to SQL, this mean that a sort constraint can be ignored in the case of a query like queryA.

The solution is pretty sample: put the OrderBy operator after the Distinct one, like in the following queryB definition:

var queryB = 
    (from o in db.Orders
     select o.Employee.LastName)
    .Distinct().OrderBy( n => n );

 

This will result in the following SQL statement sent to Northwind:

SELECT  [t2].[LastName]
FROM    ( SELECT DISTINCT
                    [t1].[LastName]
          FROM      [dbo].[Orders] AS [t0]
          LEFT OUTER JOIN [dbo].[Employees] AS [t1]
                    ON [t1].[EmployeeID] = [t0].[EmployeeID]
        ) AS [t2]
ORDER BY [t2].[LastName]

 

If you remove some syntax redundancy, this is exactly the same query I wrote at the beginning of my post.

The lesson is: in a SQL query, the position of an operator is not relevant until operators belong to the same SELECT/FROM statement. In LINQ, this is not true and the conversion to SQL could remove LINQ operators when their operation might be ignored by other operators in the same LINQ query.

Final consideration: initially I considered that the compiler could emit some warning in case a query reduction is done like in the queryA case. Unfortunately, the query reduction operation is done by the LINQ to SQL provider at execution time and not during compilation. A warning could still be possible, but it's something that I would move to tools like FxCop.

Published domenica 20 luglio 2008 19.16 by Marco.Russo

Comments

 

Marco Russo said:

Recentemente ho scritto alcuni post in inglese nel mio blog su LINQ: Campi varchar(1) usando LINQ to

luglio 20, 2008 10.25
Anonymous comments are disabled
Powered by Community Server (Personal Edition), by Telligent Systems