Hi,
Recently I was reviewing a SQL stored procedure for a company and I noticed that the developers had made an all too common error. They created their WHERE clause with a non “sargable” condition.
What this caused was a rather painful table scan because the query optimiser was unable to use the indexes on the table.
Let’s look at a pretty simple example, and one that you can run yourself.
We’ll begin with creating a table with three fields:
CREATE TABLE tblExampleA (ID INT IDENTITY (1,1) , Word1 VARCHAR(10) , Date1 DATETIME);
Next, we need to put some data in there. This SQL below will throw in about 50,000 rows:
DECLARE @D1 DATETIME = DATEADD(YEAR , -50, GETDATE());
WHILE @D1 < GETDATE()
BEGIN
INSERT INTO tblExampleA ( Word1, Date1 )
VALUES (DATENAME( WEEKDAY , @D1) , @D1);
SET @D1 = DATEADD(HOUR , 9 , @D1);
PRINT @D1;
END
SELECT * FROM tblExampleA;
To illustrate the issue, we want to show the difference between an Index Scan and an Index Seek. So let’s go ahead and put a clustered index on our table, and ask for the SSMS to give us the statistics of the queries we're about to run:
CREATE CLUSTERED INDEX ciByDate ON tblExampleA (Date1);
SET STATISTICS IO ON;
Finally, we’ll run two SQL statements:
DECLARE @ForDate DATE = '1978-04-28';
SELECT Top 100 *
FROM tblExampleA
WHERE CONVERT(varchar,Date1,112) = @ForDate;
SELECT Top 100 *
FROM tblExampleA
WHERE Date1 BETWEEN @ForDate AND DATEADD(day,1,@ForDate);
When we look at the end results, the output is the same.
However … the work done by SQL to get the data is very different. Here are the two execution plans:
And the statistics we asked for:
Look at the difference in Logical reads.
Because of the CONVERT function over the Date1 field, the SQL optimiser was unable to use the index that we created. This is because it had to read all of the records in the table, parse them through the function and then compare them to our variable of @ForDate.
This was easily fixed, by instead of adding a function on the Date1 field to drop the Time off the DateTime field, we instead just used a different way of filtering. In this case, we used the BETWEEN condition to go from the DATE which we declared, and the Day after.
The best advice I can give when is to always test your queries, not just for the desired results, but for their optimisation. So often I see SQL code written by .Net developers, and while it returns the output they want, it is done in an most inefficient manner.
If you have any questions, I'd love to hear them so please send them through.
Rodney
These queries are not logically equivalent, the second one will also return rows where Date1 is 1978-04-29 00:00:00.000 if any. Beware as Between is inclusive for both boundaries.
ReplyDeleteThanks Stan the Man :-)
ReplyDeleteI had to change it slightly from the original client code, as i couldn't show my client's data and query on a public site. I think it still demonstrates the sargability though.