I did some preliminary benchmarking to assess the cost of using LINQ vs using direct SQL today. I was not convinced that the benchmarking was accurate as I was running against a dev SQL server that other developers were also using, and the SQL box is a virtual server on the same hardware that hosts other virtual servers and other systems. Too much white noise causing interference.
So tonight I sat down and started doing some isolated, consistent benchmarking. I’m running SQL Express 2005 and LINQ May CTP. The hardware is a standalone Core Duo running XP. Did I mention that it’s a Macbook Pro?
1 GB RAM, some other apps running but nothing resource hungry.
Things were running like this:
I used a dev database with a few thousand rows of text data to test. The table in question (you’ll see the table name in the LINQ code below): ArticleModuleAttribute has 4060 Rows. I did not inspect or modify any indexing or anything else at the table level as the table in question will be used for all tests.
first-up I wanted to emulate the sort of basic behavior we have in a piece of code that does a search, retrieves a page of results, and shows it on screen. So I created a simple LINQ query as follows:
private static void LinqTest()
{
DataContext db = new DataContext(”Data Source=macbookpro\\sqlexpress;Initial
Catalog=CMDotNet;Integrated Security=True”);
Table<ArticleModuleAttribute> ArticleModuleAttributes = db.GetTable<ArticleModuleAttribute>();
var blocks =
(from ama in ArticleModuleAttributes
where ama.strAttributeName == “Content” && ama.strAttributeValue.Contains(”unselectable”)
select ama).Skip(10).Take(10);
foreach (var block in blocks)
{
Console.WriteLine(block.numAttributeID.ToString());
}
blocks = null;
}
This code is just a simple single-table search, skipping the first page, returning the second page of ten results. The console.writeline is just to slow things down a bit, and this will be applied to LINQ and direct tests.
Next up I wrote the simplest equivalent of this LINQ query. I went into LinqTest in debug and retrieved the SQL. I made it into a stored procedure as follows:
ALTER procedure [dbo].[testSearchProc]
@p0 as nvarchar(20),
@p1 as nvarchar(20),
@p2 as nvarchar(20),
@p3 as nvarchar(20)
as
SELECT TOP 10 [t0].[numAttributeID], [t0].[numModuleID], [t0].[strAttributeName], [t0].[strAttributeValue]
FROM [ArticleModuleAttribute] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT TOP 20 [t1].[numAttributeID]
FROM [ArticleModuleAttribute] AS [t1]
WHERE ([t1].[strAttributeName] = @p0) AND ([t1].[strAttributeValue] LIKE @p1)
) AS [t2]
WHERE [t0].[numAttributeID] = [t2].[numAttributeID]
))) AND ([t0].[strAttributeName] = @p2) AND ([t0].[strAttributeValue] LIKE @p3)
This is literally a cut-and-paste from LINQ with the only modification being the parameter declarations being changed to stored procedure parameters. Nothing in the select is modified (which in itself blew me away as it is now THAT easy to do server-side paging, no more excuses for sloppy searches
)
I wrote a simple method to call this stored procedure as follows:
private static void DirectTest(int i)
{
SqlConnection conn = new SqlConnection(”Data Source=macbookpro\\sqlexpress;Initial
Catalog=CMDotNet;Integrated Security=True”);
conn.Open();
SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
cmd.CommandText = “testSearchProc”
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(new SqlParameter(”p0″, “Content”));
cmd.Parameters.Add(new SqlParameter(”p1″, “%unselectable%”));
cmd.Parameters.Add(new SqlParameter(”p2″, “Content”));
cmd.Parameters.Add(new SqlParameter(”p3″, “%unselectable%”));
SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.CloseConnection);
while (reader.Read())
{
Console.WriteLine(reader.GetSqlDecimal(0).ToString());
}
reader.Close();
reader = null;
}
No Rocket Science there either. the parameter values are exactly as I extracted from LINQ in debug
I then ran these two tests, one hundred times each. The first runs were always slower as (I assume) connections were opened and then pooling sped it up. grabbing times between every loop I got the following results:
This was grabbing the ten records, writing a value to the console, looping 100 times. LINQ appears to be slower than a direct call on a stored procedure. Applying a ten point moving average to get something a bit easier to look at gives:
Showing DLINQ sitting at almost double the execution time. to eliminate the possibility that pooling was somehow skewing results, I added pooling=false to the connection strings for both tests, swapped the execution order and reran. A similar result emerged. To eliminate the console.writeline from skewing things, I changed the code to set and reset a single decimal variable. This also eliminated the .ToString (although it was consistent in both tests.
Based on this trivial example I would suggest detailed analysis including to-scale load testing before rolling LINQ out as a replacement for traditional SQL in situations where high-scalability is required.
Further investigation: Dynamic SQL vs LINQ