Should LINQ be avoided because it's slow? [closed]
I've been told that LINQ in .NET is so slow that we shouldn't use it, and I was wondering whether anyone else has come to the same conclusion. An example:
Took 1443ms to do 1000000000 compares non-LINQ.
Took 4944ms to do 1000000000 compares with LINQ.
(243% slower)
the non-LINQ code:
for (int i = 0; i < 10000; i++)
{
    foreach (MyLinqTestClass1 item in lst1) // 100000 items in the list
    {
        if (item.Name == "9999")
        {
            isInGroup = true;
            break;
        }
    }
}
Took 1443ms to do 1000000000 compares non-LINQ.
LINQ code:
for (int i = 0; i < 10000; i++)
    isInGroup = lst1.Cast<MyLinqTestClass1>().Any(item => item.Name == "9999");
Took 4944ms to do 1000000000 compares with LINQ.
I guess it's possible to optimize the LINQ code, but the thought was that it's easy to end up with really slow LINQ code, and given that, it shouldn't be used. If LINQ is slow, it would also follow that PLINQ is slow and NHibernate LINQ is slow, so no kind of LINQ statement should be used.
Has anyone else found that LINQ is so slow that they wished they had never used it, or am I making a too general conclusion based on benchmarks like this?
Should LINQ be avoided because it's slow?
No. It should be avoided if it is not fast enough. Slow and not fast enough are not at all the same thing!
Slow is irrelevant to your customers, your management and your stakeholders. Not fast enough is extremely relevant. Never measure how fast something is; that tells you nothing that you can use to base a business decision on. Measure how close to being acceptable to the customer it is. If it is acceptable then stop spending money on making it faster; it's already good enough.
Performance optimization is expensive. Writing code so that it can be read and maintained by others is expensive. Those goals are frequently in opposition to each other, so in order to spend your stakeholder's money responsibly you've got to ensure that you're only spending valuable time and effort doing performance optimizations on things that are not fast enough.
You've found an artificial, unrealistic benchmark situation where LINQ code is slower than some other way of writing the code. I assure you that your customers care not a bit about the speed of your unrealistic benchmark. They only care if the program you're shipping to them is too slow for them. And I assure you, your management cares not a bit about that (if they're competent); they care about how much money you're spending needlessly to make stuff that is fast enough unnoticeably faster, and making the code more expensive to read, understand, and maintain in the process.
Why are you using Cast<T>()? You haven't given us enough code to really judge the benchmark, basically.
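As an illustration of why the Cast<T>() call is suspicious: if lst1 is a generic List<MyLinqTestClass1> rather than a non-generic collection like ArrayList, the cast (and its per-element type check) disappears entirely. This is only a sketch under that assumption; MyLinqTestClass1 and lst1 are names taken from the question, and the class body here is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyLinqTestClass1
{
    public string Name { get; set; }
}

class CastDemo
{
    static void Main()
    {
        // With a strongly typed List<MyLinqTestClass1>, Any operates
        // directly on the typed elements; no Cast<MyLinqTestClass1>()
        // (and no per-element runtime cast) is needed.
        var lst1 = Enumerable.Range(0, 100000)
            .Select(i => new MyLinqTestClass1 { Name = i.ToString() })
            .ToList();

        bool isInGroup = lst1.Any(item => item.Name == "9999");
        Console.WriteLine(isInGroup); // prints True
    }
}
```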
Yes, you can use LINQ to write slow code. Guess what? You can write slow non-LINQ code, too.
LINQ greatly aids expressiveness of code dealing with data... and it's not that hard to write code which performs well, so long as you take the time to understand LINQ to start with.
If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face. If they came up with a specific bottleneck and said, "We can make this faster by not using LINQ in this situation, and here's the evidence" then that's a very different matter.
Maybe I've missed something, but I'm pretty sure your benchmarks are off.
I tested with the following methods:
- The Any extension method ("LINQ")
- A simple foreach loop (your "optimized" method)
- Using the ICollection.Contains method
- The Any extension method using an optimized data structure (HashSet<T>)
Here is the test code:
class Program
{
    static void Main(string[] args)
    {
        var names = Enumerable.Range(1, 10000).Select(i => i.ToString()).ToList();
        var namesHash = new HashSet<string>(names);
        string testName = "9999";
        for (int i = 0; i < 10; i++)
        {
            Profiler.ReportRunningTimes(new Dictionary<string, Action>()
            {
                { "Enumerable.Any", () => ExecuteContains(names, testName, ContainsAny) },
                { "ICollection.Contains", () => ExecuteContains(names, testName, ContainsCollection) },
                { "Foreach Loop", () => ExecuteContains(names, testName, ContainsLoop) },
                { "HashSet", () => ExecuteContains(namesHash, testName, ContainsCollection) }
            },
            (s, ts) => Console.WriteLine("{0, 20}: {1}", s, ts), 10000);
            Console.WriteLine();
        }
        Console.ReadLine();
    }

    static bool ContainsAny(ICollection<string> names, string name)
    {
        return names.Any(s => s == name);
    }

    static bool ContainsCollection(ICollection<string> names, string name)
    {
        return names.Contains(name);
    }

    static bool ContainsLoop(ICollection<string> names, string name)
    {
        foreach (var currentName in names)
        {
            if (currentName == name)
                return true;
        }
        return false;
    }

    static void ExecuteContains(ICollection<string> names, string name,
        Func<ICollection<string>, string, bool> containsFunc)
    {
        if (containsFunc(names, name))
            Trace.WriteLine("Found element in list.");
    }
}
Don't worry about the internals of the Profiler class. It just runs the Action in a loop and uses a Stopwatch to time it. It also makes sure to call GC.Collect() before each test to eliminate as much noise as possible.
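The Profiler class itself wasn't shown, so here is a hypothetical sketch of what it might look like, matching the description above (loop plus Stopwatch, with a GC.Collect() before each test) and the call signature used in Main; the details are my own guess, not the original implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class Profiler
{
    public static void ReportRunningTimes(
        IDictionary<string, Action> tests,
        Action<string, TimeSpan> report,
        int iterations)
    {
        foreach (var test in tests)
        {
            // Collect before timing so a pending GC doesn't pollute the numbers.
            GC.Collect();
            GC.WaitForPendingFinalizers();

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                test.Value();
            sw.Stop();

            report(test.Key, sw.Elapsed);
        }
    }
}

class ProfilerDemo
{
    static void Main()
    {
        // Minimal usage example: time a no-op 1000 times and print the result.
        Profiler.ReportRunningTimes(
            new Dictionary<string, Action> { { "noop", () => { } } },
            (name, elapsed) => Console.WriteLine("{0}: {1}", name, elapsed),
            1000);
    }
}
```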
Here were the results:
Enumerable.Any: 00:00:03.4228475
ICollection.Contains: 00:00:01.5884240
Foreach Loop: 00:00:03.0360391
HashSet: 00:00:00.0016518
Enumerable.Any: 00:00:03.4037930
ICollection.Contains: 00:00:01.5918984
Foreach Loop: 00:00:03.0306881
HashSet: 00:00:00.0010133
Enumerable.Any: 00:00:03.4148203
ICollection.Contains: 00:00:01.5855388
Foreach Loop: 00:00:03.0279685
HashSet: 00:00:00.0010481
Enumerable.Any: 00:00:03.4101247
ICollection.Contains: 00:00:01.5842384
Foreach Loop: 00:00:03.0234608
HashSet: 00:00:00.0010258
Enumerable.Any: 00:00:03.4018359
ICollection.Contains: 00:00:01.5902487
Foreach Loop: 00:00:03.0312421
HashSet: 00:00:00.0010222
The data is very consistent and tells the following story:

- Naïvely using the Any extension method is about 9% slower than naïvely using a foreach loop.
- Using the most appropriate method (ICollection<string>.Contains) with an unoptimized data structure (List<string>) is approximately 50% faster than naïvely using a foreach loop.
- Using an optimized data structure (HashSet<string>) completely blows any of the other methods out of the water in performance terms.
I have no idea where you got 243% from. My guess is it has something to do with all that casting. If you're using an ArrayList then not only are you using an unoptimized data structure, you're using a largely obsolete data structure.
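To make the data-structure point concrete, here is a small sketch (sizes borrowed from the benchmark above): List<T>.Contains scans the list linearly, while HashSet<T>.Contains does an average O(1) hash lookup, which is where the enormous gap in the HashSet numbers comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LookupDemo
{
    static void Main()
    {
        var list = Enumerable.Range(1, 10000).Select(i => i.ToString()).ToList();
        var set = new HashSet<string>(list);

        // O(n): walks up to 10,000 entries comparing strings one by one.
        Console.WriteLine(list.Contains("9999")); // prints True

        // O(1) average: hashes "9999" and probes a single bucket.
        Console.WriteLine(set.Contains("9999"));  // prints True
    }
}
```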
I can predict what comes next. "Yeah, I know you can optimize it better, but this was just an example to compare the performance of LINQ vs. non-LINQ."
Yeah, but if you couldn't even be thorough in your example, how can you possibly expect to be this thorough in production code?
The bottom line is this:
How you architect and design your software is exponentially more important than what specific tools you use and when.
If you run into performance bottlenecks - which is every bit as likely to happen with LINQ vs. without - then solve them. Eric's suggestion of automated performance tests is an excellent one; that will help you to identify the problems early so that you can solve them properly - not by shunning an amazing tool that makes you 80% more productive but happens to incur a < 10% performance penalty, but by actually investigating the issue and coming up with a real solution that can boost your performance by a factor of 2, or 10, or 100 or more.
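A minimal sketch of what such an automated performance test might look like: rather than measuring raw speed, it checks the operation against an agreed time budget and fails loudly when the budget is exceeded. The budget and workload here are invented for illustration; a real threshold would come from measuring what is acceptable to your users.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class PerfBudgetTest
{
    static void Main()
    {
        // Hypothetical budget: 10,000 lookups must complete within
        // 2 seconds on the test machine, or the check fails.
        var budget = TimeSpan.FromSeconds(2);
        var names = new HashSet<string>(
            Enumerable.Range(1, 10000).Select(i => i.ToString()));

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
            names.Contains("9999");
        sw.Stop();

        Console.WriteLine(sw.Elapsed <= budget
            ? "PASS: within performance budget"
            : "FAIL: exceeded performance budget");
    }
}
```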
Creating high-performance applications is not about using the right libraries. It's about profiling, making good design choices, and writing good code.
Is LINQ a real-world bottleneck (affecting the overall or perceived performance of the application)?
Will your application be performing this operation on 1,000,000,000+ records in the real world? If so, then you might want to consider alternatives; if not, it's like saying "we can't buy this family sedan because it doesn't drive well at 180+ MPH".
If it's "just slow" then that's not a very good reason... by that reasoning you should be writing everything in asm/C/C++, and C# should be off the table for being "too slow".
While premature pessimization is (imho) as bad as premature optimization, you shouldn't rule out an entire technology based on absolute speed without taking usage context into consideration. Yes, if you're doing some really heavy number-crunching and this is a bottleneck, LINQ could be problematic - profile it.
An argument you could use in favour of LINQ is that, while you can probably outperform it with handwritten code, the LINQ version could likely be clearer and easier to maintain - plus, there's the advantage of PLINQ compared to complex manual parallelization.
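To illustrate the PLINQ point, here is a minimal sketch: a sequential query becomes parallel with a single AsParallel() call, whereas hand-rolled threading for the same result would take considerably more code. The workload here is made up for demonstration.

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 1000000);

        // One AsParallel() call partitions the work across cores;
        // Sum merges the partial results back together.
        long sumOfSquares = numbers
            .AsParallel()
            .Select(n => (long)n * n)
            .Sum();

        Console.WriteLine(sumOfSquares);
    }
}
```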