O(n log n) vs O(n) -- practical differences in time complexity
Solution 1:
The main purpose of the Big-O notation is to let you do the estimates like the ones you did in your post, and decide for yourself if spending your effort coding a typically more advanced algorithm is worth the additional CPU cycles that you are going to buy with that code improvement. Depending on the circumstances, you may get a different answer, even when your data set is relatively small:
- If you are running on a mobile device, and the algorithm represents a significant portion of the execution time, cutting down the use of CPU translates into extending the battery life
- If you are running in an all-or-nothing competitive environment, such as a high-frequency trading system, a micro-optimization may differentiate between making money and losing money
- When your profiling shows that the algorithm in question dominates the execution time in a server environment, switching to a faster algorithm may improve performance for all your clients.
Another thing the Big-O notation hides is the constant multiplication factor. For example, Quick Select has very reasonable multiplier, making the time savings from employing it on extremely large data sets well worth the trouble of implementing it.
Another thing that you need to keep in mind is the space complexity. Very often, algorithms with O(N*Log N)
time complexity would have an O(Log N)
space complexity. This may present a problem for extremely large data sets, for example when a recursive function runs on a system with a limited stack capacity.
Solution 2:
It depends.
I was working at amazon, there was a method, which was doing linear search on a list. We could use a Hashtable and do the look up in O(1) compared to O(n).
I suggested the change, and it wasn't approved. because the input was small, it wouldn't really make a huge difference.
However, if the input is large, then it would make a difference.
In another company, where the data/input was huge, using a Tree, Compared to List made a huge difference. So it depends on the data and architecture of the application.
It is always good to know your options and how you can optimize.
Solution 3:
There are times when you will work with billions of elements (and more), where that difference will certainly be significant.
There are other times when you will be working with less than a thousand elements, in which case the difference probably won't be all that significant.
If you have a decent idea what your data will look like, you should have a decent idea which one to pick from the start, but the difference between O(n) and O(n log n) is small enough that it's probably best to start off with whichever one is simplest, benchmark it and only try to improve it if you see it's too slow.
However, note that O(n) may actually be slower than O(n log n) for any given value of n (especially, but not necessarily, for small values of n) because of the constant factors involved, since big-O ignores those (it only considers what happens when n tends to infinity), so, if you're looking purely at the time complexity, what you think may be an improvement may actually slow things down.
Solution 4:
Darth Vader is correct. It always depends. Its also important to rememeber that complexities are asymptotic, worst-case (usually) and that constants are dropped. Each of these is important to consider.
So you could have two algorithms, one of which is O(n) and one of which is O(nlogn), and for every value up to the number of atoms in the universe and beyond (to some finite value of n), the O(nlogn) algorithm outperforms the O(n) algorithm. It could be because lower order terms are dominating, or it could be because in the average case, the O(nlogn) algorithm is actually O(n), or because the actual number of steps is something like 5,000,000n vs 3nlogn.