Why does loop tiling decrease the running time of a function? [duplicate]

Can someone please tell me whether these two optimization techniques, loop unrolling and loop tiling, are the same or different?

Also, is it the responsibility of the programmer or of the compiler to apply them?


The two techniques are different. See descriptions for Loop unrolling and Loop tiling.

Loop unrolling is done to eliminate the overhead of looping. It is (usually) only worthwhile for fairly small loops whose iteration count is known at compile time. It is mostly done by the compiler.

In older times when computers were slower and compilers were more primitive, programmers would do manual loop unrolling but now it would be unusual for a programmer to do it -- except possibly for a very restrictive embedded system.

Loop tiling is commonly done with very large data sets. The objective is to load a block of data into cache memory and perform all the operations on it before loading the next block.

Depending on the operations being performed and the internal organisation of the data, a simple loop might jump about between different data pages, causing a lot of cache misses (and page loads). Careful planning of the order of execution can significantly improve run-times for certain problems.

While a compiler may perform loop tiling, there are times when the programmer might do so manually and possibly do a better job than the compiler.

In general, don't try to do these types of optimisation as they add a lot of complexity (and bugs) to the code and usually provide only modest performance gains. However if your code is slow and profiling indicates particular types of bottlenecks, then something like loop tiling should be considered and may lead to large performance gains.


These are two totally different performance optimisations.

Loop unrolling is a code optimisation where code is replicated within a loop and the total number of loop iterations is reduced. The benefit is reduced loop overhead (normally only relevant for very small loops), and better instruction scheduling with reduced dependency stalls in superscalar CPUs. This can be done manually or as a compiler optimisation.

Tiling is a memory optimisation which aims to make better use of cache by processing tiles (small blocks within a larger data structure), typically in the context of an image or other 2D data structure. This is normally implemented at the source code level, as part of the overall design of an algorithm implementation.