Insertion sort vs Bubble Sort Algorithms

I'm trying to understand a few sorting algorithms, but I'm struggling to see the difference in the bubble sort and insertion sort algorithm.

I know both are O(n2), but it seems to me that bubble sort just bubbles the maximum value of the array to the top for each pass, while insertion sort just sinks the lowest value to the bottom each pass. Aren't they doing the exact same thing but in different directions?

For insertion sort, the number of comparisons/potential swaps starts at zero and increases each time (ie 0, 1, 2, 3, 4, ..., n) but for bubble sort this same behaviour happens, but at the end of the sorting (ie n, n-1, n-2, ... 0) because bubble sort no longer needs to compare with the last elements as they are sorted.

For all this though, it seems a consensus that insertion sort is better in general. Can anyone tell me why?

Edit: I'm primarily interested in the differences in how the algorithms work, not so much their efficiency or asymptotic complexity.


Insertion Sort

After i iterations the first i elements are ordered.

In each iteration the next element is bubbled through the sorted section until it reaches the right spot:

sorted  | unsorted
1 3 5 8 | 4 6 7 9 2
1 3 4 5 8 | 6 7 9 2

The 4 is bubbled into the sorted section

Pseudocode:

for i in 1 to n
    for j in i downto 2
        if array[j - 1] > array[j]
            swap(array[j - 1], array[j])
        else
            break

Bubble Sort

After i iterations the last i elements are the biggest, and ordered.

In each iteration, sift through the unsorted section to find the maximum.

unsorted  | biggest
3 1 5 4 2 | 6 7 8 9
1 3 4 2 | 5 6 7 8 9

The 5 is bubbled out of the unsorted section

Pseudocode:

for i in 1 to n
    for j in 1 to n - i
         if array[j] > array[j + 1]
             swap(array[j], array[j + 1])

Note that typical implementations terminate early if no swaps are made during one of the iterations of the outer loop (since that means the array is sorted).

Difference

In insertion sort elements are bubbled into the sorted section, while in bubble sort the maximums are bubbled out of the unsorted section.


In bubble sort in ith iteration you have n-i-1 inner iterations (n^2)/2 total, but in insertion sort you have maximum i iterations on i'th step, but i/2 on average, as you can stop inner loop earlier, after you found correct position for the current element. So you have (sum from 0 to n) / 2 which is (n^2) / 4 total;

That's why insertion sort is faster than bubble sort.


Another difference, I didn't see here:

Bubble sort has 3 value assignments per swap: you have to build a temporary variable first to save the value you want to push forward(no.1), than you have to write the other swap-variable into the spot you just saved the value of(no.2) and then you have to write your temporary variable in the spot other spot(no.3). You have to do that for each spot - you want to go forward - to sort your variable to the correct spot.

With insertion sort you put your variable to sort in a temporary variable and then put all variables in front of that spot 1 spot backwards, as long as you reach the correct spot for your variable. That makes 1 value assignement per spot. In the end you write your temp-variable into the the spot.

That makes far less value assignements, too.

This isn't the strongest speed-benefit, but i think it can be mentioned.

I hope, I expressed myself understandable, if not, sorry, I'm not a nativ Britain


The main advantage of insert sort is that it's online algorithm. You don't have to have all the values at start. This could be useful, when dealing with data coming from network, or some sensor.

I have a feeling, that this would be faster than other conventional n log(n) algorithms. Because the complexity would be n*(n log(n)) e.g. reading/storing each value from stream (O(n)) and then sorting all the values (O(n log(n))) resulting in O(n^2 log(n))

On the contrary using Insert Sort needs O(n) for reading values from the stream and O(n) to put the value to the correct place, thus it's O(n^2) only. Other advantage is, that you don't need buffers for storing values, you sort them in the final destination.