Argument for O(1) average-case complexity of heap insertion
There is a much better reference for the claim that average time heap insertion is O(1): the 1991 paper "Average Case Analysis of Heap Building by Repeated Insertion" by Hayward & McDiarmid. (This paper is linked in what is currently reference 4 of the Wikipedia article.) That paper in turn references a 1975 paper by Porter & Simon, "Random insertion into a priority queue structure" which deals with a single insertion into a heap, and demonstrates that the average case is O(1).
Intuitively, the argument is simple. Half of the heap is a leaf, and the leaves tend to be larger. If we assume for a moment that leaves are the largest elements in the heap (rather than tending to be larger), then we could say that the probability that a new element will be a leaf -- i.e., that it is in the upper half of the range of values -- would be exactly 0.5. If the new element is not a leaf of the heap (also probability 0.5), we can repeat the process with the truncated heap consisting only of non-leaf nodes in the original heap, so the probability that the new element is at the second-lowest level will be half of what remains: 0.25. And thus the probability that it is at the third level would be 0.125, and so on. Then the expected number of levels we would have to search through would be 1*0.5 + 2*0.25 + 3*0.125..., which is 2.
Of course, the probability that the random new element is greater than a random second-level parent is not really 0.5; it's actually a little less. But, as long as it is bounded by a constant, the sum of the power series which computes the expected number of comparisons will still also be bounded by a constant. And it turns out that the constant is around 2.6.
Also see this useful answer that, while discussing complexities of heaps, and contrasting them with those of BST, gives a detailed graphical analysis of the constant average insertion time in heaps.