History of $f \circ g$
Solution 1:
N. Bourbaki used $f\circ g$ with the interpretation $(f\circ g)(x)=f(g(x))$ in 1949 (Fonctions d'une variable réelle).
Looking at the Bourbaki papers, I found this example from 1944 (middle of page 5), with the same interpretation. I haven't found any older examples, although I haven't tried very hard. (Van der Waerden does not use this notation in his Moderne Algebra from 1930.)
It is quite conceivable that the notation $f\circ g$ was invented by someone from the Bourbaki group. They cared a great deal about good notation: André Weil introduced the modern symbol for the empty set in 1939 in order to distinguish $\emptyset$ from $0$. The notation for composition could have emerged from a similar discussion about distinguishing $f(g(x))$ from $f(x)g(x)$.
Solution 2:
In dealing with categories and groupoids it is natural to write the composition of
$$A \xrightarrow{f} B \xrightarrow{g} C $$ as $$ A \xrightarrow{fg} C.$$ This convention is used in the book
Higgins, P.J., Categories and Groupoids, Van Nostrand Reinhold Mathematical studies 32, Van Nostrand Reinhold, London, (1971); Reprints in Theory and Applications of Categories, No. 7 (2005) pp 1--195. (downloadable)
and might be called the "algebraist's convention". I have found it very useful in dealing with double and higher categories and groupoids. It involves writing functions on the right, as $(x)f$, as mentioned in other answers. This goes against the grain, of course, when dealing with functions such as $\sin$ and $\log$!
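To make the two reading orders concrete, here is a minimal Python sketch. The helper names `compose` and `then` and the sample functions `double` and `increment` are hypothetical, chosen only for illustration; they are not taken from Higgins's book or from any library.

```python
def compose(f, g):
    """Applicative order: (f ∘ g)(x) = f(g(x)), so g is applied first."""
    return lambda x: f(g(x))


def then(f, g):
    """Diagrammatic order: the composite 'fg' sends x to ((x)f)g, so f is applied first."""
    return lambda x: g(f(x))


def double(x):
    return 2 * x


def increment(x):
    return x + 1


applicative = compose(double, increment)   # x -> double(increment(x))
diagrammatic = then(double, increment)     # x -> increment(double(x))

print(applicative(3))   # 8
print(diagrammatic(3))  # 7
```

The two helpers differ only in the order of their arguments: `then(f, g)` agrees with `compose(g, f)`, which is exactly the sense in which $fg$ in the diagrammatic convention corresponds to $g \circ f$ in the applicative one.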
These ideas and notations, for example $x \mapsto x^2 +1$, evolved through attempts to clarify the notion of function, which eventually showed that both the domain and the codomain matter. This led to writing a function as $f: A \to B$, with domain $A$ and codomain $B$. This arrow notation is one of the impetuses behind the notion of category. A further complication is that ordinary real analysis and calculus are largely about partial functions $\mathbb R \to \mathbb R$, i.e. functions whose domain of definition is a subset of $\mathbb R$.
Solution 3:
I have never seen "$(f\circ g)(x)=g(f(x))$" in a math paper or book. Looking at a few of the results you quote from Google, the only academic papers I find there are either in computer science or engineering.
Maybe the confusion arises from the following fact: in Spanish, the language in which I took my undergraduate classes (and maybe in French too?), one reads "$f\circ g$" as "$g$ composed with $f$". I remember this being a source of confusion for many students.