surjective immersion $\mathbb{R}^2 \to \mathbb{R}^2$ which is not a diffeomorphism

Here is an explicit construction. Start with the map $f(z)=z^2+1$. It maps the complex plane onto itself but has a critical point at zero. Let's remove zero from the domain. The map is now a surjective immersion, but the domain is not simply connected. Thus, also remove a ray starting at the origin, say, the ray consisting of all nonpositive real numbers. The resulting map is still onto. Now, precompose this map with a diffeomorphism from the complex plane to the complex plane minus the ray. That's your example. Now, try to find a similar map onto the entire Riemann sphere.

Edit 1: The example I wrote above does not work, since $f^{-1}(1)=\{0\}$: once zero is removed from the domain, the value $1$ is no longer attained. Here is a correct example. Consider the function $f(z)=z^2(z-1)$. Since $f'(z)=z(3z-2)$, it has critical points at $z=0$ and $z=2/3$. Now, remove from the domain the real interval $[0, 2/3]$ as well as the ray $\{it: t\ge 0\}$ in the imaginary axis. Call the resulting domain $D$. One then verifies that $f(D)={\mathbb C}$ (it is not hard, just a bit tedious). The domain $D$ is simply connected; hence, take a diffeomorphism $g: {\mathbb C}\to D$ and consider the composition $h=f\circ g: {\mathbb C}\to {\mathbb C}$. This map is surjective and a local diffeomorphism, but it is not injective. Personally, I prefer a pictorial description to such explicit maps; such a description is given in my comments above.
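The "tedious" verification that $f(D)={\mathbb C}$ can be sanity-checked numerically. Here is a minimal sketch, not a proof. A value $w$ that is not the image of a removed point has all of its preimages in $D$, so only the values $w=f(p)$ with $p$ in the removed set need attention. For such $p$, dividing $z^3-z^2-f(p)$ by $z-p$ leaves the quadratic $z^2+(p-1)z+p(p-1)$, and it suffices that one of its two roots lies in $D$. The helper names (`in_D`, `other_preimages`, `value_is_hit_from_D`), the tolerance and the sample grids are my own choices for illustration.

```python
import cmath


def f(z):
    """The cubic from Edit 1: f(z) = z^2 (z - 1)."""
    return z * z * (z - 1)


def in_D(z, tol=1e-9):
    """True if z avoids the removed interval [0, 2/3] and the ray {it : t >= 0}.

    The tolerance is an assumption: points within tol of the removed set
    are treated as removed.
    """
    on_interval = abs(z.imag) <= tol and -tol <= z.real <= 2 / 3 + tol
    on_ray = abs(z.real) <= tol and z.imag >= -tol
    return not (on_interval or on_ray)


def other_preimages(p):
    """The two roots of f(z) = f(p) other than p itself.

    They solve z^2 + (p - 1) z + p (p - 1) = 0, whose discriminant
    is (1 - p)(1 + 3p).
    """
    p = complex(p)
    root = cmath.sqrt((1 - p) * (1 + 3 * p))
    return ((1 - p) + root) / 2, ((1 - p) - root) / 2


def value_is_hit_from_D(p):
    """Check that f(p), for p in the removed set, is still attained on D."""
    return any(in_D(z) for z in other_preimages(p))


# Sample the removed set: the interval [0, 2/3] and an initial piece of the ray.
interval_samples = [(2 / 3) * k / 1000 for k in range(1001)]
ray_samples = [1j * 0.01 * k for k in range(1, 2001)]
removed_samples = interval_samples + ray_samples

all_hit = all(value_is_hit_from_D(p) for p in removed_samples)
print(all_hit)
```

For $p=x\in(0,2/3]$, the root that lands in $D$ is the negative real one; for $p=0$ it is $z=1$; for $p=it$ with $t>0$, it is the root with positive real part and negative imaginary part.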

Another personal remark: Once, I almost convinced myself that a locally injective holomorphic map $h$ of the unit disk $D$ onto a simply-connected domain $G$ has to be injective. Here is the bogus argument I had: Consider the multivalued holomorphic map $f=h^{-1}: G\to D$. Since the domain of this map is simply-connected and the map is locally well-defined (here we use the assumption that $h$ has no critical points), the monodromy principle implies that $f$ has a well-defined branch $g$, which will then be inverse to $h$. Hence, $h: D\to G$ is invertible. The mistake in this argument was that the multivalued map $f$ is not obtained via analytic continuation along paths; therefore, the monodromy principle does not apply to it.

One more thing: It would be interesting to give an example of an entire function $f: {\mathbb C}\to {\mathbb C}$ which has no critical points, is surjective and not injective.

Edit 2: See this post at mathoverflow for the proof that the entire function $f: {\mathbb C}\to {\mathbb C}$, $$ f(z)= z\int_0^z e^{h(w)}dw,$$ where $$ h(w)= (e^w -1)/w, $$
has no critical points and is surjective.