Lipschitz Smoothness, Strong Convexity and the Hessian
Solution 1:
I assume that your question concerns convex functions only; without convexity much of it would be false.
Question 2: strictly speaking, being Lipschitz smooth ($C^{1,1}$) does not imply $\nabla^2 f$ exists. But the statement is true if we interpret $\nabla^2 f\preceq LI$ as holding almost everywhere. Indeed, $\nabla^2 f$ is a positive semidefinite matrix, so having $\nabla^2 f\preceq LI$ a.e. is equivalent to $\nabla^2 f\in L^\infty$. And it is well known that having $L^\infty$ derivative is equivalent to being Lipschitz; thus $$\nabla^2 f\in L^\infty \iff \nabla f\in C^{0,1} \iff f\in C^{1,1}$$
Question 3: You misremembered. The correct inequality characterizing $\alpha$-strong convexity is $$f(x+y) \ge f(x) + y^\top\nabla f(x) + \frac{\alpha}{2} \| x - y \|^2 \tag{1}$$ Indeed, (1) is equivalent to saying that the function $g(x)=f(x)-\frac{\alpha}{2} \| x \|^2$ is convex. The latter is equivalent to $\nabla^2 g\succeq 0$, which is $\nabla^2 f\succeq \alpha\, I$.
Question 4. Yes, there is a direct and important relation: a function is strongly convex if and only if its convex conjugate (a.k.a. Legendre-Fenchel transform) is Lipschitz smooth. Indeed, the gradients maps are inverses of each other, which implies that the Hessian of convex conjugate of $f$ is the inverse of the Hessian of $f$ (at an appropriate point). So, a uniform upper bound on $\nabla^2 f$ is equivalent to a uniform lower bound on $\nabla^2 (f^{*}) $, and vice versa. One can also argue without referring to the Hessian (which may fail to exist at some points): the Lipschitz smoothness of $f$, by your item 1, gives us at every $x_0$ a quadratic function $q$ so that $q(x_0)=f(x_0)$ and $f \le q$ everywhere. Taking convex conjugate reverses the order: $q^*\le f^*$; and this means that $f^*$ is strongly convex.
Question 1. The converse is true, but the only proof I see goes through the convex conjugate as described in Q4. Since strong convexity is characterized by the comparison property (1), taking the conjugate gives a matching characterization of Lipschitz smoothness.
Reference: Chapter 5 of Convex functions by Jonathan M. Borwein and Jon D. Vanderwerff.
Solution 2:
Regarding the references, Nesterov's "Introductory Lectures on Convex Optimization" (ISBN: 978-1-4613-4691-3) answers at least three out of the four questions asked here. Specific equations/theorems:
Question 1: See equation (1.2.12) and substitute "y" for "x+y"
Question 2: Theorem 2.1.6. with adjoining proof.
Question 3: As mentioned in the previous answer, the equation was slightly wrong in the original question. But if corrected, this is proven in Theorem 2.1.11.