What's wrong with solving the least-squares problem here? (Simple question)
A naive question that puzzles me a lot:
I have $n$ two-dimensional data points $(z_i,w_i)_{i=1}^{n}$ and I want to regress $(z_i)$ on $(w_i)$ by solving the standard least-squares problem: $$ \min_{m,\beta\in \mathbb{R}} \sum_{i=1}^{n} \left(z_i - m - \beta w_i\right)^2 $$ This is a convex optimization problem in $(m,\beta)$, so the optimal solution is given by solving $\frac{\partial f(m,\beta)}{\partial m} = 0$ and $\frac{\partial f(m,\beta)}{\partial \beta} = 0$, with $f(m,\beta)$ being the objective function of the least-squares problem. But this gives the optimal solution $$\beta^* = \frac{\sum_i w_i(z_i - \hat{z})}{\sum_i w_i(w_i - \hat{w})},$$ with $\hat{z}$ and $\hat{w}$ being the empirical means of $(z_i)$ and $(w_i)$, respectively. The correct answer should be $$\beta^* = \frac{\sum_i (w_i - \hat{w})(z_i - \hat{z})}{\sum_i (w_i - \hat{w})^2}.$$ I wonder what is wrong here?
Nothing is wrong; the two expressions are equal. Note that $\sum_i (x_i - \hat x) = 0$ for any sample, since $\hat x$ is the empirical mean. Therefore $\sum_i c(x_i - \hat x) = 0$ for any real number $c$. Thus, for the numerator, we have
\begin{align} \sum_i w_i(z_i - \hat z) &= \sum_i w_i(z_i - \hat z) - \sum_i \hat w(z_i - \hat z)\\ &= \sum_i (w_i - \hat w)(z_i - \hat z). \end{align}
The denominator can be found in the same way.
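Spelled out, the same add-and-subtract step applied to the denominator gives
\begin{align} \sum_i w_i(w_i - \hat w) &= \sum_i w_i(w_i - \hat w) - \sum_i \hat w(w_i - \hat w)\\ &= \sum_i (w_i - \hat w)^2. \end{align}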
Tricks involving adding zero, and recognizing when certain sums are equal to zero, come up repeatedly in statistics.
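As a quick sanity check, here is a minimal sketch (using NumPy; the variable names and simulated data are mine, not from the question) showing that both expressions evaluate to the same $\beta^*$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
w = rng.normal(size=n)                  # regressor
z = 2.0 + 3.0 * w + rng.normal(size=n)  # response with noise

w_bar, z_bar = w.mean(), z.mean()

# Form obtained directly from the normal equations
beta_naive = np.sum(w * (z - z_bar)) / np.sum(w * (w - w_bar))

# Textbook fully centered form
beta_textbook = np.sum((w - w_bar) * (z - z_bar)) / np.sum((w - w_bar) ** 2)

print(beta_naive, beta_textbook)        # the two values agree
assert np.isclose(beta_naive, beta_textbook)
```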