What's wrong with solving the least-squares problem here? (Simple question)

A naive question that puzzles me a lot:

I have $n$ two-dimensional data points $(z_i,w_i)_{i=1}^{n}$ and I want to regress $(z_i)$ on $(w_i)$ by solving the standard least-squares problem:
$$ \min_{m,\beta\in \mathbb{R}} \sum_{i=1}^{n} \left(z_i - m - \beta w_i\right)^2. $$

This is a convex optimization problem in $(m,\beta)$, so the optimal solution is obtained by solving $\frac{\partial f(m,\beta)}{\partial m} = 0$ and $\frac{\partial f(m,\beta)}{\partial \beta} = 0$, where $f(m,\beta)$ is the objective function of the least-squares problem. But this gives the optimal solution
$$\beta^* = \frac{\sum_i w_i(z_i - \hat{z})}{\sum_i w_i(w_i - \hat{w})},$$
where $\hat{z}$ and $\hat{w}$ are the empirical means of $(z_i)$ and $(w_i)$, respectively. The correct answer should be
$$\beta^* = \frac{\sum_i (w_i - \hat{w})(z_i - \hat{z})}{\sum_i (w_i - \hat{w})^2}.$$
I wonder what is wrong here?
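For reference, here is how I arrived at my expression. The first-order conditions are
\begin{align} \frac{\partial f}{\partial m} &= -2\sum_{i=1}^{n}\left(z_i - m - \beta w_i\right) = 0 \quad\Longrightarrow\quad m^* = \hat{z} - \beta\hat{w},\\ \frac{\partial f}{\partial \beta} &= -2\sum_{i=1}^{n} w_i\left(z_i - m - \beta w_i\right) = 0, \end{align}
and substituting $m^*$ into the second equation gives the $\beta^*$ above.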


Note that, for any sample, $\sum_i (x_i - \hat x) = 0$, where $\hat x$ denotes the mean of the $x_i$. Therefore $\sum_i c(x_i - \hat x) = 0$ for any real number $c$; in particular, $\sum_i \hat w(z_i - \hat z) = 0$ and $\sum_i \hat w(w_i - \hat w) = 0$. Thus, for the numerator, we have

\begin{align} \sum_i w_i(z_i - \hat z) &= \sum_i w_i(z_i - \hat z) - \sum_i \hat w(z_i - \hat z)\\ &= \sum_i (w_i - \hat w)(z_i - \hat z). \end{align}

The denominator can be found in the same way.
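Explicitly, subtracting the zero term $\sum_i \hat w(w_i - \hat w) = 0$ gives

\begin{align} \sum_i w_i(w_i - \hat w) &= \sum_i w_i(w_i - \hat w) - \sum_i \hat w(w_i - \hat w)\\ &= \sum_i (w_i - \hat w)(w_i - \hat w)\\ &= \sum_i (w_i - \hat w)^2, \end{align}

so the two expressions for $\beta^*$ are the same number; your derivation was correct all along.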

Tricks involving adding zero, and recognizing when certain sums are equal to zero, come up repeatedly in statistics.
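If you want a quick numerical sanity check, here is a minimal sketch in Python/NumPy (with made-up data; the variable names and the simulated model are arbitrary) showing that both expressions for $\beta^*$ evaluate to the same value, and that both agree with a standard least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n points (z_i, w_i), with z generated from a linear model in w
n = 200
w = rng.normal(size=n)
z = 1.5 + 2.0 * w + rng.normal(scale=0.3, size=n)

w_bar, z_bar = w.mean(), z.mean()

# Slope as derived in the question: sum_i w_i (z_i - z_bar) / sum_i w_i (w_i - w_bar)
beta_question = np.sum(w * (z - z_bar)) / np.sum(w * (w - w_bar))

# Textbook slope: sum_i (w_i - w_bar)(z_i - z_bar) / sum_i (w_i - w_bar)^2
beta_textbook = np.sum((w - w_bar) * (z - z_bar)) / np.sum((w - w_bar) ** 2)

# Reference: least-squares fit with an intercept, via np.linalg.lstsq
X = np.column_stack([np.ones(n), w])
(m_hat, beta_lstsq), *_ = np.linalg.lstsq(X, z, rcond=None)

print(beta_question, beta_textbook, beta_lstsq)  # all three agree up to rounding
```

All three slopes agree to floating-point precision, which is another way of seeing that nothing was wrong with the derivation in the first place.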