Fitting points to curve $g(t) = \frac{100}{1+\alpha e^{-\beta t}}$ by thinking about projections and inner products

This is a reinterpretation of my old question "Fit data to function $g(t) = \frac{100}{1+\alpha e^{-\beta t}}$ by using least squares method (projection/orthogonal families of polynomials)". I need to understand things in terms of orthogonal projections and inner products, but the answers there were phrased in terms of common regression techniques.

$$\begin{array}{c|ccccccc} t & 0 & 1 & 2 & 3 & 4 & 5 & 6\\ \hline F(t) & 10 & 15 & 23 & 33 & 45 & 58 & 69 \end{array}$$

Fit $F$ with a function of the type $$g(t) = \frac{100}{1+\alpha e^{-\beta t}}$$ using the discrete least squares method.

First of all, we cannot work with the function $g(t)$ as it is, since it is not linear in the parameters $\alpha$ and $\beta$. The way I'm trying to see the problem is via projections.

So let's try to transform the problem like this:

$$\frac{100}{g(t)}-1 = \alpha e^{-\beta t}\implies \ln \left(\frac{100}{g(t)}-1\right) = \ln \alpha -\beta t$$
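Just to have concrete numbers at hand, the transformed values $\ln\left(\frac{100}{F(t_i)}-1\right)$ are easy to compute from the table; here is a minimal Python sketch (standard library only):

```python
import math

# Data from the table: t_i and F(t_i)
t = [0, 1, 2, 3, 4, 5, 6]
F = [10, 15, 23, 33, 45, 58, 69]

# Transformed values ln(100/F_i - 1); if the model fits well, these should
# lie close to the straight line ln(alpha) - beta * t
h = [math.log(100 / Fi - 1) for Fi in F]
print(h)  # approx. [2.197, 1.735, 1.208, 0.708, 0.201, -0.323, -0.800]
```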

Since we want to fit the function to the points, we want to minimize the squared distance between the transformed data and the line $\ln\alpha - \beta t$, that is:

$$\min_{\alpha,\beta}\ \sum_i\left(\ln\left(\frac{100}{g(t_i)}-1\right)-\ln\alpha + \beta t_i\right)^2$$

Without taking derivatives and setting them equal to $0$, there's a way to see this as an orthogonal projection problem.

I know I need to end up with something like this:

$$\langle \ln\left(\frac{100}{g(t)}-1\right)-\ln\alpha + \beta t, 1\rangle = 0\\ \langle \ln\left(\frac{100}{g(t)}-1\right)-\ln\alpha + \beta t, t\rangle=0$$

And I know this comes from the fact that the minimum is attained at an orthogonal projection: at the minimum, the residual is orthogonal to $\operatorname{span}\{1, t\}$ (because of the terms $\ln\alpha$ and $\beta t$), so its inner product with $1$ and with $t$ is $0$.

In order to end up with

$$\begin{bmatrix} \langle 1,1\rangle & \langle t,1\rangle \\ \langle 1,t\rangle & \langle t,t\rangle \\ \end{bmatrix} \begin{bmatrix} \ln \alpha \\ -\beta \\ \end{bmatrix}= \begin{bmatrix} \langle \ln\left(\frac{100}{g(t)}-1\right) , 1\rangle \\ \langle \ln\left(\frac{100}{g(t)}-1\right) , t\rangle \\ \end{bmatrix}$$

where the inner product is

$$\langle f,g\rangle = \sum_i f_i\, g_i.$$

But why this inner product in particular?

Can someone tell me what reasoning leads to the inner products above, whether I did everything right, and how to finish the exercise?


$\color{brown}{\textbf{Via linear model}}$

Let $$h(t) = \ln\left(\dfrac{100}{g(t)}-1\right),\tag1$$ then the data table is

$$\begin{array}{c|ccccccc}
i & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
\hline
t_i & 0 & 1 & 2 & 3 & 4 & 5 & 6\\
g_i & 10 & 15 & 23 & 33 & 45 & 58 & 69\\
h_i & 2.197225 & 1.734631 & 1.208311 & 0.708185 & 0.200671 & -0.322773 & -0.800119\\
h(t_i) & 2.215988 & 1.711902 & 1.207816 & 0.703730 & 0.199644 & -0.304442 & -0.808528\\
g(t_i) & 9.83239 & 15.29172 & 23.00877 & 33.09858 & 45.02541 & 57.55280 & 69.17958\\
r(t_i) & 0.16761 & -0.29172 & -0.00877 & -0.09858 & -0.02541 & 0.44720 & -0.17958\\
g_1(t_i) & 9.83245 & 15.29853 & 23.02728 & 33.13320 & 45.07696 & 57.61634 & 69.24600
\end{array}\tag2$$

The task is to estimate the parameters of the function $h(t)$ in the form $$h(t) = \ln\alpha + \beta_* t,\quad \beta_* = -\beta.\tag 3$$

The least squares method minimizes the discrepancy function $$d_h(\alpha,\beta_*) = \sum\limits_{i=1}^7 (\ln\alpha + \beta_* t_i - h_i)^2\tag 4$$ as a function of the parameters $\alpha$ and $\beta_*.$

The minimum of this quadratic function is attained at its single stationary point, which is defined by the system $(d_h)'_{\ln\alpha} = (d_h)'_{\beta_*}= 0,$ or \begin{cases} 2\sum\limits_{i=1}^7 (\ln\alpha + \beta_* t_i - h_i) = 0\\ 2\sum\limits_{i=1}^7 (\ln\alpha + \beta_* t_i - h_i)\,t_i = 0.\tag5 \end{cases}

The system $(5)$ can be written in the form \begin{cases} 7\ln\alpha + a_1 \beta_* = b_0\\ a_1\ln\alpha + a_2 \beta_* = b_1, \end{cases} where $$a_1 = \sum\limits_{i=1}^7 t_i = 21,\quad a_2 = \sum\limits_{i=1}^7 t_i^2 = 91,$$ $$b_0 = \sum\limits_{i=1}^7 h_i = 4.926100,\quad b_1 = \sum\limits_{i=1}^7 t_i h_i = 0.663879.$$ The determinants are $$\Delta = \begin{vmatrix}7 & 21 \\ 21 & 91\end{vmatrix} = 196,$$ $$\Delta_1 = \begin{vmatrix}4.926100 & 21 \\ 0.663879 & 91\end{vmatrix} \approx 434.33364,$$ $$\Delta_2 = \begin{vmatrix} 7 & 4.926100 \\ 21 & 0.663879 \end{vmatrix} \approx -98.80095.$$

Then $$\alpha = e^{\large \frac{\Delta_1}\Delta} \approx 9.170465,\quad \beta = -\dfrac{\Delta_2}\Delta \approx 0.504086,$$ $$d_h(\alpha, \beta) \approx 0.001295,\quad d_g(\alpha, \beta)\approx 0.355863.$$
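The same computation can be reproduced in a few lines (a sketch in plain Python, mirroring the normal equations and Cramer's rule above):

```python
import math

t = [0, 1, 2, 3, 4, 5, 6]
g = [10, 15, 23, 33, 45, 58, 69]
h = [math.log(100 / gi - 1) for gi in g]   # h_i from (1)

# Coefficients of the normal equations for h = ln(alpha) + beta_* t
n  = len(t)                                # 7
a1 = sum(t)                                # 21
a2 = sum(ti ** 2 for ti in t)              # 91
b0 = sum(h)                                # ~ 4.9261
b1 = sum(ti * hi for ti, hi in zip(t, h))  # ~ 0.6639

# Cramer's rule for the 2x2 system
D  = n * a2 - a1 * a1                      # 196
D1 = b0 * a2 - b1 * a1                     # ~ 434.33
D2 = n * b1 - b0 * a1                      # ~ -98.80

alpha = math.exp(D1 / D)                   # ~ 9.170
beta  = -D2 / D                            # ~ 0.504
print(alpha, beta)
```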

The results of the calculations shown in table $(2)$ confirm the obtained parameter values.

$\color{brown}{\textbf{Orthogonal projections approach}}$

The method of orthogonal projections is used to solve problems of large dimension. The essence of the method, for the given data, is that the parameters of the linear model are calculated one at a time.

After each stage, the dependences already fitted are subtracted from the data.

In the given case, the data after the first stage show no essential correlations. A linear approximation of the residuals $r_i = g_i - g(t_i)$ in the form $$r_i \approx -0.043425+0.014987\,t_i$$ gives $d_r = 0.349557$.
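This check is easy to reproduce numerically (a short sketch, assuming NumPy and reusing the $\alpha$ and $\beta$ obtained from the linear model above):

```python
import numpy as np

t = np.arange(7.0)
g = np.array([10, 15, 23, 33, 45, 58, 69], dtype=float)

# Logistic curve with the parameters found via the linear model
alpha, beta = 9.170465, 0.504086
g_fit = 100.0 / (1.0 + alpha * np.exp(-beta * t))

# Residuals and their least-squares straight line (almost flat: no essential
# linear trend is left after the first stage)
r = g - g_fit
slope, intercept = np.polyfit(t, r, 1)
d_r = np.sum((r - (intercept + slope * t)) ** 2)
print(intercept, slope, d_r)   # approx. -0.043, 0.015, 0.350
```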

$\color{brown}{\textbf{Via the gradient descent}}$

The solution obtained via the linear model is not optimal for the discrepancy in the form $$d_g(\alpha,\beta)=\sum\limits_{i=1}^7\left(\dfrac{100}{1+\alpha e^{-\beta t_i}} - g_i\right)^2.$$

To verify the orthogonal projections approach, the gradient descent method can be used.

Indeed, the gradient is $$\binom uv = \left(\begin{matrix} \dfrac {\partial d_g}{\partial \alpha}\\[4pt] \dfrac{\partial d_g}{\partial \beta}\end{matrix}\right) = 200\left(\begin{matrix} -\sum\limits_{i=1}^7 \dfrac{e^{-\beta t_i}}{\left(1+\alpha e^{-\beta t_i}\right)^2} \left(\dfrac{100}{1+\alpha e^{-\beta t_i}} - g_i\right)\\[4pt] \alpha\sum\limits_{i=1}^7 \dfrac{t_ie^{-\beta t_i}}{\left(1+\alpha e^{-\beta t_i}\right)^2} \left(\dfrac{100}{1+\alpha e^{-\beta t_i}} - g_i\right) \end{matrix}\right),$$ $$\binom uv =\frac1{50}\left(\begin{matrix} \sum\limits_{i=1}^7 e^{-\beta t_i}g^2(t_i)r_i \\[4pt] -\alpha\sum\limits_{i=1}^7 t_i e^{-\beta t_i}g^2(t_i)r_i \end{matrix}\right) =\binom{0.26390}{-2.32907}\not=\binom00.$$

Optimizing along the gradient direction yields the step $\Delta d_r = -0.000223$, which gives $$\binom{\alpha_1}{\beta_1} = \binom{\alpha}{\beta} +\binom{\Delta\alpha}{\Delta\beta} = \binom\alpha\beta + \Delta d_r\binom uv\approx\binom{9.170406} {0.504605}.$$ Then $$d_g(\alpha_1,\beta_1) \approx 0.349343,\quad \operatorname{grad} d_g(\alpha_1,\beta_1) = \dbinom{-0.036480}{-0.081239}.$$
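A sketch of such a verification (assuming NumPy; the initial step size, the backtracking rule and the iteration count are illustrative choices, not the ones used above):

```python
import numpy as np

t = np.arange(7.0)
g = np.array([10, 15, 23, 33, 45, 58, 69], dtype=float)

def d_g(p):
    """Discrepancy d_g(alpha, beta): sum of squared residuals of the logistic model."""
    alpha, beta = p
    return np.sum((100.0 / (1.0 + alpha * np.exp(-beta * t)) - g) ** 2)

def grad_d_g(p):
    """Gradient of d_g; note the factor alpha in the beta-component (chain rule)."""
    alpha, beta = p
    e = np.exp(-beta * t)
    res = 100.0 / (1.0 + alpha * e) - g
    da = np.sum(-200.0 * e * res / (1.0 + alpha * e) ** 2)
    db = np.sum(200.0 * alpha * t * e * res / (1.0 + alpha * e) ** 2)
    return np.array([da, db])

# Start from the parameters of the linearized fit and descend
p = np.array([9.170465, 0.504086])
for _ in range(50):
    gr, step = grad_d_g(p), 1e-3
    while step > 1e-15 and d_g(p - step * gr) >= d_g(p):
        step /= 2                      # backtrack until d_g decreases
    if step <= 1e-15:
        break                          # no further improvement found
    p -= step * gr

print(p, d_g(p))   # d_g ends up below its value ~0.3559 at the linearized fit
```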

The data in the table $(2)$ confirm the same estimation accuracy.


Linear regression is linear algebra in disguise.

You are searching for a function $$l(t)= c_1 +c_2t$$ (where in your case $c_1= \ln \alpha$ and $c_2=-\beta$), that is a linear combination of functions $v_1(t)=1$ and $v_2(t)=t$. Your goal is to minimize $$e(l,h)=\sum (l(t_i)-h(t_i))^2$$ (where in your case $h(t)=\ln \left(\frac{100}{g(t)}-1 \right)$).

The "sum of squares" formula is suggestive of Pythagoras theorem/norm on some vector space. We want to view $e(l,h)$ as a square of distance on, say, the vector $F$ space of functions $f: \mathbb{R}\to\mathbb{R}$, coming from the dot product

$$\langle f,g\rangle=\sum_i f(t_i)\, g(t_i)$$

(Recall that the squared distance between two vectors in a vector space with a dot product is $d(u,v)^2=\langle u-v, u-v\rangle$, so we recover $e=d^2$ from the dot product above.)

A slight problem is that on this vector space of functions $F$ the "distance" $d(l,h)=\sqrt{e(l,h)}$ is not really a distance, since it vanishes as soon as $l(t_i)=h(t_i)$ for all $i$ (in math-speak we get only a pseudometric, not a metric). We can either ignore this, or use the standard solution, which is to work in the quotient space $V=F/F_0$ of functions modulo the subspace $F_0=\{f: \mathbb{R}\to\mathbb{R} \mid f(t_i)=0 \text{ for all } i\}$ -- the functions that are "distance zero from the origin". This has the advantage that $V$ is now a finite-dimensional vector space (of dimension equal to the number of data points), so we can be more confident using standard linear algebra. Note that $V$ inherits the dot product $\langle f,g\rangle=\sum_i f(t_i) g(t_i)$.

In any case, we are now looking for a function $l(t)= c_1 +c_2t$ that is closest to $h(t)$ in the sense of the Euclidean distance $d$, that is, a point in the subspace spanned by $1$ and $t$ (in $F$, or more precisely by their equivalence classes in $V$). We can forget all the complicated setup and just think: given a point $h$ and a plane spanned by two vectors, how do we find the point $l$ in the plane closest to $h$? Of course we must project $h$ onto the plane! That is, $l$ must be such that $h-l$ is orthogonal to the plane, meaning orthogonal to both spanning vectors. Thus, we are looking for $l=c_1+c_2t$ such that $\langle h-l, 1\rangle=0$ and $\langle h-l, t\rangle=0$ (where the dot product is still $\langle f,g\rangle=\sum_i f(t_i) g(t_i)$). These are the equations in your question.

Now you just need to solve them. To do so, plug in $l=c_1+c_2 t$ and rewrite the equations as

$$\langle h,1\rangle=c_1\langle 1,1\rangle+c_2\langle 1,t\rangle$$

$$\langle h,t\rangle=c_1\langle 1,t\rangle+c_2\langle t,t\rangle$$

This is a linear system with 2 equations and 2 unknowns, which you can write as the matrix equation -- the one you have in the question.

To finish the exercise, just compute all the dot products (for example, in your case $\langle 1,1\rangle=\sum_i 1 \cdot 1=7$, $\langle 1,t\rangle=\sum_i 1 \cdot t_i=0+1+\ldots+6=21$, $\langle t,t\rangle=91$, $\langle h, 1\rangle=\sum_{i=0}^6 h(i)$, $\langle h, t\rangle=\sum_{i=0}^6 h(i) \cdot i$) and solve the 2-by-2 linear system by whatever method you like (Gaussian elimination, multiplying by $\begin{bmatrix}7&21\\21&91\end{bmatrix}^{-1}=\frac{1}{196}\begin{bmatrix}91&-21\\-21&7\end{bmatrix}$, or even Cramer's rule that Yuri used in the other answer). You will get $c_1= \ln \alpha$ and $c_2=-\beta$, and hence can solve for $\alpha$ and $\beta$ as well.
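If you want to check the numbers, here is a compact version of this recipe (a sketch, assuming NumPy; it assembles the dot products into the $2\times 2$ system and solves it directly):

```python
import numpy as np

# Data and transformed values h(t_i) = ln(100/F(t_i) - 1)
t = np.arange(7.0)
F = np.array([10, 15, 23, 33, 45, 58, 69], dtype=float)
h = np.log(100.0 / F - 1.0)

# Gram matrix of the spanning functions 1 and t, and right-hand side <h,1>, <h,t>
ones = np.ones_like(t)
G = np.array([[ones @ ones, ones @ t],
              [t @ ones,    t @ t]])        # [[7, 21], [21, 91]]
b = np.array([h @ ones, h @ t])

c1, c2 = np.linalg.solve(G, b)              # coefficients of l(t) = c1 + c2*t
alpha, beta = np.exp(c1), -c2
print(alpha, beta)                          # approx. 9.17 and 0.504
```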