Given a data set, how do you do a sinusoidal regression on paper? What are the equations, algorithms?

The Gauss-Newton algorithm deals directly with this type of problem. Given $m$ data points $(x_i,y_i)$ and a regression function $f(\vec\beta,x)$ with $n$ parameters $\vec\beta=(\beta_1,\dots,\beta_n)$, we solve $$\min_{\vec\beta}\ S(\vec\beta)\quad\text{where}\quad S(\vec\beta)=\sum_{i=1}^m r_i(\vec\beta)^2=\sum_{i=1}^m \big(y_i-f(\vec\beta,x_i)\big)^2.$$ I skip the derivation of the algorithm, which you can find in any textbook (first use a Taylor approximation, then apply Newton's method): $$\Delta\vec\beta=\big(J^T\ J\big)^{-1}\ J^T\ \vec r,\qquad \vec\beta=\vec\beta+\alpha\,\Delta\vec\beta,$$ where $\alpha$ is a damping coefficient and $$J=\begin{pmatrix}\left(\frac{\partial f}{\partial \beta_1}\right)_{x=x_1}&\dots&\left(\frac{\partial f}{\partial \beta_n}\right)_{x=x_1}\\ \vdots&\ddots&\vdots\\ \left(\frac{\partial f}{\partial \beta_1}\right)_{x=x_m}&\dots&\left(\frac{\partial f}{\partial \beta_n}\right)_{x=x_m}\end{pmatrix}\qquad \vec r=\begin{pmatrix}y_1-f(\vec\beta,x_1)\\ \vdots \\ y_m-f(\vec\beta,x_m)\end{pmatrix}$$ For your specific case, $f(\vec\beta,x)=A\sin(Bx+C)+D$, so $$\frac{\partial f}{\partial A}=\sin(Bx_i+C)\qquad \frac{\partial f}{\partial B}=Ax_i\cos(Bx_i+C)\qquad \frac{\partial f}{\partial C}=A\cos(Bx_i+C)\qquad \frac{\partial f}{\partial D}=1$$

In Matlab I generated 60 uniformly distributed random sample points and used them to evaluate a known sine curve, $$y=0.5\sin(1.2x+0.3)+0.6,$$ then added $N(0,0.2)$ error terms to each point. My initial guess was $A=0.1,\ B=0.5,\ C=0.9,\ D=0.1$, and I set the damping coefficient to $0.01$. After 407 iterations the algorithm arrives at the approximation $$\hat y=0.497\sin(1.178x+0.352)+0.580$$ Below you can see the graph (black: original curve $y(x)$; red: curve with error terms $y(x)+\epsilon(x)$; blue: approximation curve $\hat y(x)$).

[Figure: Results — original, noisy, and fitted curves]


Let's look at how regression works.

Typically, you pick a target function for your regression curve (e.g. a line, a parabola, a logarithmic curve, etc.), and you develop some notion of error.

You have unknown parameters in your curve (e.g. polynomial coefficients), and you compute the values of these parameters such that your error functional is minimized.

For simple linear regression, this is usually just a least-squares problem. For polynomials, you can use a Vandermonde matrix and solve the equivalent linear system with no problem. But polynomials are easy: they are just linear combinations of successive powers of your independent variable measurements. So what do we do when we want to fit, say, $\sin (kt + b)$ to our data?
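Before getting to that question, here is what the polynomial case just described looks like in practice. This is a minimal sketch with made-up data and variable names of my own choosing; the Vandermonde design matrix turns the fit into a single linear solve:

    % Cubic polynomial fit by least squares on a Vandermonde-style design matrix.
    t = linspace(0, 5, 40)';                       % measurement points (made up)
    y = 1 + 2*t - 0.3*t.^2 + 0.1*randn(size(t));   % noisy data
    V = [t.^3, t.^2, t, ones(size(t))];            % Vandermonde design matrix
    c = V \ y;                                     % least-squares coefficients
    yfit = V * c;                                  % fitted values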

Well, it depends. Let's say you have a reasonable belief that your data fits a sinusoidal curve nicely. You can compute $\sin^{-1} y_i$ for all your measurements (with appropriate re-scaling of $y_i$ to the domain of $\sin^{-1}$), and then perform a linear regression on $kt_i + b = \sin^{-1} y_i$.

Of course, inverse trig functions like to behave badly, so this might not work.
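For illustration, here is a minimal sketch of that linearization, assuming the data is noise-free, already scaled into $[-1,1]$, and stays on a single monotone branch of the sine (all strong assumptions; the numbers are made up):

    % Arcsine linearization: z = asin(y) = k*t + b, then ordinary linear regression.
    t = (0:0.05:1)';                     % sample times (made up)
    y = sin(1.0*t + 0.2);                % data; argument stays within (-pi/2, pi/2)
    z = asin(y);                         % linearized observations
    p = [t, ones(size(t))] \ z;          % least-squares solution for [k; b]
    k = p(1); b = p(2);                  % recovers k = 1.0, b = 0.2 here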

Instead, you could think back to calculus and recall that $\sin$ is a continuous, and indeed differentiable, function. A differentiable function is well approximated by a linear function near any given point. So you could compute the derivative of your function and assume that your data is locally approximated by many linear functions.

Another way is to note that $\sin$ has a Taylor series expansion, and that sufficient terms should give you a polynomial that is pretty close to $\sin$ in some domain. So you could perform an $n$-term Taylor series expansion and do a regular linear polynomial regression on the result.
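As a rough sketch of that idea, assuming the data lies on an interval where a low-order polynomial can track the sine (made-up data, degree chosen arbitrarily), the fit itself is an ordinary linear regression:

    % Polynomial (Taylor-style) approximation of sinusoidal data by least squares.
    t = linspace(-1.5, 1.5, 50)';
    y = sin(1.2*t + 0.3) + 0.05*randn(size(t));    % noisy sinusoidal data
    P = polyfit(t, y, 5);                          % degree-5 polynomial coefficients
    yfit = polyval(P, t);                          % polynomial approximation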

If you think your function is a series of sines, you could write a Fourier series expansion, and perform a least squares fit on the Fourier series coefficients.
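A minimal sketch of that, assuming the fundamental frequency $\omega$ is known or guessed (hard-coded below); with the frequency fixed, the coefficients come from ordinary least squares:

    % Least-squares fit of a two-harmonic Fourier series with an assumed frequency w.
    t = linspace(0, 10, 200)';
    y = 0.5*sin(1.2*t + 0.3) + 0.6 + 0.1*randn(size(t));
    w = 1.2;                                                % assumed fundamental frequency
    X = [ones(size(t)), cos(w*t), sin(w*t), cos(2*w*t), sin(2*w*t)];
    c = X \ y;                                              % Fourier coefficients
    yfit = X * c;                                           % reconstructed series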

If that fails, you could configure a neural network to give you the result as a series of sines. (Actually, this might not work without pre-multiplying each sin term by, say, a Gaussian, or a top-hat function).

You could use an evolutionary algorithm to perform a stochastic search in a parameter space, and use the cost function as your survival criterion.
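For example, a crude random-search version of that idea (purely illustrative; a real evolutionary algorithm would add a population, crossover, mutation schedules, and so on) might look like:

    % Random search over (A, B, C, D) with the summed squared error as the
    % survival criterion.
    t = linspace(0, 6, 60)';
    y = 0.5*sin(1.2*t + 0.3) + 0.6 + 0.1*randn(size(t));
    cost = @(p) sum((y - (p(1)*sin(p(2)*t + p(3)) + p(4))).^2);
    best = [1; 1; 0; 0];                   % initial candidate [A; B; C; D]
    for k = 1:20000
        trial = best + 0.05*randn(4,1);    % perturb the current best candidate
        if cost(trial) < cost(best)
            best = trial;                  % keep the fitter candidate
        end
    end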

Finally, you could employ any number of non-linear least squares algorithms to estimate the parameters of your fit. Levenberg-Marquardt is a commonly-used algorithm for these things. Effectively, this is the same as doing a local linearization of the regression function. It's a gradient-based search.
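For instance, if the Optimization Toolbox is available, Matlab's lsqcurvefit can run Levenberg-Marquardt on this model directly; a minimal sketch with made-up data and an arbitrary starting guess:

    % Nonlinear least squares via lsqcurvefit (requires the Optimization Toolbox).
    t = linspace(0, 6, 60)';
    y = 0.5*sin(1.2*t + 0.3) + 0.6 + 0.1*randn(size(t));
    model = @(p, t) p(1)*sin(p(2)*t + p(3)) + p(4);     % f(beta, x)
    p0 = [1; 1; 0; 0];                                  % initial guess [A; B; C; D]
    opts = optimoptions('lsqcurvefit', 'Algorithm', 'levenberg-marquardt');
    p = lsqcurvefit(model, p0, t, y, [], [], opts);     % fitted parameters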

Essentially, all regression problems are search problems: one searches for parameters that shape the target function to fit the data as well as possible. Consequently, any search algorithm will work, but not all will work well. Unfortunately, there is no nice, elementary closed-form answer for computing these parameters, as there is with simple linear regression. So unless you can transform your data in some clever way, you're likely not going to be able to do it with pencil and paper.


Um.

If you have a bunch of points $(x_i,y_i)$ and you want to fit the best sinusoidal $A \sin (Bx +C),$ what you do is define a squared error $$E = \sum_{i=1}^n \left( y_i - A \sin (B x_i + C) \right)^2.$$ You then simultaneously solve three equations for the triple $(A,B,C),$ $$ \frac{\partial E}{\partial A} = 0, $$ $$ \frac{\partial E}{\partial B} = 0, $$ $$ \frac{\partial E}{\partial C} = 0. $$
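For reference, writing those conditions out (a routine application of the chain rule) gives $$ \frac{\partial E}{\partial A} = -2\sum_{i=1}^n \left( y_i - A \sin (B x_i + C) \right)\sin(Bx_i+C) = 0, $$ $$ \frac{\partial E}{\partial B} = -2\sum_{i=1}^n \left( y_i - A \sin (B x_i + C) \right) A x_i\cos(Bx_i+C) = 0, $$ $$ \frac{\partial E}{\partial C} = -2\sum_{i=1}^n \left( y_i - A \sin (B x_i + C) \right) A \cos(Bx_i+C) = 0, $$ which are nonlinear in $B$ and $C$.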

I will need to think about whether there is a closed-form solution for the three. As noted, in the linear case there is a well-known closed-form solution. If not, well, I will leave this here; it illustrates one approach.


As far as I know, there are exact methods for this problem. First you have to transform $y$: $$y=A\sin(Bx+C)+D=A\sin(Bx)\cos(C)+A\cos(Bx)\sin(C)+D$$ $$y=\alpha \sin(Bx)+\beta \cos(Bx)+D\quad \text{where}\ \alpha=A\cos(C),\ \beta=A\sin(C).$$ The first method uses Prony's reformulation for the frequency. In Prony's method the original function is replaced by a complex function: it introduces a complex polynomial, finds its roots in the complex domain, and then translates them back to the real domain. After choosing the frequency, the method determines the other parameters by a standard least-squares fit. It has some drawbacks. First of all, due to Prony's reformulation the samples must be equally spaced. In addition, during the determination of the frequency you end up with an overdetermined system and have to employ some transformations which are quite complex (at least complex for me).
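To make the transformed form concrete: once a frequency $B$ has been chosen (by Prony's method, or even a brute-force scan), $\alpha$, $\beta$ and $D$ follow from ordinary least squares, and $A$, $C$ are recovered from them. A minimal sketch of that linear step only (not Prony's method itself; the frequency is assumed known here):

    % With a chosen frequency Bfreq, the remaining parameters are linear and
    % come from ordinary least squares.
    x = linspace(0, 6, 60)';
    y = 0.5*sin(1.2*x + 0.3) + 0.6 + 0.1*randn(size(x));
    Bfreq = 1.2;                                   % assumed/estimated frequency
    M = [sin(Bfreq*x), cos(Bfreq*x), ones(size(x))];
    c = M \ y;                                     % c = [alpha; beta; D]
    A = hypot(c(1), c(2));                         % A = sqrt(alpha^2 + beta^2)
    C = atan2(c(2), c(1));                         % since alpha = A*cos(C), beta = A*sin(C)
    D = c(3);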

The second method reduces the problem to the analogous problem for algebraic polynomials and solves it by Forsythe's method. Using this method you can work with arbitrarily spaced samples.

Please note that the information above comes from my past research, which was about the advantages of numerical methods over exact ones, so I am not an expert on these methods. If you want to continue your search, good luck; the literature on this topic is very limited.

On the other hand, if you want to focus on numerical methods, I can provide algorithms (and methods) for your specific problem, and, if you supply data points, a simple example (in Matlab).


Here is a Matlab version

    % Gauss-Newton fit of y = B(1)*sin(B(2)*x + B(3)) + B(4)
    damp = .01;                              % damping coefficient

    x  = 0:.1:6;                             % sample points
    y  = 0.5*sin(1.2*x + 0.3) + 0.6;         % true curve
    yn = y' + (rand(length(y),1) - .5)/10;   % noisy measurements (column vector)

    B = [0.4; 1; 0.4; 0.5];                  % initial guess [A; B; C; D]
    rows = length(yn);
    J = zeros(rows, 4);                      % Jacobian
    r = zeros(rows, 1);                      % residuals (f - y)

    for iteration = 1:100000

        % build J and r
        for row = 1:rows
            xk = x(row);
            J(row,1) = sin(B(2)*xk + B(3));
            J(row,2) = B(1) * xk * cos(B(2)*xk + B(3));
            J(row,3) = B(1) * cos(B(2)*xk + B(3));
            J(row,4) = 1.0;
            r(row)   = B(1)*sin(B(2)*xk + B(3)) + B(4) - yn(row);
        end

        % Gauss-Newton step: solve the normal equations
        % (backslash is more stable than inv)
        delta = (J'*J) \ (J'*r);
        B = B - damp*delta;                  % damped update

    end