How can I write an equation that matches any sequence?

One thing I have been wondering about lately is how to write an equation that describes a pattern of numbers. What I mean is:

x   0    1    2 
y   1    5    9

If I have this, I can tell that an equation that describes this would be $y=4x+1$. In fact, I don't even need the third pair of numbers. It's very easy when the equation is a straight line. But when the equation is a parabola, its not always that easy. For example:

x   0    1    2 
y   1    2    5  

I can tell this is $ y=x^2+1$, because I recognize the pattern. But I can't always tell just by looking at the numbers what the right equation should be. Is there some way to always know the right equation? I know that if you get the $x=0$ term you can get the $c$ in $y=ax^2+bx+c$, but that's not enough to let me solve it like I can when the equation is just a line.

For example, can someone show me how you would do it for

x    0     1     2
y    5     4     7

It's not a homework question, I promise!


Solution 1:

If you know your relationship is going to be a polynomial, then there are some pretty (conceptually) simple ways you can do this.

If you know what degree your polynomial is (line, parabola, cubic, etc.) then your job will be much easier. But if not, then you simply need to look at the number of points you have.

  • If you are given one point, the best you can do is of degree 0 ( y = k )
  • If you are given two points, the best you can do is of degree 1 ( y = A x + B )
  • If you are given three points, the best you can do is of degree 2 ( y = A x2 + B x + C )
  • If you are given four points, the best you can do is of degree 3 ( y = A x3 + B x2 + C x + D )
  • etc.

When I say "the best you can do", what I mean is -- if you have a parabola, but are only given two points, then you really can't identify the parabola. But you can say that it's a simple line.

Let's assume you have three points. The "best you can do" is assume that it is degree 2. If it is actually of degree one, your answer will magically turn into a line ( your x^2 coefficient will be 0 )

The basic idea of solving relationships/equations is:

If you have n unknowns, you need n equations/points.

Notice how, in the form of the Parabola ( y = A x2 + B x + C ), you have three unknowns? And also three equations! (points)

Let's pick three arbitrary points

x   1    2    4
y   6    7    3

You would set up three equations:

6 = 12 * A + 1 * B + C

7 = 22 * A + 2 * B + C

3 = 42 * A + 4 * B + C

Three equations, three unknowns. You should be able to solve this with a combination of most system-of-equation-solving rules.

In our case, we find:

A = -1
B = 4
C = 3

So our equation is y = -x2 + 4 x + 3

Note that, if your original three points formed a line, your $A$ would $= 0$


However, if your equation is NOT a polynomial, then you are left with little more than guess and check, plugging in various coefficients and trials (exponential? trigonometric?)

The beauty of the polynomial approach is that a polynomial of high enough degree will always fit any list of points. (provided that the points form a function)

Solution 2:

Given a list of terms of a sequence as you describe, one technique that may be of use (supplementary to Justin's answer) is finite differences. Calculate the differences between successive terms. If these first differences are constant, then a linear equation fits the terms you have. If not, compute the differences of the differences. If these second differences are constant, then a quadratic equation fits the terms you have. If not, you can continue to compute differences until you reach a constant difference (in the nth differences means an nth degree polynomial), differences that are a constant multiple of the previous differences (exponential of some sort), or you run out of terms. In any case, what you find is limited to matching the terms you know, as without some kind of general rule for the sequence, the first unknown term could be anything and completely alter the pattern (and with n known terms, a polynomial of degree n-1 will always perfectly fit).

Solution 3:

In the general case of trying to predict some infinite sequence of integers, there is no formula. This is because there is no reason to expect a pattern to continue, since all sequences are possible.

However, you can say a certain number is more likely given a set of functions. For instance if you considered all Turing machines, you could say that given n elements of a sequence look at all the Turing machines that predict the current sequence, and then find the most predicted next number. There still isn't a efficient way to compute what the most likely next number is.

Ray Solomonoff called this "Universal Probabilistic Induction"

This is explained in more depth here: http://en.wikipedia.org/wiki/Inductive_inference

Solution 4:

Of course there are infinite equation (even if you require them to be infinitely differentiable...) that satisfy the given constraints. As Isaac and Justin already wrote, you may always find a polynomial of degree at most n-1 (where n is the number of points given) which satisfies the given data; but you cannot be sure that this is the right answer. Moreover, if data is not exact but approximate the resulting polynomial may be quite different from the correct function, since it would likely hade huge peaks and falls. In such cases, an approximate method like least squares could be more useful.