Going back from a correlation matrix to the original matrix

Solution 1:

You can do a Cholesky decomposition: if your correlation matrix is $R$, then $L = \operatorname{cholesky}(R)$ is the matrix satisfying

$ L \, L' = R $

Here $L$ is triangular of size $N \times N$. You can also understand $L$ as a simple rotation of your original readout matrix, in which its $M$ columns are rotated so that everything fits into $N$ ($\le M$) columns of the vector space.

So you may apply an arbitrary rotation to the columns of $L$, after extending $L$ to $M$ columns by appending null columns.

You may also apply a "stretching" of the rows of that reconstructed "readout" matrix by multiplying each row by some standard deviation, since the rows of $L$ have norm 1.
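As a rough numpy illustration of these two points (the correlation matrix R and the standard deviations below are made-up placeholders, not values from the question):

```python
import numpy as np

# an illustrative correlation matrix (not from the question)
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

L = np.linalg.cholesky(R)                  # lower triangular, L @ L.T == R
print(np.allclose(L @ L.T, R))             # True

# each row of L has norm 1 (the diagonal of R is 1), so scaling row i by a
# standard deviation s_i turns L @ L.T into the covariance diag(s) @ R @ diag(s)
s = np.array([2.0, 0.5, 1.5])              # made-up standard deviations
C = (L * s[:, None]) @ (L * s[:, None]).T
print(np.allclose(C, np.outer(s, s) * R))  # True
```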

(If you want, you can download my free MatMate program for matrix visualization and see with a simple script how this works; see the download page.)

[update] Here I give an example. After reading more comments I understand that the goal was to construct a data matrix from a given correlation matrix. In the example I construct a random 4-by-4 correlation matrix, produce data from it, and then check whether the data, when correlated, reproduce the correlation matrix. They do, and because the matrix algebra is simple we can also see that this is exact.

;===========================================
;MatMate listing from: 16.01.2011 18:33:13
;===========================================
[1] cor = covtocorr(cov)   // this is the correlation-matrix from a given covariance.
      cor : 
    1.00       -0.38        0.22       -0.85
   -0.38        1.00        0.30        0.57
    0.22        0.30        1.00        0.27
   -0.85        0.57        0.27        1.00

[2] lad = cholesky(cor)   // this provides a "loadings"-matrix by cholesky-decomposition
                          // of matrix cor
      lad : 
    1.00        0.00        0.00        0.00
   -0.38        0.93        0.00        0.00
    0.22        0.42        0.88        0.00
   -0.85        0.27        0.39        0.25

Now we want to construct 200 columns of data. As I wrote earlier, the Cholesky
matrix L of a given correlation matrix R is just a rotation of the data matrix.
One should add that the data must be centered (mean = 0) and normed (stddev = 1).
Now I do the inverse: I construct a valid rotation matrix t by rotating a
dummy random matrix of 200 columns onto the first column (rotation method: "triangular").
[3] n=200
[4]     dummy = randomu(1,n)
[5]     t = gettrans(dummy,"tri")
[6]     hide t,dummy

So I've got a valid rotation matrix t.
Now I provide enough columns to rotate the Cholesky/loadings matrix within,
and insert the Cholesky matrix into the first 4 columns (v = 4 is the number of variables).
[7] data = (lad || null(v,n-v)) * sqrt(n)   // append n-v null-columns, then scale by sqrt(n)

Then I rotate that data-matrix with that random rotation
[8] data = data * t
[9] disp = data[*,1..10]   // show the first 10 columns of the created dataset
      disp : 
    1.71       -0.11       -0.20       -0.05       -0.06       -0.21       -0.16       -0.14       -0.11       -0.21
    0.23       13.12        0.08        0.02        0.02        0.08        0.06        0.05        0.04        0.08
    2.22        5.77       12.34       -0.01       -0.01       -0.05       -0.03       -0.03       -0.02       -0.05
   -0.46        3.83        5.63        3.52        0.05        0.18        0.13        0.12        0.09        0.17

Check whether that data really give the correlation-matrix     
[10] chk_cor = data * data' /n
      chk_cor : 
    1.00       -0.38        0.22       -0.85
   -0.38        1.00        0.30        0.57
    0.22        0.30        1.00        0.27
   -0.85        0.57        0.27        1.00

We see that the correlation matrix is reproduced.
The "mystery" here is simply that, by the definition of correlation (of normed data, to keep the example short),

R = data * data' / n

Here data may have arbitrarily many columns, and t*t' is the identity matrix, so

R = data * t * t' * data' / n
R = (data*t) * (data*t)' / n

and the Cholesky factorization is just the operation that recovers L and L' from a correlation matrix:

L = cholesky(R) ==> L*L' = R

(Thus the result is also exact)
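For readers without MatMate, here is a rough numpy translation of the same construction. The correlation matrix is built from random data rather than taken from the listing above, and a random orthogonal matrix obtained from a QR decomposition stands in for the rotation t:

```python
import numpy as np

rng = np.random.default_rng(0)
n, v = 200, 4                                     # observations (columns), variables (rows)

R = np.corrcoef(rng.standard_normal((v, n)))      # some valid correlation matrix

lad = np.linalg.cholesky(R)                       # the v x v "loadings" matrix
data = np.hstack([lad, np.zeros((v, n - v))]) * np.sqrt(n)   # pad with null columns

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a random orthogonal n x n "rotation"
data = data @ Q

print(np.allclose(data @ data.T / n, R))          # True: R is reproduced exactly
```

As the second update below notes, the rows of this data matrix are not centered, so R is reproduced by the raw product data * data' / n rather than by the usual mean-corrected correlation.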

[end update]

[update 2] In response to mpiktas's remark I'll add a variant which allows the standard deviations and the means of the generated data to be set explicitly.
The argument in the first update was to show how the data form an m-dimensional vector space, and how the Cholesky decomposition of the corresponding correlation matrix can simply be understood as a special rotation of the data in that vector space.
However, the means of data generated the way I described above are not guaranteed to be zero. So here is a variant which produces means = 0 and stddevs = 1, after which the data can be scaled to other stddevs and translated to predefined means.
Here the idea is to use the L matrix as a "recipe for composition" of uncorrelated "factors", as understood in factor analysis/PCA. Given a centered and normed data matrix D, it is assumed that the Cholesky matrix L describes a linear combination of uncorrelated unit-variance data ("factors") collected in a matrix F with the same number of columns as D, such that

L * F = D

Now, some random F can be created by a random generator providing a normal distribution with mean = 0 and stddev = 1 per row. If the rows in F were truly uncorrelated, the composition L * F = D above would already be the solution; afterwards we could scale the rows of D by the standard deviations and translate them to the predefined means.
Unfortunately random generators do not give exactly uncorrelated data, so the correlations in F must be removed first. This is possible with the inverse of the Cholesky matrix of the correlations of F. Here is the script in pseudocode:

v = 4                      // number of variables / rows; keep the example small
n = 200                    // number of observations / columns
F = randomn(v,n,0,1)       // random data in F; rows have mean = 0, stddev = 1
covF  = F * F' / n         // (near-)correlations of the generated factors
cholF = cholesky(covF)

L = cholesky(R)            // R is the targeted correlation matrix
data = L * inv(cholF) * F  // remove the spurious correlations in F, then compose

sdevs = diag([1,2,3,4])              // provide four stddevs
means = colvector([20,10,40,30])     // provide four means

simuldata = sdevs*data + means       // the means-vector must be expanded to
                                     // be added to all columns of simuldata

The simulated data are in the matrix simuldata.
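A possible numpy version of this recipe (the target correlation matrix R is a placeholder built from random data; the stddevs and means are the illustrative values from the pseudocode):

```python
import numpy as np

rng = np.random.default_rng(1)
v, n = 4, 200                                   # variables, observations

R = np.corrcoef(rng.standard_normal((v, n)))    # placeholder target correlation matrix

F = rng.standard_normal((v, n))                 # "factors": rows roughly N(0, 1)
F -= F.mean(axis=1, keepdims=True)              # center rows exactly (the pseudocode's
                                                # generator only does this approximately)
cholF = np.linalg.cholesky(F @ F.T / n)         # Cholesky of the factors' covariance

L = np.linalg.cholesky(R)
data = L @ np.linalg.inv(cholF) @ F             # rows: mean 0, correlations exactly R

sdevs = np.array([1.0, 2.0, 3.0, 4.0])          # chosen standard deviations
means = np.array([20.0, 10.0, 40.0, 30.0])      # chosen means
simuldata = sdevs[:, None] * data + means[:, None]

print(np.allclose(np.corrcoef(simuldata), R))   # True
```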

While we can set the stddevs and means explicitly, one problem remains: because of the left-multiplication with inv(cholF) and L, the normal distribution in simuldata, as originally provided by the random generator in F, may be affected/distorted a bit. I don't have an idea yet how to solve this...

Solution 2:

What you want to do is called sampling from a multivariate distribution. It's not hard to look up how to do this, although in your case, if some of your variables are assumed normal and some Poisson, you're going to have to work out the joint PDF of your model yourself. Rather than trying to explain the whole process here, I'll give you some things to read:

http://en.wikipedia.org/wiki/Multivariate_normal_distribution

http://www.statisticalengineering.com/bivariate_normal.htm

http://www.springerlink.com/content/uq8810j473527617/
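For the purely multivariate-normal case, the sampling step is already built into numpy; a minimal sketch with a placeholder mean vector and correlation matrix (not values from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

mean = np.zeros(3)                          # placeholder means
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])             # placeholder correlation (= covariance) matrix

x = rng.multivariate_normal(mean, R, size=10_000)   # 10000 draws, one row per draw
print(np.corrcoef(x.T))                     # close to R
```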

Solution 3:

You could make your "10%" implementation faster by using gradient descent.

Here's an example of doing it for a covariance matrix, because that is easier.

You have a $k\times k$ covariance matrix $C$ and you want to find a $k \times n$ observation matrix $A$ that produces it.

The task is to find $A$ such that

$$A A' = n C$$

You could start with a random guess for $A$ and try to minimize the sum of squared errors. Using the Frobenius norm, we can write this objective as

$$J=\|A A'-n C\|^2_F$$

Let $D$ be our matrix of errors, i.e. $D=A A'-n C$. Checking with the Matrix Cookbook, the gradient is

$$\frac{\partial J}{\partial a_{iy}} \propto \sum_j D_{ij}\, a_{jy},$$

i.e. the $(i,y)$ entry of $D A$.

In other words, your update to the data-matrix entry for sensor $i$, observation $y$ should be proportional to the $i$th row of the error matrix multiplied with the $y$th column of the observation matrix.

A gradient-descent step would update all entries at once. It might be more robust to update just the worst rows, i.e. calculate the sum of squared errors for every row, update the entries corresponding to the worst rows, and then recalculate the errors.
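A minimal numpy sketch of the plain full-gradient variant (the target covariance matrix is a placeholder built from random data, the step size is hand-picked, and the worst-rows refinement is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 50                                # sensors (rows), observations (columns)

C = np.cov(rng.standard_normal((k, 1000)))  # placeholder target covariance matrix

A = rng.standard_normal((k, n))             # random initial guess
lr = 1e-4                                   # hand-picked step size
for _ in range(5000):
    D = A @ A.T - n * C                     # error matrix
    A -= lr * 4 * D @ A                     # gradient of J = ||A A' - n C||_F^2 is 4 D A
print(np.linalg.norm(A @ A.T - n * C))      # should shrink toward 0
```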