Is there a sense in which the Chi-squared distribution is an inner product?

Solution 1:

Interesting question!

First of all, yes, it does make sense to define independent random variables as different components of a vector. For example, the position of a particle in three dimensions might well be determined by $x,y,$ and $z$ each being a standard normal random variable. So it can make literal geometric sense to do as you suggest.
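
For instance, here is a minimal sketch of such a random position vector, assuming NumPy as the tooling (the seed and variable names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Position of a particle in three dimensions: each component
# is an independent standard normal random variable.
position = rng.standard_normal(3)  # array([x, y, z])

# The usual Euclidean length of this random vector:
length = np.linalg.norm(position)

print(position, length)
```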

People form random vectors all the time. They see common use in multiple linear regression, for example: simultaneous observations of several variables are collected into a vector, and many of the regression formulas are written in terms of matrices and vectors.

You can call something a vector if the collection of things like it satisfies the rules of a vector space: an additive identity, inverses, commutativity of addition, multiplication by scalars, and so forth. I see no reason why you couldn't make your vectors of normal random variables into a vector space. However, you need to decide whether to use just normal random variables or to allow any random variables: the sum of two independent normal random variables is again normal, but that closure under addition fails for many other distributions. You also have to be careful about what you are using as scalars, if you want to multiply components by them!
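
The closure point is easy to check by simulation. Here is a quick sketch (NumPy and SciPy assumed) contrasting normals, which are closed under addition, with uniforms, which are not:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000

# Sum of two independent standard normals: again normal (mean 0, variance 2).
z = rng.standard_normal(n) + rng.standard_normal(n)
print(stats.kstest(z, "norm", args=(0, np.sqrt(2))))  # large p-value expected

# Sum of two independent uniforms on [0, 1]: triangular, not uniform.
u = rng.uniform(size=n) + rng.uniform(size=n)
print(stats.kstest(u, "uniform", args=(0, 2)))  # tiny p-value expected
```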

The problem comes in when you sum them. If you add three random variables together with coefficients, you get, as you pointed out, a single random variable. The closest geometric analogy would be the Manhattan metric in three dimensions, where such a sum could represent the distance between two points when movement is restricted to a rectangular grid. But in general, the components of a vector do not sum to one of anything in particular.
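
For reference, the Manhattan (taxicab) distance between points $p$ and $q$ in three dimensions is

$$d_1(p, q) = |p_x - q_x| + |p_y - q_y| + |p_z - q_z|,$$

a single number obtained by summing coordinate-wise contributions, which is the sense in which the analogy works.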

The $\chi^2$ random variable with positive integer degrees of freedom can be written as a sum of squared standard normal variables, so it does look rather like a dot product, doesn't it? However, as I pointed out above, you need to know what your scalars are. If they are ordinary real numbers, then a $\chi^2$ random variable is not a number and so cannot be the result of a dot product. If you instead wanted to use random variables as scalars, then you have the problem that multiplying a $Z$ by a $\chi^2$ does not give a $Z$ back again, so you would no longer have a vector space.
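
The dot-product resemblance is easy to verify by simulation; here is a sketch (again assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k, n = 3, 100_000

# Each row is a vector Z of k independent standard normals;
# the dot product Z . Z is the sum of squared components.
Z = rng.standard_normal((n, k))
dot_products = np.einsum("ij,ij->i", Z, Z)

# The dot products should follow a chi-squared distribution with k df.
print(stats.kstest(dot_products, "chi2", args=(k,)))  # large p-value expected
```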

If you were to make a very general space, where the components can be any random variables at all and the scalars are also random variables, then you might be able to make it work. However, covariance might no longer make sense as a dot product in that case.
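
For contrast, with ordinary real scalars the standard construction does work: on the space of zero-mean, finite-variance random variables,

$$\langle X, Y \rangle = \operatorname{Cov}(X, Y) = E[XY]$$

satisfies the inner-product axioms (identifying random variables that are almost surely zero with the zero vector), which is why covariance is usually treated as a dot product in the first place.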

Solution 2:

It might be helpful to think about why we add across all degrees of freedom in the first place. If we had to establish whether the terms were dependent or independent, we would certainly need to evaluate an inner product. But in the design of this test, we impose this 'orthogonality' on our degrees of freedom as a feature, not a bug, since it makes the chi-squared test that much more conservative in determining dependence.
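
Concretely, the test statistic in question is a sum of squared standardized terms, one per cell:

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ and $E_i$ are the observed and expected counts. Each term is treated as an orthogonal contribution, and the squares are simply added across the degrees of freedom.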

With respect to the inner product, I would go backwards: start with the $\ell^2$ norm, which is what we are using, and then take the inner product induced by that norm. It seems that our inner product space is the one induced by the $\ell^2$ norm on the vector space of residuals, not of the original data.
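
For completeness, a norm that satisfies the parallelogram law induces an inner product via the polarization identity:

$$\langle x, y \rangle = \tfrac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right),$$

so starting from the $\ell^2$ norm on the residuals recovers the usual dot product on that space.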