Understanding the Metropolis-Hastings Algorithm
Can someone please tell me if I have correctly written the code for MCMC sampling (the Metropolis-Hastings algorithm)?
Given a multivariate joint probability density function, I am interested in learning how to randomly sample from a conditional distribution derived from it.
Suppose I have a 4-dimensional multivariate normal distribution P(X, Y, Z, W) with:
Mean vector (4 x 1):
5.0060022  3.4280049  1.4620007  0.2459998
Variance-covariance matrix (4 x 4):
0.15065114  0.13080115  0.02084463  0.01309107
0.13080115  0.17604529  0.01603245  0.01221458
0.02084463  0.01603245  0.02808260  0.00601568
0.01309107  0.01221458  0.00601568  0.01042365
Using the R programming language, I created a function that evaluates this 4-dimensional multivariate normal density:
# define constants needed for the multivariate normal
sigma1 <- c(0.15065114, 0.13080115, 0.02084463, 0.01309107,
            0.13080115, 0.17604529, 0.01603245, 0.01221458,
            0.02084463, 0.01603245, 0.02808260, 0.00601568,
            0.01309107, 0.01221458, 0.00601568, 0.01042365)
sigma <- matrix(sigma1, nrow = 4, ncol = 4, byrow = TRUE)
sigma_inv <- solve(sigma)                # precision matrix SIGMA^(-1)
sigma_det <- det(sigma)
denom <- sqrt((2 * pi)^4 * sigma_det)    # normalizing constant

# the actual multivariate density is defined below ("target")
target <- function(x, y, z, w)
{
  # centered coordinates (x - mu)
  x_one   <- x - 5.0060022
  x_two   <- y - 3.4280049
  x_three <- z - 1.4620007
  x_four  <- w - 0.2459998
  x_t <- c(x_one, x_two, x_three, x_four)
  x_t_one <- matrix(x_t, nrow = 4, ncol = 1)   # column vector (x - mu)
  x_t_two <- matrix(x_t, nrow = 1, ncol = 4)   # row vector (x - mu)^T
  # the exponent is -0.5 * (x - mu)^T %*% SIGMA^(-1) %*% (x - mu)
  num <- exp(-0.5 * x_t_two %*% sigma_inv %*% x_t_one)
  answer_1 <- num / denom
  return(answer_1)
}
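As a quick sanity check, the hand-written density can be compared against dmvnorm() from the mvtnorm package (loaded later for the sampler anyway) at an arbitrary test point; the two values should agree up to floating-point error:

library(mvtnorm)
pt <- c(5.0, 3.4, 1.5, 0.25)   # arbitrary test point
target(pt[1], pt[2], pt[3], pt[4])
dmvnorm(pt, mean = c(5.0060022, 3.4280049, 1.4620007, 0.2459998), sigma = sigma)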
Question: Suppose I want to take random samples from this multivariate normal distribution P(X, Y, Z, W), conditional on Z = 2 and W = 1.3 - that is, samples from P(X, Y | Z = 2, W = 1.3).
I attempted to manually write an MCMC sampler (Metropolis-Hastings) to take random samples from P(X, Y | Z = 2, W = 1.3). To do this, I first "fixed" the values of Z and W within the original multivariate normal density:
# fix the values of z and w as per P(X, Y | Z = 2, W = 1.3)
target <- function(x, y)
{
  x_one   <- x - 5.0060022
  x_two   <- y - 3.4280049
  x_three <- 2 - 1.4620007     # z fixed at 2
  x_four  <- 1.3 - 0.2459998   # w fixed at 1.3
  x_t <- c(x_one, x_two, x_three, x_four)
  x_t_one <- matrix(x_t, nrow = 4, ncol = 1)   # column vector (x - mu)
  x_t_two <- matrix(x_t, nrow = 1, ncol = 4)   # row vector (x - mu)^T
  # the exponent is -0.5 * (x - mu)^T %*% SIGMA^(-1) %*% (x - mu)
  num <- exp(-0.5 * x_t_two %*% sigma_inv %*% x_t_one)
  answer_1 <- num / denom
  return(answer_1)
}
Next, I run the Metropolis-Hastings sampler to randomly sample from this conditional distribution:
library(mvtnorm)

x <- rep(0, 10000)
y <- rep(0, 10000)
x[1] <- 3   # initialize; I've arbitrarily set these to 3 and 1
y[1] <- 1
for (i in 2:10000) {
  current_x <- x[i - 1]
  current_y <- y[i - 1]
  # propose a new point from a symmetric bivariate normal random walk
  new <- rmvnorm(n = 1, mean = c(current_x, current_y), sigma = diag(2), method = "chol")
  proposed_x <- new[1]
  proposed_y <- new[2]
  # acceptance ratio; the symmetric proposal cancels out of the Hastings ratio
  A <- target(proposed_x, proposed_y) / target(current_x, current_y)
  if (runif(1) < A) {
    x[i] <- proposed_x   # accept the move with probability min(1, A)
    y[i] <- proposed_y
  } else {
    x[i] <- current_x    # otherwise "reject" the move, and stay where we are
    y[i] <- current_y
  }
}
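Since the chain starts at an arbitrary point, it is also common (though not done above) to discard an initial burn-in before summarizing; a minimal sketch, with an arbitrary cutoff of 1,000 iterations:

burn <- 1000
x_post <- x[-(1:burn)]   # draws kept after burn-in
y_post <- y[-(1:burn)]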
Collecting the two chains into a data frame, the final answer can be taken as the mean of each column:

mcmc_output <- data.frame(x = x, y = y)
mean(mcmc_output$x)
mean(mcmc_output$y)
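Because the multivariate normal has closed-form conditionals, the MCMC estimates can be checked directly against the exact answer. A minimal sketch using the standard Gaussian conditioning formula, partitioning the mean and covariance into the (X, Y) and (Z, W) blocks:

mu  <- c(5.0060022, 3.4280049, 1.4620007, 0.2459998)
S12 <- sigma[1:2, 3:4]
S22 <- sigma[3:4, 3:4]
# exact conditional mean: mu_1 + SIGMA_12 %*% SIGMA_22^(-1) %*% (c(2, 1.3) - mu_2)
cond_mean <- mu[1:2] + S12 %*% solve(S22) %*% (c(2, 1.3) - mu[3:4])
cond_mean   # compare with mean(mcmc_output$x) and mean(mcmc_output$y)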
My Question: Can someone please tell me if what I have done is correct? I know that the multivariate normal distribution has "attractive theoretical properties" that allow you to perform the above task without MCMC sampling - but I am trying to learn how to perform MCMC sampling and wanted to start with a basic example using the normal distribution. Also, is there a way to find out whether the MCMC algorithm has converged?
Thanks!
Optional: Visualization of the Metropolis-Hastings Algorithm
library(ggplot2)

ggplot(mcmc_output, aes(x = x, y = y)) +
  geom_density_2d_filled() +
  ggtitle("Contour Plots of the MCMC Estimates")
Solution 1:
I think that what you are doing is wrong. Indeed, your second target function is not computing $P(X,Y\mid Z=2, W=1.3)$ as you indicate. Rather, it is computing $P(X,Y,Z=2, W=1.3)$, which is not the same thing. By the definition of conditional probability, $P(X,Y\mid Z=2, W=1.3) = \frac{P(X,Y,Z=2, W=1.3)}{P(Z=2, W=1.3)}$.
Edit (after comment)
Since $P(Z=2, W=1.3)$ is a constant that does not depend on $X$ and $Y$, it cancels out of the Metropolis-Hastings acceptance ratio, so the result should indeed be correct. A simple way to check this visually would be to project your points onto the eigenvectors of the (conditional) covariance matrix and plot the two resulting distributions against the theoretical ones (Gaussians with variances given by the corresponding eigenvalues) using a Q-Q plot. Over time, the points should align better and better with the diagonal (you can use the root mean square deviation from the diagonal to quantify it).
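A minimal sketch of that check in R, reusing sigma and mcmc_output from the question (the conditional covariance of $(X, Y)$ given $(Z, W)$ is $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$):

# conditional covariance and its eigendecomposition
S11 <- sigma[1:2, 1:2]; S12 <- sigma[1:2, 3:4]; S22 <- sigma[3:4, 3:4]
S_cond <- S11 - S12 %*% solve(S22) %*% t(S12)
eig <- eigen(S_cond)

# center the samples and project them onto the eigenvectors
samples  <- as.matrix(mcmc_output)
centered <- sweep(samples, 2, colMeans(samples))
proj     <- centered %*% eig$vectors

# Q-Q plots of each standardized projection against the standard normal
par(mfrow = c(1, 2))
for (i in 1:2) {
  qqnorm(proj[, i] / sqrt(eig$values[i]), main = paste("Projection on eigenvector", i))
  qqline(proj[, i] / sqrt(eig$values[i]))
}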