Like other MCMC methods, the Gibbs sampler constructs a Markov chain whose values converge towards the target distribution. Gibbs sampling is a special case of the Metropolis-Hastings algorithm in which the proposals are always accepted. It is applicable when the joint distribution is not known explicitly or is difficult to sample from directly, but the conditional distribution of each variable is known and easy to sample from.
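Concretely, for a target distribution p(x_1, ..., x_d), one Gibbs sweep replaces each coordinate in turn with a draw from its full conditional, conditioning on the most recent values of all the other coordinates:

$$x_i^{(t+1)} \sim p\!\left(x_i \mid x_1^{(t+1)}, \dots, x_{i-1}^{(t+1)}, x_{i+1}^{(t)}, \dots, x_d^{(t)}\right), \qquad i = 1, \dots, d.$$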
Let's see a code demonstration.
# To begin, we import the following libraries.
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Let's define the (unnormalized) joint distribution that we're going to sample from with the Gibbs sampler.
f = lambda x, y: np.exp(-(x*x*y*y + x*x + y*y - 8*x - 8*y)/2.)
# Then, we plot the probability distribution.
xx = np.linspace(-1, 8, 100)
yy = np.linspace(-1, 8, 100)
# Here we make a meshgrid using the np.meshgrid function, passing in the equally-sized
# sequences for the x and y axes.
xg, yg = np.meshgrid(xx, yy)
z = f(xg.ravel(), yg.ravel())
z2 = z.reshape(xg.shape)
plt.contourf(xg, yg, z2, cmap='BrBG')
# We've made a wicked cat staring at you :D
# Now, we'll attempt to estimate the probability distribution using Gibbs sampling.
# Fortunately, the conditional distributions of both variables are normal,
# so we can describe each conditional for X and Y by its mean mu and standard deviation sigma.
# Let's set 50,000 steps as the number of iterations we want the Gibbs sampler to simulate.
N = 50000
# And let's create arrays of N+1 values for x and y,
x = np.zeros(N+1)
y = np.zeros(N+1)
# and initialize the starting values of x and y to 1 and 6, respectively.
x[0] = 1.
y[0] = 6.
# The conditional distribution of y|x is N(4/(1+x^2), sqrt(1/(1+x^2))),
# and the conditional distribution of x|y is N(4/(1+y^2), sqrt(1/(1+y^2))).
# With z standing for the conditioning variable's array and i for its index, we define the
# standard deviation sigma and the mean mu as lambda functions.
sig = lambda z, i: np.sqrt(1. / (1.+z[i]*z[i]))
mu = lambda z, i: 4./ (1.+z[i]*z[i])
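These conditionals can be read off the joint density by completing the square. For y given x, the exponent rearranges (up to terms that do not involve y) as

$$-\tfrac{1}{2}\left(x^2 y^2 + x^2 + y^2 - 8x - 8y\right) = -\tfrac{1 + x^2}{2}\left(y - \tfrac{4}{1 + x^2}\right)^2 + \text{const}(x),$$

which is the kernel of a normal distribution with mean 4/(1+x^2) and variance 1/(1+x^2); the conditional of x given y follows by symmetry.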
# We now step through the Gibbs sampling algorithm in a for loop.
for i in range(1, N, 2):
    # The loop counter advances in steps of 2 so that each iteration updates both variables.
    # We first sample the next x value from the conditional distribution of X|Y^(n-1).
    sig_x = sig(y, i-1)
    mu_x = mu(y, i-1)
    # At step i, the new value of x is drawn from the normal distribution with mean mu_x and
    # standard deviation sig_x, i.e. the conditional distribution of X|Y^(n-1).
    x[i] = np.random.normal(mu_x, sig_x)
    # As for y, we don't update it yet.
    y[i] = y[i-1]
    # We now sample the next value of y given the updated x value.
    # The conditional distribution is Y|X^(n).
    sig_y = sig(x, i)
    mu_y = mu(x, i)
    # The new value of y is drawn from the normal distribution with mean mu_y and
    # standard deviation sig_y, i.e. the conditional distribution of Y|X^(n).
    y[i+1] = np.random.normal(mu_y, sig_y)
    # And of course, we don't change the x value.
    x[i+1] = x[i]
    # At this point, both x and y have been updated once: two new values per iteration.
# Finally, we plot the results after the entire for loop.
plt.hist(x, bins=50);
plt.hist(y, bins=50);
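One way to sanity-check these histograms is to compute the exact marginal of x by numerically integrating the unnormalized joint over y and overlaying it on a density-normalized histogram. The sketch below does this on an ad-hoc grid; the grid bounds and resolution are arbitrary choices, not part of the original setup.

# Integrate the unnormalized joint over y on a grid to get the exact marginal of x.
y_grid = np.linspace(-5, 10, 2000)
x_grid = np.linspace(-1, 8, 400)
marginal_x = np.array([np.trapz(f(xv, y_grid), y_grid) for xv in x_grid])
marginal_x /= np.trapz(marginal_x, x_grid)   # normalize to a proper density
plt.hist(x, bins=50, density=True, alpha=0.5)
plt.plot(x_grid, marginal_x)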
# We can set alpha to 0.8 to make the contour plot slightly transparent.
plt.contourf(xg, yg, z2, alpha=0.8, cmap='BrBG')
# On top of the distribution, we can overlay the simulated x and y values that the Gibbs sampler
# drew from the underlying distribution.
plt.plot(x[::10],y[::10], '.', alpha=0.1)
# Also, let's add the trajectory of x and y values for the first 300 draws.
plt.plot(x[:300],y[:300], c='r', alpha=0.3, lw=1)
# As we can see, the samples obtained using the Gibbs sampling algorithm do a good
# job of approximating the target distribution.
Gibbs sampling is a Markov chain Monte Carlo (MCMC) method that estimates complex joint distributions by iteratively drawing an instance of each variable from its distribution conditional on the current values of the other variables. In contrast to the general Metropolis-Hastings algorithm, every proposal is accepted, so Gibbs sampling can be much more efficient.
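The update pattern used above can be packaged as a small reusable loop. Below is a minimal sketch of a generic two-variable Gibbs sampler; the function name gibbs_2d and the callables sample_x_given_y and sample_y_given_x are hypothetical names introduced here for illustration, not part of any library.

import numpy as np

def gibbs_2d(sample_x_given_y, sample_y_given_x, x0, y0, n_iter, rng=None):
    # sample_x_given_y(y, rng) and sample_y_given_x(x, rng) draw from the full
    # conditionals; x0 and y0 are the starting values.
    rng = np.random.default_rng() if rng is None else rng
    xs = np.empty(n_iter + 1)
    ys = np.empty(n_iter + 1)
    xs[0], ys[0] = x0, y0
    for t in range(1, n_iter + 1):
        xs[t] = sample_x_given_y(ys[t - 1], rng)   # update x given the latest y
        ys[t] = sample_y_given_x(xs[t], rng)       # update y given the new x
    return xs, ys

# Usage with the conditionals of this example:
sd = lambda z: np.sqrt(1. / (1. + z * z))
m = lambda z: 4. / (1. + z * z)
xs, ys = gibbs_2d(lambda yv, rng: rng.normal(m(yv), sd(yv)),
                  lambda xv, rng: rng.normal(m(xv), sd(xv)),
                  x0=1., y0=6., n_iter=50000)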