Arbitrary Distribution Sampling II
note: part 2 of a series on sampling from arbitrary distributions; I suggest reading the first one before continuing
Refresher
At the end of the last installment, we had developed a rejection sampling method to generate samples from a random variable \(X\) with a known, but completely arbitrary, distribution function \(f(x)\). A quick review of the method:
- we generate a random variable \(Y\) from an easy-to-sample “proposal distribution” \(g\)
- we generate a random uniform variable \(U \in [0,1]\),
- we accept the point \(Y\) as being part of our sample if \(U \leq \frac{f(Y)}{M\cdot g(Y)}\)
- the accepted \(Y\) values will be distributed the same as \(X\) is
\(M\) is a constant parameter that we need to set to ensure that \(\frac{f(Y)}{M\cdot g(Y)} \leq 1\) over our entire domain. Assuming that both distributions are normalized, this method works best when \(M\) is as close to \(1\) as possible. There are, of course, other factors to consider, such as the efficiency of your sampling algorithm for \(Y\).
We left off thinking about ways to generate the proposal distribution \(g(x)\) and find the value of \(M\) automatically.
Using np.histogram
One method to accomplish this is to make a histogram. A normalized histogram is nothing more than a piecewise constant pdf, which makes it the logical extension of using a single uniform distribution as the proposal. Let’s make a Histogram distribution object that holds bins and counts, with the ability to give us the pdf and to sample from itself:
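(a minimal sketch; the class and method names below are illustrative, not necessarily what the original code used)

```python
import numpy as np

class Histogram:
    """A piecewise-constant distribution defined by bin edges and heights."""

    def __init__(self, edges, heights):
        self.edges = np.asarray(edges, dtype=float)
        self.widths = np.diff(self.edges)
        heights = np.asarray(heights, dtype=float)
        # normalize the heights so the histogram integrates to 1 (a valid pdf)
        self.heights = heights / np.sum(heights * self.widths)
        # probability mass of each bin, used when sampling
        self.probs = self.heights * self.widths

    def pdf(self, x):
        x = np.asarray(x, dtype=float)
        idx = np.clip(np.searchsorted(self.edges, x, side="right") - 1,
                      0, len(self.heights) - 1)
        inside = (x >= self.edges[0]) & (x <= self.edges[-1])
        return np.where(inside, self.heights[idx], 0.0)

    def sample(self, n, rng=None):
        rng = rng or np.random.default_rng()
        # pick a bin in proportion to its mass, then draw uniformly inside it
        bins = rng.choice(len(self.probs), size=n, p=self.probs)
        return self.edges[bins] + self.widths[bins] * rng.random(n)
```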
We should also bring back our old target distribution:
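(the exact target from part 1 isn’t reproduced here; as a stand-in, assume something like a bimodal mixture of two Gaussians essentially supported on \([0, 10]\))

```python
def target_pdf(x):
    # stand-in target: mixture of two Gaussians centered at 3 and 7
    x = np.asarray(x, dtype=float)
    g1 = np.exp(-0.5 * ((x - 3.0) / 0.8) ** 2) / (0.8 * np.sqrt(2.0 * np.pi))
    g2 = np.exp(-0.5 * ((x - 7.0) / 1.2) ** 2) / (1.2 * np.sqrt(2.0 * np.pi))
    return 0.6 * g1 + 0.4 * g2
```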
Okay, let’s see how this works in practice.
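One plausible way to build the proposal with np.histogram (my reading of the setup; the dense grid, the 20 bins, and the \([0, 10]\) range are all illustrative) is to histogram a grid of \(x\) values weighted by the target pdf, so each bin height is roughly the bin-averaged value of \(f\):

```python
xs = np.linspace(0.0, 10.0, 1000)
counts, edges = np.histogram(xs, bins=20, weights=target_pdf(xs), density=True)
proposal = Histogram(edges, counts)   # piecewise-constant proposal g(x)
```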
We can see pretty clearly that the proposal density sits significantly above the target density at several points; since both are normalized, the proposal must also dip below the target elsewhere, so we will need to find an appropriate value for \(M\). Let’s write some code to do this for us, since we are moving in the direction of automating the whole process:
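(a sketch: scan a fine mesh of \(x\) values and take the worst-case ratio; the name find_M and the small safety headroom are my own choices)

```python
def find_M(target_pdf, proposal, xmin, xmax, n_points=10_000, headroom=1.001):
    xs = np.linspace(xmin, xmax, n_points)
    gx = proposal.pdf(xs)
    # ratio f(x) / g(x); the where-clause just avoids divide-by-zero warnings
    ratios = np.divide(target_pdf(xs), gx, out=np.zeros_like(gx), where=gx > 0)
    return headroom * np.max(ratios)

M = find_M(target_pdf, proposal, 0.0, 10.0)
```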
To generate \(N\) samples, we run the following:
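(a sketch reusing the names defined above; proposing candidates in batches is a convenience, not part of the method itself)

```python
def rejection_sample(n, target_pdf, proposal, M, rng=None):
    rng = rng or np.random.default_rng()
    samples = []
    while len(samples) < n:
        y = proposal.sample(n, rng=rng)   # a batch of candidate points
        u = rng.random(n)                 # one uniform draw per candidate
        keep = u <= target_pdf(y) / (M * proposal.pdf(y))
        samples.extend(y[keep])
    return np.array(samples[:n])

samples = rejection_sample(5000, target_pdf, proposal, M)
```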
Here is the plot we get from the \(M\) minimization:
Interestingly, the optimal \(M\) turns out to be the same as we had for the uniform distribution. The problem is not solved by using more bins, or by using a finer mesh along \(x\) when searching for \(M\); we would still have to generate about 2 samples on average to get 1 accepted. We need a better histogram, one that doesn’t produce densities so wildly above the target pdf.
Making our own histogram
Since we can’t rely on the built-in numpy histogram function, we will simply build it ourselves. A first-order improvement is to set the height of each bin by hand to the maximum of the pdf evaluated at the bin’s left and right endpoints.
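A sketch of that construction, reusing the Histogram class from above (the name upper_histogram is illustrative):

```python
def upper_histogram(target_pdf, xmin, xmax, n_bins):
    edges = np.linspace(xmin, xmax, n_bins + 1)
    # each bin's height is the larger of the pdf values at its two edges
    heights = np.maximum(target_pdf(edges[:-1]), target_pdf(edges[1:]))
    return Histogram(edges, heights)
```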
Let’s also use a quadratic pdf this time, which fares even more poorly with the naive histogram approach we used above:
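(assuming, for concreteness, a quadratic normalized on \([0, 10]\); the range and the constant are my choice)

```python
def quadratic_pdf(x):
    # 3x^2 / 1000 integrates to 1 on [0, 10]
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x <= 10.0), 3.0 * x**2 / 1000.0, 0.0)
```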
We’ll compare both methods by making two different proposal distributions:
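(a sketch with 20 bins for each; the bin count is arbitrary)

```python
xs = np.linspace(0.0, 10.0, 1000)
counts, edges = np.histogram(xs, bins=20, weights=quadratic_pdf(xs), density=True)
np_hist_proposal = Histogram(edges, counts)                      # np.histogram version
upper_proposal = upper_histogram(quadratic_pdf, 0.0, 10.0, 20)   # endpoint-max version

M_np = find_M(quadratic_pdf, np_hist_proposal, 0.0, 10.0)
M_upper = find_M(quadratic_pdf, upper_proposal, 0.0, 10.0)
print(M_np, M_upper)   # the np.histogram proposal needs a noticeably larger M
```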
If we plot both histograms, there seems to be only a little difference between the two. However, there are huge consequences for the efficiency of these two histogram distributions: with the np.histogram method, \(M\approx 3\), while \(M\approx 1.3\) for the upper histogram! This means we should expect the upper histogram to be more than twice as efficient at generating samples of the target distribution (and this will be the case if you test it; go ahead and give it a try).
Unnormalized distributions
You may have already intuited that this will work for unnormalized distributions too. If the distributions \(f\) and \(g\) are written as unnormalized parts \(f_u\) and \(g_u\) multiplied by normalization constants \(N_f\) and \(N_g\), we simply redefine \(M\) to absorb those constants. The downside is that we lose interpretability of \(M\), which no longer corresponds so directly to the efficiency of the algorithm. A quick example:
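(a sketch: drop the normalization constant from the quadratic target; the Histogram proposal still normalizes its own heights, so only \(f\) is unnormalized here)

```python
def quadratic_unnormalized(x):
    # same shape as before, but without the 3/1000 normalization factor
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x <= 10.0), x**2, 0.0)

proposal_u = upper_histogram(quadratic_unnormalized, 0.0, 10.0, 20)
```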
Now, we find “\(M\)”:
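(the same ratio scan as before; the resulting number now absorbs the unknown normalization constant)

```python
M_u = find_M(quadratic_unnormalized, proposal_u, 0.0, 10.0)
```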
We don’t expect this \(M\) to give us an estimate of the acceptance rate, so we’ll have to run a small trial of 500 samples first to approximate the acceptance ratio:
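(a sketch; the seed and variable names are illustrative)

```python
rng = np.random.default_rng(0)
y = proposal_u.sample(500, rng=rng)   # 500 pilot proposals
u = rng.random(500)
accepted = u <= quadratic_unnormalized(y) / (M_u * proposal_u.pdf(y))
print(accepted.mean())                # empirical estimate of the acceptance ratio
```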
And there we go: no norm, no problem.
Optimization and Scaling
An easy optimization would be to edit the upper_histogram function to evaluate the midpoint as well as the endpoints of each interval when calculating the height of each box. This helps in situations where the pdf is not monotonic within a bin. Something like this…
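(a sketch of the midpoint variant, keeping the same interface as upper_histogram above)

```python
def upper_histogram_mid(target_pdf, xmin, xmax, n_bins):
    edges = np.linspace(xmin, xmax, n_bins + 1)
    left, right = edges[:-1], edges[1:]
    mid = 0.5 * (left + right)
    # take the max of the pdf at the left edge, the midpoint, and the right edge
    heights = np.maximum.reduce([target_pdf(left), target_pdf(mid), target_pdf(right)])
    return Histogram(edges, heights)
```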
Of course, you can check even more points than just the midpoint, but there is a balance in terms of how many points it is feasible to evaluate. This kind of thinking becomes even more important once we start thinking about higher dimensions, which brings us to the major question at hand: how does the method scale to higher dimensions?
Extending the method to high-dimensional histograms should be pretty straightforward in theory (a sketch follows the list below):
- coarse grain the space
- for each cell in the coarse graining, evaluate the “corners” and take the height to be the max among the corners
- potentially also evaluate the center to try to account for non-monotonic regions
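Here is what building the heights might look like in \(d\) dimensions (a sketch of just the corner-max construction; sampling from the resulting grid follows the same pick-a-cell-then-uniform pattern as the 1D class, and the function name and interface are my own):

```python
from itertools import product

def upper_histogram_nd(target_pdf, lows, highs, n_bins):
    """Corner-max heights on a d-dimensional grid (cell centers not included).

    Here target_pdf takes a length-d point, unlike the 1D pdfs above.
    """
    lows, highs = np.asarray(lows, dtype=float), np.asarray(highs, dtype=float)
    d = len(lows)
    # bin edges along each dimension
    edges = [np.linspace(lows[k], highs[k], n_bins + 1) for k in range(d)]
    heights = np.zeros((n_bins,) * d)
    # loop over every cell and every one of its 2**d corners
    for cell in product(range(n_bins), repeat=d):
        corners = product(*[(edges[k][i], edges[k][i + 1]) for k, i in enumerate(cell)])
        heights[cell] = max(target_pdf(np.array(c)) for c in corners)
    return edges, heights
```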
We can see how evaluating many points inside each cell isn’t going to play well in high dimensions: if you coarse grain a \(d\)-dimensional system into \(n\) sections along each dimension, you have \(n^d\) boxes, each with \(2^d\) corners. All of a sudden we are evaluating a lot of points…
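To put rough numbers on it (a toy count that ignores the fact that neighboring cells share corners, which caching would reduce to \((n+1)^d\) distinct points):

```python
def corner_evaluations(n, d):
    # n bins along each of d dimensions, 2**d corners per cell
    return n**d * 2**d

print(corner_evaluations(20, 1))   # 40
print(corner_evaluations(20, 5))   # 102400000
```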
This topic has turned out to be quite a bit more interesting than I originally expected, and I think it deserves some more thought. Expect another installment sometime in the near future.