Numerical integration plays a pivotal role in numerous scientific computations, with applications spanning from physics to finance. Conventional numerical integration techniques, such as Simpson’s rule or the trapezoidal rule, perform well for straightforward, low-dimensional integrals. However, these classical methods falter when applied to high-dimensional spaces, a challenge addressed by Monte Carlo integration, which offers a robust and efficient resolution to the dimensionality problem.
The Dimensionality Problem
The dimensionality problem emerges in numerical integration as the number of dimensions in an integral increases. Traditional methods, like Simpson’s rule or the trapezoidal rule, suffer from the “curse of dimensionality,” where their complexity grows exponentially with the increase in dimensions. This exponential growth renders them ineffective for high-dimensional integrals, as the computational resources required become prohibitively large.
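The exponential growth is easy to see in a minimal sketch: a tensor-product quadrature rule with n nodes per axis needs n**d function evaluations in d dimensions (the helper `grid_points` and the choice n = 10 are illustrative, not from any particular quadrature library).

```python
def grid_points(n_per_axis: int, d: int) -> int:
    """Function evaluations needed by an n-per-axis tensor-product grid in d dimensions."""
    return n_per_axis ** d

# With only 10 nodes per dimension, the evaluation count explodes:
for d in (1, 2, 5, 10, 20):
    print(f"d = {d:2d}: {grid_points(10, d):.3e} evaluations")
```

At d = 20 the grid already requires 10^20 evaluations, far beyond any realistic computational budget.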
Monte Carlo Integration: A Solution to the Dimensionality Problem
Monte Carlo integration, a technique that harnesses randomness, provides an efficacious solution to the dimensionality problem. Unlike traditional methods, the complexity of Monte Carlo integration does not increase exponentially with the number of dimensions. Instead, it remains relatively stable, making it a more efficient choice for high-dimensional spaces.
This efficiency stems from the fundamental principle of Monte Carlo methods: random sampling. By randomly sampling points within the integration domain and averaging the function values at these points, we can estimate the integral. As the number of random samples increases, the estimate converges to the true value of the integral.
The mathematical definition of the Monte Carlo method in the context of numerical integration is as follows:
Definition (Monte Carlo Integration): Given a function $f: [0,1]^d \to \mathbb{R}$ and a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_N$ uniformly distributed over $[0,1]^d$, the Monte Carlo estimate of the integral $I = \int_{[0,1]^d} f(x)\,dx$ is given by
$$\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} f(X_i),$$
where $X_1, \ldots, X_N$ are the sample points in $[0,1]^d$ and $N$ is the number of samples.
The law of large numbers ensures that this approximation converges to the true value of the integral as $N \to \infty$.
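The definition translates directly into code. The following is a minimal sketch, assuming a function on the unit cube; the helper name `mc_integrate` and the test integrand $f(x) = \frac{1}{d}\sum_j x_j$ (whose exact integral is $0.5$) are illustrative choices, not from the source.

```python
import random

def mc_integrate(f, d, n, seed=0):
    """Monte Carlo estimate of the integral of f over the unit cube [0,1]^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # one uniform sample in [0,1]^d
        total += f(x)
    return total / n  # average of the sampled function values

# Example: integrate f(x) = mean of the coordinates over [0,1]^10.
est = mc_integrate(lambda x: sum(x) / len(x), d=10, n=100_000)
print(est)  # close to the exact value 0.5
```

Note that the estimator's cost is n function evaluations regardless of d, in contrast to the n**d evaluations of a grid-based rule.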
Error of Monte Carlo Integration
The following theorem shows that the error bound of Monte Carlo integration is decoupled from the dimension, which is quite desirable.
Theorem (Error Bound of Monte Carlo Integration): Let $f: [0,1]^d \to \mathbb{R}$ be a measurable function such that $|f(x)| \le 1$ for all $x \in [0,1]^d$. Let $X_1, \ldots, X_N$ be independent random variables, each uniformly distributed in $[0,1]^d$. Define the Monte Carlo estimator of the integral of $f$ over $[0,1]^d$ by
$$\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} f(X_i).$$
Then the expected square error of the Monte Carlo estimator is $O(1/N)$, i.e.,
$$\mathbb{E}\left[\left(\hat{I}_N - \int_{[0,1]^d} f(x)\,dx\right)^2\right] \le \frac{1}{N}.$$
This theorem indicates that the error of the Monte Carlo method decreases with the square root of the number of samples. The bound does not depend on the dimension $d$, and thus resolves the curse of dimensionality. We now give the proof.
Proof:
We begin by establishing the expectation of the Monte Carlo estimator of the integral. The Monte Carlo method approximates the integral by taking the average of $N$ samples of the function evaluated at random points uniformly distributed in $[0,1]^d$. The expectation of the Monte Carlo estimator is:
$$\mathbb{E}\left[\hat{I}_N\right] = \mathbb{E}\left[\frac{1}{N} \sum_{i=1}^{N} f(X_i)\right] = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}[f(X_i)].$$
Since the $X_i$ are chosen uniformly at random, the expectation $\mathbb{E}[f(X_i)]$ is just the integral of $f$ over $[0,1]^d$:
$$\mathbb{E}[f(X_i)] = \int_{[0,1]^d} f(x)\,dx = I.$$
Therefore, we have:
$$\mathbb{E}\left[\hat{I}_N\right] = I.$$
This demonstrates that the Monte Carlo estimator is unbiased. The expectation of the estimator equals the true value of the integral.
Next, we consider the variance of the Monte Carlo estimator:
$$\mathrm{Var}\left(\hat{I}_N\right) = \mathrm{Var}\left(\frac{1}{N} \sum_{i=1}^{N} f(X_i)\right) = \frac{1}{N^2} \mathrm{Var}\left(\sum_{i=1}^{N} f(X_i)\right).$$
Since the samples are independent, the variance of the sum is the sum of the variances. Since $|f(x)| \le 1$, we have $\mathrm{Var}(f(X_i)) \le \mathbb{E}[f(X_i)^2] \le 1$. Therefore, we can bound the variance by:
$$\mathrm{Var}\left(\hat{I}_N\right) = \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{Var}(f(X_i)) \le \frac{1}{N^2} \cdot N = \frac{1}{N}.$$
Because the estimator is unbiased, the expected square error equals the variance:
$$\mathbb{E}\left[\left(\hat{I}_N - I\right)^2\right] = \mathrm{Var}\left(\hat{I}_N\right) \le \frac{1}{N}.$$
Taking the square root of both sides, we get:
$$\sqrt{\mathbb{E}\left[\left(\hat{I}_N - I\right)^2\right]} \le \frac{1}{\sqrt{N}}.$$
This confirms that the error of the Monte Carlo method decreases as $1/\sqrt{N}$ with the number of samples, a standard result for Monte Carlo methods, and completes the proof. $\blacksquare$
Conclusion
Monte Carlo integration offers a powerful and efficient solution to the dimensionality problem in numerical integration. Its capability to handle high-dimensional spaces, along with its straightforward error estimation, makes it an indispensable tool in the field of scientific computation. While it might not always be the optimal choice for low-dimensional, simple integrals, its strengths truly become apparent when dealing with the complexity and challenges of high-dimensional integration.