
1 Introduction

Monte Carlo methods can be loosely defined as numerical methods that rely on random sampling to estimate solutions. They are often applied to problems that are either too complicated to be described by a mathematical model or whose parameter space is too large to be explored systematically.

Recent years have witnessed a dramatic increase of data in many fields of science and engineering [1], due to the advancement of sensors, mobile devices, biotechnology, digital communication, and internet applications. These massive, continuously growing, complex, and diverse data sets are often referred to as “big data.” Analyzing big data demands cost-effective and innovative forms of information processing for enhanced insight and decision making. While many traditional, deterministic data analysis methods have great difficulty scaling to massive big data sets, Monte Carlo methods, which are based on random sampling techniques, have become important and powerful tools for big data applications. In particular, for certain big data sets beyond the computational capability of the most powerful supercomputers, Monte Carlo data analysis methods using random sampling are the only viable approaches.

This article starts with a basic description of the principles of Monte Carlo methods. It then discusses variance reduction techniques, including stratified sampling, control variates, antithetic variates, and importance sampling, to design “smart” Monte Carlo estimators. Examples of Monte Carlo methods are then given for estimating the sum of a large array, solving linear systems, approximating matrix products, recovering missing matrix entries, and computing low-rank matrix approximations. Although these application examples are presented at a relatively small scale to keep the presentation simple, the underlying Monte Carlo techniques can be effectively extended to big data sets.

2 The Basics of Monte Carlo

Monte Carlo methods provide approximate solutions to a variety of mathematical problems by random sampling. Let us take numerical integration as an example, which constitutes a broad family of algorithms in numerical analysis. Suppose we want to calculate a one-dimensional definite integral, \(I =\int _{ a}^{b}f\left (x\right )dx\). A common numerical integration method is to divide the one-dimensional interval into N subintervals and then to sum the areas of the subintervals using the rectangular, trapezoidal, or Simpson’s rule (Fig. 7.1a) [2]. Similarly, for two-dimensional problems, the number of 2D subintervals becomes \(N^{2}\) (Fig. 7.1b). In general, for d-dimensional integration problems, the d-dimensional space needs to be divided into \(N^{d}\) subintervals. For a problem of even moderate dimensionality, with d = 20 and N = 100, the total number of subintervals that need to be evaluated goes up to \(10^{40}\), which is unapproachable by most numerical integration algorithms. Mathematically, this is referred to as the “curse of dimensionality.”

Fig. 7.1

Numerical integration using deterministic methods. (a) 1D integral. (b) 2D integral

In contrast, Monte Carlo methods estimate the integral by statistical sampling techniques [3]. Let us consider a one-dimensional integral \(I_{0-1} =\int _{ 0}^{1}f\left (x\right )dx\), which can be easily extended to the more general integral \(I =\int _{ a}^{b}f\left (x\right )dx\). Suppose that the random variables \(x_{1},x_{2},\cdots \,,x_{N}\) are drawn independently from the probability density function p(x). An estimator F may be defined as

$$\displaystyle{F = \dfrac{1} {N}\sum _{ i=1}^{N}f\left (x_{ i}\right ).}$$

The expectation value of F becomes

$$\displaystyle{E\left (F\right ) =\int _{ 0}^{1}f\left (x\right )p\left (x\right )dx.}$$

The crude Monte Carlo integration method assumes that the probability density function p(x) is uniform on [0, 1], i.e., the random samples \(f(x_{1}),f(x_{2}),\cdots \,,f(x_{N})\) are equally important, and then

$$\displaystyle{E\left (F\right ) =\int _{ 0}^{1}f\left (x\right )dx.}$$

Correspondingly, the variance of F becomes

$$\displaystyle{V ar\left (F\right ) = \dfrac{1} {N}\int _{0}^{1}\left (f(x) - E(F)\right )^{2}dx = \dfrac{1} {N}\sigma ^{2},}$$

where \(\sigma ^{2}\) is the inherent variance of the integrand function f(x). Clearly, the standard deviation of the estimator F is \(\sigma N^{-1/2}\). This means that as N → ∞, the distribution of F narrows around its mean at the rate of \(O\left (N^{-1/2}\right )\).

Now, let us extend the Monte Carlo integration method to a d-dimensional integral \(I_{d} =\int _{ 0}^{1}\cdots \int _{0}^{1}f\left (x\right )dx\). The expectation of \(F_{d} =\sum _{ i=1}^{N}f(x_{i})/N\) over uniformly distributed random vectors \(x_{1},x_{2},\cdots \,,x_{N}\) becomes

$$\displaystyle{E\left (F_{d}\right ) =\int _{ 0}^{1}\cdots \int _{ 0}^{1}f\left (x\right )dx = I_{ d}.}$$

The variance of the estimator \(F_{d}\) is \(\sigma _{d}^{2}/N\), where \(\sigma _{d}^{2}\) is the inherent variance of the integrand function f(x). If the integrand f(x) is given, \(\sigma _{d}^{2}\) is a constant and therefore, as in the one-dimensional case, the convergence rate of Monte Carlo is \(O\left (N^{-1/2}\right )\), which is independent of dimensionality.

In summary, compared to the deterministic numerical integration methods, whose convergence rate is \(O(N^{-\alpha /d})\), where α is an algorithm-related constant and d is the dimension, the Monte Carlo integration method yields a convergence rate of \(O(N^{-1/2})\) [4], which can somewhat avoid the “curse of dimensionality.” Moreover, the computations on the random samples are independent of one another, so they can be carried out in an embarrassingly parallel manner to harness the power of large-scale parallel and distributed computing architectures [5, 6]. On the other hand, the main disadvantage of Monte Carlo is its slow convergence: each additional digit of accuracy usually requires roughly 100 times more samples.
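To make this concrete, the following is a minimal Python sketch of crude Monte Carlo integration over the 20-dimensional unit cube; the integrand is a hypothetical choice for illustration, and the reported standard error shrinks at the \(O(N^{-1/2})\) rate regardless of dimension.

```python
# A minimal sketch of crude Monte Carlo integration (hypothetical integrand).
import numpy as np

def crude_mc(f, d, n, rng):
    """Estimate the integral of f over the d-dimensional unit cube with n samples."""
    x = rng.random((n, d))            # n uniform samples in [0, 1]^d
    y = f(x)
    est = y.mean()                    # unbiased estimate of the integral
    err = y.std(ddof=1) / np.sqrt(n)  # standard error, O(n^{-1/2})
    return est, err

rng = np.random.default_rng(0)
f = lambda x: np.exp(-np.sum(x**2, axis=1))   # example integrand in d = 20 dimensions
for n in [10**3, 10**4, 10**5]:
    est, err = crude_mc(f, d=20, n=n, rng=rng)
    print(f"n = {n:6d}: estimate = {est:.4f} +/- {err:.4f}")
```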

3 Variance Reduction

Crude Monte Carlo treats all random samples as equally important. In reality, we can often gain additional knowledge from the application domain, which can be taken advantage of to construct better estimators. Variance reduction is the procedure of deriving an alternative estimator that has a smaller variance than the crude Monte Carlo estimator, thereby improving the precision of the Monte Carlo estimates for a given number of samples. In practical applications, it is not rare for a good estimator to be millions of times more accurate than a bad one. In this section, we describe some of the popular variance reduction techniques [3, 4], including stratified sampling, control variates, antithetic variates, and importance sampling. These variance reduction methods, if appropriately used, can significantly improve the efficiency of Monte Carlo methods in processing and analyzing big data sets.

3.1 Stratified Sampling

The inherent variance of the integrand function f(x) may vary significantly across different regions. The fundamental idea of stratified sampling is to place more samples in regions with large variability and fewer in regions with small variability. Using the one-dimensional integral on the interval \(\left [0,1\right ]\) as an example, stratified sampling partitions \(\left [0,1\right ]\) into several disjoint subintervals, i.e., \(\left [0,\alpha _{1}\right ),\left [\alpha _{1},\alpha _{2}\right ),\cdots \,,\left [\alpha _{k-1},\alpha _{k}\right ),\left [\alpha _{k},1\right ]\). Then a Monte Carlo estimator \(F_{\alpha _{i-1},\alpha _{i}}\) is applied to each subinterval \(\left [\alpha _{i-1},\alpha _{i}\right )\) separately so that

$$\displaystyle{F_{\alpha _{i-1},\alpha _{i}} =\sum _{ j=1}^{n_{i} }\left (\alpha _{i} -\alpha _{i-1}\right ) \dfrac{1} {n_{i}}f(x_{j}),}$$

where \(x_{j}\) is a random sample in \(\left [\alpha _{i-1},\alpha _{i}\right )\) and \(n_{i}\) is the number of samples in that subinterval. Clearly,

$$\displaystyle{E\left (F_{\alpha _{i-1},\alpha _{i}}\right ) =\int _{ \alpha _{i-1}}^{\alpha _{i} }f(x)dx.}$$

The overall stratified sampling estimator \(F_{stratified}\) is the combination of the estimators over all subintervals (with the convention \(\alpha _{0} = 0\) and \(\alpha _{k+1} = 1\)),

$$\displaystyle{F_{stratified} =\sum _{ i=1}^{k+1}F_{\alpha _{ i-1},\alpha _{i}},}$$

which is an unbiased estimator of the integral. The variance of the stratified sampling estimator becomes

$$\displaystyle{V ar\left (F_{stratified}\right ) =\sum _{ i=1}^{k+1}\left (\alpha _{ i} -\alpha _{i-1}\right ) \dfrac{1} {n_{i}}\sigma _{\alpha _{i-1},\alpha _{i}}^{2},}$$

where \(\sigma _{\alpha _{ i-1},\alpha _{i}}^{2}\) is the variance of the integrand function f(x) in subinterval \(\left [\alpha _{i-1},\alpha _{i}\right )\). This reveals the rationale of stratified sampling, which is to minimize \(\dfrac{1} {n_{i}}\sigma _{\alpha _{ i-1},\alpha _{i}}^{2}\) in each subinterval. If stratification is carried out well, the variance of stratified sampling will be less than that of crude Monte Carlo.
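As a minimal sketch of this idea, the following Python example uses a hypothetical integrand f(x) = x\(^{2}\) and four equal-width strata (both chosen only for illustration); each stratum contributes a per-stratum estimator of the form above, and the contributions are combined into the stratified estimate and its variance.

```python
# A minimal sketch of stratified sampling on [0, 1] (hypothetical integrand and strata).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**2                                # hypothetical integrand; exact integral is 1/3
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])     # subinterval boundaries 0, alpha_1, ..., 1
n_per = np.array([100, 100, 100, 100])            # number of samples n_i in each subinterval

est, var = 0.0, 0.0
for a, b, n in zip(edges[:-1], edges[1:], n_per):
    x = rng.uniform(a, b, n)                      # uniform samples in [alpha_{i-1}, alpha_i)
    est += (b - a) * f(x).mean()                  # per-stratum estimator F_{alpha_{i-1}, alpha_i}
    var += (b - a) ** 2 * f(x).var(ddof=1) / n    # its contribution to Var(F_stratified)

print(f"stratified estimate = {est:.4f} +/- {np.sqrt(var):.4f} (exact: {1/3:.4f})")
```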

3.2 Control Variates

If there is another estimator that is positively correlated with the one to be evaluated and that can be easily calculated with a known expectation, it can be used as a control variate. For example, in the Monte Carlo integration example, the original integral \(I_{0-1} =\int _{ 0}^{1}f(x)dx\) can be rewritten as the sum of two parts

$$\displaystyle{I_{0-1} =\int _{ 0}^{1}g(x)dx +\int _{ 0}^{1}\left (f(x) - g(x)\right )dx,}$$

where g(x) is a simple function that can be integrated analytically or calculated cheaply and that absorbs most of the variation of f(x) by mimicking it, as shown in Fig. 7.2. In this case, we only need to estimate the difference f(x) − g(x), which results in a reduced variance of the estimator.

Fig. 7.2

Control variates approach. If g(x) can be estimated theoretically, the sampling errors are limited to the area of f(x) − g(x)
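A minimal sketch of the control variates idea, assuming the hypothetical choices f(x) = exp(x) and g(x) = 1 + x (whose integral over [0, 1] is known to be 3/2), is given below; only the residual f(x) − g(x) is estimated by sampling.

```python
# A minimal sketch of control variates (hypothetical choices of f and g).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(x)          # integrand on [0, 1]; exact integral is e - 1
g = lambda x: 1.0 + x            # control variate mimicking f; its integral on [0, 1] is 3/2
n = 10_000
x = rng.random(n)
fx, gx = f(x), g(x)

crude = fx.mean()                        # crude Monte Carlo estimate of the integral
control = 1.5 + (fx - gx).mean()         # known integral of g plus sampled estimate of f - g

print(f"crude   : {crude:.5f} +/- {fx.std(ddof=1) / np.sqrt(n):.5f}")
print(f"control : {control:.5f} +/- {(fx - gx).std(ddof=1) / np.sqrt(n):.5f}")
print(f"exact   : {np.e - 1:.5f}")
```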

3.3 Antithetic Variates

In contrast to the control variates method, which relies on an estimator strongly positively correlated with F, the antithetic variates method takes advantage of an estimator F′ that is strongly negatively correlated with F. Suppose that F′ is also an unbiased estimator of the same quantity, i.e., E(F) = E(F′), and Cov(F, F′) < 0. Then, the antithetic estimator \(F_{antithetic} = (F + F^{{\prime}})/2\) is also an unbiased estimator of this quantity. The sampling variance of \(F_{antithetic}\) becomes

$$\displaystyle{V ar\left (F_{antithetic}\right ) = \dfrac{V ar\left (F\right )} {4} + \dfrac{V ar\left (F^{{\prime}}\right )} {4} + \dfrac{Cov\left (F,F^{{\prime}}\right )} {2}.}$$

Here Cov(F, F′) < 0 due to the negative correlation between F and F′, which leads to an overall smaller variance in \(F_{antithetic}\) than that of F and F′.
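The sketch below illustrates the antithetic idea for the hypothetical monotone integrand f(x) = exp(x) on [0, 1]: since f is monotone, f(x) and f(1 − x) are negatively correlated, and averaging the two yields a lower-variance unbiased estimator (each antithetic sample uses two function evaluations).

```python
# A minimal sketch of antithetic variates (hypothetical monotone integrand).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(x)               # monotone integrand on [0, 1]; exact integral is e - 1
n = 10_000
x = rng.random(n)

plain = f(x)                          # terms of the plain estimator F
anti = 0.5 * (f(x) + f(1.0 - x))      # terms of the antithetic estimator (F + F') / 2

print(f"plain      : {plain.mean():.5f} +/- {plain.std(ddof=1) / np.sqrt(n):.5f}")
print(f"antithetic : {anti.mean():.5f} +/- {anti.std(ddof=1) / np.sqrt(n):.5f}")
print(f"exact      : {np.e - 1:.5f}")
```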

3.4 Importance Sampling

The importance sampling technique is often used in statistical resampling; it reduces variance by concentrating samples in the regions of interest. For example, by introducing a new proposal function g(x), the original integral \(I_{0-1} =\int _{ 0}^{1}f(x)dx\) can be rewritten as

$$\displaystyle{I_{0-1} =\int _{ 0}^{1}\dfrac{f(x)} {g(x)}g(x)dx =\int _{ 0}^{1}\dfrac{f(x)} {g(x)}dG(x),}$$

where G(x) is the cumulative distribution function (CDF) corresponding to g(x), and f(x)∕g(x) is called the likelihood ratio. With random samples drawn from the proposal distribution whose CDF is G(x) instead of from the uniform distribution, the variance of the importance sampling estimator \(F_{importance\ sampling}\) becomes

$$\displaystyle{V ar\left (F_{importance\ sampling}\right ) =\int _{ 0}^{1}\left (\dfrac{f(x)} {g(x)} - E(F)\right )^{2}dG(x).}$$

A good likelihood ratio can result in a significant variance reduction.

In practice, if we know nothing about the target distribution at the very beginning, we may have to start from uniform sampling. However, after the initial sampling, we gain a better understanding of the target distribution, which leads to a better proposal function. Resampling can then be guided by the new proposal function and leads to a better approximation of the target distribution.
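As a minimal sketch, the example below applies importance sampling to the hypothetical integrand f(x) = x\(^{3}\) on [0, 1] with the proposal density g(x) = 2x (CDF G(x) = x\(^{2}\), sampled by inversion); weighting the samples by the likelihood ratio f(x)∕g(x) keeps the estimator unbiased while reducing its variance relative to uniform sampling.

```python
# A minimal sketch of importance sampling (hypothetical f and proposal g).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**3                   # hypothetical integrand on [0, 1]; exact integral is 1/4
g = lambda x: 2.0 * x                # proposal density on [0, 1], roughly proportional to f
n = 10_000

u = rng.random(n)
x_unif = u                           # samples from the uniform distribution
x_prop = np.sqrt(u)                  # samples from g by inversion: G(x) = x^2, so x = sqrt(u)

crude = f(x_unif)                    # crude Monte Carlo terms
weighted = f(x_prop) / g(x_prop)     # likelihood-ratio terms f(x) / g(x)

print(f"uniform    : {crude.mean():.5f} +/- {crude.std(ddof=1) / np.sqrt(n):.5f}")
print(f"importance : {weighted.mean():.5f} +/- {weighted.std(ddof=1) / np.sqrt(n):.5f}")
```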

4 Examples of Monte Carlo Methods in Data Science

In this section, we present several application examples as case studies of using Monte Carlo methods for data analysis. Variance reduction techniques are applied to build “smart” Monte Carlo estimators to enhance sampling efficiency.

4.1 Case Study 1: Estimation of Sum

In many practical data analysis applications, we are required to estimate the sum of a large data set. However, for various reasons, we may not be allowed to visit every element in the data set, yet we still want a good estimate. The following is an application example of estimating the overall salary expense in a company.

Consider a big, global company with 7140 employees falling into the following categories: 78 managers, 4020 engineers, 2008 salesmen, and 1034 technicians. Now we want to estimate the overall expense in employee salaries in this company. Due to cost and technical difficulty, we are only allowed to use 100 samples.

The simplest sampling method is crude Monte Carlo using uniform sampling without considering the different categories. The main problem of uniform sampling is that it ignores the differences among categories, which leads to an estimate with large variance: 460,871K ± 80,712K. More importantly, managers make up only around 1% of all employees, so there is a high chance that the 100 selected samples miss the manager category entirely. Stratified sampling can better address this problem. We can calculate the number of samples falling into each category, specifically, \(78/7140 \times 100 \approx 1\) sample for managers, \(4020/7140 \times 100 \approx 56\) samples for engineers, \(2008/7140 \times 100 \approx 28\) samples for salesmen, and \(1034/7140 \times 100 \approx 15\) samples for technicians. As a result, stratified sampling yields a better estimate (500,112K ± 30,147K) than uniform sampling.

If we have additional information, we can construct an even better stratified sampling estimator. If we happen to know the salary variation in each employee category (for example, from previous years’ data), i.e., managers (200K ∼ 500K), salesmen (40K ∼ 120K), engineers (60K ∼ 80K), and technicians (50K ∼ 70K), we can take advantage of this information. The manager and salesman categories have large variances and therefore deserve more samples, while the engineer and technician categories have small variances and need only a few samples. By reassigning the number of samples in the different categories, 32 samples for managers, 7 samples for engineers, 59 samples for salesmen, and 2 samples for technicians, the overall estimate becomes 520,066K ± 10,113K, which is more precise than that of simple stratified sampling.
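The sketch below mimics this example with synthetic salary data (the individual salaries are made up; only the category sizes follow the text) and allocates the 100-sample budget in proportion to each category’s size times its estimated spread, a standard variance-minimizing (Neyman-style) allocation rather than the exact allocation quoted above.

```python
# A minimal sketch of estimating a total salary by stratified sampling (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical ground-truth salaries (in K), drawn only so there is something to sample from.
categories = {
    "managers":    rng.uniform(200, 500, 78),
    "engineers":   rng.uniform(60, 80, 4020),
    "salesmen":    rng.uniform(40, 120, 2008),
    "technicians": rng.uniform(50, 70, 1034),
}
budget = 100                          # total number of samples allowed

# Allocate samples in proportion to (category size) x (estimated spread), Neyman-style.
# In practice the spread would come from prior knowledge (e.g., previous years' data).
spread = {name: sal.std() for name, sal in categories.items()}
weight = {name: len(sal) * spread[name] for name, sal in categories.items()}
total_w = sum(weight.values())
alloc = {name: max(1, round(budget * w / total_w)) for name, w in weight.items()}

estimate = 0.0
for name, salaries in categories.items():
    sample = rng.choice(salaries, size=alloc[name], replace=False)
    estimate += len(salaries) * sample.mean()     # scale the stratum mean by the stratum size

true_total = sum(sal.sum() for sal in categories.values())
print("sample allocation:", alloc)
print(f"estimated total = {estimate:,.0f}K, true total = {true_total:,.0f}K")
```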

4.2 Case Study 2: Monte Carlo Linear Solver

Applying Monte Carlo sampling to estimate the solutions of linear systems was originally proposed by Ulam and von Neumann and later described by Forsythe and Leibler [7]. Consider a linear system

$$\displaystyle{x = Hx + b,}$$

where H is an n × n non-singular matrix, b is the given constant vector, and x is the vector of unknowns. The fundamental idea of the Monte Carlo solver is to construct Markov chains by generating random walks to statistically sample the underlying Neumann series

$$\displaystyle{I + H + H^{2} + H^{3} + \cdots }$$

of the linear system [8]. The transition probabilities of the random walks are defined by a transition matrix P satisfying the following transition conditions:

$$\displaystyle\begin{array}{rcl} P_{ij} \geq 0;& & {}\\ \sum _{j}P_{ij} \leq 1;& & {}\\ H_{ij}\neq 0 \rightarrow P_{ij}\neq 0,& & {}\\ \end{array}$$

and the termination probability T i at row i is defined as

$$\displaystyle{T_{i} = 1 -\sum _{j}P_{ij}.}$$

Then, a random walk starting at \(r_{0}\) and terminating after k steps is defined as

$$\displaystyle{\gamma _{k}: r_{0} \rightarrow r_{1} \rightarrow r_{2} \rightarrow \cdots \rightarrow r_{k}}$$

where the integers \(r_{0},r_{1},r_{2},\cdots \,,r_{k}\) are the row indices of matrix H visited during the random walk.

As noted in the existing literature [9], if the necessary and sufficient condition for the convergence of the Monte Carlo solver holds, the random variable \(X\left (\gamma _{k}\right )\) defined as

$$\displaystyle{X\left (\gamma _{k}\right ) = \frac{H_{r_{0}r_{1}}H_{r_{1}r_{2}}\cdots H_{r_{k-1}r_{k}}} {P_{r_{0}r_{1}}P_{r_{1}r_{2}}\cdots P_{r_{k-1}r_{k}}} b_{r_{k}}/T_{r_{k}}}$$

is an unbiased estimator of component \(x_{r_{0}}\) in the unknown vector x.

Variance reduction techniques have been commonly employed in Monte Carlo linear solvers to improve the sampling efficiency. Let us consider a simple linear system with \(H = \left [\begin{array}{ccc} 0.1 &0.45& 0.225\\ - 0.15 & 0.1 & - 0.3 \\ - 0.18&0.36& 0.1 \end{array} \right ]\) and \(b = \left [\begin{array}{c} 0.225\\ 1.35 \\ 0.72 \end{array} \right ]\). Clearly, the exact solution is \(x = \left [\begin{array}{c} 1\\ 1 \\ 1 \end{array} \right ].\)

Figure 7.3 compares the convergence of Monte Carlo linear solvers using the three sampling schemes listed in Table 7.1 with different transition matrices, in terms of the estimated value of \(x_{1}\). One can find that even though all of these sampling schemes have the same convergence rate of \(O(N^{-1/2})\), the estimator based on the Monte Carlo Almost Optimal (MAO) scheme [10], which samples the matrix elements according to their importance, yields significantly smaller variance than the other two.

Fig. 7.3

Comparison of Monte Carlo linear solver with different transition matrices

Table 7.1 Uniform, original, and MAO sampling scheme with different transition matrices
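A minimal sketch of the random-walk estimator for this 3 × 3 system is given below; the transition matrix is one simple admissible choice, \(P_{ij} = \vert H_{ij}\vert\) with termination probability \(T_{i} = 1 -\sum _{j}P_{ij}\), and is not necessarily identical to any of the schemes in Table 7.1.

```python
# A minimal sketch of the random-walk estimator for x = Hx + b.
# P_ij = |H_ij| is one simple admissible choice here; not necessarily a scheme from Table 7.1.
import numpy as np

H = np.array([[ 0.10, 0.45,  0.225],
              [-0.15, 0.10, -0.30 ],
              [-0.18, 0.36,  0.10 ]])
b = np.array([0.225, 1.35, 0.72])

P = np.abs(H)                        # transition probabilities P_ij
T = 1.0 - P.sum(axis=1)              # termination probabilities T_i = 1 - sum_j P_ij (all > 0 here)

def one_walk(r0, rng):
    """One random walk starting at row r0; returns the estimate X(gamma_k) of x_{r0}."""
    w, r = 1.0, r0
    while True:
        if rng.random() < T[r]:                      # terminate at the current state
            return w * b[r] / T[r]
        j = rng.choice(len(b), p=P[r] / P[r].sum())  # next state j, drawn with probability P_rj
        w *= H[r, j] / P[r, j]                       # accumulate the weight H_{rj} / P_{rj}
        r = j

rng = np.random.default_rng(0)
walks = [one_walk(0, rng) for _ in range(50_000)]
print(f"x_0 ~ {np.mean(walks):.4f} +/- {np.std(walks, ddof=1) / np.sqrt(len(walks)):.4f} (exact: 1.0)")
```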

4.3 Case Study 3: Image Recovery

In this case study, we investigate the image processing technique of using a small number of pixel samples to recover an incomplete or fuzzy image. The strategy of Monte Carlo sampling plays a critical role in the quality of the recovered image.

We use an aerial image chosen from the USC-SIPI Image Database [11] as an example. The image recovery method is based on a matrix completion algorithm that optimizes the constrained nuclear norm of the matrix [12]. Two sets of pixel samples (12.19% of the pixels in each) are used: one generated by uniform sampling and the other by importance sampling. The importance sampling process consists of two stages:

  1. initial uniform sampling is performed with 3.05% pixel samples to learn the rough pixel distribution of the image and produce a proposal function; and

  2. importance sampling is performed based on the proposal function created in step 1 to generate the remaining 9.14% of the samples.

As shown in Fig. 7.4, the pixel samples generated by the importance sampling scheme lead to a recovered image of significantly higher quality than the one obtained from uniform samples.

Fig. 7.4

Comparison of image recovery from pixel samples generated by uniform sampling and importance sampling
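The two-stage sampling strategy itself can be sketched as follows; the code below only illustrates how a uniform pilot sample can drive a proposal for the remaining samples (here, a hypothetical proposal that weights image blocks by the intensity variation observed in the pilot sample) and does not perform the nuclear-norm matrix completion of [12].

```python
# A minimal sketch of two-stage pixel sampling (hypothetical block-variation proposal).
import numpy as np

rng = np.random.default_rng(0)
# A synthetic stand-in image: a smooth background with one highly textured region.
img = 0.3 * np.tile(np.linspace(0.0, 1.0, 256), (256, 1))
img[64:128, 64:192] += 0.5 * rng.random((64, 128))

n_pixels = img.size
n_pilot = int(0.0305 * n_pixels)               # stage 1: ~3.05% uniform pilot samples
n_more = int(0.0914 * n_pixels)                # stage 2: ~9.14% importance samples

# Stage 1: uniform pilot sample of pixel locations.
pilot_idx = rng.choice(n_pixels, size=n_pilot, replace=False)
rows, cols = np.unravel_index(pilot_idx, img.shape)

# Build a coarse proposal: per-block intensity variation estimated from the pilot samples only.
block = 16
weights = np.full((img.shape[0] // block, img.shape[1] // block), 1e-6)
for br in range(weights.shape[0]):
    for bc in range(weights.shape[1]):
        mask = (rows // block == br) & (cols // block == bc)
        if mask.sum() > 1:
            weights[br, bc] += img[rows[mask], cols[mask]].std()

# Stage 2: draw the remaining pixel samples with probability proportional to block variation.
block_probs = (weights / weights.sum()).ravel()
chosen = rng.choice(block_probs.size, size=n_more, p=block_probs)
brs, bcs = np.unravel_index(chosen, weights.shape)
sample_rows = brs * block + rng.integers(0, block, n_more)
sample_cols = bcs * block + rng.integers(0, block, n_more)
print(f"pilot samples: {n_pilot}, importance-sampled pixels: {n_more}")
```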

4.4 Case Study 4: Matrix Multiplication

In this case study, we investigate Monte Carlo methods for approximating the product of large matrices. Let A be an m × n matrix and B be an n × p matrix, where m, n, and p are large. The Monte Carlo sampling algorithm to quickly approximate the product matrix C = AB using s samples is described as follows [13]:

  1. Generate s random integers \(i_{k}\) between 1 and n with probability \(p_{i_{k}}\), for k = 1, ⋯ , s;

  2. Set \(M^{(k)} = A^{(i_{k})}/\sqrt{sp_{i_{k}}}\) and \(N_{(k)} = B_{(i_{k})}/\sqrt{sp_{i_{k}}}\), for k = 1, ⋯ , s;

  3. Compute the matrix product MN.

The fundamental idea of the Monte Carlo sampling algorithm is to construct a discrete random variable X with probability \(p\left (X = \dfrac{A^{(k)}B_{(k)}} {p_{k}} \right ) = p_{k}\) for k = 1, ⋯ , n, where \(A^{(k)}\) and \(B_{(k)}\) represent the kth column of A and the kth row of B, respectively. The expectation of X is

$$\displaystyle{E(X) =\sum _{ k=1}^{n}\dfrac{A^{(k)}B_{ (k)}} {p_{k}} p_{k} =\sum _{ k=1}^{n}A^{(k)}B_{ (k)},}$$

which is identical to matrix C. Therefore, matrix C can be quickly approximated by the product MN, where \(MN = \dfrac{1} {s}\sum _{k=1}^{s}\dfrac{A^{(i_{k})}B_{ (i_{k})}} {p_{i_{k}}}\) is an unbiased estimator. The variance of each element of MN is [13]

$$\displaystyle{V ar\left [\left (MN\right )_{ij}\right ] =\sum _{ k=1}^{n}\dfrac{A_{ik}^{2}B_{ kj}^{2}} {sp_{k}} - \dfrac{1} {s}\left (AB\right )_{ij}^{2}.}$$

To improve the accuracy of approximating matrix–matrix multiplication, importance sampling can be effectively applied by using the optimal probabilities

$$\displaystyle{p_{k} = \dfrac{\left \vert A^{(k)}\right \vert \left \vert B_{(k)}\right \vert } {\sum _{j=1}^{n}\left \vert A^{(j)}\right \vert \left \vert B_{(j)}\right \vert },}$$

with which the expected approximation error is theoretically minimized [13, 14].
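A minimal sketch of this sampling algorithm is given below, comparing uniform probabilities with the optimal probabilities on randomly generated test matrices whose column/row magnitudes vary widely; the matrix sizes and the number of samples s are arbitrary illustrative choices.

```python
# A minimal sketch of randomized matrix multiplication by column/row sampling.
import numpy as np

rng = np.random.default_rng(0)
m, n, p, s = 200, 500, 150, 50
scales = rng.exponential(1.0, n) ** 2              # heterogeneous magnitudes, so sampling matters
A = rng.standard_normal((m, n)) * scales           # scale the columns of A
B = rng.standard_normal((n, p)) * scales[:, None]  # scale the matching rows of B
C = A @ B

def approx_product(A, B, s, probs, rng):
    """Sample s column/row index pairs with the given probabilities and rescale."""
    idx = rng.choice(A.shape[1], size=s, p=probs)
    scale = np.sqrt(s * probs[idx])
    M = A[:, idx] / scale                          # M^(k) = A^(i_k) / sqrt(s p_{i_k})
    N = B[idx, :] / scale[:, None]                 # N_(k) = B_(i_k) / sqrt(s p_{i_k})
    return M @ N

p_unif = np.full(n, 1.0 / n)
norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p_opt = norms / norms.sum()                        # optimal probabilities |A^(k)||B_(k)| / sum_j |A^(j)||B_(j)|

for name, probs in [("uniform", p_unif), ("importance", p_opt)]:
    err = np.linalg.norm(approx_product(A, B, s, probs, rng) - C) / np.linalg.norm(C)
    print(f"{name:10s}: relative Frobenius error = {err:.3f}")
```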

Let us consider a simple toy example where

$$\displaystyle{A = \left [\begin{array}{cccccc} 1.6254 & 8.5799 & 8.5596 & 5.4832 & 9.1047 &1.3797\\ 9.8717 & 4.5759 & 7.5434 & 1.1230 & 8.4098 &8.3874 \\ 4.8246 & 3.9563 & 5.1322 & 1.0143 & 8.6938 &7.7045\\ 0.9104 &545.3701 & 4.6611 &769.1278 & 9.1947 &4.4734 \\ 14.9880& 6.4549 &197.7756& 5.2276 &88.9237&7.4417 \end{array} \right ]}$$

and

$$\displaystyle{B = \left [\begin{array}{cccc} 5.5899& 2.6152 &7.8102& 8.2298\\ 5.9020 &666.2519 &2.6396 & 7.7184 \\ 7.3150&143.9785&9.0632& 4.9668\\ 2.1493 &695.7888 &3.2657 &974.7106 \\ 1.3544&758.2126&2.7377& 4.6348\\ 9.1003 & 8.3607 &6.7709 & 27.5470\\ \end{array} \right ]}$$

Figure 7.5 compares the relative approximation errors of the resulting product matrix using uniform sampling and importance sampling with respect to the sample size over 1000 runs. One can clearly find that, when the optimal selection probabilities are used, importance sampling outperforms uniform sampling and yields a better approximation of the matrix product.

Fig. 7.5

Comparison of relative approximation error in matrix–matrix multiplication using uniform sampling and importance sampling with respect to sample size in 1000 Monte Carlo runs

4.5 Case Study 5: Low-Rank Approximation

Given a matrix A, it is often desirable in many data analysis applications to find a good low-rank approximation to A. Denote by \(u_{1},\cdots \,,u_{n}\) and \(v_{1},\cdots \,,v_{n}\) the left and right singular vectors, respectively, and by \(\sigma _{1},\sigma _{2},\cdots \,,\sigma _{n}\) the singular values in non-increasing order. The matrix A can then be expressed as

$$\displaystyle{A =\sum _{ i=1}^{n}\sigma _{ i}u_{i}v_{i}^{T}}$$

The best rank-k approximation [15] to a matrix A is formed as \(A_{k} =\sum _{ i=1}^{k}\sigma _{i}u_{i}v_{i}^{T}\), with approximation error

$$\displaystyle{\min _{rank(B)=k}\|A - B\|_{2} =\| A - A_{k}\|_{2} =\sigma _{k+1}.}$$

To handle the computational challenges posed by very big matrices, the randomized Singular Value Decomposition (SVD) algorithm with Gaussian sampling [16–18] is widely used to approximate the top-k singular values and singular vectors. The rationale is to construct a small, condensed subspace by sampling A, from which the dominant actions of A can be quickly estimated with relatively low computational cost and high confidence. The procedure of randomized SVD is described as follows (a minimal code sketch follows the list):

  1. Sample A with a standard Gaussian matrix Ω so that Y = AΩ;

  2. Construct a basis Q for the range of Y;

  3. Compute the matrix multiplication \(B = A^{T}Q\);

  4. Perform a deterministic SVD decomposition \(B = U_{B}\varSigma _{B}V _{B}^{T}\);

  5. Assign \(u_{j} = Qv_{B_{j}}\), \(\sigma _{j} =\sigma _{B_{j}}\), and \(v_{j} = u_{B_{j}}\), for j = 1, ⋯ , k.
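These five steps translate almost directly into code. The sketch below is a basic version with a small amount of oversampling and no power iterations, applied to a randomly generated test matrix with decaying singular values; the sizes and target rank k are illustrative assumptions.

```python
# A minimal sketch of randomized SVD with Gaussian sampling, following steps 1-5 above.
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    """Approximate the top-k singular triplets of A."""
    rng = np.random.default_rng() if rng is None else rng
    Omega = rng.standard_normal((A.shape[1], k + oversample))  # (1) Gaussian test matrix
    Y = A @ Omega                                              #     sample the range of A
    Q, _ = np.linalg.qr(Y)                                     # (2) orthonormal basis for range(Y)
    B = A.T @ Q                                                # (3) project A onto the subspace
    Ub, s, Vbt = np.linalg.svd(B, full_matrices=False)         # (4) small deterministic SVD
    U = Q @ Vbt.T                                              # (5) u_j = Q v_{B_j}
    V = Ub                                                     #     v_j = u_{B_j}
    return U[:, :k], s[:k], V[:, :k]

rng = np.random.default_rng(0)
m, n, r = 500, 400, 50
A = (rng.standard_normal((m, r)) * 0.8 ** np.arange(r)) @ rng.standard_normal((r, n))
U, s, V = randomized_svd(A, k=20, rng=rng)
A_k = U @ np.diag(s) @ V.T
print("relative approximation error:", np.linalg.norm(A - A_k) / np.linalg.norm(A))
```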

We consider an example of applying the randomized SVD algorithm to obtain a low-rank approximation of an image while minimizing the approximation error. The control variates approach is applied. First, the whole range space of matrix A is sampled to approximate the largest r singular components, from which an approximation \(\widetilde{A_{r}} =\sum _{ i=1}^{r}\sigma _{i}u_{i}v_{i}^{T}\) is derived. If the approximation error is too high, \(\widetilde{A_{r}}\) is used as the control variate to sample the next dominating singular components. This process is repeated until a satisfactory approximation accuracy is achieved.

Figure 7.6 shows the results of applying the randomized SVD algorithm with the control variates strategy to compute a low-rank approximation of the aerial image with less than 1% error. Figure 7.6a presents the original image. Figures 7.6b–d illustrate the adaptively reconstructed images with increasing numbers of singular components. Finally, with 100 singular components, a low-rank approximation of the original image with 0.9% error is obtained.

Fig. 7.6

Low-rank approximation using randomized SVD with Gaussian sampling and control variates strategy. (a) The original image. (b) Rank 10 with approximation error 14.3%. (c) Rank 40 with approximation error 2.1%. (d) Rank 100 with approximation error 0.9%

5 Summary

Monte Carlo methods are powerful tools for analyzing large data sets, particularly in the big data era when data growth outpaces computer processing capability. In this article, we have reviewed the basics of Monte Carlo methods and the popular techniques for variance reduction, and presented several application examples of using Monte Carlo for data analysis. The rule of thumb in Monte Carlo is: the more prior knowledge is incorporated into the estimator, the more the uncertainty can be reduced, and the better the accuracy of the resulting data analysis.