1 Introduction

Numerical integration over triangular areas is a subject that has been of interest for many decades, for example when applying finite element methods; see Cowper (1973) where Gaussian quadrature formulas for triangles are derived. For sampling on a two-dimensional triangle with applications in computer graphics, see Pharr (2019), Basu and Owen (2015). Rather than mapping points from the unit cube to the triangle as done in, e.g., Heitz (2019), Pillards and Cools (2005), Fang and Wang (1993), Arvo (1995), instead Basu and Owen (2015) proposed two low-discrepancy constructions that construct points directly on the triangle. The first is based on a finite lattice, which attains a parallelogram discrepancy of \(O(\log (n)/n)\), the best possible rate. Brandolini et al. (2013) and work prior to this have only indicated that such a discrepancy was possible, without providing a construction. The lattice construction in Basu and Owen (2015) is not extensible in the sample sizes n, i.e., if \(n_1\) points are sampled, but \(n_2>n_1\) points are needed, the entire lattice with \(n_2\) points needs to be constructed rather than adding \(n_2-n_1\) points to the existing lattice. The second is a triangular van der Corput (vdC) sequence based on the one-dimensional vdC sequence in base 4 that places points in a two-dimensional triangle by recursively subdividing the triangle.

In this paper, we extend the work of Basu and Owen (2015) and explore the use of randomized quasi-Monte Carlo methods on triangles and their success in practice. Required background is reviewed in Section 2. Here, we first introduce the triangular lattice and vdC sequence of Basu and Owen (2015). In Section 2.5, we then show that the vdC sequence has poor projection properties: The one-dimensional projections of the non-randomized triangular vdC sequence contain only \(2\sqrt{n}<n\) points, which can lead to poorer integration results for functions with low effective dimension. This is different from rank-1 lattices, which can be made fully projection regular.

We then provide an extensible lattice construction for points in the triangle that can be randomized with a digital shift in Section 3, making this method more applicable in practice. In Section 4, we address the issue of the poor projection properties of the triangular vdC sequence by proposing a stratified sampling scheme using the sub-triangles constructed in the vdC sequence as strata. We also show that, in general over the unit interval and not only on triangles, the stratified sampling scheme has the same distribution as the nested scrambling of the vdC sequence, while being more efficient to implement. This connection between stratified sampling and nested scrambling allows us to have the benefits of nested scrambling without the expensive computational costs. This connection also allows us to show that nested scrambling provides a way to implement an extensible stratified estimator.

Finally, although various constructions and mappings for sampling on the triangle exist, there is little, if any, work done comparing these different methods on numerical integration examples. This motivates the inclusion of Section 5, a numerical study on a suite of test functions on the triangle that allow us to differentiate between the performance of these constructions. Section 6 concludes this paper.

2 Background

In this section, we review quasi-Monte Carlo methods, some definitions and properties of triangles, and show how one can uniformly sample within a triangle.

2.1 Monte Carlo and Quasi-Monte Carlo Methods

The Monte Carlo (MC) method for numerical integration uses repeated random sampling to obtain numerical results for integrals that either do not have a closed form or are difficult to integrate theoretically. Quantities of interest are integrals over the s-dimensional hypercube \([0,1)^s\), which can be written as

$$\begin{aligned} \mu = I(f) = \int _{[0,1)^s}f(\varvec{u})\,\mathrm{d}\varvec{u}, \end{aligned}$$

where \(f:[0,1)^s\rightarrow \mathbb {R}\) is integrable. The MC estimator of I(f) based on a sample of size n is

$$\begin{aligned} \hat{\mu}_{mc} = \frac{1}{n}\sum _{i=1}^{n}f(\varvec{u}_i), \end{aligned}$$

where \(\{\varvec{u}_1,\dots ,\varvec{u}_n\}\) are independent and identically distributed (iid) samples from the uniform distribution over \([0,1)^s\). From the Central Limit Theorem, the integration error is asymptotically normal with

$$\begin{aligned} \sqrt{n}(I(f) - \hat{\mu }_{mc}) \xrightarrow{D} {\mathrm{N}}(0, \sigma ^2), \end{aligned}$$

where \(\xrightarrow{D}\) denotes convergence in distribution and \(\sigma ^2 = I(f^2) - I(f)^2\) is the variance of f, which can be estimated by the sample variance of \(f(\varvec{u}_1),\dots ,f(\varvec{u}_n)\). The MC integration error is \(O_p(\frac{1}{\sqrt{n}})\), meaning that \(k^2\)-many samples are needed to decrease the error by a factor of k. This is one of the drawbacks of the MC method, improving the accuracy of an estimate can be computationally expensive.

A way to improve the error of the estimator of our quantity of interest is to use quasi-Monte Carlo (QMC) methods instead of the MC method. These methods replace the randomly sampled point set of an MC estimator with a deterministic point set that is chosen to fill the unit hypercube as homogeneously as possible. Such sets are known as low-discrepancy sequences or low-discrepancy point sets, as the sequences cover the area over the unit hypercube more evenly than those of a pseudo-randomly generated sequence of \(\mathrm{Uniform}\left( [0,1)^s\right) \) random variables and thereby minimize the discrepancy of the point set. There are two main kinds of low-discrepancy point sets: integration lattices (Sloan and Joe 1994) such as the Kronecker lattice of Korobov (1959), and digital nets and sequences (Dick and Pillichshammer 2010) such as the vdC sequence of van der Corput (1935). Using QMC methods allows the error of the estimator to decrease at a faster rate than \(O(\frac{1}{\sqrt{n}})\), which makes it superior for the purpose of estimation; see Owen (1997).

Quasi-random point sets can be randomized in practice to be able to estimate the variance of the randomized Quasi-Monte Carlo (RQMC) estimator. This randomization produces a new point set \(\tilde{P}_n = \{\varvec{\tilde{u}_1},\dots ,\varvec{\tilde{u}_n}\}\) which satisfies that each point \(\varvec{\tilde{u}}_i \sim \mathrm{Uniform}([0,1)^s)\) for all i while preserving the low-discrepancy of \(P_n\). Note that after applying the randomization function to quasi-random numbers, the generated points \(\varvec{\tilde{u}}_i\) are not independent across different i. Thus, any parts of the Monte Carlo methodology that depends on the assumption of independence will need to be adjusted when using the RQMC method.

In particular, with the MC estimator, one uses the fact that \(\{\varvec{u}_1,\dots ,\varvec{u}_n\}\) are iid samples to estimate the standard error of our estimator. This cannot be done in the same way when using RQMC to estimate I(f) because in that case the sample points in \(\tilde{P}_n\) are not independent. Instead, as per Lemieux (2009, Chapter 6.2), we create a random sample of v quasi-random estimators, which are each based on a randomized point set of size n. Let \(\tilde{P}_{n,l} = \{\varvec{\tilde{u}}_{1,l},\dots ,\varvec{\tilde{u}}_{n,l}\}\), where \(\tilde{P}_{n,1},\dots ,\tilde{P}_{n,v}\) are v independent copies of \(\tilde{P}_{n}\). Define the \(l^{th}\) RQMC estimator

$$\begin{aligned} \hat{\mu }_{rqmc, l} = \frac{1}{n}\sum _{i=1}^{n}f(\varvec{\tilde{u}}_{i, l}) \quad \mathrm{for } l =1,\dots , v, \end{aligned}$$

which has expectation

$$\begin{aligned} \mathrm{E}(\hat{\mu }_{rqmc,l})&= \frac{1}{n}\sum _{i=1}^{n}\mathrm{E}(f(\varvec{\tilde{u}}_{i,l})) = \frac{1}{n}\sum _{i=1}^{n}\int _{[0,1)^s} f(\varvec{\tilde{u}}_{i,l})\,\mathrm{d}\tilde{\varvec{u}}_{i,l} = I(f) \end{aligned}$$

as each \(\varvec{\tilde{u}}_{i,l} \sim \mathrm{Uniform}([0,1)^s)\). Thus, \(\hat{\mu }_{rqmc,l}\) is an unbiased estimator of I(f). The overall RQMC estimator of I(f), based on these v iid estimators, is

$$\begin{aligned} \hat{\mu }_{rqmc} = \frac{1}{v}\sum _{l = 1}^v \hat{\mu }_{rqmc,l}, \end{aligned}$$
(1)

with variance estimator

$$\begin{aligned} \hat{\sigma }^2_{rqmc} = \frac{1}{v}\left( \frac{1}{v-1}\right) \sum _{l=1}^v(\hat{\mu }_{rqmc,l} - \hat{\mu }_{rqmc})^2. \end{aligned}$$
(2)

The empirical variance in Eq. (2) can then be compared with that of the regular MC estimator with a sample size nv.

2.2 Point Sets on the Triangle

In the general case, let points ABC lie on a hyperplane in \(\mathbb {R}^d\), forming a non-degenerate triangle, i.e., not lying on the same line. Define the triangle with corners A, B and C as the convex combination

$$\triangle (A,B,C) = \left\{ \lambda _1A+\lambda _2B+\lambda _3C\,\Big |\, \min _j\{\lambda _j\}\ge 0,\ \sum _{j=1}^3\lambda _j = 1\right\}.$$

Without loss of generality, we can consider \(A,B,C\in \mathbb {R}^2\). We often construct point sets on special triangles, such as the equilateral triangle, \( \triangle _E = \triangle \left( (0,0), (1,0), (1/2, \sqrt{3}/2)\right) ,\) or the right-angle triangle, \( \triangle _R = \triangle \left( (0,0), (0,1), (1,0)\right) ,\) as it is simpler than constructing point sets on an arbitrary triangle. It is thus useful to be able to map a point set constructed on one triangle to any other arbitrary triangle. Indeed, we can use an affine transformation which preserves the ratios of the lengths of parallel line segments and ratios of distances between points lying on a straight line.

Let \(\triangle = \triangle (A,B,C)\) and \(\triangle ' = \triangle (A',B',C')\) be two arbitrary triangles. Algorithm 1 from Tymchyshyn and Khlevniuk (2019) explains how to transform a point \(\varvec{x}=(x,y)\in \triangle \) to \(\varvec{x}' = (x',y') \in \triangle '\). This transformation can be applied to every point within \(\triangle \) to create a sampling scheme on \(\triangle '\). When \(\triangle '\) is non-degenerate, this transformation is one-to-one. The algorithm can be extended to higher dimensions, for example, to map between simplexes.

Algorithm 1

(Transforming a point from \(\triangle \) to \(\triangle '\)). Given \(\triangle = \triangle (A,B,C)\), \(\triangle ' = \triangle (A',B',C')\), and a point \(\varvec{x}=(x,y)\in \triangle \):

  1. 1.

    Define the matrices

    $$ M(\triangle ) = \left( \begin{matrix} a_1 &{} b_1 &{} c_1 \\ a_2 &{} b_2 &{} c_2 \\ 1 &{} 1&{} 1 \end{matrix} \right) \quad \mathrm{and}\quad M(\triangle ') = \left( \begin{matrix} a_1' &{} b_1' &{} c_1' \\ a_2' &{} b_2' &{} c_2' \\ 1 &{} 1&{} 1 \end{matrix} \right) ,$$

    where \(A = (a_1, a_2)\), \(B = (b_1, b_2)\), \(C = (c_1, c_2)\), \(A' = (a'_1, a'_2)\), \(B' = (b'_1, b'_2)\), and \(C' = (c'_1, c'_2)\).

  2. 2.

    Let \(M(\triangle , \triangle ') = M(\triangle ') M(\triangle )^{-1}\) be our affine transformation matrix.

  3. 3.

    Return the point \(\varvec{x}' = (x', y')\) consisting of the first two components of the matrix–vector product

    $$\begin{aligned} (x', y', z') = M(\triangle , \triangle ') \left( \begin{matrix} x\\ y\\ 1\end{matrix}\right) . \end{aligned}$$
    (3)

We illustrate the affine transformation in Fig. 1. The first four points of the scrambled triangular vdC sequence are generated on the equilateral triangle \(\triangle _E\), and then mapped using Algorithm 1 to the right-angle triangle \(\triangle _R\). Clearly, the ratios of distances between points are preserved, as are the low-discrepancy properties of the point set.

Fig. 1
figure 1

Four points generated on the equilateral triangle and mapped to the right-angle triangle using Algorithm 1

2.3 Transforming a Low-discrepancy Point Set from the Unit Square to a Triangle

The end goal of creating point sets over some domain \(\Omega \) is often to estimate an integral over said domain, say, \(\int _{\Omega }f(\varvec{x})\,\mathrm{d} \varvec{x},\) where \(f:\Omega \rightarrow \mathbb {R}\) is integrable. If constructing point sets on \(\Omega \) directly is difficult or not possible, a natural and popular approach is to sample points from the unit hypercube \([0,1)^d\) and to formulate this integral as

$$\begin{aligned} \mu = \int _{\Omega } f(\varvec{x})\,\mathrm{d} \varvec{x} = \lambda (\Omega ) \int _{[0,1)^d} f(\phi (\varvec{u}))\,\mathrm{d} \varvec{u}, \end{aligned}$$

where \(\phi :[0,1)^d\rightarrow \Omega \) for some \(d\in \mathbb {N}\) is a mapping such that \(\phi (\varvec{U})\sim \mathrm{Uniform}(\Omega )\) for \(\varvec{U}\sim \mathrm{Uniform}\left( (0,1)^d\right) \). That is, we can estimate expectations by sampling points uniformly from the domain \(\Omega \) if we can find such a mapping \(\phi \).

For the right-angle triangle \(\Omega =\triangle \) with \(0 \le x_1 \le x_2 \le 1\) (which is \(\Omega =\triangle ((0,0), (0,1), (1,1))\)), Pillards and Cools (2005) give six possible transformations \(\phi \) to map points from \(\mathrm{Uniform}\left( [0,1)^2\right) \) to a uniform distribution over \(\triangle \). Note that this right-angle triangle \(\triangle \) is not the same as \(\triangle _R\), but as mentioned earlier, transforming points generated on one triangle to another triangle is straightforward.

  1. 1.

    Method Drop. Points are accepted if they are within \(\triangle \) and rejected otherwise. In higher dimensions, many points get lost – only 1 in every s! points are kept when working in s dimensions.

  2. 2.

    Method Sort. Points rejected by the drop method are recovered by reordering the coordinates of a point in the unit square such that \(x_1 \le x_{2}\) to be within \(\triangle \). This transformation is fast and continuous.

  3. 3.

    Method Mirror. We accept the points that are already in \(\triangle \) and reflect the other ones at (1/2, 1/2). The resulting transformation is fast, but discontinuous.

  4. 4.

    Method Origami. Here, the sort method is recursively used within the unit square. Starting by subdividing the unit square into \(b^{2m}\) subsquares, where b and m are user-chosen integer values, with each iteration increasing the side length of the subsquare by a factor of b, until it matches the unit square. This transformation is discontinuous.

  5. 5.

    Method Root. This method is based on Fang and Wang (1993) and given by

    $$\begin{aligned} \phi (u_1, u_2) = (u_1\sqrt{u_2}, \sqrt{u_2}). \end{aligned}$$

    This transformation is continuous and smooth, but the two sharp corners of the triangle are treated in different ways.

  6. 6.

    Method Shift. This method treats the two sharp corners of the triangle the same way, in contrast with the root method. A line is drawn with slope -1 through each point, and each point is then moved halfway towards the nearest axis along this line. This transformation is fast, but is not generalisable to higher dimensions.

The “drop” method is of acceptance-rejection type in that it only keeps the points in the triangle and discards the other ones. The other methods, except “root” and “shift”, can be thought of as improved variants of “drop”. The method “root” is originally from Fang and Wang (1993) and is an application of the inverse Rosenblatt transform; see Rosenblatt (1952). Other methods, such as introduced in Heitz (2019), have been introduced for computer graphics applications, that transform points from the unit square to \(\triangle _R\) with lower distortion than the “root” method, similarly to the “shift” method. Although the concept of “low-distortion” is not consistently defined (Shirley and Chiu 1997), such transformations aim to preserve correlation between points.

In our numerical experiments of Section 5, we use the method “root” on the bivariate Sobol’ sequence as well as on pseudo-randomly generated numbers. We use this method as it is smooth, continuous, fast and can be extended to go from higher-dimensional cubes to simplexes.

2.4 The Triangular Rank-2 Lattice

In this section, we describe lattice constructions in general, and then the triangular lattice of Basu and Owen (2015). First, we define a lattice point set.

$$\begin{aligned} P_n = \left\{ \left( \frac{i_1\varvec{z}_1}{n_1} + \frac{i_2\varvec{z}_2}{n_2} + ... + \frac{i_r\varvec{z}_r}{n_r} \right) \mathrm{mod }1,\ 0\le i_l < n_l, l = 1,...r \right\} , \end{aligned}$$

where the basis vectors \(\varvec{z}_1,..., \varvec{z}_r\in \mathbb {R}^s\). A lattice rule is of rank t if it can be expressed in this form with \(r = t\), but not \(r < t\). In particular, for rank-1 lattices, we can write

$$\begin{aligned} P_n = \left\{ \left( \frac{iz_1}{n}, \frac{iz_2}{n}, \dots ,\frac{iz_s}{n} \right) \mathrm{mod }1,\ 0\le i_l < n_l, l = 1,...r \right\} , \end{aligned}$$

based on a single generating vector \(\varvec{z} = (z_1,...,z_s)\). Rank-1 lattices have an advantage over higher-order lattices, as they can be made fully projection regular, i.e., each one-dimensional projection has n points. One way of constructing higher rank lattices is to employ the copy rules of Disney and Sloan (1992); that is, we can take a rank-1 lattice, scale it, and copy it into each of the \(2^s\) subcubes obtained by partitioning the unit cube into two parts on each side, and obtain a rank-s lattice. Thus, rank-1 lattices are more often used. There is extensive work on choices of rank-1 lattices, both theoretical and computer searches for good lattices are available; see for example, Dick et al. (2022) and Goda and L’Ecuyer (2022).

Basu and Owen (2015) give a lattice construction for points on the triangle with optimal discrepancy: let \(\alpha \in (0,2\pi )\) be such that \(\tan (\alpha )\) is a quadratic irrational number, i.e., \(\tan (\alpha )=(a+b\sqrt{c})/d\) for \(b,d\ne 0\) and \(c>0\) not a perfect square. The point set \(P_n\) obtained by rotating the lattice \((2n)^{-1/2}\mathbb {Z}^2\) counterclockwise by \(\alpha \) and intersecting with \(\triangle \) satisfies \(D_n^P(P_n,\triangle _R) \le C \log n /n\). The following algorithm produces such point sets.

Algorithm 2

(Lattice construction of Basu and Owen (2015)). Given the target sample size n, \(\alpha \) such that \(\tan (\alpha )\) is badly approximable in the sense of Basu and Owen (2015, Definition 4.1) (e.g., \(\alpha =3\pi /8\)), a random vector \(\varvec{U}\sim \mathrm{Uniform}\left( (0,1)^2\right) \) (for an optional shift), an integer N, sample n points in \(\triangle _R\) as follows:

  1. 1.

    Let \(P=\{0,1,\dots , N-1\}^2\) and set \(\varvec{x}\leftarrow (\varvec{x}+\varvec{U})/N\) for \(\varvec{x}\in P\).

  2. 2.

    Map \(\varvec{x}\leftarrow 2 \varvec{x}-1\in [-1,1]^2\) for all \(\varvec{x}\in P\).

  3. 3.

    Set \(\varvec{x} \leftarrow \begin{pmatrix} \cos (\alpha ) &{} -\sin (\alpha ) \\ \sin (\alpha ) &{} \cos (\alpha ) \end{pmatrix} \varvec{x}\) for all \(\varvec{x}\in P\).

  4. 4.

    Set \(P_n = P \cap \triangle _R\).

  5. 5.

    If \(|P_n|\ne n\), add or remove \(|P_n|-n\) points in \(\triangle _R\) to \(P_n\).

  6. 6.

    Return \(P_n\).

The randomization transforms the otherwise deterministic points so that \(\varvec{x}\sim \mathrm{Uniform}(\triangle )\) for all \(\varvec{x}\in P\), which follows readily by observing that the density of each \(\varvec{x}\) is constant if a randomization is performed. In Step 5, we can choose arbitrarily which points to add or remove. We remark that the algorithm presented in Basu and Owen (2016, p. 757) differs slightly from Algorithm 2 in that their version uses \(N=\lceil \sqrt{2n}\rceil +1\) and the lattice \(\{-N,\dots ,N\}^2\).

2.5 The Triangular vdC Sequence

We now describe the triangular vdC sequence of Basu and Owen (2015), which is based on the one-dimensional vdC sequence in base 4. The \(i^{th}\) point of the one-dimensional vdC sequence in base b is given by \(u_i=\phi _b(i-1)\) where the radical inverse function \(\phi _b\) is defined as

$$\begin{aligned} \phi _b(i) = \sum _{k\ge 0} d_k b^{-k-1},\quad i = \sum _{k\ge 0}d_k b^k \in \{0,1,\dots \}. \end{aligned}$$

With this formula, the vdC sequence places points at the left-most boundaries of each of the intervals \([b^{-m}, b^{-m+1})\). Similarly, the triangular vdC sequence, based on the vdC sequence in base 4, replaces the intervals with \(b^m = 4^m\) congruent sub-triangles and places the points in the centre of each terminal sub-triangle. Let \(T=\triangle (A,B,C)\) denote the specific triangle which we wish to generate points on. Define the sub-triangle of T with index d for \(d\in \{0,1,2,3\}\) as

$$\begin{aligned} T(d) = \left\{ \begin{array}{ll} \triangle \left( \frac{B+C}{2}, \frac{A+C}{2}, \frac{A+B}{2}\right) , &{}\quad d=0, \\ \triangle \left( A, \frac{A+B}{2}, \frac{A+C}{2}\right) , &{}\quad d=1, \\ \triangle \left( \frac{B+A}{2}, B, \frac{B+C}{2}\right) , &{}\quad d=2, \\ \triangle \left( \frac{C+A}{2}, \frac{C+B}{2}, C\right) , &{}\quad d=3. \end{array} \right. \end{aligned}$$
(4)

For the \(i^{th}\) point in the sequence, write the base 4 representation of \(i \ge 0 \) as \(i = \sum _{k\ge 0}d_k4^k\). This representation has at most \(K_i=\lceil \log _4(i)+1\rceil \) non-zero digits, which means that the expansion is finite, and we do not have to infinitely divide the triangle into sub-triangles. The original construction for the triangular vdC sequence obtains the \(i^{th}\) triangular point by mapping the integer i to the midpoint of the triangle \(T(d_0,\dots ,d_{K_i})\), which is recursively defined by \(T(d_{k}, d_{k+1})=(T(d_{k}))(d_{k+1})\), as detailed in Algorithm 3. For example, if we have \(T= \triangle _R = \triangle \left( (0,0), (0,1), (1,0)\right) \), then \(T(2,3) = (T(2))(3)\), where \(T(2) = \triangle ((0,0.5),(0,1),(0.5,0.5))\). Then, \( (T(2))(3) = \triangle ((0.25,0.5),(0.25,0.75),(0.5,0.5))\).

Algorithm 3

(Triangular vdC sequence). Given the desired sample size n and a target triangle \(\triangle (A,B,C)\), the first n points, outputted in an \(n \times 2\) array \(\varvec{x}\), are generated as follows:

  1. 1.

    For \(i=1,\dots ,n\):

    1. (a)

      Compute \((d_0,\dots ,d_{K_i})\) such that \(i-1=\sum _{k= 0}^{K_i}d_k4^k\).

    2. (b)

      Initialize \(T = \triangle (A,B,C)\).

    3. (c)

      For \(j=0,\dots ,K_i\).

      • Update \(T=T(d_j)\) using (4).

    4. (d)

      Set the \(i^{th}\) coordinate \(\varvec{x}[i]\) of \(\varvec{x}\) as \(\varvec{x}[i]=\mathrm{mdpt}(T)\), where the midpoint function is \(\mathrm{mdpt}(\triangle (A,B,C)) = (A+B+C)/3\) component-wise.

  2. 2.

    Return \(\varvec{x}=(x_1,\dots ,x_n)\).

In other words, Algorithm 3 generates a vdC sequence in base 4, and each digit of the base 4 expansion denotes which sub-triangle the point lies in. Advantages of this method include that since it is based on the vdC sequence, it is extensible, balanced, can be modified to be randomized, and it is easily implemented. However, when implementing the algorithm as originally described, once the terminal sub-triangles are identified, placing the sampling points at the centre of each results in points that suffer from poor projection properties, as shown in Proposition 1. That is, the one-dimensional projections have non-unique points, which will lead to poorer integration results, especially for functions with low effective dimension where the majority of the variance of the function is captured by one-dimensional projections onto these axes.

Proposition 1

(Non-unique one-dimensional projections for the triangular vdC sequence) Let \(T=\triangle _R\), such as in Fig. 2, and denote by \(P_{n}\) the point set consisting of the first n points produced by Algorithm 3 for \(n=4^k\) and \(k>2\). Then the projections of \(P_n\) onto the x- and y-axis contain \(2\sqrt{n}=2^{k+1}<n\) points.

Fig. 2
figure 2

The first 1000 points of the triangular vdC sequence of Basu and Owen (2015)

Proof

Since \(n=4^k\), each of the \(4^k\) subtriangles contains one point, and there are \(2^k\) rows of subtriangles. In each row j, the midpoints of all upright triangles have the same y coordinate, say \(y_{j1}\). Similarly, the midpoints of all inverted triangles in the same row have the same y coordinate, say \(y_{j2}\). Since there are only these two cases, the point set projects on \(\{y_{ji}: j=1,\dots ,2^k; i=1,2\}\), which has \(2\cdot 2^k=2^{k+1}\) elements. The x-axis case follows similarly. \(\square \)

This behaviour where the one-dimensional projections contain non-unique points can also be observed in the equilateral triangle, as seen in Fig. 6.

As a continuation of this work, Goda et al. (2017) generalize the triangular vdC sequence by replacing the support \(\{0,1,2,3\}\) of the transformation in (4) by \(\{(0,0), (1,0), (0,1), (1,1)\}=\mathbb {F}_2^2\), where the input strings (from \(\mathbb {F}_2^m\) for some m or \(\mathbb {F}_2^{\infty }\)) come from a digital net. They prove that their construction gives worst case error in \(\mathcal {O}((\log n)^3/n)\) for functions in \(C^2(\triangle )\). Furthermore, they show that their construction includes the vdC sequence of Basu and Owen (2015); see Goda et al. (2017, p. 369).

3 Extensible Triangular Lattice Constructions

We now describe our extensible triangular rank-1 and rank-2 constructions, as an extension of the triangular lattice construction. The construction in Algorithm 2 is non-extensible. In this section, we propose an extensible scheme, for which we make use of the one-dimensional vdC sequence. Recall that the \(i^{th}\) point of an extensible rank-1 lattice sequence with generating vector \(\varvec{z} \in \mathbb {Z}^d\) is defined as

$$\begin{aligned} \varvec{u}_i = \phi _b(i) \varvec{z} \bmod {1} \in [0,1]^d, \end{aligned}$$

where \(\phi _b(i)\) is the \(i^{th}\) term of the vdC sequence in base b, and the modulus 1 operation is applied component-wise; see Hickernell and Hong (1997) and Hickernell et al. (2001).

If we want to use this idea to define a triangular Kronecker lattice sequence, then we need to introduce a more general idea extending the rank-1 case. In particular, the grid that needs to be generated, given by

$$\begin{aligned} P = \left\{ -\frac{N}{N},-\frac{N-1}{N},\ldots ,-\frac{1}{N},0,\frac{1}{N},\ldots , \frac{N}{N}\right\} ^2, \end{aligned}$$

is a rank-2 lattice rather than a rank-1 lattice. We also note that the only reason why Basu and Owen (2015) use a grid over \([-1,1]^2\) instead of \([0,1)^2\) seems to be that when they rotate the points from the grid, they can just take the intersection with \(\triangle _R\) without having to perform any modulo 1 operation. However, with our proposed approach to generate an extensible grid, we will focus on generating points in \([0,1)^2\); we can then either make use of modulo 1 operations or extend the grid to \([-1,1]^2\).

First, we must choose the base b in which the grid will be constructed. The choice of the base b is such that whenever n is of the form \(b^{2k}\) for some \(k \ge 1\), the point set \(P_n\) obtained with this method will be exactly the same as if we had proceeded with the fixed-size approach based on \(N=b^k\); see Proposition 2 below. We then make use of the base b decomposition of i as \(i = \sum _{j\ge 0} d_jb^j\) to define the point

$$\begin{aligned} \varvec{u}_i = (u_{i,1},u_{i,2}) = \frac{d_0}{b} (1,1) + \frac{d_1}{b} (0,1) + \frac{d_2}{b^2} (1,1) + \frac{d_3}{b^2} (0,1) + \ldots \bmod {1}. \end{aligned}$$
(5)

Note that the modulo 1 operation is only necessary for the second coordinate, as we have the bound

$$\begin{aligned} \frac{d_0}{b} + \frac{d_2}{b^2} + \frac{d_4}{b^3} + \ldots \le \sum _{j\ge 1}\frac{d_j}{b^k} \le \sum _{j\ge 1}\Bigl (\frac{1}{b}\Bigr )^k = \frac{1}{1-1/b}-1 \le 1. \end{aligned}$$

Figure 3 shows the first 9, 50, and 81 points obtained with this method when \(b=3\).

Fig. 3
figure 3

First 9 (left), 50 (middle) and 81 points when using \(b=3\)

The points \(\varvec{u}_i\) lie in \([0,1)^2\). Next, we rotate the points by \(\alpha \) and can then proceed as in Algorithm 2.

Algorithm 4

(Extensible Kronecker Lattice). Given \(\alpha , n, b\), a random vector \(\varvec{U}\sim \mathrm{Uniform}\left( (0,1)^2\right) \) (for the optional shift) and a skip \(s\ge 0\), sample n points in \(\triangle _R\) as follows:

  1. 1.

    Set \(P_n=\{\}\) and \(k=0\).

  2. 2.

    While \(|P_n|<n\),

    1. (a)

      Compute digits \(d_j\) such that \(k+s = \sum _{j\ge 0}d_jb^j\) and compute \(\varvec{u}\) in (5).

    2. (b)

      Set \(k\leftarrow k+1\), \(\varvec{u} \leftarrow 2((\varvec{u}+\varvec{U})\mod 1)-1\in [-1,1]^2\).

    3. (c)

      Set \(\varvec{u} \leftarrow \begin{pmatrix} \cos (\alpha ) &{} -\sin (\alpha ) \\ \sin (\alpha ) &{} \cos (\alpha ) \end{pmatrix} \varvec{u}\).

    4. (d)

      If \(\varvec{u}\in \triangle _R\), set \(P_n=P_n\cup \{\varvec{u}\}\).

  3. 3.

    Return \(P_n\).

Proposition 2

(Equivalence of Algorithms 2 and 4). Let \(n=b^{2k}\), \(k>0\) and \(s=0\). The sets \(P_n\) of points obtained by Algorithm 2 with \(N=b^k\) and Algorithm 4 coincide.

Proof

Assume without loss of generality that \(\varvec{U}=\varvec{0}\). Consider the set \(Q=\{\varvec{u}_i:i=0,\dots ,n-1\}\) where the \(\varvec{u}_i\) are as in (5). Then Q is a rank-2 lattice and can be written as \(Q=\{ (i/b^k, j/b^k): 0\le i,j < b^k\}\). This is exactly the rectangular grid in Step 1 of Algorithm 2. Since the remaining operations (intersection and rotation) are identical in both algorithms, the result follows. \(\square \)

Algorithm 4 is based on a rank-2 lattice with generating vectors \(\varvec{z}_1=(1,1)\) and \(\varvec{z}_2=(0,1)\). In our numerical experiments, we also consider a rank-1 lattice with generating vector \(\varvec{z}=(1, 182667)\); see Cools et al. (2006, p. 26). The lattice construction allows for an easy randomization by shifting the underlying grid by a uniform vector, as originally proposed by Cranley and Patterson (1976). Figure 4 displays five independently randomized copies of the lattice points, each having a different colour.

Fig. 4
figure 4

Five independently randomized triangular Kronecker lattice point sets with \(2^6\) points each

4 Stratified Sampling Based on the Triangular vdC Sequence

As previously mentioned, placing the sampling points at the centre of each terminal sub-triangle for the triangular vdC sequence results in points being mapped to the same coordinate on the x or y axis. Since the triangular vdC sequence is based on the one-dimensional vdC sequence over the unit interval, a natural randomization to “repair” the poor projection quality is scrambling.

In this section, we explain the equivalency between stratified sampling and the nested scrambling of Owen (1995) for any (0, m, 1)-net or (0, 1)-sequence. This equivalency is not limited to the vdC sequence, but will hold in general for all (0, 1)-sequences, including, for example, one-dimensional projections of the Sobol’ sequence. This will allow us to then propose an efficient implementation of nested scrambling via a stratified estimator. We also show that nested scrambling is a way to implement an extensible stratified estimator. Finally, since the triangular vdC sequence is essentially a mapping from the one-dimensional vdC sequence to the two-dimensional space, it is easy to transform our proposed scrambling implementation in one dimension into a randomization method for the triangular vdC sequence.

4.1 Stratified Sampling

A simple way to reduce the variance of an estimator for the integral of a function f over the unit interval [0, 1) is to stratify [0, 1) into M subintervals of length \(p_j\) and allocate \(n_j\) points to the jth subinterval (with \(n_1+\ldots +n_M = n\)), thus resulting in the estimator

$$\begin{aligned} \hat{\mu }_{strat} = \sum _{j=1}^M p_j \frac{1}{n_j} \sum _{i=1}^{n_j} f(u_j^i), \end{aligned}$$

where the \(u_j^i\) are iid uniforms in the jth subinterval, and independent from \(u_k^{\ell }\) for \(k \ne j\), \(\ell =1,\ldots ,n_k\). If we allocate a number of points proportional to the length of each subinterval (i.e., \(n_j = np_j\) which is assumed to be an integer), then this is stratified sampling with proportional allocation. This method is guaranteed to have a lower variance than regular Monte Carlo; see Lemieux (2009, p. 126). Furthermore, if \(n = b^m\) and we set \(M=b^m\) then one point \(u_j^1=u_{j}\) is allocated to the jth stratum, and we get \(\hat{\mu }_{strat} = \sum _{j=1}^{n} f(u_{j})/n\) where each \(u_j\) is uniformly distributed in the jth stratum.

The variance of this estimator is \({\mathrm{Var}}(\hat{\mu }_{strat}) = \frac{1}{n^2}\sum _{j=1}^{n} \sigma _j^2\), where \(\sigma _j^2\) is the variance of f within the \(j^{\mathrm{th}}\) subinterval.

In the context of this section, since the number of points is not necessarily an integer power of b, we refer to a base b stratified sampling scheme as a sampling scheme where for any number of points n, if we subdivide the unit interval into subintervals with length \(b^{-m}\) for any positive integer m, the number of points within each subinterval is different by at most one from the number of points within any other subinterval. That is, \(|n_j-n_{\ell }| \le 1\) for all \(j,\ell =1,\ldots ,b^m\). Stratified sampling with this property can yield a more efficient implementation of nested scrambling, which we explain in the next section.

4.2 Nested Uniform Scrambling

The nested uniform scrambling method of Owen (1995), sometimes referred to as “Owen’s scrambling” or “nested scrambling”, scrambles a point set \(P_n \subseteq [0,1)^s\) by applying random permutations to the digits \(u_{i,j,l}\) that stem from the base b expansion of \(u_{i,j}\) (we write \(u_{i,j} = \sum _{l=1}^{\infty } u_{i,j,l} b^{-l}\)). Permutations \(\pi \) are randomly uniformly distributed over all b! permutations of \([0,1,\dots ,b-1]\). The permutation of each digit depends on all the digits that came before it, and a new set of permutations is used for each coordinate. That is, the permutation used for \(u_{i,j,1}\) is \(\pi _{j}\), the permutation used for \(u_{i,j,2}\) is \(\pi _{j, u_{i,j,1}}\) (the permutation applied to the second digit \(u_{i,j,2}\) depends on the first digit \(u_{i,j,1}\)), the permutation used for \(u_{i,j,3}\) is \(\pi _{j, u_{i,j,1} u_{i,j,2}}\), and so on. That is, to randomize the \(k^{th}\) digit of a given coordinate j, up to \(b^{k-1}\) different permutations may be needed.

Nested scrambling is costly to implement, both in terms of time and memory. It requires a dictionary or other lookup data structure to store all the permutations generated, and, in addition to the time to go through all the coordinates of every point, the lookup time for each permutation needs to be paid. However, despite being costly to implement, nested scrambling is often used because it has the potential to reduce the variance of the RQMC estimator in Eq. (1) to \(O(n^{-3}\log (n)^{s-1}) \approx O(n^{-3 + \epsilon })\) for sufficiently smooth functions as shown in Owen (1997). Use of nested scrambling is further justified by the fact that nested scrambling in base b also preserves equidistribution in base b as defined below, and satisfies the requirement of being a base b-digital scramble as defined in Hickernell (1996), Hong and Hickernell (2003), Owen (2003) and Wiart et al. (2021), which is also given below.

Definition 1

(Equidistribution). We say that \(P_n\) with n of the form \(b^m\) with \(m \ge 0\) for a single-base b construction, is \((k_1,\ldots ,k_s)\)-equidistributed in base b if every elementary \((k_1,\ldots ,k_s)\)-interval of the form

$$\begin{aligned} I_{\varvec{k}}(\varvec{a}) = \prod _{\ell =1}^s \left[ \left. \frac{a_{\ell }}{b^{k_{\ell }}},\frac{a_{\ell }+1}{b^{k_{\ell }}} \right. \right) \end{aligned}$$

for \(0 \le a_{\ell } < b^{k_{\ell }}\) contains exactly \(nb_1^{-k_1}\cdots b_s^{-k_s}\) points from \(P_n\), assuming \(\varvec{k}\) is such that \(n \ge b_1^{k_1} \ldots b_s^{k_s}\).

Definition 2

((tms)-nets). We say that a point set \(P_n\) in base b has a quality parameter t if \(P_n\) is \((k_1,\ldots ,k_s)\)-equidistributed for all s-dimensional vectors of non-negative integers \(\varvec{k}=(k_1,\ldots ,k_s)\) such that \(k_1 + \ldots + k_s \le m-t\). We then refer to \(P_n\) as a (tms)-net in base b.

Definition 3

((ts)-sequences). We say a sequence is a (ts)-sequence in base b if for every integer \(m \ge 0\), every point set of the form \(\varvec{u}_{j},\ldots ,\varvec{u}_{j+b^m-1}\), where j is of the form \(j=vb^m+1\) for some \(v \ge 0\), is a (tms)-net.

Definition 4

(Base b-digital scramble). A randomization \(\mathcal{S}\) is a base b-digital scrambling if the following two properties hold: Let \(U_{i,\ell } = \sum _{r=1}^{\infty } U_{i,\ell ,r} b^{-r}\), that is, \(U_{i,\ell ,r}\) represents the \(r^{\mathrm{th}}\) digit in the base b expansion of the \(\ell ^{\mathrm{th}}\) coordinate of the \(i^{\mathrm{th}}\) point \(\varvec{u}_i\) in the scrambled point set \(\tilde{P}_n\). Then we must have:

  1. 1.

    Each \(\varvec{u}_i \sim \mathrm{Uniform}([0,1)^s)\);

  2. 2.

    For two distinct points \(\varvec{u}_i=\mathcal{S}(\varvec{v}_i),\varvec{u}_j=\mathcal{S}(\varvec{v}_j)\) and for each coordinate \(\ell =1,\ldots ,s\), if the two deterministic points \(V_{i,\ell },V_{j,\ell }\) have the same first r digits in base b and differ on the \((r+1)^{\mathrm{th}}\) digit, then:

    1. (a)

      the scrambled points \((U_{i,\ell },U_{j,\ell })\) also have the same first r digits in base b;

    2. (b)

      the pair \((U_{i,\ell ,r+1},U_{j,\ell ,r+1})\) is uniformly distributed over \(\{(k_1,k_2), 0 \le k_1 \ne k_2 < b\}\);

    3. (c)

      the pairs \((U_{i,\ell ,v},U_{j,\ell ,v})\) for \(v > r+1\) are mutually independent and uniformly distributed over \(\{(k_1,k_2), 0 \le k_1, k_2 < b\}\).

4.3 Nested Scrambling as Stratified Sampling

We now show that we can implement a nested scrambled vdC sequence via a base b stratified sampling scheme on the unit interval, as the two methods produce point sets with the same joint distribution. First, we show that the scrambled vdC sequence in base b is a (0, 1)-sequence in base b. Since scrambling in the constructing base does not change the t parameter, it is sufficient to show that the deterministic vdC sequence is a (0, 1)-sequence.

Lemma 1

(A consecutive subset of the vdC sequence is a (0, m, 1)-net). Any point set \(P_n\) that is made up of \(n = b^m\) consecutive points from the vdC sequence constructed in base b is a (0, m, 1)-net in base b.

Proof

If we consider the first m digits in the base b expansion of each point \(u_i \in P_n\), there is exactly one point with each of the unique \(b^m\) combinations of digits; see Dong and Lemieux (2022, Section 5). That is, there is exactly one point in each of the m-elementary intervals and thus, by definition, a (0, m, 1)-net in base b, and thus \(P_n\), is m-equidistributed in base b. \(\square \)

Lemma 2

(A consecutive subset of a vdC sequence has the same properties as the initial portion of a (0, 1)-sequence). Any point set \(P_n\) that is made up of n consecutive points from a vdC sequence in base b has the same equidistribution properties as the first n points of a (0, 1)-sequence in base b.

Proof

The result follows by observing that every point set \(P_k\) made up of \(k = b^m\) consecutive points from the vdC sequence in base b is a (0, m, 1)-net in base b (by Lemma 1). \(\square \)

Lemma 3

(Equivalence of the scrambled vdC sequence and stratified sampling for \(n = b^m\)). Let \(\hat{\mu }_{scr,n} = \frac{1}{n}\sum _{i=1}^{n}f(\tilde{u}_i)\) where \(\tilde{u}_i\), \(i = 1 \ldots n\), are the first n points of a scrambled vdC sequence in base b. If \(n=b^m\) for some positive integer m, then \(\hat{\mu }_{scr,n}\) corresponds to the estimator that uses stratified sampling with proportional allocation.

Proof

Since the first \(b^m\) points of the scrambled vdC sequence in base b is a (0, m, 1)-net in base b (by Lemma 1), its equidistribution properties mean that there is exactly one point in each of the \(b^m\) subintervals, and the scrambling as defined in Definition 4 has the point placed independently and uniformly within each of the \(b^m\) subintervals. This is exactly stratified sampling with proportional allocation. \(\square \)

However, in general, n is not a power of b. In this case, we argue that the equidistribution properties of the (0, 1)-sequence are such that the scrambled vdC estimator (by Lemma 2) places a number of points in the intervals \([j b^{-l}, (j+1)b^{-l})\) that are different by at most one from each other for \(l \le m\) for \(m = \lfloor \log _b n \rfloor \).

To explain how to implement this estimator using our base b stratified sampling algorithm, we introduce the following notation. Write \(n = \lambda b^m + r\), with the quotient \(\lambda = \lfloor n /b^m \rfloor \) and remainder r satisfying \(0 \le r <b^m\). Further, break down r as \(r = kb + j\), where \(0 \le j < b\). We also define M to be the smallest integer power of b greater than or equal to n. That is, \(M=b^q\), where \(q =\lceil \log _b n \rceil .\)

In Proposition 3 we will prove that we can implement \(\hat{\mu }_{scr,n}\) for any sample size n as a base b stratified sampling estimator over strata of the form \([j b^{-l},(j+1)b^{-l})\) for \(l = 1,\ldots ,m\). This is done by using the vector \(\left[ N_1,\ldots ,N_{M}\right] \), where \(N_j\) is the number of points in \([j b^{-q},(j+1)b^{-q})\) (either 0 or 1). That is, \(N_j\) counts the number of points in the smallest meaningful stratum: strata of size \(b^{-q}\) or smaller can have at most 1 point, thus there is no reason to subdivide further.

The scrambled vdC estimator satisfies the following properties:

  1. 1.

    \(N_j \in \{0,1\}\), \(j=1,\ldots ,M\), \(\sum _{j=1}^M N_j = n\), and

  2. 2.

    The \(N_j\)’s have the same marginal distribution.

To define the scrambled estimator, we simply need to generate the vector of \(N_j\)’s with the properties listed above. Rather than obtaining these \(N_j\) via scrambling, we propose a more efficient procedure for sampling a vector of \(N_j\)’s given a base b and a number of points n in Algorithm 5 (the latter samples \(N_1,\dots ,N_M\) for computing \(\hat{\mu }_{scr,n}\)).

Algorithm 5

(Sampling \(N_1,\dots ,N_M\) for computing \(\hat{\mu }_{scr,n}\)). Given input \(n\in \mathbb {N}\) and \(b\ge 2\), sample \(N_1,\dots ,N_M\) as follows:

  1. 1.

    Initialize n and b. Calculate the constants \(q, m, M, \lambda ,r,k,j\) as summarized in Table 1.

  2. 2.

    If \(M=n\), return \(N_j=1\) for \(j=1,\ldots ,M\).

  3. 3.

    If \(m = 0\), then \(n = \lambda \), and \(M = b\). Randomly choose \(\lambda \) of the b \(N_j\)s to be 1 and \(b-\lambda \) of the \(N_j\)’s to be 0, and return \(N_j\) for \(j=1,\ldots ,M\).

  4. 4.

    If \(m > 0\), we recursively generate the \(N_j\)’s in each of the b sub-intervals of the form \([j b,(j+1)b)\), \(j = 0,\ldots ,b-1\) as follows:

    1. (a)

      Generate a vector of \(n_i\)’s of length b that sums to n with \(b-r\) entries of \(\lambda b^{m-1}\) and r entries of \(\lambda b^{m-1} + 1\):

      1. (i)

        Randomly choose a subset \(\mathcal{I}\) of indices j from \(\{1,\ldots ,b\}\).

      2. (ii)

        If \(i \in \mathcal{I}\), then \(n_i = \lambda b^{m-1} + k + 1\).

      3. (iii)

        Otherwise, \(n_i = \lambda b^{m-1} + k\).

    2. (b)

      For each \(i = 1,\ldots ,b\), we generate \(N_{(i-1) b^m + 1},\ldots ,N_{i b^m}\) by restarting at Step 1 with the same base b but with \(n = n_i\), and \(q = m\). At this step, we have b recursive calls to the function.

  5. 5.

    Return \(N_1,\dots ,N_M\).

Table 1 Constants used in Algorithm 5

Note that in Step 4b, when we return to Step 1, if \(n_i\) is a power of b, q is not guaranteed to be equal to \(\lceil \log _b n \rceil \), as it can also take on the value \(\lceil \log _b n \rceil + 1\). For example, this case may arise when sampling \(n = 10\) points in base 3. We would need to allocate these 10 points into 27 strata, so we will need to put 3 points into 9 strata. So even though \(3 = 3^1\), we would want \(q = 2\) as \(9 = 3^2\): q is counting how many times we need to subdivide the unit interval.

After acquiring the \(N_j\)’s using Algorithm 5, we must generate the point set. This process is straightforward. To generate the point set based on the \(N_j\), for every \(j = 1 \dots M\), if \(N_j = 1\), generate a point uniformly in the interval \([j b^{-q},(j+1)b^{-q})\). If \(N_j = 0\), do not generate a point in the interval.

Proposition 3

(Equivalence of stratified sampling and the one-dimensional vdC sequence). The point set created using Algorithm 5 to define the \(N_j\)’s is such that pairs of distinct points have the same joint distribution as those coming from a scrambled vdC point set and thus the corresponding estimators have the same first two moments.

Proof

We must derive the joint pdf of pairs of randomly chosen distinct points from the point set created using Algorithm 5, and show that it is the same as the joint pdf of pairs of randomly chosen distinct points from a one-dimensional scrambled vdC point set. To do so, we need the following definitions taken from Wiart et al. (2021) and Dong and Lemieux (2022):

Definition 5

(\(\gamma _b(x,y)\)). For \(x,y\in [0,1)\), let \(\gamma _b(x,y) \ge 0\) be the exact number of initial digits shared by x and y in their base b expansion, i.e. the smallest number \(i \ge 0\) such that \( \lfloor b^ix\rfloor =\lfloor b^iy\rfloor \) but \(\lfloor b^{i+1}x\rfloor \ne \lfloor b^{i+1}y\rfloor .\) If \(x=y\) then we let \(\gamma _b(x,y) = \infty \).

Definition 6

(Counting numbers \(N_b(i;P_n)\) and \(M_b(k;P_n)\)). Let \(P_n =\{U_1,\ldots ,U_n\}\) be a point set in [0, 1) and \(b, i, k \in \mathbb {N}, b \ge 2\). Then,

  1. 1.

    \(N_b(i;P_n)\) is the number of ordered pairs of distinct points \((U_l,U_j)\) in \(P_n\) such that \(\varvec{\gamma }_b(U_l,U_j) = i\),

  2. 2.

    \(M_b(k;P_n)\) is the number of ordered pairs of distinct points \((U_l,U_j)\) in \(P_n\) such that \(\varvec{\gamma }_b(U_l,U_j) \ge k\), and

  3. 3.

    \(N_b(k;P_n,U_l) = \sum _{e \in \{0,1\}} (-1)^{|e|} M_b(k+e;P_n,U_l)\).

(Proof, continued.) The joint pdf of a one-dimensional scrambled vdC is

$$\begin{aligned} \psi (x, y)= \left\{ \begin{array}{ll} \frac{N_b(i;P_n)}{n(n-1)}\frac{b^{1 + i}}{(b-1)}, &{}\mathrm{if } {i} < \infty , \\ 0, &{}\mathrm{if } {i}=\infty , \end{array} \right. \end{aligned}$$

where \(P_n\) refers to the first n points of the deterministic vdC sequence.

We now show that the scrambled vdC point set and the point set constructed using the proposed base b stratified sampling method from Algorithm 5 yield the same \(M_b(k; P_n)\) as defined in Definition 6. This means that \(N_b(i;P_n)\) and thus the joint pdf \(\psi (x,y)\) for all pairs of points (xy) are then the same for both methods.

Since the base b vdC sequence is equivalent to a one-dimensional Halton sequence, we know from Dong and Lemieux (2022) that in this case \(M_b(k;P_n)\) is given by:

$$\begin{aligned} {}_{vdC} M_b(k;P_n) = \lfloor * \rfloor {\frac{n-1}{b^k}}\left( 2n - \lfloor * \rfloor {\frac{n-1}{b^k}}b^{k}-b^{k}\right) . \end{aligned}$$
(6)

For any k, we can write \(n = qb^k + r\), where \(q = \lfloor * \rfloor {\frac{n}{b^k}}\) and \(0 \le r < b^k\). By construction of the base b stratified sampling estimator, we divide the unit interval into \(b^k\) segments and any pairs of points in the same segment will share at least k initial common digits. There are \(b^k - r\) of these segments with q points, and r segments with \(q+1\) points. Thus, the number of ordered pairs of points that are in the same segment is \((b^k - r)q(q-1) + r(q+1)q\). Substituting \(r = n-qb^k\), we have \({}_{SS} M_b(k;P_n) = 2nq - q^2b^k - q b^k\) for the point set created via base b stratified sampling. Now, we substitute \(q = \lfloor * \rfloor {\frac{n}{b^k}}\) and simplify to get

$$\begin{aligned} {}_{SS} M_b(k;P_n)&= 2n\lfloor * \rfloor {\frac{n}{b^k}} - \lfloor * \rfloor {\frac{n}{b^k}}^2b^k - \lfloor * \rfloor {\frac{n}{b^k}} b^k = \lfloor * \rfloor {\frac{n}{b^k}}\left( 2n - \lfloor * \rfloor {\frac{n}{b^k}}b^{k}-b^{k}\right) . \end{aligned}$$

This is very similar to (6), except with \(\lfloor \frac{n-1}{b^k}\rfloor \) rather than \(\lfloor \frac{n}{b^k}\rfloor \). We show that \({}_{SS} M_b(k;P_n) = {}_{vdC} M_b(k;P_n)\) by considering the following two cases:

  1. 1.

    \(\lfloor * \rfloor {\frac{n}{b^k}} = \lfloor * \rfloor {\frac{n-1}{b^k}}\). This case occurs when \(r \ne 0\). If this is the case, then we conclude \({}_{SS} M_b(k;P_n) = \lfloor * \rfloor {\frac{n-1}{b^k}}\left( 2n - \lfloor * \rfloor {\frac{n-1}{b^k}}b^{k}-b^{k}\right) = {}_{vdC} M_b(k;P_n)\) and we are done.

  2. 2.

    \(\lfloor * \rfloor {\frac{n}{b^k}} \ne \lfloor * \rfloor {\frac{n-1}{b^k}}\). In this case, \(r = 0\) and \(\lfloor * \rfloor {\frac{n}{b^k}} = \frac{n}{b^k} = q\), as well as \(\lfloor * \rfloor {\frac{n-1}{b^k}} = \frac{n}{b^k} - 1 = q-1\). Then, we can write \({}_{SS} M_b(k;P_n)\) as

    $$\begin{aligned} {}_{SS} M_b(k;P_n)&= \frac{n}{b^k}\Bigl (2n - \frac{n}{b^k} b^k - b^k\Bigr ) = \frac{n}{b^k}(2n-n-b^k) = \frac{n^2}{b^k} - n. \end{aligned}$$

    Similarly, we can write \({}_{vdC} M_b(k;P_n)\) as:

    $$\begin{aligned} {}_{vdC} M_b(k;P_n)&= \Bigl (\frac{n}{b^k}-1\Bigr )\Bigl (2n - \Bigl (\frac{n}{b^k}-1\Bigr ) b^k - b^k\Bigr ) \\&= \Bigl (\frac{n}{b^k}-1\Bigr )n = \frac{n^2}{b^k} - n. \end{aligned}$$

    Thus, \({}_{SS} M_b(k;P_n) = {}_{vdC} M_b(k;P_n)\), as needed.

Hence, we find that \({}_{SS} M_b(k;P_n) = {}_{vdC} M_b(k;P_n)\) and thus that \({}_{SS} N_b(k;P_n) = {}_{vdC} N_b(k;P_n)\) for all k. This implies that \({}_{SS} \psi (x,y) = {}_{vdC} \psi (x,y)\) for all \((x,y) \in [0,1)^2\). Thus, the estimator implemented using the base b stratified sampling has the same first two moments as the scrambled vdC estimator. \(\square \)

Since n is not necessarily an integer power of b, our estimator does not inherit properties of stratified sampling with proportional allocation. However, its connection to nested scrambling implies we can apply results about the variance of scrambled estimators for (0, 1)-sequences as found, e.g., in Gerber (2015). That is, even though the variance is not guaranteed to be no higher than for Monte Carlo for any function (as a purely stratified sampling estimator based on proportional allocation would), the superior asymptotic bounds for the variance of a scrambled estimator apply. Also, since we know the scrambled vdC estimator produces an unbiased estimator (Dong and Lemieux 2022) this means Algorithm 5 also produces an unbiased estimator.

4.4 Comparative Efficiency Analysis for Fixed n

Using the above connection between scrambling and base b stratified sampling, for fixed n we can implement scrambling by sampling the \(N_j\)’s as in Algorithm 5 and once those intervals where a point will be placed have been identified, we simply place a point uniformly in that interval. This approach is computationally more efficient than proceeding via recursive permutations, as is required when implementing nested scrambling.

The recursive Algorithm 5 for base b stratified sampling has \(\lceil \log _b(n) \rceil \) layers, and each call to the function has b sub-calls. Thus, in total, there are \(b + b^2 + \ldots + b^{\lceil \log _b(n) \rceil } = O(n)\) operations. For nested scrambling of the vdC sequence, we need to scramble the first \(\lfloor \log _b(n-1) \rfloor + 1\) digits for each of the n points. This means that the number of operations needed is \(O(n \log (n))\). In addition, permutations have to be stored in a lookup dictionary. There are b combinations of 1 digit, \(b^2\) combinations of 2 digits, and so on. Thus, the amount of storage needed for nested scrambling is \(b+2b^2 + \ldots + (\lfloor \log _b(n-1) \rfloor + 1)b^{\lfloor \log _b(n-1) \rfloor + 1} = O(n\log (n))\). Another advantage of the base b stratified sampling algorithm is that it does not require any additional storage for a lookup dictionary, as the nested scrambling algorithm does.

Figure 5 shows the runtime needed to generate \(n = 2000, 4000, \ldots , 200\,000\) points from the scrambled base 4 vdC sequence needed to construct point sets on the triangle. Our base b stratified sampling implementation is compared with nested scrambling and shown to be more computationally efficient. Furthermore, an increase in runtime occurs only at integer powers of 4. We also report, in the legend, a growth rate of the runtime \(\alpha \), such that the runtime is proportional to \(n^{\alpha }\). This is estimated by the regression coefficient \(\alpha \) of \(\mathrm{log}(\mathrm{runtime}) = \alpha \mathrm{log}(n) + c\). \(\alpha \) is estimated to be 1.07 for stratified sampling and 1.21 for nested scrambling, showing that, indeed, stratified sampling has close to linear time complexity while nested scrambling has a higher time complexity.

Fig. 5
figure 5

Runtime comparison for generating a scrambled vdC sequence in base 4 using Owen’s Scrambling vs Stratified Sampling. Estimated growth rate for the runtimes are in parentheses in the legends

4.5 From the One-dimensional vdC Sequence to the Triangular vdC Sequence

It is now straightforward to use the ideas presented thus far for the one-dimensional vdC sequence to construct a scheme to sample n points on an arbitrary triangle. The M “strata” are now the sub-triangles, and we can use Algorithm 5 to sample the number of points in each sub-triangle, say \(N_1,\dots ,N_M\). Then, sample \(N_j\) (either 0 or 1) points uniformly in each sub-triangle for \(j=1,\dots ,M\), for example, using one of the methods described in Section 2.3. This estimator can also be extended using the method described in Appendix.

An algorithm similar to Algorithm 3 can be used to sample in the triangle – the steps are described in Algorithm 6. Again, the sampling scheme subdivides the triangle into a finer and finer partition of triangles until each sub-triangle gets at most one point. The differences are that we now fill the sub-triangles in a non-deterministic order (that still satisfies the equidistribution properties), and as well, the points within each sub-triangle are uniformly placed rather than put in the centre.

Algorithm 6

(Mapping the stratified sampling estimator to the triangle). Given input \(n\ge 1\) and target triangle \(\triangle (A,B,C)\), generate the first n points, outputted in an \(n \times 2\) array \(\varvec{x}\), as follows:

  1. 1.

    Generate a vector of indexes \(I_1, \ldots , I_n\) using Algorithm 5 with base \(b = 4\). Each of the indexes \( 0 \le I_i \le 4^{\lceil \log _4 n \rceil } - 1\), so we know that the base 4 representation is finite for each index. The representation has at most \(K_i=\lceil \log _4(i)+1\rceil \) digits, which means that the expansion is finite, and we do not have to infinitely divide the triangle into sub-triangles.

  2. 2.

    For \(i=1,\dots ,n\):

    1. (a)

      Compute \((d_0,\dots ,d_{K_i})\) such that \(I_i=\sum _{k = 0}^{K_i}d_k4^k\);

    2. (b)

      Initialize \(T = \triangle (A,B,C)\);

    3. (c)

      For \(j=0,\dots ,K_i\),

      • Update \(T=T(d_j)\) using (4);

    4. (d)

      Set the \(i^{th}\) point \(\varvec{x}[i]\) to be a random uniformly sampled point within T. In our implementation, we use the “root” method as described in Section 2.3;

  3. 3.

    Return \(\varvec{x}\).

This avoids the poor projection properties of the original triangular vdC sequence as seen in Fig. 6, and it provides a way to randomize the point set to allow for error estimation.

Fig. 6
figure 6

Examples of the triangular vdC points generated on \(\triangle _E\), with \(n = 10\) (top) and \(n = 16\) (bottom). The images on the left have each point at the centre of the terminal sub-triangle, while the images on the right have the points scrambled. For the \(n=10\) case, the sub-triangles with points are selected via stratified sampling

We note that this method of mapping points from the one-dimensional vdC sequence to the two-dimensional triangle can be done in other bases – for any base b such that b is a perfect square, we can map a scrambled vdC sequence in base b to the triangle, which is simply divided into b sub-triangles of equal size. The enumeration of the sub-triangles does not matter – after scrambling, (or, equivalently, stratified sampling), the points are uniformly distributed between the sub-triangles.

Proposition 4

(Unbiasedness of \(\hat{\mu }_{scr,n}\)). The estimator \(\hat{\mu }_{scr,n}\) is unbiased.

Proof

The statement is equivalent to showing that each point is marginally uniformly distributed over the triangle. Without loss of generality, the marginal pdf f for the point \(\varvec{u} = (u_1,u_2)\) uniformly distributed over \(\triangle _E\) is:

$$\begin{aligned} f(\varvec{u}) = {\left\{ \begin{array}{ll} \frac{2}{\sin (\pi /3)}, &{} \mathrm{ if } 0 \le u_1 \le 1/2 \mathrm{ and } u_2 \le u_1 \tan (\pi /3) \\ &{} \mathrm{ or } 1/2 \le u_1 \le 1 \mathrm{ and } (1-u_1) \tan (\pi /3), \\ 0, &{} \mathrm{ otherwise. } \end{array}\right. } \end{aligned}$$

For the stratified sampling estimator with given \(n\ge 1\), we subdivide the triangle into \(4^q\) sub-triangles each with area \(\frac{2}{\sin (\pi /3) 4^q}\). Since each sub-triangle is chosen with equal probability (as the \(N_j\) share the same marginal distribution), and we uniformly sample within each of the n sub-triangles, the value of the pdf f within the sub-triangle is equal to the reciprocal of its area, \(\frac{\sin (\pi /3) 4^q}{2}\). Then, a point sampled using the stratified sampling algorithm has marginal pdf \(g(\varvec{u})\) given by

$$\begin{aligned} g(\varvec{u})&= {\left\{ \begin{array}{ll} \frac{\sin (\pi /3) 4^q}{2 \times \mathrm{\# sub-triangles}}, &{} \mathrm{ if } \varvec{u} = (u_1, u_2) \in \triangle _E, \\ 0, &{} \mathrm{ otherwise }, \end{array}\right. } \\&= {\left\{ \begin{array}{ll} \frac{2}{\sin (\pi /3)}, &{} \mathrm{ if } \varvec{u} = (u_1, u_2) \in \triangle _E, \\ 0, &{} \mathrm{ otherwise. } \end{array}\right. } \end{aligned}$$

Thus, since \(g(\varvec{u}) = f(\varvec{u})\), the stratified sampling algorithm samples uniformly over the triangle and thus the resulting estimator is unbiased. \(\square \)

5 Numerical Experiments

Now that we have described our proposed lattice and randomized triangular vdC constructions for a point set on the triangle, we compare their performance on numerical integration problems with existing constructions. There are very few, if any, numerical experiments on the triangle that compare RQMC integration variances, so we hope that these experiments give insight towards the performance of the various methods.

To test the performance of the different triangular constructions, we consider the following 2-dimensional test functions over the right-angle triangle \(\triangle _R\) with corners at (0, 0), (0, 1), (1, 0):

  1. 1.

    \(f_1(x,y) = ((|x-\beta |+y)^d + (|y-\beta |+x)^d)/2\). This function has two singularities, so we anticipate this function to be harder to integrate than the others. This function is based on \(f(x,y) = (|x-\beta |+y)^d\) from Pillards and Cools (2005). The integral evaluates to \(\frac{1}{(d+1)(d+2)} \left( (d+1/2)(1-\beta )^{d+2} + (\beta +1)^{d+2}/2 - \beta ^{d+2} \right) \) over \(\triangle _R\).

  2. 2.

    \(f_2(x,y) = \cos (2\pi \beta +\alpha _1x+\alpha _2y)\), from Pillards and Cools (2005). This is a smooth oscillatory function. The integral evaluates to \(\frac{1}{\alpha _2} ( \frac{1}{\alpha _1-\alpha _2}(\cos (2\pi \beta +\alpha _2)-\cos (2\pi \beta +\alpha _1)) + \frac{1}{\alpha _1}(\cos (2\pi \beta +\alpha _1)-\cos (2\pi \beta )))\) over \(\triangle _R\).

  3. 3.

    \(f_3(x,y) = x^{\alpha _3} + y^{\alpha _3}\). Since this is the sum of univariate functions, it will help us determine if poor one-dimensional projections affect the integration power of the point set. The integral evaluates to \(\frac{2}{(\alpha _3+1) (\alpha _3+2)}\) over \(\triangle _R\).

We use \(\beta =0.4\), \(d=-0.9\), \(\alpha _1=e^3\), \(\alpha _2=e^2\), \(\alpha _3=2.5\) and estimate \(\mu _j=\int _{\triangle _R}f_j(\varvec{x})\,\mathrm{d} \varvec{x}\), whose theoretical values are known for \(j=1,2,3\). Figure 7 displays \(f_k\) for \(k=1,2,3\) with these parameter settings. Although the figures show the functions over the unit square, we integrate over the right-angle triangle \(\triangle _R\) only.

Fig. 7
figure 7

Test functions \(f_1\) (left), \(f_2\) (middle) and \(f_3\) (right) used in the numerical study

We also estimate the value at time 0 of a European basket call option with maturity \(T=1\) based on two underlying assets that follow a lognormal distribution each. Formally, the value of the option is

$$\begin{aligned} C_{0}=\mathrm{E}\left[ \max \left( 0, e^{-r T}\left( \frac{1}{2} \sum _{j=1}^{2} S_j(T)-K\right) \right) \right] , \end{aligned}$$

where \(S_j(T)\) is the price of the asset j at maturity \(T=1\). The lognormal model means we can write \(S_j(T)=S_j(0) e^{(r-\sigma ^{2} / 2)T+\sigma \sqrt{T} Z}\), where \(Z \sim {\mathrm{N}}(0,1)\) is a standard normal random variable.

For our applications, we use the following parameters: the strike price K is either 45, 50, or 55 (to account for out-of-the-money, at-the-money, and in-the-money); the asset price at time 0 \(S_0=50\); the risk-free rate \(r=0.05\); the volatility \(\sigma =0.3\), and the maturity \(T=1\) year.

We consider two types of dependence structures between the two lognormal assets in the basket. Each dependence structure is the copula implied by the uniform distribution on a triangle. We start with the following two triangles: \(\triangle \left( (0,0), (0,1) (1,0)\right) \) and \(\triangle \left( (1,1), (0,1) (1,0)\right) \). For each triangle, we generate point sets on that triangle and transform them to marginally standard uniform distributions with their marginal CDFs, \(F(x_i) = 1-(1-x_i)^2\) for \(i = 1,2\) for \(\triangle \left( (0,0),(0,1),(1,0)\right) \) and \(F(x_i) = 1-x_i^2\) for \(i = 1,2\) for \(\triangle \left( (1,1),(0,1),(1,0)\right) \) to obtain samples from the corresponding copula. Figure 8 shows such copula samples, where the underlying triangles were generated using the rSobol’ + root method as detailed in the following section. The triangle-implied-copulas are used to model scenarios in which the asset prices are not simultaneously high (first triangle) or not simultaneously low (second triangle).

Fig. 8
figure 8

Samples from the copulas based on \(\triangle \left( (0,0), (0,1) (1,0)\right) \) (left) and \(\triangle \left( (1,1), (0,1) (1,0)\right) \) (right) used in the basket option pricing example

We compare the following randomized methods to estimate \(\mu _j\).

  1. 1.

    PRNG + root: This is equivalent to the MC method; we generate pseudo-random points in the unit square and then apply the “root” method from Pillards and Cools (2005).

  2. 2.

    rSobol’ + root: Here, the Sobol sequence randomized with a digital shift is generated, and then the “root” method from Pillards and Cools (2005) is applied.

  3. 3.

    rLattice1: This refers to using Algorithm 4 with the rank-1 lattice of Cools et al. (2006) and \(\alpha =3\pi /8\), randomized with a shift.

  4. 4.

    rLattice2: This refers to the rank-2 lattice of Basu and Owen (2015), randomized with a shift. This method is not extensible, so a new point set must be generated every time n is changed.

  5. 5.

    rvdC: This refers to our randomized triangular vdC sequence based on stratified sampling.

If necessary, Algorithm 1 is then used to transform the points to the desired triangle. The first \(n=1000\) points from these constructions are shown in Fig. 9.

Fig. 9
figure 9

First \(n=1000\) points of the point sets used in our experiments

For each method and each sample size \(n \in 2^4, 2^5, \dots 2^{17}\), \(v=25\) randomizations were used. The estimates are obtained as the sample average of the realizations, while the variance is estimated as the sample variance of the v independent draws. Figure 10 displays the results for the two-dimensional test functions. We also report the convergence speed (as measured by the regression coefficient \(\alpha \) of \(\mathrm{log}(\widehat{\mathrm{Var}}) = \alpha \mathrm{log}(n) + c\) displayed in the legend). We can see from the results that the bivariate Sobol’ sequence mapped to the triangle using the “root” method is typically the best performing method on these test functions. It does particularly well on function \(f_2\), which can be explained by the parameters chosen for this function, in conjunction with how the root transformation is applied to map to the triangle \(\triangle _R\). Indeed, since we use the mapping \((1-\sqrt{u_1},\sqrt{u_1}u_2)\), it means the corresponding function on \([0,1)^2\) is smoother than \(f_2\), i.e., it oscillates with a lower frequency. This is unlike what is happening for function \(f_3\), where the methods sampling directly on the triangle deal with a sum of univariate functions, while the corresponding function on \([0,1)^2\) based on the root transformation is truly two-dimensional.

Fig. 10
figure 10

Estimates (left) and estimated variances (right) when integrating \(f_1\) (top), \(f_2\) (middle) or \(f_3\) (right). For each n, \(v=25\) randomizations were used. Regression coefficients for the estimated variance are in parentheses in the legends

Our stratified sampling method is approximately equal in performance to the lattice-based methods, except for \(f_3\) where it does better and is in fact essentially as good as the Sobol’ + root method. The variance reduction of stratified sampling is more pronounced for functions where the within-strata variance is small and the between-strata variance is larger. Nested scrambling also has been shown to have significant variance reductions for smooth functions. Thus, it is unsurprising that for the function \(f_3\), the stratified sampling scheme has the best performance out of the three test functions, as it is the smoothest function with the most between-strata variance.

Figures 11 and 12 display the results for the basket option pricing problem. Again, we also report the convergence speed in the legend. For all strike prices, the results are based on the same \(v = 25\) randomized point sets. As was the case for the two-dimensional test functions, we can see from the results that the bivariate Sobol’ sequence mapped to the triangle using the “root” method is typically the best performing method on these test functions. Our stratified sampling-based estimator tends to outperform the lattice-based methods. These results suggest that point sets directly constructed to have a low discrepancy in the integration space of interest may be outperformed by point sets obtained by applying a well-chosen transformation to a low-discrepancy point set constructed over the unit cube. Gaining a better understanding of when and why this happens is something that we plan to investigate in the near future.

Fig. 11
figure 11

Estimates when integrating the basket option pricing problem. For each n, \(v=25\) randomizations were used

Fig. 12
figure 12

Estimated variances when integrating the basket option pricing problem. For each n, \(v=25\) randomizations were used. Regression coefficients are in parentheses in the legends

6 Conclusion

In this paper, we provided an extensible rank-1 lattice construction for points in the triangle that can be randomized with a shift. We also examined the projection qualities of the triangular vdC sequence, and we improved upon the triangular vdC sequence of Basu and Owen (2015) by proposing a sampling scheme that uses their idea of recursively subdividing the triangle, but with superior one-dimensional projections. We also showed that the scrambled sequence can be efficiently implemented using stratified sampling, giving the benefits of the reduced variance without the additional computational costs. This connection between stratified sampling and scrambling also gives an extensible stratified estimator. We give a test suite of functions and include a numerical study to compare the different sampling constructions over a triangular region.

Future work in this area includes extending similar stratified sampling schemes onto other surfaces, such as those of spheres and simplexes. We would also like to explore constructions and applications that require sampling on multiple triangles, such as surfaces constructed with a mesh of triangles.