1 Introduction

Texts on numerical methods abound with formulas for numerical integration, sometimes called quadrature or mechanical quadrature [1, 2]. This abundance is not surprising, since there are so many possibilities for selecting the base-point spacing, the degree of the interpolating polynomial, and the location of the base points with respect to the interval of integration. The function f(x) to be integrated may be a known function or a set of discrete data. Many known functions, however, do not have an exact integral, and an approximate numerical procedure is required to compute it. In other cases, f(x) is known only as a set of discrete points, and again an approximate numerical procedure is required [1, 2]. Numerical integration formulas can be developed by fitting an approximating function to the discrete data and integrating that approximating function:

$$I = \int\limits_{{x_{1} }}^{{x_{n} }} {f(x)dx \cong \int\limits_{{x_{1} }}^{{x_{1} + (n - 1)h}} {P(x)dx} }$$
(1)

When the function to be integrated has known values at equally spaced points (Δx = h = constant) and n is the number of points, with x ranging as x1, x2 = x1 + h, x3 = x1 + 2h, …, xn−1 = x1 + (n − 2)h, xn = x1 + (n − 1)h, a polynomial P(x) can be fitted to the discrete data [2, 3]. The resulting formulas, which employ functional values at equally spaced base points, are called Newton–Cotes formulas.

The distance between the lower and upper limits of an integral is called the range of integration, and the distance between any two adjacent data points is called an increment or step (Δx = h) [1, 3,4,5].
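For concreteness, the following minimal sketch (Python; the sample data and step size are illustrative choices, not taken from the paper) evaluates the composite trapezoid rule, the simplest Newton–Cotes formula, on equally spaced samples of the kind assumed in Eq. (1).

```python
import numpy as np

def trapezoid(y, h):
    """Composite trapezoid rule: h*(y_1/2 + y_2 + ... + y_{n-1} + y_n/2)."""
    y = np.asarray(y, dtype=float)
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

x = np.linspace(0.0, 1.0, 11)            # n = 11 equally spaced points, h = 0.1
print(trapezoid(x**2, x[1] - x[0]))      # ~0.3350; exact integral of x^2 on [0, 1] is 1/3
```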

2 State of the Art on Numerical Quadrature

There is a large literature on numerical integration, also called quadrature. Of special importance are the midpoint rule and Simpson’s rule: they are simple to use and bring enormous improvements for smooth functions in low dimensions [6,7,8]. The advantage of classical quadrature methods, however, decays rapidly with increasing dimension. This phenomenon is a manifestation of Bellman’s ‘curse of dimensionality’, made precise, in both its deterministic and Monte Carlo versions, in two classic theorems of Bakhvalov.

The trapezoid rule is based on a piecewise linear approximation: it integrates exactly any function f that is linear on each segment [xi−1, xi], using two evaluation points at the ends of each segment [9,10,11]. The midpoint rule also integrates such a function exactly, using just one point in the middle of each segment; it benefits from an error cancellation. This kind of cancellation plays a big role in the development of classical quadrature methods.

The midpoint rule has a big practical advantage over the trapezoid rule: it does not evaluate f at either endpoint a or b. Many of the integrals to which we apply Monte Carlo methods diverge to infinity at one or both endpoints. In such cases, the midpoint rule avoids the singularity. There are numerous mathematical techniques for removing singularities [12,13,14]. When we have no such analysis of our integrand, perhaps because it has a complicated problem-dependent formulation, or because we have hundreds of integrands to consider simultaneously, then avoiding the singularity is attractive. By contrast, the trapezoid rule does not avoid the endpoints x = a and x = b. For such methods a second, less attractive principle is to ignore the singularity, perhaps by using f(xi) = 0 at any sample point xi where f is singular.
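As an illustration of this point, a minimal compound-midpoint sketch follows; the integrand \(1/\sqrt{x}\), which is singular at the lower endpoint, and the panel count are my own choices, not from the text.

```python
import numpy as np

def midpoint(f, a, b, m):
    """Compound midpoint rule with m panels on [a, b]."""
    h = (b - a) / m
    x = a + h * (np.arange(m) + 0.5)     # panel midpoints; excludes a and b
    return h * f(x).sum()

# f diverges at x = 0, but the rule never evaluates it there.
print(midpoint(lambda x: 1.0 / np.sqrt(x), 0.0, 1.0, 10_000))   # ~1.994, exact value 2
```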

The midpoint and trapezoid rules are based on correctly integrating piecewise constant and linear approximations to the integrand. That idea extends naturally to methods that locally integrate higher order polynomials [15,16,17]. The result is much more accurate integration, at least when the integrand is smooth. The idea behind Simpson’s rule generalizes easily to higher orders. We split the interval [a, b] into panels, find a rule that integrates a polynomial correctly within a panel, and then apply it within every panel to get a compound rule.
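A minimal sketch of such a compound rule, applying Simpson’s rule within each panel (the test integrand is an illustrative choice):

```python
import numpy as np

def simpson(y, h):
    """Compound Simpson rule: quadratics are integrated exactly in each panel."""
    y = np.asarray(y, dtype=float)
    if len(y) % 2 == 0:
        raise ValueError("compound Simpson needs an odd number of points")
    return (h / 3) * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-2:2].sum())

x = np.linspace(0.0, np.pi, 21)
print(simpson(np.sin(x), x[1] - x[0]))   # ~2.0000, exact value 2; error O(h^4)
```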

There are two main varieties of compound quadrature rule. For open rules we do not evaluate f at the end-points of the panel. The midpoint rule is open. For closed rules we do evaluate f at the end-points of the panel [18, 19]. The trapezoid rule and Simpson’s rule are both closed. Closed rules have the advantage that some function evaluations get reused when we increase n. Open rules have a perhaps greater advantage that they avoid the ends of the interval where singularities often appear.

The trapezoid rule and Simpson’s rule use n = 2 and n = 3 points respectively within each panel. In general, one can use n points to integrate polynomials of degree n − 1, yielding the Newton–Cotes formulas, of which the trapezoid rule and Simpson’s rule are special cases [12, 20, 21]. The Newton–Cotes rule for n = 4 is another of Simpson’s rules, called Simpson’s 3/8 rule. Newton–Cotes rules with an odd number of points have the advantage that, by symmetry, they also integrate polynomials of degree n exactly, as we saw already in the case of Simpson’s rule.

High order rules should be used with caution [22,23,24]. They exploit high order smoothness in the integrand, but can give poor outcomes when the integrand is not as smooth as they require. In particular if a genuinely smooth quantity has some mild nonsmoothness in its numerical implementation f, then high order integration rules can behave very badly, magnifying this numerical noise.

As a further caution, note that taking f fixed and letting the order n in a Newton–Cotes formula increase does not always converge to the right answer, even for f with infinitely many derivatives. Lower-order rules applied in panels are more robust [23,24,25]. The Newton–Cotes rules can be made into compound rules in the same way that Simpson’s rule was compounded. When the basic method integrates polynomials of degree r exactly within panels, the compound method has error \(O(m^{-r})\), assuming that \(f^{(r)}\) is continuous on [a, b].

The rules considered above evaluate f at equispaced points. The basic panel for a Gauss rule is conventionally [− 1, 1], or sometimes ℝ, and not [0, h] as we used for the Newton–Cotes rules. Also, the target integration problem is generally weighted. The widely used weight functions are multiples of standard probability density functions, such as the uniform, gamma, Gaussian and beta distributions [12, 24, 26]. The idea is that having f be nearly a polynomial can be much more appropriate than requiring the whole integrand f(x)w(x) to be nearly a polynomial. Choosing the weights \(w_i\) and nodes \(x_i\) together yields 2n free parameters, making it possible to integrate polynomials of degree up to 2n − 1 without error.

Unlike Newton–Cotes rules, Gauss rules of high order have non-negative weights, so we could in principle use a very large n. For the uniform weighting w(x) = 1, though, we could also break the region into panels; then for m function evaluations the error will be \(O(m^{-2n})\), assuming as usual that \(f^{(2n)}\) is continuous on [a, b]. Gauss rules for uniform weights on [− 1, 1] have the advantage that they can be used within panels.
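The sketch below uses NumPy’s Gauss–Legendre routine to illustrate the degree 2n − 1 exactness property; the specific n and test functions are arbitrary choices.

```python
import numpy as np

n = 5
x, w = np.polynomial.legendre.leggauss(n)   # nodes x_i and weights w_i on [-1, 1]

print(np.dot(w, x**8))                      # 0.2222... = 2/9, exact (degree 8 < 2n)
print(np.dot(w, np.exp(x)))                 # ~2.3504 = e - 1/e, very accurate for smooth f
```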

Quadrature rules offer an elegant and efficient way to numerically evaluate integrals of functions from a linear space under consideration [24,25,26,27]. These rules typically require function evaluation at specific points, called nodes, and these values are multiplied by constants, called weights, to give the final value of the integral as a weighted sum.

There is an extensive variety of quadrature rules, depending on the dimension (f univariate, bivariate, multivariate), the domain shape (disc, hypercube, simplex), and the type of linear space (polynomials, splines, rational functions, smooth non-polynomial functions) [12, 13, 28, 29]. For polynomial multivariate integration, the field is well studied. In the univariate case, much research has been devoted to piecewise polynomials, to address integration over the spline spaces arising in isogeometric analysis. A so-called half-point rule for splines, which needs the minimum number of quadrature points, has been introduced. However, this rule is in general exact only over the whole real line (an infinite domain).

For finite domains, one may introduce additional quadrature points, which make the rule non-Gaussian (slightly suboptimal in the number of quadrature points); more importantly, this yields quadrature weights that can be negative, unlike in Gaussian quadratures. When computing Galerkin approximations, assembling the mass and stiffness matrices is the bottleneck of the whole computation, and efficient quadrature rules for specific spline spaces are needed to evaluate the matrix entries efficiently [12, 13, 26, 29,30,31].

In the multivariate case, where spline spaces possess a tensor-product structure, univariate quadrature rules are typically used in each direction, resulting in tensor-product rules. Recently, the paradigm of mass and stiffness matrix computation has changed from the traditional element-wise assembly to a row-wise concept [32,33,34]. When building the mass matrix, one B-spline basis function of the scalar product involved is treated as a positive measure (i.e., a weight function), and a weighted quadrature with respect to that weight is computed for each matrix row. Such an approach brings significant computational savings because the number of quadrature points in each parameter dimension is independent of the polynomial degree and requires asymptotically (for a large number of elements) only two points per element. For unstructured multivariate cases, however, such as triangular meshes in 2D, constructing efficient quadrature rules from tensor-product counterparts is unnatural, resulting in rules that are often not symmetric even though they act on a symmetric domain [35,36,37].

Classical quadrature methods are very well tuned to one-dimensional problems with smooth integrands. A natural way to extend them to multi-dimensional problems is to write these as iterated one-dimensional integrals, via Fubini’s theorem [12, 13, 38, 39]. When we estimate each of those one-dimensional integrals by a quadrature rule, we end up with a set of sample points on a multi-dimensional grid. Unfortunately, there is a curse of dimensionality that severely limits the accuracy of this approach. The curse is not confined to sampling on grids formed as products of one-dimensional rules: any quadrature rule in high dimensions will suffer from the same problem. Two important theorems of Bakhvalov make the point.

Bakhvalov’s theorem makes high-dimensional quadrature seem intractable: there is no way to beat the rate \(O(n^{-r/d})\), no matter where you put your sampling points \(x_i\) or how cleverly you weight them. At first, this result looks surprising, because we have been using Monte Carlo methods, which get a root mean square error of \(O(n^{-1/2})\) in any dimension. The explanation is that in Monte Carlo sampling we pick one single function f(·) with finite variance \(\sigma^{2}\) and then, sampling n uniform random points, get a root mean square error of \(\sigma n^{-1/2}\) for the estimate of that function’s integral. Bakhvalov’s theorem works in the opposite order [40]: we pick our points x1, …, xn and their weights wi, and then Bakhvalov finds a function f with r derivatives on which our rule makes a large error. When we take a Monte Carlo sample, there is always some smooth function for which we would have got a very bad answer. Such worst-case analysis is very pessimistic, because the worst-case functions could behave very oddly right near our sampled points x1, …, xn, and might look nothing like the functions we are trying to integrate. We can hybridize quadrature and Monte Carlo methods by using each of them on some of the variables. Hybrids of Monte Carlo and quasi-Monte Carlo methods are often used [5,6,7, 41,42,43].
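A small simulation, with an arbitrarily chosen integrand, illustrating the \(\sigma n^{-1/2}\) root mean square error quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**2                       # fixed integrand; exact integral on [0, 1] is 1/3

for n in (10**2, 10**4, 10**6):
    estimates = [f(rng.random(n)).mean() for _ in range(100)]
    rmse = np.sqrt(np.mean((np.array(estimates) - 1/3) ** 2))
    print(n, rmse)                       # shrinks roughly like n**-0.5
```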

The Laplace approximation is a classical device for approximate integration. It is very accurate when log(f(x)) is smooth and its quadratic approximation is good wherever f is not negligible. This often happens when x is a statistical parameter, f(x) is its posterior distribution, and the sample size is large enough for the central limit theorem to apply [20, 21, 44,45,46].

The Laplace approximation is now overshadowed by Markov Chain Monte Carlo. One reason is that the Laplace approximation is designed for unimodal functions. When the target distribution has two or more important modes, the space can perhaps be cut into pieces containing one mode each, with Laplace approximations applied separately and combined, but such a process can be cumbersome. Markov Chain Monte Carlo, by contrast, is designed to find and sample from multiple modes, although on some problems it will have difficulty doing so. The Laplace approximation also requires finding the optimum of a d-dimensional function and working with the Hessian at the mode. In some settings that optimization may be difficult, and when d is extremely large, finding the determinant of the Hessian can be a challenge. Finally, posterior distributions that are discrete, or are mixtures of continuous and discrete parts, can be handled by Markov Chain Monte Carlo but are not suitable for the Laplace approximation. Still, the Laplace approximation is not completely superseded by Markov Chain Monte Carlo. In particular, the fully exponential version is very accurate for problems with modest dimension d and large sample size n, and when the optimization problem is tractable it may provide a much more automatic and fast answer than Markov Chain Monte Carlo does [12, 17, 23, 31, 34, 47].

There is some mild controversy about the use of adaptive methods. There are theoretical results showing that adaptive methods cannot improve significantly over non-adaptive ones. There are also theoretical and empirical results showing that adaptive methods may do much better than non-adaptive ones. These results are not contradictory, because they make different assumptions about the problem. For a high-level survey of when adaptation helps, see [4, 47,48,49].

Sparse grids were originally developed for the quadrature of high-dimensional functions. The method is always based on a one-dimensional quadrature rule but performs a more sophisticated combination of the univariate results. However, whereas the tensor-product rule guarantees that the weights of all the cubature points will be positive if the weights of the quadrature points are positive, Smolyak’s rule does not guarantee that the weights will all be positive [39,40,41, 48].

Bayesian Quadrature is a statistical approach to the numerical problem of computing integrals and falls under the field of probabilistic numerics [42,43,44]. It can provide a full handling of the uncertainty over the solution of the integral expressed as a Gaussian Process posterior variance. It is also known to provide very fast convergence rates which can be up to exponential in the number of quadrature points n.

The problem of evaluating an integral can be reduced to an initial value problem for an ordinary differential equation by applying the fundamental theorem of calculus. For instance, the standard fourth-order Runge–Kutta method applied to the differential equation y′(x) = f(x) yields Simpson’s rule. In our view, however, numerical quadrature is too often studied as merely the numerical solution of a differential equation; numerical integration is a more general subject, distinct from the study of differential equations [50,51,52,53].
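This reduction can be checked directly: for y′(x) = f(x) the two middle Runge–Kutta stages coincide, and each step reproduces Simpson’s rule. A minimal sketch, with an illustrative test integral:

```python
import numpy as np

def rk4_integral(f, a, b, steps):
    """Classical RK4 applied to y'(x) = f(x), y(a) = 0; returns y(b)."""
    h = (b - a) / steps
    y, x = 0.0, a
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + h / 2)                 # equals k3, since f does not depend on y
        k4 = f(x + h)
        y += (h / 6) * (k1 + 4 * k2 + k4) # exactly Simpson's rule on [x, x + h]
        x += h
    return y

print(rk4_integral(np.sin, 0.0, np.pi, 10))   # ~2.0000, exact value 2
```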

3 Newton–Cotes Numerical Integration Formulas

The Newton–Cotes formulas are shown for comparison with the new formulas obtained using splines. The closed integration formulas use information about f(x) at both limits of integration, that is, they have base points there. The trapezoid rule for a single interval is obtained by fitting a first-degree polynomial to two discrete points [1, 2, 4, 26]. Simpson’s 1/3 rule is obtained by fitting a second-degree polynomial to three equally spaced discrete points. Simpson’s 3/8 rule is obtained by fitting a third-degree polynomial to four equally spaced discrete points. Boole’s rule is obtained by fitting a fourth-degree polynomial to five equally spaced discrete points. Table 1 shows the Newton–Cotes closed integration formulas.

Table 1 Newton–Cotes closed integration formulas

Generally, for the closed formulas, where n is the number of points, ci are the integer coefficients, num is the integer numerator multiplying the step h = Δx, and den is the integer denominator, the closed integration formula is:

$$I = \frac{num}{den}h(ci_{1} y_{1} + ci_{2} y_{2} + ci_{3} y_{3} + ci_{4} y_{4} + \cdots + ci_{n - 2} y_{n - 2} + ci_{n - 1} y_{n - 1} + ci_{n} y_{n} )$$
(2)
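A minimal sketch of Eq. (2) in code, using the standard closed Newton–Cotes coefficient sets (trapezoid, Simpson’s 1/3, Simpson’s 3/8, Boole); the helper name and test integrand are illustrative choices:

```python
import numpy as np

CLOSED_RULES = {   # n: (num, den, [ci_1, ..., ci_n]) as in Eq. (2)
    2: (1, 2, [1, 1]),                   # trapezoid
    3: (1, 3, [1, 4, 1]),                # Simpson's 1/3
    4: (3, 8, [1, 3, 3, 1]),             # Simpson's 3/8
    5: (2, 45, [7, 32, 12, 32, 7]),      # Boole
}

def closed_newton_cotes(y, h):
    num, den, ci = CLOSED_RULES[len(y)]
    return (num / den) * h * np.dot(ci, y)

x = np.linspace(0.0, 1.0, 5)             # 5 points -> Boole's rule
print(closed_newton_cotes(np.exp(x), x[1] - x[0]))   # ~1.718282 = e - 1
```

Open and semi-open formulas such as Eqs. (3), (6) and (9) can be evaluated the same way, with the coefficients of the omitted ordinates set to zero.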

In the open integration formulas, the first (y1) and last (yn) points do not appear. The open formulas therefore do not require information about f(x) at the limits of integration [1, 2, 4, 5, 27]. The midpoint rule for a double interval is obtained by fitting a zero-degree polynomial to three discrete points; the upper limit of integration is x3 = x1 + 2h. Table 2 shows the Newton–Cotes open integration formulas.

Table 2 Newton–Cotes open integration formulas

Generally, for the open formulas, where n is the number of points, ci are the integer coefficients, num is the integer numerator that multiplies step h = Δx and den is the integer denominator, the open integration formula is:

$$I = \frac{num}{den}h(ci_{2} y_{2} + ci_{3} y_{3} + ci_{4} y_{4} + \cdots + ci_{n - 2} y_{n - 2} + ci_{n - 1} y_{n - 1} )$$
(3)

In the semi-open integration formulas, the last point (yn) does not appear in the formula. In the semi-closed integration formulas, the first point (y1) does not appear in the formula [1, 2, 28,29,30]. The formula for a double interval is obtained by fitting a zero-degree polynomial to three discrete points. When n is odd, the semi-closed and semi-open integration formulas coincide with the open integration formulas. The upper limit of integration is x3 = x1 + 2h, and the integral I has the following formula:

$$I = 2h(y_{2}) \quad \text{or} \quad I = 2hy_{2} \qquad \text{Error} \approx O(h^{2})$$
(4)

For three intervals, the semi-open formula is obtained by fitting a first-degree polynomial to four discrete points. The upper limit of the integral is x4 = x1 + 3h, and the integral I has the following formula:

$$I = \frac{3h}{4}(y_{1} + 3y_{3}) \quad \text{or} \quad I = 0.75hy_{1} + 2.25hy_{3} \qquad \text{Error} \approx O(h^{3})$$
(5)

Table 3 shows Newton–Cotes semi-open integration formulas [1,2,3]. Generally, for semi-open formulas, where n is the number of points, the semi-open integration formula is:

$$I = \frac{num}{den}h(ci_{1} y_{1} + ci_{2} y_{2} + ci_{3} y_{3} + ci_{4} y_{4} + \cdots + ci_{n - 2} y_{n - 2} + ci_{n - 1} y_{n - 1} )$$
(6)
Table 3 Newton–Cotes semi-open integration formulas

The semi-closed integration formulas have the same coefficients as the semi-open formulas, with only the indices of the ordinates changed. For example, in the case of n = 3:

$$I = 2h(y_{2}) \quad \text{or} \quad I = 2hy_{2} \qquad \text{Error} \approx O(h^{2})$$
(7)

For n = 4:

$$I = \frac{3h}{4}(y_{2} + 3y_{4}) \quad \text{or} \quad I = 0.75hy_{2} + 2.25hy_{4} \qquad \text{Error} \approx O(h^{3})$$
(8)

Table 4 shows Newton–Cotes semi-closed integration formulas. Generally, for semi-closed formulas, where n is the number of points, ci are integer coefficients, the semi-closed integration formula is:

$$I = \frac{num}{den}h(ci_{2} y_{2} + ci_{3} y_{3} + ci_{4} y_{4} + \cdots + ci_{n - 2} y_{n - 2} + ci_{n - 1} y_{n - 1} + ci_{n} y_{n} )$$
(9)
Table 4 Newton–Cotes semi-closed integration formulas

The semi-open or semi-closed formulas can be used on a type of improper integral, namely one with a lower limit of − ∞ or an upper limit of + ∞. Such integrals can usually be evaluated by a change of variable that transforms the infinite limit into a finite one [1,2,3, 29]. The following identity serves this purpose and works for any function that decreases toward zero at least as fast as \(1/x^{2}\) as x approaches infinity:

$$\int\limits_{a}^{b} {f(x)dx = \int\limits_{1/b}^{1/a} {\frac{1}{{w^{2} }}f\left( {\frac{1}{w}} \right)dw} }$$
(10)

where ab > 0. Therefore, it can be used only when a is positive and b is ∞, or when a is − ∞ and b is negative. For cases where the limits run from − ∞ to ∞, the integral can be split into three parts [1,2,3,4]. For example:

$$\int\limits_{ - \infty }^{\infty } {f(x)dx = } \int\limits_{ - \infty }^{ - A} {f(x)dx} + \int\limits_{ - A}^{A} {f(x)dx} + \int\limits_{A}^{\infty } {f(x)dx}$$
(11)

where A is a positive number. One problem with using Eq. (10) to evaluate an integral is that the transformed function may be singular at one of the limits [1, 3, 30, 31]. The semi-open or semi-closed integration formulas circumvent this difficulty because they evaluate the integral without employing the data at the corresponding end point of the integration range.
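A minimal sketch combining Eq. (10) with an open (midpoint) rule, so the transformed integrand is never evaluated at the troublesome endpoint; the particular integral is an illustrative choice:

```python
import numpy as np

def midpoint(f, a, b, m):
    """Compound midpoint rule with m panels; never touches a or b."""
    h = (b - a) / m
    w = a + h * (np.arange(m) + 0.5)
    return h * f(w).sum()

# integral_1^inf exp(-x)/x dx  ->  substitute x = 1/w per Eq. (10):
# integral_0^1 exp(-1/w)/w dw, whose integrand cannot be evaluated at w = 0.
g = lambda w: np.exp(-1.0 / w) / w
print(midpoint(g, 0.0, 1.0, 1000))       # ~0.2194, the exponential integral E1(1)
```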

4 Spline Interpolation

In applying the Newton–Cotes method, an (n − 1)th-order polynomial is used to interpolate between n data points. This curve captures all the meandering suggested by the points. There are cases, however, where such a function leads to erroneous results because of round-off errors and overshooting. An alternative approach is to apply lower-order polynomials to subsets of the data points. These connecting polynomials are called spline functions [1, 2, 4, 31,32,33].

For example, third-order curves employed to connect each pair of data points are called cubic splines. These functions can be constructed so that the connections between adjacent cubic equations are visually smooth. On the surface, it would seem that the third-order approximation of the splines would be inferior to higher-order expressions, and you might wonder why a spline would ever be preferable. There are, however, situations in which a spline performs better than a higher-order polynomial: the case where a function is generally smooth but undergoes an abrupt change somewhere in the region of interest. A higher-order polynomial then tends to oscillate erratically in the vicinity of the abrupt change, whereas the spline, because it is limited to lower-order changes, keeps the oscillations to a minimum while still connecting the points. Thus the spline usually provides a superior approximation of the behavior of functions that have local, abrupt changes [1, 2, 4, 34,35,36,37].

The concept of the spline originated from the drafting technique of using a thin, flexible strip (called a spline) to draw smooth curves through a set of points. The process is depicted in Fig. 1 for a series of five pins (data points). In this technique, the draftsman places paper over a wooden board and hammers nails or pins into the paper (and board) at the location of the data points [38,39,40,41]. A smooth cubic curve results from interweaving the strip between the pins. Hence, the name “cubic spline” has been adopted for polynomials of this type [1, 3, 4, 12].

Fig. 1

The practical experimental technique of using a spline to draw smooth curves through a series of points. Notice how, at the end points, the spline straightens out. This is called a “natural” spline

Using this practical historical interpolation device, one could also calculate the area under the curve, by using the weight of the sand beneath the spline, as shown in Fig. 2.

Fig. 2

Primitive integration, using the weight of the sand to calculate the area under the curve generated by a spline interpolation

In spline interpolation, between every two points we have a polynomial of a certain degree [1, 2, 4, 41,42,43,44]. Therefore, the interpolation is made not by a single polynomial but by many polynomials. Below, an example with 5 points and 4 third-degree polynomials forming a cubic spline is given. We can see from the equations that there are 16 unknown coefficients (a0,…,a3, b0,…,b3, c0,…,c3, d0,…,d3).

$$f(x) = \begin{cases} P_{a}(x) = a_{0} + a_{1}x + a_{2}x^{2} + a_{3}x^{3}, & \text{if } x_{1} \le x \le x_{2} \\ P_{b}(x) = b_{0} + b_{1}x + b_{2}x^{2} + b_{3}x^{3}, & \text{if } x_{2} \le x \le x_{3} \\ P_{c}(x) = c_{0} + c_{1}x + c_{2}x^{2} + c_{3}x^{3}, & \text{if } x_{3} \le x \le x_{4} \\ P_{d}(x) = d_{0} + d_{1}x + d_{2}x^{2} + d_{3}x^{3}, & \text{if } x_{4} \le x \le x_{5} \end{cases}$$
(12)

As shown in Fig. 3, the objective in using cubic splines is to derive a third-order polynomial for each interval between the knots (between two data points). Thus, for n data points (i = 1, 2,…, n), there are (n − 2) interior points (excluding the first and last points), (n − 1) intervals, and (n − 1) third-order polynomials. Consequently, there are 4(n − 1) unknown constants to evaluate, and 4n − 4 conditions are required to evaluate them. These are as follows:

Fig. 3

Cubic spline with a third-order polynomial (g = 3) for each interval between knots, for 5 points (n = 5) and 4 polynomials (Pa, Pb, Pc and Pd)

1. The function values must be equal at the interior knots (2 conditions for each interior point = 2n − 4 conditions).
2. The first and last polynomials must pass through the end points (2 conditions).
3. The first derivatives at the interior knots must be equal (n − 2 conditions).
4. The second derivatives at the interior knots must be equal (n − 2 conditions).
5. Two derivatives at the first or last knots are zero (2 conditions), chosen from the first to third derivatives of the first and last polynomials: \(P_{1}^{\prime}(x_{1}) = 0\), \(P_{1}^{\prime\prime}(x_{1}) = 0\), \(P_{1}^{\prime\prime\prime}(x_{1}) = 0\), \(P_{n-1}^{\prime}(x_{n}) = 0\), \(P_{n-1}^{\prime\prime}(x_{n}) = 0\) and \(P_{n-1}^{\prime\prime\prime}(x_{n}) = 0\).

Therefore, (2n − 4) + 2 + (n − 2) + (n − 2) + 2 = 4n − 4 conditions, which equals the number of unknown polynomial coefficients.

The visual interpretation of condition 5 is that the function becomes a straight line at the end knots [1, 4, 43,44,45,46,47]. Specifying such an end condition leads to what is termed a “natural” spline, so named because the drafting spline naturally behaves in this fashion. If the value of the second derivative at an end knot is nonzero (that is, there is some curvature), this information can alternatively be used to supply the two final conditions [1, 2, 4, 12].
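For comparison, the sketch below uses SciPy’s CubicSpline (an external implementation, not the construction described in this paper) with the “natural” end condition, then integrates the fitted piecewise cubics directly:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0.0, np.pi, 5)
y = np.sin(x)

spline = CubicSpline(x, y, bc_type='natural')   # S''(x_1) = S''(x_n) = 0
print(spline.integrate(0.0, np.pi))             # ~2.000, exact value 2
```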

Generalizing to polynomials of order g, the objective in gth-order splines is to derive a gth-order polynomial for each interval between knots (between two data points), as in

$$P_{j} (x) = a_{0} + a_{1} x + a_{2} x^{2} + a_{3} x^{3} + a_{4} x^{4} + \cdots + a_{g} x^{g}$$
(13)

Thus, for n data points (i = 1, 2,…, n), there are (n − 2) interior points (excluding the first and last points), (n − 1) intervals, and consequently (n − 1) gth-order polynomials, with (g + 1)(n − 1) unknown constants to be evaluated. Therefore, gn + n − g − 1 conditions are required to evaluate the unknown constants. These are as follows:

1. The function values must be equal at the interior knots (2 conditions for each interior point = 2n − 4 conditions).
2. The first and last polynomials must pass through the end points (2 conditions).
3. The first- through (g − 1)th-order derivatives at the interior knots must be equal ((g − 1)(n − 2) conditions).
4. (g − 1) derivatives at the first or last knots are zero (g − 1 conditions), chosen from the first- through gth-order derivatives of the first and last polynomials: \(P_{1}^{\prime}(x_{1}) = 0\), \(P_{1}^{\prime\prime}(x_{1}) = 0\), \(P_{1}^{\prime\prime\prime}(x_{1}) = 0\), \(P_{1}^{(4)}(x_{1}) = 0\), …, \(P_{1}^{(g)}(x_{1}) = 0\) and \(P_{n-1}^{\prime}(x_{n}) = 0\), \(P_{n-1}^{\prime\prime}(x_{n}) = 0\), \(P_{n-1}^{\prime\prime\prime}(x_{n}) = 0\), \(P_{n-1}^{(4)}(x_{n}) = 0\), …, \(P_{n-1}^{(g)}(x_{n}) = 0\).

Therefore, (2n − 4) + 2 + (g − 1)(n − 2) + (g − 1) = gn + n − g − 1 conditions, which equals the number of unknown polynomial coefficients.

5 New Type of Spline Interpolation

In this new type of spline interpolation, instead of each polynomial passing through only two points, each polynomial passes through m points, as shown in Fig. 4. At each change point, the interpolating polynomial is exchanged for the next one, and the derivatives of the two polynomials are equated there to give a degree of continuity to the overall curve.

Fig. 4

New type of spline interpolation where the polynomial order for each interval between knots is g = 7, for 13 points (n = 13), where each polynomial passes through 4 points (m = 4) and there are 4 polynomials (Pa, Pb, Pc and Pd)

Thus, at the points where the polynomial changes, the value and the successive derivatives of the interpolating polynomials are matched. The number of points (n), the number of polynomials (np), the degree of the polynomials (g), the number of points through which each polynomial passes (m), and the order of the derivatives (d) equalized at the polynomial exchange points are then varied so that the number of equations always equals the number of unknown polynomial coefficients. To complete the equations, natural conditions are used, in which successive derivatives of the first and last polynomials, at the first and last points respectively, are set to zero. In this way, different interpolations and different polynomials are obtained for the same data points.

The coefficients of the interpolating polynomials are obtained by solving a linear system. The equations of the linear system come from the interpolation conditions, where the polynomials pass through given points, \(P_{j}(x_{i}) = y_{i}\); from the derivative conditions, where the derivatives of successive orders are equalized, \(P_{j}^{(g)}(x_{i}) = P_{j+1}^{(g)}(x_{i})\); and from the natural conditions, where derivatives at the first and last points are set to zero, \(P_{1}^{(g)}(x_{1}) = 0\) or \(P_{np}^{(g)}(x_{n}) = 0\). The number of equations in the linear system must equal the number of polynomials (np) multiplied by the degree of the polynomials plus one (g + 1).

Final adjustments can still be made before solving the linear system: equations that contain a certain point can be excluded. For example, we can remove equations that contain the first point (x1, y1), the last point (xn, yn), or the middle point (xn/2, yn/2). This generates integration formulas that do not contain these points.
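A minimal sketch of how such a linear system can be assembled and solved, for one illustrative configuration (n = 5, np = 2, g = 4, m = 3, d = 2, two natural conditions); the indexing, sample data and helper names are my own, not the paper’s code:

```python
import numpy as np

n, g = 5, 4                                  # 5 points, two quartics P1, P2
x = np.arange(n, dtype=float)                # equally spaced, h = 1
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])     # sample data (y = x**2)

def row(xi, deriv, which):
    """Row of the system: coefficients of P_which^(deriv)(xi) in the 10 unknowns."""
    r = np.zeros(2 * (g + 1))
    for k in range(deriv, g + 1):
        c = np.prod(np.arange(k, k - deriv, -1.0))   # falling factorial k!/(k-deriv)!
        r[which * (g + 1) + k] = c * xi ** (k - deriv)
    return r

A, b = [], []
for i in (0, 1, 2):                          # interpolation: P1 through points 1..3
    A.append(row(x[i], 0, 0)); b.append(y[i])
for i in (2, 3, 4):                          # interpolation: P2 through points 3..5
    A.append(row(x[i], 0, 1)); b.append(y[i])
for d in (1, 2):                             # derivative matching at the change point x_3
    A.append(row(x[2], d, 0) - row(x[2], d, 1)); b.append(0.0)
A.append(row(x[0], 2, 0)); b.append(0.0)     # natural condition P1''(x_1) = 0
A.append(row(x[4], 2, 1)); b.append(0.0)     # natural condition P2''(x_5) = 0

coef = np.linalg.solve(np.array(A), np.array(b))
P1 = coef[:g + 1]                            # coefficients a_0..a_4 of P1
print(np.polynomial.polynomial.polyval(x[2], P1))   # 4.0: P1 reproduces y_3 exactly
```

Integrating P1 and P2 over their sub-ranges and collecting the coefficients of the ordinates then yields an integration formula of the form of Eq. (14).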

6 Integration of Polynomials Obtained by Spline Interpolation

Once the interpolating polynomials are obtained by splines, they must be integrated, each polynomial over its own x-range [1,2,3, 12]. Because the spacing of the abscissas is constant (Δx = h) and the starting value x1 has no influence, the integration formulas obtained are all functions of the ordinates y (y1, y2, y3, y4,…, yn). The formula below shows this procedure:

$$I = \int\limits_{{x_{1} }}^{{x_{n} }} {f(x)dx \cong \sum\limits_{j = 1}^{np} {\left[ {\int\limits_{{x_{k} }}^{{x_{k + m} }} {P_{j} (x)dx} } \right]} } = h\left( {cr_{1} \, y_{1} + cr_{2} \, y_{2} + \cdots + cr_{n - 1} \, y_{n - 1} + cr_{n} \, y_{n} } \right)$$
(14)

Once an integration formula is obtained, it is tested on many examples, and the truncation error is estimated as a power of the step Δx = h. The stability and convergence of the formula are also tested on several examples where the exact values of the integrals are known. This allows verification and validation of the new numerical integration formulas.
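A minimal sketch of this verification step: the empirical order is read off the ratio of errors as h is halved. Since the derived coefficients live in the tables, Simpson’s weights stand in here for a spline-derived formula.

```python
import numpy as np

def rule(y, h):
    """Compound 3-point rule h*(1, 4, 1)/3 per panel (Simpson), on an odd sample."""
    return (h / 3) * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-2:2].sum())

exact = 1.0 - np.cos(1.0)                # integral of sin(x) on [0, 1]
errors = []
for n in (5, 9, 17):                     # h halves at each refinement
    x = np.linspace(0.0, 1.0, n)
    errors.append(abs(rule(np.sin(x), x[1] - x[0]) - exact))
print(np.log2(errors[0] / errors[1]),
      np.log2(errors[1] / errors[2]))    # both ~4, i.e. truncation error O(h^4)
```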

7 Algorithm for Obtaining Different Integration Formulas by Spline Interpolation

An algorithm is proposed to obtain thousands of integration formulas from different spline interpolations. A maximum of 25 points (mn = 25), considered a large number of data points, was used to obtain integration formulas by following the steps below (a sketch of the enumeration loop is given after the list):

1. Set the maximum number of points to 25 (mn = 25).
2. Vary the number of points (n) from 2 to mn (for n = 2 to mn).
3. Vary the number of polynomials (np) from 1 to mn (for np = 1 to mn).
4. Vary the degree of the polynomials (g) from 0 to mn (for g = 0 to mn).
5. Vary the highest order of the derivative to be matched between polynomials (d) from 0 to mn (for d = 0 to mn).
6. Define the data points (x1,y1), (x2,y2), (x3,y3), …, (xn−1,yn−1), (xn,yn).
7. Set the equally spaced abscissa value (Δx = h) so that x2 = x1 + h, x3 = x1 + 2h, …, xn = x1 + (n − 1)h.
8. Define the interpolating polynomials P1, P2, P3, …, Pnp.
9. Calculate the number of polynomial coefficients to be obtained, np(g + 1).
10. Obtain the np(g + 1) linear equations.
11. Determine the number of data points (m) through which each polynomial will pass (for m = 2 to mn).
12. Obtain the linear equations from the interpolation conditions \(P_{j}(x_{i}) = y_{i}\).
13. Obtain the linear equations from the derivative conditions \(P_{j}^{(g)}(x_{i}) = P_{j+1}^{(g)}(x_{i})\).
14. Obtain the linear equations from the natural conditions \(P_{1}^{(g)}(x_{1}) = 0\) or \(P_{np}^{(g)}(x_{n}) = 0\).
15. Optionally, eliminate the equations that contain a certain point: the first (x1,y1), the last (xn,yn), or the middle point (xn/2,yn/2).
16. Continue only if the number of equations equals the number of unknown polynomial coefficients.
17. Solve the linear system to calculate the coefficients of the interpolating polynomials.
18. Integrate the obtained polynomials.
19. Obtain the formulas for numerical integration.
20. Test the integration formulas obtained against known integrals.
21. Estimate the truncation error of the formulas.
22. Test the convergence, applicability and accuracy of the formulas.
23. Select the best verified and validated integration formulas.
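A minimal sketch of the enumeration loop in steps 2–16; the consistency rules used below (change points shared between consecutive polynomials, and at most g natural conditions per end) are my reading of the text, not the paper’s code:

```python
mn = 25
valid = []
for n_poly in range(1, mn + 1):          # number of polynomials (np)
    for m in range(2, mn + 1):           # points each polynomial passes through
        n = n_poly * (m - 1) + 1         # total points when change points are shared
        if n > mn:
            continue
        for g in range(0, mn + 1):       # polynomial degree
            unknowns = n_poly * (g + 1)
            for d in range(0, g + 1):    # highest derivative matched at change points
                eqs = n_poly * m + (n_poly - 1) * d
                natural = unknowns - eqs # remainder supplied by natural end conditions
                if 0 <= natural <= 2 * g:
                    valid.append((n, n_poly, g, m, d, natural))
print(len(valid), "candidate configurations, e.g.:", valid[:3])
```

Each surviving configuration then feeds steps 17 through 23: solve the system, integrate, and test the resulting formula.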

8 Results Obtained and the Best Integration Formulas

The following tables show some of the integration formulas obtained. Tables 5, 6, 7, 8, 9 and 10 show integration formulas similar to the Newton–Cotes closed formulas, in increasing order of truncation error. Tables 11, 12, 13, 14, 15 and 16 show integration formulas similar to the Newton–Cotes open formulas, in increasing order of truncation error. Tables 17, 18, 19, 20, 21, 22, 23, 24 and 25 show integration formulas similar to the Newton–Cotes semi-closed formulas, in increasing order of truncation error. Tables 26 and 27 show integration formulas similar to the Newton–Cotes semi-open formulas, in increasing order of truncation error; note the similarity with the preceding semi-closed tables, which have the same coefficients with only the indices of the y-ordinates changed. Tables 28, 29, 30, 31 and 32 show integration formulas similar to the Newton–Cotes closed formulas with no middle point, in increasing order of truncation error; note that there are no y-values for indexes n/2 or (n + 1)/2, or both. Tables 33, 34, 35, 36 and 37 show integration formulas similar to the Newton–Cotes open formulas with no middle point, in increasing order of truncation error; again there are no y-values for indexes n/2 or (n + 1)/2, or both. Tables 38, 39, 40, 41 and 42 show integration formulas similar to the Newton–Cotes open formulas without two points, in increasing order of truncation error; note that there are no y-values for indexes 1, 2, n − 1 and n.

Table 5 Integration formulas similar to Newton–Cotes closed formulas with truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 6 Integration formulas similar to Newton–Cotes closed formulas with truncation error \(O(h^{4})\) and polynomial degree g = 2
Table 7 Integration formulas similar to Newton–Cotes closed formulas with truncation error \(O(h^{6})\) and polynomial degree g = 4
Table 8 Integration formulas similar to Newton–Cotes closed formulas with truncation error \(O(h^{8})\) and polynomial degree g = 6
Table 9 Integration formulas similar to Newton–Cotes closed formulas with truncation error \(O(h^{10})\) and polynomial degree g = 8
Table 10 Integration formulas similar to Newton–Cotes closed formulas with truncation error \(O(h^{12})\) and polynomial degree g = 10
Table 11 Integration formulas similar to Newton–Cotes open formulas with truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 12 Integration formulas similar to Newton–Cotes open formulas with truncation error \(O(h^{4})\) and polynomial degree g = 2
Table 13 Integration formulas similar to Newton–Cotes open formulas with truncation error \(O(h^{6})\) and polynomial degree g = 4
Table 14 Integration formulas similar to Newton–Cotes open formulas with truncation error \(O(h^{8})\) and polynomial degree g = 6
Table 15 Integration formulas similar to Newton–Cotes open formulas with truncation error \(O(h^{10})\) and polynomial degree g = 8
Table 16 Integration formulas similar to Newton–Cotes open formulas with truncation error \(O(h^{12})\) and polynomial degree g = 10
Table 17 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{1})\) and polynomial degree g = 0
Table 18 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 19 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{3})\) and polynomial degree g = 2
Table 20 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{4})\) and polynomial degree g = 3
Table 21 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{5})\) and polynomial degree g = 4
Table 22 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{6})\) and polynomial degree g = 5
Table 23 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{7})\) and polynomial degree g = 6
Table 24 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{8})\) and polynomial degree g = 7
Table 25 Integration formulas similar to Newton–Cotes semi-closed formulas with truncation error \(O(h^{9})\) and polynomial degree g = 8
Table 26 Integration formulas similar to Newton–Cotes semi-open formulas with truncation error \(O(h^{1})\) and polynomial degree g = 0
Table 27 Integration formulas similar to Newton–Cotes semi-open formulas with truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 28 Integration formulas similar to Newton–Cotes closed formulas with no middle point, truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 29 Integration formulas similar to Newton–Cotes closed formulas with no middle point, truncation error \(O(h^{4})\) and polynomial degree g = 2
Table 30 Integration formulas similar to Newton–Cotes closed formulas with no middle point, truncation error \(O(h^{6})\) and polynomial degree g = 4
Table 31 Integration formulas similar to Newton–Cotes closed formulas with no middle point, truncation error \(O(h^{8})\) and polynomial degree g = 6
Table 32 Integration formulas similar to Newton–Cotes closed formulas with no middle point, truncation error \(O(h^{10})\) and polynomial degree g = 8
Table 33 Integration formulas similar to Newton–Cotes open formulas with no middle point, truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 34 Integration formulas similar to Newton–Cotes open formulas with no middle point, truncation error \(O(h^{4})\) and polynomial degree g = 2
Table 35 Integration formulas similar to Newton–Cotes open formulas with no middle point, truncation error \(O(h^{6})\) and polynomial degree g = 4
Table 36 Integration formulas similar to Newton–Cotes open formulas with no middle point, truncation error \(O(h^{8})\) and polynomial degree g = 6
Table 37 Integration formulas similar to Newton–Cotes open formulas with no middle point, truncation error \(O(h^{10})\) and polynomial degree g = 8
Table 38 Integration formulas similar to Newton–Cotes open formulas without two points, with truncation error \(O(h^{2})\) and polynomial degree g = 1
Table 39 Integration formulas similar to Newton–Cotes open formulas without two points, with truncation error \(O(h^{4})\) and polynomial degree g = 2
Table 40 Integration formulas similar to Newton–Cotes open formulas without two points, with truncation error \(O(h^{6})\) and polynomial degree g = 4
Table 41 Integration formulas similar to Newton–Cotes open formulas without two points, with truncation error \(O(h^{8})\) and polynomial degree g = 6
Table 42 Integration formulas similar to Newton–Cotes open formulas without two points, with truncation error \(O(h^{10})\) and polynomial degree g = 8

9 Conclusion

The integration of the polynomials obtained by spline interpolation allowed us to obtain new and previously unknown integration formulas with high orders of truncation error. Many different integration formulas, similar to the Newton–Cotes formulas, were obtained in this study. These new integration formulas can be used in many engineering, mathematics, and physics research applications, and may be compared with the Newton–Cotes formulas in different fields of science. There appears to be a strong relationship between the degree of the polynomials and the order of the truncation error. The present article opens a new field of research into numerical methods for differentiation, integration, and the solution of differential equations. Another interesting observation is that the spline interpolation approach produces a smooth adjustment between intervals and a minimal variation in curve fitting, which can help the stability of the integration. It is believed that more new and important research on this subject can be done, given the tremendous evolution of numerical methods in engineering.