Introduction

Geological hazards associated with rock mass stability are widespread: several natural factors predispose many types of geological environments to landslides and rock failures, covering a wide range of sizes and volumes. A rock mass consists of blocks of rock material separated by systems of discontinuities that significantly affect its mechanical behavior and predispose it to particular types of failure kinematics (Bieniawski 1989). For this reason, the study of rock masses has always been of primary importance. To protect the territory, human lives and urban areas, as well as natural and cultural heritage, it is essential to define the susceptibility to rock instability and to identify the potentially affected areas. The orientation of discontinuities, as well as their other properties (e.g. spacing, persistence, roughness, infilling, weathering), is of capital importance for the geo-mechanical behavior of the rock mass (Bieniawski 1973; Calcaterra and Parise 2010; Piteau 1970). These characteristics play a crucial role in the geo-structural and geo-mechanical characterization of the rock mass and are defined as the key features of a complete survey according to the ISRM recommendations (ISRM 1978). In the case of soluble rocks such as carbonates and evaporites, the presence of voids related to karst dissolution should also be taken into account (Andriani and Parise 2015, 2017; Palmer 2007), since these features strongly control the water flow, with significant effects on stability (Parise 2022).

To quantitatively describe the structural set-up of rock masses, new methods have been proposed in recent decades, with particular reference to LiDAR (Light Detection and Ranging) and photogrammetry, both often coupled with RPAS (Remotely Piloted Aircraft Systems), as essential tools for geomechanical analysis (Abellán et al. 2014; Barnobi et al. 2009; Jaboyedoff et al. 2012; Oppikofer et al. 2009; Tomás et al. 2020; Viero et al. 2010). These methods allow the acquisition of high-resolution, geo-localized 3D point clouds over large areas in a relatively short time. The use of remote systems for the quantitative assessment of rock mass instability is today a subject of great interest: the recent scientific literature shows that these methodologies still require innovative survey techniques and data-analysis procedures in order to fully exploit their potential and advantages, and above all to ensure their reliability (Abellan et al. 2016; Cardia et al. 2022; Ferrero et al. 2009; Galgaro et al. 2004; Loiotine et al. 2021a, b; Pagano et al. 2020; Riquelme et al. 2014, 2017; Slob 2010; Slob et al. 2005). For several years researchers have made efforts to implement new algorithms to standardize the extraction of primitive geometries from 3D point clouds, in order to identify detailed geometric characteristics of scanned real-world structures and, therefore, to identify discontinuities and blocks on rock masses (Borrmann et al. 2011; Dewez et al. 2016; Hammah & Curran 1998; Jaboyedoff et al. 2007; Li et al. 2019; Lombardi et al. 2011; Menegoni et al. 2021; Roncella & Forlani 2005; Schnabel et al. 2007; Tran et al. 2015; Xia et al. 2020). New algorithms have to be reproducible, ready to use and, possibly, fast, automatic, and reliable. Fully automatic methods, however, have several limitations, such as incorrect computation of the normals or uncertainty in the automatic choice of parameters, which can vary greatly from case to case. Their results can often be misleading, due to the geometrical peculiarities normally presented by a rock outcrop. Despite the great progress made in recent decades by software able to manage and analyze point clouds, acquiring geo-mechanical data still necessarily requires expert control, in order to select the planes of geological significance and discard all those related to anthropogenic works. This is a crucial point to highlight, since our firm belief is that it is not yet possible to rely entirely on automated systems.

Therefore, a correct and complete characterization of a rock mass cannot disregard the importance of an analysis carried out by an expert user, together with the precision of unbiased input parameters. Thus, a preliminary analysis conducted by means of traditional methods, or in situ observations, is crucially important to standardize the subsequent process toward a reliable definition of the principal characteristics of both rock blocks and discontinuity systems. The traditional methods of surveying rock outcrops require specialized geologists to carry out accurate and time-consuming surveys on sites that, especially in steep mountain and underground environments, are logistically difficult, often requiring the intervention of rock climbers. This operation is far from simple, for logistical reasons, and requires multi-disciplinary skills. Consequently, there is a risk of making sampling errors, or of collecting insufficient data for a complete geo-structural characterization. For this reason, in situ observations should be coupled with those acquired through new technologies, in order to obtain more robust and reliable data for geostatistical analysis. Moreover, rock outcrops always present complex geometrical relations among joints, faults and fractures; it is therefore necessary to detect and evaluate ranges of angular values for the discontinuities, rather than extracting a single value.

Starting from the above considerations, this paper proposes a different approach for the semi-automatic evaluation and extraction of discontinuity sets affecting rock mass stability, using 3D point cloud data. The work is structured as follows: in this section, we have summarized the main objectives and a brief state of the art regarding rock mass stability analysis. The main novel contributions of the proposed method are presented in Section "Methodology", which describes the procedure for the evaluation and extraction of discontinuities and the material used; it illustrates (a) the user-supervised reduction of the point cloud to the coplanar features whose density exceeds a minimum threshold, using the Iterative Pole Density Estimation (IPDE), (b) the semi-automatic identification of discontinuity sets using a Kernel Density Estimation (KDE) analysis coupled with either manual set selection or automated clustering, and (c) the user-defined extraction of the main sets with the Supervised Set Extraction (SSE). Section "Case studies" summarizes the results of the method applied to two case studies. In the second case, a density estimation of the discontinuity sets performed with little or no previous manual intervention on the whole point cloud is also compared with one performed after a supervised cleaning step, and both outcomes are finally linked to the results of the traditional analysis. Section "Discussion" discusses the most important aspects of the results and, finally, the last section explores the future perspectives of the research. The complete programming code used, along with all the modules employed, is made publicly available so that other researchers can validate the method.

Methodology

The proposed method aims to evaluate and extract structural discontinuities from 3D digital point clouds of rock outcrops. The starting point is a set of raw data points (D), each represented by its coordinates X, Y, and Z; the surface described by these points is bounded by discontinuities, which can be classified into sets, each defining a single plane with a more or less wide tolerance angle in terms of orientation. The proposed methodology is developed through the following four main steps: (1) determination of the approximate orientation of the normal of every point in D; (2) statistical analysis of D with the IPDE, which detects all relevant poles of planes on the rock mass, representing the different sets of discontinuities; (3) manual or automatic cluster identification, i.e. localization of the points defining the different clusters in space; and (4) extraction of the identified sets with the SSE.
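The user-controlled parameters recurring through these four steps can be gathered as in the minimal Python sketch below; the class and attribute names are hypothetical, and the default values are those reported for the case studies later in the paper (the neighborhood radius is only an assumption, as it depends on the point density of each survey).

```python
from dataclasses import dataclass

@dataclass
class WorkflowParams:
    # Step 1: neighborhood radius (in metres) for PCA-based normal estimation
    normal_radius: float = 0.10        # assumed value, survey-dependent
    # Step 2: IPDE parameters (values used at the Cocceio cave)
    n_samples_S: int = 100_000         # sample points iterated over D
    density_threshold_K: int = 30_000  # minimum points per coplanarity pole
    tol_dip: float = 5.0               # tolerance in degrees (basic IPDE)
    tol_dipdir: float = 5.0
    # Step 3: number of clusters, to be chosen by the user (8 at Cocceio)
    n_clusters: int = 8
    # Step 4: SSE extraction tolerances (±10° dip direction was used at Cetara)
    sse_tol_dip: float = 5.0
    sse_tol_dipdir: float = 5.0

params = WorkflowParams()
print(params)
```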

Materials

The surveys were carried out with a RIEGL VZ-400 laser scanner, integrated with a photogrammetric survey from RPAS performed with an Italdron 4HSE aircraft equipped with a Sony Alpha 7 (24.7 MP sensor) digital camera. The scanner is a class 1 laser scanner with a detection range > 500 m, equipped with a high-definition Nikon D700 digital camera (12.1 MP sensor), an integrated inclinometer, laser plummet, GPS receiver and compass. It performs high-velocity acquisition, at more than 120,000 points/s with 3 mm precision. The drone is assembled for critical scenarios: it reaches a maximum flying height of 150 m, has a maximum range of 1.5 km, and its camera sensor measures 35.8 × 23.9 mm (42 MP). It is equipped with a parachute and a 3-axis gimbal; it has a flight autonomy of 30 min and a wind resistance of up to 50 km/h. The whole analysis of the acquired data was conducted on an 8th-generation Intel i5-8250U 1.60 GHz processor, with 16 GB DDR4 RAM, integrated GPU and Windows 10 operating system. The point clouds were georeferenced using a Leica GS14 dual-frequency GNSS receiver and a Leica TS16 total station, in the WGS84 UTM zone 33 reference system. For the visualization of the point clouds, the laser scanner's own software RiSCAN PRO 2.0 was used, and the results were visualized by means of the open-source (GPL) software CloudCompare v2.11 beta.

Procedure

Before performing the main steps of the method, in the pre-processing phase it is necessary to clean the point cloud by manually removing all spurious elements without geological meaning (e.g. vegetation, floor points, walls, etc.), as these areas would only lead to unnecessary computation. A point sub-sampling is also carried out to reduce the excessive point density, especially over small areas, and above all to lighten the cloud and limit computation times, since the extra points would bring no further benefit to the recognition of discontinuities. This step was carried out using the sub-sampling algorithm in CloudCompare, which removes near-duplicate points (points lying within a specified radius of each other) and does not affect the resolution of the three-dimensional cloud.
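A comparable spatial sub-sampling can also be reproduced outside CloudCompare. The sketch below, assuming a NumPy/SciPy environment, keeps a single representative point per neighborhood of a user-defined radius; it is a simplified stand-in for CloudCompare's spatial sub-sampling, not the exact algorithm used by that software.

```python
import numpy as np
from scipy.spatial import cKDTree

def subsample(points: np.ndarray, radius: float) -> np.ndarray:
    """Greedy spatial sub-sampling: keep one point per neighborhood of the
    given radius and discard the near-duplicates around it.
    points: (N, 3) array of X, Y, Z coordinates; returns the kept subset."""
    tree = cKDTree(points)
    keep = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        if not keep[i]:
            continue                      # already absorbed by a kept point
        neighbors = tree.query_ball_point(points[i], r=radius)
        keep[neighbors] = False           # drop everything in the neighborhood...
        keep[i] = True                    # ...except the representative point
    return points[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.random((10_000, 3))       # synthetic 1 m cube of points
    light = subsample(cloud, radius=0.02) # e.g. 2 cm minimum spacing
    print(len(cloud), "->", len(light), "points")
```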

  • Step 1: Determination of the orientation of the points

For this first step, the normal of every point in the initial dataset D has to be calculated. To compute normals on a point cloud, the local surface represented by each point and its neighbors must be estimated. This can be done through a Principal Component Analysis (PCA), which computes the eigenvectors of the covariance matrix C of the local neighborhood of each point, written as:

$$C=\frac{1}{N}\sum_{i=1}^{N}\left({p}_{i}-\mu \right){\left({p}_{i}-\mu \right)}^{T}$$
(1)

where p1 … pN are the points in a given neighborhood, μ is the centroid of those points and the superscript T indicates the transpose of a matrix, formed by exchanging rows and columns. The eigenvector of C associated with the smallest eigenvalue, computed for each neighborhood (whose size can be chosen a priori), represents the normal vector (a, b, c) of each point. The decomposition of this eigenvector along x, y and z gives the components of the normal. Given that (a, b, c) = n(x, y, z) is the normal vector of a single plane of equation ax + by + cz + d = 0, if the condition nx² + ny² + nz² = 1 is satisfied for every point the procedure goes on; otherwise the vector must be normalized by dividing it by its magnitude. The normals then need to be adjusted in terms of orientation, because this method computes only the direction of the vector, not its sign. It is therefore common to impose a heuristic preferred orientation (±X, ±Y, ±Z), in most cases +Z; such an operation can be done, for example, by taking the absolute value of the third component (nz) in the resulting three-column normal matrix.

A fast and accurate way to compute normals is the dedicated plug-in available in CloudCompare: it performs the analysis after the user has chosen a local surface model (plane, 2D triangulation, quadric); the default neighbor-extraction process relies on an octree-cell structure. The user only has to choose the neighborhood radius, and all points lying within that radius are used for the PCA. The orientation is then assigned either with the user-specified heuristic or with the Minimum Spanning Tree (MST) algorithm, in which the user again sets a neighborhood by choosing the number of nearest normals used to propagate the orientation. In this case a sort of region-growing process attempts to re-orient all normals in a consistent way: it starts from a random point, and then propagates the normal orientation from one neighbor to the next. We calculated the normals with the PCA method, and then relied on the MST to obtain uniform orientations. However, the level of initial noise and the number/distance of the neighbors change the reconstructed surface of the 3D structure, so the operations above cannot be carried out disregarding the previous point cloud cleaning and a case-specific choice of the input parameters. After this sub-step, it is necessary to convert the normal values into dip and dip direction values through a standardized conversion, lastly revised in Kemeny et al. (2006) and generally available on line (Girardeau-Montaut 2016). The dip (δ) value, expressed in radians, is determined for every point by the equation:

$$\delta =\mathit{arctan}\left(\frac{\sqrt{{n}_{x}^{2}+{n}_{y}^{2}}}{\left|{n}_{z}\right|}\right)$$
(2)

where nx, ny and nz are the components of the normal along the X, Y and Z axes, respectively. Since the result of this equation is in radians, it is sufficient to convert it into degrees by multiplying each value by 180 and dividing by π. The same applies to the dip direction (γ) values, expressed in radians for every point normal triplet by the equation:

$$\upgamma =\mathit{arctan}\left(\frac{\left|{n}_{x}\right|}{\left|{n}_{y}\right|}\right)$$
(3)

Conversion to degrees is done in the same way as for the dip values. The only difference is that, in order to compute the γ values correctly before the conversion, a mask has to be applied to every quadrant except the first, to re-orient the values with 0° along the +Y axis and obtain the correct azimuthal values (0–360°). The structure of the masks is summarized in Table 1, and a code sketch of the whole conversion is given below.

Table 1 Structure of the masks used to compute the correct azimuthal values, with 0° along the +Y axis
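As a concrete reference for Step 1, the following sketch estimates the normals by PCA over a fixed-radius neighborhood and converts them to dip/dip direction in degrees. It is a simplified, assumed stand-in for the CloudCompare plug-in described above: there is no MST re-orientation (normals are simply flipped towards +Z), and NumPy's arctan2 replaces the explicit quadrant masks of Table 1.

```python
import numpy as np
from scipy.spatial import cKDTree

def pca_normals(points: np.ndarray, radius: float) -> np.ndarray:
    """Normal of each point = eigenvector of the local covariance matrix C
    (Eq. 1) associated with the smallest eigenvalue. Normals are returned
    unit length and flipped so that n_z >= 0 (heuristic +Z orientation)."""
    tree = cKDTree(points)
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p, r=radius)
        if len(idx) < 3:                      # degenerate neighborhood
            normals[i] = (0.0, 0.0, 1.0)
            continue
        C = np.cov(points[idx].T)             # covariance of the neighborhood
        eigval, eigvec = np.linalg.eigh(C)    # eigenvalues in ascending order
        n = eigvec[:, 0]                      # smallest eigenvalue -> normal
        if n[2] < 0:                          # orient towards +Z
            n = -n
        normals[i] = n / np.linalg.norm(n)    # enforce nx^2 + ny^2 + nz^2 = 1
    return normals

def to_dip_dipdir(normals: np.ndarray):
    """Convert unit normals to dip and dip direction (degrees).
    arctan2 handles all four quadrants, replacing the masks of Table 1."""
    nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
    dip = np.degrees(np.arctan2(np.hypot(nx, ny), np.abs(nz)))
    dipdir = np.degrees(np.arctan2(nx, ny)) % 360.0   # 0-360, 0 = +Y (north)
    return dip, dipdir

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # synthetic planar patch dipping 45 degrees towards east (dip direction 90)
    xy = rng.random((2_000, 2))
    pts = np.column_stack([xy[:, 0], xy[:, 1], -xy[:, 0]])   # plane z = -x
    dip, dipdir = to_dip_dipdir(pca_normals(pts, radius=0.05))
    print(round(float(np.median(dip))), round(float(np.median(dipdir))))  # ~45 ~90
```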
  • Step 2: Statistical analysis of datasets with the IPDE

This step is fundamentally based on the recognition of parallel orientations among the point normals. A given number of sample points (S) is considered to perform the analysis. Starting from a random point, the algorithm iterates the analysis S times over D, searching each time for every point with the same orientation as the sample point, within a given range of tolerance for both dip and dip direction. This is very straightforward, since an orientation matrix with these two values has been previously generated for every point. Given the tolerance angles, planar or almost planar features present in D are evaluated, and each time one of them reaches a point density greater than a given threshold (K), the system keeps it and stores it in a new dataset (D1), which is used to perform the subsequent analysis and plot the data. Values below the threshold are discarded, as they probably represent minor discontinuities or transition edges between discontinuities, i.e. most likely non-geological planes. An overall look at the functioning of the algorithm is provided in Alg. 1 and Fig. 1, and a simplified code sketch is given after the pseudocode. Next, the data are plotted as poles to planes on a stereographic projection: the orientation matrix of every point is converted to the stereographic projection, and the density of the poles is calculated for each region of the projection. The statistical analysis calculates the distribution by means of the KDE technique, a non-parametric way to estimate the probability density function of a random variable (Silverman 1986); the implementation uses Gaussian kernels, useful for visualization purposes. Kamb and Schmidt density estimation methods were also used; these are instead parametric methods very similar to KDE, as they process Gaussian curves while discarding all points below a certain threshold (Vollmer 1995).

Fig. 1
figure 1

Schematic flow chart of the IPDE

Algorithm 1
figure a

Iterative Pole Density Estimation in pseudocode
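The following is a minimal Python interpretation of the IPDE as described above and in Algorithm 1 (random sample of S points, ±tolerance comparison on dip and dip direction, density threshold K); it is an assumed simplification for illustration, so details may differ from the published implementation.

```python
import numpy as np

def ipde(dip, dipdir, S, K, tol_dip=5.0, tol_dipdir=5.0, seed=0):
    """Simplified Iterative Pole Density Estimation.
    dip, dipdir: orientation (degrees) of every point of D.
    Returns a boolean mask selecting D1, i.e. the points belonging to
    (almost) coplanar features whose pole density is at least K."""
    rng = np.random.default_rng(seed)
    keep = np.zeros(dip.shape[0], dtype=bool)
    samples = rng.choice(dip.shape[0], size=min(S, dip.shape[0]), replace=False)
    for s in samples:
        # angular difference in dip direction, accounting for the 0/360 wrap
        dd_diff = np.abs((dipdir - dipdir[s] + 180.0) % 360.0 - 180.0)
        coplanar = (np.abs(dip - dip[s]) <= tol_dip) & (dd_diff <= tol_dipdir)
        if coplanar.sum() >= K:          # dense enough pole: store in D1
            keep |= coplanar
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # synthetic orientations: two dense sets plus uniformly scattered noise
    set1 = np.column_stack([rng.normal(60, 2, 5_000), rng.normal(120, 2, 5_000)])
    set2 = np.column_stack([rng.normal(30, 2, 4_000), rng.normal(300, 2, 4_000)])
    noise = np.column_stack([rng.uniform(0, 90, 3_000), rng.uniform(0, 360, 3_000)])
    data = np.vstack([set1, set2, noise])
    mask = ipde(data[:, 0], data[:, 1], S=500, K=1_000)
    print(mask.sum(), "of", len(data), "points kept in D1")
```

In the case studies, S and K were set to a few percent of the pre-processed cloud (e.g. S = 100,000 and K = 30,000 at the Cocceio cave), so that only poles with clear geological meaning survive the filtering.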

  • Step 3: Manual or automatic clusters identification

At this step it is possible to visualize the results on a stereographic projection, and the user can either select cluster points manually or opt for an automatic clustering that outputs the centroid values of each identified cluster; a minimal sketch of the two clustering options is given after Table 2. As KDE allows the width of the kernels (i.e. the bandwidth) and their density to be computed, the plot shows the ranges of values with local maxima; in practice the user can hover the mouse and click on the plot, which interactively displays dip and dip direction values, and the selected values are printed as output when the plot is closed. The same happens if the automatic clustering is chosen, with the difference that no point can be selected, but the user can compare his/her observations on the stereographic projection with the output values. In any case, every plot is stored in a temporary file, which the system is able to reload for further observations without the need to re-run the analysis. The types of clustering implemented are K-means and Gaussian Mixture. The former is implemented in its "++" version (Arthur & Vassilvitskii 2007), which uses a smart procedure to initialize the cluster centers before proceeding with the standard K-means optimization: the first cluster center is chosen uniformly at random from the data points being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its squared distance from the point's closest existing cluster center. The latter relies on a method that computes a finite-dimensional model through a hierarchical evaluation of a set of parameters, each specifying the degree of belonging of a component (point) to the corresponding mixture (Figueiredo and Jain 2002); if the mixture components are Gaussian distributions, as in this type of process, there is a mean and a variance for each component. The two techniques are generally good clustering approaches for a high number of sample points, and are very similar in both processing and results (Table 2). The main difference lies in the metric used to evaluate the geometrical relations between clusters: K-means uses distances between points, whilst the Gaussian Mixture relies on Mahalanobis distances (Mahalanobis 1936). The main issue is that the number of clusters has to be known in advance and specified manually. It can also be estimated with common indexes such as the Calinski-Harabasz Index (CHI), Silhouette or Davies-Bouldin, all of which could be used independently to evaluate the optimal number of clusters at the expense of processing time (Singh et al. 2021). At the moment, no such estimator has been implemented in our method, as these constraints could lead to misinterpretation of the data and, in some cases, biased results.

Table 2 Comparison of the clustering algorithms implemented in the workflow
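Assuming that scikit-learn is available (the packages actually used are listed in Table 10), the sketch below shows how the two clustering options could be run on the orientations of D1. To avoid the 0°/360° wrap in dip direction, the poles are clustered here as 3D unit normal vectors and the centroids are converted back to dip/dip direction; this is an illustrative choice, not necessarily that of the published code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def poles_to_vectors(dip_deg, dipdir_deg):
    """Represent each pole as an upward 3D unit normal (avoids the 0/360 wrap)."""
    dip, dd = np.radians(dip_deg), np.radians(dipdir_deg)
    return np.column_stack([np.sin(dip) * np.sin(dd),   # n_x (east)
                            np.sin(dip) * np.cos(dd),   # n_y (north)
                            np.cos(dip)])               # n_z (up)

def vectors_to_poles(v):
    v = np.where(v[:, 2:3] < 0, -v, v)                  # force upward normals
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    dip = np.degrees(np.arccos(np.clip(v[:, 2], -1.0, 1.0)))
    dipdir = np.degrees(np.arctan2(v[:, 0], v[:, 1])) % 360.0
    return dip, dipdir

def cluster_poles(dip, dipdir, n_clusters, method="kmeans", seed=0):
    """Return the centroids (dip, dip direction) of the identified clusters."""
    X = poles_to_vectors(dip, dipdir)
    if method == "kmeans":               # K-means with "++" initialization
        model = KMeans(n_clusters=n_clusters, init="k-means++",
                       n_init=10, random_state=seed).fit(X)
        centers = model.cluster_centers_
    else:                                # Gaussian Mixture model
        model = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X)
        centers = model.means_
    return vectors_to_poles(centers)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    dip = np.concatenate([rng.normal(60, 3, 2_000), rng.normal(35, 3, 2_000)])
    dd = np.concatenate([rng.normal(120, 4, 2_000), rng.normal(300, 4, 2_000)])
    for m in ("kmeans", "gmm"):
        cdip, cdd = cluster_poles(dip, dd, n_clusters=2, method=m)
        print(m, np.round(cdip, 1), np.round(cdd, 1))   # ~(60, 120) and ~(35, 300)
```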
  • Step 4: Extraction of the identified sets with SSE

This step is intended to isolate the clusters of discontinuities (sets) from the point cloud, taking D as input and producing several smaller datasets (k1, …, kn), stored by the system as new files. Again, the procedure is very straightforward, since the orientation values of D have already been computed: in practice, one only needs to specify how many sets to extract and, subsequently, an interval of tolerance for both dip and dip direction for each set. In addition, it must be specified whether a set has complementary values to be highlighted on the point cloud, which could be useful for later analysis. Specifying complementary values is needed because poles of interest with a high dip angle (e.g. > 80°) may appear in the point cloud with normals facing opposite directions; they would thus be treated as two separate clusters although, in fact, they belong to the same set. In these cases, the system generates a single set instead of splitting it into two. In practice, SSE is a tool that filters the original point cloud while the user controls the process step by step, leaving no choice to the machine. In this way the user can compare the results of the other steps and decide whether they are useful for the final evaluation, repeat the analysis after tweaking the parameters, or, alternatively, extract the sets identified with other methods. In this sense, SSE is a stand-alone tool that can be used independently of the algorithms and methods seen above, and can be defined as a fully supervised method of extracting clusters (Alg. 2; Fig. 2; a simplified code sketch is given after the pseudocode).

Fig. 2
figure 2

Schematic flow chart of the SSE

Algorithm 2
figure b

Supervised Set Extraction in pseudocode
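A minimal Python interpretation of the SSE is sketched below: for each user-defined set, the points of D whose dip and dip direction fall within the given intervals (optionally including the complementary dip direction, rotated by 180°) are written to a separate file. Function names and the output format are illustrative only, and dip direction intervals crossing 0°/360° are not handled in this simplified version.

```python
import numpy as np

def sse_extract(points, dip, dipdir, sets, out_prefix="set"):
    """Simplified Supervised Set Extraction.
    points: (N, 3) XYZ array; dip, dipdir: orientations of each point (degrees).
    sets: list of dicts such as
        {"dip": (55, 65), "dipdir": (110, 130), "complementary": True}
    """
    for i, s in enumerate(sets, start=1):
        lo_d, hi_d = s["dip"]
        lo_dd, hi_dd = s["dipdir"]
        mask = (dip >= lo_d) & (dip <= hi_d) & \
               (dipdir >= lo_dd) & (dipdir <= hi_dd)
        if s.get("complementary", False):
            # also capture sub-vertical planes whose normals face the other way
            dd_flip = (dipdir + 180.0) % 360.0
            mask |= (dip >= lo_d) & (dip <= hi_d) & \
                    (dd_flip >= lo_dd) & (dd_flip <= hi_dd)
        out = np.column_stack([points[mask], dip[mask], dipdir[mask]])
        fname = f"{out_prefix}_k{i}.txt"
        np.savetxt(fname, out, header="X Y Z dip dipdir", comments="")
        print(f"{fname}: {mask.sum()} points extracted")

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    pts = rng.random((5_000, 3))
    dip = rng.uniform(0, 90, 5_000)
    dipdir = rng.uniform(0, 360, 5_000)
    sse_extract(pts, dip, dipdir,
                sets=[{"dip": (55, 65), "dipdir": (110, 130)},
                      {"dip": (80, 90), "dipdir": (170, 190),
                       "complementary": True}])
```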

Case studies

Description

The first case study is the Cocceio cave, located in the Phlegrean Fields of Campania, southern Italy (Fig. 3). This artificial cave belongs to the category of military works in the classification of artificial cavities proposed by the dedicated UIS (International Union of Speleology) Commission (Parise et al. 2013). Built in Roman times, around 37 B.C. (Beloch 1989), by Lucio Cocceio (from whom it takes its name), it was entirely dug into the tuff, with six vertical wells crossing the overlying hill to reach the ground surface (Pagano et al. 2018), in order to connect Cuma (a fortification and lookout point on the Domitian-Phlegrean coast) to Portus Iulius, an important military infrastructure reaching the Gulf of Pozzuoli through a series of canals and basins. During World War II the cave was used as an ammunition depot and suffered severe damage from blasting, which generated an explosion cap strongly susceptible to rock block collapses. The cave, large enough to allow the passage of two wagons, extends for 950 m and has a trapezoidal cross-section and a straight, slightly uphill course. Since the post-war period it has undergone reclamation works without ever being fully consolidated. The Cocceio cave lies in the western sector of the Phlegrean Fields (Figs. 3 and 4), a large volcanic field whose origin is connected to the tectonic events related to the opening of the Tyrrhenian basin, and which results from two large caldera collapses linked to the eruptions of the Campanian Ignimbrite (39,000 years ago) and of the Neapolitan Yellow Tuff (15,000 years ago) (De Riso et al. 2004; Di Girolamo et al. 1984; Rolandi et al. 2003). The cave was entirely excavated within the latter unit, a sequence of massive to pseudo-stratified ashes with abundant accretionary and lithic lapilli and subordinate pumice. Surge deposits are distributed over an area of about 34 km2, while fall deposits are found only in a restricted area to the north (Di Vito et al. 2011; Lirer et al. 2011). After emplacement, these materials were affected by zeolitization processes, which led to the formation of lithoid facies characterized by intense fracturing.

Fig. 3
figure 3

Geographical location of the two case studies

Fig. 4
figure 4

Photographs inside the Cocceio cave, showing one of the authors taking measurements of the discontinuities in the traditional way, with compass and inclinometer. The red line, drawn with spray paint, marks the layer at which the 3D point cloud was subsequently cropped to start the analysis

The second case study is part of the slope, including a natural cave, standing below the medieval watchtower of Cetara, on the Amalfi Coast of Campania (Figs. 3 and 5); the tower later became a fort and a prison, and nowadays houses a civic museum. The total scanned area is about 150 m long and 60 m high, with Mesozoic limestones and dolostones as the main lithotypes. The Cetara cliff belongs to the Lower Jurassic Upper Dolomite formation, more precisely to a laminated bioclastic dolomite (Pappone et al. 2009), which represents the bedrock, locally covered by loose pyroclastic and lithoid deposits of the Campanian volcanic systems.

Fig. 5
figure 5

Photographs of the scanned portion of the Cetara cliff, with details of the two main areas. To the right, the entrance to the cave and part of the overlying buildings are also visible

In the SE portion of the Lattari Mts., where Cetara is located, the dolomitic rocks have been intensely modeled by fluvial incisions, which have almost completely erased the morphological relics of the ancient base levels. The geomorphological setting is characterized by a significantly faulted bedrock, due to various Miocene to Quaternary tectonic phases. Quaternary detrital deposits (slope breccias, gravels and conglomerates of alluvial origin) and pyroclastic deposits rest discontinuously over the carbonate rocks. The alluvial deposits (gravels and sands) are found in the main valley bottoms, where most of the inhabited centers are located. In particular, the town of Cetara rests on a large fan built by numerous debris-flow and alluvial events, which are highly frequent in the Campanian geological setting (Calcaterra et al. 1999, 2000, 2003; Vennari et al. 2016).

The survey at the Cocceio cave (Fig. 4) concerned a portion of the cave vault and the ground level above it. It was carried out with the laser scanner, which acquired a georeferenced point cloud of 349 million points, with a density of 350,000 points per square meter (Fig. 6). The survey consisted of 10 scan positions (9 inside the cave, 1 outside), integrated with a photogrammetric survey from RPAS. Each single scan included a sufficient number (a minimum of 4) of georeferenced targets (control points). The registration error between the different scans varies from 2 to a maximum of 4 mm. The control points were georeferenced by celerimetric survey with total station and GPS. The survey was carried out along a polygonal (traverse) line running from the outside into the cave.

Fig. 6
figure 6

Cocceio cave 3D point cloud. a Point cloud resulting from the combined scans. b Portion of the cave vault (seen from one of the sides of the cap) used for the analysis

The survey at Cetara (Fig. 5) was performed with the same laser scanner but without the drone, acquiring a point cloud of about 570 million points, with a density of 700,000 points per square meter, from 9 scan positions (Fig. 7). The targets, positioned on the ground, were georeferenced with GPS; the georeferencing of the entire group of targets recorded an average error of about 4 mm. The scans were merged with the C2C (cloud-to-cloud) method, i.e. by finding the best possible overlap between two neighboring scans, which results in the smallest error (the registration error of the entire group was 1 mm).

Fig. 7
figure 7

3D point cloud at the Cetara cliff. a Point cloud of the cliff, resulting from the combined scans. b Portion of the point cloud used for the analysis

Results

After manual cleaning and sub-sampling, the point cloud of the Cocceio cave, referring only to the rock outcrop, consisted of about 2 million points, with an average density of 20,000 points/m2, and that of Cetara of about 760,000 points, with an average density of 8,500 points/m2. IPDE was used in its basic version, which has a fixed tolerance range of ±5° for both dip and dip direction, to estimate its effectiveness on the first case study. The parameters used for the starting cloud were S = 100,000 (~5% of the total points) and K = 30,000 (a coplanarity pole density of ~1.5% of the total points). The result is a cloud consisting of about half the starting points, with an average density of 11,300 points/m2, in which the main discontinuity systems are highlighted (Fig. 8). With this method the four main sets recognized by the traditional analysis were identified, even though, due to their high density and proximity, the k1 and k2 sets are merged in the resulting plot; they can be split for extraction by relying on the observations made previously on the rock outcrop. Interestingly, the k4 set, not identified in previous studies (Pagano et al. 2018), came out clearly as a separate set. The two sets referring to bedding planes are merged in the resulting plot, with most of the points lying approximately towards the south, at about 180°, and a slightly higher density of s1, in a kind of weighted average between the two poles identified on site. This outcome confirms the previous analysis performed on the point cloud by Pagano and coworkers (2018). For the automated clustering evaluation, we set the number of clusters to eight: six referring to k1-k4 (i.e. two accounting for the merged k1/k2 and its complementary value, two for k3 and k4, and two for the complementary values of these sets), one for the bedding plane, plus one more to detect either the extra bedding plane obtained from the traditional acquisition or another set. Both K-means and Gaussian Mixture clustering were performed on the point cloud resulting from the IPDE (graphical results in Fig. 9; output centroids of the clustering methods and values of the manually selected points in Table 3). Finally, combining the results of the traditional techniques (Table 4; Fig. 10) and of the new approach, five sets were extracted with the previously given tolerance angle (Table 8).

Fig. 8
figure 8

Cocceio Cave: point cloud resulting from the IPDE, highlighted on the original cloud and colored with scalar values indicative of the dip direction

Fig. 9
figure 9

Results at the Cocceio Cave: a KDE performed on the whole point cloud. b Stereographic projection of the IPDE and subsequent KDE – points of the point cloud are spread on the plot in blue color; red crosses are the manually selected centers of the poles. c K-Means clustering performed on the result of the IPDE. Note that the isolines in this type of classification have different colors for each cluster, and are smoothed compared to the classical KDE. d Gaussian Mixture clustering performed on the result of the IPDE. The red dots in both (c) and (d) are the automatic centroids of the clusters resulting from the analysis

Table 3 Cocceio Cave: resulting numerical values of the clustering on the IPDE result
Table 4 Cocceio Cave: discontinuity sets recognized by traditional techniques. Since the rock mass is a tuff formation, the trend of the layers is a consequence of its emplacement by pyroclastic currents of the surge or flow type. This gives rise to somewhat folded layers, likely due to the curvilinear paths that characterize pyroclastic flows
Fig. 10
figure 10

Cocceio cave. a Stereographic projection with the sets identified through traditional methods. b Detail of the rock outcrop, highlighting some of the major sets identified during the in situ survey. Note that both K4 and S2 are not shown in the picture, due to their relatively scarce presence

At Cetara, in the graphs resulting from the KDE analysis performed on the whole cloud, density poles not compatible with the outcomes of the traditional methods masked the sets recognized in situ (Fig. 12a). It was thus realized that some of these poles (two in particular, with dip direction ~125° and ~350°) corresponded to planes of anthropogenic origin, not considered in the on-site analysis. By manually extracting such planes, filtering the cloud to eliminate them, and performing again the digital density analysis (i.e., KDE and Kamb/Schmidt) and the Gaussian Mixture clustering, the planes recognized in the traditional surveys were actually identified (Figs. 11 and 12b; Tables 5 and 6). Subsequently, IPDE was also performed on the total cloud. This made it possible to lighten it, eliminating minor discontinuities and non-geological planes between discontinuities (Fig. 13). The parameters used in this case were S = 70,000 (~10% of the total points) and K = 20,000 (a coplanarity pole density of ~2.5% of the total points), and the cloud passed from ~760,000 to ~580,000 points, with an average density of 7,000 points/m2. Manual filtering was required to eliminate the anthropogenic planes which, given their high point densities, were still identified by the algorithm. The final result (~520,000 points) was in fair agreement with that of the traditional survey: the three main sets were identified, with the addition of two others that had not been considered significant during the in situ surveys (Fig. 14; Table 7); the latter refer to the complementary values of two of the three sets recognized by the digital analysis. The midpoints used to construct the poles to be extracted differ from those identified in the traditional analysis by +5° in dip direction for k1 and +4° for k3. Taking into account the nature of the rock mass, composed mainly of dolostone (see the description above), which in this area typically forms blocks with markedly undulated surfaces, the sets were subsequently extracted with an angular tolerance of ±10° for dip direction and ±5° for dip, since the points of the poles are very scattered in the stereographic projection (Table 9).

Fig. 11
figure 11

Cetara cliff. a Stereographic projection with the sets identified through traditional methods. b Detail of the rock outcrop, highlighting the major sets identified during the in situ survey

Fig. 12
figure 12

Cetara case study: a KDE on the whole point cloud. b KDE with Gaussian Mixture clustering on the point cloud resulting from the manual intervention; the most biasing pole, at ~125° dip direction, was eliminated, and the red dots are the centroids of the clusters resulting from the automatic analysis. c Kamb evaluation on the partially cleaned point cloud. d Schmidt evaluation on the partially cleaned point cloud. For these two types of evaluation the density of points per pole is highlighted with a color ramp

Table 5 Cetara case study: discontinuity sets recognized by traditional techniques
Table 6 Cetara case study: resulting numerical values of the Gaussian Mixture clustering on the cleaned dataset
Fig. 13
figure 13

Cetara case study: point cloud resulting from the IPDE, highlighted on the original cloud and colored with scalar values indicative of the dip direction

Fig. 14
figure 14

Cetara case study: results of the IPDE, with the points of the point cloud shown in blue on the plots. a KDE performed on the point cloud without manual intervention. b KDE performed on the point cloud after eliminating the two poles referring to planes of anthropogenic origin (dip direction = 125° and 350°), with manual point selection

Table 7 Resulting numerical values of the manual clustering on the IPDE result

Discussion

At the Cocceio Cave the fully automated method gave reliable results compared with the analysis performed through traditional techniques, the latter serving as ground truth and providing an analytical basis with which to integrate and compare the digital analysis. In this case it also proved unnecessary to manually eliminate non-geological features before starting the evaluation. Both clustering methods are relatively precise and correlate well with the in situ observations on the point cloud when executed on the IPDE result. This algorithm, in fact, provides a standalone solution to mitigate the common errors arising from performing density and clustering analysis directly on the whole point cloud produced by the pre-processing stage. By eliminating points lying below a certain case-specific threshold chosen by the user (the parameter K of the algorithm, here 30,000 points, corresponding to ~1.5% of the pre-processed cloud), only features with geological meaning, i.e. planar or almost planar rock surfaces, are automatically retained for the analysis, leaving aside many useless points in the cloud. This process, in addition to speeding up the computation, tends to minimize the errors due to the evaluation of curved surfaces (e.g. passages from one discontinuity to another) and small surfaces (e.g. tiny fractures), thus bringing the results closer to the expectations of the observations conducted by the traditional method, and improving them. This important aspect is rarely taken into account in the scientific literature on the detection of discontinuity sets (Dewez et al. 2016; Jaboyedoff et al. 2007; Hammah and Curran 1998; Menegoni et al. 2022; Pagano et al. 2020; Riquelme et al. 2014; Singh et al. 2021; Tomás et al. 2020); a relatively new method explores the concept of removing rock surfaces with too pronounced a curvature (Tsui et al. 2021) and, in this sense, a possible future perspective for this type of work could be the application of machine/deep learning algorithms to the classification of 3D point clouds, to improve the accuracy of object segmentation and identification. Several approaches of this type can be found in the literature, even though most of them are not directly connected to geology; among these, it is worth mentioning the work by Li et al. (2019), representing one of the first uses of machine learning methods to recognize primitive shapes within 3D point clouds, and that by Mammoliti et al. (2022), who were the first to explore the possibility of using such systems for the recognition of geological features in the results of digital surveys. The K-means clustering gave reliable outputs, since the isolines are eventually smoothed and the results are thus even more defined. This type of clustering is in fact considered reliable and useful when the clusters to be detected on a rock mass are numerous, for example more than four or five sets, in rather complex cases. It relies on point distances, and the centroid of each cluster is a sort of weighted mean of the cluster itself, yielding good outcomes when evaluating many sets, i.e. the more the clusters, the more precise the analysis of a rock mass. In this case the pole of the bedding planes was split into two, giving as output two very close centroids, one of which can be precisely correlated with the result of the digital evaluation previously performed (Pagano et al. 2018). The Gaussian Mixture, on the other hand, splits k1 and k2 along their main orientations, while the complementary values of the two poles are unified into a single one; it also leaves the bedding plane in a unified pole cluster, the centroid of which points approximately towards the south (Fig. 9d). This method is more suitable for cases with a low number of clusters, since it strongly relies on point density, giving outputs that closely match the traditional observations or the manual clustering of digital results when the sets to detect are few (from two to a maximum of five or six, as we were able to verify in the second case study). The manual selection of points of interest is also a precise and effective way to deal with the stereographic projections: it is simple and intuitive and, instead of giving automatic outputs to be validated later by adding or deleting what the program recognizes (Dewez et al. 2016; Jaboyedoff et al. 2007; Riquelme et al. 2014), it gives the user the possibility to choose the most significant poles directly on the stereoplot. Eventually, the sets extracted through the SSE are a sort of mean among the analyses performed (Fig. 15; Table 8).

Fig. 15
figure 15

Extracted sets highlighted on the original point cloud at the Cocceio Cave: a k1, b k2, c k3, d k4, and e s1. The software scale is in meters

Table 8 Numerical values of the extracted sets at the Cocceio Cave
Table 9 Numerical values of the extracted sets at Cetara

At Cetara, instead, a fully automatic procedure would also have detected planes with no geological relevance, whose density would have completely masked some of the sets recognized in situ. Therefore, starting an automated clustering, or even performing the IPDE, would have produced biased outcomes, because these planes, namely two walls, had too high a point density to be eliminated by the IPDE without a significant loss of geological poles. Nonetheless, for the Gaussian Mixture it was chosen not to eliminate one of them, i.e. the pole at 350° dip direction, since a density analysis showed that, unlike the one at ~125°, it would not mask any main set. Both were in any case eliminated before starting the IPDE, to further lighten the computation and obtain more defined results. In this case, too, the sets were extracted by judiciously combining the results of the various methods, including the traditional one (Fig. 16; Table 9). As regards computation times, performing the KDE and plotting the resulting data takes about 10 min for the Cetara case, compared to about 3 min for the Kamb analysis and about one and a half minutes for the Schmidt analysis. These results would suggest implementing the last two as a standard; however, since the accuracy of the traditional KDE is greater, it is preferable not to abandon this method completely and, on the contrary, to use the approaches in a combined manner, comparing the results in terms of accuracy. When a quick preliminary analysis is needed before undertaking more in-depth investigations, the Kamb/Schmidt system can be used from the beginning.
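For reference, Kamb- and Schmidt-style pole density contours of the kind compared here can be produced, for example, with the mplstereonet package (whether this is among the packages of Table 10 is not stated here, so this is only an assumed setup); the method names and the strike = dip direction − 90° convention are those of that library.

```python
import numpy as np
import matplotlib.pyplot as plt
import mplstereonet  # registers the 'stereonet' projection

rng = np.random.default_rng(4)
# synthetic discontinuity set: dip ~60 degrees, dip direction ~120 degrees
dipdir = rng.normal(120, 8, 400) % 360
dip = np.clip(rng.normal(60, 5, 400), 0, 90)
strike = (dipdir - 90) % 360   # right-hand-rule strike expected by mplstereonet

fig, axes = mplstereonet.subplots(ncols=2, figsize=(9, 4.5))
for ax, method in zip(axes, ("kamb", "schmidt")):
    # contour the pole density with the selected counting method
    ax.density_contourf(strike, dip, measurement="poles", method=method)
    ax.pole(strike, dip, color="k", markersize=1)  # overlay the individual poles
    ax.set_title(f"{method.capitalize()} pole density")
    ax.grid()
plt.tight_layout()
plt.show()
```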

Fig. 16
figure 16

Extracted sets highlighted on the original point cloud at Cetara: a k1, b k2, and c k3. The software scale is in meters

Conclusions

Stability assessment of rock masses plays a crucial role in the mitigation of the related risk. Any rock mass characterization requires the acquisition of information on the discontinuities at the outcrops, which is an essential step for the full comprehension of their kinematic behavior.

With this work we presented an alternative perspective on the analysis of data acquired by means of 3D scanning techniques. In detail, we highlighted and stressed the importance of the role of the expert, without relying entirely on the machine: even though the proposed methodology brings a great improvement in workflow automation, a solid background in structural geology and rock mechanics is always needed to make the right choices throughout the different steps. This must be coupled and integrated with field reports from in situ surveys, aimed at gaining the best visual recognition of the results. Further, we pointed out the importance of considering ranges of values to describe and extract clusters of poles, instead of fixed orientation values, as it is nearly impossible to quantify the orientation of a discontinuity set with a single number in most rock types.

In its first part, the presented method is useful for highlighting the main discontinuity systems while disregarding the numerous minor transition planes between sets and the smaller fractures, thus lightening the point cloud without losing information on the density of the major poles. This type of approach facilitates both dimensional and statistical analysis, and speeds up the generation of stereographic projections by increasing the accuracy in the recognition of density poles, through the use of well-established methods and algorithms for their visualization and detection. The packages utilized to develop the code implemented in this workflow are listed in Table 10.

Table 10 List of the main python packages utilized

The second part of this method focuses on the utility of a fully supervised function for the extraction of discontinuity sets. After comparing the outputs that the different methods can provide, and having established the relationships between digital and traditional methods, it is strongly advisable that the user decides which intervals, referring to the individual sets, should be extracted, rather than leaving the machine to automatically isolate the main families present on the rock mass. This enhances the overall workflow, leaving the user the possibility to filter and extract as many sets as needed, to repeat the procedure in order to widen or narrow the given intervals depending on the case, or to isolate and eliminate those portions of the 3D point cloud that could introduce errors or unexpected peaks when performing a simple density analysis.