Introduction

The so-called fourth industrial revolution, Industry 4.0, is undergoing rapid development thanks to advances in the Industrial Internet of Things and artificial intelligence (Cohen et al. 2019). As a typical application of Industry 4.0, smart industrial systems include a wide range of sensors and processors throughout the manufacturing process (Kurfess et al. 2020) to monitor equipment and detect faults in a timely manner. In this framework, artificial intelligence algorithms are used for automated quality control and surface defect inspection tasks in the manufacturing industry (Gao et al. 2006; Hao et al. 2020; Pimenov et al. 2018; Tabernik et al. 2020; Židek et al. 2020). In such applications, digitization plays an increasingly important role, as it is essential to create accurate digital representations of the as-manufactured part that can then be sent to software for quality assessment or defect detection (Teutsch 2007). In many industrial applications, such as the one showcased in this paper, digitization takes place at an intermediate stage of the manufacturing chain and not necessarily on the finished product. Interestingly, the objective of intermediate digitization is not to decide whether to accept, reject or rework a part, but rather to adapt the subsequent manufacturing operations to the specifics of the part being produced in an automated way. Digitization, in combination with machine learning and artificial intelligence algorithms, has already demonstrated its potential to bring new levels of accuracy and productivity to the industry (Jack Feng and Wang 2002; Židek et al. 2020).

Despite the benefits digitization brings in terms of shape identification, its contribution to the total manufacturing workflow time, namely the acquisition time plus the data processing time, must be accounted for. Sometimes the digitization time may be negligible compared to the overall processing cycle. However, in many industrial cases digitization involves large parts that have to meet tight geometric tolerances, requiring a detailed scan of the outer surface and the acquisition and processing of massive amounts of data (Drouot et al. 2018; Poulhaon et al. 2014). In this sense, the introduction of a complete digitization in the manufacturing process may have a negative effect on the total production time, which conflicts with the desired requirements for the introduction of artificial intelligence in the industry (Lee et al. 2020).

In this work, we consider as an application the mechanical machining of large aeronautical fuselage panels, which are about six meters long and two meters wide, with a small thickness in the order of a few millimeters and double curvature, see Fig. 1. These panels undergo a milling process to reduce the thickness locally in order to optimize the overall weight-resistance ratio of the panel. Because of their pronounced slenderness, the milling process requires sophisticated technology, including special supports and the use of a mirror milling system (Panczuk and Foissac 2008), with a milling cutter on the machined face and a counter-bearing on the opposite side.

Thickness tolerances are in the order of 0.1 millimeters. Significant geometric variability is observed from one panel to another, even if they belong to the same batch. Presumably, variability originates from, at least, two sources: (i) previous manufacturing stages, including a metal forming process that precedes milling; and (ii) the positioning of the part on the support system, as the part is likely to deform significantly under its own weight. As a consequence, the milling toolpath cannot be designed according to a CAD model as it would not be an accurate representation of the real part. Instead, panels must be fully digitized before machining in order to adapt the milling path to their actual geometry. Both the acquisition time and data processing (generation of CAD files and adaptation of the machining path) have a negative impact on productivity.

In light of this practical case, it is reasonable to assume that similar issues arise in other manufacturing processes and even beyond the industrial sector; for example, the interested reader can refer to the references below related to the medical imaging community. Leaving aside possible technological improvements regarding the data acquisition procedure (see e.g. D’Apuzzo 2006; Khalfaoui et al. 2013; Mian and Al-Ahmari 2019; Qian et al. 2021 in this regard), it is clear that a reduced sampling of the surface leads to a shorter scanning time, at the cost of having an incomplete measurement of the panel. The challenge we face is that of designing a digitization methodology that takes as input an incomplete data acquisition of a surface in a reduced time, and outputs a complete reconstruction of the surface without negatively impacting accuracy.

Fig. 1

Application case studied: aircraft panel manufacturing. a Raw panel after forming. Complete digitization of the component is required to adapt machining trajectories to the real shape. b Final panel after milling

The research field of surface reconstruction from digitally scanned shapes has achieved considerable progress in the last decades, with contributions mainly found in applications involving computer vision, e.g. Belongie et al. (2002), Pauly et al. (2005), Marton et al. (2009), Guo et al. (2013) and especially, the medical imaging and bioengineering community, see Heimann and Meinzer (2009), Wang et al. (2016), Bernard et al. (2017), Ballester et al. (2017), Zhang et al. (2017), Lauzeral et al. (2019) to cite only a few. All methods rely on some assumptions, or priors, that are imposed either on the scanned point cloud or on the scanned surface. A complete review of such methods and different priors can be found in Berger et al. (2014). One popular approach is that of Statistical Shape Analysis, which capitalizes on previously digitized samples of the same class (considered as training data) to build a statistical shape model (SSM). The SSM is then able to represent any shape in the object class by a mean shape and the variability observed in the training set. Among the different possibilities to build an SSM, we find Point Distribution Models (Cootes et al. 1992), in which the shapes are represented as a set of points or landmarks distributed over the surface. Within this approach, the creation of an SSM is achieved in three main steps: (i) registration, that is, the identification of the landmark points in each training shape; (ii) computation of the mean shape; and (iii) extraction of a low-dimensional description of the shape variability using Principal Component Analysis (PCA) (Chatterjee 2000; Jolliffe 2002; Jolliffe and Cadima 2016).

Previous works on surface reconstruction from incomplete data can be found in Guo et al. (2013), which uses a database of shapes for 2D contour completion, and in Pauly et al. (2005) for 3D object surface completion. In Shen et al. (2012) high-quality 3D models are reconstructed from low-quality data by assembling templates from an object database. These works, however, assume that the incomplete regions are small, thus the objective is to fill in the gaps that may appear in the acquired data, whereas the aim of this work is to identify a particular shape by digitizing only a portion of its surface. The work in Bernard et al. (2017) presents a method for surface reconstruction from sparse point clouds, where the input consists of a reduced number of points which are registered onto the template of the SSM using a Gaussian mixture model approach. The interested reader is also referred to this last reference for an in-depth review of the state of the art in this topic. The idea of building object reconstructions from reduced samplings is already present in Bernard et al. (2017); however, the digitization outputs in high-accuracy manufacturing, which is the main scope of this work, usually consist of clouds with a large number of data points. Indeed, even with a partial scan of the surface, the number of scanned points is in the order of millions. Another relevant difference is that the existing literature on surface reconstruction deals with closed surfaces of 3D solid objects (Lauzeral et al. 2019), whereas our interest lies in curved shapes that can be treated as 2D surfaces in a 3-dimensional space.

In this work we propose a methodology for fast 3D surface digitization that works as follows: first, an SSM is computed from a database of full samples, using prior information about the surface’s boundary. Then each new sample is partially digitized and reconstructed by fitting the SSM to the partial measurement through a non-linear minimization problem. After the new surface is identified in the SSM, the proposed methodology outputs a reconstruction point cloud that resembles that of a complete surface scan. With this choice, the introduction of our proposed methodology in the manufacturing workflow is straightforward, as all the following data processing phases in the manufacturing line remain unchanged (see Fig. 2).

Fig. 2

Scheme of the original manufacturing workflow, in blue, where a complete digitization of the surface is performed before machining due to high quality requirements. The proposed modification, consisting of a partial digitization and a shape reconstruction algorithm, is highlighted in green. In order to be as unobtrusive as possible in the manufacturing workflow, the output of the proposed fast digitization methodology is designed to resemble the point cloud of a complete scan acquisition (Color figure online)

The article is structured as follows: in “Data” section we introduce the data used in this work together with the relevant features of the data acquisition. The details of the SSM creation and reconstruction algorithm are presented in “Methodology” section, together with the definition of error and validation metrics. Finally, the numerical results concerning the application case are shown in “Results” section and discussed in “Discussion” section, and “Conclusions” section contains the conclusions.

Data

All data used in this work has been provided by Stelia Aerospace (St. Nazaire, France). In this section, we present the data acquisition process and then the different datasets used in this work, together with their features. We denote a point in the 3D space as \({\varvec{x}} = \{x,y,z\} \in {\mathbb {R}}^3\), and a 3D cloud of points corresponding to a panel shape is represented with capital letters, \(S = \{{\varvec{x}}_1,...,{\varvec{x}}_n\}^T \in {\mathbb {R}}^{n\times 3}\). The Euclidean distance is denoted by \(\Vert \cdot \Vert \).

Data acquisition method and data format

The data acquisition process is as follows: a laser beam travels along a predefined path that sweeps the panel, storing 3D coordinates of points on the panel’s surface. The beam is 7.5 cm wide, it reads 400-450 points at a time (acquisition step), and the device performs acquisition steps at a constant frequency as it travels along the measured surface. A complete scan of a panel requires a scanning path around 180 m long, yielding a point cloud of approximately 16 million points. This data is then stored in a file that contains the coordinates of the entire point cloud with a numerical precision of 0.001 mm, which will be the lower bound when measuring the reconstruction error in “Results” section. All points in the sample are considered for the statistical shape model construction, and the proposed reconstruction algorithm will provide surface reconstructions with the same point density as that of a completely scanned panel. To help locate the shape in 3D space, the component under consideration contains two small holes, drilled for alignment purposes across the different manufacturing processes, which serve as reference marks. The coordinates of both marks are obtained during the scanning phase, and they are denoted as \({\varvec{\varPhi }}_{12}\) and \({\varvec{\varPhi }}_{14}\) (see Fig. 3a).

Datasets

Three different datasets are considered throughout this work, referred to as the complete scan, partial sampling, and verification datasets. The features of each one are the following:

Complete scan dataset This is the main database, consisting of 746 panels that were completely scanned at Stelia Aerospace’s workshop between 2014 and 2019 using 7 different laser data acquisition sets working in parallel. The data is considered to be homogeneous, that is, no distinction is made between acquisition sets. A thorough review of the raw database was performed to dismiss corrupted data. Overall, there were missing files in 53 samples (either the point cloud file or any of the reference markers’ locations), and 138 samples were found to have flaws in the acquired data (either point clouds with empty regions or incoherent measurements). All these defective samples were dismissed, leaving a final training set of \(n_s = 555\) panels.

Partial sampling dataset To test the reconstruction algorithm, a partial sampling dataset is built from the complete scan database. The partial scan trajectory (Fig. 3b) has been arbitrarily chosen, sparsely covering the whole panel surface with a trajectory similar to a triangular wave. This has been selected to fulfill the technological constraints of the laser data acquisition devices at Stelia Aerospace, so that an in-situ validation can be performed. This partial scan path is nearly 35 m long and provides an incomplete point cloud of 3.5 million points, that is, roughly 20% of the data contained in a complete scan.

Fig. 3

a Sample shape of the studied geometry with landmark locations and b partial scan trajectory in blue. Dimensions in millimeters

Verification dataset The last set consists of 9 new panels, different from those in the complete scan dataset, which are used in the verification of the methodology. These samples have been digitized twice, using both the complete and partial scan strategies. As the data comes from the same sample, the reference landmarks \({\varvec{\varPhi }}_{12}, {\varvec{\varPhi }}_{14}\) should have the same coordinates in the partial and complete files of the same sample. However, we have noticed differences of around 0.1 mm in these values between the partial and complete data. This is probably because the scanning machine resets its home position between scans, and that process may introduce a small offset in the local reference system. Although those differences are small with respect to the total size of the component, they are relevant when evaluating the reconstruction, whose target accuracy is of the same order of magnitude.

Methodology

The method in this work is divided into two stages: first, an SSM of the panel is built in an offline stage, and then the online stage generates a reconstruction from an incomplete panel measurement using the shape parametrization.

  1.

    Offline stage: generation of the SSM. The SSM is built in the same fashion as Point Distribution Models (PDM), with two distinct steps: alignment of the training shapes and dimensionality reduction.

    (a)

      Alignment phase. A point cloud is arbitrarily chosen to serve as template shape (\(S_0\)). Then, the alignment of each sample in the training set (target shape) is performed in two steps: a rigid registration of the target onto the template followed by a non-rigid registration of the template onto the target. The output of this step is the template’s deformation field such that it fits each target shape.

    (b)

      A dimensionality reduction step, using techniques such as Principal Component Analysis (PCA) (Jolliffe 2002), to extract a low-dimensional description of the shape variability. Specifically, the principal modes of deformation of the template are extracted from the registered data. It is worth mentioning that the PCA returns an ordered set of modes according to how much each of them contributes to explaining the input data. Therefore, the user can establish a trade-off between accuracy and model size, expressed by the number of modes considered. By selecting the set of modes that best describe variability, the PCA provides a statistical description of any shape with a restricted number of modes. New shapes \({\widetilde{x}}\) can be written as the sum of the mean component \({\bar{x}}\) plus a linear combination of the selected principal modes \({\varvec{\phi }}_i\) and their shape parameters \(\alpha _i\) (Jolliffe 2002):

      $$\begin{aligned} {\widetilde{x}} = {\bar{x}} + \sum _{i=1}^{N} {\varvec{\phi }}_i\alpha _i \end{aligned}$$
      (1)
  2.

    Online stage: panel reconstruction. At the online stage, the input data is an incomplete scan of a geometry outside the SSM’s training samples, thus the online phase is divided into two steps: shape parameter identification and geometry reconstruction.

    (a)

      The shape parameters in Eq. (1) corresponding to the new geometry are identified through an iterative minimization problem. As a result, we obtain a deformation field of \(S_0\) that fits the shape of the partially scanned sample.

    (b)

      After obtaining the shape parameters that best fit the incomplete data, the SSM is particularized, with those parameters, at a fine cloud of arbitrarily distributed points to create a virtual reconstruction of the sample with the same format as that of a completely scanned panel. This is called the oversampling step.

The offline and online phases are now detailed in “Offline training” and “Online reconstruction” sections respectively.

Offline training

The purpose of this stage is to build a Statistical Shape Model from a set of previously scanned complete shapes, which is our training set. Following the PDM building procedure, we first align all shapes in the training set to an arbitrarily chosen template shape. We divide this alignment into rigid and non-rigid registration steps. Then the final SSM is obtained after a dimensionality reduction step.

Rigid registration

Each of the 7 different scanning machines used to acquire all the samples in the database has its own coordinate system, which introduces considerable differences in the scanned coordinates of each panel. The rigid registration step aims to minimize rotations and displacements between samples so that the differences are mainly due to shape variations. A simple rigid registration can be performed using the reference marks \({\varvec{\varPhi }}_{12}\) and \({\varvec{\varPhi }}_{14}\): the samples are aligned so that \({\varvec{\varPhi }}_{14}\) is located at the origin of the new reference system and \({\varvec{\varPhi }}_{12}\) lies on the x-axis. Let \({\varvec{x}}_s\) be the coordinates of a point belonging to a given sample; the rigid transformation is then written as:

$$\begin{aligned} {\varvec{x}} = {\varvec{R}}_{\beta }{\varvec{R}}_{\theta } \left( {\varvec{x}}_s - {\varvec{\varPhi }}_{14}\right) \end{aligned}$$
(2)

where \(\theta \) and \(\beta \) are the angles between the segment \({\varvec{\varPhi }}_{14}-{\varvec{\varPhi }}_{12}\) and the XY, XZ planes respectively, with their corresponding rotation matrices \({\varvec{R}}_{\theta }\) and \({\varvec{R}}_{\beta }\).
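For illustration, a minimal sketch of this rigid registration in Python/NumPy could read as follows; the rotation order and sign conventions are our assumption for the sketch, not taken from the original (MATLAB) implementation:

```python
import numpy as np

def rigid_register(points, phi12, phi14):
    """Align a scanned point cloud so that phi14 sits at the origin and
    phi12 lies on the positive x-axis (a sketch of Eq. (2))."""
    p = points - phi14                    # translate so phi14 becomes the origin
    d = phi12 - phi14
    # Rotation about z removing the y-component of the segment phi14-phi12
    theta = np.arctan2(d[1], d[0])
    Rz = np.array([[np.cos(theta),  np.sin(theta), 0.0],
                   [-np.sin(theta), np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    d_xy = Rz @ d
    # Rotation about y removing the remaining z-component
    beta = np.arctan2(d_xy[2], d_xy[0])
    Ry = np.array([[np.cos(beta), 0.0, np.sin(beta)],
                   [0.0,          1.0, 0.0],
                   [-np.sin(beta), 0.0, np.cos(beta)]])
    return (Ry @ Rz @ p.T).T              # aligned coordinates
```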

Template shape

The construction of the SSM requires an arbitrarily chosen template shape, denoted as \(S_0\). The geometry under consideration is a nearly rectangular-shaped shell which, despite having a curvature, is mainly aligned with the reference coordinate system, z being the out-of-plane direction. Considering this topology we define the template as a flat surface on the xy plane. After performing the rigid registration on all samples of the training set, we define a rectangular bounding box that covers all \(\{x, y\}\) coordinates (see Fig. 4). The bounding box is then meshed with a regular structured grid of 1000 by 250 nodes, thus \(S_0 \in {\mathbb {R}}^{2\times n_{S_{0}}}\), with \(n_{S_{0}} = 250000\).
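For illustration, a minimal sketch of this template construction could be:

```python
import numpy as np

def build_template(registered_xy, nx=1000, ny=250):
    """Flat template S0: a regular nx-by-ny grid covering the bounding box of
    the in-plane coordinates of all rigidly registered samples (a sketch)."""
    xmin, ymin = registered_xy.min(axis=0)
    xmax, ymax = registered_xy.max(axis=0)
    xv, yv = np.meshgrid(np.linspace(xmin, xmax, nx),
                         np.linspace(ymin, ymax, ny))
    return np.column_stack([xv.ravel(), yv.ravel()])   # shape (nx*ny, 2)
```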

Non-rigid registration

From now on and to simplify the notation we will use the following decomposition for any point in the space: \({\varvec{x}}\equiv \{{\varvec{p}},z\}\), with \({\varvec{p}}\) representing the 2D coordinates \(\{x, y\}\). Let \(T_i\) be the i-th sample in the complete scan dataset. The objective in this step is the creation of a mapping \({\varvec{a}}_i:{\mathbb {R}}^{2}\mapsto {\mathbb {R}}^{3}\) such that \({\varvec{a}}_i(S_0)\) fits the target shape \(T_i\). Following again the in-plane / out-of-plane approach we divide the non-rigid registration into two phases: first we obtain a mapping \({\varvec{f}}_1:{\mathbb {R}}^{2}\mapsto {\mathbb {R}}^{2}\) that fits the in-plane coordinates \({\varvec{p}}_i\), then we compute a scalar mapping \(f_2:{\mathbb {R}}^{2}\mapsto {\mathbb {R}}\) such that \(f_2\circ {\varvec{f}}_1\) fits \(z_i\). The complete non-rigid registration mapping can be written as:

$$\begin{aligned} {\varvec{a}}_i = \left\{ {\varvec{f}}_1\left( S_0\right) ,\ f_2\circ {\varvec{f}}_1\left( S_0\right) \right\} \end{aligned}$$
(3)

XY registration The first step of the non-rigid registration is performed using the thin plate spline (TPS) model (Duchon 1977; Meinguet 1979). This is an efficient tool commonly used for shape matching tasks in the framework of object recognition methods (Belongie et al. 2002; Bookstein 1997; Lauzeral et al. 2019). In its 2D version, the TPS defines a function \(f(x,y)\) that minimizes the bending energy (Duchon 1977; Meinguet 1979):

$$\begin{aligned} I_f = \displaystyle \int \int _{{\mathbb {R}}^{2}}\left( \frac{\partial ^2 f}{\partial x^2}\right) ^2 + 2\left( \frac{\partial ^2 f}{\partial x\partial y}\right) ^2 + \left( \frac{\partial ^2 f}{\partial y^2}\right) ^2 \,dx\,dy \end{aligned}$$
(4)

subject to \(f({\varvec{p}}_i) = {\varvec{p}}'_i\), where \(i = 1, ..., N\) are some previously defined correspondences between the target shape \({\varvec{p}}_i\) and its transformations \({\varvec{p}}'_i\), also known as landmark points. There are different options to establish those landmarks when no prior knowledge of the target shape is available, such as closest point metrics (Lauzeral et al. 2019) or the shape context (Belongie et al. 2002), which establish as landmarks a set of points arbitrarily distributed on the shape’s boundary. In both works, the correspondences are set in an iterative process that alternates between setting the landmarks and solving the TPS problem (4) until convergence is reached. The landmarks can also be directly specified if previous knowledge of the shapes being matched is available, with the benefit of avoiding the iterative process. In the application case at hand, the reference marks \(\left\{ {\varvec{\varPhi }}_{12}, {\varvec{\varPhi }}_{14}\right\} \) provide very little information about the shape, too little to perform a non-rigid registration with high accuracy in terms of shape match. However, as the panel’s shape is simple (the projection on the xy plane is a slightly deformed rectangle), we can set the landmarks to solve the TPS problem as follows. The boundary of the panel is the convex hull of all the plane coordinates \({\varvec{p}}_T\in T\) (Fig. 4a). The first landmarks are the corners of the rectangular-shaped target \(T_i\), which are found using a corner detection algorithm and matched to the corresponding corners of the template \(S_0\). Then we re-parametrize the target’s boundary so that it has the same number of points as the template’s boundary. This provides enough landmarks to solve the TPS problem (4) and obtain \({\varvec{f}}_1(S_0)\). An example of this mapping is shown in Fig. 4, where the template’s mesh has been coarsened for the sake of clarity.
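As an illustration, the following sketch fits and evaluates such an in-plane TPS mapping using SciPy’s RBFInterpolator with a thin-plate-spline kernel as a stand-in for a dedicated TPS solver; the landmark arrays are assumed to be the boundary correspondences described above:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_xy_mapping(landmarks_template, landmarks_target, template_xy):
    """Fit the in-plane mapping f1: R^2 -> R^2 on boundary landmark
    correspondences and evaluate it at every node of the template grid S0
    (a sketch; arrays have shapes (N, 2), (N, 2) and (n_S0, 2))."""
    f1 = RBFInterpolator(landmarks_template, landmarks_target,
                         kernel='thin_plate_spline')
    return f1(template_xy)        # (n_S0, 2): deformed in-plane coordinates
```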

Fig. 4

Non-rigid registration: XY mapping. a Template mesh \(S_0\) and target sample \(T_i\) with plane coordinates \({\varvec{p}}_i\). b Mapping \({\varvec{f}}_1\left( S_0\right) \) obtained with the TPS. The correspondences between boundary points (landmarks) are highlighted

Z registration To present the second phase in the non-rigid registration step we describe the computation of \(f_2\circ {\varvec{f}}_1\) at a given node \(n \in S_0\) with coordinates \({\varvec{p}}'_n = {\varvec{f}}_1\left( {\varvec{p}}_n\right) \), which is illustrated in Fig. 5a. First we identify all points \({\mathcal {P}} = \{{\varvec{p}} \in T : \Vert {\varvec{p}}-{\varvec{p}}'_n\Vert < r\}\), within the searching radius \(r = h\sqrt{2}/2\), and h being a representative element size of \(S_0\) (in this application, \(h = 6.3\) mm). The coordinate \(z_n\) is then computed as a weighted average of all \(z_i \in {\mathcal {P}}\) and using a Gaussian filter (Shapiro and Stockman 2000):

$$\begin{aligned} z_n = \frac{\sum _{i\in {\mathcal {P}}} z_i \omega _i}{\sum _{i\in {\mathcal {P}}} \omega _i}\ ;\quad \omega _i = e^{-\frac{\Vert {\varvec{p}}_i-{\varvec{p}}'_n\Vert ^2}{2\sigma ^2}} \end{aligned}$$
(5)

where we have considered a standard deviation \(\sigma = r/2\). The data acquisition procedure presented some specific features that influenced the quality of the z-coordinate registration near the panel boundaries. On the one hand, the samples analyzed in this work contained a much higher point density near the panel boundaries (as a consequence of the scanning device travelling at a slower speed near the boundaries while keeping the acquisition frequency constant), and on the other hand, the scan device’s trajectory produced a saw-toothed pattern of the boundary instead of the smooth curve of the actual panel, as shown in the detail of Fig. 5b. To address these issues we implemented the following features concerning the registration near the boundaries (a sketch of the weighted mapping is given after the list):

  1.

    The Gaussian filter’s standard deviation is reduced to \(\sigma = r/8\) for nodes along the boundary of \(S_0\), and to \(\sigma = r/6\) for nodes at the two following inner layers. With this measure, we give more importance to the points located near \({\varvec{p}}'_n\) without reducing the number of considered points, since the point density near the boundary is higher.

  2.

    The nodes at the boundary of \(S_0\) lying outside the saw-toothed pattern do not have any point within a distance r. The non-rigid registration at these nodes is computed by a least-squares fit of the z values obtained at the adjacent nodes.
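A minimal sketch of the Gaussian-weighted z mapping of Eq. (5) is given below; it uses a k-d tree for the radius search and omits, for brevity, the boundary-specific reduction of \(\sigma \) and the least-squares extrapolation of the two items above:

```python
import numpy as np
from scipy.spatial import cKDTree

def z_registration(mapped_xy, target_xyz, h=6.3):
    """Gaussian-weighted average of Eq. (5) (a sketch): mapped_xy are the
    deformed template nodes f1(S0), target_xyz the target point cloud."""
    r = h * np.sqrt(2.0) / 2.0            # search radius
    sigma = r / 2.0                        # interior standard deviation
    tree = cKDTree(target_xyz[:, :2])      # search on in-plane coordinates
    z = np.full(len(mapped_xy), np.nan)
    for n, p in enumerate(mapped_xy):
        idx = tree.query_ball_point(p, r)
        if not idx:                        # saw-tooth gap: no point within r
            continue                       # handled by least-squares fit in the paper
        d2 = np.sum((target_xyz[idx, :2] - p) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        z[n] = np.sum(w * target_xyz[idx, 2]) / np.sum(w)
    return z
```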

Fig. 5

Non-rigid registration: Z mapping. a Weighted average at a node with coordinates \({\varvec{p}}'_n\). b Border features: radii of the weighted average area (orange), and nodes without points (red) (Color figure online)

Statistical shape model: dimensionality reduction

When all targets are registered onto \(S_0\) we build the matrix \({\varvec{A}} = [{\varvec{a_1}}, ..., {\varvec{a}}_{n_s}]\), and compute the average displacement field \({\varvec{{\bar{x}}}}\). Then the singular value decomposition (SVD) is applied to the matrix \(\widetilde{{\varvec{A}}}\), which is obtained by subtracting \({\varvec{{\bar{x}}}}\) from \({\varvec{A}}\). Finally, the shape parametrization \({\varvec{{\widetilde{x}}}}\) of the shape generation model is written, following Jolliffe (2002) and Lauzeral et al. (2019), as:

$$\begin{aligned} {\varvec{{\widetilde{x}}}}= {\varvec{{\bar{x}}}}+ {\varvec{\varPhi }}{\varvec{\alpha }} \end{aligned}$$
(6)

where we have used compact notation, \({\varvec{\alpha }}\in {\mathbb {R}}^{m}\) being the m shape parameters and \({\varvec{\varPhi }}\) the truncated basis with modes \([{\varvec{\phi }}_1, ..., {\varvec{\phi }}_m]\). The SSM is defined at the discrete set of points in the template’s mesh \(S_0\). However, the online reconstruction stage, which is presented in the following section, requires a continuous definition of the SSM. This can be easily obtained by including the traditional finite element (FE) interpolation in Eq. (6), with linear shape functions. Thus, the SSM at a given point p inside the template shape \(S_0\) with interpolation coordinates \({\varvec{\xi }}_p\in {\mathbb {R}}^{2}\) is computed as:

$$\begin{aligned} \varphi _{p}({\varvec{\alpha }}) := {\varvec{B}}\left( {\varvec{\xi }}_p\right) {\varvec{{\bar{x}}}}+ {\varvec{B}}\left( {\varvec{\xi }}_p\right) {\varvec{\varPhi }}{\varvec{\alpha }} \end{aligned}$$
(7)

where \({\varvec{B}}\) is the linear interpolation operator.
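As an illustration of the offline dimensionality reduction, a minimal sketch extracting the mean field and the truncated basis of Eq. (6) from the matrix of registered fields could be (a generic PCA-via-SVD computation, not the original implementation):

```python
import numpy as np

def build_ssm(A, m):
    """Columns of A are the stacked registered fields a_i; return the mean
    field, the first m modes Phi and the singular values (a sketch)."""
    x_mean = A.mean(axis=1, keepdims=True)             # average displacement field
    U, s, _ = np.linalg.svd(A - x_mean, full_matrices=False)
    return x_mean.ravel(), U[:, :m], s                 # mean shape, truncated basis, spectrum
```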

Online reconstruction

Let \(T'\) be a new sample outside the SSM’s training data from which only a partial scan with coordinates \({\varvec{X}}_t \in T'\) is available. The goal is to generate a reconstruction of the panel from the incomplete data acquired, using the SSM from the offline training. This is achieved in two steps: identification of the shape parameters and oversampling.

Shape identification from a partial scan

The location of the reference marks \(\{{\varvec{\varPhi }}_{12,t}, {\varvec{\varPhi }}_{14,t}\}\) is also available in the new sample \(T'\), thus the rigid registration in Eq. (2) is applied to obtain \({\varvec{x}}_t\). Now we must identify the shape parameters of \(T'\) that best fit the partial measurements \({\varvec{x}}_t\). For this, the following minimization problem is proposed:

$$\begin{aligned} \min _{{\varvec{\alpha }}\in {\mathbb {R}}^{m}} \left[ \frac{1}{2}\sum _{t\in T'}\left( \varphi _{t}({\varvec{\alpha }}) - {\varvec{x}}_t\right) ^2\right] \end{aligned}$$
(8)

Introducing Eq. (7) in (8), and using the simplified notation \({\varvec{B}}_t = {\varvec{B}}\left( {\varvec{\xi }}_t\right) \), we can obtain the solution of the minimization problem:

$$\begin{aligned}&\displaystyle \min _{{\varvec{\alpha }}\in {\mathbb {R}}^{m}} \left[ \frac{1}{2}\sum _{t\in T'}\left( {\varvec{B}}_t{\varvec{{\bar{x}}}}+ {\varvec{B}}_t{\varvec{\varPhi }}{\varvec{\alpha }} - {\varvec{x}}_t\right) ^2\right] \nonumber \\&\displaystyle \left( \sum _{t\in T'} {\varvec{\varPhi }}^T{\varvec{B}}_t^T{\varvec{B}}_t{\varvec{\varPhi }}\right) {\varvec{\alpha }} = \sum _{t\in T'}{\varvec{\varPhi }}^T{\varvec{B}}_t^T\left( {\varvec{x}}_t - {\varvec{B}}_t{\varvec{{\bar{x}}}}\right) \end{aligned}$$
(9)

The problem in (8) is non-linear, as the interpolation coordinates \({\varvec{\xi }}_t\) corresponding to the points \({\varvec{x}}_t\) also depend on the unknowns \({\varvec{\alpha }}\). In other words, while the \({\varvec{p}}_t\) coordinates remain constant, the underlying mesh \(S_0\) is modified by \({\varvec{\alpha }}\), so the sample’s points can change their location in the interpolation. To solve the problem, we propose the fixed-point procedure presented in Algorithm 1. After setting an initial guess \({\varvec{\alpha _0}}\), the mapping \(\varphi ({\varvec{\alpha }})\) is defined, and the interpolation coordinates of the sample’s points, \({\varvec{\xi }}_t\), can be directly obtained. With the interpolation coordinates fixed, Eq. (9) becomes a linear system of m equations with m unknowns that can be easily solved, yielding new shape parameters. The process is repeated until the shape parameters stagnate.

Algorithm 1
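A compact sketch of the fixed-point iteration of Algorithm 1 is given below. The helper build_B is hypothetical: it stands for the assembly of the interpolation operator that stacks the \({\varvec{B}}_t\) blocks for the current \({\varvec{\alpha }}\), which in practice requires locating each in-plane coordinate \({\varvec{p}}_t\) in the deformed template mesh:

```python
import numpy as np

def identify_shape(x_t, x_mean, Phi, build_B, alpha0, tol=1e-6, max_iter=50):
    """Fixed-point solution of the non-linear fitting problem (8)-(9), a sketch.
    x_t: partial scan coordinates; x_mean, Phi: SSM mean field and truncated basis
    (flattened nodal ordering); build_B(alpha): hypothetical helper returning the
    stacked interpolation operator for the template deformed with alpha."""
    alpha = np.asarray(alpha0, dtype=float)
    for _ in range(max_iter):
        B = build_B(alpha)                        # interpolation coords depend on alpha
        M = np.asarray(B @ Phi)                   # stacked B_t * Phi
        rhs = x_t.ravel() - B @ x_mean.ravel()    # x_t - B_t * x_bar
        alpha_new = np.linalg.lstsq(M, rhs, rcond=None)[0]   # solves Eq. (9)
        if np.linalg.norm(alpha_new - alpha) < tol * max(np.linalg.norm(alpha), 1.0):
            return alpha_new                      # stagnation of the shape parameters
        alpha = alpha_new
    return alpha
```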

We propose two different alternatives to set the initial guess \({\varvec{\alpha }}_0\):

  1.

    Assume \({\varvec{\alpha }}_0 = {\varvec{0}}\). This is equivalent to setting the mean value of the reduced basis, \({\varvec{{\bar{x}}}}\), as the initial guess for the SSM in (6).

  2.

    We also propose a more problem-specific approach. We first obtain the perimeter of the partially scanned sample as the convex hull of the \({\varvec{p}}_t\) coordinates. For these points, we can obtain interpolation coordinates following the same procedure as in the non-rigid registration (“XY registration” section) for the generation of landmarks. Then the problem in Eq. (9) can be solved considering only those points along the boundary to obtain the initial values \({\varvec{\alpha }}_0\).

Oversampling

Having solved problem (8), the SSM’s parameters \({\varvec{\alpha }}_t\) are identified, and the mapping \(\varphi ({\varvec{\alpha }}_t)\) is fitted to represent the partially scanned sample \(T'\). We can then create a new point cloud \({\varvec{x}}'_{r}\) that resembles that of a completely scanned sample (i.e. with around 16 million points), as this data is later used in the manufacturing workflow of the panel. For that purpose, we evaluate and store the interpolation coordinates \({\varvec{\xi }}_{ref}\) of a sample arbitrarily chosen from the database used to generate the SSM in the offline stage. Then, the SSM is evaluated at the reference coordinates and the rigid registration transformation in Eq. (2) is reversed:

$$\begin{aligned} {\varvec{x}}_{r}&= \varphi _{ref}({\varvec{\alpha }}_t) \\ {\varvec{x}}'_{r}&= {\varvec{R}}^T{\varvec{x}}_{r} + {\varvec{\varPhi }}_{14} \end{aligned}$$
(10)
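A minimal sketch of this oversampling step is given below; B_ref denotes the (hypothetical in this sketch) interpolation operator assembled from the stored reference coordinates \({\varvec{\xi }}_{ref}\), and R the combined rotation \({\varvec{R}}_{\beta }{\varvec{R}}_{\theta }\) of Eq. (2):

```python
import numpy as np

def oversample(alpha_t, x_mean, Phi, B_ref, R, phi14):
    """Oversampling step of Eq. (10), a sketch: evaluate the fitted SSM at the
    dense reference coordinates and undo the rigid registration of Eq. (2)."""
    x_r = (B_ref @ (x_mean + Phi @ alpha_t)).reshape(-1, 3)   # dense cloud, aligned frame
    return x_r @ R + phi14    # row-wise equivalent of R^T x_r + Phi_14
```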

Validation and error metrics

The purpose of the validation is twofold. First, to ensure the quality of the SSM obtained from the complete scan dataset. Second, to verify the reconstruction algorithm from an incomplete sample. Here we present the metrics used for validation, and the results are shown in “Results” section.

SSM performance metrics

The quality of the SSM is evaluated with the compactness, generalization, and specificity metrics introduced in Davies (2002). The compactness is defined as:

$$\begin{aligned} C(m) = \sum _{i = 1}^{m} \uplambda _i \end{aligned}$$
(11)

with \(\uplambda _i\) being the i-th eigenvalue associated with the SVD. This metric represents the contribution of each additional shape parameter to the reduced model. The fewer parameters required to properly describe any shape, the more compact the model is.

The generalization can be seen as the ability of the SSM to represent a shape out of the training set. This is evaluated using a leave-one-out approach: a sample T from the complete scan dataset is removed and a new SSM is built using the \(n_T-1\) remaining shapes. Then, T is registered to obtain \({\varvec{a}}_t\), and the shape parameters corresponding to the new SSM can be obtained as follows:

$$\begin{aligned} {\varvec{\alpha }}_t = {\varvec{\varPhi }}_{m}^T({\varvec{a}}_t - {\varvec{{\bar{x}}}}) \end{aligned}$$
(12)

where \({\varvec{\varPhi }}_{m}\) stands for the truncated basis with the corresponding number of modes m and \({\varvec{{\bar{x}}}}\) for the average field. A reconstruction \(T'(m)\) of sample T is computed using Eq. (6) and the estimated shape parameters. Finally, the approximation error is measured as the average point-wise distance between all \(n_{s_0}\) points in the reconstruction and the sample registration:

$$\begin{aligned} \epsilon (T,T'(m)) = \frac{1}{n_{s_0}}\sum _{i=1}^{n_{s_0}}\Vert {\varvec{{\bar{x}}}}_i + {\varvec{\varPhi }}_{m}{\varvec{\varPhi }}_{m}^T({\varvec{a}}_{ti} - {\varvec{{\bar{x}}}}_i) - {\varvec{a}}_{ti}\Vert \end{aligned}$$
(13)
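For reference, a minimal sketch of the projection error of Eqs. (12)-(13) could read as follows, with a_t and x_mean stored as flattened vectors of 3D nodal coordinates:

```python
import numpy as np

def projection_error(a_t, x_mean, Phi_m):
    """Project the registered sample a_t onto the truncated basis Phi_m and
    return the mean point-wise Euclidean distance of Eq. (13), a sketch."""
    alpha_t = Phi_m.T @ (a_t - x_mean)            # Eq. (12)
    recon = x_mean + Phi_m @ alpha_t              # reconstruction, Eq. (6)
    diff = (recon - a_t).reshape(-1, 3)           # one row per template node
    return np.mean(np.linalg.norm(diff, axis=1))  # Eq. (13)
```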

This process is then repeated for each sample in the complete scan dataset, and the generalization is calculated as a function of the number of modes as in Davies (2002):

$$\begin{aligned} G(m) = \frac{1}{n_T}\sum _{i=1}^{n_T} \epsilon (T_i,T'_i(m)) \end{aligned}$$
(14)

Finally, the specificity measures the ability of the SSM to generate shapes that are similar to those in the training set. A random set of N shapes \({\mathcal {R}}_s\) is generated, with a variable number of modes m. Then, for each shape \(S_i(m) \in {\mathcal {R}}_s\) the distance to the closest registered sample in the complete dataset \({\mathcal {T}}_s\) is computed as in Eq. (13), and the specificity is written as Davies (2002):

$$\begin{aligned} S(m) = \frac{1}{N}\sum _{i=1}^{N} \min _{{\varvec{a}}_j \in {\mathcal {T}}_s} \epsilon ({\varvec{a}}_j, S_i(m)) \end{aligned}$$
(15)

Reconstruction error metrics

To evaluate the quality of the panel reconstruction algorithm we consider a complete scan C and a partial scan P. The reconstruction process outputs a new reconstructed point cloud, R. The complete and partial samples may come from the complete scan dataset, with its respective sample in the partial sampling dataset, or from the complete and partial scans in the verification dataset. As noted in “Datasets” section, the data in the verification dataset presents some differences in the location of the reference landmarks \({\varvec{\varPhi }}_{12}, {\varvec{\varPhi }}_{14}\), which would introduce an artificial source of error, since in a real production workflow a sample would not be scanned twice. To exclude this error source from the reconstruction error metrics, we evaluate the reconstruction error after performing the rigid registration step of “Rigid registration” section, where all shapes are aligned; thus we work with the coordinates \({\varvec{x}}_{c}\) and \({\varvec{x}}_{r}\).

The error between the reconstruction R and the complete scan C can be defined as the distance between both shapes, which we denote as Dist(C, R). The following definition for the distance between two surfaces was used in Xu et al. (2013) and Lauzeral et al. (2019) for SSM validations:

$$\begin{aligned} Dist(C, R) = \frac{1}{2}\left[ \frac{1}{N_c}\sum _{{\varvec{c}}\in {\varvec{x}}_c}\min _{{\varvec{r}}\in {\varvec{x}}_r}\Vert {\varvec{c}}-{\varvec{r}}\Vert + \frac{1}{N_r}\sum _{{\varvec{r}}\in {\varvec{x}}_r}\min _{{\varvec{c}}\in {\varvec{x}}_c}\Vert {\varvec{c}}-{\varvec{r}}\Vert \right] \end{aligned}$$
(16)

where \(N_c\) and \(N_r\) stand for the number of points in C and R respectively. Although robust and unbiased, this metric has a high computational cost as the number of points increases. Due to the high density of the point clouds involved in this application (around 16 million points), we propose an alternative definition that again takes advantage of the geometry’s shape to reduce the computational cost. First we define \({\varvec{\varphi }}_s\equiv \{{\varvec{p}}_s, z_s\}\) as the solution of problem (8) applied to obtain the reconstruction R. Then an interpolant of the z-coordinates is built using again the linear interpolation operator \({\varvec{B}}\):

$$\begin{aligned} L_R({\varvec{p}}) = {\varvec{B}}_{p}z_{s} ,\quad L_R({\varvec{p}}_s) = z_{s} \end{aligned}$$
(17)

Finally, the distance function between the reconstructed and completely scanned shapes is defined as:

$$\begin{aligned} \displaystyle Dist(C, R) = \frac{1}{N_c}\sum _{c\in {\varvec{x}}_c}|z_c - L_R({\varvec{p}}_c)| \end{aligned}$$
(18)
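A sketch of this distance computation is shown below; SciPy’s Delaunay-based LinearNDInterpolator is used here as a stand-in for the structured linear interpolation operator \({\varvec{B}}\) of Eq. (17):

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def reconstruction_distance(x_c, p_s, z_s):
    """Distance of Eq. (18), a sketch: build a piecewise-linear interpolant L_R
    of the reconstructed z-field (Eq. (17)) and average |z_c - L_R(p_c)| over
    the complete scan x_c (N_c x 3). Points of C falling outside the
    reconstructed surface are simply discarded here."""
    L_R = LinearNDInterpolator(p_s, z_s)
    z_hat = L_R(x_c[:, :2])
    mask = ~np.isnan(z_hat)
    return np.mean(np.abs(x_c[mask, 2] - z_hat[mask]))
```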

Using this distance function and the partial sampling dataset we introduce an alternative computation of the generalization metric, \(G^{*}(m)\), which considers not only the SSM but also the reconstruction algorithm’s performance in identifying shapes that are outside the training dataset. For that, we will use the same leave-one-out approach, but now the approximation error is computed between each sample \(C_i\) in the complete scan dataset and a reconstruction \(R_i(m)\) built from its corresponding sample in the partial sampling dataset \(P_i\) using m modes. Therefore, the new generalization metric can be written as:

$$\begin{aligned} G^{*}(m) = \frac{1}{n_T}\sum _{i=1}^{n_T} Dist(C_i,R_i(m)) \end{aligned}$$
(19)

where the distance between shapes is calculated using Eq. (18). With this measure, we can evaluate the quality of the complete process, which includes the registration, the model order reduction, and the non-linear reconstruction algorithm.

Results

The performance of the proposed method has been analysed in three different ways:

  1.

    Via compactness, which measures the effect of truncation on the ability of the model to reproduce the training dataset.

  2.

    Via generalization, which measures the ability of the model to represent unseen data using a leave-one-out procedure.

  3.

    Via in-situ verification tests, which demonstrate the ability of the methodology to perform in a relevant environment, as the resulting output of the proposed methodology has been introduced in the actual manufacturing workflow, with successful results in the subsequent quality control procedures.

The results concerning the SSM performance are shown in “SSM performance evaluation” section, whereas the in-situ verification test is presented in “Online shape reconstruction verification test” section.

SSM performance evaluation

The SSM metrics are presented in Figs. 6 and 7. All these results were calculated using the complete scan dataset, with the partial sampling dataset being also used for the computation of \(G^{*}(m)\) and a random set of 1000 generated shapes for the computation of the specificity. The compactness graph indicates that starting at around 50 modes the contribution of each additional parameter drastically diminishes. Metrics S(m), G(m) and \(G^{*}(m)\) were calculated for \(m = \left\{ 1, 2, 5, 10, 20, 50, 100, 200, 500\right\} \). In this test, when solving the non-linear minimization problem in Eq. (9) the boundary-specific approach was used to set the initial guess of the shape parameters. A comparison between the proposed alternatives is shown in the following section.

Both generalization factors show a similar trend, with a logarithmic descent of the average error down to 0.044 and 0.053 mm respectively. This means that the reconstruction algorithm from a partial sample provides reliable results, close to the intrinsic generalization ability of the SSM. The bars in the graphs represent one standard deviation. Although in both cases the variability decreases with the number of modes, the \(G^{*}(m)\) metric always presents higher variability than G(m). In both cases the generalization metrics stabilize at \(m = 100\) modes, showing only a slight improvement for 200 and 500 modes. In light of these results, it seems appropriate to choose a truncated basis with 100 modes for this application, which represents a balance between accuracy and computational cost.

Fig. 6

Statistical shape model, performance evaluation metrics: compactness (left) and specificity (right)

Fig. 7

Generalization metric. Lower values mean better generalization of the model. The error bars represent one standard deviation

Online shape reconstruction verification test

After testing the performance of the method with the complete scan and the virtually generated partial sampling datasets, in this section we validate the presented methodology in an industrial environment. For this, 9 different samples were digitized twice in the manufacturing plant, both partially and completely (for error measurements), as presented in “Datasets” section. These samples form the verification dataset. Then, for each panel in the verification dataset, we obtained a surface reconstruction from the partial digitization using the algorithm proposed in this work, and the output reconstruction point cloud was introduced in the manufacturing workflow, substituting the complete digitization point cloud, as presented in Fig. 2. The experiment is designed and executed as follows for each panel: (i) run the full scan program; (ii) stop the execution and enter manual execution mode, to prevent the milling step from starting after scanning; (iii) log on to the industrial computer and download a copy of the full scan for later comparison; (iv) load the reduced scan program; (v) log on to the industrial computer and download a copy of the partial scan; (vi) give the partial scan as input to our reconstruction subroutines, running on a laptop; (vii) compare the reconstructed panel to the fully scanned one; (viii) transfer a copy of the reconstructed panel to the industrial computer; (ix) go back to automated execution mode by moving to the next manufacturing step (milling).

The graph in Fig. 8 shows the reconstruction error between the complete scans and their reconstructions for all samples in the verification test, using the distance function defined in Eq. (18). The performance of both alternatives for the initial guess to solve the non-linear problem presented in “Shape identification from a partial scan” section is also compared. The horizontal black line in the graph represents the generalization metric of the SSM for 100 modes, to serve as a reference.

Fig. 8

Reconstruction error for each sample in the verification dataset, with the reconstruction fixed-point algorithm using as initial guess the mean panel (blue) and the initial boundary reconstruction (red). The SSM’s generalization value is included in the graph as a reference (Color figure online)

The use of boundary information to build the initial guess provides a better reconstruction overall, with error values near the generalization ability of the SSM. Although using the mean panel as an initial guess (i.e. \({\varvec{\alpha }}_0 = {\varvec{0}}\)) provides slightly worse results for most of the samples, there is a considerable discrepancy in samples 4, 7, and 9. Figure 9 shows the reconstruction error color map for sample 9 using both strategies. The oversampling solution using the mean panel clearly fails to adapt along the boundary of the surface, and \(1.6\%\) of the points in the oversampling cloud have reconstruction errors over 1 mm, whereas the resulting cloud in Fig. 9a has no points with such error levels. Similar behavior is found in samples 4 and 7. Finally, Fig. 10 shows the reconstruction error color map for all panels in the verification dataset using the boundary-specific initial guess.

Fig. 9

Verification dataset, sample 9. Reconstruction error using boundary-specific procedure a and mean panel b as the initial guess of the fixed-point algorithm

Fig. 10

Reconstruction error map using boundary-specific procedure for all samples in the verification dataset (see “Datasets” section)

Discussion

The typical applications of statistical shape models are found in 2D images and 3D volumes with closed surfaces. The application in this work presents a new challenge, which is the identification of an open surface in a 3D space. Whereas the construction of statistical shape models for 3D objects can be achieved by properly detecting the boundary of the solid and then performing a dimensionality reduction (as in Lauzeral et al. 2019, for example), that is not enough when it comes to open surfaces in a three-dimensional space. In this case the contours of the open surface must be identified, there is no solid bulk, and the surface’s curvature cannot be captured by the identification of its contours alone. The proposed method is designed to solve this new challenge by building the SSM in two steps: first, identification of the surface’s contours and then mapping of the interior curvatures. The tests performed in “Results” section show that the quality of the generated SSM is in the order of 0.01 mm, that is, just one order of magnitude above the data’s numerical precision; thus we have achieved the maximum possible accuracy to describe any shape in the provided database, with a reduced basis of 100 modes in this particular case.

Regarding the surface reconstruction from a partially digitized sample, we compared the performance of the proposed algorithm using different strategies for the initial guess of the fixed-point iteration in the test with the panels from the verification dataset. The results show that the identification of the surface’s boundary as the initial step of the non-linear problem solution (presented in “Shape identification from a partial scan” section) is crucial to obtain an accurate shape parameter identification: not only is the average reconstruction error lower in all test cases, but using the mean panel of the SSM as initial guess can also lead to stagnation of the solver at local minima with worse adaptation along the boundaries of the surface, resulting in a higher reconstruction error. The case of sample number 6 in Fig. 10f is worth remarking on, since the performance of the reconstruction algorithm was clearly worse in that particular case compared to the other samples. The average reconstruction error was significantly higher, and the error map showed a vertical line pattern with higher reconstruction error. After a thorough study of that sample by the industrial partner, an error in the workshop’s scanning device was detected, thus the complete scan of that sample was not representing its actual surface. Therefore, we think that the methodology presented in this work could also open the possibility of using the part’s SSM as an additional tool to check for anomalies in the manufacturing or digitization workflow.

In the performed tests we observed that using 100 modes in the initial fixed-point step, when using only the contour of the partial digitization, may lead to overfitting problems. This is because, in that particular case, the number of points used to solve the fitting problem in (8) is considerably lower than that of the whole partial sample. To avoid this problem, in the tests shown in this work we performed the initial iteration of the fixed-point algorithm using only 10 modes and the boundary of the partial sample, which led to satisfactory results.

The total reconstruction error of the proposed reconstruction algorithm from a partial sample is slightly higher than the generalization ability of the SSM (which is measured with complete samples), with values between 0.05 and 0.1 mm. Although we have successfully achieved high-quality shape reconstructions from incomplete data, we think that the trajectory of the partial scan (Fig. 3b), which was arbitrarily chosen, might affect the total reconstruction error; it also explains the slightly worse generalization (and higher variability) of the SSM when it is taken into account (Fig. 7), since that trajectory is not related to the features captured in the computed SSM. The choice of an optimal set of measuring locations for the SSM [using for example the Empirical Interpolation Method (EIM) (Barrault et al. 2004; Chaturantabut and Sorensen 2010) or a data-driven approach (Poulhaon et al. 2014)] would lead to lower variability in the surface reconstruction algorithm.

Although we have presented this surface reconstruction methodology using a specific application case of an aircraft manufacturing component, the proposed approach can be extended to the reconstruction of any other surface, provided there is a database of completely digitized samples to build a statistical shape model, and enough information about the surface’s boundary topology for the non-rigid registration step in “Non-rigid registration” section. After building the SSM that captures the variability of the new surface, the application of the presented reconstruction algorithm is straightforward.

Finally, concerning the computational cost of this method, an implementation in MATLAB (The Mathworks Inc., USA) takes 1 second per iteration of the fixed-point algorithm. The shape identification problem required between 15 and 20 iterations for all samples, and the whole reconstruction algorithm, from reading the input file to obtaining the oversampling reconstruction, takes between 35 and 45 s, depending on the number of iterations. Considering the time spent on the partial scan acquisition and on the reconstruction algorithm, the proposed method can reach an 80% reduction in the surface digitization time, according to estimates by Stelia Aerospace’s experts.

Conclusions

We presented a methodology for 3D surface reconstruction combining partial sampling and SSMs. For the computation of the SSM, we make use of prior knowledge of the surface, especially concerning boundary detection and mapping. We propose a non-linear fitting problem to identify the shape parameters of a partially scanned sample, which is solved with a fixed-point algorithm. The results show that the obtained SSM can describe the surfaces using 100 modes, with an error only one order of magnitude above the scanning device’s numerical precision. The fixed-point algorithm’s efficiency is increased if the boundary of the partial sample is used for the initial iteration, and the total reconstruction error of the proposed reconstruction algorithm, that is, including the SSM and the fixed-point algorithm, remains between 0.05 and 0.1 mm of average reconstruction error for all tested samples.

Whereas typical applications of statistical shape models are found in 2D images and 3D volumes with closed surfaces, we present a method for the fast digitization of open surfaces in a 3D space using incomplete measurements, which are obtained in a significantly shorter time than complete digitizations, achieving high-quality final reconstructed shapes. The implementation of this methodology in an industrial environment achieved an estimated time reduction of 80% in the shown application case, provided that there exists a database of completely digitized samples. The search for optimal partial measuring strategies for the surface reconstruction algorithm will be studied in future work, together with the possibility of building the SSM as panel data is collected, that is, without having a previously obtained sample database.