1 Introduction

Many computer applications, such as robotics, games and animation, rely on 3D models. Building such models manually is difficult and time consuming, since every real-world object has to be measured individually before it can be modeled. To overcome these problems, different approaches for 3D reconstruction have been proposed [1, 17, 27]. These studies can be divided into two major categories: Voronoi-based methods and mesh-free methods. Voronoi-based algorithms work together with Delaunay triangulation [5] to reconstruct surfaces, while in mesh-free methods surfaces are reconstructed using B-splines [29], radial basis functions [26], partial differential equations (PDE) and moving least squares (MLS) [13]. Across all these techniques, data point acquisition has proven to be a crucial step, and many hardware systems [8, 12, 19] have been developed for this purpose. A LightWave approach was used in [8] to reconstruct 3D surface points. A Microsoft Xbox Kinect sensor was used in [12] to obtain a volumetric representation, and a similar Kinect sensor camera was employed in [28] for reconstructing 3D indoor scenes.

Many well-known algorithms [7, 15, 16] based on volumetric reconstruction and representation from 2D images exist in the literature for reconstructing surfaces and obtaining correspondences simultaneously. A method of space carving and voxel coloring, based on the visibility and consistency of voxels in the images, was developed in [16]. Direct discrete minimization formulations and graph-cut approaches were also proposed in [11, 15, 25] for establishing correspondences between images. All of these methods obtain disparity maps with accurate contours but limited depth precision; this limitation was removed in [2]. Another algorithm, which reconstructs 3D scenes from only two views based on a minimum number of line correspondences, was presented in [21]. Recently, a method for reconstructing a surface model in the form of frontier points and secondary vertices from 2D images was developed in [23]. A technique for accurate 3D shape measurement of multiple separate objects using a stereo vision approach was presented in [14]; its main strength is the ability to perform full-field 3D shape measurement with high accuracy even in the presence of discontinuities and multiple separate regions.

Many applications, such as pattern recognition, robot vision and animation, require an alternative, compact representation of 3D objects. A stick-like or line-like 1D representation called the curve-skeleton [18] is one such form. The curve-skeleton differs from the medial-surface representation: in 2D, skeletons are termed the medial axis, while in 3D the corresponding structure is the medial surface, whereas the curve-skeleton is a special 1D representation of a 3D object. Its applications include virtual navigation, registration, animation, morphing, scientific analysis, shape recognition and shape retrieval. Reconstructing a 3D object from its curve-skeleton is a difficult topic because the problem is ill-posed. Apart from the methods described above, efforts have also been made to reconstruct 3D surfaces using object skeletons [8, 18]. An interest point-based algorithm, usable in any computer vision application for 3D models, was presented in [6]. Methods for approximating the medial surface and curve-skeletons were presented in [8, 18]. In both techniques, spheres (balls) with radius equal to the distance from each skeletal point to the object boundary are computed first; the whole 3D object surface is then represented as a union of these spheres.

In this paper, a novel algorithm is proposed for reconstructing 3D objects from skeletal information extracted from stereo images. The reconstruction problem is solved by recovering the curve-skeleton and the boundary of the object. The affine invariance of convex hulls is used to establish the correspondence between the skeletons and boundaries in the stereo images. The reconstructed curve-skeleton is used as the seed for growing the shape of the 3D object, whereas the boundary controls the growing process. In the literature, efforts [8, 18] have been made to reconstruct 3D objects from their 3D skeletons (medial surfaces); however, reconstructing a 3D object from only the limited information of its 2D skeletons remains a difficult problem due to its ill-posed nature. The presented method therefore has several advantages over the studies described above: (1) the concept of curve-skeleton is applied to multi-view reconstruction; (2) arbitrary stereo images are accepted as the input of the reconstruction algorithm; (3) the reconstruction is achieved with only the limited information retained in the obtained curve-skeleton and boundary; (4) the presented method is easy to employ and may be used to reconstruct any free-form 3D object.

2 Preliminaries

2.1 Curve-Skeleton

In many applications (e.g., animation and virtual navigation), a concise representation of an object in terms of curve arcs or straight lines is desirable. This line-like representation of a 3D object is called the curve-skeleton; it is a simplified 1D representation of the original 3D object consisting only of curves, and it differs from the skeletal-surface representation (medial surface), which is a higher-dimensional structure. An example of a curve-skeleton is shown in Fig. 1, where an object (Horse) is represented via curve arcs and straight lines only; this representation is said to be the curve-skeleton of the horse. The curve-skeleton captures the essential topology of the object in a very compact and easily understood form. A curve-skeleton has the following desirable properties: it is homotopic to the original object, invariant under isometric transformations, sufficient for reconstruction, thin, centered, smooth, robust, efficient to compute and hierarchical.

Fig. 1
figure 1

Representation of the skeleton of a 3D object via curve arcs

2.2 NURBS-Skeleton

NURBS-skeleton [22] is a skeleton extraction algorithm in binary images for the purpose of shape representation. It involves the following major steps:

  • Skeleton Extraction: In the first phase of this approach, the skeleton (thin lines) of the considered shape is extracted using a thinning algorithm.

  • Interest Points Detection: End points, junction points and curve points of the medial axis are detected. The thin lines (from the first phase) can then be converted into a graph by associating the curve points with edges, and the end and junction points with vertices.

  • Edges Extraction and Pruning Process: Skeletal pixels are classified into three sets: the junction set (JS), the end set (ES) and the curve set (CS). Following the edge definition, an automatic search of skeletal edges is executed and short, non-significant skeletal branches are removed.

  • NURBS Curves Approximation: The NURBS-skeleton algorithm operates edge by edge as follows: (i) select an edge; (ii) determine the control points of the edge; (iii) set the NURBS curve parameters (degree, knot vector and weights); and (iv) generate the corresponding NURBS curve.

The NURBS-skeleton thus represents the skeleton of the shape smoothly, with non-significant information removed; its modeling step is based on NURBS curve approximation.
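To make the approximation step concrete, the following is a minimal Python sketch (not the authors' implementation; the function name, example control points, weights and knot vector are all hypothetical) that evaluates a NURBS curve for one skeletal edge by running a B-spline on homogeneous (weighted) control points and projecting back:

```python
import numpy as np
from scipy.interpolate import BSpline

def nurbs_curve(ctrl_pts, weights, knots, degree, num=100):
    """Evaluate a NURBS curve via a B-spline in homogeneous coordinates.

    ctrl_pts : (n, 2) control points of one skeletal edge
    weights  : (n,) positive weights
    knots    : clamped knot vector of length n + degree + 1
    """
    ctrl_pts = np.asarray(ctrl_pts, float)
    weights = np.asarray(weights, float)
    knots = np.asarray(knots, float)
    # Homogeneous coefficients [w*x, w*y, w]
    coeffs = np.column_stack([ctrl_pts * weights[:, None], weights])
    spline = BSpline(knots, coeffs, degree)
    # Evaluate only on the valid parameter range of the clamped knot vector
    u = np.linspace(knots[degree], knots[-degree - 1], num)
    hom = spline(u)
    return hom[:, :2] / hom[:, 2:3]   # perspective division back to 2D

# Example: quadratic NURBS over four control points of one edge (illustrative values)
pts = [[0, 0], [1, 2], [3, 2], [4, 0]]
w = [1.0, 2.0, 2.0, 1.0]
kv = [0, 0, 0, 0.5, 1, 1, 1]          # n + degree + 1 = 7 knots
curve = nurbs_curve(pts, w, kv, degree=2)
```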

3 Preprocessing Operations

In this section, the operations performed on the stereo images to extract the skeleton and boundary are briefly explained. These extracted skeletons and boundaries are used subsequently in the reconstruction process. Sect. 3.1 deals with skeleton estimation, whereas Sect. 3.2 presents boundary extraction.

3.1 Skeleton Estimation

In the first step of the proposed approach, a skeleton of the object is extracted. The method used to accomplish this is called thinning or skeletonization. In the thinning process, the object is iteratively peeled off layer by layer until only one central layer, the skeleton of the object, remains. Thinning may be considered a morphological operation that removes selected foreground pixels from binary images, somewhat like erosion or opening, and it can be expressed in terms of a hit-or-miss transformation. The thinning of an image A by a structuring element B is described as:

$${\text{thin }}(A,B) = A - [{\text{ hit-or-miss }}(A,B)]$$

where ‘\(-\)’ represents logical subtraction. The obtained skeleton has a number of non-significant branches which should be removed; to achieve this, the NURBS-skeleton algorithm [22] is applied. The procedure for the extraction of the skeleton is given in Algorithm 1; it extracts the medial axis for both input images. Table 1 presents the results of skeleton estimation for some object images. The first column shows the binary images of the input objects, and the second column shows the NURBS-skeletons extracted from them using the skeleton estimation procedure.
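As an illustration of the skeleton-estimation step, the following Python sketch thins a binary image and then crudely prunes short branches by repeatedly deleting end points. It is a stand-in for the hit-or-miss based thinning and the NURBS-skeleton pruning of [22], not the authors' MATLAB code, and `prune_iters` is a hypothetical parameter:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import thin

def estimate_skeleton(binary_img, prune_iters=10):
    """Thin a binary object to a 1-pixel skeleton, then crudely prune short branches."""
    skel = thin(binary_img.astype(bool))              # iterative morphological thinning
    kernel = np.ones((3, 3), int)
    for _ in range(prune_iters):
        # neighbour count of each skeletal pixel (excluding the pixel itself)
        nbrs = ndimage.convolve(skel.astype(int), kernel, mode='constant') - skel.astype(int)
        endpoints = skel & (nbrs == 1)                # pixels with exactly one neighbour
        if not endpoints.any():
            break
        skel = skel & ~endpoints                      # peel off end points -> shortens branches
    return skel
```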

Table 1 NURBS-Skeleton estimation and boundary extraction of some 2D object images
figure a

3.2 Boundary Extraction

The proposed approach is applicable to images in which the object to be reconstructed is blob-like, i.e., the foreground object is clearly distinguishable from the background. In such cases, the boundary of the object is detected by morphological operations. There are two fundamental morphological operators, dilation and erosion: dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels from object boundaries. The number of pixels added or removed depends on the size and shape of the structuring element used to process the image. Edge-detection algorithms [4, 9], whose direct and natural goal is boundary tracing, also exist in the literature; these edge detectors are used prior to the morphological operations to verify the boundary of the object blob (where the intensities change drastically). Thus, the boundary of an image A, denoted by \(\beta (A)\), may be obtained by first eroding the image with a structuring element B, and then taking the set difference between A and its erosion, i.e.,

$$\beta (A) = A - (A\ominus B).$$

Table 1 shows the results of boundary extraction. Using the morphological operations, the boundaries of the input objects are extracted; the results are shown in the third column of Table 1.
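A minimal Python sketch of this boundary-extraction step, assuming a 3×3, 8-connected structuring element (the authors' MATLAB implementation may differ):

```python
import numpy as np
from scipy import ndimage

def extract_boundary(binary_img):
    """Morphological boundary: beta(A) = A - (A eroded by B)."""
    A = binary_img.astype(bool)
    B = np.ones((3, 3), bool)                  # 8-connected structuring element
    eroded = ndimage.binary_erosion(A, structure=B)
    return A & ~eroded                         # set difference A \ (A eroded by B)
```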

4 Proposed Reconstruction Algorithm

In this section, the proposed method for reconstructing 3D objects is presented. The methodology is discussed in two sections: Sect. 4.1 describes the correspondence estimation method, whereas Sect. 4.2 defines the growing process of a 3D object.

4.1 Correspondence Estimation

To establish the correspondence between the two images, convex hull affine invariants are used; these are explained in detail below.

4.1.1 Fitting Convex Hull on Boundary

The convex hull [24] of a set of data points is the smallest convex set containing all of the points. Several surface reconstruction algorithms, such as Delaunay triangulation and Voronoi diagrams, use the convex hull as a core part of their procedures. In binary images, the convex hull can be determined from the set of boundary pixels; in fact, the convex hull of the boundary pixel set is the same as the convex hull of the binary image. The convex hull algorithm has two major stages. The first is a polygon extraction procedure that collects a sequence of candidate extreme points by scanning the image in a clockwise manner. The second is a convex hull extraction procedure that checks the convexity of the extracted polygon. The detailed procedure for fitting the convex hull to the detected boundary points is explained in [3]. A useful property of this algorithm is that it terminates with the convex hull vertices already in sequential order, so no additional step is needed to arrange them. An example of the described process is shown in Fig. 2.
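This step can be sketched as follows in Python; here scipy's ConvexHull stands in for the procedure of [3], since for 2D inputs it likewise returns the hull vertices already in (counter-clockwise) sequential order:

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_vertices(boundary_mask):
    """Fit a convex hull to the boundary pixels of a binary image and return its
    vertices in sequential (counter-clockwise) order."""
    ys, xs = np.nonzero(boundary_mask)
    pts = np.column_stack([xs, ys]).astype(float)
    hull = ConvexHull(pts)
    # For 2D inputs, ConvexHull.vertices are already in counter-clockwise order
    return pts[hull.vertices]
```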

Fig. 2
figure 2

a The object boundary represented by solid lines; b the convex hull fitted by the procedure described in Sect. 4.1.1 together with the ordered vertices (small red circles)

4.1.2 Convex Hull Affine Invariants

In the proposed study, the stereo imaging geometry is assumed to have a narrow baseline. Hence, the two views are close together and are obtained under weak perspective projection, so the images are equivalent up to an affine transformation. An affine transformation between a pair of points in 2D space is given by a linear transformation \([{\textit{A}}]\) and a translation b, and may be written as \(\textit{T}=\{[\textit{A}],b\}\). Such an affine transformation has several important properties [20], e.g., affine invariance. An invariant whose value is completely unchanged under the affine transformation is called an absolute affine invariant, and absolute affine invariants may be generated from the convex hull. In the proposed study, the ratio of triangle areas derived from the convex hull is used as the absolute invariant.
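The following small Python check (synthetic numbers, not data from the paper) illustrates why ratios of triangle areas are absolute affine invariants: each area is scaled by |det A|, so the scale factor cancels in the ratio:

```python
import numpy as np

def tri_area(p, q, r):
    # unsigned triangle area via the 2D cross product
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

rng = np.random.default_rng(0)
quad = rng.random((4, 2)) * 10                    # four vertices in the "left" view
A = np.array([[1.3, 0.4], [-0.2, 0.9]])           # arbitrary non-singular linear part
b = np.array([5.0, -2.0])                         # translation
quad_t = quad @ A.T + b                           # the same vertices after T = {[A], b}

ratio_left = tri_area(*quad[:3]) / tri_area(*quad[1:])
ratio_right = tri_area(*quad_t[:3]) / tri_area(*quad_t[1:])
assert np.isclose(ratio_left, ratio_right)        # the area ratio is preserved
```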

Fig. 3
figure 3

An example of establishing convex hull affine invariant using the procedure described in Sect. 4.1.2: a a set of four consecutive vertices on the convex hull of the object boundary in the left view; b corresponding four vertices on the convex hull of the object boundary in the right view

The construction of the convex hull affine invariants is illustrated in Fig. 3. Consider two convex hulls taken from the two stereo images, with \(n_1\) and \(n_2\) vertices, respectively. From the \(n_1\) ordered vertices \((r_1,r_2,\ldots ,r_{n_1})\) in the left image, form the ith quadruplet of four consecutive vertices \((r_{i},r_{i+1},r_{i+2},r_{i+3})\) (see Fig. 3a). Similarly, from the \(n_2\) ordered vertices \((r{^\prime }_1,r{^\prime }_2,\ldots ,r{^\prime }_{n_2})\) in the right image, form the jth quadruplet of four consecutive vertices \((r{^\prime }_{j},r{^\prime }_{j+1},r{^\prime }_{j+2},r{^\prime }_{j+3})\) (see Fig. 3b). From each quadruplet, four possible triangles are formed by considering three vertices at a time. All possible triangles and their respective areas from both quadruplets are listed in Tables 2 and 3.

Table 2 Generated possible triangles and their areas from the ith quadruplet
Table 3 Generated possible triangles and their areas from the jth quadruplet

For every quadruplet, the absolute affine invariants form an invariant feature vector I. One possible set of absolute invariants is obtained by taking the ratio of any pair of triangle areas from Table 2; similarly, in the right image, the absolute invariants are derived from Table 3. In total there are six local absolute invariants \(I_{12}(i)\), \(I_{13}(i)\), \(I_{14}(i)\), \(I_{23}(i)\), \(I_{24}(i)\), \(I_{34}(i)\), with their counterparts \(I{^\prime }_{12}(j)\), \(I{^\prime }_{13}(j)\), \(I{^\prime }_{14}(j)\), \(I{^\prime }_{23}(j)\), \(I{^\prime }_{24}(j)\), \(I{^\prime }_{34}(j)\), obtained by considering one pair of triangle areas at a time. Ideally,

$$\begin{aligned} I_{k \zeta }(i)= & {} \frac{S_{k}(i)}{S_{\zeta }(i)} \nonumber \\ I{^\prime}_{k \zeta }(j)= & {} \frac{S{^\prime}_{k}(j)}{S{^\prime}_{\zeta }(j)}\,\, {\text{ for }}~~ k, \zeta = 1,2,3,4;\quad k< \zeta . \end{aligned}$$
(1)

These sets of invariants are called invariant feature vectors, denoted as I(i) and \(I'(j)\) in the left and right images, respectively.
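A Python sketch of the invariant feature vector of Eq. (1), assuming the ordered hull vertices produced in Sect. 4.1.1 (the helper names are illustrative, not the authors' code):

```python
import numpy as np
from itertools import combinations

def tri_area(p, q, r):
    """Unsigned area of the triangle (p, q, r)."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def feature_vector(hull_pts, i):
    """Invariant feature vector I(i): the six ratios S_k/S_zeta (k < zeta) of the
    four triangle areas generated by the quadruplet starting at hull vertex i."""
    n = len(hull_pts)
    quad = [hull_pts[(i + t) % n] for t in range(4)]     # four consecutive vertices (cyclic)
    # The four triangles obtained by choosing three of the four quadruplet vertices
    areas = [tri_area(*tri) for tri in combinations(quad, 3)]
    return np.array([areas[k] / areas[z]
                     for k in range(4) for z in range(4) if k < z])

# I(i) for every quadruplet along the hull of the left view
# (hull_left as returned by the hull_vertices() sketch above):
# I_left = np.vstack([feature_vector(hull_left, i) for i in range(len(hull_left))])
```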

The absolute area-invariant property [30] may be used to search for the invariant feature vector \(I{^\prime }\) in the right image associated with the invariant feature vector I in the left image. The ordered vertices [3] of the convex hull produce correspondingly ordered feature vectors along the convex hulls, and the number of absolute invariant feature vectors on the convex hulls of both images remains the same in the absence of occlusion or overlap. Therefore, each left invariant feature vector I(i) must have a counterpart right invariant feature vector \(I{^\prime }(j)\). If the object is occluded, its convex hull may change and the associated absolute invariant feature vectors are affected. In that case, vertices are added or deleted so that the vertices in both images are distributed equally, which equalizes the number of vertices in the two images. An example of such a case is shown in Fig. 4.

Fig. 4
figure 4

Examples of adding/deleting the vertex/vertices in case of occlusion: a addition of a vertex; b deleting the vertices in order to equalize the number of vertices in both views

4.1.3 Establishing Correspondence

The correspondence is established by measuring the following error:

$$\begin{aligned} e_\omega (i,j) =\,&\Vert I(i)-I{^\prime }(j)\Vert ^2 + \Vert I(i+1)-I{^\prime }(j+1)\Vert ^2 \\&\, +\cdots + \Vert I(i+\omega -\zeta )-I{^\prime }(j+\omega -\zeta )\Vert ^2 \end{aligned}$$
(2)

where \(\omega\) is the number of consecutive vertices considered and \(\zeta =1,2,3,4\) depending on the value of \(\omega\). The following matching decision rule is adopted:

$$(i_{min},j_{min})=\arg \{\min \limits _{(i=1,\ldots ,n_{1})}\min \limits _{(j=1,\ldots ,n_{2})}e_\omega (i,j)\},$$
(3)

where \(n_1\) and \(n_2\) are the numbers of vertices of the two convex hulls taken from the stereo images.

This establishes the correspondence between the quadruplet \(i_{min}\) in the left image and the quadruplet \(j_{min}\) in the right image. The matching is only an ordered linear shift search, because the convex hull vertices are already ordered. Depending on the value of \(\omega\), it yields the correspondence between four (\(\omega =1\)), five (\(\omega =2\)) or six (\(\omega =3\)) consecutive vertices. In both convex hulls, all possible sets of three vertices, not necessarily consecutive, are considered, and those sets whose error falls below a predefined threshold are retained. Using this approach, the correspondence between pairs of vertex sets of the two convex hulls is established. Once the correspondence between a pair of vertex sets is obtained, it follows that the boundary points lying within the area of the convex hull of the matched vertices also correspond. The whole process of correspondence estimation is given in Algorithm 2, and a small sketch of the matching search follows it. The same procedure is adopted for matching the skeletons of the two images.

figure b
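A minimal Python sketch of the matching search of Eqs. (2) and (3), under the reading that the error accumulates the squared differences of \(\omega\) consecutive feature vectors taken cyclically along both hulls (the names and the cyclic indexing are assumptions, not the paper's exact formulation):

```python
import numpy as np

def match_quadruplets(I_left, I_right, omega=1):
    """Ordered linear shift search: find (i_min, j_min) whose omega consecutive
    feature vectors agree best along the two hulls."""
    n1, n2 = len(I_left), len(I_right)
    best, best_pair = np.inf, (0, 0)
    for i in range(n1):
        for j in range(n2):
            # accumulate squared distances over omega consecutive feature vectors
            e = sum(np.sum((I_left[(i + t) % n1] - I_right[(j + t) % n2]) ** 2)
                    for t in range(omega))
            if e < best:
                best, best_pair = e, (i, j)
    return best_pair, best

# Example usage: (i_min, j_min), e_min = match_quadruplets(I_left, I_right, omega=2)
```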

4.2 3D Reconstruction of an Object

The skeleton in 3D space is computed by performing triangulation [10] on the corresponding points of the stereo skeletons; the obtained 3D curve is taken as the curve-skeleton. The 3D boundary is obtained in a similar manner by triangulation. The proposed object reconstruction then uses a ball-growing approach: the required shape is obtained as the union of spheres centered at each curve-skeleton point, where the radius of each sphere is the distance from its center to the closest point on the object boundary. This radius corresponds to the distance-transform value [18], calculated as the Euclidean distance between the curve-skeleton point p and the boundary points BV. The whole process of object reconstruction is given in Algorithm 3, and a minimal sketch of the ball-growing step follows it. The average running-time complexity of the whole reconstruction algorithm is \({O(N_1N_2^2+N_2^3)}\), where \(N_1\) and \(N_2\) are the numbers of input data points in the two view planes.

figure c
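The ball-growing step can be sketched in Python as below, assuming the 3D curve-skeleton and boundary points have already been triangulated; the voxelization and the `voxel_size` parameter are illustrative choices, not the authors' implementation:

```python
import numpy as np

def grow_object(skeleton_pts, boundary_pts, voxel_size=1.0):
    """Mark every voxel lying inside at least one sphere centred on a curve-skeleton
    point, with radius equal to the distance from that point to the nearest
    boundary point (the distance-transform value)."""
    skeleton_pts = np.asarray(skeleton_pts, float)
    boundary_pts = np.asarray(boundary_pts, float)
    # radius of each ball = distance to the closest boundary point
    d = np.linalg.norm(skeleton_pts[:, None, :] - boundary_pts[None, :, :], axis=2)
    radii = d.min(axis=1)
    # voxel grid covering the object
    lo = boundary_pts.min(axis=0) - radii.max()
    hi = boundary_pts.max(axis=0) + radii.max()
    axes = [np.arange(l, h, voxel_size) for l, h in zip(lo, hi)]
    X, Y, Z = np.meshgrid(*axes, indexing='ij')
    vox = np.stack([X, Y, Z], axis=-1)                  # (nx, ny, nz, 3)
    occupied = np.zeros(X.shape, bool)
    for c, r in zip(skeleton_pts, radii):               # union of balls
        occupied |= np.linalg.norm(vox - c, axis=-1) <= r
    return occupied, axes
```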

5 Results and Discussions

The experimental studies were carried out on various objects, including computer-generated and real stereo images. The code was developed in MATLAB on a PC with a 2.53 GHz Intel(R) i3 processor and 3.0 GB of RAM.

Fig. 5
figure 5

Flow diagram of reconstruction process

5.1 Qualitative Results

The synthetic objects (TORUS, TEAPOT, BUNNY and BLOCK) were modeled in a computer environment (http://www.tc18.org/code-data-set/3D-images.php). The stereo views were generated assuming a known stereo imaging geometry in each case. The flow diagram of the whole reconstruction process is shown in Fig. 5.

The reconstruction results for all these objects are shown in Figs. 6, 7, 8, 9, 10, 11, 12 and 13. For the computer-generated objects, the reconstruction results are shown in Figs. 7, 8, 9, 10 and 11.

Figure 7a, b illustrates the input stereo images of the test object (TORUS, Fig. 6a). The NURBS-skeletons extracted from these images are shown in Fig. 7c, d, and the extracted boundaries in Fig. 7e, f. To establish the correspondence between the two views, convex hulls are fitted to the extracted boundaries; a demonstration of the established correspondence is presented in Fig. 7g, h. Once all the information required for the 3D boundary and curve-skeleton is available, triangulation is used to recover them (see Fig. 7i). The original and reconstructed object (TORUS) are shown in Fig. 8.

The simulation results for corrupted input images are shown in Fig. 9; they indicate that the proposed algorithm is robust to a moderate amount of noise. In a further test, the reconstruction result for a computer-generated synthetic object (SCREWDRIVER) is shown in Fig. 10. The reconstruction results for the other test objects (TEAPOT, BUNNY, BLOCK) are shown in Fig. 11. For all the computer-generated test objects, with and without added noise, the reconstructions achieve a similar quality.

Fig. 6
figure 6

a TORUS; b, c projections in two views

Fig. 7
figure 7

a, b Stereo views of TORUS; c, d NURBS-skeletons; e, f boundaries of stereo views; g, h fitting convex hull and establishing correspondence; i reconstructed 3D boundary points and curve-skeleton

Fig. 8
figure 8

a TORUS; b reconstructed TORUS by proposed approach

Fig. 9
figure 9

a, b Noise corrupted stereo views; c reconstructed object by proposed approach

Fig. 10
figure 10

a Synthetic object; b reconstructed object by proposed approach

Fig. 11
figure 11

a, c, e Computer generated objects (TEAPOT, BUNNY, BLOCK); b, d, f reconstructed test objects by proposed approach

Fig. 12
figure 12

a, b Real stereo image pair; c, d two views of the reconstructed 3D model by proposed approach; e, f same views of the 3D model reconstructed by the algorithm [1]; g, h same views of the reconstructed 3D model by the algorithm [18]

Fig. 13
figure 13

Reconstruction with increased skeleton branches

To evaluate the proposed approach in a real environment, experiments on real images are also presented. Figure 12 shows the reconstruction results based on real images: two images captured with a digital camera (see Fig. 12a, b) are employed as the inputs of the reconstruction algorithm. In Fig. 12a, b, the object of interest is a computer mouse, and the aim is to reconstruct its 3D model. Following the proposed algorithm, different views of the reconstructed model of the computer mouse in 3D space are shown in Fig. 12c, d. To compare the reconstruction results qualitatively for real stereo images, two well-known state-of-the-art algorithms [1, 18] are considered: Fig. 12e, f shows the results reconstructed by the algorithm of [1], whereas Fig. 12g, h shows the results of the algorithm of [18]. All reconstructed results are shown with the same orientation. The comparison indicates that the proposed algorithm reconstructs the object more faithfully than the other two algorithms. However, for real objects the results are less satisfactory, owing to the limitations of the curve-skeleton representation. The results may be improved by increasing the number of skeleton branches relative to the boundary; this is illustrated with a model of another physical object (COW), and Fig. 13 supports this claim.

5.2 Quantitative Results

The results obtained with the proposed approach were also evaluated quantitatively. To do so, the distance from the reconstructed surface to the original one was measured: the two surfaces were compared in cross sections, measuring the smallest Euclidean distances between corresponding sections.

Table 4 shows the reconstruction error analysis for the computer-generated object TORUS. The original and reconstructed test objects were divided into eight equal cross sections, and for each section the smallest Euclidean distance between the two surfaces (reconstructed and original) was measured; the distances are expressed in millimeters. For another computer-generated object (BUNNY), the same evaluation method was adopted, but the reconstructed and original models were compared on the basis of recognizable features, e.g., ears, mouth, mid area and legs. Each feature was divided into equal sections: ears (2 sections), mouth (2 sections), mid area (4 sections) and legs (6 sections). This evaluation is presented in Table 5. In each table, the last two columns give the results of the two well-known state-of-the-art methods [1, 18] from the literature. It can be concluded from Tables 4 and 5 that the reconstruction errors of the proposed approach are smaller than those of the other two state-of-the-art approaches; thus, both tables quantitatively support the proposed approach.
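A sketch of this evaluation in Python, under the assumption that the sections are taken as equal slabs along one axis and that the reported error per section is the smallest point-to-point Euclidean distance between the two surfaces within that slab (the slicing axis and binning are assumptions, not the paper's exact protocol):

```python
import numpy as np

def section_errors(recon_pts, orig_pts, axis=2, n_sections=8):
    """Per-section smallest Euclidean distance between reconstructed and original points."""
    recon_pts, orig_pts = np.asarray(recon_pts, float), np.asarray(orig_pts, float)
    edges = np.linspace(orig_pts[:, axis].min(), orig_pts[:, axis].max(), n_sections + 1)
    errors = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        r = recon_pts[(recon_pts[:, axis] >= lo) & (recon_pts[:, axis] <= hi)]
        o = orig_pts[(orig_pts[:, axis] >= lo) & (orig_pts[:, axis] <= hi)]
        if len(r) == 0 or len(o) == 0:
            errors.append(np.nan)                    # empty section: no measurement
            continue
        d = np.linalg.norm(r[:, None, :] - o[None, :, :], axis=2)
        errors.append(d.min())                       # smallest distance within the section
    return np.array(errors)
```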

Table 4 Errors of shape reconstruction in test experiment (TORUS, in millimeters)
Table 5 Errors of shape reconstruction in test experiment (BUNNY, in millimeters)

An analysis of the effect of the number of skeleton branches on the reconstruction accuracy is also presented, in which the reconstruction errors are plotted against the number of skeleton branches. The accuracy graph is shown in Fig. 14. From this graph, it may be concluded that the proposed algorithm achieves better results as the number of skeleton branches increases.

Fig. 14
figure 14

Reconstruction accuracy graph

6 Conclusions

In this paper, a methodology for reconstructing 3D objects from 2D binary images using their skeletal information has been proposed. The NURBS-skeleton is used to extract skeletons in both stereo views, and convex hull affine invariants are used to match the skeletons and boundaries. The reconstruction problem is reduced to obtaining the curve-skeleton and boundary of the 3D object. The reconstructed curve-skeleton is used as a seed for growing the 3D object, the boundary is used to terminate this process, and a ball-growing approach is applied at the end to reconstruct the whole 3D surface. Simulations with both synthetic and real objects are presented to validate the proposed algorithm, and an error analysis is conducted to check the robustness of the presented study. The proposed algorithm gives satisfactory results for the 3D reconstruction of objects with poor textures.

The presented method is straightforward and easy to apply, and it is robust to a moderate amount of noise. The approach is applicable to images in which the object of interest is blob-like, i.e., the foreground object is clearly distinguishable from the background; this can be considered a limitation of the approach. Although an exact and accurate reconstruction of the original object is not achievable from the limited information retained in the curve-skeleton and boundary, the proposed method still gives effective and satisfactory results.