1 Introduction

Deformities in the spine may cause pain, numbness, tingling, loss of function, and even pulmonary and cardiac difficulties. Deformities may arise if the spine loses its usual, graceful S-shape when viewed from the side, or its straightness when viewed from front to back. The term Pediatric Spinal Deformity (PSD) refers to such malformations of the spine in children, namely scoliosis, kyphosis, and spondylolisthesis [1]. Scoliosis and kyphosis are characterized by aberrant curvature of the spine, while spondylolisthesis is characterized by the displacement of vertebrae. PSD is caused by a variety of factors, the most important of which is bone growth, which controls the curvature and advancement of the deformity. There is still much to learn about the underlying etiology of PSD. Affecting roughly 7 million people in the United States, scoliosis is one of the most prevalent spinal abnormalities. In children and adolescents, the condition is termed Adolescent Idiopathic Scoliosis (AIS). AIS accounts for around 80% of all pediatric scoliosis cases and affects approximately 3% of adolescents under the age of 16 in the USA [2].

The treatment of scoliosis relies mainly on the shape and extent of spinal curvature, and specific treatment choices are usually determined by the surgeon’s expertise. The development of a clinically validated, patient-specific model of the spine to assist surgeons in the prognosis of early-stage PSD would therefore guide optimal surgical and non-surgical treatment options. In present clinical practice, the lack of a safe, inexpensive, and accurate measuring technology is a major bottleneck for both screening and monitoring. Frequent imaging is essential to track the evolution of deformities throughout adolescent growth [3], and features extracted from medical images quantify the severity of spinal deformity; among these, the Cobb angle is the most commonly used measurement.

It is common practice in spine surgery to use computer-aided procedures, such as determining the best path for the insertion of pedicle screws [4,5,6], as well as improved surgical navigation and more comprehensive pre-operative surgical planning [7, 8]. One of the greatest barriers to incorporating these technologies into clinical practice is the time and effort necessary to generate patient-specific functional models from medical imaging. The process entails a number of manual procedures and is time intensive, even for seasoned specialists [9, 10]. For example, image segmentation is both time-consuming and user-dependent [11, 12] because it requires locating and segmenting the vertebrae ahead of time [14]. Generating detailed 3D geometry of the spine from a set of 2D X-ray images is also a challenging task. Once the 3D model has been formed, various manual processes are required to segment the resulting volumetric mesh into hard and soft tissues.

Machine learning (ML) approaches need a large amount of training data to provide reliable results [14, 15]. The absence of reliable medical data for a given individual over time is one of the challenges in implementing ML for the prognosis of spinal deformity. Furthermore, these models are referred to as “data-hungry” approaches since they cannot forecast outside of the range of the training data [16]. Recent publications show that by including physics into the system, one may enhance the model’s predictive range [17]. However, for spinal curve progression including bone formation, patient-specific governing physical equations with time-varying and geometry-based coefficients are unavailable.

Previously published research from our group used X-ray imaging data to develop a prognostic framework for AIS [18]. This framework predicted spine morphology by combining clinical data acquired from X-ray images with mechanistic features, such as the stress distribution on the growing surface of the vertebrae extracted from a spine surrogate model together with the bone growth model. Although this unique framework used mechanistic data science to forecast spine deformity, the previous method had a number of limitations that impede deploying it in real-world scenarios. One restriction is the manual parameter adjustment for the image segmentation of each vertebra. The presented 3D reconstruction and geometry generation technique also requires considerable labor, since each tissue is generated separately and all tissues are then assembled together. Moreover, the bone growth model uses constant parameters across all patients and time steps, an unrealistic assumption since bone formation differs across ages and vertebrae.

Table 1 Contributions of the proposed framework with respect to the previous framework [18]

The proposed framework (Fig. 1) in this paper builds on our previous work [18], with each step in the framework improved to meet real-world needs. Both the limitations of the previous framework and the contributions of the current framework are summarized in Table 1.

The organization of the paper is as follows: After discussing related works in Sect. 2, we describe the framework to create the patient-specific 3D geometry from X-ray images in Sect. 3. In Sect. 4, we present our novel bio-informed mechanistic machine learning model for prognosis of pediatric spinal deformity. Lastly, we end with some conclusions and discuss future directions of this work.

Fig. 1

The overall workflow of the proposed framework. The geometric data of the vertebrae are retrieved from 2D X-ray time-series images using an image segmentation algorithm whose parameters are calibrated through sensitivity analysis. The 2D data are then translated into 3D data, from which patient-specific features are extracted. The clinical imaging data and the mechanistic features, such as the dynamic patient-specific bone growth model, are passed through the bio-informed mechanistic neural network to predict spinal deformity

2 Related works

2.1 Vertebrae image segmentation

Precisely segmenting the vertebrae is critical for subsequent analysis in an injury detection system. Statistical shape model (SSM)-based techniques have dominated previous work in vertebral segmentation [19, 20]. Based on a training set, these approaches capture statistical information on the shape and/or appearance of the vertebra. The mean shape is then manually or semi-automatically placed close to the real vertebra, and a search process is used to converge the shape on the true vertebral boundaries. Recent work has used random forest-based machine learning (ML) models to achieve shape convergence [20,21,22,23]. These approaches, however, are only efficient and accurate for the restricted data reported in the literature and cannot be applied to patient-specific datasets.

2.2 3D model development and computational simulation

Compared to MRI and computed tomography (CT) scanners, X-ray imaging is more commonly employed due to its accessibility, lower cost, shorter scanning time, and lower ionizing radiation dose, particularly for adolescent patients. Statistical Shape Models (SSM) [24,25,26,27] or Statistical Shape and Intensity Models (SSIM) have been used to reconstruct bones from X-ray images. To prevent local maxima while optimizing the deformable model parameters, it is important to select a reasonable starting point [28, 29]. Recent years have seen the use of deep learning to recognize landmarks and triangulate them [28]. However, performing 3D reconstruction from two or more 2D images using a deep learning technique remains a difficult problem due to the complexity of describing a dimensional expansion in multi-view settings. The EOS imaging system (formerly Biospace Med, Paris, France), the DIERS formetric scanner, and ultrasonography are examples of recent breakthroughs in diagnostic imaging for AIS [3]. The EOS imaging system is made up of two orthogonal pairs of X-ray tubes and detector units that allow for the simultaneous capture of anteroposterior (AP) and lateral (LAT) X-ray images while standing. EOS imaging can quickly scan the spine in 8–15 s, depending on the patient’s height. From the EOS anteroposterior and lateral images, the software system sterEOS (EOS Imaging, Paris, France) can generate a highly accurate 3D model of the spine [30,31,32]. However, this technology cannot segment distinct tissues in order to account for the different biological structures, and due to its high cost it is not widely available in many medical centers.

To gain a better understanding of load distribution and other mechanical features, the 3D patient-specific geometry may be used to construct a finite element (FE) model. There is no comprehensive automated workflow for anatomically correct FE simulations of the spine based on 2D X-ray data. Considerable work has been done on parametric FE models or a mix of statistical and FE models [33, 34]. However, those models either ignore essential patient-specific features or require a great deal of manual labor, which necessitates a certain level of operator experience. Although efforts to automate the construction of FE models of the healthy spine have been made [10, 35], the technique has never been integrated with deep learning-based segmentation algorithms or applied to pathological cases.

2.3 Implementing ML for studying spinal deformity

The application of ML in medical research has skyrocketed in recent decades. When applying ML to medical image analysis, a persistent challenge is how to accurately integrate ML into disease diagnosis, prognosis, and therapy. Such a framework should capture the governing biological equations in order to supplement the limited training data. Recent studies have attempted to use AI to predict spinal deformities [36,37,38,39]. These frameworks, however, cannot be generalized to other settings. Recently, research has shown that by incorporating the system’s underlying physical equations, the framework can forecast data outside of the training range [18, 40]. Nevertheless, certain intermediate steps require manual parameter adjustment and, as a result, cannot be implemented in a real-time prognosis framework.

3 Patient-specific image segmentation and data generation

3.1 Image segmentation and parameter fitting

Image segmentation of clinical X-ray images is carried out to extract features for the prognosis framework [41]. The corner points of each vertebra are identified and used as reference points to monitor variations in spine shape and bone formation over time. We implement semi-automated image segmentation using active contours, also known as the snakes method [42]. A rectangular contour consisting of four corner points is initialized manually around each vertebra and evolved to capture the shape. We define the four corner points of each vertebra as landmarks; these are the key points that describe the spine geometry. The segmentation is carried out in 2D for both AP and LAT images. The active contour is evolved iteratively through minimization of an image energy, allowing it to converge at the edges of features. Snakes are energy-minimizing splines defined in a parametric form. The total energy functional proposed in [42] considers both image and external constraint energy terms (described in detail in Appendix A).

The active contour model is fast and is able to accurately segment each vertebra in the X-ray images. However, there are a few drawbacks. The accuracy of segmentation depends on how accurately the active contour is initialized. The energy functional being minimized contains many weighting parameters which are set empirically at the beginning of the segmentation. The optimized parameters of the snakes method which provide the contour around each vertebra are generally obtained through a trial-and-error process, which is time-consuming. We propose an improved technique to obtain optimal parameters for the accurate segmentation of clinical X-ray images. The parameters of the snakes algorithm are categorized into two groups: geometric parameters and fitting parameters.

Geometric parameters For each vertebra, we initialize the active contour as a rectangle and define geometric parameters such as the width (\(w_n\)), height (\(h_n\)) and rotation angle (\(\theta _n\)) from the horizontal axis. Here n is the vertebra level number, where \(n=1\sim {12}\) corresponds to the thoracic vertebrae (T1, T2, \(\ldots\), T12) and \(n=13\sim 17\) to the lumbar vertebrae (L1, L2, \(\ldots\), L5). From the T1 to the L5 vertebra, we assume a linear relationship between vertebra size and vertebra level, with the L5 vertebra having the largest size. The height and width of T1 and L5 can then be used to determine the corresponding height and width of the other vertebrae. The user chooses three reference corner points (top right, bottom right and bottom left) for the T1 (\(w_1\) and \(h_1\)) and L5 (\(w_m\) and \(h_m\), with \(m=N=17\) the total number of vertebrae) vertebrae as shown in Fig. 2a, and the parameters for the remaining vertebrae are computed using

$$\begin{aligned} \begin{aligned} h_n=h_1+ \left( \frac{h_m - h_1}{N-1}\right) \,(n-1),\\ w_n=w_1+ \left( \frac{w_m - w_1}{N-1} \right) \,(n-1). \end{aligned} \end{aligned}$$
(1)
Fig. 2

a Approximating length and width of vertebrae by picking reference corner points of T1 and L5; and b the approximation of angle of rotation for each vertebra

For each vertebra, the user manually selects the center point. The rotation angle can be obtained by

$$\begin{aligned} \theta _n=\cos ^{-1}\left( \frac{\vec {\phi }_n\cdot \vec {x}}{|\vec {\phi }_n|\,|\vec {x}|}\right) \end{aligned}$$
(2)

where \(\theta _n\) is the rotation angle of vertebra n, \(\vec {x}\) is the horizontal axis vector and \(\vec {\phi }_n\) is the bisector of the vectors connecting the two adjacent vertebrae, as shown in Fig. 2b. It is worth noting that the framework may be used for any number of vertebrae in any region of interest, including cervical, thoracic, lumbar, or a combination of all three. Figure 3 compares the ground truth with the modified snakes method obtained by estimating the geometric parameters for the lumbar spine. The results are in excellent agreement with the ground truth. The ground truth segmentation is produced with the snakes method in which the parameters are manually adjusted and the contour is initialized manually for each vertebra.
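For concreteness, the two geometric relations above can be coded directly. The following is a minimal NumPy sketch; the reference sizes and the bisector vector are hypothetical placeholder values rather than measurements from the dataset.

```python
import numpy as np

def vertebra_sizes(h1, w1, hN, wN, N=17):
    """Linearly interpolate vertebra height and width between T1 (n=1) and L5 (n=N), Eq. (1)."""
    n = np.arange(1, N + 1)
    h = h1 + (hN - h1) / (N - 1) * (n - 1)
    w = w1 + (wN - w1) / (N - 1) * (n - 1)
    return h, w

def rotation_angle(phi):
    """Eq. (2): angle between the bisector vector phi_n and the horizontal axis."""
    x_axis = np.array([1.0, 0.0])
    cos_t = np.dot(phi, x_axis) / (np.linalg.norm(phi) * np.linalg.norm(x_axis))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

# Hypothetical reference sizes (in pixels) picked by the user for T1 and L5.
heights, widths = vertebra_sizes(h1=28.0, w1=30.0, hN=40.0, wN=48.0)
theta_3 = rotation_angle(np.array([0.96, 0.27]))  # phi_n built from the adjacent vertebra centers
```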

Fig. 3

Segmentation of the X-ray images. a Lateral (LAT) and b anteroposterior (AP) view of the image obtained from X-ray data. Comparison between two segmented lumbar spines: c ground truth (the output of the snakes algorithm in which the parameters are manually adjusted) and d the result obtained by estimating geometric parameters (angle of rotation, height and width of each vertebra)

Fitting parameters The weighting coefficients in the active contour model are set in order to move the contour around each vertebra. \(\alpha\) and \(\beta\) are the weights associated with the first-order and second-order regularizing terms of the internal spline energy (Eq. (2) in [42]) that control the tension and rigidity of the snake, respectively. \(\gamma\) is the step size associated with the iterative update of the active contour and \(\kappa\) is the weighting coefficient associated with the derivatives of the external force terms (Eq. (17) in [42]). \(w_{l}\), \(w_{e}\) and \(w_{t}\) are the weighting coefficients associated with the image, edge and terminal energy functionals, respectively [42]. More details on the fitting parameters are provided in Appendix A. By identifying the optimal set of these parameters, one may precisely determine the curvature of the spine.
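These weights can be explored interactively. The sketch below uses scikit-image's active_contour as a rough stand-in for the implementation used here: it exposes \(\alpha\), \(\beta\), \(\gamma\), w_line and w_edge but not the \(\kappa\) or terminal-energy weight \(w_t\) of [42], and its \(\gamma\) is a small explicit time step rather than the step size reported later in Fig. 4e, so the mapping of values is only approximate. The image path and the rectangle parameters are hypothetical.

```python
import numpy as np
from skimage import filters, io
from skimage.segmentation import active_contour

# Hypothetical image path; the X-ray is smoothed before the snake evolution.
img = io.imread("xray_ap.png", as_gray=True)
img_smooth = filters.gaussian(img, sigma=2)

# Rectangular initial contour for one vertebra: center (row, col), size and rotation
# are hypothetical values that would come from the geometric parameters above.
row_c, col_c, h_n, w_n, theta = 240.0, 120.0, 32.0, 46.0, np.deg2rad(8.0)
corners = 0.5 * np.array([[-h_n, -w_n], [-h_n, w_n], [h_n, w_n], [h_n, -w_n], [-h_n, -w_n]])
rect = np.vstack([np.linspace(corners[i], corners[i + 1], 50) for i in range(4)])
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
init = rect @ rot.T + np.array([row_c, col_c])          # (row, col) convention

# alpha, beta, w_line, w_edge mirror the fitting parameters above; kappa and w_t
# have no direct counterpart in this API (max_num_iter requires scikit-image >= 0.19).
snake = active_contour(img_smooth, init, alpha=0.25, beta=0.25, gamma=0.01,
                       w_line=0.5, w_edge=0.5, max_num_iter=50)
```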

A sensitivity analysis [43, 44] is performed on each pair of fitting parameters, namely \(\alpha\)-\(\beta\), \(\gamma\)-\(\kappa\), \(w_{l}\)-\(w_{e}\) and \(w_{t}\)-iterations, as illustrated in Fig. 4a–d, to provide a better estimate of the optimized parameters. For a particular pair of fitting parameters, the segmentation result is compared with the ground truth and the average normalized mean squared error is calculated over all the images in the dataset. The normalized mean squared error is given as

$$\begin{aligned} \frac{1}{\left| N_s \right| \cdot \left| N_v \right| \cdot \left| N_\ell \right| } \sum _{s=1}^{N_s}\sum _{v=1}^{N_v}\sum _{\ell =1}^{N_\ell } \left( (x_{v,\ell }^{s} - \bar{x}_{v,\ell }^{s})^2 + (y_{v,\ell }^{s} - \bar{y}_{v,\ell }^{s})^2 \right) \end{aligned}$$
(3)

where \(N_s\), \(N_v\) and \(N_\ell\) are the number of images, the number of vertebrae and the number of landmarks per vertebra, respectively. \(x_{v,\ell }^s\) and \(y_{v,\ell }^s\) are the positions of the landmarks obtained by the current approach, whereas \(\bar{x}_{v,\ell }^s\) and \(\bar{y}_{v,\ell }^s\) are the positions of the landmarks obtained from the ground truth.

To obtain the optimized parameters across the dataset, the pairwise parameter values with the least normalized mean squared error are used as the initial guess for a simplex optimization over all fitting parameters. The final set of optimized parameters is used to segment the images, as shown in Fig. 4e. We observe that the method is not very sensitive to the parameter values, which is desirable for practical applications, since no fine-tuning of parameters is needed to achieve satisfactory results.
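A hedged sketch of this calibration loop follows: the pairwise sensitivity scan supplies the starting point, and SciPy's Nelder-Mead (simplex) search refines all fitting parameters jointly by minimizing the normalized mean squared error of Eq. (3). The segment_fn wrapper around the snakes segmentation is hypothetical and stands in for the method of Sect. 3.1.

```python
import numpy as np
from scipy.optimize import minimize

def normalized_mse(pred, truth):
    """Eq. (3): mean squared landmark distance, averaged over images, vertebrae and landmarks."""
    return np.mean(np.sum((pred - truth) ** 2, axis=-1))

def calibrate_fitting_parameters(segment_fn, images, ground_truth, x0):
    """Refine the snakes fitting parameters with a Nelder-Mead (simplex) search.

    segment_fn(images, params) -> predicted landmarks; this is a hypothetical wrapper
    around the snakes segmentation of Sect. 3.1 and is not reproduced here.
    """
    objective = lambda p: normalized_mse(segment_fn(images, p), ground_truth)
    result = minimize(objective, x0, method="Nelder-Mead")
    return result.x

# Initial guess: the best pairwise values from the sensitivity scan (cf. Fig. 4e).
x0 = np.array([0.25, 0.25, 12.0, 0.2, 0.5, 0.5, 0.0])  # alpha, beta, gamma, kappa, w_l, w_e, w_t
```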

Figure 5 shows the segmented spine in the AP view, taking into account both the geometric and fitting parameters. First, the reference points of the T1 and L5 vertebrae are identified, and then the program determines the contour around each vertebra.

Fig. 4

a–d Sensitivity analysis on the fitting parameters of the snakes algorithm to be provided as initial values for multi-variable optimization. e Optimized multi-variables of the snakes method for the lumbar spine. The dashed lines denote the ground truth (manually picked four corner points of the vertebrae), and solid lines are the optimized configuration. The normalized mean squared error is 0.0187, corresponding to \(\alpha = 0.25\), \(\beta = 0.25\), \(\gamma = 12\), \(\kappa = 0.2\), \(w_l = 0.5\), \(w_e = 0.5\), \(w_t = 0\), and 50 iterations

Fig. 5

Three steps for image segmentation. a Picking the reference points of each vertebra, b picking the center points of all vertebrae and c segmented data obtained by implementing the optimized geometric and fitting parameters in the snakes algorithm

3.2 3D shape reconstruction from bi-planar 2D data

Serial X-ray images from two orthogonal perspectives, anteroposterior (AP) and lateral (LAT), are used in this study to gather patient-specific data (Fig. 3). Patient-specific features are retrieved from the X-ray images to assess spinal deformity prognosis. After performing segmentation and obtaining the contours from the AP and LAT images, we perform 3D reconstruction from the 2D segmented contours in order to get the volumetric spine geometry. The different coordinate systems associated with the AP and LAT images present a challenge that needs to be addressed, as does keeping track of the coordinate systems of images taken longitudinally over time.

The first step is to calibrate the camera location so that all data are consistent and comparable in scale. The camera calibration step makes the following assumptions:

  • Reference of the coordinate system: The coordinate system’s reference point in the AP and LAT views is the center point of L5 (the fifth lumbar vertebra).

  • Angle between AP and LAT views: The angle between the AP and LAT views is \(90^{\circ }\).

  • Scaling criterion: The images are taken at different scales and need to be rescaled such that the height of the spine is the same in the two X-ray views. The AP view is assumed to be fixed and the LAT view is calibrated accordingly. The scaling factor between two pairs of images is calculated as

    $$\begin{aligned} s= \frac{z^{\text {AP}}_{\text {max}}-z^{\text {AP}}_{\text {min}}}{z^{\text {LAT}}_{\text {max}}-z^{\text {LAT}}_{\text {min}}}, \end{aligned}$$
    (4)

    where \(z^{\text {AP}}_{\text {max}}\) and \(z^{\text {AP}}_{\text {min}}\) are the maximum and minimum of z coordinates of landmarks in AP, respectively. Similarly, \(z^{\text {LAT}}_{\text {max}}\) and \(z^{\text {LAT}}_{\text {min}}\) are the maximum and minimum of z coordinates of landmarks in LAT, respectively.

As seen in Fig. 6, the scaled data are subsequently fed into a 3D reconstruction procedure. The size of the bounding box for each vertebra is calculated by identifying the minimum and maximum in each direction, and the nodes in 3D space corresponding to that region are identified. The nodes on the boundary are detected using the MATLAB convhull function [45] (the convex hull of the 3D reconstructed region). By applying this procedure to the scaled landmarks obtained from the X-ray images, the 3D reconstructed geometry and the boundary points of every vertebra are obtained (Fig. 7).
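A hedged NumPy sketch of this reconstruction step follows. The paper uses MATLAB's convhull to extract the boundary nodes; here, because the per-vertebra region is approximated by an axis-aligned bounding-box grid, a direct face test is used as a stand-in, and the landmark arrays are assumed to follow the (y, z) and (x, z) conventions of the AP and LAT views described above.

```python
import numpy as np

def reconstruct_spine(ap_landmarks, lat_landmarks, grid_step=1.0):
    """Rebuild an approximate 3D point cloud per vertebra from bi-planar 2D landmarks.

    ap_landmarks:  list of (4, 2) arrays of (y, z) corner points per vertebra (AP view).
    lat_landmarks: list of (4, 2) arrays of (x, z) corner points per vertebra (LAT view).
    """
    # Eq. (4): scale the LAT view so both views span the same spine height.
    z_ap = np.concatenate([v[:, 1] for v in ap_landmarks])
    z_lat = np.concatenate([v[:, 1] for v in lat_landmarks])
    s = (z_ap.max() - z_ap.min()) / (z_lat.max() - z_lat.min())

    vertebrae = []
    for ap, lat in zip(ap_landmarks, lat_landmarks):
        lat = lat * s
        # Bounding box of this vertebra from the min/max of both (scaled) views.
        x0, x1 = lat[:, 0].min(), lat[:, 0].max()
        y0, y1 = ap[:, 0].min(), ap[:, 0].max()
        z0 = min(ap[:, 1].min(), lat[:, 1].min())
        z1 = max(ap[:, 1].max(), lat[:, 1].max())
        xs, ys, zs = np.meshgrid(np.arange(x0, x1 + grid_step, grid_step),
                                 np.arange(y0, y1 + grid_step, grid_step),
                                 np.arange(z0, z1 + grid_step, grid_step),
                                 indexing="ij")
        cloud = np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1)
        # Boundary (outer-surface) points: grid nodes lying on a face of the box.
        mins, maxs = cloud.min(axis=0), cloud.max(axis=0)
        on_face = np.any(np.isclose(cloud, mins) | np.isclose(cloud, maxs), axis=1)
        vertebrae.append({"inner": cloud[~on_face], "boundary": cloud[on_face]})
    return vertebrae
```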

Fig. 6

Flowchart of 3D reconstruction of vertebrae using 2D data

Fig. 7

3D reconstruction of vertebrae using 2D data. The camera parameters are calibrated before reconstruction based on the explanation in Sect. 3.2. a The segmented data of the AP and LAT views, b the 3D reconstructed geometry shaded in blue (obtained from the multiple inner volume blue points) using the bounding box algorithm; the magenta points denote the boundary (outer) surface and the blue points denote the inner volume, c and d zoomed-in views of the inner and boundary points of the third thoracic vertebra (T3)

3.3 Volumetric mesh generation

The 3D reconstructed shape shown in Fig. 7 cannot be used directly to generate a 3D volumetric mesh of the estimated spine from the 2D views; instead, we use a twisting registration of a source mesh onto the spine, as introduced in [46]. In our previous work [18], we used free-form deformation based on truncated hierarchical B-splines (THB-splines) for registration [47,48,49]. During registration, the detailed source geometry is taken from the atlas 3D geometry model, which includes 128,205 vertices, as explained in [18]. The 3D volumetric mesh is composed of three components: intervertebral discs, lumbar vertebrae and thoracic vertebrae. The volumetric mesh of each vertebra is then segmented into growth plate, cortical bone and cancellous bone regions. Similarly, the volumetric mesh of each intervertebral disc is partitioned into annulus fibrosus and nucleus pulposus regions.

The employed registration method is based on the Iterative Closest Point (ICP) technique [50], which is explained in detail in Appendix B. In this method, we use the center point of each estimated vertebra and the 8 corner points of the corresponding bounding box as the target points. We then treat the point cloud as a surface and perform the nonrigid registration method of Appendix B with high stiffness, i.e., high \(\gamma\) in Eq. (B9), to register the corresponding points from the source mesh (the atlas 3D geometry in [18]) to the target mesh (generated in Sect. 3.2). The stiffness term prevents unfavourable rotations and produces a physically reasonable twisting instead. The registration yields a limited number of transformation matrices (17 vertebrae \(\times\) 8 bounding-box corners + 17 vertebra centers = 153 transformation matrices in total); each of the 128,205 vertices on the source mesh is then transformed using the transformation matrix of whichever of the 153 points is closest to that vertex before registration. Figure 8a, b depicts the center points of the vertebrae on the target spine and the source spine before and after registration, respectively. Using the transformation matrices of these 153 points, the complete volumetric spine is deformed from Fig. 8c to d, matching all 8 landmarks of each vertebra.

Practically, the registration method allows us to generate a time-series mesh morphing of a patient’s spine using the output of Sect. 3.2, where the X-ray images are serial images taken of the spine at different time points. Through the approach introduced in Sect. 3.2, we obtain low-quality time-series 3D point clouds of the spine. Here, we use the reconstructed spine at each time interval as a template for the mesh registration method explained in Appendix B, which gives us 4D (3D + time) mesh-morphing data of the spine [51, 52].
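The transform-assignment step described above (each source vertex inherits the transform of its nearest registered control point) can be sketched as follows; the nonrigid ICP that produces the 153 per-control-point affine matrices is detailed in Appendix B and is assumed here to be available as an array of 3 x 4 matrices.

```python
import numpy as np
from scipy.spatial import cKDTree

def apply_nearest_control_transforms(source_vertices, control_points, transforms):
    """Deform the atlas mesh using the per-control-point affine transforms.

    source_vertices: (M, 3) atlas mesh vertices (M = 128,205 in Sect. 3.3).
    control_points:  (153, 3) control points (vertebra centers and bounding-box corners)
                     in their positions before registration.
    transforms:      (153, 3, 4) affine matrices [R | t] returned by the registration.
    """
    tree = cKDTree(control_points)
    _, nearest = tree.query(source_vertices)           # index of the closest control point
    homogeneous = np.hstack([source_vertices, np.ones((len(source_vertices), 1))])
    # Each vertex inherits the affine transform of its nearest control point.
    return np.einsum("nij,nj->ni", transforms[nearest], homogeneous)
```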

Fig. 8

Volumetric mesh registration on a point cloud including reconstructed landmarks from X-ray images. Before registration: a center line, c volumetric mesh; After registration: b center line, d volumetric mesh

4 Bio-informed mechanistic machine learning

In this section, we propose our bio-informed mechanistic machine learning model, which incorporates clinical data as well as mechanistic knowledge such as the bone growth model and finite element results. We propose three machine learning models for the prognosis of PSD: the 3D-Clinical Neural Network, the Bio-informed Clinical Neural Network, and the Bio-informed Mechanistic Neural Network. The 3D-Clinical Neural Network is a stand-alone model that utilizes only the clinical data (i.e., X-ray images). Each of the latter two models is divided into two parts, both sharing the same first part, called the Center Point Predictor Neural Network. The Center Point Predictor Neural Network returns the center point of each vertebra, and this output is fed into the second part of the respective model.

The major difference between the Bio-informed Clinical Neural Network and the Bio-informed Mechanistic Neural Network is the loss function of the second part. The former uses only the bone growth parameters in the loss function, whereas the latter takes both the geometric features (positions of landmarks and center points) and the bone growth parameters into the loss function. This is explained in detail in Sect. 4.3.

4.1 Introduction to bone growth model

One way to address the scarcity of patient-specific data is to incorporate the governing physical equation, since it provides insight into the expected data. There are several aspects to consider when employing governing physical equations: the model must be validated over a large population, and some patient-specific time-dependent constants must be calibrated for each individual patient at each time step if these models are used in computational simulations as well as ML frameworks.

The underlying physical equation that governs the progression of the spine is the bone growth model, because AIS occurs during adolescence when bone growth is at its peak [53]. It is believed that bone formation is a consequence of the stress imposed on each vertebra’s growth plates [54], located on the top and bottom of each vertebra. This assumption alone, however, may not be adequate to correctly depict bone growth, since other factors such as age, sex, and bone mineral density also affect bone formation. In the presented approach, we describe bone growth as follows:

$$\begin{aligned} \varvec{G}=\varvec{A}+\varvec{B}\bar{\sigma } \end{aligned}$$
(5)

where \(\varvec{G}=[G_X,G_Y,G_Z]\) are the growth rates along the three normal directions and \(\bar{\sigma }\) is the von Mises stress. There are two advantages to using the von Mises stress: first, all stress components are taken into account, and second, its definition is independent of the coordinate system. Vectors \(\varvec{A}\) and \(\varvec{B}\) are patient-specific parameters that should be calibrated for each patient, where \(\varvec{A}=[A_X,A_Y,A_Z]\) represents the growth rate of a non-scoliotic spine and \(\varvec{B}=[B_X,B_Y,B_Z]\) are the regulating growth model parameters.
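As a worked illustration of Eq. (5), the sketch below computes the von Mises stress from a full stress tensor and evaluates the directional growth rates; the numerical values of \(\varvec{A}\), \(\varvec{B}\) and the stress state are hypothetical placeholders, since the actual parameters are calibrated per patient.

```python
import numpy as np

def von_mises(sigma):
    """von Mises stress of a 3x3 stress tensor (independent of the coordinate system)."""
    s = sigma - np.trace(sigma) / 3.0 * np.eye(3)   # deviatoric part
    return np.sqrt(1.5 * np.sum(s * s))

# Hypothetical calibrated parameters and stress state (units arbitrary).
A = np.array([0.10, 0.10, 0.30])      # baseline (non-scoliotic) growth rates [A_X, A_Y, A_Z]
B = np.array([-0.02, -0.02, -0.05])   # stress-regulating parameters [B_X, B_Y, B_Z]
sigma = np.array([[1.2, 0.1, 0.0],
                  [0.1, 0.8, 0.2],
                  [0.0, 0.2, 1.5]])

G = A + B * von_mises(sigma)          # Eq. (5): growth rates along X, Y, Z
```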

4.2 Modified growth model

We aim to make the most of all available data by exploiting medical features acquired from X-ray data and mechanistic features extracted from computational models, and by addressing data scarcity while leveraging the underlying physical equation. The objective of the bio-informed mechanistic machine learning method is to reconstruct the curvature of the spine over time. In addition, the calibrated parameters generated by the framework may be incorporated into a computational model for monitoring and assessing bone formation at a local level.

4.2.1 Framework explanation and notation

The parameters of the physical growth equation are patient-specific, time-dependent, and position-dependent. Thus, Eq. (5) for an arbitrary landmark on the growth plates at time t and vertebra v can be rewritten as

$$\begin{aligned} \varvec{G^{tv}}=\varvec{A^{tv}}+\varvec{B^{tv}}\bar{\sigma }^{tv} \end{aligned}$$
(6)

where \(\varvec{A^{tv}}=[A^{tv}_X,A^{tv}_Y,A^{tv}_Z]\) and \(\varvec{B^{tv}}=[B^{tv}_X,B^{tv}_Y,B^{tv}_Z]\) are patient-specific, time-dependent, position-dependent parameters that should be calibrated for each patient at a given time to calculate the growth rates \(\varvec{G^{tv}}=[G^{tv}_X,G^{tv}_Y,G^{tv}_Z]\), using the von Mises stress at time t for vertebra v obtained from the surrogate finite element model explained in [18]. The loads, boundary conditions and material properties are functions of time t and are updated according to the age of the patient. The output of the finite element model contains the von Mises stress corresponding to the positions of the growth landmarks. Indices X, Y and Z correspond to the global coordinate system shown in Fig. 7. In plain terms, the modified growth equation describes the directional growth of a specific vertebra of a particular patient over a span of time. The spatiotemporal parameters of the equation track the coordinates of the vertebra over time. One can imagine a bounding box around a vertebra that deforms in three dimensions; the growth equation tracks the three orthonormal components of the growth through its parameters. Hence, once solved, the modified growth equation can specify the later position of the bounding box.

Local and global coordinate systems To obtain the growth model parameters that correspond to the growth deformation of the vertebrae without taking into account the rigid body motion caused by the movement of the lower vertebrae, a local coordinate system is set up at the center point of each vertebra. The axes of the local coordinate system have the same directions as the axes of the global coordinate system (as mentioned in Sect. 3, the center of the global coordinate system is at the center point of L5, the fifth lumbar vertebra). Figure 9a shows the global coordinate system corresponding to time t and the local coordinate system for vertebra L3 at the same time (Fig. 9b). The landmarks that surround each vertebra are divided into two categories: those positioned on the growth plates (\(X_{\text {G}}\)) and those that dictate the vertebra’s side shape (\(X_{\text {S}}\)), as shown in Fig. 9c.

Fig. 9

Description of the coordinate systems and landmarks. a Global coordinate system in AP view, b labeling of landmarks as growth (\(X_{\text {G}}\)) and side (\(X_{\text {S}}\)), c local coordinate system of the vertebra L3 in AP view (the yellow dots are the 16 landmarks and the blue dots are the 4 corner landmarks), d labeling of the growth landmarks on vertebrae in the local coordinate system in 3D view and e labeling of the growth landmarks on L3 in the local coordinate system in 2D view using the 4 corner points

Growth parameters The growth landmarks with comparable motion directions (due to growth) within the same vertebra should be analyzed together to represent the growth deformation of the \(X_{\text {G}}\) landmarks. In the first step, nodes that behave similarly in each direction are grouped together, as indicated in Table 2.

Table 2 Landmarks with the same growth behavior are clustered into the same group. The numbers are clarified in Fig. 9d

In the y-axis, for example, landmarks 1, 3, 5 and 7 have the same growth direction (positive local y), whereas landmarks 3, 4, 7 and 8 have the same growth direction in the z-axis (negative local z), as visualized in Fig. 9. The growth equation of each landmark (j) in local coordinates (x, y, z) for time (t) and vertebra (v) can be written as:

$$\begin{aligned} \varvec{G^{tv}_{j}}=\varvec{A^{tv}_{j}}+\varvec{B^{tv}_{j}}\bar{\sigma }^{tv}_j \end{aligned}$$
(7)

where \(\varvec{A^{tv}_j}=[A^{tv}_{xj},A^{tv}_{yj},A^{tv}_{zj}]\) and \(\varvec{B^{tv}_j}=[B^{tv}_{xj},B^{tv}_{yj},B^{tv}_{zj}]\) are patient-specific, time-dependent, position-dependent parameters for landmark (j) that should be calibrated for each patient at a given time for every vertebra. Indices x, y and z represent the three directions of the local coordinate system shown in Fig. 9. It should be noted that \(\bar{\sigma }^{tv}_j\) is the von Mises stress at landmark j, which is independent of the coordinate system.

4.2.2 Calculating growth parameters

Nodes with comparable growth behavior share the same growth parameters, as listed in Table 2. These parameters are considered to be the same within each vertebra. The first step in determining the growth parameters for each vertebra v is to write the growth equations for all nodes as presented in Eq. (7).

Using the same grouped-parameter assumption mentioned above, the equations are rewritten for the clustered labels in each direction as

$$\begin{aligned}&G^{tv}_{xj}= {\left\{ \begin{array}{ll} A^{tv}_{xF}+B^{tv}_{xF}\bar{\sigma }^{tv}_j \qquad &{} x^{tv}_j < 0, \\ A^{tv}_{xB}+B^{tv}_{xB}\bar{\sigma }^{tv}_j \qquad &{} x^{tv}_j \ge 0, \end{array}\right. } \end{aligned}$$
(8)
$$\begin{aligned}&G^{tv}_{yj} = {\left\{ \begin{array}{ll} A^{tv}_{yR}+B^{tv}_{yR}\bar{\sigma }^{tv}_j \qquad &{} y^{tv}_j < 0, \\ A^{tv}_{yL}+B^{tv}_{yL}\bar{\sigma }^{tv}_j \qquad &{} y^{tv}_j \ge 0, \end{array}\right. } \end{aligned}$$
(9)
$$\begin{aligned}&G^{tv}_{zj} = {\left\{ \begin{array}{ll} A^{tv}_{zU}+B^{tv}_{zU}\bar{\sigma }^{tv}_j \qquad &{} z^{tv}_j < 0, \\ A^{tv}_{zD}+B^{tv}_{zD}\bar{\sigma }^{tv}_j \qquad &{} z^{tv}_j \ge 0 \end{array}\right. } \end{aligned}$$
(10)

where \(x^{tv}_j\), \(y^{tv}_j\) and \(z^{tv}_j\) are the local coordinates of landmark j of vertebra v at time t. The vectors \({\mathcal {A}}^{{tv}}\) and \({\mathcal {B}}^{{tv}}\) for a vertebra v at time t are defined as

$$\begin{aligned} \begin{array}{c} {\mathcal {A}}^{{tv}}=[A^{{tv}}_F,A^{{tv}}_B,A^{{tv}}_R,A^{{tv}}_L,A^{{tv}}_U,A^{{tv}}_D],\\ {\mathcal {B}}^{{tv}}=[B^{{tv}}_F,B^{{tv}}_B,B^{{tv}}_R,B^{{tv}}_L,B^{{tv}}_U,B^{{tv}}_D], \end{array} \end{aligned}$$
(11)

to define the patient-specific, time-dependent, and position-dependent scalar values of the growth parameters for each vertebra. To distribute the growth parameters over the growth landmarks in each direction, the vectors

$$\begin{aligned}&{\left\{ \begin{array}{ll} {\mathcal {A}_x^{tv}}=[A^{{tv}}_F,A^{{tv}}_F,A^{{tv}}_F,A^{{tv}}_F,A^{{tv}}_B,A^{{tv}}_B,A^{{tv}}_B,A^{{tv}}_B],\\ {\mathcal {A}_y^{tv}}=[A^{{tv}}_R,A^{{tv}}_L,A^{{tv}}_R,A^{{tv}}_L,A^{{tv}}_R,A^{{tv}}_L,A^{{tv}}_R,A^{{tv}}_L],\\ {\mathcal {A}_z^{tv}}=[A^{{tv}}_U,A^{{tv}}_U,A^{{tv}}_D,A^{{tv}}_D,A^{{tv}}_U,A^{{tv}}_U,A^{{tv}}_D,A^{{tv}}_D],\\ \end{array}\right. }\nonumber \\&{\left\{ \begin{array}{ll} {\mathcal {B}_x^{tv}}=[B^{{tv}}_F,B^{{tv}}_F,B^{{tv}}_F,B^{{tv}}_F,B^{{tv}}_B,B^{{tv}}_B,B^{{tv}}_B,B^{{tv}}_B],\\ {\mathcal {B}_y^{tv}}=[B^{{tv}}_R,B^{{tv}}_L,B^{{tv}}_R,B^{{tv}}_L,B^{{tv}}_R,B^{{tv}}_L,B^{{tv}}_R,B^{{tv}}_L],\\ {\mathcal {B}_z^{tv}}=[B^{{tv}}_U,B^{{tv}}_U,B^{{tv}}_D,B^{{tv}}_D,B^{{tv}}_U,B^{{tv}}_U,B^{{tv}}_D,B^{{tv}}_D],\\ \end{array}\right. } \end{aligned}$$
(12)

are defined corresponding to the growth parameters in each direction for all the growth landmarks. In the presented framework, the number of growth landmarks is \(X_{\text {G}}=8\), corresponding to the 8 corner points shown in Fig. 9. The post-processed data of the serial X-ray images (described in Sect. 3.2) are used to directly assess the growth parameters by monitoring the locations of the growth landmarks in two consecutive images. To represent the growth parameters of each vertebra in matrix form, the matrices \(\mathbb {A}\) and \(\mathbb {B}\) are constructed as

$$\begin{aligned} \mathbb {A}^{{tv}}=[{[\mathcal {A}_x^{tv}]}^{\text {T}} \ \ {[\mathcal {A}_y^{tv}]}^{\text {T}} \ \ {[\mathcal {A}_z^{tv}]}^{\text {T}}], \quad \text { and } \quad \mathbb {B}^{{tv}}=[{[\mathcal {B}_x^{tv}]}^{\text {T}} \ \ {[\mathcal {B}_y^{tv}]}^{\text {T}} \ \ {[\mathcal {B}_z^{tv}]}^{\text {T}}]. \end{aligned}$$
(13)

The dimensions of \(\mathbb {A}^{{tv}}\) and \(\mathbb {B}^{{tv}}\) are \(8 \times 3\), corresponding to the 8 growth landmarks and the 3 local coordinate directions (x, y and z).
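As a check of the bookkeeping in Eqs. (12)–(13), the sketch below assembles the 8 x 3 parameter matrices from the six directional scalars; the numerical values are hypothetical placeholders, and the row ordering assumes the landmark numbering of Fig. 9d.

```python
import numpy as np

def growth_parameter_matrix(front, back, right, left, up, down):
    """Assemble the 8 x 3 matrix of Eqs. (12)-(13) from the six directional scalars.

    Rows follow the growth-landmark numbering of Fig. 9d; columns are the local x, y, z axes.
    """
    col_x = np.array([front, front, front, front, back, back, back, back])
    col_y = np.array([right, left, right, left, right, left, right, left])
    col_z = np.array([up, up, down, down, up, up, down, down])
    return np.stack([col_x, col_y, col_z], axis=1)   # shape (8, 3)

# Hypothetical scalar values for one vertebra at one time step.
A_tv = growth_parameter_matrix(0.12, 0.11, 0.09, 0.10, 0.28, 0.27)
B_tv = growth_parameter_matrix(-0.02, -0.02, -0.01, -0.01, -0.04, -0.04)
```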

4.3 Spinal deformity prognosis framework

A bio-informed machine learning framework is introduced by incorporating our knowledge from the clinical data (X-ray images) and the modified bone growth computational model explained in Sect. 4.2. The landmark positions \({\varvec{X}}\) are related to the features \({\varvec{I}}\) at a given time by an unknown bio-physics equation which can be generically defined as a mapping. Note that \({\varvec{I}}\) refers to general features for now; it will be specified in the following sections depending on the choice of model. The overall structure of a neural network can also be described as a mapping, i.e.:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\text {Unknown\ bio-physics\ equation}}{:}\ \mathbf {\varvec{X}}=\mathscr {F}_{\text {Unknown-BioPhy}}({\varvec{I}})\\ {\text {Neural\ network\ mapping}}{:}\ \mathbf {\varvec{X}}=\mathscr {F}_{\text {FFNN}}(\mathbf {\varvec{I}}) \end{array}\right. } \end{aligned}$$
(14)

where \(\mathscr {F}_{\text {FFNN}}\) is the feed-forward neural network (FFNN) that uses clinical and mechanistic features \({\varvec{I}}\) as input and generates landmark coordinates \({\varvec{X}}\) as output. The notation used throughout this section is defined in Table 3. To predict the position of a landmark, the input neurons hold the clinical and mechanistic features and the hidden neurons act as a multiplicative, functional decomposition of the unknown bio-physics equation, estimating the unknown function required to map the input features to the coordinates in the output neurons. In this project, MATLAB is used to build the FFNN and to train the neural network parameters [55].

Each neuron within every layer of a generic FFNN receives the output value from each neuron in the previous layer as input and produces a single output. This procedure is carried out for each layer. For an arbitrary number of layers and neurons per layer, the value of the jth neuron in layer l for the sth sample (either a training sample or prediction) may be written as:

$$\begin{aligned} a^{l}_{j,s} = {\left\{ \begin{array}{ll} I_{j,s}, &{}\text {if } l=1 \text { (input layer)} \\ \mathscr {A}(b^{l}_j+\sum _{i=1}^{N_N(l-1)}{W^{l-1}_{ji} a^{l-1}_{i,s}}), &{}\text {if } 1< l < N_{\text {L}} \text { (hidden layers)}\\ b^{l}_j+\sum _{i=1}^{N_N(l-1)}{W^{l-1}_{ji} a^{l-1}_{i,s}}, &{}\text {if } l=N_{\text {L}}\ \text {(output layer)} \end{array}\right. } \end{aligned}$$
(15)

where \(\mathscr {A}\) is an activation function. For training, this research uses the ReLU (Rectified Linear Unit) function, defined as the positive part of its argument [56]: \(f(x)=x^+=\max (0,x)\). Each neuron is computed using its own weights \(W^{l-1}_{ji}\) and bias \(b^{l}_j\), where i indexes the neurons in the previous layer. Finally, the overall response, the predicted landmark coordinates, is given by:

$$\begin{aligned} X^\text {predicted}_{j,s}=a^{N_{\text {L}}}_{j,s}. \end{aligned}$$
(16)

The FFNN can learn the unknown governing bio-physics equation based on the loss function and the input and output features. The loss function may be constructed to train the FFNN to offer accurate estimates while also supplying patient-specific parameters that can be considered as hyperparameters of the network. Multiple FFNN approaches are presented in the next section, each with a distinct loss function and distinct input-output features.
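A minimal NumPy sketch of the forward pass in Eqs. (15)–(16) follows: ReLU hidden layers and a linear output layer. The MATLAB implementation used in the paper is not reproduced; the layer sizes below are illustrative assumptions only.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def ffnn_forward(I, weights, biases):
    """Eqs. (15)-(16): feed-forward pass with ReLU hidden layers and a linear output layer.

    I: (N_features,) vector of clinical and mechanistic input features.
    weights[l], biases[l]: parameters mapping layer l to layer l + 1.
    """
    a = I
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b
        a = z if l == len(weights) - 1 else relu(z)   # linear activation at the output layer
    return a                                          # predicted landmark coordinates X

# Illustrative sizes: 10 input features, two hidden layers of 20 neurons, 24 outputs (8 landmarks x 3).
rng = np.random.default_rng(0)
sizes = [10, 20, 20, 24]
weights = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
biases = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
X_pred = ffnn_forward(rng.normal(size=10), weights, biases)
```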

Table 3 Notation table of variables used in the feed forward neural network

4.3.1 Machine learning framework explanation

3D-clinical neural network (FFNN\(_{\text {CL}}\)): Given the coordinates of a vertebra’s landmarks at time t, the clinical neural network attempts to predict the vertebra’s landmark coordinates at time \(t+\Delta t\); see Fig. 10. The input vectors are \(\varvec{I}_s^t=[\bar{\varvec{X}}_s^t,\varvec{\alpha }_s^t,t,\Delta t]\), where \(\bar{\varvec{X}}_s^t\) indicates the landmark coordinates of sample s at time t and \(\varvec{\alpha }_s^t\) denotes the global angles (described in [18]) characterizing the 3D shapes of the spine at that time. The outputs are vectors \(\varvec{O}_s^t=[\varvec{X}_s^{t+\Delta t}]\), where \(\varvec{X}_s^{t+\Delta t}\) denotes the expected 3D coordinates of the vertebra’s corner points at time \(t+\Delta t\).

[Eq. (17), the loss function of FFNN\(_{\text {CL}}\), is rendered as an image in the original]

The relative approximation error for this model is calculated as

$$\begin{aligned} \delta _{CL} = \frac{1}{\mathcal {H} \cdot N_{\text {T}}\cdot N_m}\sum _{s=1}^{N_{\text {T}}}\sum _{m=1}^{N_m}\left\| \varvec{X}_{sm}^{t+\Delta t}-\bar{\varvec{X}}_{sm}^{t+\Delta t}\right\| ^2, \end{aligned}$$
(18)

where \(\delta _{CL}\) denotes the relative error of the clinical neural network, \(\varvec{X}_{sm}^{t+\Delta t}\) is the coordinates of landmark m of sample s predicted by the neural network, and \(\bar{\varvec{X}}_{sm}^{t+\Delta t}\) is the corresponding value from the clinical dataset.

Fig. 10

The structure of the 3D-clinical feed-forward neural network FFNN\(_{\text {CL}}\) developed to predict the coordinates of the landmarks at time \(t+\Delta t\)

Center point predictor neural network (FFNN\(_{\text {CR}}\)): Given the set of landmarks’ coordinates \(\bar{\varvec{X}}_s^t\), global angles \(\varvec{\alpha }_s^t\), and von Mises stress \(\bar{\varvec{\sigma }}_s^t\) at landmarks at time t, FFNN\(_{\text {CR}}\) predicts the coordinates of the center of the sample at time \(t+\Delta t\); see Fig. 11. More formally, the input to FFNN\(_{\text {CR}}\) is \(\varvec{I}_s^t=[\bar{\varvec{X}}_s^t,\varvec{\alpha }_s^t,t,\Delta t, \bar{\varvec{\sigma }}_s^t]\) and its output vector is \(\varvec{O}_{CR,s}^t=[\varvec{C}_s^{t+\Delta t}]\), where \(\varvec{C}_s^{t+\Delta t}\) denotes the center coordinates of the sample s at time \(t+\Delta t\). The results of this network are used in the following neural networks.

Fig. 11

The structure of the center point prediction feed-forward neural network FFNN\(_{\text {CR}}\) developed to predict the location of the center point of the vertebra at time \(t+\Delta t\)

Bio-informed clinical neural network (FFNN\(_{\text {BC}}\)): FFNN\(_{\text {BC}}\) predicts the parameters of the physical growth equation given the set of landmark coordinates, global angles, and von Mises stresses at the landmarks; see Fig. 12. The input vectors are \(\varvec{I}_s^t=[\bar{\varvec{X}}_s^t,\varvec{\alpha }_s^t,t,\Delta t, \bar{\varvec{\sigma }}_s^t]\), where \(\bar{\varvec{X}}_s^t\) is the landmarks’ coordinates at time t, \(\varvec{\alpha }_s^t\) is the set of global angles, and \(\bar{\varvec{\sigma }}_s^t\) is the set of von Mises stresses. The outputs of the network are vectors \(\varvec{O}_s^t=[{\mathcal {A}}_s^{t+\Delta t}, {\mathcal {B}}_s^{t+\Delta t}]\) predicting the growth model parameters in 3D for each landmark. Combining the results of FFNN\(_{\text {BC}}\) with the center points obtained from FFNN\(_{\text {CR}}\), we are able to predict the coordinates of the landmarks of the sample at time \(t+\Delta t\).

[Eqs. (19)–(20), the loss function and landmark reconstruction of FFNN\(_{\text {BC}}\), are rendered as an image in the original]

For any sample s and any landmark m on s, define the landmark \(\varvec{X}_{\text {BC}}^{t+\Delta t}\) predicted by FFNN\(_{\text {BC}}\) as

$$\begin{aligned} \varvec{X}_{\text {BC}}^{t+\Delta t} = \bar{\varvec{X}}_{\text {BC}}^{t}+\Delta \varvec{C}^{t}_{s}+(\mathbb {A}^{t+\Delta t}_{\text {BC}}+\mathbb {B}^{t+\Delta t}_{\text {BC}}\sigma ^{t}_{\text {BC}})\Delta t, \end{aligned}$$
(21)

where \(\mathbb {A}_{\text {BC}}^{t+\Delta t}\) and \(\mathbb {B}_{\text {BC}}^{t+\Delta t}\) are obtained from \({\mathcal {A}}^{t+\Delta t}_{s}\) and \({\mathcal {B}}^{t+\Delta t}_{s}\), the outputs of the second part of the model, \(\bar{\varvec{X}}_{\text {BC}}^{t}\) is the coordinates of landmark m of sample s at time t obtained from the clinical dataset, and \(\Delta \varvec{C}^{t}_{s}\) is the change in the center of sample s from time t to time \(t+\Delta t\), obtained from the first neural network (FFNN\(_{\text {CR}}\)). With this definition, the model’s relative approximation error, \(\delta _{\text {BC}}\), is calculated as

$$\begin{aligned} \delta _{\text {BC}} = \frac{1}{\mathcal {H} \cdot N_{\text {T}} \cdot N_m}\sum _{s=1}^{N_{\text {T}}}\sum _{m=1}^{N_m}\left\| \varvec{X}_{sm}^{t+\Delta t}-\bar{\varvec{X}}_{sm}^{t+\Delta t}\right\| . \end{aligned}$$
(22)
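A hedged sketch of how the network outputs are mapped back to landmark coordinates via Eq. (21) (and, identically, Eq. (25)) and then scored: the array shapes follow Sect. 4.2, \(\mathcal {H}\) is the normalization constant from Table 3, and the squared-norm form of Eq. (26) is used in the error function (Eq. (22) uses the unsquared norm).

```python
import numpy as np

def reconstruct_landmarks(X_t, dC, A_matrix, B_matrix, sigma_vm, dt):
    """Eq. (21)/(25): advance the 8 growth landmarks of one vertebra by one time step.

    X_t:                 (8, 3) landmark coordinates at time t (clinical dataset).
    dC:                  (3,)   displacement of the vertebra center predicted by FFNN_CR.
    A_matrix, B_matrix:  (8, 3) growth-parameter matrices assembled as in Eq. (13).
    sigma_vm:            (8,)   von Mises stress at the growth landmarks at time t.
    """
    return X_t + dC + (A_matrix + B_matrix * sigma_vm[:, None]) * dt

def relative_error(X_pred, X_true, H):
    """Relative approximation error over samples and landmarks (squared-norm form, Eq. (26))."""
    n_samples, n_landmarks = X_pred.shape[:2]
    return np.sum(np.linalg.norm(X_pred - X_true, axis=-1) ** 2) / (H * n_samples * n_landmarks)
```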

Bio-informed mechanistic neural network (FFNN\(_{\text {BM}}\)): Similar to FFNN\(_{\text {BC}}\), in the first step of this model FFNN\(_{\text {CR}}\) predicts the coordinates of the center point of the vertebra at time \(t+\Delta t\), from which the displacement of the sample’s center point from time t to \(t+\Delta t\) is computed. The neural network FFNN\(_{\text {BM}}\) then predicts the set of growth model parameters as output, using a mechanistic loss function; see Fig. 12. The input vectors are \(\varvec{I}_s^t=[\bar{\varvec{X}}_s^t,\varvec{\alpha }_s^t,t,\Delta t, \bar{\varvec{\sigma }}_s^t]\), with the same definitions for \(\bar{\varvec{X}}_s^t,\varvec{\alpha }_s^t\), and \(\bar{\varvec{\sigma }}_s^t\). The output of FFNN\(_{\text {CR}}\), \(\varvec{O}_{CR,s}^t=[\varvec{C}_s^{t+\Delta t}]\), remains the same, and the output vector of the second step is \(\varvec{O}^t_s=[{\mathcal {A}}_s^{t+\Delta t}, {\mathcal {B}}_s^{t+\Delta t}]\), with \({\mathcal {A}}_s^{t+\Delta t}\) and \({\mathcal {B}}_s^{t+\Delta t}\) representing the parameters of the physical growth equation.

[Eqs. (23)–(24), the mechanistic loss function of FFNN\(_{\text {BM}}\), are rendered as an image in the original]
Fig. 12

The architecture of a neural network for predicting the physical growth equation parameters. This structure is used in both FFNN\(_{\text {BC}}\) and FFNN\(_{\text {BM}}\) by defining two separate loss functions (one mechanistic and one non-mechanistic)

Similar to the FFNN\(_{\text {BC}}\), we define the predicted landmark m of sample s, for each m and s, as

$$\begin{aligned} \varvec{X}^{t+\Delta t}_{\text {BM}} = \bar{\varvec{X}}^t_{\text {BM}}+\Delta \varvec{C}^{t}_{s}+(\mathbb {A}^{t+\Delta t}_{\text {BM}}+\mathbb {B}^{t+\Delta t}_{\text {BM}}\sigma ^{t}_{\text {BM}})\Delta t. \end{aligned}$$
(25)

The definitions are identical to those stated in the FFNN\(_{\text {BC}}\) formulation. The relative approximation error for the FFNN\(_{\text {BM}}\) model, denoted \(\delta _{\text {BM}}\), is calculated using

$$\begin{aligned} \delta _{\text {BM}} = \frac{1}{\mathcal {H}\cdot N_{\text {T}}\cdot N_m}\sum _{s=1}^{N_{\text {T}}}\sum _{m=1}^{N_m}\Vert \varvec{X}^{t+\Delta t}_{\text {BM}}-\bar{\varvec{X}}^{t+\Delta t}_{\text {BM}}\Vert ^2. \end{aligned}$$
(26)
Table 4 Setup of the neural networks

4.3.2 Implementing bio-informed mechanistic machine learning frameworks

Data collection and feature selection The database comprises clinical X-ray images. The snakes algorithm is capable of generating an arbitrary number of landmarks around each vertebra from the X-ray images. Landmarks are classified into two types: those located on the growth plates (\(X_{\text {G}}\)) and those that determine the vertebral sides (\(X_{\text {S}}\)), as illustrated in Fig. 9. The number of \(X_{\text {G}}\) landmarks on each 2D plane in this application is four, signifying the corner points that are expected to lie on the growth plates. As a result, the presented 3D framework uses \(X_{\text {G}}=8\) and \(X_{\text {S}}=8\), as shown in Fig. 13. The patient’s first X-ray image is obtained at \(t_0=124\) months. The next four X-ray images, obtained at \(t=139,149,156,168\) months, are used to train the neural network. The remaining three X-ray images, collected at \(t=160,179,187\) months, are used for comparison with the neural network predictions, as shown in Table 5. Each X-ray image depicts the shape of the patient’s spine at a specific age. Based on the machine learning framework explained in Sect. 4.3.1, FFNN\(_{\text {CL}}\), FFNN\(_{\text {CR-BC}}\) and FFNN\(_{\text {CR-BM}}\) are set up as described in Table 4. The number of training samples, 68, corresponds to the 4 training datasets (Table 5), with each dataset containing 17 vertebrae. Depending on the NN methodology, each sample has a different size. The three test datasets listed in Table 5 are represented by the number of test samples (\(17 \times 3=51\)) in Table 4. It should be noted that the growth landmarks are tested for each framework, and the relative approximation error indicated in Table 6 is determined for all landmarks, including growth and side landmarks. The data are normalized before being fed to the model since the ranges and units of the input features are not the same.

Table 5 Data collection for FFNN\(_{\text {CL}}\), FFNN\(_{\text {CR-BC}}\) and FFNN\(_{\text {CR-BM}}\) for a single patient whose X-ray images were collected serially over time
Fig. 13

Illustration of the growth landmarks and side landmarks. There are 8 growth landmarks (\(X_{\text {G}}=8\)) and 8 side landmarks (\(X_{\text {S}}=8\))

4.4 Results and cross-validation on the data

The findings obtained from the patient described in Table 5 are presented here to illustrate the accuracy of the bio-informed mechanistic neural network prediction. The NN reconstruction process is the same as that explained in Sect. 4.3.1. A breakdown of the data used for training and testing can be seen in Table 5. Figure 14 compares the ground truth (actual data acquired from X-ray scans) with FFNN\(_{\text {CL}}\) at the age of 160 months, which is inside the range of the training data, and at 179 and 187 months, which are outside the range of the training data. The results are visualized for the eight corner points signifying the growth landmarks. Due to its purely data-driven nature, the 3D-Clinical NN fails to forecast the ground truth. This framework is built on interpolation; for patient-specific prediction, where the available data are insufficient, the model cannot learn enough from the data and hence fails to predict the outcomes. Figure 15 shows the results for the FFNN\(_{\text {CR-BC}}\) framework applied to the same dataset. The results are 3D reconstructed based on the growth parameters obtained by FFNN\(_{\text {CR-BC}}\) (Fig. 15a, c and e), and the corresponding AP views are visualized in Fig. 15b, d and f, respectively. Because the results are reconstructed using the bone growth model, this framework delivers a more accurate prediction than FFNN\(_{\text {CL}}\). The data are next examined for FFNN\(_{\text {CR-BM}}\), as illustrated in Fig. 16. Due to the implementation of the mechanistic loss function, its results are the best of the three models and are consistent with the ground truth obtained from X-ray images.

A cross-validation study is carried out to see how the input and output data impact the framework error, as shown in Table 6. The term “cross validation” refers to the process of determining how well a prediction model will work in practice [57]. We keep the last two ages as test cases and rotate the third test case within the earlier age span, since we are more interested in prediction outside of the training range. We can see from Table 6 that the prediction error of FFNN\(_{\text {CR-BM}}\) is always lower than that of FFNN\(_{\text {CL}}\) and FFNN\(_{\text {CR-BC}}\) and that it can predict future spine curvature with a small error. The relative errors are calculated based on Eqs. (18), (22) and (26) for FFNN\(_{\text {CL}}\), FFNN\(_{\text {CR-BC}}\) and FFNN\(_{\text {CR-BM}}\), respectively. This observation can be explained by the fact that FFNN\(_{\text {CL}}\) operates as an interpolation function and cannot provide accurate predictions when there is insufficient training data. Since FFNN\(_{\text {CR-BC}}\) employs the bone growth equation to reconstruct the expected geometry, as explained in Eq. (20), it performs better than FFNN\(_{\text {CL}}\). Finally, FFNN\(_{\text {CR-BM}}\) performs the best since it modifies the loss function as mentioned in Eqs. (23) and (24) while also rebuilding the geometry using the biological bone growth model. Table 7 presents the cross-validation on the 2D data using the existing mechanistic framework [58] and FFNN\(_{\text {CR-BM}}\). As shown, for each test case the present framework is more accurate because the bone growth parameters are calibrated taking into account the effects of time and position. The proposed FFNN\(_{\text {CR-BM}}\) improves prediction accuracy by \(40\%\) inside the training range and by \(84.3\%\) outside it.

Fig. 14

Differences between pure data science prediction (FFNN\(_{\text {CL}}\)) and ground truth (the results obtained by X-ray images) at age of a 160 months (inside of the range of the trained data), b 179 months (outside of the range of the trained data) and c 187 months (outside of the range of the trained data). The landmarks are eight corner points of each vertebra. It is obvious that FFNN\(_{\text {CL}}\) cannot predict the ground truth

Fig. 15

Differences between Bio-informed Clinical prediction (FFNN\(_{\text {CR-BC}}\)) and ground truth (the results obtained by X-ray images) at age of 160 months (inside of the range of the trained data, a and b), 179 months (outside of the range of the trained data, c and d), and 187 months (outside of the range of the trained data, e and f). Subpanels (a, c, e) show the 3D view and subpanels (b, d, f) show the 2D view on AP plane. The landmarks are eight corner points of each vertebra. It is obvious that FFNN\(_{\text {CR-BC}}\) can predict the results close to ground truth

Fig. 16

Differences between Bio-informed Mechanistic predictions (FFNN\(_{\text {CR-BM}}\)) and ground truth (results obtained by X-ray images) at age of 160 Months (inside of the range of the trained data, a, b and g), 179 months (outside of the range of the trained data, c, d and h) and 187 Months (outside of the range of the trained data, e, f and i). Subpanels (a, c, e) show the 3D view, subpanels (b, d, f) show the 2D view on AP plane and subpanels (g, h, i) show the 3D reconstructed detailed geometry. The landmarks are eight corner points of each vertebra. FFNN\(_{\text {CR-BM}}\) can clearly predict outputs that are close to ground truth

Table 6 Cross-validation study for three different neural networks. The table lists the relative approximation error of the predicted landmarks (\(X_{\text {G}}\) and \(X_{\text {S}}\) as shown in Fig. 13) of each vertebra using the different neural networks. Datasets are categorized by age (in months) of the tracked patient. For each trial case, the testing data are shown in blue cells and the training data are shown in white cells
Table 7 Cross-validation study for two different neural networks on 2D data for AP view. The relative approximation error of the predicted landmarks (\(X_{\text {G}}\) and \(X_{\text {S}}\) as shown in Fig. 13) of each vertebra was calculated using the different neural networks. Datasets are categorized by age (in months) of the tracked patient. For each trial case, the testing data are shown in blue cells and the training data are shown in white cells. The mechanistic framework FFNN\(_{\text {ME}}\) is borrowed from [58]. For each trial case, the bio-informed mechanistic approach (FFNN\(_{\text {CR-BM}}\)) had better performance

5 Discussion and conclusion

In this study, we trained a bio-informed mechanistic deep learning model for the prognosis of pediatric spinal deformity. The training data consist of clinical data and mechanistic features. The clinical data are extracted from the anteroposterior and lateral views of the X-ray images using the active contour image segmentation model, whose parameters are calibrated through sensitivity analysis. Next, the personalized 3D spine models are established through a mesh morphing technique which consists of registering a volumetric atlas spine model onto each set of extracted clinical data. Thus, any geometry-dependent feature within the patient-specific models is detectable and trackable for calibrating the bone growth model parameters. The mechanistic features are obtained from the bone growth model, which takes the stress distribution of the spine as input. A dynamic patient-specific bone growth model is proposed to enhance the accuracy of the model. To obtain the stress distribution, a patient-specific 3D finite element model is generated based on the 2D clinical data and the 3D reconstruction algorithm. By merging medical data with a mechanistic model, the presented framework can address the problem of limited data in patient-specific studies. The proposed model is capable of predicting the spinal curve of a single patient, both inside and outside the training range. This study is unique in that it provides patient-specific, time-dependent, and position-dependent parameters that can be calibrated throughout the dataset. In addition, the proposed bio-informed deep learning network with the modified bone growth model was shown to achieve competitive or even superior performance against other state-of-the-art learning-based methods.

In conclusion, the workflow described in this article can serve as a useful and innovative guide for the early detection and treatment planning of spinal disorders such as scoliosis, lordosis, and kyphosis. Furthermore, the framework may be used for dynamic finite element analysis of various tissues at a smaller scale, and it can potentially be extended to other image-based studies, including tumor progression and cardiovascular applications. Future research will apply the same framework to a larger dataset to construct a real-time, patient-specific, optimal treatment plan based on the first patient visit. Moreover, a virtual reality application will be developed to visualize the spine in 3D and guide surgical training in orthopaedic surgery.