Introduction

The assessment of left ventricular (LV) function is one of the most frequent indications for obtaining echocardiograms in the pediatric population. Fractional shortening (FS) and ejection fraction (EF) are long-standing conventional measures of LV systolic function; however, both have important limitations [1,2,3]. Cardiac magnetic resonance imaging (MRI) is the clinical non-invasive gold standard for measuring ventricular function; however, due to availability, cost, and time requirements, echocardiography is more routinely used for screening and monitoring cardiac dysfunction in the clinical setting. Two-dimensional speckle-tracking strain echocardiography (2D STE) is a method of quantifying ventricular systolic function by tracking acoustic backscatter within the ventricular myocardium. 2D STE has been prevalent in the cardiac literature for the past two decades and is no longer considered a “novel” functional technique. The technique has been validated against in vitro and in vivo gold standards, namely sonomicrometry and cardiac MRI. Normal values for LV strain have been defined in large adult [4, 5] and pediatric [6, 7] populations. Additionally, clinical advantages of global longitudinal strain (GLS) over EF and FS have been reported, including greater sensitivity for detecting pre-clinical cardiac dysfunction [8,9,10,11]. Recent studies in adult patients have reported lower variability in LV GLS measurements compared to EF, with GLS having superior prognostic value over LV EF for predicting major adverse cardiac events [12]. Despite these apparent advantages over EF and FS, strain continues to be an underutilized measurement in clinical echocardiography, especially in pediatrics. To our knowledge, the precision of GLS compared to FS and EF has not been investigated in pediatrics. Thus, we aimed to assess the precision of LV GLS compared to LV EF and FS, with the hypothesis that LV GLS is a more precise LV functional parameter than EF or FS in pediatrics.

Methods

Study Design and Patient Population

Echocardiograms were obtained prospectively using standardized protocols between October 2013 and December 2015 at the St. Louis Children’s Hospital Heart Center at Washington University School of Medicine. All echocardiograms had been ordered for routine clinical evaluations. The study protocol was approved by the Institutional Review Board at Washington University School of Medicine. Echocardiograms from a total of 103 pediatric patients (< 22 years of age) meeting the following criteria were analyzed.

Inclusion and Exclusion Criteria

All patients above 1 month of age with a four-chambered heart and echocardiograms suitable for 2D strain, EF and FS analysis were included. Patients with poor echocardiographic windows, poor ECG tracings, or inadequate LV endomyocardial border visualization were excluded. Neonatal subjects (< 30 days old) were excluded due to the known ventricular remodeling that occurs during the neonatal period. Echocardiograms with significant wall motion abnormalities (dyskinesis or paradoxical motion), a known pitfall of M-mode analysis, were also excluded. Finally, patients with hemodynamically significant structural congenital heart disease, a history of cardiac surgery, or arrhythmias were excluded.

Sample Size Determination

Post-hoc power analysis was performed based on results of the intraclass correlation coefficient (ICC) below. A sample size consisting of 52 subjects with two observations per subject was able to achieve 99% power to detect an ICC of 0.925 under the alternative hypothesis when the ICC under the null hypothesis is 0.692 using an F test with a significance level or α of 0.001. Based on this analysis, a study population of 103 patients was found to be more than adequate.

Data Acquisition

All echocardiograms were obtained prospectively by experienced pediatric cardiac sonographers following pediatric guidelines from the American Society of Echocardiography (ASE) [13]. Conventional two-dimensional images were acquired on commercially available ultrasound machines (Vivid 7 or E9; GE Medical Systems, Milwaukee, WI) according to the standardized ASE protocol [13, 14]. The images were obtained using a phased-array transducer with a central frequency ranging between 5 and 12 MHz. Cine image data were digitally stored for at least 3 cardiac cycles at native (uncompressed) frame rates for offline analysis (EchoPAC version 110.1.2 or Digisonics version 3.8.4.2).

Two observers (M.D.P, C.M) were simultaneously trained in functional analysis by an expert pediatric echocardiographer (S.A). After a training period, the two observers analyzed 10 randomly selected echocardiograms and an analysis of variance was performed. Results were disclosed to the observers and their techniques were compared in order to narrow variations in analysis. These 10 initial echocardiograms were excluded from the final dataset. Following the training period, 2D global longitudinal strain, ejection fraction and fractional shortening were calculated for each study by the 2 observers, each blinded to the other’s analysis. The same echocardiographic cardiac cycle was utilized by both observers for each measurement. Fifteen percent of the studies had repeat assessments by each observer to evaluate intra-observer variability.

2D Strain Analysis

2D Lagrangian global longitudinal strain (GLS) was obtained for each patient utilizing a standardized LV 2D strain protocol [15,16,17]. Offline 2D strain analysis of each study was performed for the entire cardiac cycle. Automated tracking of myocardial deformation was performed using EchoPAC v110.1.2 (GE Medical Systems, Milwaukee, WI) for determination of six-segment LV longitudinal strain for each of the following 3 echocardiographic views: apical 4 chamber, apical 2 chamber, and apical 3 chamber. Peak longitudinal strain from each of the views (determined by the nadir of the strain curve) was averaged to produce GLS. Tracking was visually inspected throughout the cardiac cycle and minimally manually adjusted as needed. Strain analysis on 15 randomly selected subjects was repeated by both observers (M.D.P, C.M) at least 4 weeks after the initial measurement to obtain data regarding intra-observer variability.
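The averaging step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample strain curves are hypothetical, and the speckle tracking itself is performed by the vendor software, not by this code. Lagrangian strain is expressed as the percent change in length relative to the reference (end-diastolic) length, and GLS is taken as the mean of the per-view peak (nadir) values.

```python
def lagrangian_strain(length, reference_length):
    """Percent Lagrangian strain relative to the reference length."""
    return (length - reference_length) / reference_length * 100.0


def peak_strain(strain_curve):
    """Peak longitudinal strain, taken as the nadir (most negative value)."""
    return min(strain_curve)


def global_longitudinal_strain(view_curves):
    """Average the per-view peak strain values to obtain GLS.

    view_curves maps a view label to its strain curve in percent.
    """
    peaks = [peak_strain(curve) for curve in view_curves.values()]
    return sum(peaks) / len(peaks)


# Hypothetical strain curves (percent) for the three apical views.
example_curves = {
    "A4C": [0.0, -10.0, -20.0, -5.0],
    "A2C": [0.0, -12.0, -18.0, -3.0],
    "A3C": [0.0, -9.0, -22.0, -4.0],
}
gls = global_longitudinal_strain(example_curves)
```

With these made-up curves the per-view nadirs are -20, -18 and -22%, so the averaged GLS is -20%.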

Ejection Fraction

Echocardiographic 2D images were obtained in the apical 4 chamber LV focused view and the apical 2 chamber LV view for each study. Only studies showing the entirety of the endocardial border of the left ventricle were included. The biplane method of disks (also known as the biplane Simpson method) was utilized for absolute LV volume measurements for each study using Digisonics Digiview v3.8.4.2 (Digisonics Inc., Houston, TX). The same cardiac cycles and clips were measured by each observer in both views, per ASE guidelines [14]. The observers manually traced the endocardial border at end-diastole and end-systole in each view. End-diastole was defined as the peak of the electrocardiographic R-wave or the frame just prior to the onset of mitral valve closure on 2D imaging. End-systole was defined as the frame prior to opening of the mitral valve. The end-diastolic volume (EDV) and end-systolic volume (ESV) were then used to calculate ejection fraction with the following formula: EF (%) = [(EDV − ESV)/EDV] × 100.
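For readers less familiar with the calculation, a minimal Python sketch of the biplane method of disks and the EF formula is given below. The summation is the standard method-of-disks expression, in which the ventricle is modeled as a stack of elliptical disks whose two diameters come from the two apical views. The function names and sample values are hypothetical, and this is not the Digisonics implementation.

```python
import math


def biplane_volume(diameters_a4c, diameters_a2c, long_axis_length):
    """LV volume by the biplane method of disks.

    Each disk is an ellipse with diameters a_i (4-chamber view) and
    b_i (2-chamber view) and height L / n, so that
    V = (pi / 4) * sum(a_i * b_i) * (L / n).
    Lengths in cm give a volume in mL.
    """
    if len(diameters_a4c) != len(diameters_a2c):
        raise ValueError("both views must provide the same number of disks")
    n = len(diameters_a4c)
    disk_height = long_axis_length / n
    area_sum = sum(a * b for a, b in zip(diameters_a4c, diameters_a2c))
    return math.pi / 4.0 * area_sum * disk_height


def ejection_fraction(edv, esv):
    """EF (%) = [(EDV - ESV) / EDV] x 100."""
    return (edv - esv) / edv * 100.0


# Sanity check with a hypothetical cylinder: 20 disks of 4 cm diameter
# in both views and a 10 cm long axis (volume = 40 * pi mL).
cylinder_volume = biplane_volume([4.0] * 20, [4.0] * 20, 10.0)

# Hypothetical volumes: EDV 100 mL, ESV 40 mL.
ef = ejection_fraction(100.0, 40.0)
```

Because EF depends on four manual tracings (two views at two time points), small differences in any one tracing propagate into both volumes and then into the ratio.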

Fractional Shortening

M-mode echocardiography was performed in the parasternal short axis view through the center of the LV at the level of the papillary muscles, per standard ASE protocol [13]. Both observers calculated FS for each study, and measurements were performed offline using Digisonics Digiview v3.8.4.2. Again, the same cardiac cycles and clips were analyzed by each observer. FS was calculated from the LV end-diastolic dimension (LVEDD) and LV end-systolic dimension (LVESD) using the following formula: FS (%) = [(LVEDD − LVESD)/LVEDD] × 100.
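The FS formula can be illustrated with a short Python sketch. The function name and the dimensions are hypothetical and serve only to show how a small caliper difference affects the result.

```python
def fractional_shortening(lvedd, lvesd):
    """FS (%) = [(LVEDD - LVESD) / LVEDD] x 100 (same length unit for both)."""
    return (lvedd - lvesd) / lvedd * 100.0


# Hypothetical M-mode dimensions in mm.
fs_reader_1 = fractional_shortening(40.0, 26.0)
# Same beat, with the end-systolic caliper placed 1 mm wider.
fs_reader_2 = fractional_shortening(40.0, 27.0)
```

In this made-up case a 1 mm difference in LVESD moves FS from 35% to 32.5%, which illustrates why caliper placement matters for reproducibility.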

Statistical Analysis

Agreement between the blinded measurements was evaluated with inter-observer and intra-observer variability analysis. The two observers (M.D.P and C.M) performed offline analysis using the same measurement protocol. Variability was evaluated via Bland–Altman analysis [percentage bias and 95% limits of agreement (LOA)]. Coefficients of variation (CV) were calculated to demonstrate relative variability between measurement methods, and the CVs for each method were compared using the Friedman test with the Bonferroni adjustment [18].
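A minimal Python sketch of the agreement statistics named above follows. Several details are assumptions, since they are not spelled out in the text: the percentage difference is taken relative to the absolute mean of each pair, the limits of agreement are bias ± 1.96 standard deviations, and the CV is computed per subject from the two paired measurements. The analysis in this study was performed in SPSS; the function names and sample values here are hypothetical.

```python
import statistics


def bland_altman_percent(obs1, obs2):
    """Percentage bias and 95% limits of agreement for paired measurements."""
    pct_diffs = [
        (a - b) / abs((a + b) / 2.0) * 100.0  # difference as % of pair mean
        for a, b in zip(obs1, obs2)
    ]
    bias = statistics.mean(pct_diffs)
    sd = statistics.stdev(pct_diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def per_subject_cv(obs1, obs2):
    """Per-subject coefficient of variation (%) from two measurements.

    Assumption: CV = SD of the pair / |mean of the pair| x 100.
    """
    return [
        statistics.stdev([a, b]) / abs(statistics.mean([a, b])) * 100.0
        for a, b in zip(obs1, obs2)
    ]


# Hypothetical EF readings (%) by two observers on four studies.
observer_1 = [62.0, 58.0, 65.0, 60.0]
observer_2 = [57.0, 55.0, 59.0, 58.0]
bias, lower_loa, upper_loa = bland_altman_percent(observer_1, observer_2)
cvs = per_subject_cv(observer_1, observer_2)
```

Computing one CV per subject and per method yields repeated measures on the same subjects, which is the kind of input a Friedman test compares across methods.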

The intraclass correlation coefficient (ICC) was used to measure intra-observer and inter-observer reliability within measurements. ICC is a commonly used statistic to assess rater reliability of measurements and incorporates the magnitude of disagreement rather than all-or-nothing agreement. Higher ICC values indicate greater reliability between measurements, with an ICC of 1 indicating perfect agreement and 0 indicating only random agreement. Post-hoc power analysis was performed as above utilizing ICC results for each clinical method. The above statistical analysis was performed using SPSS version 22.0 (SPSS, Inc., Armonk, NY).
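To make the ICC concrete, the Python sketch below computes one common form, the two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). The choice of this form is an assumption made for illustration; the text does not state which ICC model was selected in SPSS. The function name and sample ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per subject, one column per rater.

    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
    where MSR, MSC and MSE are the between-subject, between-rater and
    residual mean squares from a two-way ANOVA without replication.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Identical readings by both raters give perfect agreement.
icc_perfect = icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
# A constant offset between raters lowers an absolute-agreement ICC.
icc_offset = icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
```

In the second toy example the two raters rank the subjects identically but differ by a constant offset, and the ICC falls from 1 to 2/3. A systematic difference between observers therefore reduces this form of the ICC even when the readings are perfectly correlated.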

Results

Functional analysis was performed on 103 individual studies determined to have adequate imaging. Patient characteristics are listed in Table 1. Median age was 13.2 years with 58% males. The most common indication for echocardiogram in this population was for surveillance after receiving chemotherapy. The second most common was routine outpatient screening; more specifically this included screenings for chest pain, palpitations, murmur, syncope or a family history (cardiomyopathy, bicuspid aortic valve, predisposing genetic condition, etc.) requiring evaluation by echocardiogram. Another common indication was surveillance after orthotopic heart transplantation. Three patients with history of cardiomyopathy related to coronary abnormalities (n = 2) or myocarditis (n = 1) were also included. Mean heart rate (72 bpm, range 41–104) approximated the mean frame rate (70 f/s, range 53–94) for source images.

Table 1 Patient demographics

Agreement, Bland–Altman Analysis

Mean values for each functional analysis method are shown in Table 2. No significant differences were found between observer means for GLS and FS; however, the means between individual users were significantly different for EF (p < 0.001). The mean differences for inter-observer and intra-observer measurements are shown in Table 3. GLS had higher precision, i.e. higher inter-observer agreement and lower variability, compared to EF and FS (Fig. 1; Table 4). Inter-observer variability was significant for EF, with high bias and wide limits of agreement. FS showed lower bias but wide limits of agreement. GLS had the strongest agreement and lowest variability.

Table 2 Measurement means and 95% confidence intervals for LV global longitudinal strain, ejection fraction, and fractional shortening
Table 3 Inter- and intra-observer mean difference between measurements of LV global longitudinal strain, ejection fraction, and fractional shortening
Fig. 1 Inter-observer Bland–Altman plots: comparing measurements obtained by two different users for each of three methods. This shows that there is less variability between user measurements when measuring GLS compared to EF and FS

Table 4 Reproducibility analysis for echocardiographic measures of left ventricle function in pediatric patients

Intra-observer analysis of 15 cases (15% of subjects) showed similar results, with low bias and narrow limits of agreement for GLS. FS again showed low bias but wide limits of agreement. EF had the highest variability (Fig. 2).

Fig. 2 Intra-observer Bland–Altman plots: comparing measurements obtained by the same user two separate times for each of three methods. This shows that there is less variability between measurements obtained by the same user at different times when measuring GLS compared to EF and FS

Reliability, Intra-class Correlation Analysis

A high degree of reliability was found between the two observers’ measurements for GLS, with an ICC of 0.86 (Table 4). A moderate degree of reliability was found between the two individual observer measurements for FS, with an ICC of 0.53. Poor reliability was found for inter-observer EF with an ICC of 0.25.

Similar results were found for intra-observer measurements of GLS, with an ICC of 0.87. Intra-observer reliability for FS was poor with an ICC of 0.38. Again, poor reliability was found for intra-observer measurements of EF, with an ICC of 0.26 (Table 4).

For the inter-observer measurements, the coefficient of variation was lowest for GLS, with a CV of 3.61. It was slightly higher for FS with a CV of 5.88, and highest for EF with a CV of 18.95, showing the highest variation between measurements in EF. Using the Friedman test with Bonferroni adjustment, the CV for GLS was found to be significantly different from the CVs for EF (p < 0.0001), and FS (p = 0.0006). The CVs for EF and FS were also found to be significantly different (p < 0.0001).

Similar results were found for intra-observer measurements with a CV of 0.55 for GLS, compared to 3.97 for FS, and 4.11 for EF. Variability was lowest for observer 1 repeated measurements of GLS, and highest for observer 1 repeated measurements of EF. Using the Friedman test with Bonferroni adjustment, the CV for GLS was significantly different from the CV for EF (p < 0.02), but not significantly different from the CV for FS (p = 0.071). The CVs for EF and FS were not found to be significantly different (p = 0.071).

Discussion

The results of this study support our hypothesis that GLS is a more precise LV functional parameter than EF or FS. Among the more routinely used analysis methods, FS by M-mode was more reproducible than EF calculated by the biplane method of disks. This finding is noteworthy and may be partly explained by our exclusion criteria. Consistent with the clinical practice in many echocardiography labs, we excluded subjects with considerable wall motion abnormalities (significant dyskinesis or paradoxical motion), a known pitfall of M-mode-based measurements. This may have introduced a selection bias that lowered variability in FS measurements. In addition, the manual calculation of biplane EF from two separate images (versus one image for FS) may create opportunity for greater variability in EF compared to FS.

Limitations in EF and FS have been well established in the literature, though to date they remain the most commonly used LV function quantification methods. Lee et al. showed that inter-acquisition variability led to a near doubling of intra-reader variability when measuring LV dimensions and FS [19]. EF and FS are subject to limitations in acquisition, reader variability and differences in measurement techniques. Some examples include apical dropout, single-beat versus 3-beat averaging, and operator variability. Margossian et al. compared multiple EF analysis methods and found significant differences between the various algorithms in pediatric subjects [20]. In that study, reproducibility of EF increased when an average of three heart beats was used, though this may not be feasible in routine clinical practice due to time constraints. In potential pediatric cardiac transplant donors, Chen et al. found significant variability between local donor hospital measurements of FS and a central laboratory FS analysis of the same study, enough to alter donor candidacy [21]. This raises considerable concern regarding the reliance on FS and EF for functional assessment in pediatrics. On the other hand, 2D LV GLS has been validated against in vitro and in vivo gold standards [22, 23], and normal values have been established from multiple large studies in adult [4, 5, 17] and pediatric populations [6, 7]. While 2D strain analysis is subject to some of the same limitations as EF (endocardial border visualization, apical dropout, etc.), strain analysis has key differentiators that may contribute to higher precision. These include the inherent algorithm of tracking acoustic backscatter that is relatively stable through the cardiac cycle, integration of key landmarks (mitral annulus, apex, endocardial border) into the analysis, and angle independence. This “semi-automated” method of functional analysis may reduce reader variability [24,25,26]. Strain also provides regional myocardial deformation patterns and is less dependent on ventricular loading conditions; the lack of regional information and load dependence are well-known limitations of EF and FS.

Clinical Implications

Clinical decisions are regularly based on the echocardiographic assessment of LV systolic function. The ability to follow changes over time and to maintain reliable measurements between readers is paramount. In addition to higher precision, strain offers several clinical advantages over EF and FS. Superiority of GLS over LV EF in predicting outcomes has been reported in adults [27], including a recent large meta-analysis by Kalam et al. showing superior prognostic value of GLS over LV EF for predicting major adverse cardiac events [12]. Several studies have shown the capability of 2D GLS in early detection of cardiac dysfunction. This has been shown extensively in adult patients following cardiotoxic chemotherapy [9], as well as in pediatrics [9, 28, 29], allowing earlier treatment of heart disease and alterations in treatment plans [30, 31]. Similarly, strain has shown promise in recipients of cardiac transplants for detection of rejection [10, 32]. The potential clinical applications for pediatric patients continue to expand [33,34,35]. Given higher reproducibility and prognostic ability, the data from our study and the studies referenced argue for the inclusion of LV GLS as a routinely reported measurement of LV systolic function in clinical pediatric echo reports.

Historically, a major limitation in 2D strain analysis has been variability in measurements across different echo platforms and strain analysis software, limiting widespread clinical utilization. Prior literature has shown these differences, while significant, are similar across vendors [36]. To address this important limitation there has been a coordinated effort among the medical and scientific communities and commercial imaging vendors to standardize deformation imaging, resulting in recommendations summarized in a consensus document in 2015 [37]. Follow-up studies have subsequently shown reduction in inter-vendor variability for GLS [18, 38]. Of note, important variability in regional (segmental) longitudinal strain measurements remains an active issue [39, 40]. At this time, experience over the past two decades has proven LV GLS to be a robust and reliable measure of LV systolic function, with distinct advantages over EF and FS. Findings from this paper suggest the same may be said of LV GLS in pediatrics, albeit from a single-center experience with limitations as below.

Limitations

We note that there are limitations in this study. We utilized Simpson’s biplane method of disks for measurement of EF, though in pediatrics the 5/6 area length method with a 3-beat average has shown greater reproducibility [20]. The ASE currently recommends Simpson’s biplane method of disks for EF calculation [13], and this continues to be the method of EF calculation for most clinical echocardiography labs. Thus, to produce the most generalizable results, our group chose the biplane method of disks for EF calculation. The population we investigated was largely healthy. As a follow-up, the same analysis needs to be conducted in a larger cohort, preferably multi-center, that includes patients with LV pathology. Also, we did not assess test–retest variability. Additionally, we compared the precision (reproducibility) of LV functional parameters, though accuracy (comparison against a gold standard) was not addressed in this study. A prospective study comparing the above methods with cardiac MRI measurements would be of benefit in order to assess accuracy. Lastly, our variability analysis was limited to a single vendor software and version. As discussed in the preceding section, inter-vendor variability in strain measurements has historically been a significant issue. Though standardization initiatives [37] have led to a decrease in variability in GLS [18, 38], validation of this study’s findings with other ultrasound platforms and strain analysis software is warranted.

Conclusion

LV GLS is a more precise measure of LV systolic function than LV EF and LV FS in pediatric subjects with four-chambered hearts. FS in this cohort was more reproducible than LV EF. Given these findings and recent data showing improvements in inter-vendor variability, we propose that LV GLS be included as a routinely reported functional parameter in pediatric (> 1 month old) echo reports, in addition to LV FS and EF. Follow-up of our study’s findings could include a multi-center trial in a heterogeneous pediatric population with a spectrum of LV disease using multi-vendor systems, and assessment of accuracy against cardiac MRI.