Introduction

Osteosarcoma is the most common primary malignant neoplasm of bone. It occurs most frequently in adolescents, with a second incidence peak among individuals aged > 60 years [1, 2]. The current treatment strategy, preoperative (neoadjuvant) chemotherapy (NAC) and postoperative adjuvant chemotherapy combined with definitive surgical resection, has dramatically improved survival from 20% with surgery alone to 60–70% [3]. Unfortunately, despite numerous clinical trials conducted over several decades, survival has not improved significantly since then. Osteosarcoma shows markedly heterogeneous clinical behavior. Histopathological response to NAC, mainly the degree of tumor necrosis, is reported to be the most important prognostic factor for survival in patients with localized osteosarcoma [4]. However, histological response cannot be used to stratify patients before therapy. Tumor necrosis can be assessed only in resected specimens after completion of NAC, and continuing ineffective chemotherapy can lead to adverse effects, the emergence of resistant clones, and unnecessary costs [5]. Hence, to avoid ineffective chemotherapy and optimize the management of patients with localized osteosarcoma, an accurate, noninvasive method of predicting the histological response to NAC is needed in the era of personalized and precision medicine.

18F-fluorodeoxyglucose (FDG) positron emission tomography-computed tomography (PET/CT) is a functional imaging modality that has been proposed as a noninvasive tool for predicting chemotherapy response and clinical outcome in osteosarcoma [6,7,8,9,10]. The maximum standardized uptake value (SUVmax), mean standardized uptake value (SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG) are the traditional parameters for analyzing 18F-FDG PET data, and they are valuable predictors of tumor response and prognosis in patients with osteosarcoma [6,7,8,9,10]. Recent studies have shown that tumor heterogeneity is an important factor correlated with aggressive behavior, decreased response to therapy, and disease progression [11,12,13]. Consequently, radiomics, which includes texture analysis, has attracted increasing attention. One proposed approach quantifies the spatial heterogeneity of tumor metabolism using texture features of 18F-FDG uptake [14]. Increasing evidence shows that intratumoral heterogeneity measured on pretreatment 18F-FDG PET images provides predictive information in several solid tumors, such as rectal [15], esophageal [16], and oropharyngeal cancer [17]. However, only a few studies have explored whether baseline 18F-FDG PET radiomics features can predict treatment response or event-free survival (EFS) in patients with extremity osteosarcoma treated with NAC.

The aim of our study was to investigate whether the baseline texture features on 18F-FDG PET images can predict treatment response to NAC or EFS in patients with localized extremity osteosarcoma.

Materials and methods

Patients, treatment, and follow-up

This retrospective cohort study was performed at a single institution, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, and covered the period from January 2013 to December 2017. The medical records of patients with histologically confirmed, resectable high-grade (grade II or III) extremity osteosarcoma were reviewed. The inclusion criteria were as follows: (i) histologically proven primary high-grade extremity osteosarcoma; (ii) no previous treatment other than biopsy; (iii) American Joint Committee on Cancer (AJCC) stage II disease; (iv) completion of NAC, surgery, and adjuvant chemotherapy at our institution; (v) an interval of no more than 2 weeks between 18F-FDG PET/CT and initiation of preoperative NAC; and (vi) follow-up for > 3 years for EFS. The exclusion criteria were as follows: (i) nonextremity osteosarcoma; (ii) secondary or extraskeletal osteosarcoma; (iii) skip lesions or distant metastases at initial presentation; (iv) incomplete clinical data; and (v) treatment with chemotherapy alone or surgery alone. Thirty-five eligible patients were enrolled. Ethical approval was obtained for this single-institution retrospective analysis, and the requirement for written informed consent was waived.

All eligible patients underwent two cycles of preoperative NAC followed by four cycles of postoperative chemotherapy. The preoperative NAC protocol was a modified T10 regimen [18]. In this methotrexate-based protocol, definitive surgical resection of the primary tumor was scheduled between weeks 10 and 12. After completion of treatment, routine follow-up was performed every 3 months for the first 2 years and every 6 months during the third year. In addition to clinical evaluation, each follow-up examination included standard radiographs, CT, and magnetic resonance imaging of the operated limb. Chest CT and whole-body bone scans were also performed.

18F-FDG PET/CT imaging acquisition

Baseline 18F-FDG PET/CT imaging was performed within 2 weeks before the start of NAC. All patients were instructed to fast for at least 6 h before 18F-FDG PET/CT. Plasma glucose was measured before tracer injection, and a maximum value of 11 mmol/L was allowed. Image acquisition started 1 h after intravenous injection of 18F-FDG at a dose of 3.7 MBq/kg. All studies were performed on a GE Discovery VCT scanner (General Electric Medical Systems). CT was acquired at 120 kV and 80 mA with 64 slices and a slice thickness of 3.75 mm. Whole-body PET scans were acquired with a scan time of 2.5 min per bed position. Images were reconstructed iteratively using ordered subset expectation maximization, and attenuation correction was performed using unenhanced CT.
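All SUV-based parameters depend on how voxel activity is converted to SUV. The text does not state the normalization, so the following minimal sketch assumes SUV normalized to body weight. It converts one voxel's activity concentration to SUV, decay-correcting the injected dose to the image reference time using the 18F half-life of about 109.8 min, and it assumes a tissue density of 1 g/mL. The helper `suv_bw` and its parameter names are hypothetical.

```python
import math

# Physical half-life of fluorine-18, in minutes.
F18_HALF_LIFE_MIN = 109.77


def suv_bw(conc_bq_per_ml: float, injected_bq: float, weight_kg: float,
           minutes_since_injection: float = 0.0) -> float:
    """Body-weight SUV of one voxel (hypothetical helper, not the study's code).

    Assumes a tissue density of 1 g/mL and an image decay-corrected to a
    reference time `minutes_since_injection` after tracer injection.
    """
    # Decay-correct the injected activity to the image reference time.
    decayed_bq = injected_bq * math.exp(
        -math.log(2) * minutes_since_injection / F18_HALF_LIFE_MIN)
    weight_g = weight_kg * 1000.0
    # SUV = tissue concentration / (injected activity per gram of body weight).
    return conc_bq_per_ml / (decayed_bq / weight_g)


# A 60-kg patient given 3.7 MBq/kg receives 222 MBq. A voxel whose concentration
# equals the whole-body average (3700 Bq/mL) therefore has an SUV of 1.
print(suv_bw(3700.0, 222e6, 60.0))  # prints 1.0
```

With this convention, an SUV of 1.0 means the tracer uptake equals the uniform whole-body distribution. The fixed SUV ≥ 2.0 threshold used for MTV is therefore twice that level.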

18F-FDG PET/CT imaging analysis

For texture analysis, an experienced nuclear medicine physician (10 years’ experience with 18F-FDG PET/CT in musculoskeletal tumors) manually delineated the region of interest (ROI) around the primary tumor on each transaxial slice. The physician used the 18F-FDG PET images only, without CT data, and was blinded to patient outcome data. The ROIs were checked and validated by an independent senior nuclear medicine physician (22 years’ experience). Segmentation was performed in ITK-SNAP (version 3.6.0) [19]. The ROIs of the primary tumors were saved as DICOM-RT structures and imported into the PyRadiomics texture analysis toolbox (http://pyradiomics.readthedocs.io/en/latest/index.html) [20]. Specifically, 38 first- and higher-order texture features were derived and used for further analysis, because these features are commonly used in medical imaging research [21]. SUVmax and SUVmean were defined as the maximum and mean SUV within the delineated ROI, respectively. MTV was defined as the volume of tissue with SUV ≥ 2.0, a threshold previously used in osteosarcoma [6, 22]. TLG was calculated as MTV × SUVmean.
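The conventional parameters defined above can be sketched with NumPy. The sketch assumes an SUV volume, a boolean ROI mask of the same shape, and a known voxel volume in mL, which depends on the reconstruction matrix. The function name and arguments are hypothetical. Following the wording above, TLG uses the SUVmean of the whole ROI; many studies instead use the mean SUV inside the thresholded volume.

```python
import numpy as np


def conventional_pet_params(suv: np.ndarray, roi: np.ndarray,
                            voxel_ml: float, threshold: float = 2.0) -> dict:
    """SUVmax, SUVmean, MTV (mL), and TLG for one delineated tumor ROI (sketch)."""
    roi_values = suv[roi.astype(bool)]          # SUVs of voxels inside the ROI
    suv_max = float(roi_values.max())
    suv_mean = float(roi_values.mean())
    # MTV: volume of the ROI voxels at or above the fixed SUV threshold.
    mtv = float(np.count_nonzero(roi_values >= threshold)) * voxel_ml
    # TLG = MTV x SUVmean, with SUVmean taken over the ROI as defined in the text.
    tlg = mtv * suv_mean
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MTV": mtv, "TLG": tlg}


# Toy 2 x 2 slice with a 0.5-mL voxel volume (hypothetical values).
suv = np.array([[1.0, 3.0], [4.0, 0.5]])
roi = np.ones_like(suv, dtype=bool)
print(conventional_pet_params(suv, roi, voxel_ml=0.5))
```

In the example, two of the four voxels reach SUV 2.0, so MTV is 1.0 mL.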

Study outcomes

To improve the quality of this retrospective study, all pathological specimens were re-evaluated. The reassessment, completed by an expert musculoskeletal pathologist, was consistent with the initial pathological findings. Histological response to NAC was graded as I or II, indicating a poor response, or as III or IV, indicating a good response [23]. EFS was defined as the interval from the first day of chemotherapy to the first of the following: local recurrence, distant metastasis, appearance of a secondary tumor, death from any cause, or the last contact without an event. Patients who were alive without recurrence or metastasis at the last follow-up were censored.

Statistical analysis

Statistical analyses were performed using the MedCalc software package (version 18.2.1) and the R open-source statistical software (version 3.3.3; R project, http://www.r-project.org). As a preliminary step to avoid feature redundancy, Pearson’s correlation coefficients between all pairs of features were calculated and displayed as a heat map (Fig. 1). Features were then reduced to a signature of largely uncorrelated features. Features associated with the biological characteristics and prognosis of cancer were retained. All 18F-FDG PET features were compared between histological response groups using an unpaired (two-sample) Student’s t test. Features that differed significantly (p < 0.05) were selected, and a receiver operating characteristic (ROC) curve was constructed for each of them. The area under the ROC curve (AUC) was calculated for each selected parameter, and the optimal cutoff value for predicting histological response was determined. Patients were dichotomized at these cutoff values, and the diagnostic performance of 18F-FDG PET/CT was expressed as sensitivity, specificity, accuracy, positive predictive value, and negative predictive value. Cox regression analysis was used to assess the effects of the texture parameters and other variables on EFS. Because multiple variables were tested, a stricter threshold of p < 0.01 was applied in univariate analysis. Parameters meeting this threshold were entered into a multivariate analysis to identify those independently associated with EFS. Kaplan-Meier curves were constructed for parameters that remained significant after multivariate analysis, and differences were assessed using the log-rank test.

Fig. 1

Heat map depicting absolute value of Pearson correlation coefficients between all pairs of standard (SUVmax, SUVmean, MTV, and TLG) and texture PET features
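The correlation-based redundancy screen described in the statistical analysis can be sketched as a greedy filter over the absolute Pearson correlation matrix that the heat map displays. The cutoff of 0.95 and the keep-the-first-feature ordering are assumptions for illustration, because no numerical threshold is reported. The study also retained features judged biologically relevant, a step this sketch does not model. The function name `drop_correlated` is hypothetical.

```python
import numpy as np


def drop_correlated(features: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Greedily keep features whose |Pearson r| with every kept feature is <= threshold."""
    names = list(features)
    # One row per feature, one column per patient.
    matrix = np.array([features[name] for name in names], dtype=float)
    abs_corr = np.abs(np.corrcoef(matrix))  # np.corrcoef treats rows as variables
    kept_idx: list[int] = []
    for i in range(len(names)):
        if all(abs_corr[i, k] <= threshold for k in kept_idx):
            kept_idx.append(i)
    return [names[i] for i in kept_idx]


# Hypothetical values for three features across five patients.
example = {
    "MTV": [120.0, 80.0, 300.0, 45.0, 210.0],
    "TLG": [480.0, 320.0, 1200.0, 180.0, 840.0],  # exactly 4 x MTV, so r = 1
    "Coarseness": [0.02, 0.01, 0.04, 0.03, 0.05],
}
print(drop_correlated(example))
```

The filter keeps the first member of each highly correlated group. In practice, the order of the input features therefore encodes which feature is preferred.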

Results

Patient and tumor characteristics

The demographic and clinical characteristics of the patients are detailed in Table 1. Between January 2013 and December 2017, 35 eligible patients were enrolled. Four had AJCC stage IIA osteosarcoma and 31 had stage IIB. The median age was 33 years, and more than half of the patients were male. The most frequent primary tumor location was the femur (49%), followed by the tibia (14%). The histological subtype was osteoblastic in 22 patients (63%), chondroblastic in seven (20%), and other subtypes in six (17%). Alkaline phosphatase and lactate dehydrogenase levels were elevated in 15 and nine patients, respectively. Pathological assessment showed a good histological response in 16 patients and a poor response in 19. Follow-up ranged from 38 to 52 months (median, 41 months), and no patient was lost to follow-up. At the time of analysis, 20 (57%) patients remained event-free, whereas the remaining 15 had experienced local recurrence, distant metastasis, or death. Nine patients developed metastatic disease: six in the lungs only, two in bone only, and one in both the lungs and bone.

Table 1 Clinical characteristics of the patients

Tumor response prediction

Baseline 18F-FDG PET parameters of responders (n = 16) and nonresponders (n = 19) are compared in Supplemental Table 3. Neither SUVmax nor SUVmean differed significantly between responders and nonresponders, whereas nonresponders showed significantly higher MTV and TLG. In texture analysis, 10 indices differed significantly between the two groups. Nonresponders showed significantly higher least axis (LA), correlation, dependence nonuniformity (DNU), energy, entropy, run length nonuniformity (RLNU), run entropy (RE), and size zone nonuniformity (SZNU). They also showed lower surface-to-volume ratio (SVR) and coarseness. The AUCs for predicting nonresponse were 0.918 for MTV, 0.862 for LA, 0.882 for DNU and SZNU, and 0.868 for RLNU. MTV was therefore the most effective single parameter for predicting treatment response. Diagnostic performance for predicting a poor histological response is summarized in Table 2 and Fig. 2.

Table 2 Areas under ROC curves for ability of 18F-FDG PET pretreatment parameters to predict non-response
Fig. 2

ROC curves of baseline 18F-FDG PET primary tumor MTV, LA, DNU, RLNU, and SZNU for discriminating responders from nonresponders. The AUC was > 0.85 for all of these parameters
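The AUC and cutoff selection behind these curves can be illustrated with a short pure-Python sketch. Nonresponders are treated as the positive class. The AUC is computed through its equivalence to the Mann-Whitney statistic, that is, the probability that a randomly chosen nonresponder scores higher than a randomly chosen responder, with ties counted as one half. The cutoff maximizes the Youden index (sensitivity + specificity - 1). The Youden criterion is a common choice assumed here, because the text does not state how the optimal cutoffs were chosen. The function names and MTV values are hypothetical.

```python
def auc_mann_whitney(pos: list[float], neg: list[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: list[float], neg: list[float]) -> tuple[float, float, float]:
    """Cutoff maximizing sensitivity + specificity - 1 (positive if score >= cutoff)."""
    best = None
    for t in sorted(set(pos) | set(neg)):
        sens = sum(p >= t for p in pos) / len(pos)
        spec = sum(q < t for q in neg) / len(neg)
        j = sens + spec - 1.0
        if best is None or j > best[0]:  # keep the lowest cutoff among ties
            best = (j, t, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Hypothetical baseline MTV values (mL).
nonresponders = [310.0, 250.0, 180.0, 95.0]
responders = [60.0, 120.0, 45.0, 80.0]
print(auc_mann_whitney(nonresponders, responders))
print(youden_cutoff(nonresponders, responders))
```

The pairwise loop is quadratic in the number of patients. This is fine for a cohort of 35 patients and keeps the link to the probabilistic meaning of the AUC explicit.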

EFS prediction

Univariate and multivariate Cox regression analyses were used to assess the effects of clinical parameters and 18F-FDG PET variables on EFS (Table 3). In univariate analysis, many 18F-FDG PET features were significantly correlated with EFS. In multivariate analysis, however, histological response (hazard ratio [HR] = 11.5029, p = 0.0003), MTV (HR = 19.973, p < 0.0001), and coarsenessNGTDM (neighboring gray tone difference matrix) (HR = 5.847, p = 0.005) were significantly and independently associated with EFS. Kaplan-Meier curves were constructed (Fig. 3), and differences were assessed using the log-rank test (Supplemental Table 4).

Table 3 Cox proportional hazards model for EFS in patients with localized osteosarcoma
Fig. 3

Kaplan-Meier curves of EFS according to histological response (a), MTV (b), and coarsenessNGTDM (c). Log-rank p values are 0.0008, 0.0001, and 0.0018, respectively
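The Kaplan-Meier estimates and log-rank comparisons can be sketched in pure Python. Events are coded 1 for an EFS event (recurrence, metastasis, secondary tumor, or death) and 0 for censoring. The log-rank p value uses the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x / 2)). The function names and the example groups are hypothetical. This sketch is not the software used in the study.

```python
import math


def kaplan_meier(times, events):
    """Return [(event_time, survival)] steps of the Kaplan-Meier estimator."""
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censorings leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; return (chi-square, p value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical months to event/censoring for patients dichotomized by an MTV cutoff.
high = ([5, 9, 14, 20, 41], [1, 1, 1, 1, 0])
low = ([12, 38, 40, 45, 50], [1, 0, 0, 0, 0])
print(kaplan_meier(*high))
print(log_rank(*high, *low))
```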

Discussion

In this study, we examined baseline 18F-FDG PET parameters, including both conventional parameters and texture indices of heterogeneity, in patients with high-grade localized osteosarcoma of the extremities treated with NAC. The main findings indicate that texture features of 18F-FDG uptake can predict treatment response (LA, DNU, SZNU, and RLNU) and EFS (coarsenessNGTDM). However, the conventional parameter MTV provides the best predictive power.

Tumor response prediction

Predicting tumor response before treatment initiation is valuable for optimizing treatment strategies in patients with osteosarcoma, especially those with AJCC stage II disease. Patients with histologically low-grade osteosarcoma have a lower likelihood of metastasis and are treated with surgery alone [24, 25]. Patients with skip lesions or distant metastases at initial presentation have been reported to have poor outcomes [24]. Therefore, such patients were excluded from the current study.

Many studies have investigated the value of 18F-FDG uptake in predicting treatment response in patients with osteosarcoma. Among baseline primary tumor uptake parameters, MTV and TLG, but not SUVmax or SUVmean, significantly discriminated responders from nonresponders in our study, and MTV performed best. Consistent with our data, initial MTV measured by 18F-FDG PET has been reported to predict the outcome of osteosarcoma of the extremities [6]. Byun et al [8] and Hyung et al [22] showed that the histological response to NAC in osteosarcoma can be predicted by MTV after one course of chemotherapy, rather than before treatment. As a conventional PET parameter, MTV is based on the metabolism of the malignant tumor and hence reflects the true tumor burden more accurately. It can be readily measured by drawing a volume of interest (VOI) around the tumor and shows high intra- and interobserver reproducibility [26]. In our clinical experience, MTV is also more effective than other PET parameters, such as SUV, in evaluating the therapeutic response and outcome of patients with osteosarcoma. In contrast to our findings, SUVmax has been shown to be a useful prognostic biomarker in osteosarcoma [10, 27, 28]. This discrepancy may reflect differences in inclusion criteria, because the other studies focused on pediatric and young adult patients or on patients with different AJCC stages. However, Bailly et al [29] found that none of SUVmax, MTV, or TLG could discriminate histopathological responders from nonresponders. There are three possible explanations for this difference. First, that study focused on pediatric patients with mixed AJCC stages, unlike our study. Second, it pooled osteosarcoma and Ewing’s sarcoma. Third, the absence of sufficiently robust delineation techniques for tumor volume segmentation made consensus difficult.

All of the above studies investigated only conventional 18F-FDG PET parameters. They did not include texture indices, which have been reported to play a crucial role in various solid malignancies. To the best of our knowledge, no previous data exist on 18F-FDG PET texture analysis for predicting the response of extremity osteosarcoma to NAC. In our study, four texture features (LA, DNU, RLNU, and SZNU) performed best and could predict treatment response. The accuracy for predicting nonresponse was 80% for both LA and RLNU and 82.86% for both DNU and SZNU. To our knowledge, this is the first study to associate these four features with the treatment response of osteosarcoma. Nevertheless, the conventional parameter MTV provided the best predictive power, with an accuracy of 88.57%. Possible reasons why the texture features did not outperform MTV are as follows. First, because osteosarcoma is rare, the number of patients in this study was small. Second, we studied only patients with AJCC stage IIA or IIB disease and did not include other stages. Third, we analyzed only PET images, without simultaneous analysis of CT and MR images. Our study is only a first step toward using texture analysis to predict the treatment response of osteosarcoma. Prospective studies with more patients and broader staging criteria are still needed to confirm our results. Such studies should also evaluate the predictive value of other texture parameters, especially the newly identified indicators DNU and SZNU.
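The accuracies quoted above are consistent with 28, 29, and 31 of the 35 patients being classified correctly (28/35 = 80%, 29/35 ≈ 82.86%, 31/35 ≈ 88.57%). The sketch below shows how the five diagnostic measures listed in the statistical analysis follow from a 2 × 2 table, with nonresponders as the positive class. The per-cell counts in the example are a hypothetical split of the 19 nonresponders and 16 responders, because the text does not report them. The function name is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, accuracy, PPV, and NPV from a 2 x 2 table.

    Positive class = nonresponder (poor histological response).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical split: 17 of 19 nonresponders and 14 of 16 responders classified correctly.
m = diagnostic_metrics(tp=17, fp=2, tn=14, fn=2)
print(round(m["accuracy"] * 100, 2))  # prints 88.57
```

Accuracy alone does not reveal whether errors fall on responders or nonresponders. This is why sensitivity, specificity, and the predictive values are reported alongside it.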

Prognosis prediction

In our study, MTV, histological response, and the texture feature coarsenessNGTDM were independent predictors of EFS in multivariate analysis. Consistent with our results, MTV and histological response have been shown to be independent predictors of metastasis in patients with osteosarcoma of the extremities [6]. Notably, coarsenessNGTDM emerged as a newly identified texture feature independently associated with EFS in osteosarcoma. This feature is derived from the NGTDM, which yields five fundamental parameters (coarseness, contrast, strength, busyness, and complexity). These parameters describe local tumor texture based on the differences between each voxel and its neighboring voxels, including those in adjacent image planes. Coarseness is derived from the average difference between each center voxel and its neighbors and is inversely related to it. It is linked to granularity within an image and is the most fundamental property of texture. A higher value indicates a lower spatial rate of change and a locally more uniform texture. In esophageal carcinoma, coarseness has been reported to discriminate responders to chemoradiotherapy from nonresponders [30]. Coarseness is also a feature that best differentiates primary and nodal head and neck cancer from normal tissue [31]. In non-small cell lung cancer, tumor coarseness is an independent predictor of overall survival in multivariate analysis [32]. Although texture features in functional imaging of cancer have potential clinical value, the relationship between the biological characteristics of tumors and texture features is complex and largely unknown. Therefore, texture features from different imaging modalities and different PET tracers urgently need thorough investigation. They should be correlated with histopathological features that may influence patient stratification, treatment response, prognosis, and even gene expression.
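To make the definition of coarseness concrete, the sketch below computes it for a 2D image that has already been discretized into gray levels. It follows the classic NGTDM formulation:

1. For each interior pixel, take the absolute difference between its gray level and the mean of its eight neighbors.
2. Sum these differences per gray level i to obtain s_i.
3. Weight each s_i by the gray-level probability p_i.
4. Invert the total, so that coarseness = 1 / Σ p_i s_i.

This is a simplified illustration with a hypothetical function name. PyRadiomics works in 3D within the ROI mask and treats boundary voxels differently, so its values will not match this function exactly. Like PyRadiomics, the sketch returns 10^6 when the denominator is zero, as in a perfectly homogeneous region.

```python
import numpy as np


def ngtdm_coarseness_2d(img: np.ndarray) -> float:
    """Simplified 2D NGTDM coarseness of a gray-level image (interior pixels only)."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    centre = img[1:-1, 1:-1]                      # pixels with a full 3 x 3 neighborhood
    neigh_sum = np.zeros_like(centre)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                          # exclude the center pixel itself
            neigh_sum += img[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
    abs_diff = np.abs(centre - neigh_sum / 8.0)   # |gray level - neighborhood mean|
    levels, counts = np.unique(centre, return_counts=True)
    weighted = 0.0
    for level, count in zip(levels, counts):
        s_i = abs_diff[centre == level].sum()     # summed differences for gray level i
        p_i = count / centre.size                 # probability of gray level i
        weighted += p_i * s_i
    return 1e6 if weighted == 0 else 1.0 / weighted


# Two uniform halves (coarse texture) versus a checkerboard (fine texture).
block = np.zeros((6, 6))
block[:, 3:] = 1
checker = np.indices((6, 6)).sum(axis=0) % 2
print(ngtdm_coarseness_2d(block), ngtdm_coarseness_2d(checker))
```

The two uniform halves differ from their neighborhoods only along the boundary, so they give a higher coarseness than the checkerboard, where every pixel differs from its neighbors. This matches the statement that higher values indicate a locally more uniform texture.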

Overall, our results add to the accumulating evidence that texture features of 18F-FDG uptake on baseline PET images of osteosarcoma can predict treatment response (LA, DNU, SZNU, and RLNU) and EFS (coarsenessNGTDM). We consider these findings encouraging from the perspective of personalized medicine. By identifying patients with localized extremity osteosarcoma who might respond poorly to NAC, oncologists could adjust the course of NAC to improve response. Conversely, patients with a better prognosis could eventually benefit from less aggressive treatment, reducing both chemotherapy toxicity and related costs.

Although promising, these results should be interpreted cautiously. First, because of the retrospective design, we cannot exclude the possibility of selection bias. Second, because osteosarcoma is rare and we selected homogeneous subgroups, the sample was small and drawn from a single institution. Our ongoing prospective study will validate the predictive value of these parameters in a larger number of patients. Third, we included only patients with localized high-grade extremity osteosarcoma, so our conclusions cannot be applied to patients with other AJCC stages. Finally, radiomics methodology is evolving rapidly, and further studies should compare different methods of extracting texture features to determine the optimal approach.

Conclusions

Our present investigation suggests that intratumoral heterogeneity of baseline 18F-FDG uptake measured by PET texture analysis can predict tumor response of patients with extremity osteosarcoma treated by NAC, but the conventional parameter MTV provides the best predictive power. MTV, histological response, and the texture feature of coarsenessNGTDM were independently associated with EFS. The utility of such features, especially coarsenessNGTDM, should be confirmed by larger clinical studies before considering their potential integration into decisional algorithms aimed at personalized medicine.