Introduction

Pancreatic ductal adenocarcinoma (PDAC) is a critical global health problem, with the mortality rate remaining the highest among major solid cancers. Despite decades of clinical and research efforts, the 1-year survival rate is 20% , and the 5-year survival rate remained single digit for many years and only recently rose to 10%1,2. Clearly, new and synergistic approaches are needed to battle this ferocious disease.

Pancreatic cancer is often detected at late stages and has a weak response to current chemotherapy and a poor overall prognosis. Some hereditary risks were discovered, suggesting that up to 15% of pancreatic cancer is attributable to genetic causes3. Genomic analysis plays an important role in understanding the complex biology of pancreatic cancer development and progression, and in identifying novel treatments targeting specific molecular pathways. However, the hallmark of pancreatic cancer is a high degree of heterogeneity in the biology of pancreatic tumor progression. Clonal variations were observed in premalignant and malignant tumors that result in different and multiple biological properties of tumors that progress to kill the patient4,5. These differences manifest as tumors that progress with different biological properties, which affects the nature of the cells that grow and metastasize and the capacity of these cells to influence and organize their tumor microenvironment. This heterogeneity can be seen both at the molecular level, with non-consensus mutations and gene expression patterns, at the histological level, with different cell types and structures within the tumor, or at the tumor imaging level, with various appearances on CT images6,7. For the molecular level of intertumoral heterogeneity, whole exome sequencing analyses revealed a complex mutational landscape for PDAC8. Although mutations of some genes, such as KRAS and TP53, occur at rates of up to >50%, the frequency of most other recurrently mutated genes is less than 10%, and there is a long tail of infrequently mutated genes among the PDAC patient population9,10. This degree of heterogeneity has previously been underestimated or understated in the literature and in studies that undertake the discovery of biomarkers related to disease progression. It is also reflected by the results of population-based DNA sequencing and RNA expression studies to date, in that no consistent pattern of mutations or RNA expression profiles have yet been defined that accurately predict biological aspects of disease progression11,12.

Tackling this high degree of heterogeneity in pancreatic cancer demands a system science approach that integrates molecular data, biological properties of the tumors, and clinical features of the patients. The quantitative approaches for medical imaging analysis, such as extracting radiomic features, are perfect for globally assessing the heterogeneity of PDAC at the tumor imaging level13. In this work, we present a radiome-wide and genome-wide association approach to identify the driver genes for heterogenicity at the tumor phenotype level. This method was conducted in a unique patient population for which a large amount of tumor and normal tissue samples were collected in a rapid autopsy immediately following the patient’s demise. Our work demonstrates the feasibility of this novel systematic approach to providing new insight into the molecular mechanisms of pancreatic cancer progression.

Currently, whole exome sequencing (WES) is widely used to identify cancer-driver genes by searching for genes with a high rate of somatic mutation recurrence in multiple patient samples. What roles do these WES-identified cancer driver genes play in intertumor heterogeneity? This is the scientific question that we wish to answer. We collected a cohort of PDAC patients who had both tumor and healthy tissues from rapid autopsy, and pancreatic contrast-enhanced CT images. WES was conducted on both tumor tissues and healthy tissues. After conducting a comprehensive WES analysis and a tumor image radiomic analysis on these patients, we performed an association study to identify the cancer-driver genes that are significantly associated with image features. Our results shed some new light on tumor genomic and morphological heterogeneity in PDAC.

Results

Patient and dataset information

The patient and tumor clinical characteristics of our studied cohort are listed in Table 1. The patients in the cohort had a median of 6 (range 3–30) serial pancreas contrast CTs available. The CT used for analysis, i.e., the last CT in the series for each patient, was acquired a median of 34 (range 1–579) days before the date of death.

Table 1 Patient and tumor clinical characteristics.

Somatic single-nucleotide variants (SNVs)

Driver somatic SNVs are genetic changes in a cell that drive the development and progression of cancer. In pancreatic cancer, several driver mutations have been identified that contribute to the development of the disease14,15. In this study, we only retained translationally consequential SNVs, i.e., the missense variant, stop codon gain, start codon loss, sequence feature, splice donor variant, and intron variant. Detailed information on these somatic SNVs obtained from WES for this population is shown in Supplementary Table S1. The number of SNV recurrences in the above categories was counted for the patient population. Based on the single SNV recurrence, the mutations Chr12:25245350 C->T|A (G12D and G12V mutations) on KRAS had the highest recurrence rate, in 14 out of 26 patients (54%). These mutations, G12D and G12V, have been reported as the most common ones in pancreatic cancer, recurring at about 45% and 35%, respectively16,17. The second largest recurrence is 9 (35%), for the SNV Chr12:25245351 C-> G|A (G12R and G12C mutations) also on KRAS. These two SNVs caused the same missense variant on the amino acid sequence. Subsequently, the SNVs, Chr2:130074357 on POTEF and Chr7:152358679 on KMT2C had 7 (27%) and 6 (23%) recurrences out of 26 patients, respectively. For each gene, we also calculated the number of individuals carrying variants in any given gene. The top-ranked genes are KRAS (22), TP53 (17), KMT2C (17), LRP1B (14), FGFR2 (13), RGPD3 (11), EWSR1 (10), and RGPD4 (10) (Supplementary Table S2).

The KRAS gene had SNVs in 22 out of all 26 patients (84.6%). This agrees with the discovery that KRAS mutations are found in more than 90% of PDACs18,19,20. For gene TP53, 17 out of 26 patients (65.4%) had mutations, and previous studies showed that TP53 mutations were found in approximately 50% of PDACs and are associated with a poor prognosis. 17 out of 26 patients (65.4%) also had KMT2C mutations. Histone Lysine Methyltransferase (KMT2) family genes are frequently mutated in multiple cancer types21. Histone Lysine Methyltransferase 2C (KMT2C), also known as myeloid/lymphoid or mixed-lineage leukemia protein 3 (MLL3), is among the most frequently mutated cancer genes in major cancer types22,23. These genes have somatic mutations in most PDACs.

Gene CDKN2A had SNVs in 9 out of all 26 patients (34.6%). Previous studies showed that CDKN2A mutations are found in approximately 29% of PDACs and are associated with a poorer prognosis24. CDKN2A is a tumor suppressor gene that encodes the p16INK4A protein (hereafter mentioned as CDKN2A). As in its name, CDKN2A is a negative regulator of cell cycle progression (the G1-to-S phase transition) by disturbing the complex formation between CDK4/6 and cyclin D25,26.

Radiomic features

Radiomic features can be used to capture the heterogeneity of the tumor phenotype in the context of pancreatic cancer. Texture analysis can be used to quantify the spatial arrangement of pixel intensities within the tumor. With close-to-a-thousand radiomic features, a total of 944, extracted from each tumor volume-of-interest (VOI), a heatmap was generated to show the tumor radiomic feature pattern of the studied population (Fig. 1). The detailed data on radiomic features for patients in this cohort are listed in Supplementary Table S3. The radiomic feature pattern did not show direct correlations with the patient clinical data listed in Table 1. After feature selection using a recursive correlation pruning step for clustering with a correlation coefficient cutoff = 0.8, 170 representative radiomic features were kept for final clusters. Out of the 170 radiomic features, 59 features had a coefficient of variation (CV), the ratio of the standard deviation to the mean, greater than 1, indicating higher heterogeneity among the studied population. These 170 radiomic features and their CVs are shown in Supplementary Table S4. Radiomic feature wavelet.LHH.firstorder.Skewness had the largest CV, at 35.9. Figure 2 shows 3 tumor images with different values of wavelet.LHH.firstorder.Skewness. This feature is related to the asymmetry of the distribution of pixel values after applying a wavelet filter. The large CV indicates the high heterogeneity of the tumor pixel gray level distribution. The smallest CV including glcm.Idmn (5.98 × 10–3), glcm.InverseVariance (4.04 × 10–2), and original.shape.Sphericity (9.65 × 10–2), indicating high homogeneity of these features among all patients. Feature IDMN (inverse difference moment normalized) is used to assess the local homogeneity of VOI. Because all tumor images have very low local homogeneity, the values of IDMN for all tumor images were high, > 0.98. Sphericity is a measurement of the roundness of the tumor region’s morphology relative to a sphere. It is a measure without dimensions, independent of scale and orientation. The majority of tumors in this data set are in their later stages, and hence, the tumors are not in a round shape.

Figure 1
figure 1

Radiomic features. Each tumor from a patient (row) has 944 radomics features (column). The color indicates the Z-score of a patient for a given radiomic feature. The heatmap was generated using the function heatmap.2() from the R package of gplot (3.1.3.1) (https://search.r-project.org/CRAN/refmans/gplots/html/heatmap.2.html).

Figure 2
figure 2

Example tumor images of three patients. The corresponding tumor images have the value of the radiomic feature, wavelet.LHH.firstorder.Skewness, − 4.14 (A), − 2.31 × 10–3 (B), and 1.31 (C), respectively.

Radiome-wide and genome-wide association

This association analysis found several significant associations (P-value < 10–4) between radiomic features, from all 944 radiomic features, and genes with many somatic variants in the tumors. Figure 3 shows the distribution of all P-values of the associations between radiomic features and genes. Table 2 lists the radiomic features and genes with somatic variants with significant associations. Interestingly, there was no significant association found between radiomics features and driver genes with a high recurrence frequency of somatic variants, such as KRAS and TP53. Genes that had a significant association with radiomic features have a recurrent rate of 3 (11.5%) to 7 (26.9%) among the patients.

Figure 3
figure 3

P-values of the associations between radiomic features and genes. Association studies were conducted between all 944 radomics features (column) and 132 genes with many somatic variants in the tumors (row). The color indicates the value of log10(P-values). The heatmap was generated using the function heatmap.2() from the R package of gplot (3.1.3.1) (https://search.r-project.org/CRAN/refmans/gplots/html/heatmap.2.html).

Table 2 Associations between genes and radiomic features.

These genes that are significantly associated with radiomic features are related to tumor formation and progress. For example, the EGF and EGFR genes are important for cancer cell proliferation and spread in the body27,28. The EGFR gene with somatic SNVs is associated with wavelet.HHH.firstorder.Skewness which indicates the tumor pixel value distribution asymmetry on the CT images. Mutations in the EGFR gene, which encodes epidermal growth factor receptors, enable cancer cells to grow and proliferate. The expression and function of the mutant EGFR gene may contribute to the varying patterns of tumor growth, leading to the distinct skewness among patients. Some transcription factor genes, like RGPD6, which was the most commonly mutated in other types of cancers29, also had a significant association with the radiomic feature, wavelet.LHH.glcm.ClusterShade, which is a descriptor of the tumor texture pattern.

Several genes are associated with the same radiomic feature. This indicates these genes may be involved in the same biological pathway or are involved in synergistic interactions. For example, NOTCH1, JUN, and KDR genes were all associated with the radiomic feature, wavelet.HLH.ngtdm.Busyness. Literature suggests that both Notch and JUN genes are related to cell apoptosis30, and Notch-1 promotes JNK/c-Jun activation31. Loss of function of either JUN or NOTCH-1 can result in similar observed biological effects. In our cohort, only one patient carried somatic mutations in both the JUN and NOTCH1 genes, suggesting that the two mutations may be an alternative. Gene KDR encodes the Kinase insert domain receptor, also known as vascular endothelial growth factor receptor 2 (VEGFR-2). The corresponding radiomic feature, ngtdm, is a Neighboring Gray Tone Difference Matrix that quantifies the difference between a gray value and the average gray value of its neighbor pixels. Busyness is a measure of the change from a pixel to its neighbor. A high value for busyness indicates rapid changes in intensity between pixels and their neighborhoods. This indicates that somatic SNVs in NOTCH1, JUN, and KDR genes may cause cancer cells to develop at various speeds and result in varying localized colonization of different cell types in the tumor.

Three genes, EGF, EPHB3, and POTEF, were all associated with original.glrlm.RunVariance. Three patients have SNVs in at least two of the three genes. A gray level run length is defined as the number of consecutive pixels that have the same gray level value. The gray level run length matrix (GLRM) consists of all run lengths in a VOI. Run Variance is the measure of the variance in runs for the run lengths, and hence, is related to the intratumor heterogeneity. As an example, Fig. 4 shows the images of two tumors with or without an EGF gene mutation. The association between these three genes, EGF, EPHB3, and POTEF, and Run Variance indicates their roles in the heterogeneous colonization of cancer cells. For example, the gene, EPHB3, has been found to be involved in the signaling conduction of colonizing cells32.

Figure 4
figure 4

Tumor image of two different patients with (A) or without (B) an EGF gene mutation. The CT scans were obtained from two patients, one of whom had SNVs on the EGF gene, whereas the other did not. The corresponding tumor images have the value of the radiomic feature, original.glrlm.RunVariance, 0.249 (A) and 0.992 (B), respectively.

Genes, including SPEN, PRKG1, CDKN2A, BCORL1, and KMT2B, had a significant association with wavelet.LHL.ngtdm.Contrast, which is a texture feature. The Neighboring Gray Tone Difference (NGTD) is defined as the difference between a gray value and the average gray value of its neighbors within a distance. The value of Contrast is a measure of the local intensity variation and the spatial intensity change. Figure 5 shows an example of the tumor CT images of two patients with or without the SNV in the SPEN gene. Its high value indicates that the tumor region has large changes between voxels and their neighborhood. This association suggests that these five genes are involved in tumor growth and tumor shape regulation. Especially, the gene CDKN2A has somatic SNVs in 9 patients (34.6%) out of 26 patients. The gene CDKN2A, whose gene product is the cyclin-dependent kinase inhibitor 2A, plays an important role in cell cycle regulation and demonstrates tumor suppressor activity. Inactivation of CDKN2A leads to uncontrolled cell growth33.

Figure 5
figure 5

Tumor image of two different patients with (A) or without (B) a SPEN gene mutation. The CT scans were obtained from two patients, one of whom had SNVs on the SPEN gene, whereas the other did not. The corresponding tumor images have the value of the radiomic feature, wavelet.LHL.ngtdm.Contrast, 0.631 (A) and 3.02 × 10–2 (B), respectively.

Discussion

Previous whole-genome sequencing and variation analysis discovered that mutations on genes, KRAS, TP53, CDKN2A, ARID1A, ROBO2, KDM6A, and PREX2, are important in pancreatic cancer34. Activating mutations of KRAS are nearly ubiquitous, being found in more than 90% of PDACs18,19,20. Among all cancers, KRAS mutations are present in ~ 25% of tumors35 and frequently in lung, colorectal, and pancreatic cancers18,36,37,38. Actually, the RAS family, including KRAS, NRAS, and HRAS, is the most frequently mutated gene family in all different types of cancers16. The inactivation of TP53 reoccurs at rates of > 50% in pancreatic cancer. The gene, TP53, is the most frequently mutated gene in different cancers at rates ranging from 38 to 50%, such as ovarian, esophageal, colorectal, head and neck, larynx, lung, and pancreatic cancers39. These genes, KRAS and TP53, are important in the formation of cancer and hence have been extensively studied before, but they are not often directly related to intertumor or intratumor heterogeneity. In pancreatic cancer, the prevalence of recurrently mutated genes then drops to ~ 10% for a handful of genes involved in chromatin modification, DNA damage repair, and other mechanisms resulting in significant intertumoral heterogeneity34. These driver mutations can be used to inform diagnostic and treatment strategies for pancreatic cancer and provide a better understanding of the underlying biology of the disease. To understand the role of these genes with low recurrence mutations, the association between gene-mutation profiles and radiomic features was examined in this work. The significant association between sets of genes and radiomics features identified genes that contribute to morphological heterogeneity. These genes include NOTCH1, JUN, KDR, EGF, EPHB3, POTEF, SPEN, PRKG1, CDKN2A, BCORL1, and KMT2B.

This work discovered that several genes with mutations have significant associations with the same radiomic features. Some genes associated with the same radiomic feature are in the same regulatory pathway and work together for a regulatory cascade. We conducted enrichment analysis on 24 genes shown in Table 2 against both the Gene Ontology (GO) database40 and the KEGG database41. Figure 6 shows several genes are involved either in the same biological process described by the GO term or in the same KEGG pathway. For example, the KEGG database annotated both NOTCH1 and JUN in the same pathway, hsa01522, “Endocrine resistance”. NOTCH1 and JUN are associated with wavelet.HLH.ngtdm.Busyness and the gene product NOTCH1 promotes JNK/c-Jun activation30,31. Loss of function of either JUN or NOTCH1 can result in cells evading cellular death pathways. Gene JUN had somatic mutations in three patients, and NOTCH1 gene had somatic mutations in four patients. Only one patient had mutations in both JUN and NOTCH1 genes, and this patient had 220 somatic mutations, which is much more than the average number (51) of somatic mutations per patient in our dataset. In contrast to these alternative genes, the other genes that were associated with the same radiomic feature appear to work in different pathways, and several somatic mutations in multiple genes need to work together. For example, EGF, EPHB3, and POTEF are all associated with original.glrlm.RunVariance, but EGF works for cell proliferation and EPHB3 works for colonizing cells. In this case, SNVs in multiple genes have a high chance to occur in the same patient. The knowledge of various recurrence patterns of cancer driver genes associated with a specific phenotypic feature has the potential to guide combination therapy for cancer using multi-target medicines, which has recently garnered a great deal of attention as one of the most promising cancer-fighting tools42,43. In our novel approach, radiomic features expand the big data space that we could integrate and leverage for novel discoveries.

Figure 6
figure 6

Enrichment of significant genes in GO terms and KEGG pathways. In all significant genes, several (2–5) genes are involved in the same biological processes in the GO database (A) or the same pathways in the KEGG database (B). Y-axis shows the names of GO terms or KEGG pathways, and X-axis is the percentage of our target genes in all genes annotated by a given GO term or KEGG pathway.

Tumor heterogeneity can refer to intratumor heterogeneity, including heterogeneity of structures within a single tumor, or intertumor heterogeneity if tumors are compared among patients. The tumor structural heterogeneity can be quantified by medical images44, especially via radiomic features45,46. Radiomic features, such as tumor shape and texture features, can be used to quantify tumor heterogeneity, which gives them the potential to serve as imaging-based heterogeneity biomarkers47. Texture analysis can be used to quantify the spatial arrangement of pixel intensities within the tumor. Usually, tumor image textural features could be extracted in several different ways, such as using GLRM-based approaches. In this study, we found a significant association between pancreatic cancer driver genes, EGF, EPHB3, and POTEF, and original.glrlm.RunVariance. This association links the molecular level mechanism and their outcomes at the level of tumor structural heterogeneity. Because tumor structure and tumor structural heterogeneity are related to the patients’ overall survival, patients who had pancreatic cancer driver gene mutations with an association to a certain radiomic feature showed worse survival rates than cases without those somatic mutations. For example, we used a public cohort of pancreatic cancer in the TCGA database48 and collected the survival information of patients with somatic mutations on genes, CDKN2A, PRKG1, and BCORL1, which had a significant association with wavelet.LHL.ngtdm.Contrast from our study. Figure 7 shows the survival curves comparing the patients with somatic mutations on genes, CDKN2A, PRKG1, and BCORL1, and the other patients without. The survival time of patients with somatic mutations in these driver genes is shorter than that of the other patients (FDR-adjusted P-value = 0.0348).

Figure 7
figure 7

Survival curves for patients with different SNVs. Patients (blue) with somatic SNVs on genes, CDKN2A, PRKG1, and BCORL1, have shorter overall survival time than other pancreatic cancer patients (orange).

In this study, we explored genome-wide and radiome-wide association investigation. Screening hundreds of thousands of genetic variants across the entire genome, genome-wide association studies (GWAS) have been widely used to identify disease-specific genetic variants and use them in broad clinical and biological applications. For pancreatic cancer, previous GWAS have identified new and useful risk loci49. For this extremely lethal disease, these types of findings can be especially important owing to the complexity as well as the dynamic heterogeneity over the rapid progression of the disease. However, GWAS are expensive to conduct and require tissue samples that are not readily available. For pancreatic cancer, this is especially the case because tissue samples are usually only procured during surgery owing to the high risks associated with biopsy, and only around one-fifth of all pancreatic cancer patients are operable due to late stages at detection50. The association study approach described in this work can help link the radiome with the genome, i.e., link phenotypic radiomic information from medical images with genotypic information, therefore enabling a much broader and longitudinal genome-wide search for this highly dynamic disease.

The Pancreatic Cancer Rapid Autopsy provides a unique dataset that allows a comprehensive investigation of pancreatic cancer with a systems approach. In this proof-of-concept study, a genome-wide and radiome-wide association analysis was conducted on whole-exome sequencing data from both primary tumor and normal pancreatic tissue from the rapid autopsy. Our approach normalizes the SNVs identified on tumor tissue by those on normal tissue, i.e., the approach teases out all somatic SNVs and selects only tumor SNVs. This way, each patient acts as their own control, thereby suppressing the immense background noise and focusing only on the tumor-specific genomic signals. Potentially, similar “normalization” approaches could be applied to zoom in on the genomic changes between the primary tumor and each of the metastatic tumors, and the temporal changes of all tumors over the course of disease progression and treatments. For the former, the large tissue collection from the rapid autopsy is uniquely valuable by providing primary and all metastatic lesions as well as normal organ tissues. This radiome-genome association approach we established in this work could facilitate these investigations with relevant radiomic features. For the latter, direct genomic assessment is not possible as tissue samples cannot be collected repeatedly along the time course. On the other hand, because periodical medical imaging is already part of the cancer care routine, the relevant radiomic features can be used as surrogates to assess the longitudinal genetic changes accompanying the rapid and heterogeneous progression of this vicious disease. Together, this approach and the future investigations it enables may help decipher the mechanisms and pathways of how pancreatic cancer cells progress and respond to treatments and shed light on better treatment options.

Radiogenomics is an existing branch of radiomics51. Combining genomic data and imaging features has been shown to yield imaging biomarkers and provide valuable information for diseases, especially cancer. However, among different cancers, there is a relative paucity of radiogenomics literature for pancreatic cancer, largely due to the limited known molecular markers for this highly heterogeneous disease and the difficulty of obtaining simultaneous imaging data and genomic data at well-synchronized time points for this rapidly progressing disease. Furthermore, all existing radiogenomics studies, including those on pancreatic cancer52, focused only on known molecular markers. In contrast, our genome-wide and radiome-wide association study applies a system-wide search through large-scale radiomic and genomics data to explore novel imaging and genomic biomarkers and mechanisms. In this proof-of-concept preliminary study, radiomics and primary tumor whole genome information were correlated for pancreatic cancer patients. Using the last pancreatic CT scan of the patient, the imaging date was reasonably close to the date of death when the tissues used for genomic investigations were collected through rapid autopsy, comparable in quality to that obtained by surgical resection.

While the concept of a system-wide large-scale radiome- and genome-wide association study is innovative and the results encouraging, the study is not without limitations. First, the cohort size is rather small, with only 26 patients. A validation dataset was also not available to strengthen the robustness of our findings. We hope that our novel work and proof-of-concept findings will catalyze future development and examination of such datasets, which are crucial for advancing multiomics integration studies. Our dataset size is limited by data availability and more so by the substantial cost associated with the whole-exome sequencing of each tumor and tissue sample. This cost highlights the potential benefits of this multiomcis association approach to developing imaging surrogates for these costly large-scale screenings. At the same time, as our study applied the somatic SNVs of each patient as their own control, tumor SNVs can be identified with much higher specificity in even a small cohort. Another limitation is the slight heterogeneity of the imaging data in terms of the CT scanner model and acquisition protocol, as well as the time point of the scan relative to tissue sample collection. These variances are inevitable as the unique dataset came from the Pancreatic Rapid Autopsy Program, which is retrospective and curated carefully over more than a decade. On the other hand, coming from a single institution, the CT scanners used for this study were all from a single vendor and on a single line of models, and the imaging protocols were largely similar. The imaging dates for the cohort ranged from 1 to 224 days before patient death but were relatively synchronized with the tissue collection with imaging dates within 1 or 2 months for most patients. This timing misalignment between the genetic samples and imaging samples could potentially act as a confounding variable, especially for patients with longer time elapses between the two samples and those with more rapid progression and mutation. Our novel exploration and proof-of-concept findings will hopefully help motivate future curation of better synchronized biological and imaging data, for example, by adding clinical imaging of the deceased patient right before the autopsy. For large-scale omics research, false positives are always a potential challenge, especially when the sample size is small. In our study, we employed some strategies to minimize false positives, such as applying the gene-based burden-testing approach and selecting a stricter significance level at P-value = 10–4, which was used or suggested by other works in GWAS53,54.

Future applications of this novel methodology could include genome and radiome association studies on metastatic lesions to investigate their similarities and differences with the primary pancreatic tumor in terms of these molecular and imaging metrics. This could further validate the new approach and help us understand the complex tumor mutations that occur during the progression of pancreatic cancer. The application of the discovered imaging biomarkers to additional patients as well as the longitudinal images of these patients would be an additional investigation of value. The latter may shed light on the temporal changes of relevant genomic markers as the disease progresses and the patient responds to treatment. Currently, marker genes’ expression levels are used to identify molecular subtypes of PDAC that are widely accepted18,55. For example, the hypermethylated EGFR gene, which is associated with the radiomic feature wavelet.HHH.firstorder.Skewness, indicates the subtype of pancreatic progenitor. Therefore, it is important to explore the radiomic features concerning the molecular subtypes of PDAC in the future.

Materials and methods

Study population

Patients were included in this study from a unique database of the University of Nebraska Medical Center Pancreatic Cancer Rapid Autopsy Program. For over a decade, the program has been collecting large quantities of tumor and tissue samples from autopsies performed within hours of patient demise. In the rapid autopsy, all primary and metastatic tumors and a large number of tissue samples, such as liver, lung, spleen, and kidney, are collected under rapid conditions that produce tissue that is comparable in quality to that obtained by surgical resection. The resected autopsy samples are reviewed and annotated by at least two pathologists, in concert with lab members who conduct the autopsies. Twenty-six patients from the pancreatic cancer rapid autopsy program, with comprehensive, unique tissue sample collection and longitudinal contrast-enhanced CT images, were included in this study. All data collection was approved by the Institutional Review Boards (IRB) of the University of Nebraska Medical Center (Protocols: 728-16-EP and 127-18-EP), and all methods were performed in accordance with the relevant guidelines and regulations.

DNA isolation and whole exome sequencing (WES)

DNAs were extracted from tumor tissues and healthy tissues in the liver, kidney, etc. Illumina TruSeq DNA Exome kit was used for exon capture. Sequencing was carried out using Illumina 2 × 100 bp paired-end sequencing on a HiSeq 2500 instrument according to the manufacturer’s recommendation.

Genome-wide identification of somatic single-nucleotide variants (SNVs)

When applied to many samples of the same cancer type, the identification of the cancer driver gene can be conducted to search for multiple recurrences of somatic mutations in the same gene. With the WES data for the tumor and tumor-free organ tissues from 26 patients, tumor-specific somatic SNVs were identified with VarScan256 after the standard read preprocessing and read-mapping by BWA57. Based on the FDR-adjusted P-values calculated by VarScan2, we retained the significant somatic mutations with a cut-off of the adjusted P-value < 10–5 for subsequent analyses.

Imaging studies

For the 26 patients included in the study, varying numbers (3–30) of contrast-enhanced abdominal CT scans were acquired per standard pancreatic cancer care from diagnosis to longitudinal monitoring, using Lightspeed VCT, Lightspeed Pro 16, or Lightspeed RT16 (GE Healthcare, Boston, Massachusetts, USA). For the image acquisition, patients received ISOVUE injection with bolus triggering arterial phase imaging about 30 s and venous phase about 60 s after injection. These scans used a slice thickness of 1.5–5 mm with an in-plane resolution of 0.6–0.8 mm. For the purpose of this study, the last available CT scan prior to the patient’s death was used for radiomic analysis for the patient. This way, we could get the closest match between radiomic information from the imaging and the genomic information from the rapid autopsy.

Radiomic feature extraction

Pancreatic tumor volume-of-interest (VOI) was manually segmented by two experienced clinical investigators using a consistent window/level setting and reconciled disagreements to mitigate intra- and inter-observer uncertainty. From each tumor, VOI, 944 radiomic features were extracted using the radiomic module on 3D Slicer (version 4.10)58 and visualized using an interactive visualization platform. A resampled 2 × 2 × 2 mm3 voxel size and a bin width of 25 were used for feature extraction. The features are defined in compliance with feature definitions as described by the Imaging Biomarker Standardization Initiative (IBSI)59 and can be divided into original features (107 features), Laplacian of Gaussian features (LoG, 93 features), and wavelet features (744 features). The original features can be subdivided into 6 classes, including 14 Shape features, 18 First Order statistical features, 38 Gray Level Dependence Matrix (GLDM) features, 16 Gray Level Run Length Matrix (GLRLM) features, 16 Gray Level Size Zone Matrix (GLSZM) features, and 5 Neighboring Gray Tone Difference Matrix (NGTDM) features. The wavelet features included all except Shape features calculated on the filtered images with all 8 combinations of applying either a High or a Low pass filter in each of the three dimensions. All features are pre-selected to eliminate features unstable to respiratory motion and inter-observer contouring uncertainty60.

Genome-wide association analysis between radiomic features and somatic mutations

Based on our discovered genes with a high reoccurrence rate of somatic mutations and radiomic features from the corresponding tumor, we conducted an association study between these genes and radiomic features of CT scans from the same population. Here, we employed the gene-based burden-testing approach. For this approach, we used the number of individuals carrying variants in each gene to associate with traits in cohorts61. We applied the sequence kernel association test (SKAT)62 to test the association between each radiomic feature and somatic mutations within each gene. SKAT was originally designed to test the association between a trait and the rare variants in a genomic region and was based on a variance-component score test in a mixed-model framework. It was shown to have much higher power than many other burden tests for gene-based GWAS. We focused on the 132 genes with somatic mutations in at least three patients (the specific mutations can be different among these patients) and tested their association with 944 radiomic features. Here, we used principal component analysis (PCA) to estimate the population structure in our dataset. The population structure can be addressed by including principal components (PCs) as covariates63,64. The top two PCs of the somatic mutation matrix are used as covariates in the null model. In our dataset, the reoccurrences of individual somatic mutations are generally low due to the small sample size. Aggregating their potentially heterogeneous effects using SKAT is expected to improve the detection power. The output p-values were adjusted for multiple tests based on Benjamini–Hochberg procedure.

TCGA data and survival analysis

The R package, RTCGA, (https://rtcga.github.io/RTCGA/) and the full set of somatic mutations discovered by TCGA data for the cohort of Pancreatic Cancer from Firehose (https://gdac.broadinstitute.org/)48 were used to get mutation and survival information of 185 Pancreatic Cancer patients for survival analysis. The function, kmTCGA(), in RTCGA was used to plot Kaplan–Meier estimates of survival curves for survival data from patients with or without given mutations. The function, pairwise_survdiff(), in the R package of survminer was used to have the comparisons of multiple survival curves.

Informed consent

Informed consent was obtained from all subjects and/or their legal guardian(s). All methods were performed in accordance with the relevant guidelines and regulations.