Introduction

While patients with several different forms of cancer survive longer after diagnosis than in the past, the 5-year survival rate of patients with pancreatic ductal adenocarcinoma (PDAC) has remained relatively unchanged over the past 5 decades [1]. As many as 10 % of PDACs have a hereditary component (familial PDAC, FPC), defined as a family with at least two first-degree relatives with PDAC [2, 3]. Known susceptibility genes include BRCA1, BRCA2, PALB2, ATM, STK11, PRSS1, SPINK1, and DNA mismatch repair genes, but together these explain less than 20 % of familial pancreatic cancer cases [4–7]. PDAC is notoriously lethal because patients present late in the disease process and the cancers are chemorefractory. Importantly, the 9 % of cases that present with the tumor confined to the pancreas have a 5-year survival rate of 24 %, supporting the notion that lesions detected early enough can be cured [1]. To focus early detection resources, it is important to identify patients at particularly high risk, such as those with familial predispositions.

The progression of sporadic pancreatic cancer (SPC) is well established both histologically and molecularly [8]. The high-prevalence SPC driver genes are KRAS (>90 % of PDAC), CDKN2A/p16 (95 %), TP53 (50–75 %), and SMAD4/DPC4 (55 %) [9–12]. PDACs commonly arise from pancreatic intraepithelial neoplasia (PanIN) or intraductal papillary mucinous neoplasm (IPMN) precursor lesions. While the pathology of FPC has been shown not to differ from that of apparently sporadic disease, FPC patients have significantly more precursor lesions, and higher grade precursor lesions, than patients with sporadic disease [13–15]. Knowing the genes involved in FPC molecular progression is essential to designing effective early detection strategies [16].

Early onset is a hallmark of most familial cancer syndromes, including hereditary breast and ovarian cancer (BRCA1, BRCA2), familial adenomatous polyposis (APC), hereditary non-polyposis colorectal cancer (MLH1, MSH2, MSH6, PMS2), and familial atypical multiple mole melanoma (FAMMM) syndrome (p16) [17–20]. In contrast, an earlier age of onset is not an obvious hallmark of FPC [21, 22]. How can one possibly inherit a predisposition to a cancer without an obvious acceleration of the phenotype? This question challenges our current understanding of familial cancer syndromes and the canonical two-hit hypothesis [23, 24].

In this study, we first collated the age of onset in FPC and SPC reported in the literature to validate the general notion that the age of onset of familial and sporadic PDAC cases is similar. We then determined the status of known SPC driver genes in our own FPC cohort, a unique resource of sixteen FPC cell lines that we have generated over the past decade. We used an integrated approach including high density SNP microarrays, exomic sequencing, whole genome sequencing, and RNA-sequencing to investigate the genes involved in FPC progression. Finally, having established a consensus for each gene in each sample, we examined the ability of each tool to detect the mutations.

Materials and methods

Case selection

This study was reviewed and approved by the Institutional Review Board at Johns Hopkins Medical Institutions, and informed consent was obtained from all study participants. Familial pancreatic cancer was defined as a pancreatic cancer that arose in a proband with at least one first-degree relative with pancreatic cancer (i.e. a family with two or more affected first-degree relatives). Cancer cell lines were established from familial pancreatic cancers and matched normal DNA from the patients was obtained from Epstein–Barr virus (EBV) transformed lymphoblasts or frozen tissue [25]. Tumor-normal pair matching was confirmed by STR analysis of nine loci and Amelogenin using ABI Profiler kit (Life Technologies, Carlsbad, CA) and size-separated on an ABI CE3130xl instrument (Life Technologies). The data from 94 SPC and 7 FPC (four from discovery, three from prevalence) were previously reported [11]. An unpaired, two-tailed t test of our cohorts was used to determine if the mean age of onset difference between our familial and sporadic cases was statistically significant.

Collation of reported age of onset

Studies reporting age of onset in FPC (excluding hereditary pancreatitis) and SPC were collected from PubMed. When multiple studies employed the same patient registry, only the most recent study was used, on the assumption that previously reported families would be included in subsequent reports; this excludes redundant cases. Studies were stratified based on study type (population or referral) and statistic reported (mean or median).

Preparation of genomic DNA and RNA

Genomic DNA was extracted from early passage cell lines and matched normal EBV-transformed lymphoblasts or frozen normal tissue using QIAamp DNA mini kit (Qiagen, Valencia, CA), per manufacturer’s instruction. RNA was extracted from cell lines using RNeasy mini kit (Qiagen), per manufacturer’s instruction. A HPDE (human pancreatic ductal epithelium) cell line was used as a normal control for RNA-Sequencing [26].

High density SNP microarray

The Omni2.5 array (Illumina, San Diego, CA) was used to analyze cancer cell lines and matched normal samples at 2,379,855 (2.5 M) SNP loci. Analysis was carried out with Genome Studio with the following criteria: an average LogR Ratio (LRR) ≤ −2.0 for homozygous deletions (HDs); LRR of 0–0.53 and B Allele Frequency of 0 or 1 for loss of heterozygosity (LOH); and an average LRR ≥ 1.4, with at least one SNP LRR ≥ 2.0, for amplifications. At least four SNPs must fit criteria for the region to be called an alteration and boundaries were the first and last SNPs that meet criteria. Adjacent deleted or amplified regions (within 100 kb) were considered to be one alteration. Given that half or more of the p16 and SMAD4 inactivations are HDs, we excluded the 4 FPC and 81 SPC cases without SNP microarray data, in the analysis of p16 and SMAD4 genes.
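The calling rules above for homozygous deletions can be sketched in a few lines of Python. This is only an illustration, not the Genome Studio workflow used in the study. It makes a simplifying assumption that each SNP in a run must individually have LRR ≤ −2.0 (the text specifies a regional average), and the names `call_hd_regions`, `Snp`, and `Region`, as well as the example data, are hypothetical.

```python
from typing import List, Tuple

Snp = Tuple[int, float]   # (position in bp, LogR Ratio); hypothetical input layout
Region = Tuple[int, int]  # (position of first SNP, position of last SNP)

HD_LRR_MAX = -2.0         # homozygous deletion threshold stated in the text
MIN_SNPS = 4              # at least four SNPs must fit the criteria
MERGE_GAP_BP = 100_000    # regions within 100 kb are treated as one alteration


def call_hd_regions(snps: List[Snp]) -> List[Region]:
    """Call candidate HD regions from position-sorted SNPs on one chromosome.

    Simplification (an assumption, not the study's method): a run is built
    from consecutive SNPs that each satisfy LRR <= HD_LRR_MAX.
    """
    runs: List[Region] = []
    current: List[int] = []
    for pos, lrr in snps:
        if lrr <= HD_LRR_MAX:
            current.append(pos)
            continue
        # A SNP above the threshold ends the current run
        if len(current) >= MIN_SNPS:
            runs.append((current[0], current[-1]))
        current = []
    if len(current) >= MIN_SNPS:
        runs.append((current[0], current[-1]))

    # Merge neighbouring regions separated by no more than MERGE_GAP_BP
    merged: List[Region] = []
    for start, end in runs:
        if merged and start - merged[-1][1] <= MERGE_GAP_BP:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Made-up example data: two qualifying runs 45 kb apart, then a 2-SNP run
example_snps = [
    (1000, -0.1), (2000, -2.3), (3000, -2.5), (4000, -2.1), (5000, -2.8),
    (6000, -0.2), (50000, -2.2), (51000, -2.4), (52000, -2.6), (53000, -2.0),
    (200000, 0.1), (400000, -2.5), (401000, -2.5), (402000, 0.0),
]
print(call_hd_regions(example_snps))  # [(2000, 53000)]
```

In the example, the two four-SNP runs are merged because they lie within 100 kb of each other, and the final two-SNP run is discarded because it falls short of the four-SNP minimum. Region boundaries are the first and last SNPs that meet the criteria, as in the text.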

Genomic DNA libraries and exomic sequencing

Genomic DNA libraries were prepared using 1 μg of genomic DNA and human exome capture was performed following a modified protocol from Agilent’s SureSelect Paired-End Version 2.0 Human Exome Kit (Agilent, Santa Clara, CA) as previously described [27]. Briefly, captured DNA libraries were sequenced with a GAIIx Genome Analyzer, yielding 150 bp (2 × 75 bp) from the final library fragments, to 200X coverage. Sequencing reads were analyzed and aligned to human genome hg18 with the Eland algorithm in CASAVA 1.7 software (Illumina). The Database of Single Nucleotide Polymorphisms (dbSNP) was used in the analysis of whole-exome sequencing data. Mutations were visually confirmed in the aligned files.

Whole genome sequencing

Sequencing on an Illumina HiSeq 2000 (Illumina) was carried out at 60X coverage for cancers and 30X coverage for matched normal by Personal Genome Diagnostics (Baltimore, MD) using 3 μg of genomic DNA and generating 200 bp (2 × 100 bp paired reads) per fragment. Reads were aligned to human genome (hg19) with Eland v.2 algorithm in CASAVA 1.7 software (Illumina).

cDNA libraries and RNA sequencing

A total of 5 μg of total RNA was depleted of ribosomal RNA using RiboMinus and cDNA libraries were prepared using TruSeq Stranded Total RNA Sample Preparation (Illumina), as per the manufacturer’s instructions. Paired-end sequencing, resulting in 100 bp reads, was carried out on an Illumina HiSeq to a level of 50 M reads. RSEM was used to align the sequences to human genome hg19 [28]. Alterations were visually confirmed using Integrative Genomics Viewer [29].

Results

Previous investigations have noted a similar age of onset of SPC and FPC. To comprehensively examine this, we culled studies reporting FPC and SPC age of onset and published from 1991 to 2013 (n = 15). To avoid overweighting the same families, we used only the most recent study when multiple studies were reported through time from the same institution or consortium. The collated studies have reported mean or median ages of 60–74 for SPC patients and 52–69 for FPC patients (Fig. 1, Supplemental Table 1). Due to potential ascertainment bias, we separated the studies that reported age of onset in a population unselected based upon family history (Fig. 1a) versus those from family registries (Fig. 1b). The mean age of PDAC diagnosis from 1973 to 2000 SEER data is 70 years [30]. In our small cohorts of FPC and SPC, there was no obvious difference in age (FPC cohort: mean 64 years (range: 42–81); SPC cohort: mean 66 years (range: 36–85)) (Table 1, Supplemental Table 2). The lower age of both of the cohorts we analyzed (FPC and SPC) compared to SEER may be attributable to ascertainment or referral bias. There appears to be a greater difference in the referral-based studies, likely because the vast majority of samples in our study underwent surgical resection. We have intentionally omitted statistical comparison of the groups because of the invalidity in comparing means and medians, and the small sample size that would exist without pooling these two statistics.

Fig. 1

Reported age of onset for SPC and FPC, collated from the literature. Literature was separated based on a population-based or b referral cohorts, reported as means (filled symbols) or medians (empty symbols). Symbol sizes are adjusted according to the number of individuals in the study [2*log(n)]. There are no obvious differences in the age of onset for FPC (triangles) compared to SPC (squares)

Table 1 FPC cohort demographics

The molecular progression of SPC is well-documented, with common somatic alterations in the four driver genes KRAS, CDKN2A/p16, TP53, and SMAD4/DPC4, in addition to many other low prevalence genes [11, 12]. In an attempt to identify new FPC predisposition genes, we performed a comprehensive genomic analysis of our 16 FPC cell lines. No strong candidates for predisposition genes were identified in these samples. We also determined the mutational status of the four SPC driver genes in these 16 FPC samples as assessed by each of the four methods.

Overall, the prevalence of alterations in the four SPC driver genes was similar in the 16 FPC PDACs and the 94 SPC PDACs (Fig. 2a, b). Activating KRAS mutations were identified in 16/16 (100 %) of FPC PDACs, predominantly at codon 12 (94 %: 63 % G12D, 19 % G12V, and 13 % G12R) but with one case at codon 61 (6 %, Q61H) (Fig. 2b, Supplemental Table 3). Of the 94 sporadic PDACs, all but one had an activating KRAS mutation (99 %). The majority of KRAS mutations in the SPC PDACs were also at codon 12 (95 %: 50 % G12D, 31 % G12V, and 12 % G12R). Four SPC PDACs had codon 61 mutations (3 Q61H, 1 Q61R), and one SPC PDAC had two different activating KRAS mutations (G12V and G13C).

Fig. 2

Summarized alterations in PDAC molecular progression genes, for SPC and FPC. a PDAC molecular progression model with reported percent alterations of the four driver genes in PanIN lesions, figure modified from Iacobuzio-Donahue, et al. Clin Cancer Res 2012 [56]. b Percent alterations of molecular progression genes in PDAC cancers from SPC and FPC cohorts. As expected, the mutation prevalence in PDACs in panel b are higher than the early PanIN lesions in panel a. *CDKN2A, p = 0.04

CDKN2A/p16 was inactivated in 100 % (12/12; the four cases without SNP microarray data were excluded) of the FPC PDACs, by homozygous deletion (9/12, 75 %) or single base substitution with LOH (3/12, 25 %), compared to only 62 % (8/13; the cases without SNP microarray data were excluded) of the SPC PDACs (p = 0.04, Fig. 2b, Supplemental Table 4). Alterations of the CDKN2A gene are reported to occur in 95 % of SPC PDACs, with epigenetic silencing accounting for about 15 % of this inactivation [31]. As we did not assess epigenetic changes, the fraction of cases with somatically altered CDKN2A reported in Supplemental Table 4 likely underestimates the actual fraction.
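The CDKN2A comparison (12/12 FPC versus 8/13 SPC) can be checked with a short calculation. The text does not name the statistical test behind p = 0.04, so the sketch below assumes a two-sided Fisher exact test on the 2 × 2 table of inactivated versus intact cases. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cohorts (FPC, SPC); columns are (inactivated, intact).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x inactivated cases in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed table
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Counts from the text: CDKN2A inactivated in 12/12 FPC and 8/13 SPC PDACs
p_cdkn2a = fisher_exact_two_sided(12, 0, 8, 5)
print(round(p_cdkn2a, 3))  # 0.039
```

Under this assumption the result rounds to the reported p = 0.04. The same function applied to the SMAD4 counts in the text (9/12 versus 8/13) gives a p-value well above 0.05.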

TP53 was mutated in 88 % (14/16) of FPC PDACs, by single base substitution with LOH (10/14), frameshift with LOH (2/14), or biallelic mutation (2/14) (Fig. 2b, Supplemental Table 5). Of the 94 sporadic PDACs, 82 (87 %) had inactivating TP53 mutations. The mutation types included biallelic mutations (1/82), single base substitutions with LOH (64/82), frameshifts with LOH (15/82), and HDs (2/82).

SMAD4/DPC4 was inactivated in 75 % (9/12, the four cases without SNP microarray data were excluded) of FPC PDACs, by homozygous deletion (5/9), single base substitution with LOH (3/9), and frameshift with LOH (1/9) (Fig. 2b, Supplemental Table 6). Of the 13 sporadic PDACs (the cases without SNP microarray data were excluded), 62 % had inactivated SMAD4. The mutation types included single base substitutions with LOH (2/8), frameshifts with LOH (3/8), and HDs (3/8).

Having established a consensus gene mutation status, we retrospectively determined the ability of each genome-wide tool to detect the mutations. We first categorized the mutations as HDs, point mutations (including single base substitutions, frameshift deletions and insertions), and loss-of-heterozygosity (LOH) events. We then studied the ability of each tool to detect these three types of mutations in the four driver genes (Table 2).

Table 2 Relative power of each method to detect common alterations

Homozygous deletions are common in the tumor suppressors CDKN2A/p16 and SMAD4/DPC4 and were detected reliably by SNP microarray, whole exomic sequencing (WES), whole genome sequencing (WGS), and RNA sequencing (RNA-Seq). In only one case (sample PA222C) did the standard Illumina WGS pipeline for calling copy number alterations miss a p16 homozygous deletion (Fig. 3a, b), although the region was clearly deleted on visual inspection of the WGS data (Fig. 3c). The homozygous deletion included 17 kb of the 5′ end of p16 transcript variant 4 (NM_058195), but did not result in the deletion of any DNA sequence corresponding to transcript variants 1, 3, and 5 (NM_000077, NM_058197, and NM_001195132) (Fig. 3a, b). The latter transcript variants encode the p16(INK4) isoform, a CDK inhibitor, while transcript variant 4 encodes the structurally distinct p14(ARF), which stabilizes TP53 by sequestering MDM2. Both isoforms are normally expressed in the pancreas. Importantly, neither the p14(ARF) nor the p16(INK4) transcript is expressed according to the RNA-Seq data (Fig. 3e), a result of the loss of the first exon of p14(ARF) and the promoter sequence of p16(INK4), respectively. This 97 kb homozygous deletion was identified by WES, RNA-Seq, and high density SNP microarray, and the deletion’s breakpoints were remarkably concordant across these methods (Fig. 3b–f). Because the WGS results were initially discordant, we used multiplex ligation-dependent probe amplification (MLPA) to confirm the homozygous deletion (Fig. 3g). That only one of the four alternative transcripts is included in the homozygous deletion explains why the deletion was missed by WGS using the standard Illumina pipeline. This highlights the importance of the reference transcript used in an NGS mapping algorithm and the potential utility of remapping to known deletions, such as p16 in the case of PDAC, especially at lower read depths.

Fig. 3

CDKN2A/p16 homozygous deletion initially missed by WGS in one case (PA222C). Visualization of WGS reads for the p16 gene region (hg19, chr9:21,951,176-22,102,475) in the PA222C sample using IGV (Broad Institute, version 2.3.31) and Karyostudio (Illumina, version 1.4). The two protein isoforms of p16 (p14(ARF) and p16(INK4), blue) are shown as well as the adjacent genes, C9orf53 and CDKN2B-AS1 (a). There is clear agreement across the methods for the consensus 97 kb homozygous deletion boundaries (b). The homozygous deletion of p16 was not called by the standard Illumina pipeline for WGS data, despite clear confirmation of the 5′ deletion by visual inspection, due to reference transcript choice (c). The homozygous deletion was detected by WES, as evidenced by the lack of reads (d). RNA-Seq produced no high quality reads that mapped to the deleted region, but there were reads upstream (MTAP) and downstream (DMRTA1) of the homozygous deletion (e; line breaks indicate where upstream or downstream reads are shown). High density SNP microarray detected the homozygous deletion (red LogR line drops to −2.00, with scattered B allele frequencies) and the flanking LOH regions (red LogR line at −1.00 and B allele frequencies at 0 or 1) (f). MLPA probes were used to confirm the upstream LOH (black) and homozygous deletion (red) regions (g)

Point mutations and LOH in KRAS, p16, TP53, and SMAD4 were all detected by WES, WGS, and RNA-Seq. High density SNP microarray could of course not detect any of the point mutations in the four driver genes. However, it is likely that a custom SNP microarray could be designed to detect mutations in hotspots in KRAS. Where LOH in p16, TP53, and SMAD4 genes was detected, it was detected equally by all of the methods. In the two cases with biallelic TP53 mutations, there was no evidence of LOH, as expected.

We also investigated the mutation status of genes implicated at a lower frequency in PDAC, but reported to be mutated in cystic precursors, pancreatic neuroendocrine tumors (PanNETs), or implicated as FPC predisposition genes. MLL3 has been reported to be mutated in 9 % of PDACs [11, 32]. Here, MLL3 was mutated in 17 % (2/12) of FPC PDACs. Both cases involved single base substitutions (a nonsense mutation with LOH in PA11X and bi-allelic missense mutations in PA18C). The genes implicated in pancreatic cystic lesions (GNAS, RNF43, CTNNB1, and VHL) were not mutated in any FPC case [33, 34]. ATRX, DAXX, and MEN1 are reported to be mutated in PanNETs, and ATRX was homozygously deleted in 1 (8 %) FPC case (PA102C) [27]. DAXX and MEN1, however, were not mutated in any FPC PDACs, and no clearly deleterious mutations were identified in ATM, STK11, PRSS1, PALB2, BRCA2, or SPINK1.

Discussion

We confirm, through our qualitative analysis of the literature, that most studies do not indicate a large difference in the age of onset between SPC and FPC. While some studies do show a slight difference in the age of onset, part of this difference could be due to ascertainment biases. Some studies have shown a slightly lower age of onset in FPC compared to SPC in their cohorts [35–41]. One study even showed a slightly later onset in its FPC group [42]; however, most studies to date have shown a similar age of onset (Supplemental Table 1) [43–52]. The field would benefit from a rigorous meta-analysis of FPC versus SPC age of onset.

Our study also showed that FPC PDACs harbor the same high prevalence genetic alterations that have been identified in SPC PDACs (Fig. 2b). One purpose of analyzing the prevalence of driver gene mutations is to identify “holes”, genes with lower than expected mutation prevalence, under the hypothesis that a homologue or pathway-related gene could be defective in the germline. A similar approach led to the elegant discovery of germline MYH mutations in familial colorectal cancers that were phenotypically similar to attenuated familial adenomatous polyposis, but lacked germline APC mutations [53]. From our data, there are no such “holes.” This finding confirms and builds upon a previous study that found that familial and sporadic pancreatic cancers had similar prevalence of mutations in the three SPC driver genes they assessed [31].

The late detection of pancreatic cancers contributes to the lethality of the disease. Much work has been done in the area of non-invasive early detection tests, using molecular signatures of pancreatic cancer—notably, the KRAS codon 12/13 mutation hotspot. Because FPC shares the SPC molecular signature mutations, these could be included in early detection tests and the gene panels currently in development could also be used in familial pancreatic kindreds for early detection and molecular relapse.

Assuming that FPC and SPC have a similar age of onset, how can one inherit a predisposition to a disease without accelerating its age of onset? Unfortunately, our study did not provide any great insights into this question and it remains unanswered. We note, however, that there is precedent in PDAC, even when the causative genes are known. For example, in patients with FAMMM syndrome, p16 germline mutations confer a significantly earlier age of onset for melanoma, but not for PDAC [20, 49, 54, 55]. These observations support the idea that it is the pancreatic tissue rather than the gene that is responsible for the curious lack of age dependence on the presence of hereditary predisposition genes. The mechanisms underlying this difference represent an important area for future study as it may shed light on PDAC pathogenesis in general.

We employed an integrative strategy to detect alterations in FPC more comprehensively than previous reports. Combining WES, WGS, and RNA-Seq allowed for greater coverage of gene-coding regions, particularly in expressed genes (Table 2). The importance of gene transcript choice in identifying alterations, such as HDs, in next generation sequencing data was highlighted by the PA222C homozygous deletion of p16, which was initially missed by WGS analysis but was obvious upon visual inspection of the reads (Fig. 3). Other than this one example, the methods were remarkably concordant. The strengths of high density SNP microarrays are identifying LOH and large HDs, both hallmarks of tumor suppressor genes.

We conclude that FPC and SPC undergo similar pathogenesis, permitting the same gene targets to be used for early detection and minimal residual disease testing.