Introduction

It is well established that lung cancer is an aggressive disease, and it remains the leading cause of cancer-related deaths. Worldwide, more than 1.8 million new lung cancer cases are diagnosed annually, with over 1.5 million lung cancer-related deaths [1]. Morphologically, lung cancer is subdivided into two main types: non-small-cell lung cancer (NSCLC), which accounts for approximately 85% of new diagnoses, and small-cell lung cancer, which accounts for the remaining 15% [2]. NSCLC can be further subdivided by morphologic and immunophenotypic methods into squamous cell carcinoma (SCC), adenocarcinoma (ADC), or large-cell lung carcinoma [3]. If all subtypes and clinical stages of lung cancer are combined, only 16% of patients reach the benchmark of 5-year survival, largely because of the late stage (advanced disease progression) at the time of initial diagnosis [4]. In cases detected at an early stage (still localized), the 5-year survival rate increases to approximately 53% [5]. In an effort to improve early disease detection, new National Comprehensive Cancer Network (NCCN) guidelines recommend increasing low-dose computerized tomography (CT) screening. Furthermore, potential new diagnostic assays such as the automated three-dimensional morphologic analysis of epithelial cells in sputum (LuCED lung test) may increase the detection of early stage lung cancers [6, 7].

In addition to early detection, advances in understanding lung tumor biology and genomics are aiding the discovery of new effective treatments. Understanding the mechanisms and pathways that drive oncogenesis has directly led to the discovery of two predictive biomarkers in lung cancer: (1) epidermal growth factor receptor (EGFR) and (2) anaplastic lymphoma kinase (ALK) [8]. For EGFR, multiple clinically significant alterations are known to occur in exons 18–21. Depending on the specific EGFR mutation, a targeted therapy with sensitivity for that mutation can be selected (Fig. 10.1). One technology leading the way in biomarker discovery and identification of driver pathways in lung cancer is next-generation sequencing (NGS). In the clinic, NGS technology is playing an essential role in interrogating large numbers of patients and screening vast portions of the genome in the search for altered genetic pathways and driver alterations in lung cancer. It is hoped that NGS-based techniques, paired with prospective clinical trials, will expand our lung cancer biomarker knowledge and biomarker menu. Currently, only a limited set of biomarkers is routinely utilized in lung cancer clinical screening for targeted therapy selection (EGFR, ALK, and ROS1). In the following sections, we explore NGS with a focus on its clinical utility and benefits, the variety of methodologies available for addressing specific clinical questions, and the barriers to widespread clinical adoption.

Fig. 10.1 Correlation of EGFR mutations and predicted TKI response

Next-Generation Sequencing (NGS) Background

The clinical use of NGS has significant benefits for diagnostic biomarker discovery and clinical screening capacity compared to traditional molecular assays such as single-gene Sanger-based sequencing (also referred to as first-generation sequencing) (Table 10.1). For clarification, the term “next”-generation sequencing or “NGS” refers to sequencing methodologies other than the traditional first-generation Sanger di-deoxy sequencing. NGS broadly encompasses both currently utilized methods referred to as “second-generation sequencing” and newer advancements known as “third-generation sequencing” technologies. While debate over the exact categories for second- and third-generation sequencing exists, in general, second-generation sequencing represents methods that amplify DNA via emulsion PCR (e.g., Ion Torrent) or solid-phase amplification (e.g., Illumina). These methods contrast with third-generation sequencing, which is performed on non-amplified, single molecules (e.g., Pacific Biosciences and Oxford Nanopore). Regardless of classification as “second-” or “third”-generation sequencing, both are encompassed by the term “next-generation sequencing,” in which “next” refers to any non-Sanger-based sequencing methodology.

Table 10.1 Summary of NGS testing benefits and potential barriers to clinical adoption

The advantage of NGS (second and third generation) over Sanger sequencing is the ability to perform massively parallel sequencing. In essence, massively parallel sequencing involves interrogation of numerous samples and numerous alterations with speed and accuracy. Ultimately, this results in higher throughput, which reduces cost per sample. NGS is also highly flexible, with specific applications that can be tailored to the clinical question [9]. The clinical use of NGS has fundamentally improved our understanding of lung cancer biology and has revolutionized clinical molecular diagnostic testing. As diagnostic lung tissue is often limited, the ultralow sample input requirement of NGS allows interrogation of numerous targets from a small amount of material. NGS is also capable of detecting mutations below 15% mutant allele frequency, whereas Sanger sequencing requires 15–25% mutant allele frequency [10].
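The mutant allele frequency comparison above can be made concrete with a small calculation. The following Python snippet is a minimal sketch, not a validated clinical rule: the function names and the read counts are hypothetical, and the only value taken from the text is the 15% lower bound quoted for Sanger sequencing.

```python
# Lower end of the 15-25% mutant allele frequency range quoted for Sanger sequencing.
SANGER_MIN_MAF = 0.15


def mutant_allele_frequency(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must be between 0 and total_reads")
    return mutant_reads / total_reads


def above_sanger_threshold(maf: float) -> bool:
    """True if the frequency reaches the Sanger lower bound quoted in the text."""
    return maf >= SANGER_MIN_MAF


# Hypothetical example: 40 mutant reads out of 1000 reads covering a position.
maf = mutant_allele_frequency(40, 1000)
print(f"MAF = {maf:.1%}, above Sanger threshold: {above_sanger_threshold(maf)}")
```

In this invented example the variant is present in 4% of reads, which is below the range in which Sanger sequencing is expected to detect it but within the range the text describes for NGS.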

Input material for NGS can be either DNA or RNA. Multiple sequencing methods exist, including whole-genome sequencing (WGS, for DNA), whole-exome sequencing (WES, for DNA), whole-transcriptome sequencing (RNA-Seq, for RNA), and targeted sequencing (TS, for either DNA or RNA). Each method (WGS, WES, RNA-Seq, or TS) has specific strengths and weaknesses. In general, DNA-based methods identify small base pair alterations, insertions/deletions, and potential copy number changes. One significant difference between WGS, WES, and TS is the depth of sequencing reads generated per target, which is highest for TS assays because they focus on a selection of targets that typically represents a small fraction of the exome or genome. For instance, in lung cancer, TS-based NGS assays could focus on known genomic alterations in key biomarkers. RNA-based sequencing is utilized for detection of alternatively spliced transcripts, posttranscriptional modifications, gene fusions, mutations/single-nucleotide polymorphisms, small and long noncoding RNAs, or changes in gene expression. These methods are described in more detail below.

NGS Methodology

Whole-Genome Sequencing (WGS)

Currently, WGS is one of the highest-cost NGS methods and is not routinely utilized in clinical screening or monitoring of lung cancer. However, like most technologies, the cost of WGS is declining with improvements in NGS technologies [9]. WGS can detect a wide range of genomic alterations, including known disease-associated and novel variants, a feature that makes this technique well suited for research. Barriers to routine clinical lung cancer screening include the cost, the large volume of data produced, and the expertise and tools necessary for data mining. Data analysis is a significant challenge for WGS, and a streamlined process needs to be developed for this method to fill the gaps in personalized medicine [11]. Clinical strengths of WGS include the ability to determine breakpoints in balanced chromosome translocations and inversions and to detect genomic alterations outside of coding regions [12]. WGS allows full interrogation of promoters, enhancers, introns, noncoding RNAs (e.g., miRNAs), and unannotated regions [13, 14]. This full view of the genomic landscape is well suited for research applications or driver pathway discovery where a comprehensive profile of point mutations, complex rearrangements, indels, and copy number alterations is required [12]. For example, The Cancer Genome Atlas (TCGA) Research Network utilized WGS for lung adenocarcinomas and identified 25 significantly mutated genes, including both known mutations, TP53 (50%), KRAS (27%), EGFR (17%), STK11 (15%), KEAP1 (12%), ATM, NF1 (11%), BRAF (8%), and SMAD4 (3%), and previously unreported mutations, SMARCA4, ARID1A, RBM10, SETD2, PIK3CA, CBL, FBXW7, PPP2R1A, RB1, CTNNB1, U2AF1, KIAA0427, PTEN, BRD3, FGFR3, and GOPC [15]. The trade-off for such complete genomic landscape analysis is low sequencing coverage, and this one feature greatly limits the clinical application of WGS for routine lung cancer screening. WGS coverage varies depending on methodology but on average is below 100-fold, whereas targeted sequencing assays routinely achieve greater than 1000-fold coverage. Fold coverage is directly correlated with the ability to identify mutations present at low allele burden, which is especially problematic in tumors that are not clearly separated from non-tumor stroma (which dilutes the mutant allele burden) [10].
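The relationship between fold coverage, stromal dilution, and detectability can be illustrated with a simple binomial model. The Python snippet below is a minimal sketch under stated assumptions: the variant is heterozygous and clonal in diploid tumor cells (so the expected mutant allele fraction is half the tumor cell fraction), reads sample alleles independently, and a variant is reported only when at least 10 reads support it. The 10% tumor fraction and the 10-read cutoff are hypothetical values chosen for illustration; only the 100-fold and 1000-fold coverage figures come from the text.

```python
from math import comb


def expected_mutant_fraction(tumor_fraction: float) -> float:
    """Expected mutant allele fraction for a heterozygous clonal variant,
    assuming diploid tumor and stromal cells (an illustrative simplification)."""
    return tumor_fraction * 0.5


def prob_at_least(k: int, depth: int, fraction: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, fraction): the chance of seeing
    at least k mutant reads at a position covered by `depth` reads."""
    p_fewer = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


MIN_SUPPORTING_READS = 10              # hypothetical reporting cutoff
vaf = expected_mutant_fraction(0.10)   # 10% tumor cells gives a 5% mutant fraction

for depth in (100, 1000):
    p = prob_at_least(MIN_SUPPORTING_READS, depth, vaf)
    print(f"{depth}-fold coverage: P(>= {MIN_SUPPORTING_READS} mutant reads) = {p:.3f}")
```

Under these assumptions, a specimen with heavy stromal contamination rarely yields enough supporting reads at 100-fold coverage, while at 1000-fold coverage the same variant is almost always sampled often enough to be reported.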

RNA Sequencing (RNA-Seq)

RNA-Seq is a specialized form of NGS that can be utilized to interrogate the lung cancer transcriptome (which represents up to ~4% of the human genome) [16]. Following the central dogma of molecular biology, DNA is transcribed to messenger RNA (mRNA), and mRNA is translated into protein. While the human genome contains approximately 25,000 genes, not all genes will be transcribed and translated into protein. Moreover, not every gene will be transcribed with the same exon arrangement, due to alternative splicing. Therefore, sequencing RNA (specifically mRNA) allows one to address questions including which genes are being expressed and at what level. RNA-Seq can generate a comprehensive profile of the complete transcriptome or be utilized for a more focused targeted sequencing application. As a method, RNA-Seq allows mapping of exon and intron boundaries for identification of splice variants, as well as identification of gene translocations, posttranscriptional modifications, mutations, and noncoding RNAs such as miRNAs [9, 12, 17]. It also offers a highly sensitive assay for quantification of transcript abundance, with sensitivity even higher than that of comparative microarray technology [18]. While RNA-Seq offers several options not available by DNA-based NGS, it has its own inherent challenges, which include library construction (inherently more difficult because RNA is a labile molecule), data mining (a high number of low-abundance transcripts can lead to false-positive calls), and obtaining complete transcript coverage [19].

Whole-Exome Sequencing (WES)

WES is utilized to specifically sequence the coding exons (~2.5% of the human genome), the portion of genes that forms the template for mRNA and subsequent protein production. This methodology specifically ignores noncoding regions such as promoters, enhancers, introns, and noncoding RNAs. Eliminating these regions decreases the number of sequencing targets and thereby allows for improved fold coverage. WES focuses solely on coding exons in annotated genes and therefore only allows variant detection in known coding genes. WES can also be designed to include selected noncoding regions, such as exon-flanking regions and potentially select miRNAs. Similar to WGS, the amount of sequencing data can be extensive for each sample, and the total number of variants detected by WES can be high (in the 20,000–30,000 range) depending on the tumor sample and the NGS methods/bioinformatics utilized. This large number of variants makes detecting actionable activating mutations a challenge. While more focused than WGS, application of WES to lung carcinoma is currently still best suited for research rather than routine clinical practice. Improvements in NGS such as decreased cost, faster analysis time, increased coverage, and improved accuracy could drive increased adoption of WES into routine clinical practice [10].
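The effect of restricting the sequencing target on fold coverage follows from simple arithmetic: mean coverage is the number of bases sequenced divided by the size of the target. The Python snippet below is a rough sketch under stated assumptions. The genome size of about 3.1 billion base pairs is an approximation, the 100-gigabase sequencing yield is a hypothetical value, and the calculation ignores capture efficiency, off-target reads, and duplicate reads, all of which lower real-world exome coverage. Only the ~2.5% exome fraction comes from the text.

```python
GENOME_BP = 3.1e9        # approximate size of the human genome in base pairs
EXOME_FRACTION = 0.025   # ~2.5% of the genome, as stated in the text


def mean_coverage(sequenced_bases: float, target_bp: float) -> float:
    """Idealized mean fold coverage: bases sequenced divided by target size."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return sequenced_bases / target_bp


# Hypothetical sequencing yield applied to either the genome or the exome.
yield_bases = 100e9

wgs_cov = mean_coverage(yield_bases, GENOME_BP)
wes_cov = mean_coverage(yield_bases, EXOME_FRACTION * GENOME_BP)

print(f"WGS: ~{wgs_cov:.0f}-fold, WES: ~{wes_cov:.0f}-fold")
```

With the same idealized yield, restricting the target to 2.5% of the genome raises mean coverage by a factor of 40, which is the reason a smaller target permits deeper sequencing per position.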

Targeted Sequencing (TS)

TS is currently the most clinically utilized NGS assay for lung cancer diagnostic testing. This method focuses specifically on interrogation of known genomic regions of interest. TS limits the sequencing to a small number of targeted regions, ultimately decreasing the amount of sequencing time and data generated, while also making the assay highly cost-effective by increasing the number of samples that can be analyzed simultaneously (multiplexed). Limiting TS to known cancer-relevant alterations makes this assay highly suited for clinical use, which requires detecting known alterations such as point mutations and deletions in EGFR or translocations in ALK or ROS1. However, because it is so highly targeted, this method may miss variants that are present but not located in regions interrogated by the assay. The adoption of TS via NGS into clinical practice for lung cancer has resulted in the availability of a highly sensitive method for detecting actionable alterations in lung cancer specimens [20,21,22]. A recent report showed that NGS-based TS was able to identify EGFR/KRAS/ALK alterations in up to 58% of patients who were called wild type by standard testing, which translated into improved opportunities for therapeutic intervention [23]. Because most NSCLCs are detected only once the tumor is locally advanced and/or inoperable, often only fine needle aspirate (FNA) cytology samples of metastases are available for molecular testing. FNA tumor cell content may be very limited, in which case testing by traditional Sanger sequencing would not be possible. However, TS via NGS can utilize nanogram quantities of DNA, and FNA/cytology samples have been shown to be sufficient for TS NGS analysis [24,25,26].

NGS Translocation Detection

Currently, the list of routinely tested and actionable translocations specific for lung cancer is small and includes ALK, RET, and ROS1. Other kinase gene fusions have been detected by NGS from isolated lung adenocarcinoma DNA and RNA and include MPRIP-NTRK1, AXL-MBIP, SCAF11-PDGFRA, and EZR-ERBB4 [27,28,29]. Regardless of the molecular methodology utilized for detection, accurate identification of translocations can be challenging. In situ hybridization (ISH) is the current gold standard, but immunohistochemistry (IHC) is often performed as it offers a faster and less burdensome screening/detection methodology. However, IHC does not actually identify the translocation; rather, it identifies overexpression of a protein that occurs secondary to the translocation. Therefore, the IHC approach is applicable for ALK, which lacks endogenous expression in the lung, but is not a viable option for identification of RET translocations due to endogenous RET expression [30] and is potentially not useful for ROS1 due to false-positive staining and poor correlation with FISH [31]. Unlike ISH and IHC, NGS can be applied to identify both known and de novo translocations. In addition, NGS allows the simultaneous screening of actionable gene fusions in a single assay with high specificity and low input requirements (sample preservation). The inherent difficulty in identifying translocations via NGS is the high variability of translocation partners and breakpoints along with the low incidence of translocations in lung cancer. While the canonical EML4–ALK fusion consists of EML4 exons 1–13 fused to ALK exons 20–29, over 20 different ALK translocation partners have been identified [32]. NGS is gaining clinical utilization for translocation detection in lung carcinoma due to its comprehensive screening of multiple low-incidence translocations, paired with high sensitivity for detection, rapid assay run time, and lower cost compared to single-assay/single-translocation testing options such as ISH [33]. Ultimately, the goal of utilizing NGS for translocation detection is to rapidly stratify patients to the most appropriate personalized targeted therapy (sunitinib, sorafenib, or vandetanib) [28, 34].

NGS Utilizing Liquid Biopsy

The overarching trend in molecular diagnostics is to do more with less. NGS is perfectly suited for this task, as very little material is required for testing and the methodology is flexible enough to allow full mutation profiling or translocation screening. However, this is only applicable when tissue or cytology samples are available, which is not the case for routine follow-up or disease management. In these cases, minimally invasive blood draws (liquid biopsies) are often performed. Recently, much interest has focused on nucleic acid isolation from liquid biopsies by capturing rare circulating tumor cells (CTCs) or cell-free DNA (CF-DNA). A detailed discussion of the advantages and disadvantages of CTCs vs. CF-DNA is outside the scope of this article; however, a good summary was recently published [35]. Both CTCs and CF-DNA have been successfully applied to capture starting material for clinical NGS testing. CTCs have already shown utility for NGS-based EGFR mutation testing, with one study showing an 84% match in CTC EGFR mutation profile compared to tissue biopsy; in addition, multiple EGFR mutations were identified, demonstrating the possibility of detecting tumor heterogeneity [36]. Likewise, CF-DNA has been successfully utilized for NGS-based lung cancer diagnostic testing for both general mutation screening and focused identification of EGFR mutations that confer acquired tyrosine kinase inhibitor (TKI) resistance [37, 38]. The difficulty with CTC or CF-DNA applications is the very limited amount of DNA and the mixture of normal genomic and tumor nucleic acid. To overcome these challenges, NGS methodologies have been developed such as Tagged Amplicon Deep Sequencing (TAm-Seq), Safe Sequencing System (Safe-SeqS), and Cancer Personalized Profiling by deep sequencing (CAPP-seq), which have demonstrated up to 92% sensitivity and >99.99% specificity for EGFR mutation detection at the variant level [39,40,41,42]. These novel NGS methods improve the sensitivity of standard NGS by performing highly targeted hybrid capture and high-throughput deep sequencing and by utilizing bioinformatic tools to remove artifacts and discover rare mutations and potentially translocations [43].
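One general strategy for removing sequencing artifacts when searching for rare mutations is molecular barcoding: each original DNA molecule receives a unique tag before amplification, and a mutation is counted only when nearly all reads sharing a tag agree. The Python snippet below is a toy sketch of that idea and is not the published algorithm of any method named above. The function name, the barcodes, the minimum family size of 3, and the 95% agreement threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def count_mutant_molecules(reads, mutant_base, min_family_size=3, min_agreement=0.95):
    """Count original molecules supporting a mutation at one position.

    `reads` is an iterable of (barcode, base) pairs. Reads sharing a barcode
    are treated as copies of one molecule (a "family"). A family counts as
    mutant only if at least `min_agreement` of its reads show `mutant_base`.
    Returns (mutant_families, evaluated_families).
    """
    families = defaultdict(list)
    for barcode, base in reads:
        families[barcode].append(base)

    mutant = evaluated = 0
    for bases in families.values():
        if len(bases) < min_family_size:
            continue  # too few copies to judge this molecule
        evaluated += 1
        if bases.count(mutant_base) / len(bases) >= min_agreement:
            mutant += 1
    return mutant, evaluated


# Hypothetical reads at one position (reference base C, candidate mutation T).
reads = (
    [("AAT", "T")] * 4                       # every copy is mutant: likely a real variant
    + [("CGA", "C")] * 3 + [("CGA", "T")]    # one discordant copy: likely an artifact
    + [("GGC", "C")] * 3                     # wild-type molecule
)

print(count_mutant_molecules(reads, "T"))    # (1, 3)
```

In this invented example, 5 of 11 raw reads show the mutant base, but only 1 of 3 tagged molecules is counted as mutant, because the single discordant read in the second family is treated as an amplification or sequencing error.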

Barriers to Adoption of Clinical NGS for Lung Cancer

While NGS has gained widespread use as a research tool, it has only been in the last few years that it has started to gain acceptance and utilization in the highly regulated clinical CAP/CLIA laboratory-based environment. Several barriers exist for widespread clinical adoption including cost, rapid technology change, lack of regulatory guidance, and complex bioinformatic data interpretation challenges (Table 10.1). These items will be discussed in detail below.

Cost of Clinical NGS Testing

Like most new technologies, NGS instrumentation and reagents can represent a high cost burden for labs interested in starting NGS testing. Instrument prices vary from benchtop sequencers under 100,000 US dollars to over 1,000,000 US dollars for high-throughput instrumentation. On top of the instrument capital purchase cost, there is an annual service contract (the price is highly variable). There are also costs for reagents, assay validations, personnel, and data analysis. NGS has high upfront and operating costs relative to other molecular diagnostic equipment such as real-time PCR or Sanger-based assays. Cost per sample or test can be greatly reduced by the high degree of multiplexing that is possible, but lab volume and in-house expertise should be considered before initiating an NGS assay in the clinical setting. An additional variable that should be considered is the amount of reimbursement that will be generated by NGS testing. Current Procedural Terminology (CPT) codes are continually updated, and as of 2017 CPT codes for NGS-based testing exist [44]. However, the rate of successful reimbursement and the amount of reimbursement can be highly variable depending on geographic location and payer. This uncertainty in financial return is a direct barrier to widespread clinical adoption.
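The effect of multiplexing on cost per sample can be shown with a short calculation. The Python snippet below is a minimal sketch in which a fixed per-run cost is shared among the samples on the run and a per-sample preparation cost is added. The function name and all cost values are hypothetical, expressed in arbitrary units, and are not taken from any price list; instrument, service contract, personnel, and analysis costs are deliberately left out.

```python
def cost_per_sample(run_cost: float, samples_per_run: int, prep_cost: float) -> float:
    """Per-sample cost when one run's fixed cost is shared across multiplexed samples."""
    if samples_per_run < 1:
        raise ValueError("samples_per_run must be at least 1")
    return run_cost / samples_per_run + prep_cost


# Hypothetical values in arbitrary cost units.
RUN_COST = 1000.0   # fixed reagent cost of one sequencing run
PREP_COST = 50.0    # library preparation cost for each sample

for n in (1, 8, 24):
    print(f"{n:>2} samples per run: {cost_per_sample(RUN_COST, n, PREP_COST):.2f} per sample")
```

The calculation also shows why lab volume matters: a lab that cannot fill a run regularly either pays the single-sample price or delays results while waiting for enough specimens to batch.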

Guidelines

Although NGS is extensively used for research, its application in clinical practice has not been fully realized, in part due to the lack of formalized validation and testing guidelines. NGS testing, while a promising method for lung cancer screening, is still a relatively new technology, and therefore standards for validation in the CAP/CLIA lab are not well established. In addition, the regulation of laboratory-developed tests (LDTs) in general has been a major unanswered question. The Food and Drug Administration (FDA) issued draft guidance in 2014 outlining new enforcement of testing regulation specifically targeting LDTs [45]. Based on this draft guidance, it was unclear what regulation NGS-based LDT testing would follow. However, the FDA recently released a white paper stating that it would not issue final guidance on the oversight of LDTs [46]. This publication has largely cleared the way for NGS assay validations to move forward and fall under the regulatory guidance of CAP inspections and inclusion in proficiency testing, similar to other high-complexity assays performed in the CAP/CLIA clinical laboratory setting. Moving forward, NGS still represents a unique validation challenge for molecular diagnostic labs. While these labs are well acquainted with running the “wet lab” portion of DNA-/RNA-based assays, the post-run analytical component of NGS is difficult to validate, due in part to the novelty of NGS analytic tools and the novel skills required to handle NGS bioinformatic data. In addition, NGS analysis requires a multistep “pipeline” method for processing data in which small deviations in assay design or post-sequencing analytic processing (filters) can introduce any number of potential downstream errors. Despite these challenges, NGS is still being adopted in academic and private hospitals and has proven to be a profitable entity for commercial companies [47].

Bioinformatics

An in-depth exploration of bioinformatic approaches utilized in clinical NGS analysis is outside the scope of this chapter. However, it is worth mentioning that the large-scale data produced by NGS is a significant obstacle to adoption of clinical NGS assays [48]. To identify variants from NGS data, multiple software packages often need to be stitched together into a data analysis pipeline. Most “pipelines” consist of a sequence aligner (which maps sequencing reads to a reference genome), a variant caller (which identifies variant sites), and a variant annotator (which links variant calls to a database with annotated lists of clinical variants, such as the Catalogue of Somatic Mutations in Cancer (COSMIC)) [49]. Each software component exposes multiple variables that the user can alter, which allows highly customized workflows but at the price of decreased standardization and a reduced ability to perform quality assessments between labs. The performance of different aligners has been extensively studied, and each has pros and cons, making adoption of a uniform analysis pipeline unlikely [50, 51]. One common tool that offers a good introduction to NGS data analysis is the Genome Analysis Toolkit (GATK; Broad Institute, Cambridge, MA, USA) [49]. This toolkit allows multiple standardized forms of NGS analysis and has well-documented instructions for users.
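The three pipeline stages and the effect of user-adjustable settings can be illustrated with a toy example. The Python snippet below is a deliberately simplified sketch and does not represent GATK or any real aligner, variant caller, or annotation database. The reference sequence, the reads, the annotation table, and the parameter names (`max_mismatches`, `min_depth`, `min_vaf`) are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy reference sequence and annotation table, both invented for illustration.
REFERENCE = "ATGGCCTTAAGCGT"
ANNOTATIONS = {(6, "T", "G"): "hypothetical annotated variant"}


def align(read, reference, max_mismatches=1):
    """Stage 1 (aligner): offset with the fewest mismatches, or None if none qualifies."""
    best_offset, best_mismatches = None, max_mismatches + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def call_variants(reads, reference, min_depth=3, min_vaf=0.1):
    """Stage 2 (variant caller): report non-reference bases that pass the filters."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = align(read, reference)
        if offset is None:
            continue  # unaligned reads are dropped
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1

    calls = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        if depth < min_depth:
            continue
        for base, count in sorted(pileup[pos].items()):
            if base != reference[pos] and count / depth >= min_vaf:
                calls.append((pos, reference[pos], base))
    return calls


def annotate(calls, annotations):
    """Stage 3 (annotation): attach a label from the lookup table to each call."""
    return [(call, annotations.get(call, "not in annotation table")) for call in calls]


def run_pipeline(reads, min_vaf):
    return annotate(call_variants(reads, REFERENCE, min_vaf=min_vaf), ANNOTATIONS)


# Four reference reads and one read carrying a T>G change at position 6.
reads = ["CCTTAA"] * 4 + ["CCGTAA"]
print(run_pipeline(reads, min_vaf=0.10))  # variant reported (1 of 5 reads = 20%)
print(run_pipeline(reads, min_vaf=0.25))  # same data, stricter filter: no call
```

The two runs use identical reads and differ only in one filter setting, yet one reports the variant and the other does not. This is the standardization problem in miniature: two labs with different pipeline settings can reach different results from the same specimen.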

In addition to software standardization, a further hurdle with clinical NGS is the sheer volume of data produced. This can cause analytic bottlenecks, even with recent advances in, and lowered costs of, processing power. Another potential problem is long-term data storage in a CAP-/CLIA-approved manner. The types of files to retain and the recommended length of storage for NGS data have not been standardized at this time. Lastly, how labs are reimbursed for complex NGS analysis, and even how labs integrate NGS data into electronic health records, are highly variable, with no set standardization or national guidance [52,53,54].

Conclusion

There is clear evidence that NGS can accurately identify clinically significant biomarkers for lung cancer, such as EGFR mutations and ALK rearrangements, and that this technology can help guide personalized targeted therapies [55]. NGS performed on lung biopsies or cytology specimens can identify both established and emerging biomarkers depending on selected targets for sequencing and analysis [8, 56]. Likewise, NGS performed on CTCs or CF-DNA can be utilized to identify biomarkers for guided therapy or follow patients for monitoring development of tyrosine kinase inhibitor (TKI) resistance (such as EGFR T790M). Ultimately, the significance of applying clinical NGS to lung cancer screening is its ability to simultaneously interrogate numerous biomarkers and rapidly/accurately direct patients to an approved efficacious targeted treatment. Ongoing exploratory research utilizing NGS will undoubtedly translate into discovery and validation of novel predictive biomarkers, which will ultimately translate into clinical NGS practice and improve lung cancer diagnosis and treatment. Furthermore, it is expected that NGS technology will continue to advance at an accelerated rate and that the tangible outcome of this will be the improvement in our understanding of causal genetic mutations/alterations in lung cancer and continued improvement in lung cancer treatment and outcome.