The potential of clinical cancer genomics
Cancer is, at its heart, a disease of the genome. Individual tumors harbor anywhere from hundreds to hundreds of thousands of point mutations (Lawrence et al. 2013). They can have global ploidy changes or local chromosomal abnormalities that alter as much as 50% of the genome (Zack et al. 2013). They can have dozens of genomic rearrangements of various types (Yang et al. 2013). Large-scale sequencing projects like the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have been sequencing hundreds of tumors of different subtypes to catalog the genomic aberrations that recur in each cancer type (The Cancer Genome Atlas Research Network 2008; Hudson et al. 2010). These studies have led to the discovery of fundamental new properties of cancer genomes, such as mutational signatures (Alexandrov et al. 2013a,b), focal genomic abnormalities like kataegis and chromothripsis (Stephens et al. 2011), robust estimates of the number and distribution of driver genes (Lawrence et al. 2014), and the classification of many tumor types into distinct subtypes (The Cancer Genome Atlas Network 2012a,b, 2015).
However, these discoveries by themselves are not sufficient to affect patient care. Bringing cancer genomics into the clinic requires two separate and generally orthogonal arms. First, genomic profiles need to be mined to identify candidate genes that can be targeted by novel drugs. Drugs that target vulnerabilities present in a tumor but not in normal cells are believed to offer greater specificity and sensitivity. Second, genomic profiles need to be turned into novel biomarkers that can diagnose disease, predict patient survival, predict response to treatment (e.g., companion diagnostics for novel or existing therapies), and monitor disease relapse. This review focuses on the second problem, discussing the barriers that limit the routine use of genomic assays in shaping and improving the care of cancer patients.
Barriers to adoption
In many fields, a very large number of biomarkers have been developed. For example, there are at least 106 separate prognostic biomarkers for localized breast cancer, and these differ significantly in accuracy, error profiles (including sensitivity-specificity trade-offs and biases in errors toward specific clinical or molecular characteristics), number of genes, ease of interpretation, and biological origin (Tofigh et al. 2014). Most of these have not moved beyond the research setting. Which would benefit most from additional validation? Which have the most potential for clinical use? Answering even one of these questions is extremely challenging, and the continual, rapid development of new (and sometimes only modestly improved) biomarkers poses several major challenges.
First, because the field lacks gold-standard data sets for validation, it is difficult or even impossible to assess which markers perform best. Validation cohorts are often insufficiently independent and poorly powered, although there are noteworthy exceptions (Kratz et al. 2012). Challenge-based assessments are just now starting to create such data sets (Margolin et al. 2013; Boutros et al. 2014b), but these remain statistically underpowered and underrepresent the broad diversity of human cancers and genomic technologies. Second, the sheer number of biomarkers being developed in some fields can directly hinder clinical application, both by creating confusion and by fostering an attitude that the field is too dynamic for practical application, so that waiting for future research and technological advances seems the best decision. Third, commercializing a biomarker becomes increasingly difficult as more biomarkers are created in a field, because a crowded field offers low barriers to entry for competitors and limits the ability of any individual biomarker to gain significant market share. Indeed, as more biomarkers are created in a field, the development and advancement of new and improved approaches can be hindered both by intellectual property restrictions from prior art and by the reluctance of funders and commercialization offices to support validation studies. Fourth, the ultimate utility of a biomarker is often unknown without long-term clinical follow-up studies in multiple settings, often prospective ones, which leads to significant development and validation costs.
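To make "poorly powered" concrete, the sketch below uses Schoenfeld's approximation for the number of events (relapses or deaths) that a two-group log-rank comparison needs. It is a minimal illustration assuming proportional hazards, a two-sided test, and a biomarker that splits patients into a high-risk and a low-risk group; the function name `events_needed` and its defaults are hypothetical choices, not taken from the studies cited above.

```python
import math
from statistics import NormalDist


def events_needed(hazard_ratio, alpha=0.05, power=0.80, frac_high_risk=0.5):
    """Approximate number of events a two-group log-rank test needs
    (Schoenfeld's approximation; assumes proportional hazards).

    hazard_ratio   -- hazard ratio the biomarker is expected to separate
    alpha          -- two-sided type I error rate
    power          -- desired statistical power (1 - beta)
    frac_high_risk -- fraction of patients the biomarker calls high-risk
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p = frac_high_risk
    d = (z_alpha + z_beta) ** 2 / (p * (1 - p) * math.log(hazard_ratio) ** 2)
    return math.ceil(d)  # round up to whole events


for hr in (2.0, 1.5, 1.25):
    print(hr, events_needed(hr))
```

With the defaults, separating groups with a hazard ratio of 2 needs about 66 events, while a hazard ratio of 1.5 needs about 191. Because only patients who actually relapse or die contribute events, the required cohort is usually several times larger, and an unbalanced split (e.g., 20% high-risk) raises the requirement further.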
Reproducibility of analyses
Clinical application of genomic techniques requires that the resulting tests be highly accurate and highly reproducible. Reproducibility can be considered in two different ways. First, there is reproducibility of the genomic measurements themselves. Reproducibility of clinical tests is standardized under regulations like CLIA and GLP. Targeted sequencing assays appear to perform very well (Tran et al. 2013), but this result is certainly driven in part by the very high depth of coverage in such assays. Moreover, because most mutation-detection algorithms have not been optimized to distinguish low-frequency events from sequencing errors, these panels can yield false negatives. Further, the error rates for sequencing-based discovery of genomic rearrangements (e.g., translocations or inversions) are much less well understood than those for single-nucleotide variants. As a result, whole-genome studies, which would be necessary to measure complex phenomena like kataegis or chromothripsis, are likely to be less reproducible, especially given their lower coverage. There has been a small amount of research into quality control of genomic studies (Daley and Smith 2013; Chong et al. 2014) and almost none into how quality affects the final prediction of mutations and other genomic phenomena. An elegant study by the ICGC extracted DNA once from each sample of a tumor/normal pair and shipped aliquots to five large international sequencing centers. Each center sequenced and analyzed the same samples using its own protocols, and the final somatic SNV predictions were compared. Only ∼20% of mutations were common to all five centers, while one third were predicted by only a single center (Buchhalter et al. 2014). Clearly, significant work is needed to standardize global analyses of cancer genomes.
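A multi-center comparison like the ICGC study can be summarized by counting how many centers support each distinct call. The following is a minimal sketch assuming each center's somatic SNVs are given as a set of hashable variant keys, here hypothetical `(chrom, pos, ref, alt)` tuples; real comparisons must also normalize variant representations before matching, which this sketch ignores.

```python
from collections import Counter


def call_concordance(calls_by_center):
    """Summarize agreement among somatic SNV call sets from several centers.

    calls_by_center maps a center name to a set of variant keys. Returns
    (fraction of distinct calls made by every center,
     fraction of distinct calls made by exactly one center).
    """
    support = Counter()
    for calls in calls_by_center.values():
        support.update(set(calls))  # count each variant once per center
    total = len(support)
    if total == 0:
        return 0.0, 0.0
    n_centers = len(calls_by_center)
    shared_by_all = sum(1 for n in support.values() if n == n_centers)
    private = sum(1 for n in support.values() if n == 1)
    return shared_by_all / total, private / total


# Hypothetical call sets from three centers
calls = {
    "center_A": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "A", "G")},
    "center_B": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")},
    "center_C": {("chr1", 100, "C", "T"), ("chr4", 400, "T", "C")},
}
print(call_concordance(calls))  # (0.25, 0.5)
```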
Second, there is reproducibility of the analysis, and here too there is significant diversity in how cancer genomic data are processed. Even small differences in the way a data set is preprocessed and analyzed can yield massive differences in the predictions of a final biomarker. Moreover, the more complex the biomarker, the more sensitive it appears to be to processing differences, both in computational methodology (Starmans et al. 2012; Fox et al. 2014) and in sample fixation (Van Allen et al. 2014). However, analysis methods cannot yet be standardized because there is very little consensus in the field about the best methods for different problems. For example, several studies of microarray processing techniques have yielded discordant results (Shedden et al. 2005; Shi et al. 2005, 2006, 2010; Canales et al. 2006; Zhu et al. 2010). To understand the variability in cancer genome analysis using next-generation sequencing data, the ICGC-TCGA DREAM Somatic Mutation Calling (SMC-DNA) Challenge has been launched (Boutros et al. 2014a). This crowd-sourced challenge, along with efforts by the ICGC Pan-Cancer Project and other groups, will start to create consensus in this area over the next decade. In its first results, the SMC-DNA Challenge has shown that even on relatively simple tumors (i.e., 100% tumor cellularity, no subclonality, normal ploidy), most groups made a significant number of errors: Across 119 submissions the median F-score was 0.88 (Ewing et al. 2015).
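The F-score combines precision and recall into a single number. A minimal sketch of the standard F1 definition, assuming predictions and truth are sets of variant keys (the challenge's exact matching and scoring rules may differ), shows how an F-score near 0.88 can arise:

```python
def precision_recall_f1(predicted, truth):
    """Score a set of somatic mutation calls against a truth set.

    Both arguments are sets of variant keys, e.g. (chrom, pos, ref, alt).
    """
    tp = len(predicted & truth)  # true positives: called and real
    fp = len(predicted - truth)  # false positives: called but not real
    fn = len(truth - predicted)  # false negatives: real but missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


truth = set(range(100))                               # 100 hypothetical true mutations
calls = set(range(10, 100)) | set(range(1000, 1015))  # 90 found, 10 missed, 15 spurious
print(precision_recall_f1(calls, truth))
```

Here F1 = 2·90 / (2·90 + 15 + 10) ≈ 0.878, so a score in this range still corresponds to roughly 25 errors for every 100 true mutations.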
However, biomarker reproducibility is challenged not only by the reproducibility of high-throughput assays and their analysis, but also by the inherent biology of a tumor. A series of seminal studies have used high-throughput sequencing to profile the intra-tumoral heterogeneity of kidney (Gerlinger et al. 2012, 2014; Gulati et al. 2014), prostate (Boutros et al. 2015; Cooper et al. 2015; Gundem et al. 2015), breast (Shah et al. 2012; Eirew et al. 2015), lung (de Bruin et al. 2014; Zhang et al. 2014), ovarian (Bashashati et al. 2013; Anglesio et al. 2015), and other tumors. These studies have universally shown that individual tumors are composed of myriad cell types present at different frequencies in different spatial sites. Importantly, some of these studies have demonstrated that existing biomarkers would give distinct predictions if derived from spatially distinct regions of the tumor. While a few studies have made preliminary estimates of the number of biopsy specimens needed to yield robust conclusions in the face of intra-tumoral heterogeneity (Bachtiary et al. 2006), it remains unclear how biomarkers should handle heterogeneity in general. For example, should multiple regions be tested and the prediction of the most adverse clinical outcome (e.g., highest drug resistance or shortest survival) be used? Or the average across multiple regions? Should biomarkers focus on clonal driver mutations and, if so, how should variation in the frequencies of truncal mutations be handled (Shah et al. 2009)? Entirely new computational methods that directly account for intra-tumoral heterogeneity may be needed. Indeed, it has been reported that, for poorly understood reasons, robust biomarkers are fundamentally harder to develop, and accurate predictions harder to make, for some tumors than for others (Tofigh et al. 2014).
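The two aggregation rules raised above can be contrasted in a few lines. This is a hypothetical sketch: the per-region scores and the 0.5 high-risk cut-off are invented for illustration, and higher scores are assumed to mean a worse prognosis.

```python
from statistics import mean


def aggregate_regions(region_scores, rule="max"):
    """Combine per-region biomarker risk scores from one tumor.

    rule="max"  -- report the most adverse region (higher score = worse)
    rule="mean" -- report the average across sampled regions
    """
    if rule == "max":
        return max(region_scores)
    if rule == "mean":
        return mean(region_scores)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical scores from four spatially distinct biopsies of one tumor,
# classified against a hypothetical high-risk cut-off of 0.5.
scores = [0.2, 0.3, 0.35, 0.8]
for rule in ("max", "mean"):
    value = aggregate_regions(scores, rule)
    print(rule, value, "high risk" if value >= 0.5 else "low risk")
```

With these numbers the most-adverse rule calls the tumor high risk while the average calls it low risk, and a single biopsy would give either answer depending on which region happened to be sampled.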
Defining complex phenomena
Some recently uncovered genomic abnormalities are highly complex. For example, several groups have recently shown that genomic instability is a robust biomarker for several tumor types (Vollan et al. 2015). However, there are many potential proxies for genomic instability for use in biomarker studies: the number of copy number aberrations (CNAs), the fraction of the genome altered by CNAs, the number of genes affected by a CNA, and so forth. Other genomic alterations in cancer are so complex that no real definition exists. Chromothripsis, for example, is generally described as a chromosome “shattering” event in which a single chromosome acquires a large number of structural rearrangements of different types at once (Stephens et al. 2011). There is no single definition of chromothripsis, and there are only a few operational methods for identifying it (Lapuk et al. 2012; Govind et al. 2014). Similarly, there is not yet a standard library of mutational signatures or a standard algorithm to call them uniformly across data sets. The same is true for localized hypermutation at the point-mutation level, such as kataegis (Alexandrov et al. 2013a), and for “complex” multichromosomal genomic rearrangements (Berger et al. 2011; Baca et al. 2013). Nevertheless, there is already evidence that global mutation burden can be prognostic in multiple tumor types (Lalonde et al. 2014; Vollan et al. 2015) and that trinucleotide signatures and mutation burden may predict response to immune checkpoint blockade (Rizvi et al. 2015), making reproducible measurement critical. This problem will only be exacerbated as new methods (Ha et al. 2014; Oesper et al. 2014; Roth et al. 2014; Deshwar et al. 2015) and a better understanding of the diversity of cells within a tumor and their evolution (Navin et al. 2011; Wang et al. 2014; Eirew et al. 2015) start creating population-level features that can be used in biomarker analysis.
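The choice among instability proxies is not only conceptual; even a single proxy depends on arbitrary parameters. The sketch below computes two of the proxies named above from copy-number segments, assuming non-overlapping, end-exclusive segments annotated with log2 copy-number ratios and a hypothetical aberration threshold. Changing that threshold changes both numbers.

```python
def instability_proxies(segments, genome_size, threshold=0.2):
    """Compute two common proxies for genomic instability from CNA segments.

    segments    -- list of (chrom, start, end, log2_ratio), non-overlapping
    genome_size -- total number of assessable bases
    threshold   -- hypothetical |log2 ratio| above which a segment is aberrant
    Returns (number of aberrant segments, percent of genome altered).
    """
    aberrant = [s for s in segments if abs(s[3]) > threshold]
    altered_bases = sum(end - start for _, start, end, _ in aberrant)
    return len(aberrant), 100.0 * altered_bases / genome_size


# Hypothetical segments on a hypothetical 10-Mb genome
segs = [
    ("chr1", 0, 1_000_000, 0.5),          # gain
    ("chr1", 1_000_000, 3_000_000, 0.0),  # copy-neutral
    ("chr2", 0, 500_000, -0.7),           # loss
]
print(instability_proxies(segs, 10_000_000))                 # (2, 15.0)
print(instability_proxies(segs, 10_000_000, threshold=0.6))  # (1, 5.0)
```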
The next decade will likely see the rise of biomarkers based on nebulous terms such as “subclone number,” “total genetic diversity,” and “tumor heterogeneity index” that will be challenging to define and reproduce, but that will reflect a key aspect of tumor biology with significant predictive potential.
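To illustrate how such terms can diverge, here is one possible, hypothetical definition of a "tumor heterogeneity index": the Shannon entropy of the subclone proportions estimated for a tumor. Under this definition a tumor with four subclones, one of them dominant, scores lower than a tumor with two equally sized subclones, so "subclone number" and "heterogeneity index" need not agree.

```python
import math


def shannon_heterogeneity(clone_fractions):
    """One possible heterogeneity index: Shannon entropy (natural log) of
    the subclone proportions inferred for a single tumor sample."""
    total = sum(clone_fractions)
    probs = [f / total for f in clone_fractions if f > 0]  # normalize, drop empties
    return -sum(p * math.log(p) for p in probs)


dominated = [0.97, 0.01, 0.01, 0.01]  # four subclones, one dominant
balanced = [0.5, 0.5]                 # two equally sized subclones
print(len(dominated), round(shannon_heterogeneity(dominated), 3))
print(len(balanced), round(shannon_heterogeneity(balanced), 3))
```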
The current round of “compendium” cancer genomic studies thus identifies a large number of interesting features that can potentially serve as biomarkers, but these have not yet been defined well enough to serve as components of clinical diagnostics.
Integrating multiple levels of data
Interrogation of any single type of genomic data may provide limited predictive accuracy: Several groups have tested large numbers of random biomarkers to evaluate the probable upper limit of prediction accuracy (Boutros et al. 2009; Starmans et al. 2011; Venet et al. 2011). In several cases, these limits have been surprisingly low. In a recent study, KRAS mutation status could be predicted from mRNA abundances with at most ∼75% accuracy (Starmans et al. 2015). Thus, while almost all well-validated genomic tests exploit data of a single class (e.g., copy number aberrations or mRNA abundances), it is hypothesized that incorporating multiple types of genomic data will improve biomarker accuracy. For example, the same study of KRAS showed that different prognostic mRNA signatures were optimal in KRAS-mutant and KRAS wild-type lung cancers, highlighting the synergy of combining somatic SNV and tumor mRNA abundance information into a composite biomarker (Starmans et al. 2015).
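Random-biomarker comparisons of this kind can be sketched as an empirical test: score many randomly drawn gene sets of the same size as a candidate signature, then ask how often a random set does at least as well. This is a minimal sketch; `score_fn` stands in for whatever performance measure a study uses (accuracy, concordance index, log-rank statistic), and the toy scoring function below is purely hypothetical.

```python
import random


def random_signature_null(score_fn, all_genes, size, n_draws=999, seed=0):
    """Score n_draws random gene sets of a given size with score_fn."""
    rng = random.Random(seed)  # seeded so the null is reproducible
    return [score_fn(rng.sample(all_genes, size)) for _ in range(n_draws)]


def empirical_p_value(observed, null_scores):
    """Fraction of random signatures scoring at least as well as the
    candidate; the +1 terms keep the estimate away from an impossible p = 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Toy example: 1,000 hypothetical genes, of which the first 50 carry signal.
genes = [f"gene_{i}" for i in range(1000)]
informative = set(genes[:50])


def toy_score(signature):
    # Hypothetical performance measure: fraction of genes that carry signal.
    return len(informative & set(signature)) / len(signature)


candidate = genes[:20]
null = random_signature_null(toy_score, genes, size=20)
print(empirical_p_value(toy_score(candidate), null))
```

In a real analysis each random set would be scored on patient data; the point of the comparison is that a signature is only interesting if it clearly outperforms what random gene sets of the same size achieve.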
There are not yet any examples of biomarkers that predict clinically relevant endpoints based on simultaneous analysis of methylation levels, specific copy number aberrations or point mutations (both germline and somatic), mRNA abundances, and specific splice-isoform presence or absence. The algorithms required to create such complex biomarkers are now in development (Gonzalez-Perez et al. 2013; Creixell et al. 2015), but they are necessarily complex and require harmonized, multimodal data sets with deep clinical information for both training and testing. Such data sets are not yet broadly available, although some groups have sought to mine TCGA data despite its somewhat limited clinical follow-up (Yuan et al. 2014), and the METABRIC consortium has profiled miRNA, mRNA, germline SNPs, and somatic copy number aberrations on a coherent set of samples (Curtis et al. 2012; Dvinge et al. 2013). Standard data sets will urgently be needed so that groups can test methods for creating multimodal signatures. There will also be significant challenges in bringing such markers to clinical use, because clinical specimens, particularly those derived from patient biopsies, may not yield sufficient quantity or quality of analytes for simultaneous measurement of all desired biomolecule types. As a result, algorithms will need to handle entirely missing data types, such as when high-quality DNA-based measurements are available but RNA-based ones are not.
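One simple way to tolerate a missing data type is to combine per-modality scores with weights that are renormalized over whichever modalities were actually measured. The sketch below assumes each modality has already been reduced to a risk score on a common scale; the modality names and weights are hypothetical, and more principled alternatives (imputation, or separate models trained for each pattern of available data) exist.

```python
def combine_modalities(scores, weights):
    """Weighted average of per-modality risk scores that skips missing ones.

    scores  -- dict: modality name -> score, or None if the assay failed
    weights -- dict: modality name -> non-negative weight
    Weights of the available modalities are renormalized to sum to one.
    """
    available = {
        m: s for m, s in scores.items() if s is not None and weights.get(m, 0) > 0
    }
    if not available:
        raise ValueError("no usable modality for this specimen")
    total_weight = sum(weights[m] for m in available)
    return sum(weights[m] * s for m, s in available.items()) / total_weight


weights = {"dna": 0.5, "rna": 0.3, "methylation": 0.2}   # hypothetical weights
biopsy = {"dna": 0.6, "rna": None, "methylation": 0.1}   # RNA assay failed
print(combine_modalities(biopsy, weights))
```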
However, interrogation of multiple levels of data goes beyond different types of -omic data. For example, several groups have shown that there is significant biomarker content present in the stroma surrounding a tumor (Finak et al. 2008; Hoshida et al. 2008). Others have demonstrated synergy between genomic measurements and tumor microenvironmental factors like hypoxia (Lalonde et al. 2014). A major research direction moving forward will be the integration of clinical imaging data with genomic studies both through the emergent field of “radiomics” (Aerts et al. 2014) and by exploiting standard pathology images (Yuan et al. 2012). These data types may be generally available on a large fraction of patients, but again, algorithms will be required that can handle missing data types.
Pharmaco-economics of genomic tests
A genomic biomarker may have good accuracy and reproducibility across a range of independent validation data sets. To reach routine adoption, however, it must also guide clinical decision making in a way that is demonstrably and economically efficient for the funders of a health-care system. That is, one needs to determine whether applying a biomarker to a specific clinical subgroup will be financially efficient. Consider the use of prostate-specific antigen (PSA) as a population-screening tool to diagnose prostate cancer. Although there is some controversy about the statistical modeling, even conservative estimates suggest that >1250 individuals must be screened and >40 treated to save one life (Loeb et al. 2011). Thus, many biomarkers that perform statistically better than random chance may still not benefit the health-care system as a whole. There are many ways of assessing the financial efficiency of a biomarker, although the number of quality-adjusted life years saved per dollar spent (QALY/$) is often used in formal modeling exercises. Only a limited number of pharmaco-economic studies of genomic biomarkers exist to date. Moving forward, pharmaco-economic considerations will likely be built directly into biomarker development: For example, the cost functions in machine-learning exercises can explicitly model the financial benefits or costs of different types of errors or successes.
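As a sketch of how error costs can enter the modeling directly: for a well-calibrated classifier, acting on a patient (e.g., treating) minimizes expected cost when p · cost_FN > (1 − p) · cost_FP, that is, when the predicted probability p exceeds cost_FP / (cost_FP + cost_FN). The probabilities, outcomes, and costs below are hypothetical; costs could be expressed in dollars, QALYs, or any common unit.

```python
def bayes_threshold(cost_fp, cost_fn):
    """Probability cut-off minimizing expected cost for a calibrated
    classifier: act when P(event) > cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(probs, labels, threshold, cost_fp, cost_fn):
    """Total misclassification cost when cases above threshold are called positive."""
    cost = 0.0
    for p, y in zip(probs, labels):
        predicted_positive = p > threshold
        if predicted_positive and not y:
            cost += cost_fp  # e.g., unnecessary treatment
        elif y and not predicted_positive:
            cost += cost_fn  # e.g., missed aggressive disease
    return cost


probs = [0.05, 0.2, 0.7]      # hypothetical predicted probabilities
labels = [0, 1, 0]            # hypothetical observed outcomes
cost_fp, cost_fn = 1.0, 9.0   # assume a miss is 9x as costly as overtreatment
t = bayes_threshold(cost_fp, cost_fn)
print(t, expected_cost(probs, labels, t, cost_fp, cost_fn))
print(0.5, expected_cost(probs, labels, 0.5, cost_fp, cost_fn))
```

Here the cost-aware cut-off of 0.1 incurs a total cost of 1, versus 10 for the conventional 0.5 cut-off, because it no longer misses the one true case.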
Even if a biomarker is demonstrated to be accurate and cost-effective, this does not guarantee its routine use; that requires adoption and interpretation by clinicians and patients. Biomarkers can be developed from large genomic data in several ways. Often a specific drug target is its own biomarker, as with levels of ERBB2 (i.e., HER2) predicting response to Herceptin or presence of BCR-ABL1 predicting sensitivity to Gleevec. In these cases, the same molecule serves as both biomarker and target. However, single-molecule biomarkers are also widely used in clinical contexts other than predicting response to treatment, for example to predict prognosis or monitor disease relapse, as in the routine measurement of serum PSA levels in prostate cancer patients.
Single-gene markers have the immense advantage of simplicity, both in terms of genomic interrogation and in terms of data analysis. However, the biology of a tumor can be extremely complex, especially when considering endpoints like prognosis: No single molecule can fully capture all the determinants of the processes of tumor initiation, progression, or metastasis. Indeed, classically 6–10 distinct molecular or biochemical functions have been identified as associated with these processes (Hanahan and Weinberg 2000, 2011). As a result, modern biomarkers are being developed using statistical and machine-learning techniques, often under the rubric of “data science” or “big-data analysis.” These analytical approaches can either be agnostic to the underlying biology or incorporate domain knowledge such as known pathways (Vaske et al. 2010) or protein complexes (Leiserson et al. 2015), types of information flow between biomolecules, or other types of biological information (Wu et al. 2010).
Whether or not domain knowledge is used, these complex models can use tens to thousands of genes, transcripts, or proteins (Monzon et al. 2009). To better reflect the nonlinearities of biological pathways, these many features are often weighted using mathematical models such as support vector machines, random forests, and network models. Although the closer fit between these models and the underlying biology may increase predictive accuracy, it introduces another challenge: interpretability. Patients and their caregivers need to be able to interpret the results of genomic tests. When these tests involve complex multigene models or sophisticated statistical terminology, communicating the results can be challenging and can limit uptake.
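A linear weighted score is the simplest multigene model, and it shows what interpretability can mean in practice: each gene's contribution can be reported alongside the total. This is a hypothetical sketch with invented gene names and coefficients, assuming expression values are already standardized; nonlinear models such as random forests do not decompose this cleanly, which is one source of the interpretability challenge.

```python
def risk_score(expression, weights):
    """Weighted multigene risk score with per-gene contributions.

    expression -- dict: gene -> standardized (z-scored) abundance for one patient
    weights    -- dict: gene -> model coefficient (positive = higher risk)
    Returns the total score and the contributions, largest effect first.
    """
    contributions = {gene: w * expression[gene] for gene, w in weights.items()}
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return sum(contributions.values()), ranked


weights = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.1}  # hypothetical coefficients
patient = {"GENE_A": 1.5, "GENE_B": 2.0, "GENE_C": -1.0}  # hypothetical z-scores
total, ranked = risk_score(patient, weights)
print(f"risk score = {total:.2f}")
for gene, contribution in ranked:
    print(f"  {gene}: {contribution:+.2f}")
```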
At least four major changes are likely to occur in this area over the next decade. First, new generations of clinicians are much more familiar with and better trained in genomic techniques, which will facilitate interpretation of final models. Second, patients will become more comfortable with genomics and genomic techniques and be more capable of conversing with their clinicians in this area. Third, standardization of genomic approaches across multiple areas of medicine will create more familiarity and consistency. Fourth, ongoing work by many groups in visualization and communication will provide technical solutions.
The path forward
At times it seems inevitable to those doing genomic research that multimodal -omic biomarkers will become prevalent in routine clinical practice over the next 25 years. However, the path to move from current targeted sequencing panels of specific, carefully selected point mutations to genome-wide assays at multiple levels is unclear. It will require significant advances in genomics and computational biology. The seminal paper demonstrating that gene expression can predict outcome in breast cancer was published 13 years ago (van’t Veer et al. 2002), and in the intervening time, few other -omic clinical diagnostics have reached routine clinical practice. In part, this is a function of incomplete clinical annotation of many cohorts with genomic data, particularly with regard to long-term outcomes and response to treatment. This will change as the raw data sets underpinning biomarker discovery and application improve, with more consistent genomic data, better access to and sharing of clinical trial-linked data (as proposed in the next iteration of the ICGC), challenge-based methods assessments, and more frequent assessment of spatial heterogeneity within a tumor. These changes in the raw data will be complemented by improvements in data analysis, particularly in handling heterogeneity, incorporating prior biological knowledge, and in scoring large-scale genomic phenomena. Finally, these improvements in genomics and computational biology will reach their full potential as large numbers of new, targeted therapies continue to be developed, providing the clinical need to drive the development and application of genomic biomarkers for the cancer clinic.