Implications for Cardiovascular Medicine
Dramatic progress has been made in the technologies available to assess global alterations in mRNA levels in both clinical and research samples. Through commercial services and institutional core laboratories, these technologies are increasingly accessible to individual investigators. Although such transcript profiling can provide a powerful research tool, the broad range of options can be bewildering for the uninitiated and more often than not the limitations and pitfalls of this approach are not fully appreciated. Moreover, consensus standards for data collection, analysis, and validation have yet to emerge. It is important to recognize that the goals of transcript profiling experiments can be quite diverse. These goals range from hypothesis generation and identification of novel therapeutic targets to delineation of complex patterns of gene expression that provide a potentially pathognomonic molecular phenotype. We will first present a practical review of the commonly used approaches for data collection and analysis, and discuss possible standards for contextual validation. We will then examine the clinical and scientific applications of DNA microarray technology with an emphasis on important limitations and implications for cardiovascular medicine.
Accurate diagnosis and effective treatment of disease rely on our ability to recognize recurring constellations of clinical signs and symptoms that permit meaningful classification of these diseases. Unfortunately, in many instances, clinical signs and symptoms or even tissue pathology are poor predictors of clinical outcome or response to therapy. Because the genes expressed ultimately determine biological behavior, transcript profiling of diseased tissues, and its correlation with clinical endpoints, may provide insights into disease mechanisms and identify novel candidates for therapeutic intervention. Transcript profiling may also identify markers immediately useful for diagnostic and prognostic purposes. Because thousands of transcripts are simultaneously and quantitatively analyzed on microarrays, this technology provides a resolution and precision not previously possible. Therefore, molecular phenotyping of diseases through transcript profiling may provide diagnostic, prognostic, and mechanistic insights that improve management of human disease. Microarray analysis has also had a dramatic impact in basic science laboratories, providing an efficient way to globally assess the transcriptional effects of specific genetic1,2 or pharmacological3 interventions, thus rapidly identifying possible downstream effectors or interacting pathways. In this context as well, the sheer number of transcripts screened in an unbiased fashion is a great strength of this approach, substantially enhancing the likelihood of discovering biologically important and previously unappreciated connections. However, in both settings, the limitations of this approach and important technical caveats must be kept in mind.
What Is a DNA Microarray?
Matrix arrays containing multiple DNA sequences were first developed early in the Human Genome Project to automate DNA sequencing through sequencing by hybridization. Shortly thereafter, photolithographic techniques4 enabled Affymetrix to build the first high-density DNA array in 1994. Smaller arrays containing ever-increasing numbers of DNA sequences, called DNA microarrays or chips, evolved from these early efforts. DNA microarrays have recently been used primarily to quantify mRNA expression or determine single nucleotide polymorphisms. Current microarrays encode up to ≈20 000 genes (Table 1). Given the estimated ≈30 000 to 70 000 genes in the human genome,5 microarrays encoding all the expressed genes of a particular organism will likely be available in the foreseeable future, thus enabling a truly comprehensive analysis of transcription.
Most microarrays consist of a solid support, usually a glass slide or nylon membrane, onto which DNA sequences are attached. DNA is either spotted, using pins or an ink-jet printer, or synthesized directly on the array using PCR or photolithography. The DNA may be either double-stranded copies of transcripts or shorter single-stranded oligonucleotides. For microarray analysis, RNA is first extracted from a sample. If the RNA yield is small, expressed sequences may be amplified. Although linear amplification of reasonable fidelity is generally possible, it is important to remember that this could introduce artifactual distortion of the original expression pattern. Subsequently, the RNA itself, complementary DNA, or amplified RNA is labeled using fluorescence or radioactivity. The labeled probe is hybridized, competitively or noncompetitively, to the microarray. Complementary sequences remain bound to the array and unbound sequences are washed off. Expressed genes are identified by the position of bound probes on the array (Figure 1).
Microarrays can be made in-house, allowing array sequences to be individualized, but the resources and expertise required can be prohibitive. Recently, commercially synthesized DNA microarrays have become widely available (Table 1). Although such chips forgo the benefits of customization, they have the advantages of lower initial costs and of relieving the investigator of responsibility for microarray manufacturing quality control.
Molecular Phenotyping and Hypothesis Generation: Implications for Study Design and Validation
Although the cost of DNA microarrays has decreased, it remains a practical consideration that sometimes has an impact on study design. Cost has prompted some investigators to pool RNA in lieu of analyzing different samples on separate arrays. Unfortunately, this approach can increase both false-positive and false-negative results, because up to one third of the observed variation between two arrays may be due to the arrays themselves.6 Thus, although samples may be combined to increase RNA yield, this does not obviate the need for multiple, independent experimental observations using separate arrays. Less consensus exists as to the number of replicates necessary, which obviously depends on the samples being analyzed and overall study design. Some would argue that 3 replicates are sufficient for experiments comparing two samples under controlled conditions.7 Far more replicates are likely necessary when comparing two conditions in a clinical setting, especially when studying clinically heterogeneous diseases.
The ability to simultaneously assess mRNA levels for tens of thousands of genes provides a highly detailed expression profile or molecular phenotype that may allow investigators to discriminate between entities indistinguishable by traditional criteria, thus providing diagnostic or prognostic precision. For example, cancers that appear similar by standard criteria may exhibit significantly different patterns of gene expression that predict metastatic potential,8 response to therapy,9 or overall prognosis10 better than standard criteria alone. In this context, the identity of the transcripts or whether they are translated into protein has little bearing on the utility of the pattern. Thus, independent verification of mRNA changes may not be necessary, or feasible, especially when many transcripts are differentially regulated. Alternatively, if a small number of transcripts constitute the pattern of interest, investigators may opt for a more economical technique (eg, QRT-PCR or customized chips) to extend initial observations. As with any clinical test, retrospective associations need to be prospectively validated in independent study populations. Moreover, an understanding of test specificity and sensitivity, as well as the associated positive and negative predictive values, is required before results can be rationally incorporated into management decisions.
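As a brief illustration of the test characteristics mentioned above, the following minimal sketch computes sensitivity, specificity, and predictive values from a 2×2 table. The function name and the patient counts are hypothetical and chosen only for the arithmetic; they are not drawn from any study cited here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives, respectively.
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # unaffected patients correctly cleared
        "ppv": tp / (tp + fp),          # probability of disease given a positive test
        "npv": tn / (tn + fn),          # probability of no disease given a negative test
    }


# Hypothetical validation cohort: 100 patients, 20 of whom have the outcome
# that the expression signature is meant to predict.
metrics = diagnostic_metrics(tp=16, fp=8, fn=4, tn=72)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the outcome is in the population tested, which is one reason a signature derived retrospectively needs to be re-examined in an independent cohort.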
Microarray data can also be used to generate hypotheses about the mechanisms underlying observed phenotypes.11,12 The strength of this approach lies in its ability to uncover unanticipated connections. In this setting, the identity of differentially regulated transcripts is a critical part of the story and should be confirmed through independent approaches (eg, Northern blotting, RNase protection, QRT-PCR). In this way, false positives resulting from pseudogenes, cross-hybridization, or array sequence errors may be avoided13 and splice variants can be detected. Some of these approaches may also provide more accurate quantitation of mRNA than microarrays.2 Hypotheses generated from independently confirmed transcript alterations need to be subjected to rigorous scrutiny. Three obvious but often ignored points must be remembered. First, the correlation between mRNA and protein expression for a particular gene is highly variable. Because proteins are the actual effectors of most cellular processes, mRNA changes unaccompanied by corresponding alterations in protein may not be mechanistically meaningful. Second, the biological activity of many proteins is regulated by subcellular localization and/or posttranslational modifications, neither of which is addressed by transcriptional analysis. Finally, even when corresponding changes in protein expression and activity are demonstrated, this does not establish a biological role for these changes. Additional loss or gain of function experiments will be necessary to establish a mechanistic connection. It may be reasonable to report such in-depth studies separately from the initial microarray results, primarily because of the lengthy delays required to fully explore any such hypotheses. In addition, the initial data may be of interest to other investigators. Early and complete dissemination of this information should maximize the benefits of such studies to the scientific community as a whole.
Obviously, the two most common objectives of microarray experiments—molecular phenotyping and hypothesis generation—are not mutually exclusive. A molecular signature may simultaneously provide useful clinical information and suggest insights into mechanisms underlying the patterns observed. Nevertheless, we suggest that it is useful to clearly define study goals in these terms and to recognize the distinct scientific criteria by which each component should be judged.
Data analysis is critical to the success of microarray projects. The task of analyzing ≥20 000 data points per array, per condition, with multiple replicates can be daunting, and was, until recently, available only to larger institutions and companies through proprietary bioinformatics programs. Fortunately, freely distributed and commercially available software (Table 2) now enables investigators to effectively perform advanced microarray data analysis independently or in conjunction with bioinformatics core facilities. The major components in microarray analysis are normalization, filtering, and computational analysis.7,14,15 The first step in the process, normalization, influences all subsequent data processing and should be carefully considered. Unfortunately, there are no established standards for normalization. Global normalization is most commonly used. Here, the assumption is made that most transcripts are unchanging and therefore the mean intensity signal for each chip will be the same. A normalization/scaling factor is calculated and used to rescale intensities for each transcript. Alternative methods include normalization to an invariant set/feature, normalization using regression techniques, normalization using ratio statistics, or per-gene normalization.14,15 Using multiple normalization methods can be valuable because different approaches may reveal different expression patterns in subsequent analyses. Indeed, the number of genes identified as differentially expressed may vary by as much as 3-fold, depending on which combination of normalization and statistical analysis has been used.15
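The global normalization step described above can be sketched as follows. This is a minimal illustration under the stated assumption that mean intensity should be equal across chips; the function name, the choice of the average chip mean as the target, and the intensity values are all hypothetical and do not reproduce the procedure of any particular software package.

```python
from statistics import mean


def global_normalize(arrays, target=None):
    """Rescale each array so that its mean intensity equals a common target.

    arrays: list of lists, one list of raw intensities per chip.
    target: desired mean intensity; defaults to the average of the chip means.
    """
    array_means = [mean(a) for a in arrays]
    if target is None:
        target = mean(array_means)
    scaled = []
    for intensities, chip_mean in zip(arrays, array_means):
        factor = target / chip_mean  # one scaling factor per chip
        scaled.append([x * factor for x in intensities])
    return scaled


# Hypothetical raw intensities: chip_b was scanned twice as brightly as chip_a.
chip_a = [100.0, 200.0, 300.0, 400.0]
chip_b = [200.0, 400.0, 600.0, 800.0]
norm_a, norm_b = global_normalize([chip_a, chip_b])
print(norm_a)
print(norm_b)
```

Because a single factor is applied to every transcript on a chip, this approach corrects only for overall brightness. It cannot correct intensity-dependent distortions, which is one motivation for the regression-based alternatives.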
Analysis may be limited to “meaningful” data through filtering. Although no consensus standards exist, most investigators filter data using some combination of signal confidence (an index of signal quality), fold change, minimum acceptable signal intensity, and sometimes a statistical cut-off. Filtering out unacceptable data minimizes the computational requirements while maximizing the likelihood of statistical significance (because less correction for multiple testing is required). Filtering also potentially introduces bias, and biologically important transcripts may be excluded from analysis. Confidence in signal specificity is assessed on Affymetrix GeneChips by comparing sample hybridization to perfect-match (PM) and mismatched (MM) sequences. However, recent studies suggest that MM hybridization to target sequence is significant and nonlinear, undermining the utility of this intuitive control,16 and prompting some to suggest that analysis be based solely on the PM signals.17 Until a clear consensus arrives, using a combination of filtering techniques may be advisable.
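A simple filter combining two of the criteria above, minimum signal intensity and fold change, might look like the following sketch. The thresholds (50 intensity units, 2-fold), the function name, and the transcript values are hypothetical and serve only to show how such a filter behaves.

```python
def passes_filter(control, treated, min_intensity=50.0, min_fold=2.0):
    """Keep a transcript only if it is bright enough and changes enough.

    control, treated: normalized intensities in the two conditions.
    """
    high, low = max(control, treated), min(control, treated)
    if high < min_intensity:
        return False  # signal too weak in both conditions
    if low <= 0:
        return False  # fold change undefined for non-positive signals
    return high / low >= min_fold  # symmetric for up- and downregulation


# Hypothetical (control, treated) intensities for four transcripts.
transcripts = {
    "gene_a": (400.0, 1000.0),  # 2.5-fold up
    "gene_b": (20.0, 45.0),     # 2.25-fold up, but weak signal
    "gene_c": (500.0, 550.0),   # 1.1-fold, essentially unchanged
    "gene_d": (900.0, 300.0),   # 3-fold down
}
kept = sorted(
    name for name, (c, t) in transcripts.items() if passes_filter(c, t)
)
print(kept)
```

In this example `gene_b` is discarded despite a greater than 2-fold change because its signal never reaches the intensity threshold, which illustrates how filtering can exclude low-abundance transcripts that may nonetheless be biologically important.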
The final steps in analysis can be broadly divided into clustering algorithms and conventional statistical tests. Clustering algorithms are an excellent means of uncovering latent patterns of gene expression that may have biological implications.11,14 Hierarchical clustering is frequently used to generate nested classes of co-related genes, typically depicted as phylogenetic trees. k-means clustering and self-organizing maps can also be useful in unmasking gene expression patterns. k-means clustering (with k set at 3; upregulated, downregulated, and unchanging) in conjunction with principal component analysis may be particularly useful when comparing two patient groups (eg, control and diseased). However, clustering techniques will generate clusters from any data sets (even if completely unrelated). Unsurprisingly, no consensus exists as to the best approach to testing statistical significance of microarrays. The Student’s t test with the Bonferroni correction is generally perceived as too stringent given the low number of replicates in most microarray experiments. Alternative techniques may be more appropriate, including parametric and nonparametric ANOVA and permutation-based significance analysis of microarrays (SAM).15,18 If the experiment is aimed at describing a molecular phenotype, the more conservative SAM may reduce the chance of type I error. For hypothesis-generating experiments, parametric ANOVA will most likely generate a larger, less stringent data set that can be subjected to independent experimental validation.15
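The tension between multiple-testing correction and small replicate numbers can be made concrete with a short calculation. The sketch below computes a Bonferroni-adjusted threshold for 20 000 transcripts and compares it with the smallest p-value that an exact permutation test can return with 3 replicates per group. This is a plain permutation test on the difference in means, used here only for illustration; it is not the SAM procedure, and the function names and expression values are hypothetical.

```python
from itertools import combinations
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    hits = 0
    n_relabelings = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
        n_relabelings += 1
    return hits / n_relabelings


threshold = bonferroni_threshold(0.05, 20000)

# Hypothetical, perfectly separated expression values (3 replicates each).
control = [1.0, 1.1, 0.9]
diseased = [5.0, 5.2, 4.8]
p_value = permutation_p_value(control, diseased)

# With equal group sizes, only the observed labeling and its mirror image
# are as extreme as the data, so this is the smallest attainable p-value.
min_p = 2 / comb(6, 3)
print(threshold, p_value, min_p)
```

Even for a transcript whose values do not overlap at all between groups, the permutation p-value cannot fall below 2/20 with 3 replicates per group, far above the Bonferroni threshold. A parametric t test is not bounded in this way, but with so few replicates its variance estimates are unstable, which helps explain why the combination is perceived as too stringent.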
Clinical Applications of DNA Microarrays
Growing evidence from small clinical studies suggests valuable insights into disease classification can be obtained from transcript profiling. For example, microarray analysis of histopathologically similar breast tumors identifies expression patterns that persist in individual patients.10 Transcript profiling of histopathologically indistinguishable B-cell lymphomas revealed molecularly distinct disease subtypes that were associated with different prognoses19 and responses to treatment.9 Similarly, expression profiling of renal cell carcinoma enhanced prognostication compared with standard criteria alone.20 Microarray analysis also identified previously unappreciated subtypes of human melanoma and a novel subset of genes involved in their malignant transformation.8 Inhibition of one of these genes, RhoC, reduced metastasis in an animal model.21 Thus, transcript profiling can provide not only useful clinical markers but also mechanistic insights and potential targets for intervention. Cancer has been a logical subject for initial transcript profiling studies of human disease because of its clinical importance, relatively ready access to tissue samples, and many fundamental discoveries emphasizing the importance of genetic derangement in its pathogenesis. Whether similar approaches will prove as fruitful when applied to other diseases remains to be seen.
Implications for Cardiovascular Medicine
Cardiovascular diseases may not involve gross perturbations of cellular transcription to the degree observed in neoplasia, and access to tissue for diagnostic purposes is often problematic. However, alterations in gene expression reflecting primary, secondary, or treatment22 effects are likely to be valuable in the management and mechanistic understanding of cardiovascular disease,23 and a number of basic science and clinical cardiovascular studies suggest the utility of this approach.
Using both human tissue and animal disease models, initial cardiovascular microarray studies identified transcriptional changes associated with cardiac hypertrophy,6,24 myocardial infarction,25 human heart failure,26–29 and primary pulmonary hypertension.30 Many of the early studies involved small numbers and limited validation, and the biological or clinical significance of the observed changes remains largely unaddressed. Recently, there has been a move away from studies of single diseases to experiments aimed at determining common and distinct expression patterns occurring in grossly similar phenotypes arising through different mechanisms.1 Such studies may enable us to identify distinct expression signatures associated with disease subtypes and thus provide a basis for individualized therapy. A recent study examining cardiac mRNA for candidate genes demonstrated that specific transcriptional changes correlated well with the clinical response to β-blockade in patients with dilated cardiomyopathy.22 Thus, expression profiling may ultimately enhance diagnostic precision and help guide or monitor therapy.
In addition to such direct clinical efforts, transcript profiling provides an important tool for laboratory experiments designed to elucidate important effector mechanisms of pharmacological or genetic interventions.1–3 Given the significant obstacles of clinical and genetic heterogeneity, as well as limited tissue access, we are even more optimistic that such carefully controlled laboratory experiments will generate useful information in the immediate future. For example, microarray analysis recently identified the proapoptotic mitochondrial protein, Nix, as upregulated in a transgenic mouse model of hypertrophy.1 Subsequent studies demonstrated an important role of Nix in the transition from hypertrophy to heart failure.31
An important issue not yet adequately addressed is how microarray experiments in cardiovascular medicine are to be standardized, annotated, archived, and distributed such that these rich and under-examined data sets may be mined in the future. Some organizations have begun making cardiac-specific (www.cardiogenomics.org) and general (www.ncbi.nlm.nih.gov/geo) microarray data available through the internet. In addition, cardiovascular tissue access may prove limiting. If initial studies continue to appear promising, we may see a resurgence of clinical interest in endomyocardial biopsies and other techniques providing tissue access.
Basic Questions Remain Unanswered
Although the desire to examine transcript profiles from diseased hearts is understandable, many fundamental questions remain that have an important bearing on such experiments. For example, to what extent is the expression profile genetically “hard-wired” and to what extent does it change moment-to-moment in response to environmental stimuli? Which stimuli are critical determinants of the transcript profile and thus need to be carefully controlled? The answers seem likely to vary for different tissues and sets of transcripts, but will have important implications for “signal-to-noise” considerations, estimating the requisite statistical power for a proposed study, and overall study design. How much does the transcript profile differ from one part of a normal heart to another? To date, little information is available about such geographic differences although such an expression map may be essential to interpretation of results from diseased hearts. How do medications alter the transcript profile? The relatively small size of current cardiovascular microarray studies generally precludes traditional multivariate analysis. Thus, we may need to directly define the effects of important confounders, to provide confidence that clinical studies of transcript patterns in heart failure, for example, are not actually a delineation of the effects of ACE inhibition or β-blockade.22 Ultimately, an understanding of such basic issues will provide a solid foundation for studies designed to define pathognomonic patterns of disease and unappreciated disease subtypes, as well as those attempting to identify important biological mechanisms and targets for intervention.
This work is supported by grants from the NIH (HL-59521 and HL-61557) and the Wellcome Trust (international prize traveling fellowship [SAC]). Dr Rosenzweig is an Established Investigator of the AHA.
Original received July 3, 2002; revision received August 23, 2002; accepted August 23, 2002.
- Aronow BJ, Toyokawa T, Canning A, Haghighi K, Delling U, Kranias E, Molkentin JD, Dorn GW II. Divergent transcriptional responses to independent genetic causes of cardiac hypertrophy. Physiol Genomics. 2001; 6: 19–28.
- Cook SA, Matsui T, Li L, Rosenzweig A. Transcriptional effects of chronic Akt activation in the heart. J Biol Chem. 2002; 277: 22528–22533.
- Liu T-j, Lai H-c, Wu W, Chinn S, Wang PH. Developing a strategy to define the effects of IGF-1 on gene expression profile in cardiomyocytes. Circ Res. 2001; 88: 1231–1238.
- Fodor SP, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D. Light-directed, spatially addressable parallel chemical synthesis. Science. 1991; 251: 767–773.
- Friddle CJ, Koga T, Rubin EM, Bristow J. Expression profiling reveals distinct sets of genes altered during induction and regression of cardiac hypertrophy. Proc Natl Acad Sci U S A. 2000; 97: 6745–6750.
- Lee ML, Kuo FC, Whitmore GA, Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci U S A. 2000; 97: 9834–9839.
- Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Sampas N, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature. 2000; 406: 536–540.
- Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD, Muller-Hermelink HK, Smeland EB, Giltnane JM, Hurt EM, Zhao H, Averett L, Yang L, Wilson WH, Jaffe ES, Simon R, Klausner RD, Powell J, Duffey PL, Longo DL, Greiner TC, Weisenburger DD, Sanger WG, Dave BJ, Lynch JC, Vose J, Armitage JO, Montserrat E, Lopez-Guillermo A, Grogan TM, Miller TP, LeBlanc M, Ott G, Kvaloy S, Delabie J, Holte H, Krajci P, Stokke T, Staudt LM. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002; 346: 1937–1947.
- Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D. Molecular portraits of human breast tumours. Nature. 2000; 406: 747–752.
- Hoffmann R, Seidl T, Dugas M. Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome Biol. 2002; 3: 0033.1–0033.11.
- Chudin E, Walker R, Kosaka A, Wu SX, Rabert D, Chang TK, Kreder DE. Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip arrays. Genome Biol. 2002; 3: 0005.1–0005.10.
- Cheng L, Wong WH. Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol. 2001; 2: 0032.1–0032.11.
- Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001; 98: 5116–5121.
- Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000; 403: 503–511.
- Takahashi M, Rhodes DR, Furge KA, Kanayama H, Kagawa S, Haab BB, Teh BT. Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification. Proc Natl Acad Sci U S A. 2001; 98: 9754–9759.
- Stanton LW, Garrard LJ, Damm D, Garrick BL, Lam A, Kapoun AM, Zheng Q, Protter AA, Schreiner GF, White RT. Altered patterns of gene expression in response to myocardial infarction. Circ Res. 2000; 86: 939–945.
- Yang J, Moravec CS, Sussman MA, DiPaola NR, Fu D, Hawthorn L, Mitchell CA, Young JB, Francis GS, McCarthy PM, Bond M. Decreased SLIM1 expression and increased gelsolin expression in failing human hearts measured by high-density oligonucleotide arrays. Circulation. 2000; 102: 3046–3052.
- Hwang JJ, Allen PD, Tseng GC, Lam CW, Fananapazir L, Dzau VJ, Liew CC. Microarray gene expression profiles in dilated and hypertrophic cardiomyopathic end-stage heart failure. Physiol Genomics. 2002; 10: 31–44.