Circulation Research. 2001;88:1226-1227
doi: 10.1161/hh1201.093165
© 2001 American Heart Association, Inc.


Editorial

Microarrays

Managing the Data Deluge

J. Sunil Rao, Meredith Bond

From the Departments of Biostatistics and Epidemiology (J.S.R.) and Physiology and Biophysics (M.B.), School of Medicine, Case Western Reserve University and the Department of Molecular Cardiology (M.B.), Lerner Research Institute, The Cleveland Clinic Foundation, Cleveland, Ohio.

Correspondence to Meredith Bond, PhD, Department of Molecular Cardiology/NB50, Lerner Research Institute, The Cleveland Clinic Foundation, 9500 Euclid Ave, Cleveland, OH 44195. E-mail bondm@ccf.org


Key Words: microarrays • statistical analysis • cDNA • cardiac remodeling

Over the last 10 to 20 years, the search for mechanisms responsible for cardiac remodeling during cardiac hypertrophy and failure has been hampered by the experimental tools available (primarily Western blot analysis and polymerase chain reaction), because these approaches permit measurement of the expression levels of only a few preselected genes at a time. However, there is increasing evidence that, at the molecular level, the changes that occur during the development of heart failure represent a complex series of interrelated events.1 2 3 Thus, to identify the full scope and complexity of the subcellular changes that take place, and so make more rapid progress in identifying the causes and cures of heart disease, we must depend on emerging high-throughput gene-profiling technologies. These newer approaches permit simultaneous expression screening of very large numbers of genes, followed by clustering of the results into functional gene families.4 5 As stated by Weinstein,6 "We will have to understand our favorite biological molecule in the context of many thousands of others ... a wide net must be cast to be sure that we have, in fact, found the important ones ..." (p. 627).

Both cDNA7 and oligonucleotide arrays8 permit an unbiased assessment (ie, no preselection required) of the expression levels of thousands of full-length genes, cDNAs, or expressed sequence tags. In a relatively short time, high-density cDNA and oligonucleotide arrays have become almost household words in gene expression studies, and both off-the-shelf and customized arrays are increasingly finding their way into the tool chest of heart researchers. Just 1 year ago in Circulation Research, an article10 and accompanying editorial9 heralded the emergence of gene expression profiling with cDNA arrays as a powerful approach to broad-based gene expression studies in heart disease. Several reports have already identified classes, or clusters, of genes whose expression changes during cardiac remodeling, hypertrophy, or failure.10 11 12 13 Collectively, these studies represent a major leap forward in our ability to sort out the different pathways in the heart or in isolated cardiac myocytes in which gene expression, and very likely protein expression, has changed.

With the draft sequence of the human genome now complete, the good news is that the number of genes or gene fragments whose expression can be assessed in a single pass by high-throughput analysis will steadily increase, and programs that can analyze and display the enormous amount of data generated should become more readily available.7 As the choice of microarrays widens and (hopefully) prices fall, more and more investigators will gain access to this technology. However, the study by Liu et al14 in this issue of Circulation Research reminds us that, in our rush to embrace these new technologies, we need to pause and consider several important issues pertinent to data analysis and interpretation (Figure 1). Stage 1 of this gene exploration is relatively straightforward. It depends, first, on the size of an investigator's supply budget for the purchase of cDNA arrays or GeneChip® arrays and, second, on his or her technical skill in consistently producing high-quality labeled RNA or cDNA. Stage 2, data analysis, is more problematic, and there are several issues to consider. The first is simply the size of the data sets: gigabytes of computer storage and very fast computers are now routinely required to store and manipulate gene expression data. In principle, this problem can be overcome with faster computers, large disk storage arrays, fast network interconnects, and modern data backup and archiving systems (albeit at great expense). The second issue is thornier, and it is the one addressed by Liu et al.14 How can we determine which changes in gene expression are statistically significant? How do we set the sensitivity and specificity of the analysis? And, in view of the very large number of genes analyzed, how do we avoid false positives?
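A back-of-the-envelope calculation shows why false positives dominate at this scale. Suppose, purely for illustration, that m = 10,000 genes are each tested at a per-gene level α = 0.05 and that none of them truly changes. The expected number of false positives follows directly from m and α. The probability of at least one false positive additionally assumes independent tests. The last line gives the per-gene threshold that the Bonferroni correction would require to hold that probability at or below 0.05:

```latex
\begin{aligned}
\mathbb{E}[\text{false positives}] &= m\alpha = 10\,000 \times 0.05 = 500,\\
\Pr(\text{at least one false positive}) &= 1 - (1-\alpha)^{m} = 1 - 0.95^{10\,000} \approx 1,\\
\alpha_{\text{Bonferroni}} &= \alpha / m = 0.05 / 10\,000 = 5 \times 10^{-6}.
\end{aligned}
```

With so many tests, even a modest per-gene error rate produces hundreds of spurious calls. A threshold as strict as 5 × 10⁻⁶, on the other hand, risks missing genuine but modest changes.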



Figure 1. Flow chart showing issues to be addressed before cDNA/cRNA from tissue or cells is hybridized to the microarray. The issues can be divided into 4 groupings: (1) availability of computers and statisticians; (2) cost, sample size, replicates, etc; (3) verification of quality of RNA/cDNA; and (4) statistical analysis of results and consideration of FDRs.

Gene screening involves statistical hypothesis testing and, as such, carries built-in type I (false-positive) and type II (false-negative) errors. Two issues are at play here. One is addressed directly by the study of Liu et al,14 and the other only indirectly. The first issue is how replicates increase the accuracy of estimates derived from the data and hence the reliability of statistical hypothesis testing. To investigate this question, Liu et al chose as their test system the changes in gene expression in isolated cardiac myocytes stimulated by insulin-like growth factor-1 (IGF-1). IGF-1 is one of several factors known to trigger changes leading to cardiac hypertrophy, resulting in increased cell size, assembly of sarcomeres, and reexpression of fetal genes. Liu et al cap a rigorous statistical analysis of their cDNA expression data with the identification of several novel genes. This same issue was recently addressed formally for microarray data.15 The study by Liu et al,14 however, uses a more heuristic approach to demonstrate how increasing the number of replicates reduces the false discovery rates (FDRs) of gene expression changes during cardiac remodeling.
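The effect of replication on false discoveries can be illustrated with a small simulation. This is a hypothetical sketch, not the analysis of Liu et al. The function name `simulated_fdr`, the gene counts, the effect size, the noise level, and the simple rule of calling a gene "changed" when its mean log expression ratio exceeds a fixed cutoff are all assumptions chosen for illustration. With n replicates the standard error of a gene's mean shrinks as noise_sd/√n, so fewer unchanged genes cross the cutoff by chance. The empirical FDR (false calls divided by all calls) therefore falls.

```python
import random
import statistics


def simulated_fdr(n_replicates, n_null=900, n_changed=100, effect=1.0,
                  noise_sd=1.0, cutoff=0.5, seed=1):
    """Empirical false discovery rate of a fixed-cutoff rule on simulated data.

    Hypothetical model: unchanged genes have a true mean log ratio of 0,
    changed genes have a true mean of `effect`, and each replicate adds
    Gaussian noise with standard deviation `noise_sd`. A gene is called
    "changed" when |mean log ratio across replicates| > `cutoff`.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    false_calls = true_calls = 0
    for gene in range(n_null + n_changed):
        is_changed = gene >= n_null
        true_mean = effect if is_changed else 0.0
        replicates = [rng.gauss(true_mean, noise_sd) for _ in range(n_replicates)]
        # Averaging n replicates shrinks the noise to noise_sd / sqrt(n).
        if abs(statistics.mean(replicates)) > cutoff:
            if is_changed:
                true_calls += 1
            else:
                false_calls += 1
    total_calls = false_calls + true_calls
    return false_calls / total_calls if total_calls else 0.0


for n in (1, 4, 16):
    print(f"{n:2d} replicates: FDR = {simulated_fdr(n):.2f}")
```

Under these settings the empirical FDR falls as the number of replicates goes from 1 to 4 to 16. A real analysis would use a proper test statistic rather than a bare fold-change cutoff, but the direction of the effect is the same.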

The second issue is the multiplicity of statistical tests conducted. When thousands of genes are tested, the usual per-test error rate (P<0.05) no longer controls the chance of error across the experiment as a whole. Instead, family-wise error rates (the probability of making at least one false-positive call across all of the hypotheses tested) need to be considered, and procedures are needed to ensure that this overall error rate stays below some threshold. However, when thousands of tests are conducted, as in gene screening, this becomes impractical, because the required per-test threshold is so stringent that many genuine changes in expression would go undetected. Therefore, the notion of FDRs16 17 was developed to answer the following question: of all of the hypotheses rejected (ie, significant gene differences found between IGF-1 and control), what proportion are rejected incorrectly? FDRs can be estimated from the data using permutation or bootstrap methods (resampling techniques that are useful when traditional assumptions, such as normality, do not hold) and have been applied successfully to gene screening with microarrays.18 A minor point is that the authors equate low FDRs with high specificity, whereas a low FDR strictly indicates a high positive predictive value, ie, few false positives among the genes declared significant. Assessing sensitivity would require some knowledge of false negatives. Some theoretical work along these lines has been done, but none has yet been extended to the microarray problem.
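One widely used way of controlling the FDR is the step-up procedure of Benjamini and Hochberg (reference 16). The following is a minimal, illustrative implementation that assumes independent tests; reference 17 discusses resampling-based variants for correlated test statistics. The function name and the ten p-values are hypothetical and are not taken from Liu et al.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure, controlling the false discovery rate at level q."""
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Find the largest rank r (1-based) whose p-value satisfies p <= r * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    # Reject every hypothesis ranked at or below k, even if its own
    # p-value missed its individual threshold (the "step-up" rule).
    return sorted(order[:k])


# Hypothetical p-values for ten genes (not real data).
p = [0.041, 0.001, 0.212, 0.039, 0.008, 0.06, 0.205, 0.042, 0.074, 0.216]
print(sum(x < 0.05 for x in p))   # -> 5 genes significant at P<0.05 individually
print(benjamini_hochberg(p))      # -> [1, 4] kept after FDR control at q = 0.05
```

Five of these hypothetical p-values fall below 0.05 individually, but only the two smallest are rejected when the FDR is controlled at 5%.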

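The permutation idea can be sketched as follows. This is a simplified, hypothetical illustration that is only loosely in the spirit of resampling approaches such as reference 18, not a reimplementation of that method. It computes a Welch t statistic for each gene and counts how many genes exceed a cutoff under the true group labels. It then reshuffles the labels repeatedly to estimate how many genes would exceed the cutoff by chance alone. The simulated data, the cutoff, and the function names `welch_t`, `count_calls`, and `permutation_fdr` are all assumptions.

```python
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic for one gene (groups a and b)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def count_calls(data, labels, cutoff):
    """Number of genes (rows of `data`) with |t| above `cutoff`."""
    calls = 0
    for row in data:
        a = [x for x, g in zip(row, labels) if g == 1]
        b = [x for x, g in zip(row, labels) if g == 0]
        if abs(welch_t(a, b)) > cutoff:
            calls += 1
    return calls


def permutation_fdr(data, labels, cutoff, n_perm=50, seed=0):
    """Estimate FDR as (median calls under shuffled labels) / (observed calls)."""
    rng = random.Random(seed)
    observed = count_calls(data, labels, cutoff)
    if observed == 0:
        return 0, 0.0
    shuffled = list(labels)
    null_calls = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and groups
        null_calls.append(count_calls(data, shuffled, cutoff))
    return observed, min(1.0, statistics.median(null_calls) / observed)


# Hypothetical data: 200 genes x 8 samples (4 treated, 4 control);
# the first 20 genes are shifted upward by 4 units in the treated samples.
rng = random.Random(42)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = []
for gene in range(200):
    shift = 4.0 if gene < 20 else 0.0
    data.append([rng.gauss(shift if g == 1 else 0.0, 1.0) for g in labels])

observed, fdr = permutation_fdr(data, labels, cutoff=4.0)
print(observed, round(fdr, 2))
```

Using the median number of calls across shuffled labels as the estimate of chance calls is one common choice; the mean would also serve for a sketch. With only 4 samples per group there are just 70 distinct ways to split 8 samples into two groups of 4, which limits how finely such permutation estimates can resolve small FDRs.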
Acknowledgments

The authors wish to acknowledge Carley Gwin, Director, Gene Expression Core, and Eldon Walker, Director, Research Computing Services, at the Lerner Research Institute, The Cleveland Clinic Foundation.

Footnotes

The opinions expressed in this editorial are not necessarily those of the editors or of the American Heart Association.

References

1. Chien KR. Genomic circuits and the integrative biology of cardiac diseases. Nature. 2000;407:227–232.

2. Houser SR, Lakatta EG. Function of the cardiac myocyte in the conundrum of end-stage, dilated human heart failure. Circulation. 1999;99:600–604.

3. Mann DL. Mechanisms and models in heart failure: a combinatorial approach. Circulation. 1999;100:999–1008.

4. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868.

5. Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N. The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A. 1999;96:2896–2901.

6. Weinstein JN. Fishing expeditions [letter]. Science. 1998;282:627.

7. Eisen MB, Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol. 1999;303:179–205.

8. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet. 1999;21(suppl 1):20–24.

9. Abdellatif M. Leading the way using microarray: a more comprehensive approach for discovery of gene expression patterns. Circ Res. 2000;86:919–920.

10. Stanton LW, Garrard LJ, Damm D, Garrick BL, Lam A, Kapoun AM, Zheng Q, Protter AA, Schreiner GF, White RT. Altered patterns of gene expression in response to myocardial infarction. Circ Res. 2000;86:939–945.

11. Friddle CJ, Koga T, Rubin EM, Bristow J. Expression profiling reveals distinct sets of genes altered during induction and regression of cardiac hypertrophy. Proc Natl Acad Sci U S A. 2000;97:6745–6750.

12. Taylor LA, Carthy CM, Yang D, Saad K, Wong D, Schreiner G, Stanton LW, McManus BM. Host gene regulation during coxsackievirus B3 infection in mice: assessment by microarrays. Circ Res. 2000;87:328–334.

13. Yang J, Moravec CS, Sussman MS, DiPaola NR, Fu D, Hawthorn L, Young JB, Francis GS, McCarthy PM, Bond M. Decreased expression of striated muscle LIM protein-1 (SLIM1) and increased expression of gelsolin in failing human hearts by high density oligonucleotide arrays. Circulation. 2000;102:3046–3052.

14. Liu T-j, Lai H-c, Wu W, Chinn S, Wang PH. Developing a strategy to define the effects of insulin-like growth factor-1 on gene expression profile in cardiomyocytes. Circ Res. 2001;88:1231–1238.

15. Lee M, Kuo F, Whitmore G, Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from a single cDNA array experiment. Proc Natl Acad Sci U S A. 2000;97:9834–9839.

16. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995;57:289–300.

17. Benjamini Y, Yekutieli D. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J Stat Plan Inference. 1999;82:171–196.

18. Tusher V, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98:5116–5121.



