Cardiovascular Event Prediction by Machine Learning: The Multi-Ethnic Study of Atherosclerosis
Rationale: Machine learning may be useful to characterize cardiovascular risk, predict outcomes and identify biomarkers in population studies.
Objective: To test the ability of random survival forests (RF), a machine learning technique, to predict six cardiovascular outcomes compared with standard cardiovascular risk scores.
Methods and Results: We included participants from the Multi-Ethnic Study of Atherosclerosis (MESA), which was designed to study the progression of subclinical disease to cardiovascular events in participants initially free of cardiovascular (CV) disease. Baseline measurements were used to predict cardiovascular outcomes over 12 years of follow-up. All 6814 MESA participants, aged 45 to 84 years, from 4 ethnicities and 6 centers across the United States, were included. A total of 735 variables were obtained from imaging and non-invasive tests, questionnaires and biomarker panels. We used the RF technique to identify the top 20 predictors of each outcome. Imaging, electrocardiography and serum biomarkers featured more heavily on the top-20 lists than traditional CV risk factors. Age was the most important predictor of all-cause mortality. Fasting glucose levels and carotid ultrasonography measures were important predictors of stroke. Coronary artery calcium score was the most important predictor of coronary heart disease and of the combined outcome of all atherosclerotic cardiovascular disease. Left ventricular structure and function, and cardiac troponin-T, were among the top predictors of incident heart failure. Creatinine, age and ankle-brachial index were among the top predictors of atrial fibrillation. Tumor necrosis factor-α and interleukin-2 soluble receptors, and N-terminal pro-brain natriuretic peptide levels, were important across all outcomes. The RF technique predicted outcomes more accurately than established risk scores, decreasing the Brier score by 10% to 25%.
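The Brier score reported above is the mean squared difference between a predicted event probability and the observed outcome, so lower values indicate better accuracy. The following minimal Python sketch illustrates how a relative decrease in Brier score between two models can be computed. It is not the study's analysis code: the function names and all data values are hypothetical, and it assumes a binary outcome at a fixed time horizon with no censoring, a simplification that a survival analysis would generally need to relax.

```python
def brier_score(predicted_risk, event_occurred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Simplifying assumption: every participant's outcome at the horizon is
    known (no censoring). Lower scores mean more accurate predictions.
    """
    if len(predicted_risk) != len(event_occurred):
        raise ValueError("predictions and outcomes must have the same length")
    if not predicted_risk:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risk, event_occurred)]
    return sum(squared_errors) / len(squared_errors)


def relative_reduction(baseline_score, model_score):
    """Fractional decrease of model_score relative to baseline_score."""
    return (baseline_score - model_score) / baseline_score


# Hypothetical toy data, invented for illustration only (not MESA data).
observed = [0, 0, 1, 0, 1, 0, 0, 1]                       # 1 = event occurred
risk_score_pred = [0.2, 0.3, 0.5, 0.1, 0.6, 0.4, 0.2, 0.5]  # stand-in for a risk score
forest_pred = [0.1, 0.2, 0.6, 0.1, 0.7, 0.3, 0.1, 0.6]      # stand-in for a forest model

bs_baseline = brier_score(risk_score_pred, observed)
bs_forest = brier_score(forest_pred, observed)
reduction = relative_reduction(bs_baseline, bs_forest)

print(f"Baseline Brier score: {bs_baseline:.4f}")
print(f"Forest Brier score:   {bs_forest:.4f}")
print(f"Relative reduction:   {reduction:.1%}")
```

The size of the reduction in this toy example carries no meaning; only the calculation is of interest. With follow-up data that include censored participants, a censoring-aware version of the score would be more appropriate than this simple form.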
Conclusions: Machine learning in conjunction with deep phenotyping improves the accuracy of cardiovascular event prediction in an initially asymptomatic population. These methods may lead to greater insights regarding subclinical disease markers without a priori assumptions of causality.
- machine learning
- deep phenotyping
- atrial fibrillation
- random survival forests
- prediction statistics
- population studies
- heart failure
- cardiovascular events
- Received May 9, 2017.
- Revision received July 28, 2017.
- Accepted August 9, 2017.