Thursday, October 30, 2014

Modeling & Predicting Coronary Heart Disease with Logistic Regression

Coronary (ischemic) heart disease results from plaque buildup in the arteries that supply blood and oxygen to the heart. The narrowing of these arteries can culminate in a heart attack, and coronary heart disease is one of the leading causes of death in men and women. As a motivator to stay healthy, I believe people could benefit from a quantifiable way of measuring their relative risk of heart disease. The approach here is similar to the Framingham risk score, but the model was built from different datasets. First, I explored the data to gain insight into which features were important for the model. Then I used cross-validation to compare different supervised machine-learning algorithms and build an optimized model.



Heart Disease Between Genders: Age and Cholesterol

IHIS data from 2000-2013 show that men aged 40+ are at a significantly (t-test, p<0.05) increased risk for heart disease compared to women of the same age (see Figures 1 and 2).
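To make the group comparison concrete, here is a minimal sketch of a two-sample test on 0/1 heart disease indicators. It is not the analysis code behind the figures: it uses Welch's variant of the t-test, approximates the p-value with a normal distribution (reasonable only for large samples), and runs on made-up numbers rather than IHIS data. The function name welch_t is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means, without assuming equal variances.
    se = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t = (mean(sample_a) - mean(sample_b)) / se
    # Normal approximation to the t distribution (an assumption of this sketch).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Made-up indicators (1 = reported heart disease), NOT IHIS data.
men = [1] * 30 + [0] * 170
women = [1] * 15 + [0] * 185

t_stat, p_value = welch_t(men, women)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In a real analysis the same comparison would be repeated for each age group, and an exact t distribution (for example from a statistics library) would replace the normal approximation.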

Figure 1


Assuming cholesterol level has a positive relation to risk of heart disease, the increased risk of older men compared to older women does not appear to be the result of higher cholesterol levels in older men (see Figures 3 and 4). Instead, women appear to have a statistically higher (p<0.05) cholesterol level than their male counterparts at 40-42 years of age as well as above 58 years of age. Unless cholesterol has a negative relation with heart disease, it appears that the risk from being an older man is largely independent of cholesterol.

Figure 3



Heart disease risk model

Building on the previous insight that age and gender have a significant combined effect, I modeled the UCI datasets with a logistic regression that predicts an individual’s risk for heart disease (P(heartDisease)) using three highly significant (p<0.001) features:

1. age*gender
2. cholesterol level [cholesterol]
3. maximum heart rate achieved during exercise [max exercise HR]
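As an illustration of how a logistic model turns the three features listed above into a probability, here is a minimal sketch. The coefficient values are hypothetical placeholders, since the fitted values are not given in this post; only their signs follow the directions described here. The coding of gender as 1 = male, 0 = female and the function name p_heart_disease are also assumptions.

```python
import math

# Hypothetical placeholder coefficients; NOT the fitted values from this post.
COEFS = {
    "intercept": -1.0,
    "age_x_gender": 0.03,
    "cholesterol": 0.005,
    "max_exercise_hr": -0.02,
}


def p_heart_disease(age, is_male, cholesterol, max_exercise_hr, coefs=COEFS):
    """Logistic model: P = 1 / (1 + exp(-z)), with z linear in the features."""
    z = (
        coefs["intercept"]
        # Interaction term; assumes gender is coded 1 = male, 0 = female.
        + coefs["age_x_gender"] * age * is_male
        + coefs["cholesterol"] * cholesterol
        + coefs["max_exercise_hr"] * max_exercise_hr
    )
    return 1.0 / (1.0 + math.exp(-z))


man = p_heart_disease(age=42, is_male=1, cholesterol=250, max_exercise_hr=142)
woman = p_heart_disease(age=42, is_male=0, cholesterol=250, max_exercise_hr=142)
print(f"man: {man:.3f}, woman: {woman:.3f}")
```

Changing one input while holding the others fixed, and comparing the two returned probabilities, gives the kind of risk difference a model like this can report.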


Of the models tested, the logistic model (see above) had the highest prediction accuracy (74%), with a precision and recall of 70% and 68%, respectively. More generally, this model predicts that being an older male, having high cholesterol, and achieving a low maximum heart rate during exercise each increase the likelihood of heart disease. As quantitative examples of this heart disease risk modeling, if a 42-year-old man who achieves a maximum of 142 beats per minute (bpm) during exercise reduces his cholesterol level from 250 mg/dL to 240 mg/dL (keeping all other features constant), he will have reduced his risk for heart disease by 1.3%. A woman with exactly the same stats has a 22.4% lower risk of heart disease than this man. And finally, if this woman increases her max heart rate during exercise from 142 bpm to 152 bpm (keeping all other features constant), she will have reduced her risk for heart disease by 3.6%. The following web application illustrates and quantifies this model (screenshot below).
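For readers unfamiliar with the three evaluation metrics quoted above, the sketch below shows how accuracy, precision, and recall are computed from true and predicted labels. The labels are made up for illustration and are unrelated to the UCI data or to the figures reported in this post; the function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels (1 = heart disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the predicted positives, the fraction that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the true positives, the fraction the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels, NOT the UCI data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

Under cross-validation, these metrics would be computed on each held-out fold and then averaged.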


Heart Disease and Menopause

I used 1994 and 1998 IHIS data to determine the relationship between a woman’s risk for heart disease and her menopausal status. As shown in Figures 5 and 6, 40-42 year old women with menopausal symptoms have a significantly (t-test, p<0.05) higher likelihood of heart disease compared to 40-42 year old women who have never experienced menopausal symptoms. Thus it appears that a younger woman’s likelihood of heart disease may be increased if she has experienced menopausal symptoms. Future work can predict a woman’s likelihood of having menopause (if her status is unknown) from other key information such as smoking, diabetes status, age, and other factors. This prediction can then be incorporated into the heart disease model to determine whether it more accurately predicts a woman’s risk of heart disease.


Summary

Heart disease between genders: age and cholesterol
Men aged 40 and up are at an increased risk for heart disease compared to women of the same age. This statistically significant effect does not appear to be the result of increased cholesterol levels.

Heart disease risk model: effects of gender, age, cholesterol, and max heart rate during exercise
I built a model that predicted an individual’s risk for heart disease based on his/her gender and age, cholesterol level, and maximum heart rate attained during exercise. Specifically, the model predicted that being older and male, having high cholesterol, and reaching a low maximum heart rate during exercise increased the likelihood of heart disease.

Heart Disease and Menopause
40-42 year old women with menopausal symptoms have a significantly higher likelihood of heart disease compared to 40-42 year old women who have never experienced menopausal symptoms. Future work can predict a woman’s likelihood of having menopause (if her status is unknown) from other key information to see whether it improves the prediction of a woman’s risk of heart disease.

Data sources

IHIS
UCI Heart Disease Data 1, Data 2