Heart Disease Prediction using Neural Networks

This project focuses on predicting heart disease using neural networks. Based on attributes such as blood pressure, cholesterol levels, heart rate, and other clinical measurements, patients will be classified according to varying degrees of coronary artery disease. The project uses a dataset of 303 patients distributed by the UCI Machine Learning Repository.

Machine learning and artificial intelligence are going to have a dramatic impact on the health field; as a result, familiarizing yourself with the data processing techniques appropriate for numerical health data and the most widely used algorithms for classification tasks is a valuable use of your time. In this tutorial, we will do exactly that.

We will be using some common Python libraries, such as pandas, numpy, and matplotlib. Furthermore, for the machine learning side of this project, we will be using sklearn and keras. Import these libraries using the cell below to ensure you have them correctly installed.

In [118]:
import sys
import pandas as pd
import numpy as np
import sklearn
import matplotlib
import keras
In [2]:
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

1. Importing the Dataset

The dataset is available through the University of California, Irvine Machine Learning Repository. Here is the URL:

http://archive.ics.uci.edu/ml/datasets/Heart+Disease

This dataset contains patient data concerning heart disease diagnosis that was collected at several locations around the world. There are 76 attributes, including age, sex, resting blood pressure, cholesterol levels, echocardiogram data, exercise habits, and many others. To date, all published studies using this data focus on a subset of 14 attributes, so we will do the same. More specifically, we will use the data collected at the Cleveland Clinic Foundation.

To import the necessary data, we will use pandas' built-in read_csv() function.

In [69]:
# import the heart disease dataset
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"

# the names will be the names of each column in our pandas DataFrame
names = ['age',
        'sex',
        'cp',
        'trestbps',
        'chol',
        'fbs',
        'restecg',
        'thalach',
        'exang',
        'oldpeak',
        'slope',
        'ca',
        'thal',
        'class']

# read the csv
cleveland = pd.read_csv(url, names=names)
In [70]:
# print the shape of the DataFrame, so we can see how many examples we have
print('Shape of DataFrame: {}'.format(cleveland.shape))
print(cleveland.loc[1])
Shape of DataFrame: (303, 14)
age          67
sex           1
cp            4
trestbps    160
chol        286
fbs           0
restecg       2
thalach     108
exang         1
oldpeak     1.5
slope         2
ca          3.0
thal        3.0
class         2
Name: 1, dtype: object
In [71]:
# print the last twenty or so data points
cleveland.loc[280:]
Out[71]:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal class
280 57.0 1.0 4.0 110.0 335.0 0.0 0.0 143.0 1.0 3.0 2.0 1.0 7.0 2
281 47.0 1.0 3.0 130.0 253.0 0.0 0.0 179.0 0.0 0.0 1.0 0.0 3.0 0
282 55.0 0.0 4.0 128.0 205.0 0.0 1.0 130.0 1.0 2.0 2.0 1.0 7.0 3
283 35.0 1.0 2.0 122.0 192.0 0.0 0.0 174.0 0.0 0.0 1.0 0.0 3.0 0
284 61.0 1.0 4.0 148.0 203.0 0.0 0.0 161.0 0.0 0.0 1.0 1.0 7.0 2
285 58.0 1.0 4.0 114.0 318.0 0.0 1.0 140.0 0.0 4.4 3.0 3.0 6.0 4
286 58.0 0.0 4.0 170.0 225.0 1.0 2.0 146.0 1.0 2.8 2.0 2.0 6.0 2
287 58.0 1.0 2.0 125.0 220.0 0.0 0.0 144.0 0.0 0.4 2.0 ? 7.0 0
288 56.0 1.0 2.0 130.0 221.0 0.0 2.0 163.0 0.0 0.0 1.0 0.0 7.0 0
289 56.0 1.0 2.0 120.0 240.0 0.0 0.0 169.0 0.0 0.0 3.0 0.0 3.0 0
290 67.0 1.0 3.0 152.0 212.0 0.0 2.0 150.0 0.0 0.8 2.0 0.0 7.0 1
291 55.0 0.0 2.0 132.0 342.0 0.0 0.0 166.0 0.0 1.2 1.0 0.0 3.0 0
292 44.0 1.0 4.0 120.0 169.0 0.0 0.0 144.0 1.0 2.8 3.0 0.0 6.0 2
293 63.0 1.0 4.0 140.0 187.0 0.0 2.0 144.0 1.0 4.0 1.0 2.0 7.0 2
294 63.0 0.0 4.0 124.0 197.0 0.0 0.0 136.0 1.0 0.0 2.0 0.0 3.0 1
295 41.0 1.0 2.0 120.0 157.0 0.0 0.0 182.0 0.0 0.0 1.0 0.0 3.0 0
296 59.0 1.0 4.0 164.0 176.0 1.0 2.0 90.0 0.0 1.0 2.0 2.0 6.0 3
297 57.0 0.0 4.0 140.0 241.0 0.0 0.0 123.0 1.0 0.2 2.0 0.0 7.0 1
298 45.0 1.0 1.0 110.0 264.0 0.0 0.0 132.0 0.0 1.2 2.0 0.0 7.0 1
299 68.0 1.0 4.0 144.0 193.0 1.0 0.0 141.0 0.0 3.4 2.0 2.0 7.0 2
300 57.0 1.0 4.0 130.0 131.0 0.0 0.0 115.0 1.0 1.2 2.0 1.0 7.0 3
301 57.0 0.0 2.0 130.0 236.0 0.0 2.0 174.0 0.0 0.0 2.0 1.0 3.0 1
302 38.0 1.0 3.0 138.0 175.0 0.0 0.0 173.0 0.0 0.0 1.0 ? 3.0 0
In [72]:
# mask missing data (indicated with a "?"); masked entries become NaN
data = cleveland[~cleveland.isin(['?'])]
data.loc[280:]
Out[72]:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal class
280 57.0 1.0 4.0 110.0 335.0 0.0 0.0 143.0 1.0 3.0 2.0 1.0 7.0 2
281 47.0 1.0 3.0 130.0 253.0 0.0 0.0 179.0 0.0 0.0 1.0 0.0 3.0 0
282 55.0 0.0 4.0 128.0 205.0 0.0 1.0 130.0 1.0 2.0 2.0 1.0 7.0 3
283 35.0 1.0 2.0 122.0 192.0 0.0 0.0 174.0 0.0 0.0 1.0 0.0 3.0 0
284 61.0 1.0 4.0 148.0 203.0 0.0 0.0 161.0 0.0 0.0 1.0 1.0 7.0 2
285 58.0 1.0 4.0 114.0 318.0 0.0 1.0 140.0 0.0 4.4 3.0 3.0 6.0 4
286 58.0 0.0 4.0 170.0 225.0 1.0 2.0 146.0 1.0 2.8 2.0 2.0 6.0 2
287 58.0 1.0 2.0 125.0 220.0 0.0 0.0 144.0 0.0 0.4 2.0 NaN 7.0 0
288 56.0 1.0 2.0 130.0 221.0 0.0 2.0 163.0 0.0 0.0 1.0 0.0 7.0 0
289 56.0 1.0 2.0 120.0 240.0 0.0 0.0 169.0 0.0 0.0 3.0 0.0 3.0 0
290 67.0 1.0 3.0 152.0 212.0 0.0 2.0 150.0 0.0 0.8 2.0 0.0 7.0 1
291 55.0 0.0 2.0 132.0 342.0 0.0 0.0 166.0 0.0 1.2 1.0 0.0 3.0 0
292 44.0 1.0 4.0 120.0 169.0 0.0 0.0 144.0 1.0 2.8 3.0 0.0 6.0 2
293 63.0 1.0 4.0 140.0 187.0 0.0 2.0 144.0 1.0 4.0 1.0 2.0 7.0 2
294 63.0 0.0 4.0 124.0 197.0 0.0 0.0 136.0 1.0 0.0 2.0 0.0 3.0 1
295 41.0 1.0 2.0 120.0 157.0 0.0 0.0 182.0 0.0 0.0 1.0 0.0 3.0 0
296 59.0 1.0 4.0 164.0 176.0 1.0 2.0 90.0 0.0 1.0 2.0 2.0 6.0 3
297 57.0 0.0 4.0 140.0 241.0 0.0 0.0 123.0 1.0 0.2 2.0 0.0 7.0 1
298 45.0 1.0 1.0 110.0 264.0 0.0 0.0 132.0 0.0 1.2 2.0 0.0 7.0 1
299 68.0 1.0 4.0 144.0 193.0 1.0 0.0 141.0 0.0 3.4 2.0 2.0 7.0 2
300 57.0 1.0 4.0 130.0 131.0 0.0 0.0 115.0 1.0 1.2 2.0 1.0 7.0 3
301 57.0 0.0 2.0 130.0 236.0 0.0 2.0 174.0 0.0 0.0 2.0 1.0 3.0 1
302 38.0 1.0 3.0 138.0 175.0 0.0 0.0 173.0 0.0 0.0 1.0 NaN 3.0 0
In [73]:
# drop rows with NaN values from DataFrame
data = data.dropna(axis=0)
data.loc[280:]
Out[73]:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal class
280 57.0 1.0 4.0 110.0 335.0 0.0 0.0 143.0 1.0 3.0 2.0 1.0 7.0 2
281 47.0 1.0 3.0 130.0 253.0 0.0 0.0 179.0 0.0 0.0 1.0 0.0 3.0 0
282 55.0 0.0 4.0 128.0 205.0 0.0 1.0 130.0 1.0 2.0 2.0 1.0 7.0 3
283 35.0 1.0 2.0 122.0 192.0 0.0 0.0 174.0 0.0 0.0 1.0 0.0 3.0 0
284 61.0 1.0 4.0 148.0 203.0 0.0 0.0 161.0 0.0 0.0 1.0 1.0 7.0 2
285 58.0 1.0 4.0 114.0 318.0 0.0 1.0 140.0 0.0 4.4 3.0 3.0 6.0 4
286 58.0 0.0 4.0 170.0 225.0 1.0 2.0 146.0 1.0 2.8 2.0 2.0 6.0 2
288 56.0 1.0 2.0 130.0 221.0 0.0 2.0 163.0 0.0 0.0 1.0 0.0 7.0 0
289 56.0 1.0 2.0 120.0 240.0 0.0 0.0 169.0 0.0 0.0 3.0 0.0 3.0 0
290 67.0 1.0 3.0 152.0 212.0 0.0 2.0 150.0 0.0 0.8 2.0 0.0 7.0 1
291 55.0 0.0 2.0 132.0 342.0 0.0 0.0 166.0 0.0 1.2 1.0 0.0 3.0 0
292 44.0 1.0 4.0 120.0 169.0 0.0 0.0 144.0 1.0 2.8 3.0 0.0 6.0 2
293 63.0 1.0 4.0 140.0 187.0 0.0 2.0 144.0 1.0 4.0 1.0 2.0 7.0 2
294 63.0 0.0 4.0 124.0 197.0 0.0 0.0 136.0 1.0 0.0 2.0 0.0 3.0 1
295 41.0 1.0 2.0 120.0 157.0 0.0 0.0 182.0 0.0 0.0 1.0 0.0 3.0 0
296 59.0 1.0 4.0 164.0 176.0 1.0 2.0 90.0 0.0 1.0 2.0 2.0 6.0 3
297 57.0 0.0 4.0 140.0 241.0 0.0 0.0 123.0 1.0 0.2 2.0 0.0 7.0 1
298 45.0 1.0 1.0 110.0 264.0 0.0 0.0 132.0 0.0 1.2 2.0 0.0 7.0 1
299 68.0 1.0 4.0 144.0 193.0 1.0 0.0 141.0 0.0 3.4 2.0 2.0 7.0 2
300 57.0 1.0 4.0 130.0 131.0 0.0 0.0 115.0 1.0 1.2 2.0 1.0 7.0 3
301 57.0 0.0 2.0 130.0 236.0 0.0 2.0 174.0 0.0 0.0 2.0 1.0 3.0 1
In [74]:
# print the shape and data types of the DataFrame
print(data.shape)
print(data.dtypes)
(297, 14)
age         float64
sex         float64
cp          float64
trestbps    float64
chol        float64
fbs         float64
restecg     float64
thalach     float64
exang       float64
oldpeak     float64
slope       float64
ca           object
thal         object
class         int64
dtype: object
In [75]:
# transform data to numeric to enable further analysis
data = data.apply(pd.to_numeric)
data.dtypes
Out[75]:
age         float64
sex         float64
cp          float64
trestbps    float64
chol        float64
fbs         float64
restecg     float64
thalach     float64
exang       float64
oldpeak     float64
slope       float64
ca          float64
thal        float64
class         int64
dtype: object
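The three cleaning steps above (masking "?", dropping NaN rows, converting to numeric) can also be folded into the read itself. The following is a minimal sketch of that alternative, not the approach this notebook takes: it assumes the `na_values` parameter of `read_csv()` and, to stay self-contained, parses three inline rows copied from the table above (indices 286 to 288) instead of the remote file.

```python
import io
import pandas as pd

names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'class']

# three rows in the same comma-separated layout as processed.cleveland.data;
# the middle row has a missing 'ca' value marked with "?"
sample = io.StringIO(
    "58.0,0.0,4.0,170.0,225.0,1.0,2.0,146.0,1.0,2.8,2.0,2.0,6.0,2\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
)

# na_values='?' tells pandas to parse "?" as NaN, so 'ca' is numeric from the start
raw = pd.read_csv(sample, names=names, na_values='?')

# dropping incomplete rows is then the only cleaning step left
clean = raw.dropna(axis=0)

print(raw.shape, clean.shape)
print(clean.dtypes['ca'])
```

With this variant the `ca` and `thal` columns never pass through the `object` dtype, so the separate `pd.to_numeric` step would not be needed.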
In [76]:
# print data characteristics, using pandas' built-in describe() function
data.describe()
Out[76]:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal class
count 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000 297.000000
mean 54.542088 0.676768 3.158249 131.693603 247.350168 0.144781 0.996633 149.599327 0.326599 1.055556 1.602694 0.676768 4.730640 0.946128
std 9.049736 0.468500 0.964859 17.762806 51.997583 0.352474 0.994914 22.941562 0.469761 1.166123 0.618187 0.938965 1.938629 1.234551
min 29.000000 0.000000 1.000000 94.000000 126.000000 0.000000 0.000000 71.000000 0.000000 0.000000 1.000000 0.000000 3.000000 0.000000
25% 48.000000 0.000000 3.000000 120.000000 211.000000 0.000000 0.000000 133.000000 0.000000 0.000000 1.000000 0.000000 3.000000 0.000000
50% 56.000000 1.000000 3.000000 130.000000 243.000000 0.000000 1.000000 153.000000 0.000000 0.800000 2.000000 0.000000 3.000000 0.000000
75% 61.000000 1.000000 4.000000 140.000000 276.000000 0.000000 2.000000 166.000000 1.000000 1.600000 2.000000 1.000000 7.000000 2.000000
max 77.000000 1.000000 4.000000 200.000000 564.000000 1.000000 2.000000 202.000000 1.000000 6.200000 3.000000 3.000000 7.000000 4.000000
In [11]:
# plot histograms for each variable
data.hist(figsize = (12, 12))
plt.show()

2. Create Training and Testing Datasets

We will use sklearn's train_test_split() to split the data into training and testing datasets.

The class values in this dataset contain multiple types of heart disease with values ranging from 0 (healthy) to 4 (severe heart disease). We will need to convert our class data to categorical labels. For example, the label 2 will become [0, 0, 1, 0, 0].

In [77]:
# create X and Y datasets for training
from sklearn import model_selection

X = np.array(data.drop(['class'], axis=1))
y = np.array(data['class'])

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size = 0.2)
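To make the split sizes concrete, here is a rough NumPy-only sketch of a shuffled 80/20 split over the 297 cleaned rows. It is an illustration rather than sklearn's actual implementation; the rounding rule (test share rounded up, remainder used for training) and the fixed seed are assumptions made for this example.

```python
import math
import numpy as np

n_samples = 297      # rows left after dropping missing values
test_size = 0.2      # same fraction as in the cell above

# assumption: the test share is rounded up and training gets the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# shuffle the row indices, then slice them into the two sets
rng = np.random.default_rng(0)   # hypothetical fixed seed for repeatability
indices = rng.permutation(n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(n_train, n_test)
```

The training size of 237 matches the number of samples reported in the training logs later on. Because the split is random, results vary from run to run unless a seed is fixed; train_test_split() accepts a `random_state` argument for that purpose.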
In [78]:
# convert the data to categorical labels
from keras.utils import to_categorical

# fix num_classes at 5 so both arrays get five columns even if a split lacks a class
Y_train = to_categorical(y_train, num_classes=5)
Y_test = to_categorical(y_test, num_classes=5)
print(Y_train.shape)
print(Y_train[:10])
(237L, 5L)
[[0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]
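Each row printed above is a one-hot vector: a 1 in the column matching the class label and 0 elsewhere. As an illustration of the encoding itself, here is a small NumPy sketch; the helper name `one_hot` is hypothetical and is not how keras implements `to_categorical`. The labels 4, 0 and 2 correspond to the first, second and last rows of the output above.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes))
    # for row i, set the column given by labels[i] to 1
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([4, 0, 2], num_classes=5)
print(encoded)

# argmax along each row recovers the original integer labels
decoded = encoded.argmax(axis=1)
print(decoded)
```

The `argmax` step at the end is the same operation typically used to turn a network's five output probabilities back into a single predicted class.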

3. Building and Training the Neural Network

Next, we build a neural network to solve this classification problem using keras. We will define a simple neural network with two hidden layers. Since this is a multi-class classification problem, we will use a softmax activation function in the final layer of our network and a categorical_crossentropy loss during training.
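To make the softmax and cross-entropy pairing concrete, the following NumPy sketch computes both for a single example. It is a simplified illustration under stated assumptions (one sample, hypothetical all-zero scores, a small clipping constant `eps`), not the keras implementation.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_prob))

logits = np.array([0.0, 0.0, 0.0, 0.0, 0.0])   # hypothetical raw scores
probs = softmax(logits)
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # one-hot label for class 2

loss = categorical_crossentropy(target, probs)
print(probs, loss)
```

Equal scores give a probability of 0.2 per class and a loss of ln 5, about 1.61. That value is a useful reference point when reading training logs: a five-class model whose loss is below it is doing better than a uniform guess.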

In [102]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# define a function to build the keras model
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(5, activation='softmax'))
    
    # compile model
    # recent Keras releases name this argument learning_rate (formerly lr)
    adam = Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model

model = create_model()

print(model.summary())
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_22 (Dense)             (None, 8)                 112       
_________________________________________________________________
dense_23 (Dense)             (None, 4)                 36        
_________________________________________________________________
dense_24 (Dense)             (None, 5)                 25        
=================================================================
Total params: 173
Trainable params: 173
Non-trainable params: 0
_________________________________________________________________
None
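The parameter counts in the summary above can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# input features, two hidden layers, output classes
layer_sizes = [13, 8, 4, 5]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts, total)
```

This gives 112, 36 and 25 parameters for the three layers, 173 in total, matching the summary.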
In [103]:
# fit the model to the training data
model.fit(X_train, Y_train, epochs=100, batch_size=10, verbose = 1)
Epoch 1/100
237/237 [==============================] - 0s 359us/step - loss: 1.3269 - acc: 0.5401
Epoch 2/100
237/237 [==============================] - 0s 464us/step - loss: 1.2314 - acc: 0.5485
Epoch 3/100
237/237 [==============================] - 0s 616us/step - loss: 1.2052 - acc: 0.5485
Epoch 4/100
237/237 [==============================] - 0s 620us/step - loss: 1.1845 - acc: 0.5570
Epoch 5/100
237/237 [==============================] - 0s 624us/step - loss: 1.1671 - acc: 0.5527
Epoch 6/100
237/237 [==============================] - 0s 595us/step - loss: 1.1518 - acc: 0.5570
Epoch 7/100
237/237 [==============================] - 0s 709us/step - loss: 1.1437 - acc: 0.5570
Epoch 8/100
237/237 [==============================] - 0s 654us/step - loss: 1.1174 - acc: 0.5738
Epoch 9/100
237/237 [==============================] - 0s 654us/step - loss: 1.1009 - acc: 0.5738
Epoch 10/100
237/237 [==============================] - 0s 519us/step - loss: 1.1023 - acc: 0.5907
Epoch 11/100
237/237 [==============================] - 0s 354us/step - loss: 1.0935 - acc: 0.5823
Epoch 12/100
237/237 [==============================] - 0s 359us/step - loss: 1.0937 - acc: 0.5738
Epoch 13/100
237/237 [==============================] - 0s 350us/step - loss: 1.0879 - acc: 0.5823
Epoch 14/100
237/237 [==============================] - 0s 397us/step - loss: 1.0675 - acc: 0.5781
Epoch 15/100
237/237 [==============================] - 0s 405us/step - loss: 1.0498 - acc: 0.5696
Epoch 16/100
237/237 [==============================] - 0s 460us/step - loss: 1.0555 - acc: 0.5949
Epoch 17/100
237/237 [==============================] - 0s 354us/step - loss: 1.0428 - acc: 0.5781
Epoch 18/100
237/237 [==============================] - 0s 291us/step - loss: 1.0159 - acc: 0.5992
Epoch 19/100
237/237 [==============================] - 0s 401us/step - loss: 1.0314 - acc: 0.5823
Epoch 20/100
237/237 [==============================] - 0s 359us/step - loss: 1.0155 - acc: 0.5907
Epoch 21/100
237/237 [==============================] - 0s 367us/step - loss: 0.9974 - acc: 0.5949
Epoch 22/100
237/237 [==============================] - 0s 376us/step - loss: 1.0290 - acc: 0.5865
Epoch 23/100
237/237 [==============================] - 0s 392us/step - loss: 0.9907 - acc: 0.5907
Epoch 24/100
237/237 [==============================] - 0s 316us/step - loss: 0.9907 - acc: 0.5907
Epoch 25/100
237/237 [==============================] - 0s 388us/step - loss: 0.9867 - acc: 0.6160
Epoch 26/100
237/237 [==============================] - 0s 338us/step - loss: 0.9652 - acc: 0.5949
Epoch 27/100
237/237 [==============================] - 0s 439us/step - loss: 0.9561 - acc: 0.5949
Epoch 28/100
237/237 [==============================] - 0s 371us/step - loss: 0.9530 - acc: 0.6076
Epoch 29/100
237/237 [==============================] - 0s 342us/step - loss: 0.9536 - acc: 0.6203
Epoch 30/100
237/237 [==============================] - 0s 333us/step - loss: 1.0051 - acc: 0.6034
Epoch 31/100
237/237 [==============================] - 0s 439us/step - loss: 0.9643 - acc: 0.5992
Epoch 32/100
237/237 [==============================] - 0s 371us/step - loss: 0.9462 - acc: 0.5992
Epoch 33/100
237/237 [==============================] - 0s 224us/step - loss: 0.9354 - acc: 0.6118
Epoch 34/100
237/237 [==============================] - 0s 211us/step - loss: 0.9251 - acc: 0.5949
Epoch 35/100
237/237 [==============================] - 0s 219us/step - loss: 0.9224 - acc: 0.6160
Epoch 36/100
237/237 [==============================] - 0s 215us/step - loss: 0.9237 - acc: 0.6034
Epoch 37/100
237/237 [==============================] - 0s 283us/step - loss: 0.9169 - acc: 0.6203
Epoch 38/100
237/237 [==============================] - 0s 215us/step - loss: 0.9786 - acc: 0.5949
Epoch 39/100
237/237 [==============================] - 0s 291us/step - loss: 0.9995 - acc: 0.5738
Epoch 40/100
237/237 [==============================] - 0s 287us/step - loss: 0.9367 - acc: 0.6118
Epoch 41/100
237/237 [==============================] - 0s 194us/step - loss: 0.9448 - acc: 0.6034
Epoch 42/100
237/237 [==============================] - 0s 165us/step - loss: 0.9077 - acc: 0.6118
Epoch 43/100
237/237 [==============================] - 0s 181us/step - loss: 0.9092 - acc: 0.6076
Epoch 44/100
237/237 [==============================] - 0s 186us/step - loss: 0.9032 - acc: 0.6076
Epoch 45/100
237/237 [==============================] - 0s 241us/step - loss: 0.9020 - acc: 0.6076
Epoch 46/100
237/237 [==============================] - 0s 211us/step - loss: 0.9198 - acc: 0.5949
Epoch 47/100
237/237 [==============================] - 0s 177us/step - loss: 0.9189 - acc: 0.6034
Epoch 48/100
237/237 [==============================] - 0s 203us/step - loss: 0.9076 - acc: 0.6287
Epoch 49/100
237/237 [==============================] - 0s 165us/step - loss: 0.8909 - acc: 0.6203
Epoch 50/100
237/237 [==============================] - 0s 198us/step - loss: 0.8892 - acc: 0.6203
Epoch 51/100
237/237 [==============================] - 0s 173us/step - loss: 0.8948 - acc: 0.6329
Epoch 52/100
237/237 [==============================] - 0s 228us/step - loss: 0.8794 - acc: 0.6414
Epoch 53/100
237/237 [==============================] - 0s 207us/step - loss: 0.9005 - acc: 0.5949
Epoch 54/100
237/237 [==============================] - 0s 169us/step - loss: 0.8908 - acc: 0.6456
Epoch 55/100
237/237 [==============================] - 0s 207us/step - loss: 0.8840 - acc: 0.6371
Epoch 56/100
237/237 [==============================] - 0s 156us/step - loss: 0.9097 - acc: 0.6118
Epoch 57/100
237/237 [==============================] - 0s 177us/step - loss: 0.9166 - acc: 0.6118
Epoch 58/100
237/237 [==============================] - 0s 232us/step - loss: 0.8841 - acc: 0.6329
Epoch 59/100
237/237 [==============================] - 0s 165us/step - loss: 0.9133 - acc: 0.6371
Epoch 60/100
237/237 [==============================] - 0s 190us/step - loss: 0.9100 - acc: 0.6160
Epoch 61/100
237/237 [==============================] - 0s 177us/step - loss: 0.9022 - acc: 0.6118
Epoch 62/100
237/237 [==============================] - 0s 173us/step - loss: 0.8865 - acc: 0.6414
Epoch 63/100
237/237 [==============================] - 0s 173us/step - loss: 0.8894 - acc: 0.6203
Epoch 64/100
237/237 [==============================] - 0s 219us/step - loss: 0.9010 - acc: 0.6414
Epoch 65/100
237/237 [==============================] - 0s 165us/step - loss: 0.8986 - acc: 0.6245
Epoch 66/100
237/237 [==============================] - 0s 207us/step - loss: 0.8945 - acc: 0.6160
Epoch 67/100
237/237 [==============================] - 0s 169us/step - loss: 0.8883 - acc: 0.6371
Epoch 68/100
237/237 [==============================] - 0s 186us/step - loss: 0.8905 - acc: 0.6287
Epoch 69/100
237/237 [==============================] - 0s 190us/step - loss: 0.8845 - acc: 0.6329
Epoch 70/100
237/237 [==============================] - 0s 198us/step - loss: 0.8661 - acc: 0.6456
Epoch 71/100
237/237 [==============================] - 0s 207us/step - loss: 0.9058 - acc: 0.6287
Epoch 72/100
237/237 [==============================] - 0s 177us/step - loss: 0.9061 - acc: 0.6118
Epoch 73/100
237/237 [==============================] - 0s 177us/step - loss: 0.8814 - acc: 0.6371
Epoch 74/100
237/237 [==============================] - 0s 198us/step - loss: 0.8819 - acc: 0.6371
Epoch 75/100
237/237 [==============================] - 0s 190us/step - loss: 0.8675 - acc: 0.6498
Epoch 76/100
237/237 [==============================] - 0s 173us/step - loss: 0.8753 - acc: 0.6160
Epoch 77/100
237/237 [==============================] - 0s 190us/step - loss: 0.8677 - acc: 0.6203
Epoch 78/100
237/237 [==============================] - 0s 219us/step - loss: 0.8835 - acc: 0.6245
Epoch 79/100
237/237 [==============================] - 0s 173us/step - loss: 0.8721 - acc: 0.6414
Epoch 80/100
237/237 [==============================] - 0s 177us/step - loss: 0.8986 - acc: 0.6371
Epoch 81/100
237/237 [==============================] - 0s 190us/step - loss: 0.8748 - acc: 0.6371
Epoch 82/100
237/237 [==============================] - 0s 211us/step - loss: 0.8766 - acc: 0.6287
Epoch 83/100
237/237 [==============================] - 0s 181us/step - loss: 0.8817 - acc: 0.6287
Epoch 84/100
237/237 [==============================] - 0s 177us/step - loss: 0.8661 - acc: 0.6329
Epoch 85/100
237/237 [==============================] - 0s 215us/step - loss: 0.8674 - acc: 0.6371
Epoch 86/100
237/237 [==============================] - 0s 198us/step - loss: 0.8680 - acc: 0.6456
Epoch 87/100
237/237 [==============================] - 0s 186us/step - loss: 0.8728 - acc: 0.6414
Epoch 88/100
237/237 [==============================] - 0s 186us/step - loss: 0.8725 - acc: 0.6498
Epoch 89/100
237/237 [==============================] - 0s 156us/step - loss: 0.8851 - acc: 0.6287
Epoch 90/100
237/237 [==============================] - 0s 165us/step - loss: 0.8691 - acc: 0.6456
Epoch 91/100
237/237 [==============================] - 0s 165us/step - loss: 0.8525 - acc: 0.6329
Epoch 92/100
237/237 [==============================] - 0s 156us/step - loss: 0.8585 - acc: 0.6540
Epoch 93/100
237/237 [==============================] - 0s 177us/step - loss: 0.8712 - acc: 0.6287
Epoch 94/100
237/237 [==============================] - 0s 194us/step - loss: 0.8904 - acc: 0.6414
Epoch 95/100
237/237 [==============================] - 0s 165us/step - loss: 0.8650 - acc: 0.6245
Epoch 96/100
237/237 [==============================] - 0s 165us/step - loss: 0.8776 - acc: 0.6624
Epoch 97/100
237/237 [==============================] - 0s 152us/step - loss: 0.8667 - acc: 0.6498
Epoch 98/100
237/237 [==============================] - 0s 173us/step - loss: 0.8654 - acc: 0.6160
Epoch 99/100
237/237 [==============================] - 0s 190us/step - loss: 0.8660 - acc: 0.6456
Epoch 100/100
237/237 [==============================] - 0s 194us/step - loss: 0.8651 - acc: 0.6456
Out[103]:
<keras.callbacks.History at 0x1b37bdd8>

4. Improving Results - A Binary Classification Problem

Although the network learned something, the error is still fairly large: training accuracy ends at roughly 65%. This could be because it is very difficult to distinguish between the different severity levels of heart disease (classes 1 - 4). We'll simplify the problem by converting it to a binary classification problem: heart disease or no heart disease.

In [104]:
# convert into binary classification problem - heart disease or no heart disease
Y_train_binary = y_train.copy()
Y_test_binary = y_test.copy()

Y_train_binary[Y_train_binary > 0] = 1
Y_test_binary[Y_test_binary > 0] = 1

print(Y_train_binary[:20])
[1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0]
In [105]:
# define a new keras model for binary classification
def create_binary_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model
    # recent Keras releases name this argument learning_rate (formerly lr)
    adam = Adam(learning_rate=0.001)
    model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model

binary_model = create_binary_model()

print(binary_model.summary())
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_25 (Dense)             (None, 8)                 112       
_________________________________________________________________
dense_26 (Dense)             (None, 4)                 36        
_________________________________________________________________
dense_27 (Dense)             (None, 1)                 5         
=================================================================
Total params: 153
Trainable params: 153
Non-trainable params: 0
_________________________________________________________________
None
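Compared with the multi-class model, the only changes are the single sigmoid output unit and the binary_crossentropy loss. The NumPy sketch below illustrates how those two pieces fit together; the labels, scores, clipping constant `eps` and the 0.5 decision threshold are all assumptions chosen for the example, not values taken from the trained model.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into probabilities between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob)
                    + (1.0 - y_true) * np.log(1.0 - y_prob))

y_true = np.array([1.0, 0.0, 1.0, 0.0])        # hypothetical binary labels

# an undecided model outputs 0.5 everywhere, which costs ln 2 per sample
undecided_loss = binary_crossentropy(y_true, sigmoid(np.zeros(4)))

# hypothetical raw scores, thresholded at 0.5 to get class predictions
scores = np.array([2.0, -1.0, 0.5, -0.5])
y_prob = sigmoid(scores)
y_pred = (y_prob >= 0.5).astype(int)

print(undecided_loss, y_pred)
```

The undecided loss of ln 2, about 0.693, plays the same role as ln 5 in the five-class case: a binary model with a lower loss is doing better than a coin flip.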
In [106]:
# fit the binary model on the training data
binary_model.fit(X_train, Y_train_binary, epochs=100, batch_size=10, verbose = 1)
Epoch 1/100
237/237 [==============================] - 0s 570us/step - loss: 0.6540 - acc: 0.6076
Epoch 2/100
237/237 [==============================] - 0s 823us/step - loss: 0.6105 - acc: 0.7257
Epoch 3/100
237/237 [==============================] - 0s 624us/step - loss: 0.5799 - acc: 0.7004
Epoch 4/100
237/237 [==============================] - 0s 612us/step - loss: 0.5865 - acc: 0.6878
Epoch 5/100
237/237 [==============================] - 0s 641us/step - loss: 0.5452 - acc: 0.7131
Epoch 6/100
237/237 [==============================] - 0s 586us/step - loss: 0.5272 - acc: 0.7468
Epoch 7/100
237/237 [==============================] - 0s 616us/step - loss: 0.5359 - acc: 0.7342
Epoch 8/100
237/237 [==============================] - 0s 549us/step - loss: 0.5308 - acc: 0.7384
Epoch 9/100
237/237 [==============================] - 0s 481us/step - loss: 0.5533 - acc: 0.7426
Epoch 10/100
237/237 [==============================] - 0s 418us/step - loss: 0.5054 - acc: 0.7511
Epoch 11/100
237/237 [==============================] - 0s 392us/step - loss: 0.4987 - acc: 0.7595
Epoch 12/100
237/237 [==============================] - 0s 401us/step - loss: 0.5226 - acc: 0.7679
Epoch 13/100
237/237 [==============================] - 0s 401us/step - loss: 0.5379 - acc: 0.7257
Epoch 14/100
237/237 [==============================] - 0s 236us/step - loss: 0.5283 - acc: 0.7131
Epoch 15/100
237/237 [==============================] - 0s 190us/step - loss: 0.4965 - acc: 0.7511
Epoch 16/100
237/237 [==============================] - 0s 228us/step - loss: 0.4759 - acc: 0.7848
Epoch 17/100
237/237 [==============================] - 0s 241us/step - loss: 0.4623 - acc: 0.7764
Epoch 18/100
237/237 [==============================] - 0s 236us/step - loss: 0.4665 - acc: 0.7722
Epoch 19/100
237/237 [==============================] - 0s 283us/step - loss: 0.4560 - acc: 0.7848
Epoch 20/100
237/237 [==============================] - 0s 321us/step - loss: 0.4708 - acc: 0.7806
Epoch 21/100
237/237 [==============================] - 0s 354us/step - loss: 0.4512 - acc: 0.7806
Epoch 22/100
237/237 [==============================] - 0s 287us/step - loss: 0.4745 - acc: 0.7848
Epoch 23/100
237/237 [==============================] - 0s 232us/step - loss: 0.4527 - acc: 0.7890
Epoch 24/100
237/237 [==============================] - 0s 232us/step - loss: 0.4586 - acc: 0.7848
Epoch 25/100
237/237 [==============================] - 0s 262us/step - loss: 0.5072 - acc: 0.7975
Epoch 26/100
237/237 [==============================] - 0s 316us/step - loss: 0.4528 - acc: 0.7932
Epoch 27/100
237/237 [==============================] - 0s 287us/step - loss: 0.4211 - acc: 0.7890
Epoch 28/100
237/237 [==============================] - 0s 278us/step - loss: 0.4151 - acc: 0.8017
Epoch 29/100
237/237 [==============================] - 0s 228us/step - loss: 0.4155 - acc: 0.8017
Epoch 30/100
237/237 [==============================] - 0s 224us/step - loss: 0.4308 - acc: 0.7975
Epoch 31/100
237/237 [==============================] - 0s 295us/step - loss: 0.4094 - acc: 0.8101
Epoch 32/100
237/237 [==============================] - 0s 219us/step - loss: 0.4259 - acc: 0.7932
Epoch 33/100
237/237 [==============================] - 0s 215us/step - loss: 0.3998 - acc: 0.8481
Epoch 34/100
237/237 [==============================] - 0s 291us/step - loss: 0.4087 - acc: 0.8143
Epoch 35/100
237/237 [==============================] - 0s 329us/step - loss: 0.4211 - acc: 0.8143
Epoch 36/100
237/237 [==============================] - 0s 203us/step - loss: 0.3951 - acc: 0.8143
Epoch 37/100
237/237 [==============================] - 0s 203us/step - loss: 0.3918 - acc: 0.8312
Epoch 38/100
237/237 [==============================] - 0s 409us/step - loss: 0.3969 - acc: 0.8228
Epoch 39/100
237/237 [==============================] - 0s 232us/step - loss: 0.3803 - acc: 0.8397
Epoch 40/100
237/237 [==============================] - 0s 203us/step - loss: 0.3904 - acc: 0.8270
Epoch 41/100
237/237 [==============================] - 0s 203us/step - loss: 0.3795 - acc: 0.8354
Epoch 42/100
237/237 [==============================] - 0s 207us/step - loss: 0.3759 - acc: 0.8397
Epoch 43/100
237/237 [==============================] - 0s 173us/step - loss: 0.3871 - acc: 0.8354
Epoch 44/100
237/237 [==============================] - 0s 156us/step - loss: 0.3782 - acc: 0.8565
Epoch 45/100
237/237 [==============================] - 0s 152us/step - loss: 0.3765 - acc: 0.8270
Epoch 46/100
237/237 [==============================] - 0s 143us/step - loss: 0.3679 - acc: 0.8481
Epoch 47/100
237/237 [==============================] - 0s 190us/step - loss: 0.3659 - acc: 0.8397
Epoch 48/100
237/237 [==============================] - 0s 160us/step - loss: 0.3629 - acc: 0.8481
Epoch 49/100
237/237 [==============================] - 0s 160us/step - loss: 0.3847 - acc: 0.8481
Epoch 50/100
237/237 [==============================] - 0s 156us/step - loss: 0.4066 - acc: 0.8059
Epoch 51/100
237/237 [==============================] - 0s 181us/step - loss: 0.3699 - acc: 0.8439
Epoch 52/100
237/237 [==============================] - 0s 211us/step - loss: 0.3650 - acc: 0.8439
Epoch 53/100
237/237 [==============================] - 0s 165us/step - loss: 0.3661 - acc: 0.8481
Epoch 54/100
237/237 [==============================] - 0s 177us/step - loss: 0.3731 - acc: 0.8270
Epoch 55/100
237/237 [==============================] - 0s 190us/step - loss: 0.3983 - acc: 0.8312
Epoch 56/100
237/237 [==============================] - 0s 325us/step - loss: 0.3597 - acc: 0.8481
Epoch 57/100
237/237 [==============================] - 0s 186us/step - loss: 0.3375 - acc: 0.8523
Epoch 58/100
237/237 [==============================] - 0s 169us/step - loss: 0.3839 - acc: 0.8270
Epoch 59/100
237/237 [==============================] - 0s 190us/step - loss: 0.3876 - acc: 0.8481
Epoch 60/100
237/237 [==============================] - 0s 177us/step - loss: 0.3541 - acc: 0.8608
Epoch 61/100
237/237 [==============================] - 0s 186us/step - loss: 0.3475 - acc: 0.8692
Epoch 62/100
237/237 [==============================] - 0s 207us/step - loss: 0.3698 - acc: 0.8608
Epoch 63/100
237/237 [==============================] - 0s 160us/step - loss: 0.4110 - acc: 0.8059
Epoch 64/100
237/237 [==============================] - 0s 156us/step - loss: 0.3555 - acc: 0.8523
Epoch 65/100
237/237 [==============================] - 0s 160us/step - loss: 0.4011 - acc: 0.8059
Epoch 66/100
237/237 [==============================] - 0s 165us/step - loss: 0.3660 - acc: 0.8439
Epoch 67/100
237/237 [==============================] - 0s 173us/step - loss: 0.3937 - acc: 0.8270
Epoch 68/100
237/237 [==============================] - 0s 165us/step - loss: 0.3518 - acc: 0.8523
Epoch 69/100
237/237 [==============================] - 0s 181us/step - loss: 0.3497 - acc: 0.8439
Epoch 70/100
237/237 [==============================] - 0s 160us/step - loss: 0.3472 - acc: 0.8650
Epoch 71/100
237/237 [==============================] - 0s 177us/step - loss: 0.3473 - acc: 0.8481
Epoch 72/100
237/237 [==============================] - 0s 173us/step - loss: 0.3475 - acc: 0.8481
Epoch 73/100
237/237 [==============================] - 0s 156us/step - loss: 0.3392 - acc: 0.8481
Epoch 74/100
237/237 [==============================] - 0s 173us/step - loss: 0.3508 - acc: 0.8650
Epoch 75/100
237/237 [==============================] - 0s 181us/step - loss: 0.3445 - acc: 0.8608
Epoch 76/100
237/237 [==============================] - 0s 160us/step - loss: 0.3460 - acc: 0.8692
Epoch 77/100
237/237 [==============================] - 0s 173us/step - loss: 0.3400 - acc: 0.8650
Epoch 78/100
237/237 [==============================] - 0s 160us/step - loss: 0.3464 - acc: 0.8523
Epoch 79/100
237/237 [==============================] - 0s 165us/step - loss: 0.3735 - acc: 0.8354
Epoch 80/100
237/237 [==============================] - 0s 160us/step - loss: 0.3428 - acc: 0.8565
Epoch 81/100
237/237 [==============================] - 0s 173us/step - loss: 0.3675 - acc: 0.8523
Epoch 82/100
237/237 [==============================] - 0s 160us/step - loss: 0.3607 - acc: 0.8439
Epoch 83/100
237/237 [==============================] - 0s 173us/step - loss: 0.3447 - acc: 0.8608
Epoch 84/100
237/237 [==============================] - 0s 169us/step - loss: 0.3611 - acc: 0.8523
Epoch 85/100
237/237 [==============================] - 0s 160us/step - loss: 0.3500 - acc: 0.8650
Epoch 86/100
237/237 [==============================] - 0s 181us/step - loss: 0.3437 - acc: 0.8397
Epoch 87/100
237/237 [==============================] - 0s 169us/step - loss: 0.3469 - acc: 0.8608
Epoch 88/100
237/237 [==============================] - 0s 148us/step - loss: 0.3691 - acc: 0.8397
Epoch 89/100
237/237 [==============================] - 0s 148us/step - loss: 0.3431 - acc: 0.8523
Epoch 90/100
237/237 [==============================] - 0s 160us/step - loss: 0.3492 - acc: 0.8397
Epoch 91/100
237/237 [==============================] - 0s 177us/step - loss: 0.3603 - acc: 0.8481
Epoch 92/100
237/237 [==============================] - 0s 165us/step - loss: 0.3425 - acc: 0.8608
Epoch 93/100
237/237 [==============================] - 0s 177us/step - loss: 0.3474 - acc: 0.8734
Epoch 94/100
237/237 [==============================] - 0s 173us/step - loss: 0.3453 - acc: 0.8692
Epoch 95/100
237/237 [==============================] - 0s 165us/step - loss: 0.3512 - acc: 0.8481
Epoch 96/100
237/237 [==============================] - 0s 152us/step - loss: 0.3952 - acc: 0.8101
Epoch 97/100
237/237 [==============================] - 0s 181us/step - loss: 0.3925 - acc: 0.8397
Epoch 98/100
237/237 [==============================] - 0s 198us/step - loss: 0.3736 - acc: 0.8270
Epoch 99/100
237/237 [==============================] - 0s 160us/step - loss: 0.3411 - acc: 0.8397
Epoch 100/100
237/237 [==============================] - 0s 165us/step - loss: 0.3547 - acc: 0.8439
Out[106]:
<keras.callbacks.History at 0x1bd22438>

5. Results and Metrics¶

Let's test the performance of both our categorical model and our binary model. To do this, we will make predictions on the held-out test dataset and calculate performance metrics using scikit-learn.

In [116]:
# generate classification report using predictions for categorical model
from sklearn.metrics import classification_report, accuracy_score

# take the index of the highest-scoring output as the predicted class
categorical_pred = np.argmax(model.predict(X_test), axis=1)

print('Results for Categorical Model')
print(accuracy_score(y_test, categorical_pred))
print(classification_report(y_test, categorical_pred))
Results for Categorical Model
0.6166666666666667
             precision    recall  f1-score   support

          0       0.73      0.90      0.81        30
          1       0.14      0.09      0.11        11
          2       0.38      0.38      0.38         8
          3       0.75      0.60      0.67        10
          4       0.00      0.00      0.00         1

avg / total       0.57      0.62      0.58        60
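
The categorical model does well on class 0 but much worse on the less frequent classes 1 to 4. One way to compare it with the binary model on equal footing is to collapse its predictions to a presence/absence label. The following is a minimal sketch, assuming class 0 means no disease and classes 1 to 4 mean disease (consistent with the 30/30 support split in the binary report). The helper `collapse_to_binary` and the example arrays are hypothetical placeholders, not the notebook's actual `y_test` or `categorical_pred`.

```python
import numpy as np

def collapse_to_binary(labels):
    """Map class 0 to 0 (no disease) and classes 1-4 to 1 (disease)."""
    labels = np.asarray(labels)
    return (labels > 0).astype(int)

# Illustrative placeholder labels, NOT the notebook's real data
example_true = np.array([0, 0, 1, 2, 3, 4, 0, 1])
example_pred = np.array([0, 1, 2, 2, 0, 3, 0, 1])

true_binary = collapse_to_binary(example_true)
pred_binary = collapse_to_binary(example_pred)

# Exact-class accuracy penalizes confusing one severity level with another;
# collapsed accuracy only asks whether disease was detected at all.
exact_accuracy = np.mean(example_true == example_pred)
binary_accuracy = np.mean(true_binary == pred_binary)

print('Exact-class accuracy:', exact_accuracy)
print('Collapsed accuracy:  ', binary_accuracy)
```

In the notebook, the same helper could be applied to `y_test` and `categorical_pred`, and the collapsed arrays passed to `accuracy_score`.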

In [117]:
# generate classification report using predictions for binary model
# round each model output to the nearest class label (0 or 1)
binary_pred = np.round(binary_model.predict(X_test)).astype(int)

print('Results for Binary Model')
print(accuracy_score(Y_test_binary, binary_pred))
print(classification_report(Y_test_binary, binary_pred))
Results for Binary Model
0.8
             precision    recall  f1-score   support

          0       0.75      0.90      0.82        30
          1       0.88      0.70      0.78        30

avg / total       0.81      0.80      0.80        60
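
The binary report can be unpacked into raw counts to see where the metrics come from. The sketch below reconstructs the confusion matrix implied by the reported recall and support values, then recomputes precision, recall, F1, and accuracy from it. The counts are inferred from the rounded figures in the report above, not read from the model itself.

```python
# Support (number of test patients) per class, taken from the report
support_0, support_1 = 30, 30

# recall * support gives the number of correctly classified patients per class
tn = round(0.90 * support_0)  # class 0 predicted as 0 -> 27
tp = round(0.70 * support_1)  # class 1 predicted as 1 -> 21
fp = support_0 - tn           # class 0 predicted as 1 -> 3
fn = support_1 - tp           # class 1 predicted as 0 -> 9

precision_0 = tn / (tn + fn)  # 27 / 36 = 0.75
precision_1 = tp / (tp + fp)  # 21 / 24 = 0.875
recall_0 = tn / support_0     # 27 / 30 = 0.90
recall_1 = tp / support_1     # 21 / 30 = 0.70

# F1 is the harmonic mean of precision and recall
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

accuracy = (tn + tp) / (support_0 + support_1)  # 48 / 60 = 0.8

print('Confusion matrix [[tn, fp], [fn, tp]]:', [[tn, fp], [fn, tp]])
print('Class 0: precision, recall, f1 =', precision_0, recall_0, f1_0)
print('Class 1: precision, recall, f1 =', precision_1, recall_1, f1_1)
print('Accuracy:', accuracy)
```

Read this way, the model misses 9 of the 30 patients with heart disease while raising 3 false alarms among the 30 without it, which is worth keeping in mind for a screening task where missed cases are usually the costlier error.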
