The early diagnosis of neurodevelopmental disorders can improve treatment and significantly reduce the associated healthcare costs. In this project, we used supervised learning to diagnose Autistic Spectrum Disorder (ASD) based on behavioural features and individual characteristics. More specifically, we built and deployed a neural network using the Keras API.
This project used a dataset provided by the UCI Machine Learning Repository that contains screening data for 292 patients. The dataset can be found at the following URL: https://archive.ics.uci.edu/ml/datasets/Autistic+Spectrum+Disorder+Screening+Data+for+Children++
The data is downloaded from the UCI Machine Learning Repository as a compressed ZIP archive and extracted manually. The extracted text file is then read in with Pandas.
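If you prefer to script the download step instead, a minimal sketch is shown below; note that the archive URL and file names here are placeholders, not the actual UCI links, and must be adjusted to match the download link on the dataset page.
# optional: download and extract the archive programmatically instead of manually
# NOTE: the URL and file names below are placeholders -- adjust them to the real UCI link
import urllib.request
import zipfile
archive_url = 'https://archive.ics.uci.edu/.../Autism-Screening-Child-Data.zip'  # placeholder
archive_path = 'autism-data.zip'
urllib.request.urlretrieve(archive_url, archive_path)
with zipfile.ZipFile(archive_path) as zf:
    zf.extractall('.')  # extract into the working directory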
# import pandas and read in the dataset
import pandas as pd

file = 'autism-data.txt'
# read the comma-separated file into a DataFrame
data = pd.read_csv(file, sep=',', index_col=None)
# print the shape of the DataFrame, so we can see how many examples we have
print('Shape of DataFrame: {}'.format(data.shape))
# print a single patient record
print(data.loc[0])
# print out multiple patients at the same time
data.loc[:10]
# print out a description of the dataframe
data.describe()
This dataset requires several preprocessing steps. Some columns in the DataFrame are not needed for training the neural network, and many attributes are recorded as strings that need to be converted to categorical (one-hot encoded) values. Finally, the dataset needs to be split into X and Y datasets, where X contains the attributes used for prediction and Y contains the class labels.
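Depending on how the file was exported, missing entries in the screening data may be recorded as the string '?' rather than as NaN; this is an assumption worth checking before encoding, as sketched below.
# optional check (assumption: missing entries may be stored as '?')
print((data == '?').sum())
# one possible strategy, not applied in this project, would be:
# data = data.replace('?', pd.NA).dropna()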
# drop unwanted columns
data = data.drop(['result', 'age_desc'], axis=1)
data.loc[:10]
# create X and Y datasets for training
x = data.drop(['class'], axis=1)
y = data['class']
x.loc[:10]
# convert the data to categorical values - one-hot-encoded vectors
X = pd.get_dummies(x)
# print the new categorical column labels
X.columns.values
# print an example patient from the categorical data
X.loc[1]
# convert the class data to categorical values - one-hot-encoded vectors
Y = pd.get_dummies(y)
Y.iloc[:10]
Before training the neural network, the dataset needs to be split into training and testing sets. This can be done with the train_test_split() function provided by scikit-learn.
from sklearn import model_selection
# split the X and Y data into training and testing datasets
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.2)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
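Since the class distribution in the screening data may be imbalanced, an optional refinement (not used in the run above) is to stratify the split on the labels and fix the random seed for reproducibility:
# optional variant: preserve the class balance in both splits and make the
# split reproducible (this would replace the call above, not follow it)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=0.2, stratify=y, random_state=42)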
Keras is used to build and train the network. The model is relatively simple and only uses dense (also known as fully connected) layers. The network has two hidden layers, uses the Adam optimizer, and is trained with a categorical cross-entropy loss.
# build a neural network using Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
# define a function to build the Keras model
def create_model():
    # create model
    model = Sequential()
    # two hidden layers; input_dim matches the number of one-hot encoded feature columns
    model.add(Dense(8, input_dim=96, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    # softmax output layer pairs with the categorical cross-entropy loss below
    model.add(Dense(2, activation='softmax'))
    # compile model
    adam = Adam(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
model = create_model()
print(model.summary())
Train the Keras model by calling model.fit().
# fit the model to the training data
model.fit(X_train, Y_train, epochs=50, batch_size=10, verbose = 1)
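One optional way to keep an eye on overfitting, not used in the original run, is to hold out part of the training data as a validation set and inspect the per-epoch history:
# optional variant (this would replace the fit call above, not follow it):
# hold out 10% of the training data for validation and keep the history
history = model.fit(X_train, Y_train, epochs=50, batch_size=10,
                    validation_split=0.1, verbose=1)
# final validation accuracy; the key is 'val_accuracy' in newer Keras versions
print(history.history['val_acc'][-1])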
Now that the model has been trained, we need to test its performance on the testing dataset. The model has never seen these examples before, so the testing set tells us whether the model can generalize to data that wasn't used during its training phase.
# generate a classification report using predictions from the categorical model
from sklearn.metrics import classification_report, accuracy_score

predictions = model.predict_classes(X_test)
predictions
print('Results for Categorical Model')
# pd.get_dummies sorts the label columns alphabetically (NO, YES), so the
# predicted class index 1 lines up with the 'YES' column of the one-hot labels
print(accuracy_score(Y_test[['YES']], predictions))
print(classification_report(Y_test[['YES']], predictions))
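As a quick cross-check, Keras's built-in evaluate() reports the test loss and accuracy directly, and a confusion matrix shows how the errors are distributed across the two classes; this extra check goes beyond the original write-up:
from sklearn.metrics import confusion_matrix
# cross-check the reported accuracy with Keras's own evaluation
loss, acc = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss: {:.4f}, test accuracy: {:.4f}'.format(loss, acc))
# confusion matrix over true vs. predicted class indices (0 = NO, 1 = YES)
print(confusion_matrix(Y_test[['YES']], predictions))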