Childhood Autistic Spectrum Disorder Screening using Machine Learning

The early diagnosis of neurodevelopmental disorders can improve treatment and significantly reduce the associated healthcare costs. In this project, we used supervised learning to screen for Autistic Spectrum Disorder (ASD) based on behavioural features and individual characteristics. More specifically, we built and trained a neural network using the Keras API.

This project used a dataset provided by the UCI Machine Learning Repository that contains screening data for 292 patients. The dataset can be found at the following URL: https://archive.ics.uci.edu/ml/datasets/Autistic+Spectrum+Disorder+Screening+Data+for+Children++

1. Importing the Dataset

The data was obtained from the UCI Machine Learning Repository; it is distributed as a compressed zip archive, which was downloaded and extracted manually. The extracted text file is then read in with pandas.
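
For reproducibility, the download-and-extract step can also be scripted. Below is a minimal sketch; the archive URL and file layout are assumptions based on the UCI listing, so adjust them to match the actual download link on the dataset page.

import io
import zipfile
import urllib.request

# hypothetical archive URL - check the UCI dataset page for the actual link
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00419/Autism-Child-Data.zip'

# download the archive into memory and extract all of its members
with urllib.request.urlopen(url) as response:
    archive = zipfile.ZipFile(io.BytesIO(response.read()))
    archive.extractall('.')

# inspect the extracted file names
print(archive.namelist())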

In [2]:
# import pandas so the data can be read into a DataFrame
import pandas as pd

# path to the extracted data file
file = 'autism-data.txt'

# read the comma-separated values into a DataFrame
data = pd.read_csv(file, index_col=None)
In [3]:
# print the shape of the DataFrame, so we can see how many examples we have
print('Shape of DataFrame: {}'.format(data.shape))
print(data.loc[0])
Shape of DataFrame: (292, 21)
A1_Score                            1
A2_Score                            1
A3_Score                            0
A4_Score                            0
A5_Score                            1
A6_Score                            1
A7_Score                            0
A8_Score                            1
A9_Score                            0
A10_Score                           0
age                                 6
gender                              m
ethnicity                      Others
jundice                            no
family_history_of_PDD              no
contry_of_res                  Jordan
used_app_before                    no
result                              5
age_desc                 '4-11 years'
relation                       Parent
class                              NO
Name: 0, dtype: object
In [4]:
# print out multiple patients at the same time
data.loc[:10]
Out[4]:
A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score A8_Score A9_Score A10_Score ... gender ethnicity jundice family_history_of_PDD contry_of_res used_app_before result age_desc relation class
0 1 1 0 0 1 1 0 1 0 0 ... m Others no no Jordan no 5 '4-11 years' Parent NO
1 1 1 0 0 1 1 0 1 0 0 ... m 'Middle Eastern ' no no Jordan no 5 '4-11 years' Parent NO
2 1 1 0 0 0 1 1 1 0 0 ... m ? no no Jordan yes 5 '4-11 years' ? NO
3 0 1 0 0 1 1 0 0 0 1 ... f ? yes no Jordan no 4 '4-11 years' ? NO
4 1 1 1 1 1 1 1 1 1 1 ... m Others yes no 'United States' no 10 '4-11 years' Parent YES
5 0 0 1 0 1 1 0 1 0 1 ... m ? no yes Egypt no 5 '4-11 years' ? NO
6 1 0 1 1 1 1 0 1 0 1 ... m White-European no no 'United Kingdom' no 7 '4-11 years' Parent YES
7 1 1 1 1 1 1 1 1 0 0 ... f 'Middle Eastern ' no no Bahrain no 8 '4-11 years' Parent YES
8 1 1 1 1 1 1 1 0 0 0 ... f 'Middle Eastern ' no no Bahrain no 7 '4-11 years' Parent YES
9 0 0 1 1 1 0 1 1 0 0 ... f ? no yes Austria no 5 '4-11 years' ? NO
10 1 0 0 0 1 1 1 1 1 1 ... m White-European yes no 'United Kingdom' no 7 '4-11 years' Self YES

11 rows × 21 columns

In [5]:
# print out a description of the dataframe
data.describe()
Out[5]:
A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score A8_Score A9_Score A10_Score result
count 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000
mean 0.633562 0.534247 0.743151 0.551370 0.743151 0.712329 0.606164 0.496575 0.493151 0.726027 6.239726
std 0.482658 0.499682 0.437646 0.498208 0.437646 0.453454 0.489438 0.500847 0.500811 0.446761 2.284882
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000
50% 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 1.000000 6.000000
75% 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 8.000000
max 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10.000000

2. Data Preprocessing

This dataset requires several preprocessing steps. Some columns in the DataFrame aren't needed for training the neural network; several other columns store their values as strings and need to be converted to categorical (one-hot encoded) features. Finally, the dataset needs to be split into X and Y, where X holds the attributes used for prediction and Y holds the class labels.
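
To make the string-to-categorical step concrete, here is a toy illustration (the tiny frame below is made up for demonstration): pd.get_dummies turns each distinct string value, including the '?' placeholder this dataset uses for missing entries, into its own 0/1 indicator column.

import pandas as pd

# a made-up frame mimicking the string-valued columns in this dataset
demo = pd.DataFrame({'gender': ['m', 'f', 'm'],
                     'relation': ['Parent', '?', 'Self']})

# one-hot encode: yields gender_f, gender_m, relation_?, relation_Parent, relation_Self
print(pd.get_dummies(demo))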

In [6]:
# drop unwanted columns
data = data.drop(['result', 'age_desc'], axis=1)
In [7]:
data.loc[:10]
Out[7]:
A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score A8_Score A9_Score A10_Score age gender ethnicity jundice family_history_of_PDD contry_of_res used_app_before relation class
0 1 1 0 0 1 1 0 1 0 0 6 m Others no no Jordan no Parent NO
1 1 1 0 0 1 1 0 1 0 0 6 m 'Middle Eastern ' no no Jordan no Parent NO
2 1 1 0 0 0 1 1 1 0 0 6 m ? no no Jordan yes ? NO
3 0 1 0 0 1 1 0 0 0 1 5 f ? yes no Jordan no ? NO
4 1 1 1 1 1 1 1 1 1 1 5 m Others yes no 'United States' no Parent YES
5 0 0 1 0 1 1 0 1 0 1 4 m ? no yes Egypt no ? NO
6 1 0 1 1 1 1 0 1 0 1 5 m White-European no no 'United Kingdom' no Parent YES
7 1 1 1 1 1 1 1 1 0 0 5 f 'Middle Eastern ' no no Bahrain no Parent YES
8 1 1 1 1 1 1 1 0 0 0 11 f 'Middle Eastern ' no no Bahrain no Parent YES
9 0 0 1 1 1 0 1 1 0 0 11 f ? no yes Austria no ? NO
10 1 0 0 0 1 1 1 1 1 1 10 m White-European yes no 'United Kingdom' no Self YES
In [8]:
# create X and Y datasets for training
x = data.drop(['class'], axis=1)
y = data['class']
In [9]:
x.loc[:10]
Out[9]:
A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score A8_Score A9_Score A10_Score age gender ethnicity jundice family_history_of_PDD contry_of_res used_app_before relation
0 1 1 0 0 1 1 0 1 0 0 6 m Others no no Jordan no Parent
1 1 1 0 0 1 1 0 1 0 0 6 m 'Middle Eastern ' no no Jordan no Parent
2 1 1 0 0 0 1 1 1 0 0 6 m ? no no Jordan yes ?
3 0 1 0 0 1 1 0 0 0 1 5 f ? yes no Jordan no ?
4 1 1 1 1 1 1 1 1 1 1 5 m Others yes no 'United States' no Parent
5 0 0 1 0 1 1 0 1 0 1 4 m ? no yes Egypt no ?
6 1 0 1 1 1 1 0 1 0 1 5 m White-European no no 'United Kingdom' no Parent
7 1 1 1 1 1 1 1 1 0 0 5 f 'Middle Eastern ' no no Bahrain no Parent
8 1 1 1 1 1 1 1 0 0 0 11 f 'Middle Eastern ' no no Bahrain no Parent
9 0 0 1 1 1 0 1 1 0 0 11 f ? no yes Austria no ?
10 1 0 0 0 1 1 1 1 1 1 10 m White-European yes no 'United Kingdom' no Self
In [10]:
# convert the data to categorical values - one-hot-encoded vectors
X = pd.get_dummies(x)
In [11]:
# print the new categorical column labels
X.columns.values
Out[11]:
array(['A1_Score', 'A2_Score', 'A3_Score', 'A4_Score', 'A5_Score',
       'A6_Score', 'A7_Score', 'A8_Score', 'A9_Score', 'A10_Score',
       'age_10', 'age_11', 'age_4', 'age_5', 'age_6', 'age_7', 'age_8',
       'age_9', 'age_?', 'gender_f', 'gender_m',
       "ethnicity_'Middle Eastern '", "ethnicity_'South Asian'",
       'ethnicity_?', 'ethnicity_Asian', 'ethnicity_Black',
       'ethnicity_Hispanic', 'ethnicity_Latino', 'ethnicity_Others',
       'ethnicity_Pasifika', 'ethnicity_Turkish',
       'ethnicity_White-European', 'jundice_no', 'jundice_yes',
       'family_history_of_PDD_no', 'family_history_of_PDD_yes',
       "contry_of_res_'Costa Rica'", "contry_of_res_'Isle of Man'",
       "contry_of_res_'New Zealand'", "contry_of_res_'Saudi Arabia'",
       "contry_of_res_'South Africa'", "contry_of_res_'South Korea'",
       "contry_of_res_'U.S. Outlying Islands'",
       "contry_of_res_'United Arab Emirates'",
       "contry_of_res_'United Kingdom'", "contry_of_res_'United States'",
       'contry_of_res_Afghanistan', 'contry_of_res_Argentina',
       'contry_of_res_Armenia', 'contry_of_res_Australia',
       'contry_of_res_Austria', 'contry_of_res_Bahrain',
       'contry_of_res_Bangladesh', 'contry_of_res_Bhutan',
       'contry_of_res_Brazil', 'contry_of_res_Bulgaria',
       'contry_of_res_Canada', 'contry_of_res_China',
       'contry_of_res_Egypt', 'contry_of_res_Europe',
       'contry_of_res_Georgia', 'contry_of_res_Germany',
       'contry_of_res_Ghana', 'contry_of_res_India', 'contry_of_res_Iraq',
       'contry_of_res_Ireland', 'contry_of_res_Italy',
       'contry_of_res_Japan', 'contry_of_res_Jordan',
       'contry_of_res_Kuwait', 'contry_of_res_Latvia',
       'contry_of_res_Lebanon', 'contry_of_res_Libya',
       'contry_of_res_Malaysia', 'contry_of_res_Malta',
       'contry_of_res_Mexico', 'contry_of_res_Nepal',
       'contry_of_res_Netherlands', 'contry_of_res_Nigeria',
       'contry_of_res_Oman', 'contry_of_res_Pakistan',
       'contry_of_res_Philippines', 'contry_of_res_Qatar',
       'contry_of_res_Romania', 'contry_of_res_Russia',
       'contry_of_res_Sweden', 'contry_of_res_Syria',
       'contry_of_res_Turkey', 'used_app_before_no',
       'used_app_before_yes', "relation_'Health care professional'",
       'relation_?', 'relation_Parent', 'relation_Relative',
       'relation_Self', 'relation_self'], dtype=object)
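
Two quirks of the encoding are worth noting. First, age appears as one-hot columns (age_4 through age_11, plus age_?) rather than as a number: the '?' placeholders force pandas to read the column as strings, so get_dummies encodes it categorically. Second, relation_Self and relation_self are separate columns because the raw data mixes capitalisation. If a numeric age were preferred, the column could be coerced and imputed before calling get_dummies; a sketch of that alternative (not what this notebook does):

# alternative (not used here): keep age numeric by coercing '?' to NaN
# and filling the missing entries with the median age
x['age'] = pd.to_numeric(x['age'], errors='coerce')
x['age'] = x['age'].fillna(x['age'].median())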
In [12]:
# print an example patient from the categorical data
X.loc[1]
Out[12]:
A1_Score                               1
A2_Score                               1
A3_Score                               0
A4_Score                               0
A5_Score                               1
A6_Score                               1
A7_Score                               0
A8_Score                               1
A9_Score                               0
A10_Score                              0
age_10                                 0
age_11                                 0
age_4                                  0
age_5                                  0
age_6                                  1
age_7                                  0
age_8                                  0
age_9                                  0
age_?                                  0
gender_f                               0
gender_m                               1
ethnicity_'Middle Eastern '            1
ethnicity_'South Asian'                0
ethnicity_?                            0
ethnicity_Asian                        0
ethnicity_Black                        0
ethnicity_Hispanic                     0
ethnicity_Latino                       0
ethnicity_Others                       0
ethnicity_Pasifika                     0
                                      ..
contry_of_res_Italy                    0
contry_of_res_Japan                    0
contry_of_res_Jordan                   1
contry_of_res_Kuwait                   0
contry_of_res_Latvia                   0
contry_of_res_Lebanon                  0
contry_of_res_Libya                    0
contry_of_res_Malaysia                 0
contry_of_res_Malta                    0
contry_of_res_Mexico                   0
contry_of_res_Nepal                    0
contry_of_res_Netherlands              0
contry_of_res_Nigeria                  0
contry_of_res_Oman                     0
contry_of_res_Pakistan                 0
contry_of_res_Philippines              0
contry_of_res_Qatar                    0
contry_of_res_Romania                  0
contry_of_res_Russia                   0
contry_of_res_Sweden                   0
contry_of_res_Syria                    0
contry_of_res_Turkey                   0
used_app_before_no                     1
used_app_before_yes                    0
relation_'Health care professional'    0
relation_?                             0
relation_Parent                        1
relation_Relative                      0
relation_Self                          0
relation_self                          0
Name: 1, Length: 96, dtype: int64
In [13]:
# convert the class data to categorical values - one-hot-encoded vectors
Y = pd.get_dummies(y)
In [14]:
Y.iloc[:10]
Out[14]:
NO YES
0 1 0
1 1 0
2 1 0
3 1 0
4 0 1
5 1 0
6 0 1
7 0 1
8 0 1
9 1 0

3. Split the Dataset into Training and Testing Datasets

Before training the neural network, the dataset needs to be split into training and testing subsets. This can be done with the train_test_split() function provided by scikit-learn.
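
As a side note, passing random_state to train_test_split makes the split reproducible, and stratify keeps the NO/YES class balance the same in both subsets; a sketch of that variant (not what was run here):

from sklearn.model_selection import train_test_split

# reproducible, class-balanced variant of the split used in the next cell
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=y, random_state=42)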

In [15]:
from sklearn import model_selection
# split the X and Y data into training and testing datasets
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.2)
In [16]:
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
(233, 96)
(59, 96)
(233, 2)
(59, 2)

4. Building the Network - Keras

Keras is used to build and train the network. The model is relatively simple, using only dense (also known as fully connected) layers. The network has two hidden layers with ReLU activations and a softmax output layer, uses the Adam optimizer, and is trained with a categorical cross-entropy loss.

In [17]:
# build a neural network using Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# define a function to build the keras model
def create_model():
    # create model: two small hidden layers followed by a two-unit output
    model = Sequential()
    model.add(Dense(8, input_dim=96, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    # softmax (rather than sigmoid) pairs correctly with the categorical
    # crossentropy loss, producing class probabilities that sum to one
    model.add(Dense(2, activation='softmax'))

    # compile model with the Adam optimizer and a learning rate of 0.001
    adam = Adam(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model

model = create_model()

print(model.summary())
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 8)                 776       
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 36        
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 10        
=================================================================
Total params: 822
Trainable params: 822
Non-trainable params: 0
_________________________________________________________________
None
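
The parameter counts in the summary follow from weights plus biases per layer: the first hidden layer has 96 × 8 + 8 = 776 parameters, the second has 8 × 4 + 4 = 36, and the output layer has 4 × 2 + 2 = 10, giving 822 in total.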

5. Training the Network

Train the Keras model by calling model.fit(), here for 50 epochs with a batch size of 10.
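
Since the log below shows training accuracy reaching 100%, it can also be worth holding out a slice of the training data to watch for overfitting as training progresses; a sketch of that variant (not what was run here):

# variant that reports metrics on a 10% validation slice each epoch
history = model.fit(X_train, Y_train, epochs=50, batch_size=10,
                    validation_split=0.1, verbose=1)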

In [18]:
# fit the model to the training data
model.fit(X_train, Y_train, epochs=50, batch_size=10, verbose=1)
Epoch 1/50
233/233 [==============================] - 0s 288us/step - loss: 0.6927 - acc: 0.5794
Epoch 2/50
233/233 [==============================] - 0s 245us/step - loss: 0.6910 - acc: 0.7210
Epoch 3/50
233/233 [==============================] - 0s 258us/step - loss: 0.6868 - acc: 0.7639
Epoch 4/50
233/233 [==============================] - 0s 236us/step - loss: 0.6779 - acc: 0.7082
Epoch 5/50
233/233 [==============================] - 0s 236us/step - loss: 0.6619 - acc: 0.8541
Epoch 6/50
233/233 [==============================] - 0s 305us/step - loss: 0.6340 - acc: 0.8283
Epoch 7/50
233/233 [==============================] - 0s 227us/step - loss: 0.5963 - acc: 0.8541
Epoch 8/50
233/233 [==============================] - 0s 305us/step - loss: 0.5446 - acc: 0.9399
Epoch 9/50
233/233 [==============================] - 0s 240us/step - loss: 0.4884 - acc: 0.8884
Epoch 10/50
233/233 [==============================] - 0s 227us/step - loss: 0.4220 - acc: 0.9227
Epoch 11/50
233/233 [==============================] - 0s 322us/step - loss: 0.3603 - acc: 0.9313
Epoch 12/50
233/233 [==============================] - 0s 245us/step - loss: 0.2935 - acc: 0.9614
Epoch 13/50
233/233 [==============================] - 0s 296us/step - loss: 0.2528 - acc: 0.9657
Epoch 14/50
233/233 [==============================] - 0s 330us/step - loss: 0.2087 - acc: 0.9657
Epoch 15/50
233/233 [==============================] - 0s 305us/step - loss: 0.1788 - acc: 0.9871
Epoch 16/50
233/233 [==============================] - 0s 313us/step - loss: 0.1605 - acc: 0.9700
Epoch 17/50
233/233 [==============================] - 0s 309us/step - loss: 0.1389 - acc: 0.9828
Epoch 18/50
233/233 [==============================] - 0s 335us/step - loss: 0.1258 - acc: 0.9785
Epoch 19/50
233/233 [==============================] - 0s 343us/step - loss: 0.1108 - acc: 0.9871
Epoch 20/50
233/233 [==============================] - 0s 399us/step - loss: 0.1004 - acc: 0.9871
Epoch 21/50
233/233 [==============================] - 0s 416us/step - loss: 0.0910 - acc: 0.9871
Epoch 22/50
233/233 [==============================] - 0s 343us/step - loss: 0.0820 - acc: 0.9871
Epoch 23/50
233/233 [==============================] - 0s 361us/step - loss: 0.0752 - acc: 0.9914
Epoch 24/50
233/233 [==============================] - 0s 356us/step - loss: 0.0714 - acc: 0.9957
Epoch 25/50
233/233 [==============================] - 0s 309us/step - loss: 0.0634 - acc: 0.9957
Epoch 26/50
233/233 [==============================] - 0s 339us/step - loss: 0.0585 - acc: 0.9957
Epoch 27/50
233/233 [==============================] - 0s 335us/step - loss: 0.0571 - acc: 1.0000
Epoch 28/50
233/233 [==============================] - 0s 429us/step - loss: 0.0526 - acc: 0.9957
Epoch 29/50
233/233 [==============================] - 0s 335us/step - loss: 0.0474 - acc: 1.0000
Epoch 30/50
233/233 [==============================] - 0s 322us/step - loss: 0.0463 - acc: 0.9957
Epoch 31/50
233/233 [==============================] - 0s 296us/step - loss: 0.0431 - acc: 1.0000
Epoch 32/50
233/233 [==============================] - 0s 348us/step - loss: 0.0381 - acc: 1.0000
Epoch 33/50
233/233 [==============================] - 0s 322us/step - loss: 0.0357 - acc: 1.0000
Epoch 34/50
233/233 [==============================] - 0s 292us/step - loss: 0.0331 - acc: 1.0000
Epoch 35/50
233/233 [==============================] - 0s 305us/step - loss: 0.0316 - acc: 1.0000
Epoch 36/50
233/233 [==============================] - 0s 335us/step - loss: 0.0294 - acc: 1.0000
Epoch 37/50
233/233 [==============================] - 0s 322us/step - loss: 0.0282 - acc: 1.0000
Epoch 38/50
233/233 [==============================] - 0s 236us/step - loss: 0.0281 - acc: 1.0000
Epoch 39/50
233/233 [==============================] - 0s 339us/step - loss: 0.0253 - acc: 1.0000
Epoch 40/50
233/233 [==============================] - 0s 223us/step - loss: 0.0252 - acc: 1.0000
Epoch 41/50
233/233 [==============================] - 0s 326us/step - loss: 0.0226 - acc: 1.0000
Epoch 42/50
233/233 [==============================] - 0s 326us/step - loss: 0.0213 - acc: 1.0000
Epoch 43/50
233/233 [==============================] - 0s 219us/step - loss: 0.0203 - acc: 1.0000
Epoch 44/50
233/233 [==============================] - 0s 215us/step - loss: 0.0193 - acc: 1.0000
Epoch 45/50
233/233 [==============================] - 0s 318us/step - loss: 0.0190 - acc: 1.0000
Epoch 46/50
233/233 [==============================] - 0s 232us/step - loss: 0.0176 - acc: 1.0000
Epoch 47/50
233/233 [==============================] - 0s 215us/step - loss: 0.0163 - acc: 1.0000
Epoch 48/50
233/233 [==============================] - 0s 202us/step - loss: 0.0161 - acc: 1.0000
Epoch 49/50
233/233 [==============================] - 0s 240us/step - loss: 0.0154 - acc: 1.0000
Epoch 50/50
233/233 [==============================] - 0s 223us/step - loss: 0.0150 - acc: 1.0000
Out[18]:
<keras.callbacks.History at 0x12000f28>

6. Testing and Performance Metrics

Now that the model has been trained, we evaluate its performance on the testing dataset. The model has never seen these examples before, so the test set tells us how well the model generalizes to data that wasn't used during training.

In [19]:
# generate classification report using predictions for categorical model
from sklearn.metrics import classification_report, accuracy_score

predictions = model.predict_classes(X_test)
predictions
Out[19]:
array([1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0], dtype=int64)
In [20]:
print('Results for Categorical Model')
print(accuracy_score(Y_test[['YES']], predictions))
print(classification_report(Y_test[['YES']], predictions))
Results for Categorical Model
0.9661016949152542
             precision    recall  f1-score   support

          0       0.97      0.97      0.97        36
          1       0.96      0.96      0.96        23

avg / total       0.97      0.97      0.97        59
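
The report corresponds to 57 of 59 test patients classified correctly (one error in each class). For a complementary view, a confusion matrix breaks those errors down by true and predicted class; a minimal sketch using the same predictions:

from sklearn.metrics import confusion_matrix

# rows are true classes (NO, YES); columns are predicted classes
print(confusion_matrix(Y_test['YES'], predictions))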
