In this Project, we'll:
Deep neural networks have been used to reach state-of-the-art performance on image classification tasks in the last decade. For some image classification tasks, deep neural networks actually perform as well as or slightly better than the human benchmark. We can read about the history of deep neural networks here.
Before the year 2000, institutions like the United States Post Office used handwriting recognition software to read addresses, zip codes, and more. One of their approaches, which consists of pre-processing handwritten images then feeding to a neural network model is detailed in this paper.
Why is image classification a hard task?
Within the field of machine learning and pattern recognition, image classification (especially for handwritten text) is towards the difficult end of the spectrum. There are a few reasons for this.
Why is deep learning effective in image classification?
Deep learning is effective in image classification because of the models' ability to learn hierarchical representations.
Scikit-learn contains a number of datasets pre-loaded with the library, within the namespace of sklearn.datasets
. The load_digits()
function returns a copy of the hand-written digits dataset from UCI.
Because dataframes are a tabular representation of data, each image is represented as a row of pixel values. To visualize an image from the dataframe, we need to reshape the image back to its original dimensions (28 x 28 pixels). To visualize the image, we need to reshape these pixel values back into the 28 by 28 and plot them on a coordinate grid.
from sklearn.datasets import load_digits
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
digits_data = load_digits()
digits_data.keys()
dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])
labels = pd.Series(digits_data['target'])
data = pd.DataFrame(digits_data['data'])
data.head(10)
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 5.0 | 13.0 | 9.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 13.0 | 10.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 12.0 | 13.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | 16.0 | 10.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 4.0 | 15.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 16.0 | 9.0 | 0.0 |
3 | 0.0 | 0.0 | 7.0 | 15.0 | 13.0 | 1.0 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 9.0 | 0.0 | 0.0 | 0.0 | 7.0 | 13.0 | 13.0 | 9.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 1.0 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 16.0 | 4.0 | 0.0 | 0.0 |
5 | 0.0 | 0.0 | 12.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 9.0 | 16.0 | 16.0 | 10.0 | 0.0 | 0.0 |
6 | 0.0 | 0.0 | 0.0 | 12.0 | 13.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 8.0 | 0.0 | 0.0 | 0.0 | 1.0 | 9.0 | 15.0 | 11.0 | 3.0 | 0.0 |
7 | 0.0 | 0.0 | 7.0 | 8.0 | 13.0 | 16.0 | 15.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 | 0.0 | 0.0 | 9.0 | 14.0 | 8.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 8.0 | 0.0 | 0.0 | 0.0 | 11.0 | 16.0 | 15.0 | 11.0 | 1.0 | 0.0 |
9 | 0.0 | 0.0 | 11.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 9.0 | 12.0 | 13.0 | 3.0 | 0.0 | 0.0 |
10 rows × 64 columns
first_image = data.iloc[0]
np_image = first_image.values
np_image = np_image.reshape(8,8)
plt.imshow(np_image, cmap='gray_r')
<matplotlib.image.AxesImage at 0x2a7ace36090>
f, axarr = plt.subplots(2, 4)
axarr[0, 0].imshow(data.iloc[0].values.reshape(8,8), cmap='gray_r')
axarr[0, 1].imshow(data.iloc[99].values.reshape(8,8), cmap='gray_r')
axarr[0, 2].imshow(data.iloc[199].values.reshape(8,8), cmap='gray_r')
axarr[0, 3].imshow(data.iloc[299].values.reshape(8,8), cmap='gray_r')
axarr[1, 0].imshow(data.iloc[999].values.reshape(8,8), cmap='gray_r')
axarr[1, 1].imshow(data.iloc[1099].values.reshape(8,8), cmap='gray_r')
axarr[1, 2].imshow(data.iloc[1199].values.reshape(8,8), cmap='gray_r')
axarr[1, 3].imshow(data.iloc[1299].values.reshape(8,8), cmap='gray_r')
<matplotlib.image.AxesImage at 0x2a7ace113d0>
While linear and logistic regression models make assumptions about the linearity between the features and the output labels, the k-nearest neighbors algorithm make no such assumption. This allows them to capture nonlinearity in the data. If you recall, k-nearest neighbors don't have a specific model representation (hence why it's referred to as an algorithm and not a model).
The k-nearest neighbors algorithm compares every unseen observation in the test set to all (or many, as some implementations constrain the search space) training observations to look for similar (or the "nearest") observations. Then, the algorithm finds the label with the most nearby observations and assigns that as the prediction for the unseen observation.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
# 50% Train / test validation
def train_knn(nneighbors, train_features, train_labels):
knn = KNeighborsClassifier(n_neighbors = nneighbors)
knn.fit(train_features, train_labels)
return knn
def test(model, test_features, test_labels):
predictions = model.predict(test_features)
train_test_df = pd.DataFrame()
train_test_df['correct_label'] = test_labels
train_test_df['predicted_label'] = predictions
overall_accuracy = sum(train_test_df["predicted_label"] == train_test_df["correct_label"])/len(train_test_df)
return overall_accuracy
def cross_validate(k):
fold_accuracies = []
kf = KFold(n_splits = 4, random_state=2, shuffle=True)
for train_index, test_index in kf.split(data):
train_features, test_features = data.loc[train_index], data.loc[test_index]
train_labels, test_labels = labels.loc[train_index], labels.loc[test_index]
model = train_knn(k, train_features, train_labels)
overall_accuracy = test(model, test_features, test_labels)
fold_accuracies.append(overall_accuracy)
return fold_accuracies
knn_one_accuracies = cross_validate(1)
np.mean(knn_one_accuracies)
0.9888728037614452
k_values = list(range(1,10))
k_overall_accuracies = []
for k in k_values:
k_accuracies = cross_validate(k)
k_mean_accuracy = np.mean(k_accuracies)
k_overall_accuracies.append(k_mean_accuracy)
plt.figure(figsize=(8,4))
plt.title("Mean Accuracy vs. k")
plt.plot(k_values, k_overall_accuracies)
[<matplotlib.lines.Line2D at 0x2a7af40d410>]
There are a few downsides to using k-nearest neighbors:
high memory usage (for each new unseen observation, many comparisons need to be made to seen observations) no model representation to debug and explore.
Let's now try a neural network with a single hidden layer. Use the MLPClassifier package from scikit-learn.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier
# 50% Train / test validation
def train_nn(neuron_arch, train_features, train_labels):
mlp = MLPClassifier(hidden_layer_sizes=neuron_arch, max_iter=2000)
mlp.fit(train_features, train_labels)
return mlp
def test(model, test_features, test_labels):
predictions = model.predict(test_features)
train_test_df = pd.DataFrame()
train_test_df['correct_label'] = test_labels
train_test_df['predicted_label'] = predictions
overall_accuracy = sum(train_test_df["predicted_label"] == train_test_df["correct_label"])/len(train_test_df)
return overall_accuracy
def cross_validate(neuron_arch):
fold_accuracies = []
kf = KFold(n_splits = 4, random_state=2, shuffle=True)
for train_index, test_index in kf.split(data):
train_features, test_features = data.loc[train_index], data.loc[test_index]
train_labels, test_labels = labels.loc[train_index], labels.loc[test_index]
model = train_nn(neuron_arch, train_features, train_labels)
overall_accuracy = test(model, test_features, test_labels)
fold_accuracies.append(overall_accuracy)
return fold_accuracies
nn_one_neurons = [
(8,),
(16,),
(32,),
(64,),
(128,),
(256,)
]
nn_one_accuracies = []
for n in nn_one_neurons:
nn_accuracies = cross_validate(n)
nn_mean_accuracy = np.mean(nn_accuracies)
nn_one_accuracies.append(nn_mean_accuracy)
plt.figure(figsize=(8,4))
plt.title("Mean Accuracy vs. Neurons In Single Hidden Layer")
x = [i[0] for i in nn_one_neurons]
plt.plot(x, nn_one_accuracies)
[<matplotlib.lines.Line2D at 0x2a7bcb43750>]
It looks like adding more neurons to the single hidden layer improved simple accuracy to approximately 97%. Simple accuracy computes the number of correct classifications the model made, but doesn't tell us anything about false or true positives or false or true negatives.
Given that k-nearest neighbors achieved approximately 98% accuracy, there doesn't seem to be any advantages to using a single hidden layer neural network for this problem.
nn_two_neurons = [
(64,64),
(128, 128),
(256, 256)
]
nn_two_accuracies = []
for n in nn_two_neurons:
nn_accuracies = cross_validate(n)
nn_mean_accuracy = np.mean(nn_accuracies)
nn_two_accuracies.append(nn_mean_accuracy)
plt.figure(figsize=(8,4))
plt.title("Mean Accuracy vs. Neurons In Two Hidden Layers")
x = [i[0] for i in nn_two_neurons]
plt.plot(x, nn_two_accuracies)
[<matplotlib.lines.Line2D at 0x2a7af9c60d0>]
nn_two_accuracies
[0.964390002474635, 0.976077703538728, 0.9788604305864885]
Using two hidden layers improved our simple accuracy to 98%. While, traditionally, we might worry about overfitting, using four-fold cross validation also gives us a bit more assurance that the model is generalizing to achieve the extra 1% in simple accuracy over the single hidden layer networks we tried earlier.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
# 50% Train / test validation
def train_nn(neuron_arch, train_features, train_labels):
mlp = MLPClassifier(hidden_layer_sizes=neuron_arch, max_iter=2000)
mlp.fit(train_features, train_labels)
return mlp
def test(model, test_features, test_labels):
predictions = model.predict(test_features)
train_test_df = pd.DataFrame()
train_test_df['correct_label'] = test_labels
train_test_df['predicted_label'] = predictions
overall_accuracy = sum(train_test_df["predicted_label"] == train_test_df["correct_label"])/len(train_test_df)
return overall_accuracy
def cross_validate_six(neuron_arch):
fold_accuracies = []
kf = KFold(n_splits = 6, random_state=2, shuffle=True)
for train_index, test_index in kf.split(data):
train_features, test_features = data.loc[train_index], data.loc[test_index]
train_labels, test_labels = labels.loc[train_index], labels.loc[test_index]
model = train_nn(neuron_arch, train_features, train_labels)
overall_accuracy = test(model, test_features, test_labels)
fold_accuracies.append(overall_accuracy)
return fold_accuracies
nn_three_neurons = [
(10, 10, 10),
(64, 64, 64),
(128, 128, 128)
]
nn_three_accuracies = []
for n in nn_three_neurons:
nn_accuracies = cross_validate_six(n)
nn_mean_accuracy = np.mean(nn_accuracies)
nn_three_accuracies.append(nn_mean_accuracy)
plt.figure(figsize=(8,4))
plt.title("Mean Accuracy vs. Neurons In Three Hidden Layers")
x = [i[0] for i in nn_three_neurons]
plt.plot(x, nn_three_accuracies)
[<matplotlib.lines.Line2D at 0x2a7acb54210>]
nn_three_accuracies
[0.9310089186176143, 0.9766369379412859, 0.9783091787439613]
Using three hidden layers returned a simple accuracy of nearly 98%, even with six-fold cross validation.