Diagnosing Broken Neural Network Training: A Small Case Study
Background
Recently, I’ve been catching up on my practical neural network knowledge, first by going through Michael Nielsen’s Neural Networks and Deep Learning online book and then the fast.ai course.
After implementing a toy network consisting only of fully-connected (i.e. dense) layers to classify handwritten digits from the MNIST database, I figured I’d take a stab at building a new network for the same task using Keras and entering the results into the Kaggle Digit Recognizer competition.
There are quite a few guides to creating an MNIST handwritten digit classifier using Keras, but the one I primarily followed was an article by Jason Brownlee.
The Model
For reference, the original convolutional model I attempted to play with was written with Keras as follows:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Convolution and pooling extract spatial features from the 28x28 images.
model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
# Dense layers map the extracted features onto the 10 digit classes.
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
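Note that input_shape=(1, 28, 28) implies a channels-first image layout, so the raw 28x28 MNIST arrays need an explicit single-channel dimension added before training. A minimal sketch of that step (assuming the Keras backend is configured for channels-first ordering; variable names mirror the snippets below):

from keras.datasets import mnist

# MNIST images load as (samples, 28, 28); add an explicit channel
# dimension so they match input_shape=(1, 28, 28).
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')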
The Initial Problem
After finishing my first draft of the implementation, I ran into an error when attempting to train the model. Executing the statement
model.fit(X_train, Y_train, validation_split=0.1, epochs=10, batch_size=200,
          verbose=2)
yielded the error
ValueError: Error when checking target: expected dense_2 to have shape (None,
10) but got array with shape (60000, 1)
Because the error referred to dense_2, the second dense layer (the softmax output layer responsible for the classification), Keras was evidently expecting a one-hot target matrix with 10 columns but receiving a single column of labels instead. I figured the problem lay somewhere in the label data, so I began looking into it with some cowboy diagnostics at the point right before training.
Y_train = np_utils.to_categorical(Y_train)
num_classes = Y_test.shape[1]
# Display some diagnostic data.
print(Y_train.shape)
print(num_classes)
# Train model.
model.fit(X_train, Y_train, validation_split=0.1, epochs=10, batch_size=200,
          verbose=2)
Not surprisingly, Y_train.shape yielded (60000, 1), which meant that it was not being split into a dedicated dimension per class as expected for the output from np_utils.to_categorical(); the proper shape should have been (60000, 10). However, the test label array Y_test was returning the expected shape of (10000, 10), which was perplexing.
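For reference, np_utils.to_categorical behaves the way I was expecting when given untouched integer labels. A quick sanity check (not from the original script) shows what a well-formed conversion looks like:

import numpy as np
from keras.utils import np_utils

labels = np.array([5, 0, 4, 9])

# to_categorical infers the number of classes from the largest label it
# sees, so genuine digit labels (0-9) produce 10 one-hot columns.
one_hot = np_utils.to_categorical(labels)
print(one_hot.shape)  # (4, 10)
print(one_hot[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]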
Upon further digging, after passing through np_utils.to_categorical(), the value of Y_train was just an array of zeroes. I eventually attempted to force np_utils.to_categorical() to return a 60000 x 10 array by explicitly passing in the number of classes through np_utils.to_categorical(Y_train, 10), which allowed training to proceed, but yielded another problem.
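Concretely, the only change at this point was the one-hot conversion line, which became something like:

# Force a 10-column one-hot encoding rather than letting
# to_categorical infer the class count from the label values.
Y_train = np_utils.to_categorical(Y_train, 10)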
The Network Isn’t Training
The network did not seem to be learning anything beyond the first training epoch, yet its reported accuracy was 100% after every epoch, and the overall loss and accuracy were exactly the same across training runs, even with many different tweaks to the model. This seemed very wrong!
Because the model itself seemed an unlikely culprit, I began to look at the underlying data entering the model. The first step was to investigate whether the original data checked out. Since I had been running into problems converting the label data into the proper dimensions, I started with the labels (i.e. Y_train).
# Load MNIST data.
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# Display label data array shape.
print(Y_train.shape)
# Count the number of occurrences of each class label.
data_labels = {
    0: 0,
    1: 0,
    2: 0,
    3: 0,
    4: 0,
    5: 0,
    6: 0,
    7: 0,
    8: 0,
    9: 0,
}
for label in Y_train:
    data_labels[label] += 1
# Display some diagnostic data.
print(data_labels)
print(sum(data_labels.values()))
I used a dictionary of “buckets” to count the occurrence of each class in the labels. When displaying the counts of each class, they seemed reasonably distributed, and the total number of labels amounted to the proper size: 60,000. No surprises there - the source data itself looked okay.
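A terser equivalent of the bucket dictionary, for what it’s worth, is collections.Counter:

from collections import Counter

# Count how many times each digit label appears in the training set.
counts = Counter(Y_train)
print(counts)
print(sum(counts.values()))  # should total 60000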
I then proceeded to look at the label data again at the point right before calling model.fit(). Although the shape was now correct, Y_train was an array where every label was set to class 0 (i.e. every row was [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]). Not seeing the source of the issue, I figured it might be the way the data was transformed by np_utils.to_categorical(), so I wrote a patch to get around it:
# Manually one-hot encode Y_train into a 60000 x 10 array.
tmp = np.zeros((Y_train.shape[0], 10))
for i, y in enumerate(Y_train):
    for j in range(10):
        if j == y:
            tmp[i][j] = 1.0
Y_train = tmp
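A terser NumPy equivalent of this patch (assuming the labels are still integers) would be to index into an identity matrix:

# Each integer label selects the corresponding one-hot row of a
# 10 x 10 identity matrix.
Y_train = np.eye(10)[Y_train]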
Checking Y_train right after the patch yielded an array that looked exactly like I needed it to, but it still did not change the training results! Nor did it change the apparent values held in Y_train right before the call to model.fit().
Scratching my head, I checked for anything that might have modified Y_train before it passed through np_utils.to_categorical(). Lo and behold! There was a line I had mindlessly fudged, accidentally normalizing the training labels instead of the testing input:
X_train = X_train / 255
Y_train = Y_train / 255
instead of
X_train = X_train / 255
X_test = X_test / 255
After fixing that, the network began training normally without any issues!
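In hindsight, the earlier symptoms all follow from that one line. Dividing the labels by 255 turns every label into a fraction below 1, so np_utils.to_categorical sees a maximum label of 0 and infers a single class; when forced to 10 classes, it maps every example to class 0, which would also explain the suspiciously constant 100% accuracy. A small sketch reproducing the effect (not from the original script):

import numpy as np
from keras.utils import np_utils

labels = np.array([5, 0, 4, 9])

# Dividing by 255 leaves only fractions below 1; to_categorical casts
# them to int 0 and infers a single class, hence the (60000, 1) shape
# I was seeing.
broken = labels / 255
print(np_utils.to_categorical(broken).shape)   # (4, 1)

# Forcing 10 classes "works", but every example becomes class 0.
print(np_utils.to_categorical(broken, 10)[0])  # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]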
Conclusion
This wasn’t a particularly profound problem, and the error was quite stupid, but hopefully it gives others some insight into diagnosing bugs in their own neural networks. It might also make you feel less bad about your own frustrations and mistakes.