I am coding an MNIST digit recognition neural network. I thought I was finished, but when I run the training program the accuracy after each epoch stays the same. I use MSE as my cost function and tanh(x) as my activation function. The learning rate is currently set to 0.1.
Here is the accuracy for the first few epochs, with the first value being the pre-training accuracy:
8.35
9.8
9.8
9.8
9.8
9.8
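For context on why a flat 9.8 is suspicious: it usually means the network predicts the same digit for every image, since each MNIST class is roughly 10% of the data. With tanh, one common cause of this is saturation, which the following self-contained sketch (my own illustration, not your code) demonstrates:

```python
import numpy as np

# Once |z| is large, tanh'(z) = 1 - tanh(z)^2 is nearly zero, so the
# backpropagated deltas vanish and the weights effectively stop moving.
z = np.array([0.0, 1.0, 3.0, 10.0])
g = 1 - np.tanh(z) ** 2  # tanh derivative at each z
print(g)  # roughly [1.0, 0.42, 0.0099, 8.2e-9]
```

With 784 inputs feeding each hidden unit, the pre-activations z can easily land in the saturated region unless the initial weights are small.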
My functions are these:
tanh(z): takes a vector as input and outputs the activated vector
tanhDerivative(z): takes a vector as input and calculates the gradient vector
feedforward(input, stop): calculates the output; stop lets it halt before the last layer, so it can return the activations of any layer
feedforward2(input, stop): same as feedforward, but the returned values have no activation function applied (the raw weighted inputs)
MSE(input, desiredOutput): calculates the MSE
transformer(label): takes a label such as 0 and outputs [[1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1]]
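For reference, here is a minimal self-contained sketch of what these helpers might look like with NumPy column vectors (the shapes, and whether MSE is a sum or a mean, are my assumptions, not your actual code):

```python
import numpy as np

def tanh(z):
    # Elementwise tanh on a column vector such as [[0.1], [0.2], ...]
    return np.tanh(z)

def tanhDerivative(z):
    # Derivative of tanh: 1 - tanh(z)^2, applied elementwise
    return 1 - np.tanh(z) ** 2

def MSE(output, desired):
    # Squared-error cost; assumed here to be a plain sum over output units,
    # which matches the factor of 2 in the backpropagation delta below
    return np.sum((np.asarray(output) - np.asarray(desired)) ** 2)

def transformer(label):
    # Maps a digit label to a 10x1 target column: +1 for the true class, -1 elsewhere
    target = -np.ones((10, 1))
    target[int(label), 0] = 1.0
    return target
```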
The following two functions are where I suspect the error lies:
#The actual training. Backpropagation is a separate method.
def train(self, dataLocation, learnRate, batchSize=100):
    self.bias_updates = self.bias_templates     #This will contain the updates required for the biases
    self.weight_updates = self.weight_templates #This will contain the updates required for the weights
    file = np.loadtxt(dataLocation, delimiter=",", dtype="float128")
    count = 0
    print("Starting training")
    for row in file:
        count += 1
        data = []
        for item in row:
            data.append([item/255]) #Turns an array like [1,2,3,4] into [[1],[2],[3],[4]], which this program needs
        desired = self.Transformer(data.pop(0)) #Look at the transformer method to understand
        self.backpropagation(data, desired)
        if count % 100 == 0:
            for n in range(0, len(self.bias_updates)):
                self.bias_updates[n] *= learnRate
                self.weight_updates[n] *= learnRate
                self.biases[n] += self.bias_updates[n]
                self.weights[n] += self.weight_updates[n]
            self.bias_updates = self.bias_templates
            self.weight_updates = self.weight_templates
            print(count//100)
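One detail worth double-checking in train(): an assignment like self.bias_updates = self.bias_templates binds the update buffers to the same list of arrays as the templates rather than copying them, so accumulating with += would mutate the templates too, and the "reset" after each batch would not actually zero anything. A minimal sketch of the difference, using stand-in lists of NumPy arrays (not your class):

```python
import numpy as np

templates = [np.zeros((2, 1))]

# Aliased "reset": updates and templates refer to the same array objects
updates = templates
updates[0] += 1.0
aliased_leak = templates[0][0, 0]   # the template itself was mutated to 1.0

# Independent reset: fresh zero arrays each time
templates = [np.zeros((2, 1))]
updates = [np.zeros_like(t) for t in templates]
updates[0] += 1.0
clean_template = templates[0][0, 0]  # the template stays at 0.0
```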
#The heart of the training algorithm: BACKPROPAGATION
#NOTE: the learning rate is only applied in train()
def backpropagation(self, input, desiredOutput):
    SigmoidLastLayerActivations = np.array(self.feedforward(input, self.size-1))
    LastLayerActivation = np.array(self.feedforward2(input, self.size-1))
    δ = 2 * (SigmoidLastLayerActivations - desiredOutput) * self.tanhDerivative(LastLayerActivation) #The initial delta, i.e. the standard ∂C/∂z(L)
    self.bias_updates[-1] += δ #The update for the biases in the last layer
    self.weight_updates[-1] += np.matmul(δ, np.transpose(self.feedforward(input, self.size-2))) #The update for the weights in the last layer
    for i in range(len(self.weight_templates)-2, -1, -1):
        requiredWeights = np.transpose(np.array(self.weights[i+1])) #The transposed weight matrix from the formulas
        LayerActivations = np.array(self.feedforward2(input, i+1)) #The z values from the formulas
        SigmoidLayerActivations = np.transpose(np.array(self.feedforward(input, i))) #Look at the formulas
        δ = np.matmul(requiredWeights, δ) * self.tanhDerivative(LayerActivations)
        self.bias_updates[i] += δ
        otherVariable = np.matmul(δ, SigmoidLayerActivations) #The accumulated update for this layer's weights
        self.weight_updates[i] += otherVariable
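Since these two functions are the prime suspects, a finite-difference gradient check is a quick way to validate the backpropagation formulas in isolation. Below is a self-contained toy version (a single tanh layer with a sum-of-squares cost, so the same delta rule as the last-layer formula above, but none of your class internals):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 1))
x = rng.normal(size=(4, 1))
y = rng.normal(size=(3, 1))

def cost(W, b):
    a = np.tanh(W @ x + b)       # forward pass through one tanh layer
    return np.sum((a - y) ** 2)  # sum-of-squares cost

# Analytic gradients, mirroring the delta rule:
# delta = dC/dz = 2*(a - y) * tanh'(z), with tanh'(z) = 1 - tanh(z)^2
z = W @ x + b
a = np.tanh(z)
delta = 2 * (a - y) * (1 - np.tanh(z) ** 2)
grad_b = delta
grad_W = delta @ x.T

# Central finite difference for one weight entry
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (cost(Wp, b) - cost(Wm, b)) / (2 * eps)
print(abs(numeric - grad_W[0, 0]))  # tiny if the analytic formula is right
```

If a check like this passes on the toy layer but training still stalls, the problem is more likely in how the updates are accumulated and applied in train() than in the delta formulas themselves.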
I apologise if this looks confusing. Any help on why the accuracy always converges to 9.8 would be much appreciated. If any other function is needed to find the error, please ask.