I am coding an MNIST digit recognition neural network. I thought I was finished, but when I run the training program the accuracy after each epoch stays the same. I use MSE as my cost function and tanh(x) as my activation function. The learning rate is currently set to 0.1.
Here is the accuracy for the first few epochs, with the first value being the pre-training accuracy:
8.35
9.8
9.8
9.8
9.8
9.8
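For context on why a flat 9.8 is suspicious: it usually means the network predicts the same digit for every image, since each MNIST class is roughly 10% of the data. With tanh, one common cause of this is saturation, which the following self-contained sketch (my own illustration, not your code) demonstrates:

```python
import numpy as np

# Once |z| is large, tanh'(z) = 1 - tanh(z)^2 is nearly zero, so the
# backpropagated deltas vanish and the weights effectively stop moving.
z = np.array([0.0, 1.0, 3.0, 10.0])
g = 1 - np.tanh(z) ** 2  # tanh derivative at each z
print(g)  # roughly [1.0, 0.42, 0.0099, 8.2e-9]
```

With 784 inputs feeding each hidden unit, the pre-activations z can easily land in the saturated region unless the initial weights are small.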
My functions are these:
tanh(z): takes a vector as input and outputs the activated vector
tanhDerivative(z): takes a vector as input and calculates the gradient vector
feedforward(input, stop): calculates the output; stop lets it halt before the last layer, so it can return the activations of any layer
feedforward2(input, stop): same as feedforward, but the returned values have no activation function applied (the raw weighted inputs)
MSE(input, desiredOutput): calculates the MSE
transformer(label): takes a label such as 0 and outputs [[1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1]]
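For reference, here is a minimal self-contained sketch of what these helpers might look like with NumPy column vectors (the shapes, and whether MSE is a sum or a mean, are my assumptions, not your actual code):

```python
import numpy as np

def tanh(z):
    # Elementwise tanh on a column vector such as [[0.1], [0.2], ...]
    return np.tanh(z)

def tanhDerivative(z):
    # Derivative of tanh: 1 - tanh(z)^2, applied elementwise
    return 1 - np.tanh(z) ** 2

def MSE(output, desired):
    # Squared-error cost; assumed here to be a plain sum over output units,
    # which matches the factor of 2 in the backpropagation delta below
    return np.sum((np.asarray(output) - np.asarray(desired)) ** 2)

def transformer(label):
    # Maps a digit label to a 10x1 target column: +1 for the true class, -1 elsewhere
    target = -np.ones((10, 1))
    target[int(label), 0] = 1.0
    return target
```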
The following two functions are where I suspect the error lies:
#The actual training. Backpropagation is a separate method.
def train(self, dataLocation, learnRate, batchSize=100):
    self.bias_updates = self.bias_templates     #This will contain the updates required for the biases
    self.weight_updates = self.weight_templates #This will contain the updates required for the weights
    file = np.loadtxt(dataLocation, delimiter=",", dtype="float128")
    count = 0
    print("Starting training")
    for row in file:
        count += 1
        data = []
        for item in row:
            data.append([item/255]) #Turns an array like [1,2,3,4] into [[1],[2],[3],[4]], which this program needs
        desired = self.Transformer(data.pop(0)) #Look at the transformer method to understand
        self.backpropagation(data, desired)
        if count % 100 == 0:
            for n in range(0, len(self.bias_updates)):
                self.bias_updates[n] *= learnRate
                self.weight_updates[n] *= learnRate
                self.biases[n] += self.bias_updates[n]
                self.weights[n] += self.weight_updates[n]
            self.bias_updates = self.bias_templates
            self.weight_updates = self.weight_templates
            print(count//100)
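One detail worth double-checking in train(): an assignment like self.bias_updates = self.bias_templates binds the update buffers to the same list of arrays as the templates rather than copying them, so accumulating with += would mutate the templates too, and the "reset" after each batch would not actually zero anything. A minimal sketch of the difference, using stand-in lists of NumPy arrays (not your class):

```python
import numpy as np

templates = [np.zeros((2, 1))]

# Aliased "reset": updates and templates refer to the same array objects
updates = templates
updates[0] += 1.0
aliased_leak = templates[0][0, 0]   # the template itself was mutated to 1.0

# Independent reset: fresh zero arrays each time
templates = [np.zeros((2, 1))]
updates = [np.zeros_like(t) for t in templates]
updates[0] += 1.0
clean_template = templates[0][0, 0]  # the template stays at 0.0
```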
#The heart of the training algorithm: BACKPROPAGATION
#NOTE: the learning rate is only applied in train()
def backpropagation(self, input, desiredOutput):
    SigmoidLastLayerActivations = np.array(self.feedforward(input, self.size-1))
    LastLayerActivation = np.array(self.feedforward2(input, self.size-1))
    δ = 2 * (SigmoidLastLayerActivations - desiredOutput) * self.tanhDerivative(LastLayerActivation) #The initial delta, i.e. the standard ∂C/∂z(L)
    self.bias_updates[-1] += δ #The update for the biases in the last layer
    self.weight_updates[-1] += np.matmul(δ, np.transpose(self.feedforward(input, self.size-2))) #The update for the weights in the last layer
    for i in range(len(self.weight_templates)-2, -1, -1):
        requiredWeights = np.transpose(np.array(self.weights[i+1])) #The transposed weight matrix from the formulas
        LayerActivations = np.array(self.feedforward2(input, i+1)) #The z values from the formulas
        SigmoidLayerActivations = np.transpose(np.array(self.feedforward(input, i))) #Look at the formulas
        δ = np.matmul(requiredWeights, δ) * self.tanhDerivative(LayerActivations)
        self.bias_updates[i] += δ
        otherVariable = np.matmul(δ, SigmoidLayerActivations) #The accumulated update for this layer's weights
        self.weight_updates[i] += otherVariable
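Since these two functions are the prime suspects, a finite-difference gradient check is a quick way to validate the backpropagation formulas in isolation. Below is a self-contained toy version (a single tanh layer with a sum-of-squares cost, so the same delta rule as the last-layer formula above, but none of your class internals):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 1))
x = rng.normal(size=(4, 1))
y = rng.normal(size=(3, 1))

def cost(W, b):
    a = np.tanh(W @ x + b)       # forward pass through one tanh layer
    return np.sum((a - y) ** 2)  # sum-of-squares cost

# Analytic gradients, mirroring the delta rule:
# delta = dC/dz = 2*(a - y) * tanh'(z), with tanh'(z) = 1 - tanh(z)^2
z = W @ x + b
a = np.tanh(z)
delta = 2 * (a - y) * (1 - np.tanh(z) ** 2)
grad_b = delta
grad_W = delta @ x.T

# Central finite difference for one weight entry
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (cost(Wp, b) - cost(Wm, b)) / (2 * eps)
print(abs(numeric - grad_W[0, 0]))  # tiny if the analytic formula is right
```

If a check like this passes on the toy layer but training still stalls, the problem is more likely in how the updates are accumulated and applied in train() than in the delta formulas themselves.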
I apologise if this looks confusing. Any help on why the accuracy always converges to 9.8 would be much appreciated. If any other function is needed to find the error, please ask.