Definition of a Layer of Dense Neurones

Forward Pass

A whole layer of these dense neurones can be written using a matrix product:

$$A^{(l)} = \sigma\!\left(W^{T} A^{(l-1)} + b\right)$$

  • Definitions
    • $A^{(l)}$ are the outputs of the neurones in the layer
    • $W$ are the weights of the neurones in the current layer
    • $b$ is a column vector containing the biases for each neurone
    • $\sigma$ is the activation function performed individually for each entry
    • $^{T}$ denotes the matrix transposition (in this case of the weights)
Size of the Matrices in the Forward Pass
  • $A^{(l-1)}$ (the inputs) takes the form of a matrix of size $n_{l-1} \times m$, where $n_{l-1}$ is the number of neurones in the previous layer and $m$ is the batch size of the network.
  • $W$ (the weights) takes the form of a matrix of size $n_{l-1} \times n_{l}$, where $n_{l}$ is the number of neurones in the current layer.
  • The output of our current layer $A^{(l)}$ will then be a matrix of size $n_{l} \times m$.
  • The biases $b$ will also take the form of a column vector of size $n_{l} \times 1$, broadcast across the batch.
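As a shape check, here is a minimal NumPy sketch of one forward pass through such a layer (the sizes $n_{l-1} = 3$, $n_{l} = 2$, $m = 4$ and the sigmoid activation are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev, n_curr, m = 3, 2, 4                # assumed sizes, for illustration only

X = rng.standard_normal((n_prev, m))       # inputs: one column per batch element
W = rng.standard_normal((n_prev, n_curr))  # weights of the current layer
b = rng.standard_normal((n_curr, 1))       # biases: column vector, broadcast over the batch

def sigma(z):
    # activation function, applied individually to each entry
    return 1 / (1 + np.exp(-z))

A = sigma(W.T @ X + b)                     # forward pass: A = sigma(W^T X + b)

assert A.shape == (n_curr, m)              # one row per current-layer neurone
```

The assertion confirms the size claim above: a $(n_{l-1} \times n_{l})^{T}$ weight matrix times a $(n_{l-1} \times m)$ input matrix yields a $(n_{l} \times m)$ output.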

For example, given a layer $l-1$ with $n_{l-1}$ neurones feeding into a layer $l$ with $n_{l}$ neurones,

with a batch size of $m$ the inputs would take the form of

$$A^{(l-1)} = \begin{bmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & \ddots & \vdots \\ a_{n_{l-1},1} & \cdots & a_{n_{l-1},m} \end{bmatrix}$$

The weights would then take the form

$$W = \begin{bmatrix} w_{1,1} & \cdots & w_{1,n_{l}} \\ \vdots & \ddots & \vdots \\ w_{n_{l-1},1} & \cdots & w_{n_{l-1},n_{l}} \end{bmatrix}$$

When these are multiplied together we get the following output values

$$A^{(l)} = \sigma\!\left(W^{T} A^{(l-1)} + b\right), \qquad a^{(l)}_{j,k} = \sigma\!\left(\sum_{i=1}^{n_{l-1}} w_{i,j}\, a_{i,k} + b_{j}\right)$$
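Concretely, with small assumed values (two previous-layer neurones, two current-layer neurones, batch size two; the numbers are chosen only for illustration) the product works out as:

```python
import numpy as np

X = np.array([[1.0, 2.0],    # outputs of previous-layer neurone 1, per batch element
              [3.0, 4.0]])   # outputs of previous-layer neurone 2
W = np.array([[0.5, -1.0],   # column j holds the weights into current-layer neurone j
              [0.25, 0.0]])
b = np.array([[1.0],
              [-1.0]])       # one bias per current-layer neurone

Z = W.T @ X + b              # pre-activation outputs, one row per current neurone
print(Z)                     # [[ 2.25  3.  ]
                             #  [-2.   -3.  ]]
```

Entry $(j, k)$ of `Z` is exactly $\sum_i w_{i,j} a_{i,k} + b_j$; applying the activation entry-wise would then give $A^{(l)}$.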

Backwards Pass
Warning

The dimensions of the matrices in the following equations are probably not compatible for matrix products, and in practice some might need to be transposed.

Writing $Z^{(l)} = W^{(l)T} A^{(l-1)} + b^{(l)}$ for the pre-activation outputs, $\delta^{(l)} = \frac{\partial C}{\partial Z^{(l)}}$ for the error of layer $l$, and $\odot$ for the element-wise product:

If the layer is not the output layer:

$$\delta^{(l)} = \left(W^{(l+1)}\, \delta^{(l+1)}\right) \odot \sigma'\!\left(Z^{(l)}\right)$$

If the layer is the output layer:

$$\delta^{(l)} = \frac{\partial C}{\partial A^{(l)}} \odot \sigma'\!\left(Z^{(l)}\right)$$

In both cases the gradients for the parameters of the layer are

$$\frac{\partial C}{\partial W^{(l)}} = A^{(l-1)} \left(\delta^{(l)}\right)^{T}, \qquad \frac{\partial C}{\partial b^{(l)}} = \sum_{k=1}^{m} \delta^{(l)}_{:,k}$$

  • Definitions
    • $A^{(l-1)}$ are the outputs of the previous layer
    • $A^{(l)}$ are the outputs of the current layer
    • $A^{(l+1)}$ are the outputs of the next layer
    • $W^{(l)}$ is the weight matrix for the current layer
    • $C$ is the cost function which is being minimised
    • $n_{l}$ is the number of neurones in layer $l$.
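A minimal NumPy sketch of the backward pass under the same conventions, checked against a numerical gradient. The two-layer network, the sigmoid activation, and the squared-error cost are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigma(z):
    return 1 / (1 + np.exp(-z))

def dsigma(z):
    return sigma(z) * (1 - sigma(z))          # sigma'(z)

n0, n1, n2, m = 3, 4, 2, 5                    # assumed layer sizes and batch size
X = rng.standard_normal((n0, m))
W1, b1 = rng.standard_normal((n0, n1)), rng.standard_normal((n1, 1))
W2, b2 = rng.standard_normal((n1, n2)), rng.standard_normal((n2, 1))
T = rng.standard_normal((n2, m))              # targets for the cost

# forward pass
Z1 = W1.T @ X + b1; A1 = sigma(Z1)
Z2 = W2.T @ A1 + b2; A2 = sigma(Z2)
C = 0.5 * np.sum((A2 - T) ** 2)               # cost being minimised

# backward pass
delta2 = (A2 - T) * dsigma(Z2)                # output layer: dC/dA ⊙ sigma'(Z)
delta1 = (W2 @ delta2) * dsigma(Z1)           # hidden layer: (W delta_next) ⊙ sigma'(Z)
dW1 = X @ delta1.T                            # dC/dW1, same shape as W1

# numerical check of one weight gradient (central difference)
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
Cp = 0.5 * np.sum((sigma(W2.T @ sigma(W1p.T @ X + b1) + b2) - T) ** 2)
Cm = 0.5 * np.sum((sigma(W2.T @ sigma(W1m.T @ X + b1) + b2) - T) ** 2)
assert abs((Cp - Cm) / (2 * eps) - dW1[0, 0]) < 1e-5
```

Note that, as the warning above suggests, which factor gets transposed depends entirely on the chosen layout; here `delta1` comes out as $n_{1} \times m$ and `dW1` as $n_{0} \times n_{1}$, matching `W1`.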