Deep Learning
Part 2
How neural networks learn
In Deep Learning Part 1, we introduced deep learning and the fundamentals of neural networks. In this article, we will learn more about how neural networks work.
In a neural network, data is passed from layer to layer in several steps.
1. Inputs are multiplied by their corresponding weights
2. The weighted inputs are summed
3. The sum is transformed into another value by a mathematical transformation (an activation function)
There are several functions that can be applied in step 3.
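As a minimal sketch of the three steps for a single neuron (the inputs and weights below are made-up values for illustration), with a few candidate functions for step 3:

```python
import math

# Step 3's "math transformation" can be one of several activation functions.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def neuron_output(inputs, weights, activation):
    # Step 1: multiply each input by its corresponding weight
    weighted = [x * w for x, w in zip(inputs, weights)]
    # Step 2: sum the weighted inputs
    total = sum(weighted)
    # Step 3: transform the sum with the chosen activation function
    return activation(total)

inputs, weights = [0.5, -1.0], [0.8, 0.2]       # made-up example values
print(neuron_output(inputs, weights, sigmoid))  # ≈ 0.5498
print(neuron_output(inputs, weights, tanh))     # ≈ 0.1974
print(neuron_output(inputs, weights, relu))     # ≈ 0.2
```

Note how the same weighted sum (0.2 here) produces a different output depending on which activation function is applied.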
Activation Functions
Activation functions define how the weighted sum of a node's inputs is transformed into that node's output. Think of it as an additional transformation performed on the data. Three common examples of activation functions are Sigmoid, Tanh, and ReLU.

Forward Propagation (Forward Pass)
Now, let’s discuss how the neural network passes information from the input layer, through the hidden layers, to the output layer.
At each layer, the data is transformed in three steps.
1. The weighted inputs (input × weight) are summed
2. An activation function is applied to this sum
3. The result is passed to the neurons in the next layer
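The three per-layer steps above can be sketched in Python. This is a minimal illustration with a made-up two-layer network (the layer sizes and weights are arbitrary, and sigmoid is assumed as the activation function):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer_forward(inputs, weight_matrix):
    """Compute the outputs of one layer (one row of weights per neuron)."""
    outputs = []
    for neuron_weights in weight_matrix:
        # Step 1: sum the weighted inputs for this neuron
        total = sum(x * w for x, w in zip(inputs, neuron_weights))
        # Step 2: apply the activation function to the sum
        outputs.append(sigmoid(total))
    return outputs

def forward_pass(inputs, layers):
    activations = inputs
    for weight_matrix in layers:
        # Step 3: pass each layer's result on to the next layer
        activations = layer_forward(activations, weight_matrix)
    return activations  # the output layer's values = the prediction

# Made-up network: 2 inputs -> hidden layer of 2 neurons -> 1 output neuron
layers = [
    [[0.5, -0.6], [0.1, 0.9]],  # hidden layer weights
    [[0.8, -0.4]],              # output layer weights
]
prediction = forward_pass([1.0, 0.5], layers)
print(prediction)  # a single value between 0 and 1 (sigmoid output)
```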
When the output layer is reached, the neural network produces its final prediction. This entire process is called Forward Propagation (Forward Pass).

Loss/Cost Functions
After the forward pass is run on the network, we want some way to measure how well the model is doing. To do this, we can use a loss function, a function that calculates the error of the model. Loss is defined as the difference between the model’s predicted output and the actual output. A common loss function is mean squared error (MSE). Our goal is to minimize the loss, which leads us to the next topic.

Backpropagation (Backward Pass)
Backpropagation (Backward Pass) is the learning process the neural network uses to adjust its weights and minimize error.
Initially, the weights in a network are set to random numbers, so the initial model is inaccurate, with high error. Backpropagation is needed to progressively improve this accuracy.
We need to find out how much each node in a layer contributes to the overall error, and this is exactly what backpropagation computes. The idea is to adjust each weight in proportion to its contribution to the error, eventually arriving at an optimal set of weights. The gradient descent algorithm is what actually performs these weight adjustments.
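As a hedged sketch of how MSE loss and gradient descent fit together, here is the simplest possible case: a single weight fit to made-up data. A real network backpropagates the error through every layer via the chain rule, but the core loop is the same: compute the loss gradient, then adjust weights against it.

```python
# Made-up training data for illustration; the true relationship is y = 2 * x
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0              # the weight starts far from the right value
learning_rate = 0.1  # arbitrary step size for this sketch

def mse(weight):
    # Loss: mean squared difference between predicted and actual output
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)

for _ in range(100):
    # Backward pass: gradient of the MSE loss with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
    # Gradient descent: adjust the weight in proportion to its error contribution
    w -= learning_rate * grad

print(round(w, 3))  # ≈ 2.0, and the loss is now close to zero
```

Each iteration moves the weight a small step in the direction that reduces the loss, which is why the process is described as progressively improving the model's accuracy.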