Sponsored by AI Academies

Deep Learning

Part 2

How neural networks learn

In Deep Learning Part 1, we introduced deep learning and the fundamentals of neural networks. In this article, we will learn more about how neural networks work.

In a neural network, data is passed from layer to layer in several steps.

- Each input is multiplied by its corresponding weight
- Weighted inputs are summed
- The sum is converted to another value using a math transformation (activation function)

Several different functions can be used as the activation function in the third step.
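The three steps above can be sketched for a single neuron. This is a minimal illustration in Python with NumPy, not a full network; the function name `neuron_output` and the input and weight values are made up for the example, and sigmoid is used only as one possible activation.

```python
import numpy as np

def sigmoid(z):
    # One possible activation function (chosen here only for illustration).
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, activation):
    # Hypothetical helper: runs the three steps for one neuron.
    weighted = inputs * weights      # step 1: multiply inputs by weights
    total = weighted.sum()           # step 2: sum the weighted inputs
    return activation(total)         # step 3: apply the activation function

x = np.array([1.0, 2.0, 3.0])        # illustrative inputs
w = np.array([0.5, -0.25, 0.0])      # illustrative weights
out = neuron_output(x, w, sigmoid)   # weighted sum is 0.0, so sigmoid gives 0.5
print(out)
```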

Activation Functions

Activation functions define how the weighted sum of a node’s inputs is transformed into the node’s output. Think of an activation function as an additional transformation performed on the data. Below are three main examples of activation functions: Sigmoid, Tanh, and ReLU.
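For reference, the three functions named above can be written in a few lines. This sketch uses Python with NumPy and the standard textbook definitions; the sample values in `z` are arbitrary.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into the range (-1, 1).
    return np.tanh(z)

def relu(z):
    # Keeps positive values and replaces negative values with 0.
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])   # arbitrary sample weighted sums
for fn in (sigmoid, tanh, relu):
    print(fn.__name__, fn(z))
```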

Forward Propagation (Forward Pass)

Now, let’s discuss how the neural network passes information from the input layer, through the hidden layers, to the output layer.

At each layer, the data is transformed in three steps.

- The weighted inputs (input × weight) are summed
- An activation function is applied to this sum
- The result is passed to the neurons in the next layer

When the output layer is reached, the final prediction of the neural network is made. This entire process is called **Forward Propagation (Forward Pass)**.
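A forward pass can be sketched as a loop over layers. The example below assumes a tiny network with two inputs, three hidden neurons, and one output; the weight matrices, the choice of ReLU for the hidden layer, and the helper name `forward` are all illustrative assumptions rather than anything prescribed by this article.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    # No transformation; used here for the output layer.
    return z

def forward(x, layers):
    # layers is a list of (weight_matrix, activation) pairs.
    a = x
    for W, activation in layers:
        z = W @ a            # weighted inputs summed for every neuron in the layer
        a = activation(z)    # activation applied, result passed to the next layer
    return a

W_hidden = np.array([[1.0, -1.0],
                     [0.5,  0.5],
                     [-1.0, 2.0]])      # 3 hidden neurons, 2 inputs each
W_out = np.array([[1.0, 1.0, 1.0]])     # 1 output neuron, 3 inputs

x = np.array([2.0, 1.0])
prediction = forward(x, [(W_hidden, relu), (W_out, identity)])
print(prediction)   # hidden sums are [1.0, 1.5, 0.0], so the output is [2.5]
```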

Loss/Cost Functions

After the forward pass is run on the network, we want some way to test how well the model is doing. To do this, we can use a loss function, a function that calculates the error of a model. Loss measures the difference between the model’s predicted output and the actual output. A common loss function is mean squared error (MSE), the average of the squared differences between the predicted and actual values. Our goal is to minimize the loss, leading us to the next topic.
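As a small illustration, mean squared error can be computed in one line with NumPy. The predicted and actual values below are invented for the example.

```python
import numpy as np

def mse(y_pred, y_true):
    # Average of the squared differences between predictions and targets.
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Made-up predictions and targets: errors are -0.5, 0.5 and 0.0.
loss = mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
print(loss)
```

A perfect model would have a loss of 0, and larger errors are penalized more heavily because they are squared.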

Backpropagation (Backward Pass)

Initially, the weights in a network are set to random numbers, so the initial model is inaccurate and has a high error. Backpropagation is needed to improve this accuracy progressively.

We need to find out how much each node in a layer contributes to the overall error, and backpropagation does this by passing the error backward from the output layer through the network. The idea is to adjust the weights in proportion to their contribution to the error in order to eventually reach an optimal set of weights. The adjustment itself is carried out by the gradient descent algorithm.
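To make the idea of "error contribution" concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss. It is an assumption-laden toy, not a full backpropagation implementation: the names `loss_fn` and `backprop_gradient` and all numeric values are hypothetical. The contribution of each weight is found with the chain rule and then compared against a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, x, y):
    # Squared error of a single sigmoid neuron (toy setup).
    return (sigmoid(w @ x) - y) ** 2

def backprop_gradient(w, x, y):
    a = sigmoid(w @ x)             # forward pass
    dloss_da = 2.0 * (a - y)       # how the loss changes with the output
    da_dz = a * (1.0 - a)          # how the output changes with the weighted sum
    dz_dw = x                      # how the weighted sum changes with each weight
    return dloss_da * da_dz * dz_dw   # chain rule: contribution of each weight

w = np.array([0.1, -0.2])          # illustrative weights
x = np.array([1.0, 2.0])           # illustrative inputs
y = 1.0                            # illustrative target

grad = backprop_gradient(w, x, y)

# Numerical check: nudge each weight slightly and measure the change in loss.
eps = 1e-6
numeric = np.array([
    (loss_fn(w + eps * np.eye(2)[i], x, y) - loss_fn(w - eps * np.eye(2)[i], x, y)) / (2 * eps)
    for i in range(2)
])
print(grad, numeric)
```

In a multi-layer network the same chain-rule step is repeated layer by layer, moving backward from the output.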

Gradient Descent

The goal of gradient descent is to find the set of weights that minimizes the loss function. To do this, optimization algorithms calculate a **gradient**. A gradient is the collection of partial derivatives of the loss function with respect to each weight. In other words, it describes how much a small increase in each weight will change the loss. Weights are adjusted in the direction opposite to the gradient. Think of this as taking a step towards a local minimum of a function f(x). This cycle is repeated until we reach a minimum of the loss function.
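The update cycle can be sketched on a one-variable function. The example below assumes f(x) = (x - 3)^2, whose minimum is at x = 3; the function, the starting point, the `step_size` value, and the number of steps are all arbitrary choices made for illustration.

```python
def f(x):
    # Toy "loss" with a single minimum at x = 3 (assumed for illustration).
    return (x - 3.0) ** 2

def f_grad(x):
    # Derivative of f with respect to x.
    return 2.0 * (x - 3.0)

def gradient_descent(grad_fn, start, step_size, steps):
    x = start
    for _ in range(steps):
        # Move in the direction opposite to the gradient.
        x = x - step_size * grad_fn(x)
    return x

x_min = gradient_descent(f_grad, start=0.0, step_size=0.1, steps=100)
print(x_min, f(x_min))
```

In a real network, `x` would be replaced by the full set of weights and `f` by the loss function, but the update rule has the same shape.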

Note: Gradient descent is not guaranteed to find the absolute minimum of a function, as it could land on a relative minimum whose loss is not as small as that of the absolute minimum.
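This behaviour can be seen on a function with two minima. The sketch below assumes f(x) = x^4 - 3x^2 + x, chosen only because it has a relative minimum near x = 1.13 and an absolute minimum near x = -1.30; the starting points and step size are likewise arbitrary.

```python
def f(x):
    # Toy function with two minima (assumed for illustration).
    return x ** 4 - 3.0 * x ** 2 + x

def f_grad(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def descend(start, step_size=0.01, steps=2000):
    x = start
    for _ in range(steps):
        x = x - step_size * f_grad(x)
    return x

from_right = descend(2.0)    # settles in the relative minimum
from_left = descend(-2.0)    # settles in the absolute minimum
print(from_right, f(from_right))
print(from_left, f(from_left))
```

The same algorithm ends up in different places depending only on where it starts.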

The **learning rate** is a hyperparameter that determines the step size (the amount by which weights are updated each time). A learning rate that is too high can overshoot a minimum and may never settle on it. A learning rate that is too low approaches the minimum very slowly and requires many training iterations. We can try out different learning rates by trial and error to improve results.
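The effect of the learning rate can be tried out on a toy problem. This sketch assumes the function f(x) = (x - 3)^2 with its minimum at x = 3 and runs 50 update steps with three arbitrarily chosen learning rates.

```python
def f_grad(x):
    # Derivative of the toy function f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)

def descend(learning_rate, start=0.0, steps=50):
    x = start
    for _ in range(steps):
        x = x - learning_rate * f_grad(x)
    return x

too_high = descend(learning_rate=1.1)     # each step overshoots further
reasonable = descend(learning_rate=0.1)   # ends very close to 3
too_low = descend(learning_rate=0.001)    # has barely moved from the start

for name, value in [("too high", too_high), ("reasonable", reasonable), ("too low", too_low)]:
    print(name, value)
```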

Overfitting

Unfortunately, deep neural networks are prone to overfitting the training data, which leads to poor accuracy on the test data. One technique to mitigate overfitting is **batch normalization**, which normalizes the inputs to a layer over each batch of training data. This can improve the performance and stability of neural networks.
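The normalizing step can be sketched as follows. This is a simplified illustration: it only shifts and scales each feature in a batch to roughly zero mean and unit variance, and it leaves out the learnable scale and shift parameters and the running statistics that full batch normalization layers also use. The function name `batch_normalize`, the `eps` value, and the sample batch are assumptions made for the example.

```python
import numpy as np

def batch_normalize(batch, eps=1e-5):
    # batch has shape (number of examples, number of features).
    mean = batch.mean(axis=0)                  # per-feature mean over the batch
    var = batch.var(axis=0)                    # per-feature variance over the batch
    return (batch - mean) / np.sqrt(var + eps) # eps avoids division by zero

# Two features on very different scales (made-up values).
batch = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])
normalized = batch_normalize(batch)
print(normalized)
```

After normalization both features sit on the same scale, even though the raw values differed by a factor of 200.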

Another technique is **dropout**, which randomly shuts down a fraction of a layer’s neurons at each training step (setting their values to 0). This prevents the neurons from depending too much on each other. Below is a visual of the dropout method.
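In code, dropout amounts to multiplying a layer’s outputs by a random mask of zeros and ones. The sketch below is deliberately minimal: the function name `dropout`, the 50% drop rate, and the fixed random seed are assumptions for illustration, and practical implementations usually also rescale the surviving values, which is omitted here.

```python
import numpy as np

def dropout(activations, drop_rate, rng):
    # Each neuron is kept with probability (1 - drop_rate), otherwise set to 0.
    keep_mask = rng.random(activations.shape) >= drop_rate
    return activations * keep_mask

rng = np.random.default_rng(0)      # fixed seed so the example is repeatable
activations = np.ones(1000)         # pretend outputs of a 1000-neuron layer
dropped = dropout(activations, drop_rate=0.5, rng=rng)
print(int(dropped.sum()), "of", activations.size, "neurons kept")
```

A new mask is drawn at every training step, so no neuron can rely on any particular neighbour always being present.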

Types of Neural Networks

There are various types of neural networks. Three main ones are artificial neural networks (ANNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). ANNs specialize in simple tasks, RNNs in natural language processing and time-series data, and CNNs in image data.

Copyright ©2021 Howard County Hour of Code
