In this blog, we'll discuss the regularization techniques used to overcome overfitting. Have you come across a situation where your model performs well on the training data but fails to make good predictions on the test data? This scenario is called overfitting. Overfitting is closely related to the balance between bias and variance: an overfit model has low bias but high variance. Now we will discuss regularization techniques.
Overview of Regularization:
1-What is Regularization?
2-Why do we need to apply Regularization?
3-When do we need to apply Regularization Techniques?
4-Different Regularization Techniques in Deep Learning
- Dropout
- Data Augmentation
- Early Stopping
At times, when you build a multiple linear regression model, you use least squares to estimate the coefficients of the features. As a result, the model can run into problems such as the following:
- Often, the regression model fails to generalize to unseen data. This happens when the model tries to accommodate every variation in the data, including both the actual pattern and the noise. As a result, the model becomes overly complex and has high variance, i.e. it overfits. The diagram below represents high variance.
The goal is to reduce the variance of the model while making sure that it does not become biased. After applying a regularization technique, we get a model that fits the pattern without chasing the noise. The diagram below shows the model we get after regularization.
What is Regularization?
In this section, I am going to discuss regularization. If you are familiar with regularization in machine learning, you will have a fair idea that regularization penalizes the coefficients of a model. In deep learning, it penalizes the weight matrices of the nodes. It reduces model complexity by keeping the weights small. In short, regularization is used to reduce model complexity and, with it, the model's error on unseen data.
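To make the idea of penalizing weights concrete, here is a minimal NumPy sketch of an L2 penalty (the sum of squared weights, scaled by a strength `lam`) added to a data loss. The weight values, `lam`, the learning rate, and the helper names are illustrative assumptions, not taken from any library. The last step shows why such a penalty keeps weights small: with no gradient coming from the data, a gradient step only shrinks them.

```python
import numpy as np

def l2_penalty(weight_matrices, lam):
    """lam times the sum of squared entries of every weight matrix."""
    return lam * sum(float(np.sum(W ** 2)) for W in weight_matrices)

def regularized_loss(data_loss, weight_matrices, lam):
    """Data loss (e.g. mean squared error) plus the L2 weight penalty."""
    return data_loss + l2_penalty(weight_matrices, lam)

def sgd_step_with_l2(W, data_grad, lam, lr):
    """One gradient step; the derivative of lam * sum(W**2) is 2 * lam * W."""
    return W - lr * (data_grad + 2 * lam * W)

# Hypothetical weights of a 3-4-2 network (input->hidden and hidden->output).
wih = np.array([[0.5, -0.2, 0.1],
                [0.3, 0.8, -0.5],
                [-0.7, 0.2, 0.4],
                [0.1, -0.1, 0.6]])
who = np.array([[0.2, -0.4, 0.7, 0.1],
                [-0.3, 0.5, 0.2, -0.6]])

lam = 0.01
print(l2_penalty([wih, who], lam))              # about 0.0379
print(regularized_loss(0.25, [wih, who], lam))  # about 0.2879

# With a zero data gradient, the penalty alone shrinks every weight by the
# same factor (1 - 2 * lr * lam); this is why L2 is also called "weight decay".
wih_next = sgd_step_with_l2(wih, np.zeros_like(wih), lam, lr=0.1)
print(np.allclose(wih_next, wih * (1 - 2 * 0.1 * lam)))  # True
```

In Keras, a penalty of the same form can be attached to a layer's weights with `kernel_regularizer=keras.regularizers.l2(...)`.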
Why do we need to apply Regularization?
The goal of our algorithm is to learn the patterns in the data while ignoring the noise in the dataset. However, deep learning models often suffer from the following problems:
- Overfitting: The model fails to generalize to an unseen dataset.
- Multicollinearity: Independent variables in the regression model are correlated with each other.
- Computational cost: A large, complex model becomes computationally intensive.
The above problems make it difficult to come up with a model that has high accuracy on unseen data.
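The multicollinearity problem can be seen in a small NumPy experiment. Assuming synthetic data in which the second feature is an almost exact copy of the first, the sketch below compares ordinary least squares with ridge regression (least squares plus an L2 penalty on the coefficients). Ridge is used here only as one common example of regularization; the data, the penalty strength `lam`, and the `fit_ridge` helper are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)    # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=n)  # the true signal uses x1 only

def fit_ridge(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^-1 X^T y.

    lam = 0 reduces to ordinary least squares.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_ols = fit_ridge(X, y, lam=0.0)
w_ridge = fit_ridge(X, y, lam=1.0)
print("least squares:", w_ols)
print("ridge (lam=1):", w_ridge)
```

Because x1 and x2 are nearly identical, least squares cannot tell them apart and may assign them large coefficients of opposite sign; the penalty instead spreads the weight roughly evenly across the two features, giving a smaller and more stable coefficient vector.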
Overfitting in a deep neural network:
Deep neural networks are highly complex models, and they overfit easily. Hence, we need some form of regularization.
As we have already seen, as model complexity increases, the bias of the model decreases and its variance increases. Using various regularization techniques, we can try to achieve both low training error and low test error, striking a good trade-off between bias and variance.
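The bias-variance behaviour described above can be reproduced with polynomial regression, used here as a cheap stand-in for a deep network. In this sketch (synthetic sine data, degrees chosen arbitrarily for illustration), the training error can only go down as the degree grows, because each higher-degree polynomial family contains the lower-degree ones. The test error is what reveals overfitting: you would typically expect it to be worst for the degree-9 fit, which passes exactly through all ten noisy training points, although the exact numbers depend on the random noise.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_fn(x):
    return np.sin(2 * np.pi * x)

# Small synthetic dataset: 10 noisy training points and 10 noisy test points.
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.size)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

train_err, test_err = {}, {}
for degree in (1, 3, 9):  # increasing model complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_err[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree {degree}: train MSE {train_err[degree]:.4f}, "
          f"test MSE {test_err[degree]:.4f}")
```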
When do we need to apply Regularization Techniques?
- Lack of generalization: A model that reaches high accuracy on the training data fails to generalize to unseen or new data.
- Model instability: Different models can be created with different accuracies, so it becomes difficult to select one.
Different Regularization Techniques in Deep Learning
The first regularization technique is dropout. In this technique, we drop out some nodes of the network. Dropping out can be seen as temporarily deactivating or ignoring neurons of the network. Removing nodes makes the network smaller, and a smaller network has less capacity to memorize the noise in the training data. Dropout is applied only during the training phase, to reduce overfitting of the model.
More technically, at each training stage, individual nodes are either dropped out of the net with probability 1-p or kept with probability p, so that a reduced network is left; the incoming and outgoing edges of a dropped-out node are removed as well.
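The keep-with-probability-p rule can be sketched in a few lines of NumPy by multiplying the activations with a random 0/1 mask. This sketch also assumes the common "inverted dropout" convention (not described in the text above): kept units are scaled by 1/p during training, so nothing needs to be rescaled at test time. The function name and the all-ones activations are hypothetical.

```python
import numpy as np

def dropout_forward(activations, p_keep, rng, training=True):
    """Keep each unit with probability p_keep and zero it otherwise.

    Inverted-dropout convention: kept units are scaled by 1 / p_keep during
    training, so the layer can be used unchanged at test time.
    """
    if not training:
        return activations  # all nodes are active at test time
    mask = rng.random(activations.shape) < p_keep  # True means "node kept"
    return activations * mask / p_keep

rng = np.random.default_rng(0)
h = np.ones((2, 8))  # hypothetical hidden-layer activations for 2 examples
out = dropout_forward(h, p_keep=0.5, rng=rng)
print(out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```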
To understand how dropout works, let's look at how it modifies the weight arrays.
If we deactivate a node, we have to modify the weight arrays accordingly. We will use a network with three input nodes, four hidden nodes, and two output nodes:
First, we will look at the weight array between the input and the hidden layer. We call it "wih" (weights between input and hidden layer).
Let's drop out the input node i2.
This means we have to take the second product out of every summation, which means we have to delete the whole second column from the matrix. The second element of the input vector has to be deleted as well.
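The column-deletion step can be checked numerically. In this NumPy sketch (hypothetical weights and inputs, with wih stored as 4 hidden rows by 3 input columns), deleting the second column of wih together with the second input element gives exactly the same hidden-layer inputs as keeping everything and setting i2 to zero.

```python
import numpy as np

# Hypothetical weights: 4 hidden nodes (rows) x 3 input nodes (columns).
wih = np.array([[0.5, -0.2, 0.1],
                [0.3, 0.8, -0.5],
                [-0.7, 0.2, 0.4],
                [0.1, -0.1, 0.6]])
x = np.array([1.0, 2.0, 3.0])  # input vector (i1, i2, i3)

# Drop i2: delete the second column of wih and the second input element.
wih_reduced = np.delete(wih, 1, axis=1)
x_reduced = np.delete(x, 1)
hidden_in_reduced = wih_reduced @ x_reduced

# Equivalent: keep the full matrix but set i2 to zero.
x_zeroed = x.copy()
x_zeroed[1] = 0.0
hidden_in_zeroed = wih @ x_zeroed

print(np.allclose(hidden_in_reduced, hidden_in_zeroed))  # True
```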
Now we examine what happens when we drop out a hidden node. We take out the first hidden node, i.e. h1.
In this case, we remove the complete first row from the wih matrix.
Taking out a hidden node affects the next weight matrix as well. Let's have a look at what is happening in the network graph.
It is easy to see that the first column of the who weight matrix (weights between hidden and output layer) has to be removed:
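The same check works for a hidden node. This sketch assumes a sigmoid activation in the hidden layer and hypothetical weights; removing the first row of wih and the first column of who produces the same network output as running the full network with h1's output forced to zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: wih is 4 hidden x 3 inputs, who is 2 outputs x 4 hidden.
wih = np.array([[0.5, -0.2, 0.1],
                [0.3, 0.8, -0.5],
                [-0.7, 0.2, 0.4],
                [0.1, -0.1, 0.6]])
who = np.array([[0.2, -0.4, 0.7, 0.1],
                [-0.3, 0.5, 0.2, -0.6]])
x = np.array([1.0, 2.0, 3.0])

# Drop h1: remove the first row of wih and the first column of who.
h_reduced = sigmoid(np.delete(wih, 0, axis=0) @ x)
out_reduced = np.delete(who, 0, axis=1) @ h_reduced

# Equivalent: run the full network but force h1's output to zero.
h_full = sigmoid(wih @ x)
h_full[0] = 0.0
out_zeroed = who @ h_full

print(np.allclose(out_reduced, out_zeroed))  # True
```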
So far, we have arbitrarily chosen one node to deactivate. The dropout approach means that we randomly choose a certain number of nodes from the input and hidden layers to remain active and turn off the other nodes of these layers. The next step consists of activating all the nodes again and randomly choosing other nodes. In this way, the whole training set can be processed by randomly created dropout networks.
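In practice, deep learning frameworks perform this random selection for you. The sketch below uses the Keras `Dropout` layer. Note that its `rate` argument is the fraction of units to drop, i.e. 1-p in the notation used above, and that Keras scales the kept units by 1/(1-rate) during training. The input values and the seed are illustrative assumptions.

```python
import numpy as np
import keras

# rate is the drop probability, so rate=0.5 corresponds to p=0.5 above.
dropout = keras.layers.Dropout(rate=0.5, seed=0)
x = np.ones((4, 10), dtype="float32")  # hypothetical activations

# training=True: the layer draws a random mask and zeroes the dropped units;
# the surviving units are scaled by 1 / (1 - rate) = 2.
train_out = np.asarray(dropout(x, training=True))

# training=False: all nodes are active again and the layer is the identity.
infer_out = np.asarray(dropout(x, training=False))

print(train_out)
print(np.array_equal(infer_out, x))  # True
```

Inside `model.fit`, Keras calls the layer in training mode automatically, so each batch is processed by a different randomly thinned network; `model.predict` and `model.evaluate` use the full network.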
We represent three possible randomly chosen dropout networks in the following three diagrams:
Data augmentation is another regularization technique. It is typically applied to image datasets, where we apply certain transformations to every image, such as zooming, cropping, or flipping it. Augmenting the data in this way makes it harder for the neural network to drive the training error to zero.
By generating more data, the network has a better chance of performing well on the test data. The deep learning library Keras has a class called ImageDataGenerator that is used for data augmentation. In short, data augmentation generates new data points from the existing ones by applying various transformations to them.
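The transformations mentioned above are simple array operations. As a rough NumPy illustration (not how Keras implements them), this sketch randomly flips an image horizontally and then takes a random crop. The image size, the crop size, and the helper name are assumptions.

```python
import numpy as np

def random_flip_and_crop(image, crop_size, rng):
    """Randomly flip an H x W x C image left-right, then take a random crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_size + 1)   # random top-left corner
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # hypothetical RGB image
augmented = [random_flip_and_crop(image, 28, rng) for _ in range(5)]
print([a.shape for a in augmented])  # five crops of shape (28, 28, 3)
```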
Three Types of Data Augmentation:
There are three types of data augmentation when applying deep learning to computer vision applications.
Type 1: Dataset generation and expanding an existing dataset (less common)
The first type of data augmentation is dataset generation and dataset expansion.
This process is used when we do not have much training data in the first place. Let's consider the most trivial case, where you have only one image and want to apply data augmentation to create an entire dataset of images, all based on that one image.
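For the one-image case, a minimal sketch (assuming a square H x W x C NumPy image) is to enumerate the eight combinations of 90-degree rotations and horizontal flips and stack them into a small dataset. Real pipelines usually add random zooms, shifts, and colour changes to obtain many more variants; the helper name is hypothetical.

```python
import numpy as np

def dihedral_variants(image):
    """Return the 8 rotations/flips of a square H x W x C image as one array."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k, axes=(0, 1))  # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(rotated[:, ::-1, :])       # mirror of each rotation
    return np.stack(variants)

image = np.arange(2 * 2 * 1).reshape(2, 2, 1)  # tiny hypothetical image
dataset = dihedral_variants(image)
print(dataset.shape)  # (8, 2, 2, 1)
```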
Type 2: In-place/on-the-fly data augmentation (most common)
The second type of data augmentation is called in-place or on-the-fly data augmentation. This type of data augmentation is what the Keras ImageDataGenerator class implements. To use ImageDataGenerator, import it from Keras:
from keras.preprocessing.image import ImageDataGenerator
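Building on that import, a minimal usage sketch could look like this. The random training arrays and the parameter values are illustrative assumptions; the rotation, zoom, and shift transforms rely on SciPy being installed. Each batch drawn from `flow()` contains randomly transformed versions of the original images rather than extra copies added to the dataset, which is what "in-place" refers to.

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical training data: 8 RGB images of size 32x32 with integer labels.
x_train = np.random.rand(8, 32, 32, 3).astype("float32")
y_train = np.arange(8)

# Each parameter enables one random transformation (values are illustrative).
datagen = ImageDataGenerator(
    rotation_range=20,       # rotate by up to +/-20 degrees
    zoom_range=0.15,         # zoom in or out by up to 15%
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    horizontal_flip=True,    # randomly mirror images left-right
    fill_mode="nearest",     # fill pixels exposed by shifts/rotations
)

# flow() yields freshly transformed batches on every pass over the data;
# the original arrays are left unchanged.
batches = datagen.flow(x_train, y_train, batch_size=4, shuffle=False)
x_batch, y_batch = next(batches)
print(x_batch.shape, y_batch)
```

In recent Keras versions, the iterator returned by `flow()` can be passed directly to `model.fit`. Newer Keras releases also mark ImageDataGenerator as deprecated in favour of preprocessing layers such as `RandomFlip` and `RandomRotation`.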
Type 3: Combining dataset generation and in-place augmentation
The final type of data augmentation combines dataset generation and in-place augmentation. We can see this type of data augmentation when performing behavioral cloning.
For example, behavioral cloning is used in self-driving car applications.
The next kind of regularization technique is early stopping, which helps reduce overfitting of the model. As the name suggests, we "stop early" during the training phase, before the model starts overfitting the training dataset.
Here, we use a validation set along with the training set, and we monitor the validation loss to decide when the model should stop training.
In the above image, the model stops training at the dotted line, since after the dotted line the model would start overfitting the training data.
By using early stopping, we make sure the model learns the pattern in the training data without memorizing the noise it contains.
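The stopping rule itself is simple. This framework-free sketch assumes a hypothetical list of per-epoch validation losses and a `patience` parameter (the number of epochs to wait without improvement, a common convention that Keras also uses); it returns the epoch at which training stops and the epoch with the best validation loss.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # never triggered

# Hypothetical validation losses: they fall, then rise as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.49, 0.53, 0.58, 0.64]
print(early_stopping_epoch(val_losses, patience=3))  # (7, 4)
```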
Implementation of Early Stopping
In Keras, we can apply early stopping as a callback:
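from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss')
In the above code, monitor denotes the quantity that needs to be monitored, and val_loss denotes the validation loss, which Keras records when validation data is passed to fit.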
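Putting the callback to work, a minimal end-to-end sketch could look as follows. The synthetic regression data, the tiny model, and the values `patience=5` and `epochs=100` are assumptions chosen only to keep the example fast. When training stops early, `restore_best_weights=True` makes Keras roll the model back to the weights from the epoch with the best monitored value.

```python
import numpy as np
import keras
from keras.callbacks import EarlyStopping

# Hypothetical data: the target depends on the first feature plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 10)).astype("float32")
y = (x[:, 0] + 0.1 * rng.normal(size=200)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop when val_loss has not improved for 5 consecutive epochs.
early_stopping = EarlyStopping(monitor="val_loss", patience=5,
                               restore_best_weights=True)

# validation_split holds out the last 20% of the data as the validation set.
history = model.fit(x, y, validation_split=0.2, epochs=100, batch_size=32,
                    callbacks=[early_stopping], verbose=0)
print("epochs run:", len(history.history["loss"]))
```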
In this blog, you learned about regularization techniques in deep learning. We hope it has cleared up most of your questions about the topic.