
Simple Linear Regression

Simple linear regression is a statistical method for summarising and studying the relationship between two continuous (quantitative) variables. Linear regression is a linear model: it assumes a linear relationship between the input variables (x) and the single output variable (y), so that y can be calculated as a linear combination of the inputs. When there is a single input variable (x), the method is called simple linear regression. When there are multiple input variables, the procedure is referred to as multiple linear regression.

Applications: salary forecasting, real estate price prediction, etc.

y = mx + b → Linear Equation

The goal of the linear regression algorithm is to find the best values for m and b. Before moving on to the algorithm, let's look at two important concepts you must know to understand linear regression better.

Cost Function: The cost function helps us figure out the best possible values for m and b, the ones that give the best-fit line for the data points. Since we want the best values for m and b, we convert this search problem into a minimization problem in which we minimize the error between the predicted value and the actual value. Here we use the mean squared error (MSE) as the cost function.

Math

Given our simple linear equation: y=mx+b

we can calculate MSE as:

MSE = (1/N) * Σ (y_i – (m*x_i + b))^2

where N is the number of data points and the sum runs over all observations (x_i, y_i).
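As a quick sanity check, this formula translates directly into a few lines of NumPy. The mse helper and the toy arrays below are illustrative, not part of the original tutorial:

import numpy as np

def mse(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)

# Toy data lying near the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2])
print(mse(y, 2.0 * x + 1.0))  # small value, since the line almost fits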

Gradient Descent

To minimize MSE we use gradient descent to calculate the gradient of our cost function. Gradient descent is an iterative optimization algorithm: starting from initial guesses for m and b, it repeatedly computes the gradient of the cost function at the current parameter values and takes a small step in the opposite direction (the direction of steepest descent), with the step size controlled by a learning rate, until the cost stops decreasing.

Math

There are two parameters (coefficients) in our cost function that we can control: the weight m and the bias b. Since we need to consider the impact each one has on the final prediction, we use partial derivatives. To find the partial derivatives we use the chain rule, because (y – (mx+b))^2 is really two nested functions: the inner function y – (mx + b) and the outer function z^2 (squaring).
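For a single data point, applying the chain rule step by step:

d/dm (y – (mx + b))^2 = 2(y – (mx + b)) * (–x) = –2x(y – (mx + b))
d/db (y – (mx + b))^2 = 2(y – (mx + b)) * (–1) = –2(y – (mx + b))

Averaging these per-point derivatives over all N data points gives the gradient below.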

Returning to our cost function:

f(m, b) = (1/N) * Σ (y_i – (m*x_i + b))^2

We can calculate the gradient of this cost function as:

∂f/∂m = (1/N) * Σ –2*x_i * (y_i – (m*x_i + b))
∂f/∂b = (1/N) * Σ –2 * (y_i – (m*x_i + b))
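Putting the math together, a minimal from-scratch sketch of gradient descent for this problem could look as follows. The train function, learning rate, and epoch count here are illustrative choices, not from the original tutorial:

import numpy as np

def train(x, y, lr=0.01, epochs=5000):
    m, b = 0.0, 0.0          # arbitrary starting parameters
    n = len(x)
    for _ in range(epochs):
        y_pred = m * x + b
        # Gradients of MSE with respect to m and b, as derived above
        dm = (-2 / n) * np.sum(x * (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        # Move against the gradient, scaled by the learning rate
        m -= lr * dm
        b -= lr * db
    return m, b

On the toy data from the MSE example above, train(x, y) converges to values close to m = 2 and b = 1.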

Python Implementation

 

# Simple Linear Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)
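
# A quick look at the fitted parameters (our addition): coef_ and intercept_
# hold the learned m and b from y = mx + b
print('m =', regressor.coef_[0], 'b =', regressor.intercept_)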

# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
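
To connect the code back to the cost function discussed earlier, we can also score the model on the test set. The mean_squared_error and r2_score helpers from sklearn.metrics are standard; this evaluation step is an addition, not part of the original listing:

# Evaluating the fit: MSE is the cost we discussed above, and R^2
# measures how much of the variance in salary the line explains
from sklearn.metrics import mean_squared_error, r2_score
print('Test MSE:', mean_squared_error(y_test, y_pred))
print('Test R^2:', r2_score(y_test, y_pred))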
