Linear Regression
Basic Definition:
Linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
Let's break down the above definition and understand each component.
Linear regression is a linear approach.
What does it mean when we say a linear approach?
A linear model assumes there is a linear relationship between the input variables and the output variable.
Basically, the output y can be calculated as a linear combination of the input variables.
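For example, with two input variables the output is just a weighted sum of the inputs plus a constant. Here is a minimal Python sketch (the weights and inputs below are illustrative, not from any real data):

```python
# Output as a linear combination of two inputs (illustrative weights).
w1, w2, b = 2.0, -0.5, 10.0

def predict(x1, x2):
    return w1 * x1 + w2 * x2 + b

print(predict(3, 4))  # 2.0*3 + (-0.5)*4 + 10.0 = 14.0
```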
The simplest linear model, with a single input variable, can be understood as the equation of a line:
y = mx + c
Slope (m): defines both the steepness and the direction of the line. It is calculated as the ratio of the vertical change to the horizontal change between two distinct points on the line.
Δy = y2 − y1
Δx = x2 − x1
m = Δy / Δx
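As a quick sketch, the same ratio in Python (a hypothetical helper with made-up points):

```python
def slope(p1, p2):
    """Slope between two distinct points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

print(slope((1, 2), (3, 8)))  # (8 - 2) / (3 - 1) = 3.0
```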
So from the above graph we can see that the slope m is (50 − 40) / (4 − 3) = 10.
Whereas c is the constant term, also called the y-intercept: the value of y when x is zero. From the graph it is clearly 10.
So for x = 5,
y = 10 * 5 + 10 = 60
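A minimal check of this arithmetic in Python:

```python
# Values read off the graph: slope 10, y-intercept 10.
m = (50 - 40) / (4 - 3)   # 10.0
c = 10

def line(x):
    return m * x + c

print(line(5))  # 10 * 5 + 10 = 60.0
```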
Here is where things get interesting: if we have the slope and the intercept of a linear function, we can calculate y for any value of x, since the linear model assumes the relationship between the data and the target variable is linear in nature.
How do we calculate the slope and the intercept?
Let's make some fake data that has a linear relationship.
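A minimal sketch with NumPy (the underlying slope and intercept of 10 mirror the graph example; the noise level and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 100
x = rng.uniform(0, 10, size=n)     # input variable
noise = rng.normal(0, 2, size=n)   # random scatter around the line
y = 10 * x + 10 + noise            # linear relationship: slope 10, intercept 10
```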
Now we want a linear function that can capture this linear relationship and map the values of x to y with minimum error.
When we talk about minimising the error, we mean the predicted value should be as close as possible to the actual value; the difference between the two is the error.
In linear regression the error for a single observation is given by the squared difference between the actual value y and the predicted value y′:
L = (y − y′)²
We square the expression to avoid negative values, so that errors in opposite directions do not cancel out.
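Continuing with the x and y arrays from the fake-data sketch above, the squared error averaged over all observations (the averaging is made explicit a little further below) might look like:

```python
def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

# Loss for a deliberately bad model: a flat line at 0.
print(mse(y, np.zeros_like(y)))
```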
To minimise the loss we use gradient descent, for which we have to find dL/dm,
the derivative of the loss with respect to the slope m (also called the weight).
To keep the notation compact, let's represent the predicted value y′ as a.
Averaged over all n observations (writing n for the number of observations, to avoid clashing with the slope m), the loss becomes:
L = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − aᵢ)²
L = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − (mxᵢ + b))²
Applying the derivative with respect to m, the chain rule pulls out a factor of −xᵢ from the inner term:
dL/dm = (2/n) ∑ᵢ₌₁ⁿ (yᵢ − (mxᵢ + b)) · (−xᵢ)
Since (yᵢ − (mxᵢ + b)) can also be written as (yᵢ − aᵢ), this simplifies to:
dL/dm = −(2/n) ∑ᵢ₌₁ⁿ (yᵢ − aᵢ) · xᵢ
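One way to sanity-check this derivative is to compare it against a numerical finite-difference estimate. A sketch, reusing x and y from the fake data above:

```python
def loss(m, b):
    return np.mean((y - (m * x + b)) ** 2)

def dL_dm(m, b):
    a = m * x + b
    return -2 * np.mean((y - a) * x)   # -(2/n) * sum((y_i - a_i) * x_i)

eps = 1e-6
m0, b0 = 3.0, 1.0   # arbitrary test point
numeric = (loss(m0 + eps, b0) - loss(m0 - eps, b0)) / (2 * eps)
print(dL_dm(m0, b0), numeric)   # the two values should agree closely
```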
Similarly, we need to find dL/db, the derivative of the loss with respect to the intercept:
L = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − (mxᵢ + b))²
dL/db = (2/n) ∑ᵢ₌₁ⁿ (yᵢ − (mxᵢ + b)) · (−1)
Again, (yᵢ − (mxᵢ + b)) can also be written as (yᵢ − aᵢ), so:
dL/db = −(2/n) ∑ᵢ₌₁ⁿ (yᵢ − aᵢ)
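Both gradients together, as one hypothetical helper (again reusing x and y from above):

```python
def gradients(m, b):
    """Return (dL/dm, dL/db) for the mean squared error loss."""
    a = m * x + b                    # predictions
    dm = -2 * np.mean((y - a) * x)   # -(2/n) * sum((y_i - a_i) * x_i)
    db = -2 * np.mean(y - a)         # -(2/n) * sum(y_i - a_i)
    return dm, db
```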
Gradient descent then updates each parameter by stepping against its gradient, where α is the learning rate (writing w for the slope m, since it is also called the weight):
w = w − α · dL/dw = w + α · (2/n) ∑ᵢ₌₁ⁿ (yᵢ − aᵢ) · xᵢ
b = b − α · dL/db = b + α · (2/n) ∑ᵢ₌₁ⁿ (yᵢ − aᵢ)
In each iteration we calculate the cost and update the parameters w and b, so that the cost keeps decreasing.
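Putting the update rules into a loop gives a minimal gradient descent sketch; the learning rate and iteration count below are arbitrary choices:

```python
m_hat, b_hat = 0.0, 0.0   # initial guesses for slope and intercept
alpha = 0.01              # learning rate

for _ in range(2000):
    dm, db = gradients(m_hat, b_hat)
    m_hat -= alpha * dm
    b_hat -= alpha * db

print(m_hat, b_hat)   # should land close to the true slope 10 and intercept 10
```

With the fake data generated above, both estimates converge towards the values used to create it, which is exactly what minimising the cost is supposed to achieve.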