# Linear Regression with Linear Algebra

Linear Regression is the one of the simplest algorithm in machine learning, however, it is simple to learn and may be the most powerful method for your problem. The thing that you need to be careful is:

(All the images of equations and math part are from : http://www.stat.purdue.edu/~boli/stat512/lectures/topic3.pdf)

! You’re hearing Linear Regression is one of the algorithms of Machine Learning and you search for Linear Regression, you see linear algebra solutions like:

This is the solution that is provided by Linear Algebra. Another solution is the thing that you probably search for, gradient descent. These solutions have the principles, but the way they get the solution is different. We will discuss two of them. However, this post is about linear algebra part.

Let’s define the problem with our random dataset, we will discuss the code with Python:

Briefly, this is what we need to observe. We want to apply a linear fitting. This data can be anything that has a linear dependent feature(You can apply this to any dataset but it won’t work for all). We want to put a line between points and we want this line as close as possible to “the points”.

### What is a line?

Line consist of two parameters in linear regression.

Y = m*X+b

b is the intercept, m is the slope.

We have the data. One example is, x is the kilometer/hour that we are driving, y is the gasoline consumption. If you are driving with high speed, your car will consume more. But we want to predict it, we want to fit a line, means that we want to find good m and b values to find best fitting line.

One way to do this is by linear algebra, but let’s find out how we create such a dataset with NumPy. First define libraries.

``````import numpy as np
import matplotlib.pyplot as plt
import numpy.linalg
np.random.RandomState(42)``````

To create a dataset, we first define X, it is the numbers [1,2,3,4,…,98,99,100].

``X = np.arange(100).reshape(100,1)``

Since our formula is : Y = m*x+b , we will define and m and b value at the beginning. But we will add Gaussian noise to it. Let’s start by preparing and graphing without noise. We’ll just create Y, with m=2 and b =1

``y = 2*X + 1``

In real life, there’s no such gasoline consumption. We sometimes have something close to the linear, but not exact one. We have randomness in real life. So, let’s add it to our data.

For those who are not familiar with distributions, just assume that we are adding random +/- values to each point. For x=1, y = 2*1+1 = 3. We will add random number of 0.2 let’s say. It’ll become 3.2 . We will add -0.3 for x=2 that gives y = 2*2+1-0.3 = 4.7. We’ll do it by adding one random function to code above.

``y = 2*X + 1 + 3*np.random.normal(0,1,(100,1)).reshape(100,1)``

For those who’re familiar with distributions know that it’s normal distribution with μ = 0 and σ = 1*3 = 3 . Instead of multiplying, we can replace 1 inside random.normal function with 3, that’s up to you. Now here’s what we have.

It looks like the real world data now. Let’s do some math, then, code at the end.

All math and images of equations are from Purdue University .

We have:

where ε stands for randomness. It is for making Y equal to the right side. So we can predict every Yi with:

Let’s convert it to the matrix form.

Last two images represent same equations, but last one is the matris form of it. We are adding one column to left of X that includes full of 1.

We want to find such a β matrix that makes X*β as close as possible to Y matrix.

### Optimization

In theory,

and we want

it to be zero, or minimize as fast as we can. Instead of above, we want to minimize

so that we can penalise the errors regardless of the sign of the Y-X*β.

We set derivative equal to zero. This is the basic of optimisation. Do not be panicked of the matrix form. This is the matrix form of the derivative.

β is in this form. It looks complicated, however, we will code and see that it’s perfect easy.

We have created X, but we want to include columns of one.

``````X = np.arange(100).reshape(100,1)
X=np.hstack((np.ones((100,1)),X))``````

Now,
Y1 = 2*X1+1+Random
Y2 =2*X2+1+Random

``y = np.dot(X,np.array([1,2]).T).reshape(100,1) +  3*np.random.normal(0,1,(100,1)).reshape(100,1)``

Then directly apply

The code looks complicated but exactly does the above equation. .T is for transpose, inv for inverse.

``````B=np.dot(np.linalg.inv(np.dot(X.transpose(),X)),np.dot(X.transpose(),y))
print(B)``````

Then graph it to see what we observe.

``````
text = "y=",float(B[1]),"x+",float(B[0])

plt.scatter(X[:,1],y)
plt.title(text)
plt.ylabel("y")
plt.xlabel("X")
plt.plot(X[:,1],np.dot(X,B))

plt.show()``````

We’ll also find solution with gradient descent. Check free course of Practical CNN.