Linear Regression from Scratch in Python

Saroj Humagain
3 min read · Sep 19, 2018

Regression is a technique for modeling and analyzing the relationship between variables, and often how they jointly contribute to a particular outcome. In machine learning, regression is widely used for prediction and forecasting.

Linear regression is a basic and commonly used type of predictive analysis. Let's visualize it with a small dataset.

xs = [1,2,3,4,5,6,7,8,9,3,4,10,3,5,6,7,8]
ys = [4,2,3,6,5,7,6,6,9,9,0,8,4,3,2,12,11]
Visualization of the above data points

Now, if we apply linear regression to that dataset, we get a straight line that passes as close as possible to the data points.

How do we define that line? To answer this question, let's go back to high school. A straight line is given by

y = mx+c

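If we have m, c, and an input value x, we can compute y. In the above equation, m is the slope and c is the y-intercept. We can predict y with some accuracy if we have the slope and y-intercept. So how do we find m and c? For a least-squares fit, the slope is given by

m = (mean(x)·mean(y) − mean(xy)) / (mean(x)² − mean(x²))

And the y-intercept is given by

c = mean(y) − m·mean(x)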

Now that we have everything we need, let's implement linear regression in Python. The first step is to import the libraries.

from statistics import mean
import numpy as np
import matplotlib.pyplot as plt

The second step is to define the data or load it from a source. Here we define xs and ys directly. We store them as NumPy arrays because plain Python lists do not support element-wise arithmetic such as xs*ys.

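# float64 keeps statistics.mean from returning integer-truncated means
xs = np.array([1,2,3,4,5,6,7,8,9,3,4,10,3,5,6,7,8], dtype=np.float64)
ys = np.array([4,2,3,6,5,7,6,6,9,9,0,8,4,3,2,12,11], dtype=np.float64)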

For the third step, we define a function which takes xs and ys as arguments and gives m and c.
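The function itself is not shown here, so the following is only a minimal sketch of what it could look like, assuming the usual least-squares formulas for the slope and intercept. The name best_fit_slope_and_intercept is hypothetical, and the sketch uses plain sums instead of statistics.mean so that it behaves the same on lists and NumPy arrays.

```python
def best_fit_slope_and_intercept(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c.

    Hypothetical helper: the name and the use of plain sums are
    assumptions made for this sketch.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_xx = sum(x * x for x in xs) / n

    # Slope: (mean(x)*mean(y) - mean(xy)) / (mean(x)^2 - mean(x^2))
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_xx)
    # Intercept: the fitted line passes through (mean(x), mean(y))
    c = mean_y - m * mean_x
    return m, c


# Same data as above, repeated so the sketch runs on its own.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 10, 3, 5, 6, 7, 8]
ys = [4, 2, 3, 6, 5, 7, 6, 6, 9, 9, 0, 8, 4, 3, 2, 12, 11]

m, c = best_fit_slope_and_intercept(xs, ys)
print(m, c)  # roughly 0.734 and 1.774
```

Calling it as m, c = best_fit_slope_and_intercept(xs, ys) provides the m and c used in the remaining steps.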

We now have the slope and y-intercept, so we are ready to calculate y.

regression_line = [(m*x)+c for x in xs]

Let's plot the data points and the regression line.

plt.scatter(xs,ys)
plt.plot(xs,regression_line)

We have built a regression model, and it should be able to predict the value of y for a given x. Let's try x = 10.5.

predict_x = 10.5
predict_y = (m* predict_x)+c

Let's plot the point and see where it lies.

plt.scatter(predict_x, predict_y, color='g') #g for green

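And the value of predict_y is found to be about 9.49 (with m ≈ 0.734 and c ≈ 1.774). If you get 11.111111111111112 instead, the means were truncated to integers, which statistics.mean can do on integer NumPy arrays; creating the arrays with dtype=np.float64 avoids this.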
