Hi everybody,

Today, I would like to talk about the linear model for regression in this post. While you have separate labels in classification tasks (which may be (0 and 1) or (-1 and 1)), the outputs in regression are continuous (i.e. they are arbitrary numbers lying in a particular range).

On the other hand, as I mentioned in the last topic of this series on Machine Learning, linear regression (LR) is one form of supervised learning, in which you train the model on a training dataset whose outputs are known, and then use the trained model to predict the outputs of other data whose outputs are unknown. OK, before dipping a toe into more detail, I want to introduce a bit about the dataset that will be used in this topic.


Let’s take a look at a real problem in your life (I mean a problem in the life of boys or men) that shows the relation between the total number of girlfriends you have and the amount of money you earn monthly. OK, I have to say that I really hate the word “materialism”, but sometimes, for example now, I have to agree with it: the more money you have, the more girlfriends you get. It sounds terrible, ha ha.

The dataset here is a csv file storing a matrix with 10 rows and 2 columns, where 10 is the number of data points. The first column holds the input, the monthly salary of a man, and the second holds the output, the total number of girlfriends that a man has with respect to his income.


Fig. 1 Relation between the monthly income of a man and the total number of girlfriends he has

You can download the dataset file from my github: By the way, please note that although the theory behind the relation is probably true, don’t use the dataset I give you to try to verify it. It’s not real (it was made just for fun), so don’t trust it, it will ruin your life.

Now, let’s go into more detail on the mathematical approach of the linear model for regression.

Mathematical approach

Linear model for regression

For the dataset in this post, we have only one input, or attribute, so the linear regression has the form:

\hat{y} = f(x) = \omega_{0} + x_{1} \omega_{1}

where x is the input, \hat{y} is our prediction, which is a function of x parameterized by \omega , and the \omega s are the coefficients we are looking for. In our dataset, because we only have 1 input and 1 output, the data can be visualized in 2D coordinates and the model is a line that approximately fits the points.

The example in this post is fairly easy. However, in reality we have to face multi-variable problems more often than single-variable ones, so let’s take it further with a model that has more inputs.

\hat{y} = f(x) = \omega_{0} + x_{1} \omega_{1} + x_{2} \omega_{2} + ... + x_{k} \omega_{k}

or, we can rewrite it as

\hat{y} = f(x) = \omega_{0} + \sum_{i=1}^{k}{x_{i} \omega_{i}}

In the equation above, if we view \omega_{0} as the coefficient of an extra input x_{0} that always equals 1, we can condense the equation into vector form; it now becomes:

\hat{y} = \bold{x} \omega \indent \indent \indent \indent (1)

where, \bold{x} = [1, x_{1}, x_{2}, ..., x_{k}] is the input vector and \omega = [\omega_{0}, \omega_{1}, \omega_{2}, ..., \omega_{k}]^{T} is the weight vector.

Loss function

Now, to evaluate our model, i.e. to know how close our prediction \hat{y} is to the real data y , we often use a loss function called the least squares error, which is defined by

\mathcal{L}(\omega) = \frac{1} {2} \sum_{i=1}^{N}{(y_{i} - \hat{y}_{i})^{2}}  \indent \indent \indent \indent (2)

You may not see the coefficient \frac{1}{2} in some materials, but I want to use it as a trick to cancel the coefficient 2 that appears when we calculate the derivative of the function \mathcal{L}(\omega) .

Applying (1) to (2), we have

\mathcal{L}(\omega) = \frac{1} {2} \sum_{i=1}^{N}{(y_{i} - \bold{x}_{i} \omega)^{2}}  \indent \indent \indent \indent (3)

where N is the number of data points, i.e. the number of (\bold{x}, y) pairs.
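To make Eq. (3) concrete, here is a minimal NumPy sketch on a tiny made-up dataset (N = 3); the weights are deliberately chosen so the fit is perfect and the loss comes out 0:

```python
import numpy as np

# Toy data: each row of X is one x_i = [1, x1]; y holds the targets
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

# Weights w0 = 0, w1 = 1 fit y = x1 exactly
omega = np.array([0.0, 1.0])

# Eq. (3): L = 1/2 * sum over i of (y_i - x_i . omega)^2
loss = 0.5 * np.sum((y - X @ omega) ** 2)
print(loss)  # 0.0 for a perfect fit
```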

Now we turn (3) into matrix form. With \bold{X} the matrix whose i-th row is \bold{x}_{i} and \bold{y} = [y_{1}, y_{2}, ..., y_{N}]^{T} , we have

\mathcal{L}(\omega) = \frac{1} {2} \| \bold{y} - \bold{X} \omega \|^{2}  \indent \indent \indent \indent (4)

OK, we have a loss function; now what we have to do is find the \omega that makes the loss minimal. To do this, we calculate the gradient of the function w.r.t. \omega and set it to 0.

\frac{\partial{\mathcal{L}(\omega)}}{\partial{\omega}} = -\bold{X}^{T} (\bold{y} - \bold{X} \omega) = 0

which gives the normal equation

\bold{X}^{T} \bold{X} \omega = \bold{X}^{T} \bold{y}

Assuming that \bold{X}^{T} \bold{X} is a nonsingular (invertible) matrix, the solution of the linear model is

\omega = (\bold{X}^{T} \bold{X})^{-1} \bold{X}^{T} \bold{y} \indent \indent \indent \indent (5)
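Eq. (5) can be checked directly with NumPy on a tiny made-up dataset; the recovered weights should match the line the data was generated from:

```python
import numpy as np

# Toy data generated from y = 1 + 1*x1; rows of X are [1, x1]
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

# Eq. (5): omega = (X^T X)^{-1} X^T y
omega = np.linalg.inv(X.T @ X) @ X.T @ y
print(omega)  # close to [1. 1.]
```

In practice, np.linalg.lstsq(X, y, rcond=None) is numerically safer than forming the inverse explicitly, but the explicit form above mirrors Eq. (5).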

So far we have looked at the mathematical details of linear regression; now I would like to show you how to implement the linear model for regression in Python using 2 popular machine learning libs. However, before doing that, let’s see how to read our data from the csv file.

Read data from csv file

The code below reads the data from the csv file into two lists, one for the inputs and one for the outputs.

import csv

mans_monthly_income, total_number_of_gfs = [], []
with open('funny_dataset.csv') as mycsvfile:
   datasets = csv.reader(mycsvfile, delimiter=',')
   for data in datasets:
      mans_monthly_income.append(float(data[0]))
      total_number_of_gfs.append(float(data[1]))

Implementing Linear Regression using the Scikit-learn lib

Using the Scikit-learn lib doesn’t require any complicated equations or calculations, because all the methods are already built into the lib. Here is the code to implement the model.

At first, we need to add some libraries which are used.

import csv
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

Then, transform the input and output into the matrix \bold{X} (one row \bold{x}_{i} per data point, with a leading 1) and the column vector \bold{y} = [y_{1}, y_{2}, ..., y_{N}]^{T} used in Eq. (4).

# Turning the output into a column vector
y = np.array(total_number_of_gfs, dtype=float).reshape(-1, 1)
# Turning the input into a column vector
X = np.array(mans_monthly_income, dtype=float).reshape(-1, 1)
# Creating a column of ones to store x0 = 1
element_vector = np.ones((X.shape[0], 1))
# Concatenating the column of ones to the input
X = np.column_stack((element_vector, X))

Afterwards, call the linear regression method from the lib:

# Feeding data to the linear model for regression
model = LinearRegression(fit_intercept=False), y)

Run the program and get results.

coef = model.coef_
>>> [[0.45753154 0.00087198]]

The code above is available on my github:

Implementing Linear Regression using the TensorFlow lib

Using the TensorFlow lib, by contrast, requires you to understand some basic elements of the lib and enough mathematical background to implement the model.

First of all, we need to import the libraries which are needed.

import csv
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

Secondly, convert the input and output to the same form as in the sklearn code.

Then, create the TensorFlow ops that describe the computational graph.

# Creating constant ops to store data
X_tensor = tf.constant(X)
y_tensor = tf.constant(y)

Next, solve for the weights of the linear model using Eq. (5):

# Calculating XtX
first_step = tf.matmul(tf.transpose(X_tensor), X_tensor)
# Calculating (XtX)^(-1)
second_step = tf.matrix_inverse(first_step)
# Calculating (XtX)^(-1)Xt
third_step = tf.matmul(second_step, tf.transpose(X_tensor))
# Calculating (XtX)^(-1)Xty
root = tf.matmul(third_step, y_tensor)

Finally, run the program and get results.

# Running the graph to get the parameters
with tf.Session() as sess:
   params =
>>> [[0.45753154]
    [0.00087198]]

The code above is available on my github:


So we can see that the results from the two implementations are the same.

The linear model fitted to the dataset is shown in Fig. 2.


Fig. 2 Linear model

Now, let’s use the model to predict how many girlfriends I had when I was a student. Assuming that my monthly salary at that time was $50, earned from a tutoring job, the prediction gives 0.45753154 + 0.00087198 * 50 = 0.50113 (girlfriends), which means I had about a 50% chance of having a girlfriend, but really I didn’t. It seems true to me. You can use this model to try predicting your own life, but I have to say it again: don’t trust it, because the dataset isn’t real.
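For the record, that prediction is just Eq. (1) applied with the fitted coefficients printed earlier:

```python
# Fitted coefficients from the post: intercept w0 and slope w1
w0, w1 = 0.45753154, 0.00087198
salary = 50  # monthly salary in dollars

prediction = w0 + w1 * salary
print(round(prediction, 5))  # 0.50113
```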

OK, this post ends here. In the next post, I will talk about another approach to the linear model based on neural network and learning algorithm ideas.

All code in this post is available on my github:

See ya,

Curious Chick



