Hi everyone! It has been a long time no see. I got extremely busy preparing my application for the deadline of Spring 2018 semester admission at universities to study at a higher level of education and do my further researches on the field of Machine Learning and Human-Computer Interface. It really was a difficult process, occupied most my time for travelling around and around the city for preparing necessary paperworks, writing statments like statment of purpose or study plan and sending emails to my professors to ask for recommendation letters.

Related Topics:


OK, just a little talk, to begin this topic, let’s remember, I introduced you how to implement a Linear Neuron Model to solve a linear problem with the aid of famous library Tensorflow. However, remember that how frequently do you face with the linear problem? Honestly, it rarely happens, even the “Materialism” problem related to the number of girl friend and monthly salary we dealt with in the first and second topic of “Linear Regression” is also not a pure linear problem. Let me show you what is a problem you get when using linear model for this. You might say that it was very good at trying to guess the number of my may-be-have girl friends (GF) in the first topic. But how about trying to use this linear model to predict the number of girl friends of the richest man in the world – Bill Gates (OK, this is just a joke, I mean no offence when I said that). From [1], we know Bill’s monthly income is \$216,666,666.00 , infer from the model, we get the number of GF he may have is 0.45753154 + 0.00087198*216,666,666.00 = 188,929.457 . How could it be real? Theoretically, may be. Practically, it’s infeasible, even Casanova couldn’t reach this number. Now, you see the problem here, if the monthly income of a man increase to infinite, the number of GF he has will increase to infinte too while the limit on the number of population worldwide or others such as morality, religion, etc. make it impossible. To deal with this, we need a restriction on the upper bound.


First, we needa add a little bit more in our dataset to make a restriction. Assume that, even though there is a man who earn more than $15,000 monthly, the maximum number of GF he has are not higher than 10.


Fig. 1 The new dataset of the relation between monthly income and the number of GF of a man

The dataset can be downloaded from my github link: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/dataset/funny_dataset.csv.

For this dataset, we won’t try to guess the number of Bill’s GF because, as you know, we don’t have enough data (our dataset stop at $20,000). Instead, we will implement a polynomial model next, and then use it to test another called testing dataset.

Polynomial regression model

Polynomial function

For our dataset, the polynomial model has form:

\hat{y} = f(x, \omega) = \omega_{0} + x_{1} \omega_{1} + x_{1}^{2} \omega_{2} + ... + x_{1}^{n} \omega_{n} \indent (1)

Or we can rewrite it into:

\hat{y} = f(x, \omega) = \sum_{i=0}^{n}{x_{1}^{i} \omega_{i}}

Or by matrix form:

\hat{y} = f(x, \omega) = \bold{X} \omega \indent \indent \indent \indent (2)

Let’s look at the equation (1), f(x, \omega) is a nonlinear function of x though, it is a linear function of the coefficient \omega . Because of that, we can still implement the polynomial model as the same way as we do it with the linear one in the first topic. But now, we have the form of the matrix \bold{X} is:

\bold{X} = \begin{bmatrix} 1&x_{1}&x_{1}^{2}&...&x_{1}^{n}\\ 1&x_{2}&x_{2}^{2}&...&x_{2}^{n}\\..\\1&x_{k}&x_{k}^{2}&...&x_{k}^{n} \end{bmatrix}

Implementing polynomial model using TensorFlow

To form input vector after feeding it into the model, I wrote the function below.

Forming X data:

# Creating the input matrix X
def create_x(order):
   global mans_monthly_income

   # Creating element vector
   X = np.ones((len(mans_monthly_income)), 1)
   for i in range(1, order+1):
      temp = np.power(mans_monthly_income, i)
      x_temp = np.array(temp, dtype=float)
      x_temp = np.transpose(np.matrix(x_temp))

      X = np.column_stack((X, x_temp))
   return X

Forming y data:

# Creating the target vector y
def create_y():
   global total_number_of_gfs
   y = np.array(total_number_of_gfs, dtype=float)
   y = np.transpose(np.matrix(y))

   return y

The results are showed below when using polynomial function with order of 3, 7 and 9, respectively:


Fig. 2 The model with order of 3


Fig. 3 The mode with order of 7


Fig. 4 The model with order of 9

The code can be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/12_lr_nonlinear_polynomial.py.

Testing the model

The question now is that how could we know which order is the best? To answer this, I made another dataset (red points in Fig. 5) for the purpose of testing our model by evaluating the root-mean-square (RMS) error (showed as Eq. (3)) each order from 0 to 10.

E_{RMS} = \sqrt{\frac{1} {N} \sum_{i=1}^{N} {(\hat{y} - y)^{2}}} \indent \indent \indent \indent (3)


Fig. 5 Training (blue points) and testing (red points) dataset

The testing dataset can be downloaded here: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/dataset/funny_dataset_test.csv.


Fig. 6 The relation between the value of RMS error and the order of polynomial function

The code can be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/13_lr_nonlinear_polynomial_rms.py.

From the Fig. 6, we can see that the RMS value of models with order higher than 8 begin increasing. In that case, we say our model was overfitted. By which, the model couldn’t work well with other test datasets (even it may work wrong), though it have done well with the training. To deal with this problem, there is some techniques can be used.


Increasing the size of the dataset

The first is to collect more and more data. This depends on the data acquisition step, sometimes we can, but sometimes not. There is a ‘lex non scripta’ supported by many is that the number of data points should be larger than some multiple (from 5 to 10 is OK) of the number of parameters in the model.


However, what happens if you are in the situation that you cannot acquire more data. In the case, I would like to introduce you a simple technique of regularization by adding \frac{1} {2} \lambda \omega^{2} to the loss function \mathcal{L} (\omega) .  Now, the loss function become:

\mathcal{L}(\omega) = \frac{1} {2} \sum_{i=1}^{N} {(y_{i} - \bold{x}_{i})^{2} \omega} + \frac{1} {2} \lambda \omega^{2} \indent \indent \indent \indent (4)

The lambda in the adding element help us penalize the loss when the model become overfitting and keep the parameters small.

Turn Eq. (4) to the matrix form:

\mathcal{L}(\omega) = \frac{1} {2} (\bold{y} - \bold{X} \omega)^{2} + \frac{1} {2} \lambda \omega^{2} \indent \indent \indent \indent (5)

Calculating the derivation of Eq. (5) and set it equal 0 to find roots of the model:

\frac{\partial{\mathcal{L} (\omega)}} {\partial{\omega}} = -\bold{X}^{T} (\bold{y} - \bold{X} \omega) + \lambda \omega = 0


(\bold{X}^{T} \bold{X} + \lambda \bold{I}) \omega = \bold{X}^{T} \bold{y}

Assume that (\bold{X}^{T} \bold{X} + \lambda \bold{I}) is a nonsingular matrix, we have parameters of model is the root of the equation:

\omega = (\bold{X}^{T} \bold{X} + \lambda \bold{I})^{-1} \bold{X}^{T} \bold{y}

Implementing polynomial with regularization using Scikit-Learn lib

The part of code to implement the model with regularization using Scikit-Learn is showed below:

# Creating the model and feeding data to it
model = linear_model.Ridge(alpha=lamda)
model.fit(X, y)

# Getting the coefficients
coef = model.coef_
params = coef[0]
print params

The code can also be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/16_lr_regularization_sklearn.py.

Implementting polynomial with regularization using TensorFlow lib

The part of code to implement the model with regularization using TensorFlow is showed below:

# Creating constant opt to store data
X_tensor = tf.constant(X)
y_tensor = tf.constant(y)

''' Finding the root of the model '''
# Calculating Xt*X
first_step_1 = tf.matmul(tf.transpose(X_tensor), X_tensor)
# Calculating lamda.I
temp_step = (order+1)*lamda*np.identity(order+1)
first_step_2 = tf.constant(temp_step)
# Calculating Xt*X + lamda.I
first_step = tf.add(first_step_1, first_step_2)
# Calculating (Xt*X + lamda.I)^(-1)
second_step = tf.matrix_inverse(first_step)
# Calculating (Xt*X + lamda.I)^(-1)*Xt
third_step = tf.matmul(second_step, tf.transpose(X_tensor))
# Calculating (Xt*X + 2.lamda.I)^(-1)*Xt*y
root = tf.matmul(third_step, y_tensor)

''' Getting parameters '''
with tf.Session() as sess:
   params = sess.run(root)

print params

The code can also be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/15_lr_regularization.py.

Sigmoid function

Another way to confront with nonlinear problems is to use functions having features suiting dataset features. This is based on your experience much.

Look at our dataset, on the one hand, we can see points distributing in a curve that is partly look like a function called “sigmoid” (as showed as Fig. 7).


Fig. 7 The graph of sigmoid function

On the other hand, we have a restriction on the upper bound, so it can be easily recognised that these features suit sigmoid’s features well. For the reason, now, we will use the model showed in Eq. (6) instead of the polynomial model as before.

\hat{y} = f(x, \omega) = \omega_{0} + \frac{1} {1 + e^{-x}} \omega_{1} \indent \indent \indent \indent (6)

However, note that the input x is very large in our dataset (up to 20,000), so e^{-x} is very small and can be set as 0. For that reason, we need to add a trick to the Eq. (6) by dividing x by a constant number k. In this case, I use k = 4000, the Eq. (6) turn into:

\hat{y} = f(x, \omega) = \omega_{0} + \frac{1} {1 + e^{-x/4000}} \omega_{1} \indent \indent \indent \indent (7)

The new model is illustrated in Fig. 8.


Fig. 8 The sigmoid model

The code can be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/9_lr_nonlinear.py.

Apart from polynomial or sigmoid function, we can also apply others like trigonometic or logaric function for our regression model.

So, that’s all I want to talk about how to use linear model for regression tasks. In the next topic, I would introduce the way to use linear model, but for classification tasks.

Hope you enjoy it, goodbye, and see ya,

Curious Chick


[1] https://paywizard.org/main/salary/vip-check/world-richest-people

[2] Simon Rogers, Mark Girolami – A first course in Machine Learning

[3] Christopher M. Bishop – Pattern Recognition and Machine Learning

[4] http://scikit-learn.org/stable/modules/linear_model.html


Author: curiouschick

There are many things you may never know about me. But, two things you absolutely know when you visit my blog for the first time: I am a chick and really curious to know everything.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s