LINEAR REGRESSION (3) – SOLVING NON-LINEAR PROBLEMS

Hi everyone! It has been a long time since my last post. I got extremely busy preparing my applications for the Spring 2018 admission deadline at universities, so that I can study at a higher level and do further research in Machine Learning and Human-Computer Interfaces. It really was a difficult process: it occupied most of my time with travelling around the city to prepare the necessary paperwork, writing statements such as the statement of purpose and the study plan, and emailing my professors to ask for recommendation letters.


Introduction

OK, enough small talk. To begin this topic, recall that I previously showed you how to implement a linear neuron model to solve a linear problem with the aid of the famous library TensorFlow. But how frequently do you actually face a purely linear problem? Honestly, it rarely happens; even the "materialism" problem relating the number of girlfriends to monthly salary, which we dealt with in the first and second topics of "Linear Regression", is not a purely linear problem. Let me show you what goes wrong when you use a linear model for it. You might say the model was very good at guessing the number of girlfriends (GFs) I may have in the first topic. But what about using this linear model to predict the number of girlfriends of the richest man in the world, Bill Gates? (This is just a joke; I mean no offence.) From [1], we know Bill's monthly income is \$216,666,666.00. Plugging this into the model, the number of GFs he may have is 0.45753154 + 0.00087198*216,666,666.00 = 188,929.457. Could that be real? Theoretically, maybe. Practically, it is infeasible; even Casanova couldn't reach that number. Now you see the problem: if the monthly income of a man increases towards infinity, the number of GFs predicted by the model increases towards infinity too, while the world population, morality, religion, and other constraints make that impossible. To deal with this, we need a restriction on the upper bound.
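To make this concrete, here is a minimal sketch (my own variable names, not the original script) that plugs Bill's income into the linear model fitted in the first topic:

# Coefficients of the linear model from the first topic
w0 = 0.45753154
w1 = 0.00087198

bills_monthly_income = 216666666.00

# Extrapolating far outside the training range gives an absurd prediction
predicted_gfs = w0 + w1*bills_monthly_income
print(predicted_gfs)   # about 188,929 girlfriends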

Dataset

First, we need to add a little more to our dataset to impose this restriction. Assume that even for a man who earns more than $15,000 monthly, the number of GFs he has is not higher than 10.


Fig. 1 The new dataset of the relation between a man's monthly income and his number of GFs

The dataset can be downloaded from my github link: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/dataset/funny_dataset.csv.

For this dataset, we won’t try to guess the number of Bill’s GFs because, as you know, we don’t have enough data (our dataset stops at $20,000). Instead, we will implement a polynomial model next, and then evaluate it on a separate testing dataset.

Polynomial regression model

Polynomial function

For our dataset, the polynomial model has the form:

\hat{y} = f(x, \omega) = \omega_{0} + x \omega_{1} + x^{2} \omega_{2} + ... + x^{n} \omega_{n} \qquad (1)

Or we can rewrite it into:

\hat{y} = f(x, \omega) = \sum_{i=0}^{n}{x^{i} \omega_{i}}

Or in matrix form:

\hat{y} = f(x, \omega) = \bold{X} \omega \qquad (2)

Looking at Eq. (1), although f(x, \omega) is a nonlinear function of x, it is a linear function of the coefficients \omega . Because of that, we can still implement the polynomial model in the same way as we did with the linear one in the first topic. The only difference is that the matrix \bold{X} now has the form:

\bold{X} = \begin{bmatrix} 1 & x_{1} & x_{1}^{2} & \dots & x_{1}^{n} \\ 1 & x_{2} & x_{2}^{2} & \dots & x_{2}^{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{k} & x_{k}^{2} & \dots & x_{k}^{n} \end{bmatrix}
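For instance (with my own toy numbers, not from the dataset), three data points x_{1} = 1, x_{2} = 2, x_{3} = 3 and an order of n = 2 give:

\bold{X} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix}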

Implementing the polynomial model using TensorFlow

To form the input matrix X before feeding it into the model, I wrote the function below.

Forming X data:

# Creating the input matrix X for a polynomial of the given order
import numpy as np

def create_x(order):
   global mans_monthly_income

   # First column of ones for the bias term w0
   X = np.ones((len(mans_monthly_income), 1))
   for i in range(1, order+1):
      # i-th power of the incomes as the next column
      temp = np.power(mans_monthly_income, i)
      x_temp = np.array(temp, dtype=float)
      x_temp = np.transpose(np.matrix(x_temp))

      X = np.column_stack((X, x_temp))

   return X

Forming y data:

# Creating the target vector y
def create_y():
   global total_number_of_gfs
  
   y = np.array(total_number_of_gfs, dtype=float)
   y = np.transpose(np.matrix(y))

   return y
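Just to sketch how these two functions fit together (the full fitting code is in the linked script; here I use made-up toy numbers instead of the real CSV columns and solve the ordinary normal equation from the first topic with NumPy):

import numpy as np

# Toy stand-ins for the two columns of funny_dataset.csv (made-up numbers)
mans_monthly_income = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0]
total_number_of_gfs = [0.0, 1.0, 2.0, 4.0, 8.0]

order = 3
X = create_x(order)    # (k, order+1) design matrix
y = create_y()         # (k, 1) target vector

# Ordinary least squares via the normal equation
w = np.linalg.solve(X.T.dot(X), X.T.dot(y))
print(w)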

The results are shown below when using polynomial functions of order 3, 7, and 9, respectively:


Fig. 2 The model with order of 3


Fig. 3 The model with order of 7


Fig. 4 The model with order of 9

The code can be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/12_lr_nonlinear_polynomial.py.

Testing the model

The question now is: how do we know which order is best? To answer this, I made another dataset (red points in Fig. 5) for testing our model, by evaluating the root-mean-square (RMS) error (Eq. (3)) for each order from 0 to 10.

E_{RMS} = \sqrt{\frac{1} {N} \sum_{i=1}^{N} {(\hat{y}_{i} - y_{i})^{2}}} \qquad (3)
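As a minimal sketch (a helper of my own, not from the linked script), the RMS error of Eq. (3) can be computed like this, given predictions and targets as NumPy arrays:

import numpy as np

def rms_error(y_hat, y):
   # Eq. (3): root-mean-square of the prediction errors
   diff = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
   return np.sqrt(np.mean(np.square(diff)))

# e.g. err = rms_error(X_test.dot(w), y_test), repeated for each order from 0 to 10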


Fig. 5 Training (blue points) and testing (red points) dataset

The testing dataset can be downloaded here: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/dataset/funny_dataset_test.csv.


Fig. 6 The relation between the RMS error and the order of the polynomial function

The code can be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/13_lr_nonlinear_polynomial_rms.py.

From Fig. 6, we can see that the RMS error of models with order higher than 8 begins to increase. In that case, we say the model is overfitted: it no longer works well on other test datasets (it may even be badly wrong), even though it fits the training data well. To deal with this problem, there are some techniques we can use.

Overfitting

Increasing the size of the dataset

The first is to collect more data. Whether this is possible depends on the data acquisition step: sometimes we can, sometimes we cannot. There is a ‘lex non scripta’ (unwritten rule), supported by many, that the number of data points should be larger than some multiple (5 to 10 is fine) of the number of parameters in the model. For example, a 9th-order polynomial has 10 parameters, so by this rule we would want roughly 50 to 100 training points.

Regularization

However, what happens if you are in a situation where you cannot acquire more data? In that case, I would like to introduce a simple regularization technique: adding \frac{1} {2} \lambda \omega^{T} \omega to the loss function \mathcal{L} (\omega) . The loss function now becomes:

\mathcal{L}(\omega) = \frac{1} {2} \sum_{i=1}^{N} {(y_{i} - \bold{x}_{i}^{T} \omega)^{2}} + \frac{1} {2} \lambda \omega^{T} \omega \qquad (4)

The \lambda in the added term lets us penalize large parameter values, which keeps the parameters small and discourages the model from overfitting.

Rewriting Eq. (4) in matrix form:

\mathcal{L}(\omega) = \frac{1} {2} (\bold{y} - \bold{X} \omega)^{T} (\bold{y} - \bold{X} \omega) + \frac{1} {2} \lambda \omega^{T} \omega \qquad (5)

Taking the derivative of Eq. (5) with respect to \omega and setting it equal to 0:

\frac{\partial{\mathcal{L} (\omega)}} {\partial{\omega}} = -\bold{X}^{T} (\bold{y} - \bold{X} \omega) + \lambda \omega = 0

Equivalently,

(\bold{X}^{T} \bold{X} + \lambda \bold{I}) \omega = \bold{X}^{T} \bold{y}

Assuming that (\bold{X}^{T} \bold{X} + \lambda \bold{I}) is a nonsingular matrix, the parameters of the model are:

\omega = (\bold{X}^{T} \bold{X} + \lambda \bold{I})^{-1} \bold{X}^{T} \bold{y}
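Before turning to the libraries, here is a minimal NumPy sketch of this closed-form solution (my own helper; lamda stands for the regularization strength \lambda):

import numpy as np

def ridge_params(X, y, lamda):
   # w = (Xt*X + lamda*I)^(-1) * Xt*y, solved without forming the inverse explicitly
   X = np.asarray(X, dtype=float)
   y = np.asarray(y, dtype=float)
   n_params = X.shape[1]
   return np.linalg.solve(X.T.dot(X) + lamda*np.identity(n_params), X.T.dot(y))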

Implementing the polynomial model with regularization using Scikit-Learn

The part of the code that implements the model with regularization using Scikit-Learn is shown below:

# Creating the ridge regression model and fitting it to the data
from sklearn import linear_model

model = linear_model.Ridge(alpha=lamda)
model.fit(X, y)

# Getting the fitted coefficients
coef = model.coef_
params = coef[0]
print(params)

The code can also be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/16_lr_regularization_sklearn.py.

Implementing the polynomial model with regularization using TensorFlow

The part of the code that implements the model with regularization using TensorFlow is shown below:

# TensorFlow 1.x style (tf.Session, tf.matrix_inverse)
import numpy as np
import tensorflow as tf

# Creating constant ops to store the data
X_tensor = tf.constant(X)
y_tensor = tf.constant(y)

''' Finding the parameters of the model '''
# Calculating Xt*X
first_step_1 = tf.matmul(tf.transpose(X_tensor), X_tensor)
# Calculating lamda*I
temp_step = lamda*np.identity(order+1)
first_step_2 = tf.constant(temp_step)
# Calculating Xt*X + lamda*I
first_step = tf.add(first_step_1, first_step_2)
# Calculating (Xt*X + lamda*I)^(-1)
second_step = tf.matrix_inverse(first_step)
# Calculating (Xt*X + lamda*I)^(-1)*Xt
third_step = tf.matmul(second_step, tf.transpose(X_tensor))
# Calculating (Xt*X + lamda*I)^(-1)*Xt*y
root = tf.matmul(third_step, y_tensor)

''' Getting the parameters '''
with tf.Session() as sess:
   params = sess.run(root)

print(params)

The code can also be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/15_lr_regularization.py.

Sigmoid function

Another way to tackle nonlinear problems is to use basis functions whose shape matches the features of the dataset. This relies largely on your experience.

Looking at our dataset, on the one hand, we can see the points follow a curve that partly looks like the function called the “sigmoid” (shown in Fig. 7).


Fig. 7 The graph of sigmoid function

On the other hand, we have a restriction on the upper bound, so it is easy to see that these characteristics suit the sigmoid’s features well. For that reason, we will now use the model shown in Eq. (6) instead of the polynomial model:

\hat{y} = f(x, \omega) = \omega_{0} + \frac{1} {1 + e^{-x}} \omega_{1} \qquad (6)

However, note that the input x in our dataset is very large (up to 20,000), so e^{-x} is extremely small and effectively becomes 0. For that reason, we need a small trick in Eq. (6): dividing x by a constant k. Using k = 4000, Eq. (6) turns into:

\hat{y} = f(x, \omega) = \omega_{0} + \frac{1} {1 + e^{-x/4000}} \omega_{1} \qquad (7)

The new model is illustrated in Fig. 8.


Fig. 8 The sigmoid model
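As a minimal sketch of this idea (my own helper names; the full script is linked below), we simply replace the polynomial columns with a single sigmoid feature and reuse the same normal equation:

import numpy as np

def create_x_sigmoid(incomes, k=4000.0):
   # Design matrix [1, sigmoid(x/k)] for each income x
   x = np.asarray(incomes, dtype=float)
   sig = 1.0/(1.0 + np.exp(-x/k))
   return np.column_stack((np.ones(len(x)), sig))

# X_sig = create_x_sigmoid(mans_monthly_income)
# w = np.linalg.solve(X_sig.T.dot(X_sig), X_sig.T.dot(y))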

The code can be downloaded from my github: https://github.com/tenga061/machine_learning/blob/master/linear_regression/episode_3/code/9_lr_nonlinear.py.

Apart from polynomial and sigmoid functions, we can also apply others, such as trigonometric or logarithmic functions, in our regression model.

So, that’s all I want to say about using linear models for regression tasks. In the next topic, I will introduce how to use a linear model for classification tasks.

Hope you enjoy it, goodbye, and see ya,

Curious Chick

References

[1] https://paywizard.org/main/salary/vip-check/world-richest-people

[2] Simon Rogers and Mark Girolami, A First Course in Machine Learning.

[3] Christopher M. Bishop, Pattern Recognition and Machine Learning.

[4] http://scikit-learn.org/stable/modules/linear_model.html

