MACHINE LEARNING – WHAT IS THE DATA TELLING YOU?

Hi everyone,

Whenever I mention Machine Learning (ML) to friends who don’t work in technology or data science, at least one of them thinks it has something to do with sci-fi characters like The Terminator, Wall-E, or, more recently, Ultron. That is partly true: the ultimate goal of ML is arguably to build machines that can think and act like human beings. For now, though, it is mainly used to analyze data and extract information from it.

Nowadays, many technology companies are bringing research to life (Natural Language Processing (NLP), Pattern Recognition, etc.), and many businesses use ML to analyze their customers and stock. It is fair to say that ML has become one of the sexiest fields in the world. Basically, there are two types of ML: supervised and unsupervised learning (see Fig. 1). Supervised learning needs a dataset containing both inputs and outputs to train your algorithm, and it is used for regression or classification tasks. By contrast, unsupervised learning doesn’t need outputs for training, and it is used for dimensionality reduction or clustering tasks. For more details about ML, see references [1] and [2].

Fig. 1 Two types of Machine Learning
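To make the difference between the two families concrete, here is a minimal sketch using Scikit-learn. The data is synthetic and every number in it (the slope, the intercept, the two cluster centres) is an assumption chosen only for illustration. The point to notice is that the supervised model is fitted with both inputs and outputs, while the unsupervised one only ever sees the inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised learning: every input in X comes with a known output y.
# Assumed ground truth for this toy data: y = 3x + 2, plus a little noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)
reg = LinearRegression().fit(X, y)          # fit(inputs, outputs)
slope, intercept = reg.coef_[0], reg.intercept_

# Unsupervised learning: only inputs, no outputs.
# Two well-separated blobs of points (assumed centres: (0, 0) and (5, 5)).
blob_a = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)  # fit(inputs)
labels = km.labels_

print("estimated slope and intercept:", slope, intercept)
print("cluster labels of the first and last point:", labels[0], labels[-1])
```

The regression recovers values close to the assumed slope and intercept, and the clustering puts the two blobs into different groups without ever being told which point belongs where.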

Actually, learning ML is not as tough as you might think, and if you only use it, it is no more than a tool. The hardest work is collecting and preprocessing the data before you feed it into ML algorithms, and each discipline has its own way of doing that. For example, speaking from personal experience in building Brain-Computer Interface systems (BCIs), the data is the EEG signal acquired from the scalp.

Firstly, at the data collection stage, before recording any signal you need to ask yourself some questions about the purpose of the data. What sampling rate do you need, so that you can design suitable hardware? Which positions on the scalp will you record from, so that you can choose the number of electrodes? And so on.

Secondly, at the preprocessing stage, the raw data usually contains a lot of noise, so you need to filter it, then extract features into a feature vector and feed that to the ML algorithm. I would say that feature extraction is one of the most creative parts of analyzing data, and it requires some knowledge of the field you work in. Coming back to BCIs, you could use common descriptive statistics such as the mean, median, and standard deviation as features. But if you know a little about how the brain works, for instance that alpha activity increases and beta activity decreases when you are relaxed or meditating, you can use the energy of the alpha and beta bands from every electrode as the feature vector instead. This is more efficient: it not only reduces the size of the feature vector but also improves the classification results.

OK, so you can see that there is a lot to do before using ML. However, that work is beyond the scope of this topic, so I will not discuss it further.
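Just to give a flavour of the feature-extraction idea, here is a rough sketch with NumPy on a synthetic, single-channel "EEG" signal. Everything in it is an assumption made for illustration: the 256 Hz sampling rate, the 10 Hz and 20 Hz components standing in for alpha and beta activity, the band edges (8 to 13 Hz for alpha and 13 to 30 Hz for beta, a common convention), and the helper `band_power`, which is not from any library. It skips the filtering step and is not the exact pipeline I use in real BCI work.

```python
import numpy as np

fs = 256                      # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)   # 4 seconds of samples
rng = np.random.default_rng(0)

# Synthetic signal: strong 10 Hz (alpha-like) + weak 20 Hz (beta-like) + noise.
signal = (2.0 * np.sin(2 * np.pi * 10 * t)
          + 0.5 * np.sin(2 * np.pi * 20 * t)
          + 0.2 * rng.normal(size=t.size))

def band_power(x, fs, low, high):
    """Sum of the (unnormalised) power spectrum between low and high Hz."""
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    mask = (freqs >= low) & (freqs < high)
    return power[mask].sum()

# Option 1: generic descriptive statistics as features.
stat_features = np.array([signal.mean(), np.median(signal), signal.std()])

# Option 2: domain-informed features, the energy in the alpha and beta bands.
alpha = band_power(signal, fs, 8, 13)
beta = band_power(signal, fs, 13, 30)
band_features = np.array([alpha, beta])

print("statistical features:", stat_features)
print("band-power features: ", band_features)
```

With several electrodes you would compute the two band powers per channel and concatenate them, so the feature vector stays small while still carrying the information a relaxed-versus-alert classifier cares about.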

In this series, I will focus on some popular algorithms in both supervised and unsupervised learning that help us solve basic problems. And as I mentioned before, even if you work outside technology, if you are interested in ML you can still learn and discuss it.

All code in this series will be implemented using Scikit-learn and TensorFlow. A little bit about TensorFlow: it is a lower-level library that is more complicated to use than Scikit-learn for implementing ML algorithms. However, I like to use it in some of my real projects because it is more efficient. One reason is that TensorFlow’s core is written in C++. Moreover, a computational graph has to be defined before the ML algorithm runs. Together, these can help TensorFlow run much faster than Scikit-learn. Actually, there are other libraries, like Theano, built on the same ideas, but I prefer TensorFlow. It is a matter of taste.
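The "define the graph first, run it later" idea can be sketched in a few lines of plain Python. To be clear, this is a toy written for this post and not TensorFlow's API: the `Node` class and the `constant`, `add`, `mul`, and `run` functions are all hypothetical names. It only shows the two-step pattern, where describing the computation and executing it are separate.

```python
class Node:
    """A node in a toy computation graph (hypothetical, not TensorFlow's API)."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add" or "mul"
        self.inputs = inputs  # the nodes this node depends on
        self.value = value    # only used by constants

def constant(v):
    return Node("const", value=v)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op: {node.op}")

# Step 1: describe the computation (2 * 3) + 1. Nothing is calculated yet.
graph = add(mul(constant(2.0), constant(3.0)), constant(1.0))

# Step 2: execute the whole graph in one go.
result = run(graph)
print(result)
```

Because the whole computation is known before anything runs, a real library has the chance to optimise it and hand it to fast compiled code, which is the efficiency argument made above.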

Another thing is how to install TensorFlow. If you use Ubuntu or macOS, it shouldn’t be a problem; refer to [3] and [4] for installation instructions. However, there is a problem when you try to install TensorFlow on a Windows computer: at the time of writing, TensorFlow on Windows only supports Python 3.5.*. I tried it and it did not seem to work as expected. For example, I wrote a simple program that prints ‘Hello World’ to the screen using TensorFlow, but the line printed on the screen was a bytes object rather than a string (see Fig. 2).

Fig. 2 TensorFlow on Python 3.5
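A note on the bytes output: under Python 3, TensorFlow hands string tensors back as `bytes` objects, so seeing `b'Hello World'` is the library's normal behaviour and not a sign of a broken install. The snippet below does not call TensorFlow at all; `raw` is only a stand-in (an assumption) for the value that came back from the session, and it shows how such a value can be turned into an ordinary string.

```python
# Stand-in for the value returned by TensorFlow (assumed to be a bytes object).
raw = b"Hello World"

# Decode the bytes into a regular Python 3 string.
text = raw.decode("utf-8")

print(type(raw).__name__, raw)
print(type(text).__name__, text)
```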

On the other hand, as I mentioned in the first tutorial of the Basic Python Tutorial series, there are some important libraries, such as SciPy, that cannot be used on Python 3.5.*. Thus, to solve this problem, I use Ubuntu in a virtual machine. Instructions for installing the virtual machine are in [5].

How to install Ubuntu on VirtualBox in Windows [5]

OK, so have you set up all the software you need for ML on your computer? In the next tutorial, we will get familiar with one of the simplest ML algorithms, Linear Regression.

Hope you enjoy it,

Curious Chick

References

[1] https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer

[2] https://monkeylearn.com/blog/gentle-guide-to-machine-learning/

[3] https://www.tensorflow.org/install/install_mac

[4] https://www.tensorflow.org/install/install_linux

[5] https://www.youtube.com/watch?v=GGorVpzZQwA
