Data Science: An Introduction to Different Machine Learning Techniques

Hello, readers. Before we dive deep into ML algorithms and dataset manipulation, it is imperative to quickly go over the basics.

This is a roadmap of the main components of machine learning, so that you’ll have a guide to what to learn first.

Why data science?

The hype started with the Harvard Business Review article:

Data Scientist: The Sexiest Job of the 21st Century

Today we take the internet for granted, so let me provide some facts about today’s internet:

people in the world	7.77 billion
active Internet users	4.54 billion
people using the Internet in Asia	50.3 % of the population
people using the Internet in the USA	293 million
average Internet user hours/day	6.5 hours online every day
Internet traffic every second of the day	88,555 GB
average Internet connection speed	11.03 Mbps
mobile traffic every month	40.77 exabytes
no. of apps in Google Play Store	2,570,000
total data in Google Search Index	100,000,000 GB of data

Some Internet statistics (updated 2020)

This sheer volume of data is exactly why data science is still a much-needed skill set.

Machine Learning – what is it?

Machine Learning (ML) is a collection of computer algorithms that allow systems to learn from data, generate outputs autonomously, and improve those outputs as they analyse more data.

Machine learning algorithms can be used to solve business problems such as regression, classification, forecasting, clustering, and association.

Types of ML

There are generally four types of ML algorithms:

Supervised ML
Unsupervised ML
Reinforcement learning
Semi-supervised learning

Supervised Learning

Supervised learning is a method in which the algorithm learns from previously labelled data and then predicts the label of unknown or future data.

In effect, a supervised algorithm is told what to look for, and it keeps searching until it finds the underlying patterns that produce the expected output with a reasonable degree of accuracy.

In other words, this kind of algorithm learns from past data with known outputs and then produces an equation for the target value.

In supervised learning, we try to model the data. For instance, when a Kaggle competition asks for house price regression, we model the data under the given constraints.

Supervised learning solves two types of problems:

Regression (continuous target variable)
Classification (categorical target variable)
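The two problem types above can be sketched in a few lines of plain Python. This is only a minimal illustration: the house and fruit numbers are made up, and the helper names fit_line and nearest_neighbour are hypothetical. Least squares and 1-nearest-neighbour are used here simply as two well-known supervised methods, not as the only options.

```python
# Supervised learning sketch: both tasks learn from labelled examples.
# All data below is invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def nearest_neighbour(train_points, train_labels, query):
    """1-nearest-neighbour classifier: copy the label of the closest example."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    best = min(range(len(train_points)),
               key=lambda i: sq_dist(train_points[i], query))
    return train_labels[best]


# Regression: continuous target (hypothetical area in m^2 -> price in thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept      # price estimate for a 90 m^2 house

# Classification: categorical target (hypothetical [weight, roughness] -> fruit name).
fruits = [[150, 0.2], [160, 0.3], [120, 0.8], [110, 0.9]]
names = ["apple", "apple", "orange", "orange"]
predicted_fruit = nearest_neighbour(fruits, names, [118, 0.8])
```

In both cases the known outputs (prices, fruit names) are what make the learning "supervised". Note that the features are left unscaled to keep the sketch short, so weight dominates the distance in the classifier.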

Unsupervised Learning

Unsupervised learning is a category of machine learning algorithm that typically utilises unlabeled data to train its model.

In this method of machine learning, we let the algorithm discover the underlying patterns, connections, and relationships in the data by itself, without adding any bias from the user’s end (in the form of constraints or dependencies).

With unsupervised learning, hidden and previously unknown patterns in the data can be uncovered.

It also helps to identify the features that are most useful for categorizing data automatically.

While the results carry some uncertainty, the benefits greatly outweigh the drawbacks.

This can be illustrated as follows. Say you are asked to sort a bag of fruits that you have never seen before.

You would likely split them into groups based on their physical attributes (such as weight, roughness, etc.). Even after dividing them, you probably would not know the names of the classes. This is just what happens in unsupervised learning.
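The fruit-sorting idea can be sketched with k-means, one common clustering algorithm. This is a minimal sketch under stated assumptions: the (weight, roughness) scores are invented and on an arbitrary scale, the number of groups (2) and the starting centroids are chosen by hand, and the function name k_means is just a label for this example.

```python
# Unsupervised learning sketch: k-means clustering on unlabelled points.
# No fruit names are given; the algorithm only sees the measurements.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid
    and moving each centroid to the mean of its assigned points."""
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    [sum(dim) / len(members) for dim in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break                                   # nothing moved, so we are done
        centroids = new_centroids
    return assignments, centroids


# Hypothetical (weight, roughness) scores for six unknown fruits.
fruit_scores = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
groups, centres = k_means(fruit_scores, centroids=[[1.0, 1.0], [8.0, 8.0]])
```

The result is a group number for each fruit, not a name: the algorithm separates the fruits but, just like the person in the example, it does not know what the groups are called.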

Main problems solved are:

Clustering (for example, customer segmentation)
Association (Market Basket Analysis)
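For the association problem, a small sketch of Market Basket Analysis may help. It computes support and confidence, two standard measures for a rule such as "customers who buy bread also buy butter". The baskets are made up for illustration, and the function names are assumptions of this example.

```python
# Association sketch: support and confidence of one rule in invented baskets.

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing antecedent, the fraction that
    also contain consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule: {bread} -> {butter}
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
```

Here bread and butter appear together in 2 of 4 baskets, and 2 of the 3 baskets with bread also contain butter. No labels are involved; the rule is found purely from co-occurrence in the data.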

Reinforcement learning

In reinforcement learning, a part of the system called an agent learns by trial and error in an interactive environment, which is often simulated.

The agent improves its performance through a reward-and-punishment mechanism driven by feedback from its own actions and interactions.

There are no predefined labels.

Instead, the agent receives rewards for correct responses and punishments for wrong ones.

The aim of the algorithm is to maximize the total reward.
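The reward-and-punishment loop can be sketched with tabular Q-learning, one well-known reinforcement learning algorithm. Everything about the environment here is an assumption made for illustration: a corridor of five states, a reward of 1.0 for reaching the last state, a small punishment of -0.01 for every other move, and hand-picked hyperparameters.

```python
import random

# Reinforcement learning sketch: tabular Q-learning on a made-up corridor.
# States 0..4; the agent starts in state 0 and an episode ends in state 4.

N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)          # 0 = step left, 1 = step right


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == GOAL:
        return next_state, 1.0, True      # reward for reaching the goal
    return next_state, -0.01, False       # small punishment for any other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values purely from trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _move in range(100):          # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])   # exploit
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy: the best-valued action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

The agent is never told that "right" is the correct move. It only sees rewards and punishments, and the learned policy ends up preferring the action that maximizes the total reward.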

An important achievement in this field involves the online game Dota 2.

OpenAI, the company co-founded by Elon Musk, developed an AI bot named OpenAI Five, which trained by playing the equivalent of 45,000 human years of games against itself (reinforcement learning)! Of course, because it’s an AI, every game was greatly accelerated.

Dota 2 has around 115 heroes to pick from.

Once the game starts, each player controls only 1 hero, with 4 skills and up to 6 items, each of which may have an active skill.

Multiply that by the number of players on a team, remember that the game is played in real time, and the number of permutations and combinations becomes mind-boggling.

Reinforcement learning is arguably as “AI” as it gets, and it is a highly researched subject.

Reinforcement learning addresses two broad categories of problems:

Classification (with a categorical target variable present)
Control, which has no target variable (driverless vehicles, self-playing games, etc.)

Semi-supervised Learning

We often have data in which only a small portion is labelled and the majority is unlabelled. This is very common and comes under semi-supervised learning.

It is especially important because much of the data you will come across in the real world is of this sort.

This is because labelling data is a human activity that requires a great deal of time, effort, and expense. Some of the data is also easier to label than the rest.

It has a categorical target variable and works on classification and clustering problems.
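One simple way to use a few labels together with many unlabelled points is self-training: label the easiest unlabelled point, treat that guess as if it were a real label, and repeat. The sketch below is only an illustration under stated assumptions: the one-dimensional values, the two starting labels, and the function name self_train are all made up, and a nearest-neighbour rule stands in for whatever model would be used in practice.

```python
# Semi-supervised learning sketch: self-training with a nearest-neighbour rule.
# Only two values start with labels; the rest are unlabelled.

def self_train(labelled, unlabelled):
    """Repeatedly take the unlabelled value closest to any labelled value,
    give it that neighbour's label (a pseudo-label), and add it to the
    labelled set so it can help label the remaining values."""
    labelled = dict(labelled)      # value -> label, copied so the input is untouched
    remaining = list(unlabelled)
    while remaining:
        value, neighbour = min(
            ((u, k) for u in remaining for k in labelled),
            key=lambda pair: abs(pair[0] - pair[1]),
        )
        labelled[value] = labelled[neighbour]
        remaining.remove(value)
    return labelled


seed_labels = {1.0: "small", 9.0: "large"}            # the small labelled portion
unlabelled_values = [1.5, 2.5, 3.5, 8.0, 7.0, 6.5]    # the unlabelled majority
result = self_train(seed_labels, unlabelled_values)
```

With just two human-provided labels, every remaining value ends up with a label. The pseudo-labels are only guesses, so in practice they would need to be checked against some held-out labelled data.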

And that’s about it.