Hey guys, welcome to another session by Intellipaat.

Driving a data-driven business using machine learning is considered an

important aspect in today’s world. Top companies such as Amazon, Facebook,

Apple and many more use machine learning to perform advanced

analytics and drive their business to success. In today’s session, we’re gonna

have a quick look into the world of machine learning algorithms. Now, before

we begin, do subscribe to the Intellipaat YouTube channel so that you never miss

out on any of our upcoming videos. Now let’s have a look at the agenda for today’s

video. First, we’ll understand why we require

machine learning algorithms. Then we’ll further understand what these algorithms

actually are. After that, we’ll take a quick dive into the world of machine

learning algorithms, and finally we’ll do a couple of demos using these algorithms.

Also, guys, if you’re looking to get certified in data science, Intellipaat

provides data science certification training courses. For more details, you

can check out the description. Without much further delay, let’s get started.

What do you think is the need for something called an algorithm? Well,

consider this situation. Let’s say you’re baking a cake, or

you’re driving your car, or you’re even walking or singing. Your body is

continuously executing a set of steps that you have already

trained it to do, and this is what we call an algorithm. So basically,

when you’re driving your car, your brain is already programmed to do all

the tasks that are required to help you drive your

car. And when you’re walking, how do you maintain balance? Well, as a

toddler, maintaining your balance

was very difficult, but then you trained yourself every day, and now you can

walk very easily, right? So this process, which involves learning and

repetition, can be termed an algorithm as well,

guys. If you’ve been wondering whether algorithms are a new concept, well, they’re

not. Algorithms have been used for decades. Take this

person on the screen, Alan Turing. Here’s a good fact: this

person was probably one of the reasons World War Two ended when it did. He was one of

the code breakers who decoded the very famous Enigma-encrypted messages from Germany.

So the entire point here is to tell you that algorithms are an

age-old tradition, and these days we’ve been pushing them into the

computer science field as well, making sure that we make full use of

them, guys. And again, why would we require them? Well, think of the huge amount

of data that’s being generated these days, and then think of the methods that

we’d need to understand that data, to process it,

to clean it up and to work with it, right? For all

of these we have something called algorithms, guys. So on that note, what are

algorithms? What is the formal definition of an algorithm?

Well, guys, algorithms are as simple as this: they are just a set of rules, or you

can call them processes as well, to be followed in calculations

or any other problem-solving operations, especially when done by a computer. How

simple is that? This is exactly what an algorithm means. You have

a symbol on the left-hand side; that is pretty much what a flowchart looks like.

Don’t worry, we’ll be checking out the flowchart sections in

the next set of slides. But right now I want to tell you guys that you

are using algorithms as well. Step one: you’re looking at your screen, and

you’ve programmed yourself to look at the screen; that’s an algorithm. And

YouTube is running recommendation algorithms. Let’s say

you search for something, Python tutorials or anything for that matter,

and Intellipaat videos are up there. So how does YouTube know that

it should recommend Intellipaat’s videos to its learners?

Well, again, an algorithm is being used there. And every time you check your mail,

your mails are filtered into your inbox or your spam folder and so much more. So how

does Google, or Gmail, know which mail is spam and which mail is

not spam? That again is an algorithm right there. And no matter

what operating system you’re on right now, Windows, or let’s say iOS, macOS,

Android, whatever, all these operating systems are using algorithms

right now. On that note, let’s quickly break it down into simple terms

and check out the relationship between a pseudocode and a flowchart, guys.

So here is a

very simple piece of code for you guys. This is what we call pseudocode.

Pseudocode is almost like high-level language code; it just looks

very literal, and you can figure out what the code is doing even though you

might not be a programmer. So one part of it is the code, and the other is

the flowchart of the algorithm alongside it. Here,

we’re inputting a variable A and giving it the value 10, we’re

inputting a variable B and giving it the value 20, and we’re adding them,

so C will have the value 30 right now,

since A plus B is 10 plus 20, and then we’re outputting that. The words start and stop

are again a part of the pseudocode and flowchart relationship. On the right

side, if you can take a look, this is what the flowchart of this exact

pseudocode will look like, guys. Well, this was very simple, so let me quickly step

it up a notch, where we can check out another pseudocode and

flowchart relationship, guys. So here again we’re inputting A, inputting B, and

then, until A becomes equal to B, we’ll be printing all

the values from A to B, increasing A by one each time. So right

now A is 10. It’s gonna check if

10 is equal to 20; it’s not, so until 10

becomes 20 we’re gonna keep printing. So the answer is going to

be 10, 11, 12, 13, 14, all the way until 20, and this goes on iterating

in a loop. The diamond box is called the decision

box, and it has two tracks: it can be a true/false track or a

yes/no track, and in this particular case we have the yes/no track here, guys.
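
The two pseudocode examples described above can be written out as a minimal Python sketch. The function names are made up for illustration, and the loop assumes that the final value 20 is printed too, which is what the narration suggests; the slide itself may stop one step earlier.

```python
def add_two_numbers(a, b):
    """First pseudocode: take A and B, add them, output C."""
    c = a + b
    return c


def count_up(a, b):
    """Second pseudocode: print A, increase it by one, repeat.

    Assumption: the last value (B itself) is printed as well.
    """
    printed = []
    while a <= b:        # the decision (diamond) box: yes -> go round again
        print(a)
        printed.append(a)
        a += 1           # increase A by one
    return printed


print(add_two_numbers(10, 20))   # A = 10, B = 20
values = count_up(10, 20)
```

The `while` condition plays the role of the diamond-shaped decision box: as long as the answer is yes, the flow goes back into the loop.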

So on that note, we need to understand why we would require all these

algorithms in machine learning, right? But before that, why would we even require

machine learning? Well, guys, machine learning can

be defined as the ability of a machine to learn

something without being explicitly programmed for that particular thing. How cool

is that? It is basically the field of study where computers use a massive

amount of data and apply all of these algorithms for training

themselves, and here’s the keyword, training themselves, and then making

predictions from that. So training in machine learning entails

feeding a lot of data into the algorithm and allowing the machine itself to learn

more about the processed information. You just tell the machine

the basics, or show it one iteration, and the machine

goes on to figure out, say, 9 or 10 more iterations on its own. It’s gonna learn

on its own, it’s gonna process on its own, and you can

work with that data later on, right? So again, we can call this a process of

converting raw data into useful information,

but we’re doing it with the help of these algorithms that we’re about to

learn, guys. So on that note, we need to check out what the types of machine

learning are. We have three main types of learning when we talk

about machine learning, guys: supervised learning,

unsupervised learning and reinforcement learning. So

I would just suggest you guys take a minute and

pause on the slide to note these three types of machine learning:

supervised learning, unsupervised learning and reinforcement learning. If

you’re already familiar with the concepts, or if you think that you’ve got it

in the bag, let’s move on to check out what supervised learning actually means.

Well, supervised learning, as the name suggests, requires some sort of

supervision, right? Let us talk in terms of variables so we can understand it

easily. In supervised machine learning algorithms, let’s say we have

input variables and output variables. The input variables are denoted by X

and the output variables are denoted by Y. So X is input, Y is output. The goal of

any supervised learning system is to understand how your output variable Y

changes with respect to the changes made in X, guys. So how the

output variable Y varies when we go about playing with our input variable X is

pretty much the goal of a supervised learning system, guys. Here we’ll

also be approximating the mapping function to a point where, when we have

new input data coming in which the machine hasn’t

seen, we can predict new output variables Y with respect to all the

new X data that the machine just saw. So we have trained it on a

particular set of X’s, and then it sees a new set of

input variables and uses what it has learned to give us a new Y

output value, guys. So how cool is that, right?
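
As a rough illustration of approximating a mapping from X to Y and then predicting for an unseen X, here is a minimal sketch that fits a straight line by least squares. The training numbers are invented for the example (they follow y = 2x + 1), and the helper names `fit_line` and `predict` are hypothetical; the video does not show this code.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to an input the model has not seen."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: inputs X with their known outputs Y.
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

model = fit_line(train_x, train_y)
new_y = predict(model, 10)   # 10 was not in the training data
print(model, new_y)
```

The labelled pairs are the supervision: the model only ever sees X together with its known Y during training, and is then asked for Y on new X.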

We also have dependent variables and the concept of independent variables, right?

And our aim here is to

understand how our dependent variable

will change with respect to an independent

variable. So we have a dependent variable which

goes hand in hand with the variable we call the independent

variable, and we need to understand the changes that happen in

the dependent variable when it is mapped across and compared with

respect to the independent variable. Just to make sure that you guys are

getting the concept out here, here’s a very simple example showing you the same.

Here our independent variable, in our particular case, is let’s say the

gender of the student: we have a girl and a boy here. The dependent variable can be

the outcome of an examination for these students, so

let’s say whether the student passed an examination or failed it;

this becomes our dependent variable. So the independent variable is the gender,

the dependent variable is the outcome of what the student is trying to

do, and at the end of it what we’re trying to do is

determine whether the student would pass the exam or not based on the person’s

gender. Let’s say we’re doing a survey where we need to find out how many girls

have passed and how many boys have passed. Here again, the gender becomes the

independent variable, and what depends on it, in our particular case

the outcome, the pass or the fail, becomes the dependent variable, right? So here again we’re

trying to find out whether the student would pass based on the gender or not. So the

dependent variable here, as I’ve already been

mentioning, is going to be the outcome, and the independent variable is going to

be the gender, guys. So do we have anything more in terms of supervised

learning? Well, yes, guys, there is a further subdivision of

supervised learning: we have something called classification and

something called regression. Let us quickly check out what regression is,

and then we can come talk about classification, guys. Well, regression is a

type of supervised learning where the output variable is a continuous numeric

value. So what do we mean by a continuous numeric value? Let me again take

another quick example to make sure you guys understand this better.

I have images of two apples for you guys. One apple costs four dollars, the other

apple costs three dollars. Here the output variable is the cost of the apple.

It is a numeric value, and you can predict it, right? Is

the apple ripe? If yes, then it’s costly; if it’s not yet ripe, then it’s

cheap. Is it a Shimla apple, a Kashmiri apple, or a Washington

apple? You can go on adding so many factors around this

apple and then come up with one particular outcome out of it, which would

be the price, right? So the price depends on all of these factors, and in our case

the price is the output variable. So we’re trying to predict the cost of the

apple with respect to all these other factors. Doing this in a

real-world or a mathematical setting is what

we call regression, guys.

So, with respect to regression, again, there is another type

of regression, which we call logistic regression. This is

basically a technique where our dependent variable, instead of

being a continuous numerical value, is a categorical value, guys. So

what do we mean by this? For an example, take a look at the

example on your screen right now. What we’re trying to do is

predict whether it’s gonna rain on a particular day or not, and this is

being done with respect to two independent variables. So how do we

check for rain? It’s usually done by checking the temperature

and checking the humidity, and beyond that we’d probably just go out and

take a look at the sky to check for clouds and so much more, right?

Coming back to logistic regression, the dependent variable is a categorical

variable. In this case it can have only two values;

it is binary, guys. So it is going to be either zero or

one. In this logistic regression model,

depending on all of these attributes, we get a probability, and our final answer

is going to be either yes or no. So if you ask someone, is it

gonna rain, their answer might be either a yes or a no, right? It’s a binary

answer, and here it’s the same as well. To graph out what

it would look like: we get an S-shaped curve out of this model, in what

we call the logistic regression case. On the left we have a linear

relationship between the dependent variable and the independent variable,

and it’s just a straight line. On the right, since the outcome that we are

looking at is a binary value, the curve looks like an S. So again, guys, take

a moment, pause on this slide to understand what a linear regression

graph looks like versus what a logistic regression graph looks like.
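
The S-shaped curve mentioned above is the logistic (sigmoid) function, which squeezes any number into a probability between 0 and 1. The sketch below applies it to the rain example; the weights, the bias and the 0.5 threshold are invented for illustration and are not fitted to any real weather data.

```python
import math


def sigmoid(z):
    """The S-shaped logistic curve: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_rain(temperature, humidity, weights, bias, threshold=0.5):
    """Return (probability of rain, yes/no answer encoded as 1/0)."""
    z = weights[0] * temperature + weights[1] * humidity + bias
    probability = sigmoid(z)
    return probability, 1 if probability >= threshold else 0


# Hypothetical coefficients, chosen by hand purely for this example.
weights = (-0.1, 0.08)
bias = -2.0

cool_and_humid = predict_rain(20, 90, weights, bias)
hot_and_dry = predict_rain(35, 20, weights, bias)
print(cool_and_humid, hot_and_dry)
```

The model itself outputs a probability; the yes or no answer only appears once that probability is compared with a threshold.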

On that note, let us quickly come back to check out the next subdivision under

supervised learning, which is called classification, guys. As

the name suggests, you might already know what classification means in

literal terms. In classification, the output variable is

categorical in nature, and here it’s going to be a binary value. You can

have a look at the picture on your screen, and we can

categorically analyze whether that person is a male or a female, right? So here the

outcome is the gender of the person, whether the person is a

man or a woman. So the output variable is the gender of the person,

which is a categorical value, and we are trying to classify this person into a

specific gender based on all the other factors. Well, how do we

know it? We could see the beard on the face, it looks like a man, so our

brain told us it’s a man, right? Simple as that. So on that note,

we’ve checked out what supervised learning is. So what is

unsupervised learning? Well, guys, in unsupervised learning, for

all of the algorithms that we have, we have input data which has no labels.

When we say that the data does not have any labels, it means there is

nothing that the machine can map to in order to understand the data offhand.

If we take a look at the raw data ourselves, we can probably tell

that there’s a couple of fishes in there and there’s a couple of birds in there.

We know it because we have trained ourselves for that. When the machine sees

this, there’s not gonna be any label which is going to tell it that this is a

fish or this is a bird. So our unsupervised learning algorithm

is going to run through this data, and at the end of it, with

what we call the process of clustering, it’s going to

group all the fishes for us and group all the birds for us on its

own. So here the input data has no class labels, and the machine

doesn’t know what’s a fish and what’s a bird, right? So building an

unsupervised model on top of this input data is again very

interesting and very fun, guys. So here it’s going to be

giving out two clusters: the first consists of all the fishes and the second consists of

all the birds, guys. Coming to clustering, which is again a major part

of unsupervised learning, one of the most important clustering algorithms, and one of the

simplest, is k-means clustering, guys. K-means clustering is an

unsupervised machine learning algorithm where the aim is to go about

grouping all the similar data points, just like the fishes and birds, and putting

each group into one cluster. There must be high intra-cluster

similarity and low inter-cluster similarity out here, right? What do we

mean by that? Well, all the data points within a cluster should be as

similar as possible, and the data points in two different clusters

must be as different as possible. So all the data in one cluster is

similar, and the data in two different clusters is very

different from each other, right? So this is k-means clustering in

just a sentence, guys. Well, what does the K stand for in k-means clustering?

K is the number of clusters that you want the outcome to be in. In

this particular case we have cluster A, cluster B and cluster C, so the K value

here is three, because we have three different clusters, right?
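
To make the grouping idea concrete, here is a bare-bones sketch of k-means with K = 3 in plain Python. The points and the starting centroids are made up and fixed by hand so the run is reproducible; real implementations usually pick the starting centroids randomly or with a smarter scheme, so treat this as an illustration of the idea rather than the code used in the video.

```python
def squared_distance(p, q):
    """Squared straight-line distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Give every point the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        else:
            new_centroids.append(old)   # keep an empty cluster where it was
    return new_centroids


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(initial_centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign_clusters(points, centroids)
    return labels, centroids


# Made-up 2D points forming three visible groups (cluster A, B and C).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
labels, centroids = k_means(points, [points[0], points[3], points[6]])
print(labels)
print(centroids)
```

Points that end up with the same label are close to each other (high intra-cluster similarity) and far from the other centroids (low inter-cluster similarity).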

It’s as simple as that, guys. So on that note, the next type of learning that happens is

what we call reinforcement learning, guys. In reinforcement

learning, there is something called an agent, and this agent learns

and returns the most effective actions for us by mapping its state at every

single moment, guys. To give you better clarity, I hope you guys

have played Pac-Man in your olden days. In this particular

video game, the space around the figure is what we call a 2D

game space. You have something called pac-dots, you have

enemies, you have walls and so much more, right? So the action here is to

move around, make sure you don’t run into the

bad guys, and finish your entire goal. How do you know who

the good guys and the bad guys are, where you need to move, and how you’re not supposed to

get out every single time? Well, you’ve been

playing this game for a while, let’s say for a

couple of hours or a couple of days in your childhood, and then you realize how the

game actually works. That exactly is reinforcement learning, guys. To

give you another example, reinforcement learning is pretty much how a dog or a

cat is trained in real life as well. Let’s say

we’re training a dog to give a handshake.

If the dog gives a handshake, you might see that the trainer

feeds it a biscuit that instant, right? So the dog knows that giving

a handshake is the right thing to do, because there is a biscuit

at the end of it. So the reward is being hunted by the animal, right?
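
The dog and biscuit example can be sketched as a tiny reward-driven learning loop. Everything here is a simplifying assumption made for illustration: there is a single situation, two hypothetical actions, a fixed reward of 1 for a handshake and 0 otherwise, and the agent tries each action once before always choosing the one it currently values most. Practical reinforcement learning usually explores at random and tracks many states.

```python
def train_dog(episodes=20, learning_rate=0.5):
    """Learn how rewarding each action is from the rewards received."""
    rewards = {"handshake": 1.0, "ignore": 0.0}   # biscuit or no biscuit
    values = {"handshake": 0.0, "ignore": 0.0}    # the dog's current estimates
    actions = list(values)

    for episode in range(episodes):
        if episode < len(actions):
            action = actions[episode]              # try every action once
        else:
            action = max(values, key=values.get)   # then pick the best so far
        reward = rewards[action]
        # Nudge the estimate for this action towards the reward just seen.
        values[action] += learning_rate * (reward - values[action])
    return values


values = train_dog()
best_action = max(values, key=values.get)
print(values, best_action)
```

After a few rounds the estimate for the handshake climbs towards 1 while the other action stays at 0, so the rewarded behaviour is the one that keeps being chosen.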

To put it all in one single picture, this is what a reinforcement learning

environment would look like, guys. So we have an agent who performs an action

in an environment, and here we can have two tracks. If the

task is being performed right, there is a reward with

respect to it and everyone’s happy. Or else, if you do not get that particular

reward, it means that something went wrong, and that shows up in the state.

Let’s say the dog

did not give you a handshake: if you give it a biscuit at that

moment anyway, it will not realize whether it’s doing the right thing or the wrong thing, right?

So we can have a state of, let’s say, the dog did not give a handshake, and

that’s pretty much what St means, guys, and the reward is Rt. This keeps on going in

an iteration where you’re just training your model better and better

to hunt more rewards. The more the rewards, the more the machine is doing the

right thing. It’s as simple as that, guys. So on that note, I have two very simple

demos in Python that I quickly want to run by you guys to

show you the use of machine learning algorithms. Anyway, on that note, let

me quickly jump into Google Colab.

Google Colab is basically a Python Jupyter

notebook hosted on the Google cloud, and I use this for most of my Python coding

as well. So anyway, coming back to it,

here’s the first example that we’d like to discuss with you guys. Just give

me a second, the runtime is being connected. It’s almost connected now,

it’s initializing, and it’s gonna say connected any minute. And there

it is. So first, let us check out a k-means clustering demo. We’re

gonna import a couple of packages such as NumPy and pandas, we have Matplotlib

to give us the output in terms of graphs, and from scikit-learn (sklearn) we

import what is called KMeans,

and then go on working with it. So let me quickly import all of these

libraries that we’ll be making use of and go ahead with that. To

generate data of our own, instead of picking it up from a data set, for

this particular case we’ll be making our own data using something called

make_blobs, guys. We’ll have 300 samples here, spread over four

clusters. This is what n_samples means: n_samples is 300, so we have

300 dots on your screen right now, and these dots are divided into

four clusters for us. Next, let us use something called the elbow method,

which relies on what is called WCSS, the within-cluster sum of squares. I

would recommend you guys google it if you want to know more about what WCSS means. It is a fairly

involved part of the k-means algorithm, and I would suggest you

check it out on your own because it is not in the scope of this particular

tutorial. So we’ll be using that method, and we’re gonna train

the model for a range of cluster counts. Look at

this, right: the optimal number of clusters for us is

somewhere around 3 or 4. So we have 4 clusters, and we have the

WCSS going all the way from about 2500 down towards 0.

This is just a graph to tell us what the data might look like. Next we need to

find what we call the centroid, in our k-means clustering

algorithm, of each different cluster, and then we need to mark that center.

The red dots that you see are exactly that.

So we’ve found out that there

are four clusters that exist, and we’ve marked the centroids

of the four different clusters that you see, using k-means

clustering, guys. It’s as simple as that. So that was a very simple first demo.
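
Since WCSS is only named in passing during the elbow-method step, here is a small sketch of what it measures: the sum of squared distances from every point to the centroid of its own cluster. The four points and the cluster assignments for each K are set by hand for illustration; they are not the output of the k-means run in the demo.

```python
def wcss(points, labels):
    """Within-cluster sum of squares for one assignment of points to clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total


# Made-up data: two tight pairs of points that sit far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

# Hand-picked cluster assignments for K = 1, 2, 3 and 4.
labelings = {
    1: [0, 0, 0, 0],
    2: [0, 0, 1, 1],
    3: [0, 0, 1, 2],
    4: [0, 1, 2, 3],
}

elbow_curve = {k: wcss(points, labels) for k, labels in labelings.items()}
print(elbow_curve)
```

WCSS always shrinks as K grows, so the elbow method looks for the K after which the drop becomes small; in this toy data the big drop happens between K = 1 and K = 2.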

Right, for the second scenario we will be checking out logistic regression, and

in this particular case we’ll be working with a heart disease prediction

data set. We’ll be

using machine learning here to predict whether a person is gonna have heart

disease or not, and we’re gonna be doing this entirely using

logistic regression, guys. Again, we’re importing a couple of libraries here:

pandas to handle the data, NumPy to do mathematical operations, SciPy to

go on to do our computations, then

we have Matplotlib and seaborn to

give us visualizations, and we have sklearn, which is scikit-learn,

again a very important machine learning library for Python. We’re

gonna import all of these, guys. Just before that, we need the data

set file, which is called the Framingham data set. The data set

is from the town of Framingham in Massachusetts. So let me quickly

upload the file, which is called

the Framingham data set, and then we can

go on to work with it, guys. It’s gonna take a

second to get uploaded; it’s a small file, and as you can see, it’s

been uploaded. So now I can go on to run this code, and this is

what our data set looks like. There’s a binary value for male: if

male is equal to one, the person is male; if male is equal to zero, the

person is female. Then it has the age, it has whether the person

is a current smoker or not and how many cigarettes per day they have, BPMeds,

which is whether they are on blood pressure medication, whether they have had a stroke

in their life, whether they are diabetic, their total cholesterol, their

systolic blood pressure, their diastolic blood pressure, their

body mass index, their heart rate, their glucose, and then the

ten-year CHD as well, and so much more. So this is an amazing

data set to work with. Here we’re just replacing the

column name male with Sex_male; that’s about all we’re doing here.

Then we need to find out how many missing values we have in this

data set. Many of the columns show zero, right, but we

have about 388 missing values when it comes to glucose, 50 missing values when

it comes to cholesterol, and so much more. So let us go on to remove all

of these missing values. And look, it found about 500 rows

with missing values in total, and that’s fine in our particular case

because it’s only 12% of the entire data set. So we can exclude those rows, we can

drop them, and it wouldn’t hurt our analysis.
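
The missing-value step described above boils down to three things: count the gaps per column, see what share of rows is affected, and drop those rows. The demo works on the real file in a notebook; the sketch below uses plain Python and four invented rows so it runs on its own, with `None` standing in for a missing value. The column names mirror the ones read out in the video, but the numbers are made up.

```python
def missing_per_column(rows):
    """Count how many rows have no value for each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def drop_missing(rows):
    """Keep only the rows in which every column has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Invented example rows, not taken from the Framingham file.
rows = [
    {"age": 40, "glucose": 80, "totChol": 200},
    {"age": 50, "glucose": None, "totChol": 240},
    {"age": 60, "glucose": 90, "totChol": None},
    {"age": 70, "glucose": 100, "totChol": 220},
]

counts = missing_per_column(rows)
clean_rows = drop_missing(rows)
share_dropped = (len(rows) - len(clean_rows)) / len(rows)
print(counts, len(clean_rows), share_dropped)
```

Checking the share of rows lost before dropping them is the same judgement call made in the demo, where about 12% was considered acceptable.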

To begin with, we have to perform some exploratory analysis, where we need

to show how the data is distributed. I mean, we just dig

into our data to find out what the data is telling us, right? So here’s a couple

of quick charts which give us all of our numerical data as

graphs. We have the sex distribution, the age

distribution, current smokers, the BP medications distribution, cigarettes per

day, diabetics, total cholesterol, BMI, systolic blood

pressure, diastolic blood pressure and so much more, right? So we’re

just performing some quick exploratory analysis on it.

Then, this is the

TenYearCHD column that I’m printing out, where we need to find out

whether the person has a chance of getting heart disease or

not. Here we can check out the count: there are about 500, let’s say

600, people who are at risk of getting heart disease, while there are

about 3,500, or let’s say 4,000, people who are healthy. This is

what exploratory analysis helps us to do. It gives us a

sort of analytics number from which we can find out whether a person might

suffer from heart disease in the near future, and so much more, right?

So let us quickly go about plotting that and we can go on from

there. Well, as you guys could see, that

took about a minute of processing because it has to plot so many values

for us. I’m sorry, let me quickly scroll down so we can get a better view.

This is a seaborn axis grid plot, and you

can see the concentration of all the values at every particular instant, right?

This is for every single attribute that we are using to compare. So let us quickly

use describe to tell us what we’re looking at. And yeah, so we

have a count of about 3,751 for Sex_male, then it’s gonna

give you the age of so many people, it’s gonna give you the cigarettes per day, BPMeds,

prevalentStroke and so much more, right? So, coming to the process of

logistic regression out here: from all this data we need

to draw an inference at the end of it, right? To do that we’ll be

running a couple of functions, one of which is a lambda function, and

then we can have this nicely formatted output printed for us. As you can see,

it already says that TenYearCHD is our

dependent variable, that we’re using logistic regression, and so much more.

It’s going to give you all the standard errors, the values of what we

call the z statistic, and the P>|z| column, which is

the p-value for that z, for

every single variable that we’re checking.

Then, when it comes to backward elimination, we’ll be using

it as our feature selection approach, and at the end of it we can have a

very nice-looking summary printed for us. Well, the

summary looks nice, right, but we need to make more sense out of it.

So here we have

the p-values, the odds ratios and the 95% confidence interval (CI) values. From here

we can go on to analyze what actually drives the

outcome of, let’s say, heart disease. To

use our model to make sense out of this, let’s quickly split our single

data set into a training data set and a testing data set, and let our

model give us the answer, right? So checking the model accuracy using the

scikit-learn library, you can find out that our

model is almost 90 percent accurate: 88.14

percent is a big number, and it hasn’t been trained for many iterations,

right? The number of iterations is quite low. Here’s our plot,

what we call an axes subplot, and here you can check

out the predicted outcome values, predicted 1 and predicted 0,

against the actual outcome values, shown in different colors.

So the color distribution here will let you know what’s going on

there as well. Here is another step to print out the

true positive rate of the data, the true negative rate of the data

and so much more, all put into one single print statement so that

it looks nice. The accuracy of our entire model is about 88%. The

misclassification is 1 minus the accuracy, so we’ve

misclassified roughly 12 percent. For the true positive rate we are somewhere around 4

percent, for the true negative rate we have somewhere around 99 percent, the positive

prediction rate is 80 percent, the negative prediction rate is somewhere around 88

percent, and so much more, right? So look at the amount of information

that our machine learning algorithm is giving us.

If you apply it in terms of use cases in

medicine, this is going to help a lot of people, right? So that was a quick

walkthrough of how you can go about using k-means

clustering and logistic regression algorithms, guys.
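
The rates read out at the end of the second demo (accuracy, misclassification, true positive rate and so on) all come from the four cells of a confusion matrix. Here is a minimal sketch of those formulas; the counts passed in are invented round numbers, not the actual results from the Framingham model, and the function name is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard rates computed from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "true_positive_rate": tp / (tp + fn),         # share of sick people caught
        "true_negative_rate": tn / (tn + fp),         # share of healthy people cleared
        "positive_predictive_value": tp / (tp + fp),  # how often a predicted 1 is right
        "negative_predictive_value": tn / (tn + fn),  # how often a predicted 0 is right
    }


# Made-up counts for 1000 people, 100 of whom actually have the disease.
metrics = confusion_metrics(tp=8, fp=2, fn=92, tn=898)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, accuracy comes out around 90% even though only 8% of the actual positive cases are caught, which is why it helps to read the true positive rate alongside the accuracy when one class is much rarer than the other.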

I hope this video is helpful to you. If you have any further queries, do let us know in

the comment section below; we’ll reach out to you immediately. So guys, thank you

so much for watching this video and giving us your precious time.