We will now begin our journey

into the world of statistics, which is really a way

to understand or get our head around data. So statistics is all about data. And as we begin our journey

into the world of statistics, we will be doing

a lot of what we can call descriptive statistics. So if we have a bunch

of data, and if we want to tell someone something

about all of that data without giving them

all of the data, can we somehow describe it

with a smaller set of numbers? So that’s what we’re

going to focus on. And then once we

build our toolkit on the descriptive

statistics, then we can start to make

inferences about that data, start to make conclusions,

start to make judgments. And we’ll start to do a lot

of inferential statistics, make inferences. So with that out of

the way, let’s think about how we can describe data. So let’s say we have

a set of numbers. We can consider this to be data. Maybe we’re measuring

the heights of our plants in our garden. And let’s say we

have six plants. And the heights are 4 inches,

3 inches, 1 inch, 6 inches, and another one’s 1 inch,

and another one is 7 inches. And let’s say someone

in another room, not looking at your

plants, just said, well, you know, how

tall are your plants? And they only want

to hear one number. They want to somehow

have one number that represents all of these

different heights of plants. How would you do that? Well, you’d say, well,

how can I find such a number? Maybe I want

a typical number. Maybe I want some number that

somehow represents the middle. Maybe I want the

most frequent number. Maybe I want the number

that somehow represents the center of all

of these numbers. And if you said any

of those things, you would actually have

said the same things that the people who first came

up with descriptive statistics said. They said, well,

how can we do it? And we’ll start by thinking

of the idea of average. And in everyday

terminology, average has a very particular

meaning, as we’ll see. When many people

talk about average, they’re talking

about the arithmetic mean, which we’ll see shortly. But in statistics, average

means something more general. It really means

give me a typical number, or give me a middle number,

or a central one, and any of these will do. And really it’s

an attempt to find a measure of central tendency. So once again, you have

a bunch of numbers. You’re somehow trying

to represent these with one number we’ll call

the average, that’s somehow typical, or middle,

or the center somehow of these numbers. And as we’ll see, there are

many types of averages. The first is the one that you’re

probably most familiar with. It’s the one people mean when they talk about, hey,

the average on this exam

or the average height. And that’s the arithmetic mean. Just let me write it in. I’ll write in yellow,

arithmetic mean. When arithmetic is a noun,

we stress the second syllable: a-RITH-me-tic. When it’s an adjective like

this, we stress the third syllable: ar-ith-MET-ic, the ar-ith-MET-ic mean. And this is really just the

sum of all the numbers divided by the number of

numbers we have. This is a human-constructed

definition that we’ve found useful. So given that, what

is the arithmetic mean of this data set? Well, let’s just compute it. It’s going to be 4 plus

3 plus 1 plus 6 plus 1 plus 7 over the number

of data points we have. So we have six data points. So we’re going to divide by 6. And we get 4 plus 3 is 7,

plus 1 is 8, plus 6 is 14, plus 1 is 15, plus 7. 15 plus 7 is 22. Let me do that one more time. You have 7, 8, 14, 15,

22, all of that over 6. And we could write

this as a mixed number. 6 goes into 22 three times

with a remainder of 4. So it’s 3 and 4/6, which is

the same thing as 3 and 2/3. We could also write this as a

decimal, 3.6 repeating, that is, 3.666 and so on. We could write it any

one of those ways. But this is kind of a

representative number. This is trying to get

at a central tendency. Once again, these definitions are

human-constructed. It’s

not like someone just found some religious

document that said, this is the way that

the arithmetic mean must be defined. It’s not as pure

a computation as, say, finding the

circumference of a circle, which really

just fell out of

our study of the universe. The arithmetic mean is a human-constructed

definition that we found useful. Now there are other ways

to measure the average or find a typical

or middle value. The other very common

way is the median. I’m running out of colors, so I will write median in pink. So there is the median. And the median is literally

looking for the middle number. So if you were to order

all the numbers in your set and find the middle one,

then that is your median. So given that, what’s the

median of this set of numbers going to be? Let’s try to figure it out. Let’s try to order it. So we have 1. Then we have another 1. Then we have a 3. Then we have a 4, a 6, and a 7. So all I did is

I reordered this. And so what’s the middle number? Well, you look here. Since we have an even number of

numbers, we have six numbers, there’s not one middle number. You actually have two

middle numbers here. You have two middle

numbers right over here. You have the 3 and the 4. And in this case, when you

have two middle numbers, you actually go halfway

between these two numbers. You’re essentially taking the

arithmetic mean of these two numbers to find the median. So the median is going

to be halfway in-between 3 and 4, which is

going to be 3.5. So the median in

this case is 3.5. So if you have an even

number of numbers, the median is

essentially the arithmetic mean of the middle two, or

halfway between the middle two. If you have an odd

number of numbers, it’s a little bit

easier to compute. And just so that

we see that, let me give you another data set. Let’s say our data

set– and I’ll order it for us–

let’s say our data set was 0, 7, 50, I don’t know,

10,000, and 1 million. Let’s say that is our data set. Kind of a crazy data set. But in this situation,

what is our median? Well, here we have five numbers. We have an odd

number of numbers. So it’s easier to

pick out a middle. The middle is the number that is

greater than two of the numbers and is less than

two of the numbers. It’s exactly in the middle. So in this case,

our median is 50. Now, the third measure

of central tendency, and this is the

one that’s probably used least often in

life, is the mode. And people often

forget about it. It sounds like

something very complex. But what we’ll see

is it’s actually a very straightforward idea. And in some ways, it

is the most basic idea. So the mode is actually the most

common number in a data set, if there is a most

common number. If all of the numbers

are represented equally, if there’s no one single

most common number, then you have no mode. But given that

definition of the mode, what is the single most common

number in our original data set, in this data

set right over here? Well, we only have one 4. We only have one 3. But we have two 1’s. We have one 6 and one 7. So the number that shows up

the most number of times here is our 1. So the mode, the most typical

number, the most common number here is a 1. So, you see, these

are all different ways of trying to get at a typical,

or middle, or central tendency. But they do it in very,

very different ways. And as we study more

and more statistics, we’ll see that they’re

good for different things. The arithmetic mean is used very frequently. The median is really good if you

have some kind of crazy number out here that could

have otherwise skewed the arithmetic mean. The mode could also be useful

in situations like that, especially if you do

have one number that’s showing up a lot

more frequently. Anyway, I’ll leave you there. And in the next few videos,

we will explore statistics even deeper.
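
To make the three definitions concrete, here is a minimal Python sketch that computes the arithmetic mean, the median, and the mode of the six plant heights from the example. The function names and the use of exact fractions are illustrative choices, not something from the video, and the sketch assumes a non-empty list of whole numbers. The mode follows the rule stated above: if no single value is most common, there is no mode, which the sketch reports as None.

```python
from collections import Counter
from fractions import Fraction


def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    return Fraction(sum(values), len(values))


def median(values):
    """Middle value of the sorted data.

    With an even count there are two middle values, so return the
    arithmetic mean of those two.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return Fraction(ordered[mid - 1] + ordered[mid], 2)


def mode(values):
    """Single most common value, or None if no one value is most common."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie for first place: no single mode
    return counts[0][0]


plant_heights = [4, 3, 1, 6, 1, 7]  # inches, from the plant example

print(arithmetic_mean(plant_heights))  # 11/3, i.e. 3 and 2/3
print(float(median(plant_heights)))    # 3.5
print(mode(plant_heights))             # 1
```

Keeping the mean as a fraction avoids rounding 3.6 repeating; calling float() on the result gives the decimal form if that is more convenient.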


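The remark that the median holds up well when one crazy number would skew the arithmetic mean can be checked with the five-number data set from the median example. This is a small sketch using Python's standard statistics module; the second list, with the largest value swapped for 20,000, is a made-up comparison and not from the video.

```python
import statistics

# The five-number data set from the median example.
data = [0, 7, 50, 10_000, 1_000_000]

print(statistics.mean(data))    # 202011.4, pulled far upward by 1,000,000
print(statistics.median(data))  # 50, the middle of the five values

# Hypothetical comparison: replace the largest value with a smaller one.
smaller_top = [0, 7, 50, 10_000, 20_000]

print(statistics.mean(smaller_top))    # 6011.4, the mean moves a lot
print(statistics.median(smaller_top))  # 50, the median does not move
```

Changing only the largest value moves the mean by a huge amount while the median stays at 50, which is why the median is often preferred when a data set has extreme values.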