Chapter 2: Measures of Central Tendency

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the text.

You know, it is said that the average person laughs about 10 times a day, and they fall asleep in exactly 7 minutes.

Wow, 7 minutes!

I wish.

Right, but it gets weirder.

Apparently they shed 0.7 kilograms of skin every single year.

They grow 944 kilometers of hair over their lifetime.

And get this, when they sneeze, it travels at 160 kilometers per hour.

That is just a terrifying image, honestly.

I mean, a person rocketing out a 160-kilometer-per-hour sneeze while trailing, like, a thousand kilometers of hair behind them.

And shedding skin the whole way.

Yeah, no thank you.

I definitely wouldn't want to sit next to them on a bus.

But it brings up a really fascinating point about how we process the world.

Welcome to the Deep Dive.

Today we are acting as your personal guides through the core probability and statistics concepts covered in Chapter 2 of the Cambridge International AS and A-Level Mathematics Coursebook.

And our mission today isn't just to memorize formulas, it's to understand the philosophical choices behind how we actually summarize data.

Because that average person I just described, have you ever actually met them?

Probably not.

Exactly.

We use an average to summarize a massive, chaotic set of data.

We're trying to find a representative value that is typical of the whole.

It's basically a way of compressing reality so our brains can actually handle it.

But choosing how you find that typical value changes absolutely everything.

So today we are going to unpack the three main measures of central tendency.

That's the mode, the mean, and the median.

And we're going to explore them by looking at the specific mechanisms, the potential traps and the historical context behind them.

Because by the end of this, you'll see that choosing the right measure depends entirely on the story you are trying to tell with your data.

So let's start with the simplest concept to grasp, but one that gets surprisingly tricky when the data is hidden from plain view.

The mode.

Right.

At its core, the mode is simply the most commonly occurring value in your data set.

It is a pure popularity contest.

Well, for raw data, that's incredibly straightforward.

Like if I roll a standard die 25 times and the number 2 comes up 6 times, which happens to be more than any other number, my mode is 2.

End of story.

It's elegant in its simplicity.

But here's the challenge.

In the real world, and especially in advanced statistics, we rarely get to see the raw numbers.

Data is often grouped into ranges just to make it manageable.

Don't worry.

Imagine you're inspecting a massive batch of 270 pencils and you're trying to find the most common length.

You don't measure each one down to the millimeter and list 270 separate numbers.

Instead, you just toss them into buckets based on ranges.

Okay, let's unpack this.

If we group them, we might end up with something like 100 pencils in a bucket for lengths between 4 and 7 centimeters.

Sure.

There may be 90 pencils in a bucket for lengths between 8 and 10 centimeters.

And finally, 80 pencils in a bucket for lengths between 11 and 12 centimeters.

That is a perfect scenario.

Now, looking at those buckets, where does the mode lie?

Well, my instinct is to just look at the raw number of pencils.

The 4 to 7 centimeter bucket has the most, right, with 100 pencils.

So wouldn't that just be our mode or, more accurately, our modal class?

That instinct is exactly what catches people off guard.

Because we're dealing with grouped data with varying class widths, we cannot simply look for the highest frequency.

We have to find the class with the highest frequency density.

Frequency density.

Okay, wait.

So it's not just about how many items are in the group.

It's about how wide the group is.

Exactly.

So the 4 to 7 centimeter class has 100 pencils, sure.

But it covers a width of 4 whole centimeters.

The 11 to 12 centimeter class only has 80 pencils.

But it covers a width of just 2 centimeters.

You've hit the nail on the head.

We calculate frequency density by taking the number of items and dividing it by the physical width of the category.

Oh, I see.

It makes me think of population density in a city.

It's not just about which room in a building has the most people in it.

It's about how tightly packed they are.

That's a brilliant analogy.

Like, if you cram 80 people into a tiny little studio apartment, they are far more crowded than 100 people wandering around a giant banquet hall.

The studio apartment has the higher density.

Yes.

Let's run the numbers on your pencils with that in mind.

For that first massive bucket, 100 pencils divided by a width of 4 gives a frequency density of 25 pencils per centimeter.

Okay.

The second bucket, 90 pencils divided by a width of 3 gives a density of 30.

And for the final bucket, your tiny studio apartment, 80 pencils divided by a width of just 2 gives a frequency density of 40.

Wow.

So the 11 to 12 centimeter class has the highest frequency density at 40, meaning that is our true modal class.

Even though it held the fewest total pencils, it had the greatest concentration of pencils per centimeter.
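The pencil example can be sketched in a few lines of Python. This is only an illustration: the class widths (4, 3 and 2 centimeters) and frequencies are the ones quoted in the conversation, while the names `classes`, `frequency_density` and `modal_class` are made up for this sketch.

```python
# Pencil classes from the discussion: (label, class width in cm, frequency).
classes = [
    ("4-7 cm", 4, 100),
    ("8-10 cm", 3, 90),
    ("11-12 cm", 2, 80),
]

def frequency_density(width, frequency):
    """Number of items per unit of class width."""
    return frequency / width

densities = {label: frequency_density(w, f) for label, w, f in classes}

# The modal class is the one with the highest density, not the highest count.
modal_class = max(densities, key=densities.get)

print(densities)    # {'4-7 cm': 25.0, '8-10 cm': 30.0, '11-12 cm': 40.0}
print(modal_class)  # 11-12 cm
```

Picking the maximum of the raw frequencies instead would return the 4 to 7 centimeter class, which is the trap described above.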

Which tells us a much more accurate story about where the true peak of our data lies.

Now, the mode is great if you're just looking for that peak popularity, but notice its glaring limitation.

It completely ignores the rest of the data.

Exactly.

It ignores the actual weight or numeric value of everything else.

If we want a measure that physically balances every single data point on a scale, we have to move to the arithmetic mean.

The classic average most of us learned in school.

So you add everything up and divide by how many things you have.

That's the one.

But at this level of mathematics, we need to formalize the mechanism.

We use the uppercase Greek letter sigma, which looks like a jagged capital E.

Right.

The sum symbol.

In statistics, sigma is an active command.

It simply means sum up everything that follows.

So the mean, which we represent as an X with a horizontal bar over it, literally called X bar, is calculated as sigma X divided by N.

The sum of all your values divided by the total number of values.

Simple enough.

But what if we have grouped data where we have frequencies instead of a list of single items?

Okay.

Yeah.

How does that work?

The logic remains the same, but the notation adapts.

It becomes sigma X F divided by sigma F.

You multiply each value by how many times it occurs, its frequency, add all those products together, and then divide by the total number of items.
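As a rough sketch of the two formulas just described, sigma x over n and sigma xf over sigma f, here is a small Python example. The frequency table is hypothetical: only "a 2 appears 6 times in 25 rolls" comes from the earlier die example, and the remaining counts are invented so the total is 25.

```python
# Hypothetical frequency table for 25 die rolls: {value x: frequency f}.
# Only the six 2s come from the discussion; the other counts are invented.
table = {1: 3, 2: 6, 3: 4, 4: 4, 5: 4, 6: 4}

def mean_from_raw(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def mean_from_frequencies(freq_table):
    """Sum of (value * frequency) divided by the sum of the frequencies."""
    total = sum(x * f for x, f in freq_table.items())
    count = sum(freq_table.values())
    return total / count

# Expanding the table back into a raw list gives the same mean.
raw = [x for x, f in table.items() for _ in range(f)]

print(mean_from_frequencies(table))  # 3.48
print(mean_from_raw(raw))            # 3.48
```

Both functions compute the same quantity; the frequency form just avoids writing every repeated value out.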

Okay.

I have a theoretical question for you here.

Let's say I'm working with two separate data sets, group A and group B, and I've already done the hard work of finding the mean for both of them.

Okay.

If I want the mean of everything combined, can I just take the mean of group A, add it to the mean of group B, and divide by two?

I am incredibly glad you brought that up, because it is perhaps the most dangerous pitfall in basic statistics.

The mean of two combined data sets is almost never just the average of their two individual means.

Wait, really?

Yeah.

Why not?

That feels so intuitively right.

If I have two averages, the total average should be right in the middle of them.

It feels intuitive until you consider the size of the groups.

Imagine you have two bags of sweets.

You have a massive bag containing 72 sweets, and the total mass of that bag is 852.4 grams.

Okay.

Got it.

Then you have a tiny bag with only 24 sweets with a total mass of 282.8 grams.

If you want the mean mass of a single sweet across both bags, you cannot just average the two bag means.

Ah, I see it now.

Because the massive bag has three times as many sweets as the tiny bag.

Exactly.

If I just average the two means, I'm giving the 24 sweets in the tiny bag way too much voting power.

I'm treating them as if they have the exact same influence as the 72 sweets.

Precisely.

You are ignoring the mathematical gravity of the larger group.

To find the true combined mean, you have to tear it all down and rebuild from the ground up.

You need absolute totals.

So total mass divided by total sweets.

Yes.

You find the total mass of everything combined, 852.4 plus 282.8, giving 1135.2 grams.

Then you find the total number of individual sweets, 72 plus 24, giving 96.

Finally, you divide that total mass by the total number.

So 1135.2 divided by 96.

Which equals a true mean of 11.825 grams.

So the golden rule is never average an average.

Always go back to the total sum divided by the total count.
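The sweets example can be checked with a short Python sketch. The counts and masses are the ones quoted in the conversation; the variable names are illustrative only.

```python
# The two bags from the discussion.
bags = [
    {"sweets": 72, "total_mass_g": 852.4},
    {"sweets": 24, "total_mass_g": 282.8},
]

# Tempting but wrong: average the two bag means with equal weight.
bag_means = [b["total_mass_g"] / b["sweets"] for b in bags]
naive = sum(bag_means) / len(bag_means)

# Correct: total mass divided by total number of sweets.
total_mass = sum(b["total_mass_g"] for b in bags)
total_count = sum(b["sweets"] for b in bags)
combined = total_mass / total_count

print(f"{naive:.3f}")     # 11.811
print(f"{combined:.3f}")  # 11.825
```

The gap is small here because the two bag means happen to be close, but the naive figure is still wrong; the more the group sizes and means differ, the worse it gets.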

Always.

Now, calculating the mean gets a little more philosophical when we go back to grouped data.

Remember our buckets of pencils.

Or let's use another scenario.

Imagine you have crates of coconuts.

Okay, coconuts.

You have 46 crates where the coconuts weigh somewhere between 20 and 25 kilograms.

But you don't know the exact weight of a single coconut.

You just know they're in that bucket.

So how on earth do we find a mean if we don't actually have any specific numbers to add up?

We can't do the sigma calculation without an x.

We have to make an educated compromise.

We use an estimate.

We find the exact mid value of each class, the point halfway between its lower and upper boundaries.

And we let that single mid value represent every item in the group.

For a bucket ranging from 20 to 25 kilograms, the mid value is 22.5.

We assume statistically that the actual weights will balance out symmetrically around that middle point.

That makes perfect sense.

But what happens if the boundaries of those buckets are blurry?

Like if we are dealing with people's ages, and my buckets are age 18, age 19, and then ages 20 to 21, there are gaps there.

This is a crucial warning.

When finding mid values, your class boundaries must be completely seamless.

Age is a continuous variable, but we record it as a discrete integer.

Think about it.

An 18-year-old is someone who is anywhere from exactly 18 years old right up to the day just before their 19th birthday.

Right.

You're still 18 even if you were 18 and 364 days old.

Exactly.

So the true mathematical boundary for the 18 bucket isn't just the number 18.

It is the continuous range from 18 to 19.

Which means the true mid value is 18.5.

Yes.

If you mistakenly just use the integer 18 as your mid value, your estimated mean for the entire data set will be skewed downward.

You have to close the gaps before you find the middle.
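Here is a minimal sketch of the mid-value estimate for the age buckets, assuming some made-up frequencies. The class boundaries (18 to 19, 19 to 20, 20 to 22) follow the reasoning above; the frequencies 10, 6 and 4 are hypothetical, since the discussion does not give any.

```python
# (lower boundary, upper boundary, frequency) with the gaps closed.
# Frequencies are hypothetical, chosen only for illustration.
age_classes = [
    (18, 19, 10),  # recorded age 18
    (19, 20, 6),   # recorded age 19
    (20, 22, 4),   # recorded ages 20 to 21
]

def estimated_mean(classes):
    """Estimate the mean by letting each class mid value stand in for its members."""
    total = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    count = sum(f for _, _, f in classes)
    return total / count

# The mistake described above: using the recorded ages themselves as mid values.
naive_mids = [(18, 10), (19, 6), (20.5, 4)]
naive_mean = sum(m * f for m, f in naive_mids) / sum(f for _, f in naive_mids)

print(estimated_mean(age_classes))  # 19.3
print(naive_mean)                   # 18.8
```

With these invented frequencies the uncorrected mid values pull the estimate down by half a year, which is the downward bias the hosts warn about.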

I have to admit, doing this manually sounds like a nightmare.

If I'm dealing with huge data sets, adding up hundreds of mid values, multiplying them by frequencies, dealing with massive totals, I feel like I'd drop a decimal point and ruin the whole thing by the third row.

Which is why, long before computers could do this instantly, statisticians invented a brilliant mathematical shortcut.

It's a technique called coding.

Coding.

It sounds like computer programming, but we're talking about a mathematical magic trick to make unwieldy numbers small enough to handle, right?

It's a way of temporarily simplifying a data set by applying a mathematical operation to every single value in it.

Usually you subtract a constant from every value, or you divide every value by a constant.

It shrinks the numbers down to a manageable size.

Oh, I love this concept.

It's like, imagine I have a really tall ladder leaning against a brick wall, and I want to measure the average height of all the rungs from the ground.

Instead of measuring 15 feet, then 16 feet, then 17 feet, I could just shift the entire ladder down the wall by 10 feet.

The distance between the rungs stays exactly the same relative to each other.

I just calculate my new, much smaller average height, and then at the very end I just add the 10 feet back on to get the true height.

That is a phenomenal way to visualize it.

The shape of the data hasn't changed, just its location.

And what's fascinating here is that whatever mathematical transformation you apply to every single data point, the mean undergoes the exact same transformation.

So how does that actually play out in a real calculation?

Let's say you have a data set of 20 values.

You code them to make them smaller.

You subtract three from every single value, and then you multiply every value by two.

When you sum up those new coded values, you get 104.

Okay, let me walk through this.

If the sum of my coded values is 104, and I have 20 values in total, my coded mean is just 104 divided by 20, which is 5.2.

That's your coded mean.

But how do you find the real mean of the original data?

I just play the tape in reverse.

I have to undo the coding, and I have to do it in reverse mathematical order.

The original code was multiply by two, then subtract three.

Wait, sorry, you said subtract three, then multiply by two.

Yes, subtract and multiply.

Okay, so to undo it on my mean of 5.2, I first divide by two, which gives me 2.6, and then I add three, which gives me 5.6.

Wait, let me check my math.

Actually, the textbook example uses the formula 2x minus 3, meaning you multiply by two first, then subtract three.

Oh, got it.

Okay, so undoing that, I start with 5.2, I add three, which gives me 8.2, and then I divide by two, which gives me 4.1.

The real mean is 4.1.
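Since the order of operations caused some back-and-forth above, a short Python sketch may help pin it down. It assumes the coding is y = 2x - 3, the form the hosts attribute to the textbook; the function names and the small `sample` list are hypothetical.

```python
def encode(x):
    """Code a value as y = 2x - 3 (multiply by two, then subtract three)."""
    return 2 * x - 3

def decode(y):
    """Undo the coding in reverse order: add three, then divide by two."""
    return (y + 3) / 2

coded_sum = 104  # sum of the 20 coded values, from the discussion
n = 20

coded_mean = coded_sum / n
original_mean = decode(coded_mean)
print(round(coded_mean, 2), round(original_mean, 2))  # 5.2 4.1

# Sanity check on invented data: decoding the coded mean recovers the raw mean.
sample = [1.5, 4.0, 6.5, 3.0]  # hypothetical values, not from the source
raw_mean = sum(sample) / len(sample)
coded_sample_mean = sum(encode(x) for x in sample) / len(sample)
print(decode(coded_sample_mean) == raw_mean)  # True
```

If the coding had instead been "subtract three, then multiply by two", the decode step would be divide by two, then add three, giving 5.6. The answer depends entirely on which coding was actually applied.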

You've got it.

And this isn't just for abstract numbers, it applies to real world scenarios like adjusting salaries.

Imagine a school with four teachers, and their mean salary is $4,000 a month.

The school board decides to give everyone a 10% raise, but then deducts a flat $50 administrative tax from everyone.

And because of coding, I don't need to recalculate four individual salaries, I just apply the raise and the tax directly to the mean.

Exactly.

So a 10% raise on the $4,000 mean makes it $4,400.

Subtract the flat $50 tax, and the mean salary across the board is simply $4,350.
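To see that the shortcut agrees with the long way round, here is a small sketch. The four individual salaries are hypothetical, chosen only so that their mean is $4,000 as in the example; the function name `adjust` is also made up.

```python
def adjust(salary):
    """Apply a 10% raise, then deduct a flat $50."""
    return salary * 1.10 - 50

# Hypothetical salaries for the four teachers; their mean is 4000.
salaries = [3200, 3800, 4200, 4800]
old_mean = sum(salaries) / len(salaries)

# Shortcut: transform the mean directly.
new_mean_direct = adjust(old_mean)

# Long way: transform every salary, then take the mean.
new_mean_long = sum(adjust(s) for s in salaries) / len(salaries)

print(round(new_mean_direct, 2), round(new_mean_long, 2))  # 4350.0 4350.0
```

The shortcut works because the adjustment is linear (a multiply and an add). It would not hold for a non-linear change, such as a raise that applies only above some threshold.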

That is so satisfying.

It saves an immense amount of time.

But let's shift our focus.

We've found the most frequent item with the mode, we've mathematically balanced the scales with the mean, but the mean has a weakness.

It is highly sensitive to extreme outliers.

Sometimes what we really want is just the physical middle of the pack.

Which brings us to our third measure, the median.

Yes.

The median is the value that sits dead center in an ordered set of data.

The formula to find its position is simple.

If you have n items, the median is located at the n plus one divided by two position.

Here's where it gets really interesting for me, because visualizing the median can be beautiful.

There's a tool called the back-to-back stem-and-leaf diagram.

It looks almost like abstract art.

It is a fantastic tool because it orders the data visually without losing the raw numbers.

Imagine we are tracking the hourly customer counts at a DIY store.

We track a Monday, which is open for 12 hours, and a Saturday, which is open for 15 hours.

The diagram has a central stem of numbers reading vertically down the middle, two, three, four.

These represent the tens column, 20s, 30s, 40s.

Then leaves branch out to the left for Monday's hours and to the right for Saturday's hours, representing the single units.

So if my central stem is a two and a leaf branching out to the right is a five that represents 25 customers during one hour on Saturday.

Now let's find the median for Monday.

We have 12 hours of data.

Using our formula, the median position is 12 plus one divided by two, which is 6.5.

The 6.5th position, meaning the true median is floating exactly halfway between the sixth number and the seventh number.

And because our stem and leaf diagram is already ordered from smallest to largest, we simply count the leaves.

The sixth leaf corresponds to 28 customers.

The seventh leaf corresponds to 30 customers.

The median is halfway between them.

28 plus 30 divided by two gives us a median of 29 customers.
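The position rule can be sketched as follows. The Monday list here is hypothetical: only the sixth and seventh ordered values (28 and 30) come from the discussion, and the other ten counts are invented to fill out the 12 hours.

```python
import math

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule for ungrouped data."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2          # 1-based, e.g. 6.5 for n = 12
    lower = ordered[math.floor(position) - 1]  # value at or just below the position
    upper = ordered[math.ceil(position) - 1]   # value at or just above the position
    return (lower + upper) / 2

# Hypothetical hourly customer counts for Monday (12 hours).
# Only the 6th and 7th ordered values, 28 and 30, are from the source.
monday = [12, 15, 21, 24, 26, 28, 30, 33, 35, 38, 41, 47]

print(median_by_position(monday))  # 29.0
```

For an odd number of values the position is a whole number, so `lower` and `upper` are the same item and the function simply returns it.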

That works beautifully for discrete, whole numbers.

But what if our data is completely continuous, like measuring the exact mass of hundreds of objects?

We can't put endless decimals onto a stem and leaf diagram.

For continuous grouped data, we use a cumulative frequency graph.

Instead of the n plus one divided by two formula, we simply locate the median at the n divided by two position.

You plot a smooth rising curve that accumulates all your frequencies.

I always visualize this like filling a swimming pool with data.

Your y-axis is the volume of the pool.

Let's say it holds 300 items total.

Your smooth cumulative curve is the shape of the pool filling up.

If you want the median, you just find the exact water line when the pool is precisely half full at 150 items.

You draw a horizontal line across the curve and drop straight down to the x-axis to read the median mass.
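Reading the median off a cumulative frequency graph can be approximated in code. This sketch swaps the smooth hand-drawn curve for straight-line segments between plotted points, which is an assumption of the sketch rather than the method described above. The cumulative points are hypothetical, apart from the total of 300 items.

```python
# Hypothetical cumulative frequency points: (upper class boundary, cumulative count).
# Only the total of 300 items comes from the discussion.
points = [(0, 0), (10, 30), (20, 110), (30, 210), (40, 270), (50, 300)]

def estimate_median(cumulative_points):
    """Find where the cumulative count reaches n / 2, using linear interpolation."""
    total = cumulative_points[-1][1]
    target = total / 2
    for (x0, c0), (x1, c1) in zip(cumulative_points, cumulative_points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Move across the segment in proportion to how far the target is up it.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative counts never reach half of the total")

print(estimate_median(points))  # 24.0
```

Here the half-full line at 150 items falls between the points (20, 110) and (30, 210), so the estimate lands 40% of the way across that class.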

That is an excellent visualization.

Now, before we move on to how to choose between these three, I want to take a brief historical detour.

Because relying on a single representative value like a mean or a median is actually a surprisingly modern invention.

Wait, really?

It feels so foundational to the universe, I just assumed ancient mathematicians were using averages thousands of years ago.

You would think so, but before the 17th century, the arithmetic mean basically didn't exist in scientific practice.

In the 11th century, a brilliant Persian scholar named Al-Biruni was trying to calculate the longitude of different cities.

He had multiple different measurements that didn't agree.

To resolve this, he used what we call the mid-range.

He simply took his smallest measurement, his largest measurement, added them together and divided by two.

Wait, he just averaged the two extremes and ignored all the data in the middle?

That seems wildly fragile.

What if one measurement was just a terrible mistake?

It was fragile, but it was intuitive, and even Isaac Newton used the mid-range centuries later.

It wasn't until scientists started trying to measure magnetic declination, the angle between magnetic north and true north, that things changed.

What happened?

Compasses varied wildly, navigators were getting different readings, and a bad reading meant sinking your ship.

The errors were so prevalent that scientists realized they couldn't just trust the extremes.

They had to assume that the truth lay buried beneath a cluster of flawed measurements.

That makes sense.

That is what forced the invention of the arithmetic mean, summing up every flawed measurement and dividing by the total to mathematically hunt for the true center.

That is wild.

They literally had to invent the mean to stop ships from crashing.

Okay, so now we have our three tools.

The mode, the mean, and the median.

The ultimate question is, choosing your weapon.

How do you know which one to use?

It comes down to what you want the data to say.

And be warned, data can absolutely be used to tell a very specific, manipulative story.

Yeah, there's this classic scenario from the text of a manipulative student.

Imagine a student takes ten tests over a semester, scored out of twenty points.

Their scores are a disaster at first.

Three, four, six, seven, eight, eleven, twelve, thirteen.

But then they miraculously pull off two seventeens at the end.

Let's calculate their three averages.

The most frequent score is seventeen, because it's the only score that appears twice.

The mean adding them all up and dividing by ten is nine point eight.

The median, the middle value between the fifth and sixth scores is nine point five.
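The three figures for the student's scores can be reproduced with Python's standard `statistics` module. The scores are the ten listed in the conversation; nothing else is assumed.

```python
import statistics

# The ten test scores from the discussion, each out of twenty.
scores = [3, 4, 6, 7, 8, 11, 12, 13, 17, 17]

print(statistics.mode(scores))    # 17
print(statistics.mean(scores))    # 9.8
print(statistics.median(scores))  # 9.5
```

All three are legitimate "averages" of the same data, which is exactly why the choice of measure matters.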

I know exactly what this student is going to do.

They are going to run home to their parents and say, guess what?

My average test score this semester is a seventeen, and mathematically they aren't lying.

The mode is a measure of central tendency, but it completely hides the fact that they failed half their tests.

Which perfectly illustrates the pros and cons of each measure.

The mean is excellent for further statistical calculations because it democratically includes every single value.

But as we see, it is highly sensitive to extreme outliers.

If you have a room of nine people earning twenty thousand dollars a year and one billionaire walks in, the mean income of that room suddenly makes everyone look like a multi-millionaire.
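A quick sketch of the room example, assuming the billionaire's income is exactly one billion dollars a year (the conversation does not give a figure, so that number is an assumption):

```python
import statistics

# Nine people on $20,000 a year, plus one billionaire.
# The figure of exactly 1,000,000,000 is an assumption for illustration.
incomes = [20_000] * 9 + [1_000_000_000]

mean_income = sum(incomes) / len(incomes)
median_income = statistics.median(incomes)

print(mean_income)    # 100018000.0
print(median_income)  # 20000.0
```

One extreme value drags the mean to roughly a hundred million dollars, while the median does not move at all.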

Which is why we always hear about the median house price, right?

The median is a buffer.

It completely ignores those massive fifty million dollar mansions at the very top that distort the reality of what a normal house costs.

Precisely.

The median resists extreme outliers, which is its greatest strength.

But that also means it intentionally ignores the magnitude of most of the data, which is its weakness.

And the mode.

The mode is largely useless for deep mathematical analysis, but it's vital for commerce.

A shoemaker doesn't care about the mathematical mean shoe size.

Nobody wears a size eight point three shoe.

They just need the mode to know which single size is the most popular so they can manufacture more of it.

This all boils down to the physical shape of the data itself.

Skewness.

Yes.

If we connect this to the bigger picture, the shape of your data dictates how these three measures interact.

When data is perfectly symmetrical, like a beautiful, flawless bell curve, the mode, the mean, and the median are all exactly the same number, sitting perfectly balanced right in the center.

But real life is messy.

It's almost never a perfect bell curve.

It's usually skewed in one direction or another.

Let's examine a scenario with positive skew.

Picture a bar chart representing wealth.

The vast majority of the data is clustered tightly on the left side.

The smaller numbers, the normal incomes.

But there is a long, thin tail of data stretching far out to the right toward the massive numbers representing a handful of billionaires.

Okay, I have that image in my head.

A big mountain on the left fading out into a long tail on the right.

In a positively skewed distribution like this, the relationships between our three measures are locked in place.

The mode is the absolute peak of that mountain, so it is the smallest number.

The mean, because it mathematically has to account for the massive weight of those billionaires, gets pulled the furthest to the right, making it the highest number.

And the median sits quietly between them as a buffer.

So the rule is mode is less than median, which is less than mean.
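The ordering for positive skew can be seen on a small invented data set. The values below are hypothetical, picked only to have a cluster of small numbers and a long right tail.

```python
import statistics

# Hypothetical positively skewed data: most values are small, a few are large.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mode = statistics.mode(data)
median = statistics.median(data)
mean = sum(data) / len(data)

print(mode, median, mean)    # 2 3.0 4.5
print(mode < median < mean)  # True
```

This is an illustration of the typical pattern rather than a proof; small or unusual data sets can break the ordering even when they look skewed.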

Wow.

We have covered a massive amount of ground today.

So what does this all mean?

Our mission today was to extract the core insights from Chapter 2.

We've defined the mode, the mean and the median.

We've learned how to find them in tricky grouped frequency tables using frequency densities and midvalues.

We've used cumulative frequency graphs to literally find the waterline of our data.

We've decoded unwieldy data sets by shifting them down and multiplying them back up.

And we've learned how the shape of skewed data pulls these averages apart.

Most importantly, I hope you see that calculating an average is not just a blind mechanical process.

It is a choice.

It is the ultimate choice.

Understanding these three measures is your absolute best defense against bad data.

Whenever you see an average reported in the news or on a corporate spreadsheet, you now have the tools to question exactly how it was calculated.

Are they pulling a fast one, like our student who claimed their average was a 17?

I want to leave you with a final thought to mull over.

The next time a politician or a salesperson throws an impressively large or impressively small average number at you, don't just accept it.

Ask questions.

Ask yourself, are they giving you the mean, the median or the mode?

And more importantly, what are the other two numbers trying to tell you that they desperately want to hide?

That is a phenomenal question to end on.

I want to give a massive thank you to you, the listener, for trusting us to walk you through this material.

On behalf of the Last Minute Lecture Team, we wish you the absolute best of luck with your statistical studies.

May your calculations be insightful, may your data never be deceptively skewed, and may your hair never actually grow 944 kilometers long.

Good luck out there.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Measures of central tendency serve as fundamental tools for summarizing and describing the essential characteristics of any dataset through single representative values. The mode identifies the most frequently occurring observation and proves particularly useful when analyzing categorical variables or determining which items appear most often in practice; for grouped data, the modal class represents the class interval containing the highest frequency density rather than a single value. The arithmetic mean, obtained by summing all observations and dividing by the total number of values, functions as the most commonly employed average because it incorporates all available information and enables subsequent statistical calculations, despite its vulnerability to distortion from extreme values at either end of the distribution. When working with grouped data where raw individual values are unavailable, mean estimation relies on substituting class midpoints weighted by their corresponding frequencies to approximate the true average. The median locates the middle value when observations are arranged in order, splitting the dataset into two equally-sized portions, and demonstrates considerable practical advantage in situations where datasets contain extreme outliers that would otherwise skew results. Data transformation through coding—involving operations such as subtracting a constant or applying a multiplicative adjustment—can reduce computational complexity, with the original mean readily recovered by systematically reversing the mathematical operations applied. The choice among these three measures depends critically on several interconnected factors including whether variables are categorical or numerical, the shape of the underlying distribution, and the specific analytical objectives guiding the research. 
The mode focuses on frequency but ignores most data points; the mean enables advanced statistical procedures but responds strongly to outliers; and the median stays stable in the presence of extreme values and is easy to interpret, though it ignores the magnitude of most observations. Distribution skewness, which measures asymmetry in how values spread around the center, reveals important patterns through the relative positioning of the three measures: positively skewed distributions show the mean exceeding the median exceeding the mode, whereas negatively skewed distributions reverse this relationship with the mean falling below both the median and mode, illustrating how data symmetry fundamentally influences which average best represents the dataset.
