Chapter 8: Sorting in Linear Time: Counting, Radix, and Bucket Sort

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Alright welcome back to the Deep Dive.

Today we're taking a look at something that I think is really, really fascinating and that is linear time sorting.

You know you've got your classic sorting algorithms that you learn in your intro CS courses, you know like merge sort, quick sort, heap sort.

Those are all like, you know, they're your workhorses, they're always around that o n log n time.

Yeah, and that's actually a fundamental limit for algorithms that are only allowed to compare elements.

If you're just asking is this element bigger or smaller than that element, there's a hard limit you can't get any faster than o n log n in the worst case.

But if you know more about the data you're working with, there are some pretty neat tricks you can use to get around that limit.

So like, what if we know something special about the data?

What if we know it's integers?

Or what if we know it's all within a certain range?

That's where we get into the realm of linear time sorting algorithms.

These algorithms can actually achieve an o n time complexity under the right conditions.

Wow, okay.

So like for our deep dive today, we're going to explore three of these algorithms counting sort, radix sort and bucket sort.

And we'll see how they work their magic and why they can sometimes be so much faster than the usual suspects.

Cool.

So let's start with counting sort.

What's the like the core idea behind counting sort?

Yeah, counting sort is super elegant, but it does have a key assumption.

The input has to be integers within a known range.

So let's say from zero up to some value k.

Okay, so like if we're sorting test scores, we know all the scores have to fall between zero and 100.

Exactly.

So let's say your input array is called A.

What you do is you create a second array.

Think of it as a scorecard.

We'll call it C.

And the size of this array, C, is k plus one.

So if our highest possible score is 100, our scorecard would have 101 slots from zero to 100.

And we initialize every slot in the scorecard to zero.

Right.

So we've got a blank scorecard with a slot for every possible score.

What's next?

Okay.

So then you go through your input array, A, which is your list of scores.

And for every score, let's say the score is 85.

You go to the 85th slot in your scorecard, C, and you increment that count.

So if that slot was zero before, it now becomes one.

If you see another 85, you increment it again.

So you just go through all the scores, and the scorecard ends up telling you how many times each score appears in the list.

Exactly.

And then comes a cool part.

You go back to your scorecard, C, but this time you start at the second slot, index one.

And for each slot, you add the value from the previous slot to its current value.

What you're doing is creating a running sum.

So by the end of this, the scorecard is no longer just telling us how many times each score appears, but something more like how many scores are less than or equal to each score.

Precisely.

So if you look at the 85th slot and it has the value 30, it means there are 30 scores in total that are 85 or lower.

I see.

So the scorecard is kind of telling us where the last instance of each score should end up in the final sorted list.

Exactly.

And that's the key to quickly building the sorted list.

Now you create a new array.

Let's call it B, which is where you'll build the sorted output.

It's the same size as your original array A.

Okay.

So a blank array to hold the sorted scores.

Right.

And now you go back to your original array A, your list of scores, but, and this is crucial, so you process the scores in reverse order.

In reverse order.

Interesting.

Yeah.

So if your last score in A was, let's say 78, you look at the 78th slot on your updated scorecard C, which tells us how many scores are less than or equal to 78.

Let's say it says 25.

So there are 25 scores less than or equal to 78, meaning our current score, 78, should go in the 25th position index 24 in our output array B.

And after we place it there, we decrement the count in the 78th slot of our scorecard.

Okay.

So that's why we work backwards in decrement to handle duplicates correctly.

Exactly.

Imagine you had two 78s in the original list.

By going in reverse and decrementing, you ensure that the one that appeared later in the original list gets placed later in the sorted output as well.

It's like a queue, first one in, first one out.

That's really clever.

And it sounds like it also keeps the sorting algorithm stable, meaning if you have two items that are equal, they stay in the same relative order as in the original list.

Yes.

Stability is important for a lot of reasons, especially when we talk about erratic sort later, which uses counting sort as a subroutine.

All right.

So big picture.

What's the runtime for counting sort?

We go through the input list once to create the counts, then we go through the scorecard to update it with running sums, and finally we go through the input list again to build the sorted output.

So it's N plus K time.

So if K, our range of values, is relatively small, it's basically linear time, O -N.

Exactly.

And that's the magic.

If K is not significantly larger than N, then counting sort is incredibly efficient.

It's like it's cheating the system, not comparing anything, just counting and placing.

Right.

It's a fundamentally different approach than comparison -based sorting.

Now, the original source material also mentions a couple of interesting challenges, like how to modify counting sort to work directly on the input array without using extra space, what we call in -place sorting.

Yeah, and also what happens if you have, like, numbers with fractions, not just integers.

Right.

Those are great exercises to think about if you want to go deeper.

But for now, let's move on to erratic sort.

This one's also really neat.

OK.

So erratic sort, I always picture those old mechanical card sorting machines.

Yeah, it's the same principle.

Those machines sorted cards based on individual columns representing digits.

So it's like we're sorting numbers digit by digit.

But the order in which erratic sort does this always seemed a bit backwards to me.

You're right.

It actually starts with the least significant digit, the one's place, and then works its way up to more significant digits, like tens, hundreds, and so on.

Why does it do that?

Wouldn't starting with the most significant digit get us closer to the final sorted order faster?

Well, think about it.

If you sort by the most significant digit first, you'd get groups of numbers with the same leading digit.

Then you'd have to sort those groups based on the next digit and so on.

You'd end up with this nested sorting.

Erratic sort avoids that by starting with the least significant digit.

It's kind of counterintuitive, but it works because for each digit position, you use a stable sorting algorithm.

Like counting sort.

So let me see if I get this.

We sort based on the unit's digit first, making sure to keep the order of numbers with the same unit's digit the same.

Then we sort that list by the tens digit, keeping the order of those with the same tens digit the same, and so on.

Precisely.

And by the time you get to the most significant digit, the whole list is sorted.

Think about it like sorting dates.

Oh yeah.

Like year, month, and day.

If you first sorted all the dates by day, the least significant part, then you sort by month, making sure the days stay in the same order within each month, and finally you sort by year, keeping the months and days in order, you'd end up with a perfectly sorted list of dates.

That makes so much sense.

So looking at the source, the algorithm just loops through each digit position, starting with the least significant and moving to the most significant, and in each loop, it uses a stable sorting algorithm to sort the whole array based on that digit.

Exactly.

And if we have N numbers, and each number has at most D digits, and each digit can take on up to K possible values, like zero to nine, and if our stable sort takes N plus K time for each digit, like counting sort, then the total time complexity is D times N plus K.

And if the number of digits D is relatively small and K isn't too large compared to N, then radix sort also achieves that linear time complexity, O N.

Right.

And we can also look at this in terms of bits.

If we have N numbers that are each B bits long, we can process them in groups of R bits at a time.

So we'll have about B divided by R passes.

In each pass, we're sorting based on an R bit digit, which can have up to two to the power of R possible values.

If we use counting sort for each pass, the total time complexity becomes proportional to B divided by R times N plus two to the power of R.

So there's a trade -off here.

If we increase R, we have fewer passes, but the range of values in each pass gets bigger.

Yeah.

The optimal choice of R depends on the specific values of N and B.

Like, if B is not too large compared to log base two of N, then we might even choose R equal to B, basically treating each number as one big digit.

But if B is much larger, then choosing R to be about the logarithm of N is usually a good idea.

OK.

That's a lot of variables to consider.

So pragmatically speaking, when is radix sort the right tool to use compared to something like quicksort?

Yeah.

That's a good question.

I mean, radix sort can be faster in some situations, for sure.

Right.

But we have to remember that big O notation doesn't tell the whole story.

Radix sort might do fewer comparisons overall, but each pass can have overhead.

We also need to think about how well each algorithm uses memory caches, which can make a big difference in real -world performance.

Plus, some applications need the sorting algorithm to work directly on the input array in place without using extra memory.

The version of radix sort that uses counting sort isn't naturally in place, whereas some

comparison sorts can be.

So it's really a case -by -case thing.

So it's not a clear win for radix sort every time.

It depends on the data and the hardware.

All right.

Let's move on to our final linear time sorting algorithm,

bucket sort.

All right.

So bucket sort is interesting.

This one relies on a slightly different assumption.

It works best when you can assume your input numbers are uniformly distributed over a specific range.

So usually the interval from zero up to one, but not including one.

Uniformly distributed.

Yeah.

It means the numbers are kind of spread out evenly across that range, not clustered together.

It's like if you were to pick random numbers between zero and one, each number is equally likely to be chosen.

OK.

I get it.

So what does bucket sort actually do with these numbers?

So you take this interval from zero to one and divide it up into n equal size buckets where n is the number of numbers you're trying to sort.

Then you go through each number in your input and put it into the bucket that corresponds to its value.

So a number like 0 .15 would go in one of the early buckets and a number like 0 .88 would be in one of the later buckets.

Exactly.

And because we're assuming a uniform distribution,

ideally each bucket should end up with only a small number of elements.

Once all the numbers are in their buckets, you sort the numbers within each bucket.

Since the buckets are small, we could just use a simple sort like insertion sort, right?

Right.

Insertion sort is usually a good choice here because it's pretty efficient for small lists.

And then once all the buckets are sorted, you just concatenate them, take all the numbers from the first bucket, then all the numbers from the second bucket, and so on.

And you have your sorted list.

OK.

That's pretty clever.

And it sounds like it would be fast if the numbers are indeed spread out evenly.

Yeah.

And the source material lays out the steps pretty clearly.

First, create an array of n empty linked lists.

Those are your buckets.

OK.

Then, for each number in your input array, calculate which bucket it belongs to.

You usually do this by multiplying the number by n and taking the floor of the result, and then insert that number into the corresponding bucket.

OK.

So far, so good.

After all the numbers are in buckets, you sort each bucket using insertion sort, and then you concatenate the sorted lists from all the buckets to get the final sorted array.

So why does this work?

Like, how do we know the numbers will end up in the right order overall?

Well, if you have two numbers, ai and a, and ai is less than or equal to aj, then ai will always end up in a bucket that's either the same as or earlier than the bucket for aj.

OK.

And if they end up in the same bucket, the insertion sort will put them in the right order within that bucket.

And if they're in different buckets, then concatenating the buckets in order will keep them in the right order overall.

That makes sense.

So what about the runtime?

The average case running time for bucket sort, assuming a uniform distribution of the input is onn, which is linear.

Yeah.

Now, even though insertion sort has a worst case quadratic time complexity,

we expect the buckets to be small on average.

So the total time spent sorting within all the buckets averages out to be proportional to n.

The actual analysis uses some probability and expected values, but it shows that the expected number of elements in each bucket is constant.

That's pretty cool.

And it sounds like even if the data isn't perfectly uniformly distributed,

buckets sort can still be efficient as long as the distribution isn't too skewed so the buckets don't get too big.

However, in the worst case, if all the input numbers happen to fall into the same bucket and we're using insertion sort, then bucket sort could take onn sub time time, which is not great.

Ouch.

But the source mentions that we could potentially use a different sorting algorithm within the buckets, something like merge sort or quick sort, which have an onn log n worst case time to avoid that worst case scenario, right?

Yeah, that's a good strategy.

That way, even in the worst case, bucket sort would be onn log n, and it would still be linear on average for uniformly distributed data.

So it's like a way to hedge your bets a little bit.

Right, exactly.

It makes bucket sort more robust and practical.

OK, so to recap, we've gone over three super interesting linear time sorting algorithms,

counting sort, which works well when we know the range of the input values,

radix sort, which sorts digit by digit using a stable sorting algorithm, and bucket sort, which is great for uniformly distributed data.

Yeah, and the key takeaway here is that these algorithms work by making certain assumptions about the data and by using techniques that are different from the usual comparison -based sorting.

This lets them bypass that onn log n lower bound that we hit when we're just comparing elements.

And as we've seen, choosing the right sorting algorithm really depends on the characteristics of your data, things like the range, the distribution, the data type, integers, flows, strings, and even the hardware you're running it on.

So, for our listeners out there, I challenge you to think about the data you encounter in your work or your projects.

Are there cases where counting sort, radix sort, or bucket sort might be a good fit?

Could you potentially leverage these linear time algorithms to get a performance boost compared to always using the general purpose comparison sorts?

Think about those real -world scenarios where these specialized algorithms can really shine.

Excellent food for thought.

Thanks for joining us on this deep dive into linear time sorting.

Until next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Non-comparison sorting algorithms bypass the fundamental Ω(n log n) lower bound that constrains comparison-based methods by exploiting structural properties of input data. The theoretical foundation for this distinction emerges from decision tree analysis: any algorithm relying exclusively on comparisons can be represented as a binary tree containing at least n! leaves corresponding to possible input permutations, and the minimum height of such a tree is Ω(n log n), proving that merge sort and heapsort are asymptotically optimal within their model but cannot exceed this performance ceiling. Counting sort operates under the assumption that input consists of integers bounded by a range from zero to k, building an auxiliary array that records element frequency, computing cumulative counts to determine final positions, and directly placing each element in its sorted location without any comparisons—yielding Θ(n + k) time complexity with the valuable property of stability, which maintains the relative sequence of equivalent values and makes the algorithm particularly efficient when k scales linearly with n. Radix sort generalizes this principle to handle multi-digit numbers by repeatedly applying a stable sorting routine across individual digit positions, proceeding from least significant to most significant digit, completing in Θ(d(n + k)) time where d denotes the number of digits, and achieving truly linear performance when both d and k remain constant relative to n; the flexibility to choose radix size r provides a practical tuning mechanism for optimizing wall-clock performance. Bucket sort adopts a fundamentally different strategy grounded in probabilistic analysis: it assumes input values follow a uniform distribution across the interval [0, 1), partitions the dataset into n equally sized buckets, uses insertion sort within each bucket, and concatenates results in order—delivering O(n) expected time because uniformly distributed inputs naturally create small average bucket populations, though adversarial inputs concentrating all elements in a single bucket degrade performance to Θ(n²) in the worst case. Collectively, these three algorithms demonstrate that circumventing comparison-based limitations requires making explicit assumptions about input structure, whether through bounded integer ranges, digit representations, or distributional properties, enabling asymptotically superior sorting performance under appropriate conditions.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 8: Sorting in Linear Time: Counting, Radix, and Bucket Sort

Related Chapters