Chapter 2: Back-of-the-Envelope Estimation


Audio Overview



Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

If you are building modern systems, I mean, whether you're a designer, an engineer, or even just prepping for that big interview, you know the most foundational skill isn't really writing code.

It's proving right at the start that your architecture can actually handle the load.

Exactly.

And our mission today is to sort of simplify the critical art of back-of-the-envelope estimation, or BOE.

This is really the superpower that lets you figure out if a system is feasible in minutes, not, you know, months.

And we don't have to just guess.

We're pulling our insights directly from some key system design sources.

Our sources, they point to a definition from Google Senior Fellow Jeff Dean, who said BOE calculations are basically estimates built from thought experiments and common performance numbers.

They tell you which designs will actually meet the requirements.

It's essentially a pragmatic gut check.

But it's a gut check that's grounded in real quantitative data.

It stops you from sinking all this time into a really complex design, only to find out it, you know, it just collapses when it hits a million users because you misjudged the latency or the storage needed.

So we're basically giving you the ultimate cheat sheet today.

To master this, we're going to unpack the three core pillars you have to have in your mental toolkit.

First, the power of two for volume.

Second, critical latency numbers for speed.

And third, availability metrics, or what we call the nines, for reliability.

Okay, let's unpack that first one.

How we talk about data volume using the power of two.

When you're designing these big distributed systems, we aren't just dealing with kilobytes anymore.

We're talking about petabytes of data.

Why is this simple math concept so, so critical for scaling?

It's critical because all your large scale calculations, they have to be based on some consistent unit of measure.

And you just have to avoid miscalculating your unit by a power of 10.

I mean, we'll start with the basics.

A byte is eight bits, an ASCII character is one byte.

The confusion starts because computers use the power of two, but we humans, we like to round to the power of 10.

That's right.

So in the computer world, a kilobyte isn't exactly 1000 bytes.

Precisely.

We use two to the power of 10, which is 1024.

But for estimation, you know, for these quick calculations, we just round that down to 10 to the three to make the math easier in our heads.

So two to the 10 is roughly 1,000, a kilobyte (KB).

Then you jump up to two to the 20.

And you're at a million.

A megabyte.

Exactly.

And then the scale just starts exploding.

When we hit two to the 30, we're in the billions, a gigabyte (GB).

That's the scale of a lot of phone data plans these days, or a large video file.

And then move up to two to the 40 and you're at a trillion, a terabyte (TB).

That's often the size of a high-end desktop hard drive.

And then the big one: two to the 50 is a quadrillion, a petabyte (PB).

A quadrillion.

I mean, that number is so abstract.

What does 50 petabytes even look like in the real world?

It just means scale.

It's not personal storage anymore.

A petabyte of storage is massive.

It's the domain of large streaming services, huge data centers.

The real insight here is that every jump from one unit to the next, say from terabyte to petabyte, is a 1,000-fold increase in data.

So when you're estimating, you need to instantly connect that magnitude, trillion, quadrillion with this unit terabyte, petabyte, just to know if your idea is even possible.
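As a quick check on how safe the "2^10 is about 1,000" shortcut is, this Python sketch prints each power of two next to the power of ten we round it to, along with the relative error of that rounding. The `UNITS` table and `rounding_error` helper are illustrative names, not something from the source.

```python
# Compare exact binary units (powers of two) with the decimal values
# used for back-of-the-envelope estimates (powers of ten).
UNITS = [
    ("KB", 10, 3),   # 2^10 ~ 10^3  (thousand)
    ("MB", 20, 6),   # 2^20 ~ 10^6  (million)
    ("GB", 30, 9),   # 2^30 ~ 10^9  (billion)
    ("TB", 40, 12),  # 2^40 ~ 10^12 (trillion)
    ("PB", 50, 15),  # 2^50 ~ 10^15 (quadrillion)
]


def rounding_error(binary_exp: int, decimal_exp: int) -> float:
    """Relative error of approximating 2**binary_exp by 10**decimal_exp."""
    exact = 2 ** binary_exp
    return (exact - 10 ** decimal_exp) / exact


if __name__ == "__main__":
    for name, b, d in UNITS:
        print(f"{name}: 2^{b} = {2 ** b:>20,} ~ 10^{d}  "
              f"(error {rounding_error(b, d):.1%})")
```

The error grows with each unit, from about 2% at the kilobyte level to about 11% at the petabyte level. That is acceptable for an estimate and tiny next to the factor-of-1,000 mistake of confusing one unit with the next.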

Okay.

So this is where it gets really interesting for me.

And frankly, kind of shocking: the latency numbers. We're looking at computer operations that span more than eight orders of magnitude, from fractions of a nanosecond to hundreds of milliseconds.

The difference here dictates almost every architectural decision you'll make.

What's fascinating is understanding the relative cost.

We're looking at updated numbers from 2020 and this disparity, it tells you exactly where your system is going to choke.

Let's start with the fastest stuff right inside the CPU.

Measured in nanoseconds.

So what is L1 cache and just how fast is it?

L1 cache is the absolute fastest, smallest and closest memory to the CPU core.

An L1 cache reference takes just one nanosecond.

One.

An L2 cache reference, a little further out from the core, is still incredibly fast at four nanoseconds.

So if L1 is basically instantaneous, what happens when the CPU has to, you know, leave its little bubble and go find data in the main memory or when it has to sync up operations?

We immediately jump in time.

A mutex lock/unlock takes 17 nanoseconds.

And a mutex is just a tiny flag, right, that we flip to make sure two different threads don't try to mess with the same data at once.

And then a main memory reference, the CPU hitting your RAM, takes about 100 nanoseconds.

Wow.

Okay, so that's already 100 times slower than L1 just to get to the main memory.

Now let's jump into microseconds, which are a thousand nanoseconds.

This is where processing and network stuff happens, right?

Right.

And moving from nanoseconds to microseconds is a huge leap.

Compressing a 1KB chunk of data with a fast algorithm like Zippy takes about two microseconds.

Sending, say, 2,000 bytes over a commodity network inside a data center takes around 44 nanoseconds.

Still quick, but you're starting to see the time cost of leaving the chip itself.

And this next comparison is just so crucial for anyone designing a database.

Sequential versus random access.

Absolutely.

Reading one megabyte sequentially from your memory takes a mere three microseconds.

But a single random read on an SSD jumps way up to 16 microseconds, more than five times as long as streaming that whole megabyte from memory.

If you build a system that needs constant random lookups instead of batched sequential reads, you're just baking in a massive unnecessary time penalty.

And now we get to the really slow operations, measured in milliseconds.

That's a thousand microseconds, or a million nanoseconds.

What are we trying to avoid at all costs here?

Well, traditional storage actions and anything that involves a lot of physical distance.

A simple disk seek, that's the mechanical arm moving across a spinning hard drive, takes about two milliseconds.

That one action is two million times slower than an L1 cache hit.

Two million times.

And the slowest operation on the list.

If you have a request that has to make a round trip between continents, say from California to the Netherlands and back, you're looking at a staggering 150 milliseconds of latency.

And most of that comes down to the speed of light.

That's the physical barrier.

Light in fiber-optic cable can only travel so fast, and routing adds its own delay on top.
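To see how much of the 150 ms is pure physics, this Python sketch computes the propagation-only floor for the round trip. Both inputs are assumptions, not figures from the source: a great-circle distance of roughly 8,800 km between San Francisco and Amsterdam, and light in optical fiber travelling at roughly 200,000 km/s (about two-thirds of its speed in a vacuum).

```python
# Lower bound on round-trip time imposed by the speed of light in fiber.
# ASSUMPTIONS (not from the source): ~8,800 km great-circle distance from
# California (San Francisco) to the Netherlands (Amsterdam), and light in
# optical fiber travelling at roughly 200,000 km/s (about 2/3 of c).
DISTANCE_KM = 8_800
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float, speed_km_per_s: float) -> float:
    """Propagation delay there and back, ignoring routing and queuing."""
    return 2 * distance_km / speed_km_per_s * 1_000


if __name__ == "__main__":
    floor_ms = min_round_trip_ms(DISTANCE_KM, FIBER_SPEED_KM_PER_S)
    print(f"Physical floor: ~{floor_ms:.0f} ms vs. ~150 ms quoted")
```

Under these assumptions, propagation alone accounts for roughly 88 ms. The rest comes from cables that do not follow the great-circle path and from the routers and switches along the way, which is why no amount of server tuning can remove most of this latency.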

150 milliseconds versus one nanosecond.

That's a factor of 150 million, more than eight orders of magnitude.
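The 2020 latency figures quoted in this section fit in a small lookup table. This Python sketch stores them in nanoseconds (the dictionary keys are just descriptive labels) and divides each one by the L1 cache reference to show relative cost.

```python
# Latency figures quoted above (2020 numbers), all in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 1,
    "L2 cache reference": 4,
    "Mutex lock/unlock": 17,
    "Send 2,000 bytes over commodity network": 44,
    "Main memory reference": 100,
    "Compress 1 KB with Zippy": 2_000,
    "Read 1 MB sequentially from memory": 3_000,
    "SSD random read": 16_000,
    "Disk seek": 2_000_000,
    "Round trip California -> Netherlands": 150_000_000,
}


def times_slower_than_l1(operation: str) -> float:
    """How many L1 cache references fit in the time of `operation`."""
    return LATENCY_NS[operation] / LATENCY_NS["L1 cache reference"]


if __name__ == "__main__":
    for op, ns in sorted(LATENCY_NS.items(), key=lambda item: item[1]):
        print(f"{op:<42} {ns:>13,} ns  x{times_slower_than_l1(op):,.0f}")
```

Sorted this way, the gaps are hard to miss: a main memory reference costs 100 L1 hits, a disk seek about two million, and the intercontinental round trip about 150 million.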

So connecting all these dots, what are the four key conclusions every system designer has to take away from these latency figures?

First and foremost, memory is fast.

Disk is agonizingly slow.

Second, you have to avoid disk seeks if you possibly can.

Their latency is crippling.

And third, since simple compression is so fast, just a few microseconds, you should compress data before sending it over a slow, expensive network whenever you can.

And the fourth conclusion, that's all about global scale.

Right.

Data centers are often in different geographical regions, and that latency between them is significant.

That 150-millisecond intercontinental round trip heavily impacts your consistency model and your failover strategies.

It basically dictates whether you can even afford real-time sync between data centers.

Okay, so moving from speed to reliability.

High availability, or HA.

We know HA is about a system being operational, but how do we actually quantify that?

We quantify it as a percentage of uptime, and that percentage is almost always part of the service level agreement, the SLA.

That's the formal contract that defines the minimum uptime a customer can expect.

And getting higher HA percentages, well, that usually requires exponential effort and cost.

And we measure uptime in nines, so let's make this really concrete for you listening.

Let's focus on the downtime over a year.

Why is even one missing nine so expensive?

Well, if you promise 99% availability, that's just two nines, that translates to 7.31 hours of downtime a month.

Or over a year, that's 3.65 days of downtime.

And for any critical service, being down for three and a half days is just, it's unacceptable.

So when a service promises 99.9% uptime, three nines, what does that actually buy them?

Three nines brings that monthly downtime to just 43.83 minutes.

That's a massive improvement.

And it's pretty much the common minimum for most cloud providers.

It requires really good monitoring and a fast incident response.

But the gold standard for, say, mission-critical systems or financial data is five nines, 99.999%.

Five nines is incredibly difficult to achieve, and it's expensive.

It means your system can only be down for 26.3 seconds per month.

To go from 43 minutes of downtime at three nines to just 26 seconds at five nines, you need deep redundancy: things like hot spares and, critically, geographic replication to handle regional outages.

The cost isn't just in the architecture.

It's the whole infrastructure you need to guarantee that level of resilience.
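The downtime figures for each level of nines follow from one subtraction. This Python sketch assumes an average year of 365.25 days and a month of one twelfth of that, which reproduces the 7.31 hours, 43.83 minutes, and 26.3 seconds quoted above; the function name is illustrative.

```python
# Allowed downtime for a given availability percentage ("nines").
# ASSUMPTION: an average year of 365.25 days and a month of one twelfth
# of that, which matches the monthly figures quoted in the discussion.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12


def downtime_seconds(availability_pct: float, period_s: float) -> float:
    """Seconds of downtime permitted over `period_s` at this availability."""
    return (1 - availability_pct / 100) * period_s


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        month = downtime_seconds(pct, SECONDS_PER_MONTH)
        year = downtime_seconds(pct, SECONDS_PER_YEAR)
        print(f"{pct}%: {month / 60:8.2f} min/month, "
              f"{year / 86400:6.3f} days/year")
```

Each extra nine divides the allowed downtime by ten, which is why the engineering effort and cost climb so steeply from three nines to five.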

Okay.

We've built the foundation.

Power of two for volume, latency for speed, nines for reliability.

Now let's bring it all together.

Let's use the provided example to estimate QPS (queries per second) and storage for a simplified Twitter-like service.

Right.

So to ground this experiment, we have to start with some clear assumptions.

Our source material gives us five key points.

We have 300 million monthly active users, or MAU.

We assume 50% of them use this service every day.

They post, on average, two tweets per day.

10 % of those tweets have media.

And finally, we need to store all this data for five years.

Okay.

Let's pause on that second assumption.

50% daily active users.

I mean, that feels a bit high for a platform like Twitter, doesn't it?

Why start with such an aggressive number?

That's a great question, and it really highlights why you have to list your assumptions out.

We use 50% because for a high engagement platform, that's a good starting point for a growth model.

But if we were designing for something with lower engagement, maybe 30%, our whole QPS estimate would drop right away.

But sticking with our current 50% assumption, let's calculate QPS.

So first, daily active users, DAU.

300 million MAU times 50 % gives us 150 million DAU.

Next, the average tweets QPS.

That's the total number of tweets per day divided by the number of seconds in a day.

150 million users times two tweets each, divided by 86,400 seconds.

And that calculation gives us roughly 3,500 average QPS.

And because user activity spikes during big events or morning commutes, we have to account for peak load.

Exactly.

We usually estimate peak QPS by assuming it's double the average.

That's a pretty common rule of thumb for social media.

So our peak QPS estimate is around 7,000 queries per second.

And that one number, it immediately tells us how many application servers and database connections we're going to need to provision.
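The QPS arithmetic above can be written as one small Python function. The inputs (300 million MAU, 50% daily active, two tweets per user per day) and the 2x peak factor come from the stated assumptions; the function name and signature are just for illustration.

```python
# Average and peak tweet QPS from the stated assumptions.
SECONDS_PER_DAY = 24 * 3600  # 86,400


def tweet_qps(mau: int, daily_active_ratio: float,
              tweets_per_user_per_day: int,
              peak_factor: float = 2.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS); peak = average * peak_factor."""
    dau = mau * daily_active_ratio
    average = dau * tweets_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


if __name__ == "__main__":
    avg, peak = tweet_qps(300_000_000, 0.5, 2)
    print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
```

The exact values land just under the rounded 3,500 and 7,000. Calling `tweet_qps(300_000_000, 0.3, 2)` shows how quickly the estimate drops under the lower-engagement 30% assumption mentioned earlier.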

Now for the big one, media storage estimation.

We need to figure out the total storage for five years.

We noted the component sizes: text is tiny, 140 bytes, but media is the real bottleneck at an assumed one megabyte per piece.

And because the media is so much larger, we really only need to calculate the media storage.

Every day, we have 150 million daily users times two tweets, and 10 % of those have media.

That means we're generating 30 million new media items every single day.

Multiply 30 million items by one megabyte each, and that totals 30 TB of new storage needed per day.

30 terabytes added every single day, and then you extrapolate that over five years.

The calculation is 30 TB times 365 days a year times five years.

That totals approximately 55 petabytes.

55 PB.

And that massive number confirms that we need a specialized, highly scalable, probably distributed storage system like object storage, not a single giant database.
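The storage estimate follows the same pattern. This Python sketch uses decimal units, as the rest of the estimate does, and, like the discussion, ignores tweet text as negligible next to media; the helper names are illustrative.

```python
# Media storage estimate for the Twitter-like example.
# Decimal units, as is usual for back-of-the-envelope math.
MB = 10 ** 6
TB = 10 ** 12
PB = 10 ** 15


def daily_media_bytes(dau: int, tweets_per_user: int, media_ratio: float,
                      media_size_bytes: int) -> float:
    """New media bytes per day; tweet text is ignored as negligible."""
    media_items = dau * tweets_per_user * media_ratio
    return media_items * media_size_bytes


def retained_bytes(bytes_per_day: float, years: int,
                   days_per_year: int = 365) -> float:
    """Total bytes kept if nothing is deleted during the retention period."""
    return bytes_per_day * days_per_year * years


if __name__ == "__main__":
    per_day = daily_media_bytes(150_000_000, 2, 0.10, 1 * MB)
    total = retained_bytes(per_day, years=5)
    print(f"~{per_day / TB:.0f} TB/day, ~{total / PB:.1f} PB over 5 years")
```

The five-year total comes to about 54.75 PB, which rounds to the 55 PB above. Because retention is a single parameter, passing `years=10` doubles the total to roughly 110 PB without touching anything else.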

And the insight here is so crucial.

The final answer, 55 PB, is actually less important than the methodical process you used to get there.

It shows the feasibility and it reveals the biggest constraints right away.

Media storage and peak QPS.

Now that we've walked through the whole process, let's just wrap up with some essential practical advice for doing these estimations efficiently.

These tips save you time and, well, they prove your thought process.

The first tip is rounding and approximation.

Remember, precision is not the goal here.

Speed is.

You should use round numbers to simplify things.

For example, if you see a calculation like 34,987 divided by 71, don't hesitate; just mentally change it to 35,000 divided by 70.

That instantly gives you 500.

It proves you understand the magnitude, and that's what really matters.

It dramatically reduces your cognitive load, so you can focus on the actual architecture.
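To put a number on how little that rounding costs, this short Python snippet compares the exact quotient with the mental-math version. The `relative_error` helper is illustrative.

```python
# Compare an exact quotient with its rounded mental-math version.
def relative_error(approx: float, exact: float) -> float:
    """Size of the approximation error as a fraction of the exact value."""
    return abs(approx - exact) / abs(exact)


exact = 34_987 / 71      # about 492.77
rounded = 35_000 / 70    # exactly 500

if __name__ == "__main__":
    print(f"exact {exact:.2f}, rounded {rounded:.0f}, "
          f"error {relative_error(rounded, exact):.1%}")
```

The rounded answer is off by about 1.5%, far below the uncertainty already baked into assumptions like daily active users or media size.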

What's the second tip?

Write down your assumptions.

We started the Twitter example with five assumptions.

If someone challenges your estimate later, you can immediately point to and maybe even adjust your starting points.

It just shows transparency.

You know, if storage costs plummet, you might change the five-year storage plan to 10 years, and you know exactly what number to change in your calculation.

And the third tip seems so simple, but it's the cause of so many errors.

It is, and that is to label your units.

Always specify your units.

Is it five megabytes or five gigabytes?

Imagine you're calculating data usage, and you realize you just wrote down 50 earlier.

Does that mean 50 megabytes or 50 gigabytes?

If you're off by a factor of a thousand because you forgot a label, your whole server plan could be completely wrong.
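One way to make "label your units" mechanical, sketched in Python: keep every quantity as a plain byte count and attach a unit only when printing. The `describe` helper is hypothetical, and the decimal `MB`/`GB` constants are an assumption chosen to match estimation practice.

```python
# Keep units explicit: store every size as a count of bytes and convert
# only when displaying. MB and GB are decimal units, as in estimation.
MB = 10 ** 6
GB = 10 ** 9


def describe(num_bytes: float) -> str:
    """Render a byte count with an explicit unit label."""
    if num_bytes >= GB:
        return f"{num_bytes / GB:,.1f} GB"
    return f"{num_bytes / MB:,.1f} MB"


if __name__ == "__main__":
    # The ambiguous "50" from the example, under both readings.
    for reading in (50 * MB, 50 * GB):
        print(describe(reading))
    print(f"ratio: {(50 * GB) / (50 * MB):,.0f}x")
```

Because the unit travels with the number from the moment it is written down, the factor-of-1,000 gap between the two readings of "50" cannot slip in silently.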

Fantastic.

So those three tips make your estimation fast, repeatable, and defensible.

And for you listening, just remember the most common estimations focus on QPS, peak QPS, total storage, your required cache size, and ultimately the number of servers you'll need.

Those are the five numbers you have to be able to calculate in your sleep.

This has been an essential deep dive into system design estimation.

We defined the back of the envelope approach.

We mastered the power of two scale from kilobyte all the way up to petabyte.

And we analyzed the critical latency numbers, revealing why a main memory reference is more than a million times faster than an intercontinental network round trip.

We also crystallized the value of system resilience with the nines, understanding the huge effort needed to go from three nines to five nines of availability.

And we applied all of it to successfully estimate the QPS and that 55 petabyte storage requirement for a sample social media service.

And all this data about speed and latency, it leads to one final important thought for you to consider.

Given the immense performance cost of intercontinental latency, that 150 milliseconds, versus the goal of five nines, which allows just 26 seconds of downtime a month, how does the need for geographic redundancy for high availability force system architects into fundamental trade-offs between data consistency and overall performance?

Think about that trade-off.

Thank you for joining us for this essential deep dive into back-of-the-envelope estimation.

We'll see you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers

Back-of-the-envelope estimation is a critical problem-solving technique that enables engineers to rapidly evaluate whether a proposed system architecture can satisfy operational demands by combining reasoned assumptions with established performance benchmarks. Mastery of this skill requires deep familiarity with scalability fundamentals, beginning with proficiency in powers of two to accurately calculate data storage requirements across the spectrum from kilobytes through petabytes, essential when designing distributed systems that process massive datasets. Understanding latency measurements across different computational operations reveals dramatic performance disparities: an L1 cache reference takes approximately 0.5 nanoseconds, while a disk seek takes approximately 10 milliseconds, a difference of roughly 20 million times. These quantitative comparisons yield actionable design insights: memory operations are vastly faster than disk operations, making it imperative to avoid disk seeks wherever feasible; additionally, lightweight compression algorithms execute quickly, so data should be compressed before network transmission to reduce bandwidth consumption. High availability measures a platform's capacity to remain accessible and functional continuously; it is expressed as a percentage and typically formalized through Service Level Agreements, with infrastructure providers targeting 99.9 percent uptime or higher. The "nines" notation describes this availability quantitatively, where each additional nine cuts the permitted downtime by a factor of ten. Practical application of the methodology includes calculating queries per second for high-traffic applications and projecting storage needs over multiyear periods, such as the roughly 55 petabytes of media storage estimated for a Twitter-like service over five years.
Effective estimation practice demands that candidates explicitly document all underlying assumptions, use precise unit labeling to prevent calculation errors, and employ rounding strategies to streamline mathematical operations, prioritizing sound analytical reasoning over computational exactitude.
