Chapter 17: Cloud Computing – Architectural Principles

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

Today we're really immersing ourselves in the architecture of cloud and distributed computing,

specifically how architects leverage these massive infrastructure services.

Yeah.

And it's all about building systems that don't just tolerate failure.

They almost expect it, right?

Exactly.

It really changes how you think about design from the ground up.

It does.

You know, you can't assume things will just work anymore.

You have to design around the idea that components will fail, often at the, well, the worst possible moment.

Which brings us to that classic, maybe slightly terrifying quote from Leslie Lamport.

Oh, the famous one.

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

That really nails it, doesn't it?

It's like the architect's mantra for the cloud.

It forces you to be completely committed to resilience.

No other way.

So before we dive deep into those resilience patterns, maybe let's just quickly set the stage. What do we mean by the cloud here?

Good idea.

So we're focusing on those core characteristics: resources available on demand, and elasticity, that ability to grow or shrink capacity automatically.

Self-provisioning, and the big one: pay only for what you use.

Right.

And for this discussion, we're mostly looking at infrastructure as a service, IaaS, specifically public clouds.

Why that focus?

Well, from an architect's point of view, using those basic infrastructure services, the technical design challenges are pretty similar, whether it's public, private or hybrid.

The differences are often more about management or policy.

Okay, got it.

So let's unpack this to really grasp the failure aspect.

We have to talk about the physical scale first.

It's not just a few servers, is it?

Oh, far from it.

Think data centers with tens of thousands of devices, often closer to 100,000 than 50,000.

Wow.

And what's really interesting is the physical limit isn't usually space.

It's more basic physics.

How much electrical power can you actually draw?

And how do you dissipate the incredible amount of heat all that gear generates?

That kind of scale immediately makes you think about redundancy, spreading things out geographically.

Which leads to regions, right?

Correct.

A cloud region serves two main purposes.

It's logical: putting services physically closer to users reduces network delay.

Makes sense.

But it's also physical, often driven by regulations, like GDPR needing data to stay within certain borders, for example.

And then within those regions, providers drill down further into availability zones, AZs.

What's the thinking behind AZs?

So availability zones are groups of data centers, but they're physically separate within a region.

They often have different power sources, different network providers.

The whole point is to make the probability of two AZs failing at the exact same time extremely low.

That's your fundamental building block for keeping applications up and running.

So this infrastructure is massive.

It's designed for failure.

It's spread out globally.

How do I actually grab my slice of it, say, a new virtual machine, a VM?

That's where the management gateway comes in.

Think of it as your main control panel for all your cloud stuff.

Right.

When you ask for a new VM, you have to give it three key pieces of information.

Which region you want it in, the instance type, basically how much CPU and memory you need, and the ID of the specific VM image you want to run.

So the management gateway is like the resource allocator.

It finds the spot for my VM.

Exactly.

It looks for a hypervisor.

That's the software layer on a physical machine that has enough spare CPU and memory.

It tells that hypervisor, hey, create this VM.

And then crucially, it gives you back the new VM's IP address and its host name.

And just as importantly, it adds that info straight into the cloud's DNS system so it's immediately findable.
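
The allocation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `VmRequest`, `Hypervisor`, and `place_vm` names, the instance-type table, and the first-fit placement policy are all hypothetical and are not any provider's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypervisor:
    name: str
    free_cpus: int
    free_memory_gb: int


# Hypothetical instance types: (vCPUs, memory in GB).
INSTANCE_TYPES = {"small": (2, 4), "large": (8, 32)}


@dataclass
class VmRequest:
    region: str         # where the VM should run
    instance_type: str  # how much CPU and memory it needs
    image_id: str       # which VM image to boot


def place_vm(request: VmRequest,
             hypervisors: dict[str, list[Hypervisor]]) -> Optional[Hypervisor]:
    """Return the first hypervisor in the region with enough spare capacity."""
    cpus, memory_gb = INSTANCE_TYPES[request.instance_type]
    for hv in hypervisors.get(request.region, []):
        if hv.free_cpus >= cpus and hv.free_memory_gb >= memory_gb:
            # Reserve the capacity before the VM is created on this host.
            hv.free_cpus -= cpus
            hv.free_memory_gb -= memory_gb
            return hv
    return None  # no spare capacity in that region
```

A real gateway would go on to boot the image, return the IP address and host name, and register them in DNS; those steps are left out of this sketch.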

Okay, so now we've got our little VM running in this giant system.

And this is where it gets, as you said, really interesting.

Confronting the reality that failure isn't an if, it's a when.

Constantly.

It's happening all the time.

Amazon, for instance, has shared numbers.

In a data center with around 64,000 computers, they expect something like five computers and maybe 17 disk drives to fail every single day.

Every day.

That's quite a lot.

It really puts it in perspective.

Architects have to treat every single component as fundamentally temporary.

Assume it could disappear at any moment.

Okay, so one common tactic for dealing with this availability-wise is using timeouts.

You know, if I don't hear back in, say, five seconds, I assume it failed.

But the sources highlight a big problem with that.

Yeah, timeouts are, well, they're incredibly blunt tools.

They just can't tell you why you didn't get a response.

Was it a complete crash?

A broken network link?

Or was the server just a bit slow and missed the deadline by milliseconds?

So you get false positives.

You think something's broken when it's just momentarily overloaded.

Exactly.

And if your system is designed to react to a timeout by doing something expensive, like spinning up a whole new VM.

Which takes time and resources.

Precisely.

Starting a new VM isn't instant.

It can take minutes.

So you can't really tune your system to trigger that kind of recovery on just one single missed response.

It's too risky, too costly.

So what do designers do?

Often, they'll set things up to only react after multiple missed responses over a longer period.

You have to be more forgiving, especially if you're communicating over slower or less reliable networks like, you know, across the internet or over cellular links.

Otherwise, you just end up in these expensive, unnecessary recovery cycles.
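
One way to picture "only react after multiple missed responses" is a small counter that resets whenever a reply arrives on time. The class name and the threshold of three are assumptions made for illustration, not values from the source.

```python
class MissedResponseDetector:
    """Declare failure only after several consecutive missed responses."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_misses = 0

    def record(self, responded_in_time: bool) -> bool:
        """Record one probe result; return True if the peer should be treated as failed."""
        if responded_in_time:
            self.consecutive_misses = 0  # any timely reply resets the count
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.threshold
```

With a threshold of three, a single slow reply never triggers recovery; only three misses in a row do. Raising the threshold makes the detector more forgiving on slow links, at the price of taking longer to notice a real crash.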

Right.

And even when things aren't outright failing, there's this other sneaky problem, long tail latency.

Can you unpack that a bit?

Sure.

Long tail latency is about the distribution of response times.

Imagine you measure the time it takes to launch, say, 1,000 VM instances.

Most might launch quickly.

The peak of the distribution might be at 22 seconds, the median maybe 23 seconds.

But the average time might be higher, say 28 seconds.

And then you look at the tail end, the 95th percentile, that could jump way up to maybe 57 seconds.

So a small percentage, like 5%, are taking way, way longer, two, five, maybe even 10 times longer than most.
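
To see how a few slow launches pull the mean and the 95th percentile away from the median, here is a small calculation. The twenty launch times are made up, chosen only so that the results match the figures quoted in the discussion; the nearest-rank `percentile` helper is one of several common percentile definitions.

```python
import math
import statistics


def percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up launch times in seconds: most are fast, a few are very slow.
launch_times = [22] * 8 + [23] * 4 + [24, 25, 26, 28, 30, 32, 57, 70]

mode = statistics.mode(launch_times)      # the histogram peak
median = statistics.median(launch_times)  # half the launches are at or below this
mean = statistics.mean(launch_times)      # pulled upward by the slow tail
p95 = percentile(launch_times, 95)        # what the slowest 5% look like
```

Only two of the twenty samples are slow, yet they lift the mean to 28 seconds while the median stays at 23, and the 95th percentile lands at 57.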

Exactly.

And that can kill user experience.

The frustrating part is, often the cause isn't even in your application code.

It might be congestion elsewhere in the system, like the hypervisor being busy, something totally outside your direct control.

So if you can't fix the root cause, you have to architect around the symptom.

You got it.

Two main tactics here.

First one is called hedged requests.

Hedged.

What does that mean?

It means you actually make more requests than you strictly need.

Say you need 10 responses from a microservice.

You might actually fire off 11 requests simultaneously.

Okay.

Why?

Because you take the first 10 that come back quickly and you simply cancel or ignore the slowest one.

Ah, I see.

But wait, doesn't that automatically mean you're using more network bandwidth, more resources?

You're deliberately over-requesting.

It absolutely does.

That's the trade-off.

You're sacrificing some efficiency, some cost to gain predictable, low latency.

For systems where speed is critical, that trade-off is often worth it.

Sometimes even mandatory.
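
The hedged-request idea from the exchange above can be sketched with Python's `asyncio`. The `hedged_gather` and `fake_request` names are hypothetical, and the simulated backend simply makes one request slow to stand in for the latency tail.

```python
import asyncio


async def hedged_gather(make_request, needed: int, extra: int = 1) -> list:
    """Start needed + extra requests, return the first `needed` results, cancel the rest."""
    tasks = [asyncio.create_task(make_request(i)) for i in range(needed + extra)]
    results = []
    try:
        for finished in asyncio.as_completed(tasks):
            results.append(await finished)
            if len(results) == needed:
                break
    finally:
        for task in tasks:
            task.cancel()  # no effect on tasks that already finished
    return results


async def fake_request(i: int) -> int:
    # Hypothetical backend call: request 3 is stuck in the latency tail.
    await asyncio.sleep(1.0 if i == 3 else 0.01)
    return i


def demo() -> list:
    """Ask for 10 results by sending 11 requests; the slow one is dropped."""
    return sorted(asyncio.run(hedged_gather(fake_request, needed=10)))
```

The caller waits roughly as long as the tenth-fastest request instead of the slowest one, and pays for that with one extra request's worth of work.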

Okay.

Makes sense.

What's the second tactic?

You said it was slightly less aggressive?

Right.

That's alternative requests.

Instead of sending all the extra requests immediately, you send the initial set you need.

Then, only if they don't complete within a fairly short time window, you fire off a secondary, alternative set of requests.

So it's like a fallback.

Kind of.

It tries to limit the resource overhead compared to hedging all the time, but still gives you a quick alternative path if some of your initial requests get stuck in that long latency tail.
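
A minimal sketch of the alternative-request tactic, simplified to a single request rather than a whole set. The `with_alternative` name, the `backup_after` parameter, and the simulated backend are assumptions for illustration.

```python
import asyncio


async def with_alternative(make_request, backup_after: float):
    """Send one request; if it has not finished within backup_after seconds,
    send a second one and return whichever finishes first."""
    primary = asyncio.create_task(make_request("primary"))
    done, _ = await asyncio.wait({primary}, timeout=backup_after)
    if done:
        return primary.result()  # fast path: no extra request was sent

    backup = asyncio.create_task(make_request("backup"))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop whichever attempt is still outstanding
    return done.pop().result()


async def fake_request(label: str) -> str:
    # Hypothetical backend: the primary attempt is stuck in the latency tail.
    await asyncio.sleep(1.0 if label == "primary" else 0.01)
    return label
```

Unlike hedging, the extra request is only sent when the first attempt is already slow, so the added load is limited to the small fraction of calls that land in the tail.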

Okay.

Shifting gears slightly.

Moving from handling delays to handling overall load.

This brings us to scaling.

Vertical versus horizontal.

Yep.

Vertical scaling or scaling up is conceptually simple.

You just run your service on a bigger, more powerful VM instance.

More CPU, more RAM.

It's a resource change, doesn't usually require redesigning your software.

And horizontal scaling?

That's scaling out.

This is where the real cloud architecture patterns come in.

It means running multiple identical copies of your service instance.

And, crucially, you need something in front to spread the traffic across them.

The load balancer.

Exactly.

This is a fundamental design change, not just a resource tweak.

So the load balancer's main job is just distributing requests.

Right.

Preventing any single instance from getting overwhelmed.

That's the core function.

It sits in front and intercepts all the incoming client requests.

It might use a simple algorithm like round robin: send request one to instance A, request two to instance B, request three to instance A again, and so on.
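
Round robin is simple enough to show directly. The `RoundRobinBalancer` class below is a hypothetical sketch of the selection logic only, with no networking.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend instances in a fixed rotating order."""

    def __init__(self, instances: list[str]) -> None:
        self._cycle = itertools.cycle(instances)

    def pick(self) -> str:
        """Return the instance that should receive the next request."""
        return next(self._cycle)
```

With two instances, three consecutive calls to `pick()` return A, B, and then A again, which matches the sequence described above.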

But the load balancer is also key for availability.

For hiding failures.

How does it know if an instance behind it is actually healthy and working?

Especially since the response, the actual data, usually goes straight back from the instance to the client, bypassing the load balancer on the way out.

Good point.

That's why load balancers rely on health checks.

These are separate periodic checks the load balancer makes directly to each instance.

It might be a simple network ping, an attempt to establish a TCP connection, or even a call to a specific health check API endpoint on the service.

If an instance fails too many of these health checks in a row, the load balancer marks it as unhealthy.

And then pulls it out of the rotation.

Stops sending it new traffic.

Immediately.

But, and this is the clever part, it doesn't just forget about it.

Ah, the recovery cycle.

Right.

It keeps periodically pinging that unhealthy instance with health checks.

If the instance recovers, maybe it was just temporarily overloaded, its internal queue drains, and it starts responding properly again.

The load balancer sees that.

And brings it back online.

Marks it as healthy and smoothly adds it back into the pool of available instances.

This whole cycle makes individual instance failures practically invisible to the clients.

It dramatically improves availability without needing anyone to manually intervene.
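
The remove-and-restore cycle can be sketched as a failure counter per instance. The `HealthTracker` name and the limit of three consecutive failures are assumptions; how the check itself is performed (ping, TCP connect, or API call) is left outside the sketch.

```python
class HealthTracker:
    """Track per-instance health from periodic check results."""

    def __init__(self, instances: list[str], max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = {name: 0 for name in instances}

    def report(self, instance: str, check_passed: bool) -> None:
        # A passing check restores the instance; a failing one counts toward removal.
        self.failures[instance] = 0 if check_passed else self.failures[instance] + 1

    def healthy(self) -> list[str]:
        """Instances that are currently eligible for new traffic."""
        return [name for name, count in self.failures.items()
                if count < self.max_failures]
```

Because unhealthy instances keep being checked, a single passing check is enough in this sketch to return one to the pool; a more cautious design might require several passes in a row.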

Okay, let's move from the network and infrastructure view more towards the application logic itself.

We need to talk about state management.

What exactly is state in this context?

State is basically any information internal to a service instance that influences how it responds to future requests.

Think session data, user preferences stored temporarily, that kind of thing.

And the rule here is?

The architectural mandate, the common wisdom, is to really strive to make your services stateless.

This means the service itself doesn't hold on to any critical history between requests.

If information is needed across requests, it's stored externally, usually in a shared database or cache that all instances can access.

And why?

Why is statelessness so critical?

Does it tie back to that Lamport quote, the inevitability of failure?

Absolutely.

It's probably the single most important defense against failure.

If a stateless instance crashes, who cares?

Right.

Any other instance or even a brand new instance spun up by the autoscaler can immediately take over its workload because all the necessary history, the state, is safely stored elsewhere.

It makes recovery almost instantaneous, and adding new instances behind the load balancer becomes completely seamless.
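
A toy example of what "stateless" buys you. `SharedStore` stands in for an external database or cache and `CartService` for a service instance; both names are hypothetical.

```python
class SharedStore:
    """Stand-in for an external database or cache shared by all instances."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class CartService:
    """Stateless service instance: it keeps no per-user data of its own."""

    def __init__(self, store: SharedStore) -> None:
        self.store = store

    def add_item(self, user: str) -> int:
        # Read the current count from the shared store, update it, write it back.
        count = self.store.get(user) + 1
        self.store.put(user, count)
        return count
```

If one `CartService` instance disappears, a fresh one pointed at the same store picks up exactly where it left off. Note that the read-then-write in `add_item` is not safe when two instances update the same key at once; a real store would need an atomic update for that.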

Okay, another coordination challenge.

Time.

Seems basic, but getting lots of distributed machines to agree on the exact time is surprisingly hard, right?

Clocks drift, network latency.

It's a nightmare.

Even though cloud providers use incredibly precise atomic clocks as references deep in their infrastructure,

that latency between their clock and your VM means you can never guarantee perfect synchronization across a distributed system.

By the time a timestamp reaches you, it's already slightly out of date.

So what do architects do?

Give up on time?

Not exactly give up, but often they realize that precise wall clock time isn't actually the most critical thing.

In many systems, like say, processing financial transactions, what matters more is the order in which events happened, not the exact nanosecond they occurred.

Consistent ordering is key.

Which leads to things like vector clocks.

Precisely.

Vector clocks aren't physical clocks, they're a logical concept, essentially counters passed along with messages that help trace the causal sequence of actions as they ripple through the distributed system.

By ensuring all parts of the system process events in the same logical order, you avoid inconsistencies without needing perfectly synchronized physical clocks.
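
A compact sketch of the vector clock idea, using plain dictionaries that map a node name to its counter. The function names are illustrative; the rules (increment your own counter on an event, take the element-wise maximum on receive) are the standard ones.

```python
def tick(clock: dict[str, int], node: str) -> dict[str, int]:
    """Local event or send: advance this node's own counter."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: dict[str, int], received: dict[str, int], node: str) -> dict[str, int]:
    """Receive: take the element-wise maximum, then count the receive as an event."""
    nodes = local.keys() | received.keys()
    combined = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return tick(combined, node)


def happened_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less
```

If neither `happened_before(a, b)` nor `happened_before(b, a)` holds, the two events are concurrent, and the clocks alone cannot say which came first.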

Okay, that makes sense.

Now, arguably the trickiest coordination problem, data coordination.

If I have two stateless instances, maybe they both try to update the same piece of critical data at the same time like a customer's account balance.

That's a race condition, isn't it?

That's the classic critical problem.

And the traditional ways of handling this in single databases, like using a two-phase commit lock, are just incredibly fragile and dangerous in a distributed cloud environment.

Why dangerous?

Well, first, the messages involved in the lock protocol can fail because of network issues.

But the real disaster scenario is if a service instance successfully acquires a lock on a resource and then crashes before it releases the lock.

The resource stays locked forever.

The whole system grinds to a halt.

Potentially, yes.

It's a huge risk, which leads to probably the most important piece of advice in this whole area, which is don't try and invent your own solution for distributed coordination or consensus.

Just don't.

Seriously, it's that hard.

It's notoriously difficult.

Algorithms like Paxos or Raft that solve this are incredibly complex and subtle.

Very easy to get wrong.

The strong recommendation for architects is use proven, off-the-shelf infrastructure components that handle this for you.

Tools like Apache ZooKeeper, Consul, etc.

They provide reliable consensus mechanisms to manage things like distributed locks safely.

Rely on them.

Okay, solid advice.

That brings us nicely to our final infrastructure piece, autoscaling.

This really feels like where the cloud's promise of elasticity pays off.

Absolutely.

Autoscaling is what lets you handle wildly fluctuating workloads without having to pay for idle capacity all the time.

You automatically add instances when load spikes, and you automatically remove them when demand drops.

It's all about efficiency and cost savings.

So when we're autoscaling VMs, there's an autoscaler component monitoring things and talking to the load balancer.

What kind of rules do architects typically set up?

You set rules based on utilization metrics measured over time.

A common one is CPU utilization.

If the average CPU across all instances stays above, say, 80% for five minutes straight, then launch one new VM.

Makes sense.

What else?

You can also set rules based on network I/O, maybe queue lengths, or even just time-based schedules if you know you always get busy at 9 a.m. on Mondays.

And crucially, you set minimum and maximum instance counts to keep things within sensible bounds.
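
The CPU rule and the instance bounds can be expressed as one small function. The function name, the default bounds of 2 and 10, and the choice of one averaged sample per minute are assumptions for illustration; scale-in is left out.

```python
def desired_instances(current: int, cpu_samples: list[float],
                      threshold: float = 80.0,
                      minimum: int = 2, maximum: int = 10) -> int:
    """Add one instance if every CPU sample in the window exceeds the threshold.

    cpu_samples holds the fleet-wide average CPU utilization, for example
    one sample per minute over the last five minutes.
    """
    target = current
    if cpu_samples and all(sample > threshold for sample in cpu_samples):
        target = current + 1
    # Clamp to the configured bounds.
    return max(minimum, min(maximum, target))
```

Requiring every sample in the window to be above the threshold keeps a brief spike from launching a VM, and the clamp keeps a runaway rule from scaling past the budgeted maximum.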

And when traffic drops, and the autoscaler decides to remove an instance to save money, you can't just, like, pull the plug instantly, right?

No, that would be bad.

You need a graceful shutdown.

The process is often called draining the instance.

Draining.

Yeah.

First, the autoscaler tells the load balancer, hey, stop sending any new requests to this specific VM.

Then it signals the VM itself to finish processing any requests it already has in its queue, and then shut down cleanly.

So you don't abruptly cut off users who are in the middle of something.

Exactly.

Skipping the draining step is a common way to cause errors and frustrate users.

It's essential.
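
The order of operations in draining is the important part, and a toy model makes it visible. `Instance` and `scale_in` are hypothetical names; requests are represented as plain strings in a queue.

```python
from collections import deque


class Instance:
    def __init__(self, name: str) -> None:
        self.name = name
        self.queue: deque[str] = deque()  # requests accepted but not yet handled
        self.completed: list[str] = []
        self.running = True

    def drain_and_stop(self) -> None:
        # Finish everything already accepted, then shut down.
        while self.queue:
            self.completed.append(self.queue.popleft())
        self.running = False


def scale_in(pool: list[Instance], victim: Instance) -> None:
    """Remove an instance gracefully: stop new traffic first, then drain."""
    pool.remove(victim)      # step 1: the load balancer stops routing to it
    victim.drain_and_stop()  # step 2: queued work completes before shutdown
```

Reversing the two steps would let new requests arrive at an instance that is already shutting down, which is the user-facing error that draining exists to prevent.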

And quickly, how does this differ for containers?

Yeah.

Is it the same idea?

Similar concept, but with an extra layer.

With containers, it's often a two-level decision.

First, does the overall workload need another container instance?

Second, if yes, is there enough space on an existing VM to run that new container?

Only if the answer to the second question is no, does the system then go and provision a whole new underlying VM.

So it tries to pack containers efficiently onto existing VMs first.
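
The two-level decision can be sketched as follows, assuming the first question (does the workload need another container?) has already been answered yes. The function name, the slot-based capacity model, and the figure of four containers per VM are all hypothetical.

```python
def scale_out_container(vm_free_slots: list[int]) -> tuple[list[int], bool]:
    """Place one new container; return updated free slots and whether a VM was added.

    Each list entry is the number of containers a VM can still host.
    """
    slots = list(vm_free_slots)
    for index, free in enumerate(slots):
        if free > 0:
            slots[index] = free - 1  # reuse spare room on an existing VM
            return slots, False
    new_vm_capacity = 4  # hypothetical containers-per-VM figure
    slots.append(new_vm_capacity - 1)  # provision a VM, then place the container on it
    return slots, True
```

A new VM is requested only when every existing VM is full, which is the packing behavior described above.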

Right.

But architects need to be a bit careful here.

If your container scaling logic gets too tightly coupled with the specific mechanisms of your cloud provider's VM scaling, you might accidentally create a strong dependency, making it harder to move later.

Vendor lock-in risk.

Good point.

Okay, we've covered a ton of ground here.

Yeah.

If we had to boil it down, what are the maybe top three architectural lessons for someone building systems in the cloud?

Number one, assume failure, constantly, at every level.

It will happen.

Number two, design specifically for that long-tail latency.

Don't just hope it won't happen.

Use techniques like hedged or alternative requests proactively.

Okay.

And three.

Prioritize stateless services wherever possible.

It makes scaling, recovery, everything so much easier.

And alongside that, lean heavily on the infrastructure services.

Load balancers, autoscalers, distributed coordination tools like ZooKeeper or Consul.

Don't reinvent those wheels.

Let the platform handle that complexity.

Assume failure, design for latency, stay stateless, and use the tools.

That's a fantastic summary.

Yeah, that's the core of it.

Okay, I want to leave you, our listener, with one final thought to chew on.

Tying a couple of these ideas together.

We talked about the load balancer being an intermediary component.

Usually adding intermediaries might help with things like modifiability, but they often add a bit of overhead, potentially hurting raw performance.

True.

Intermediaries usually have some cost.

But think about this.

The load balancer's entire reason for existing is actually to increase performance, or at least availability, which feels like performance to the user.

So how does the architecture of the load balancer manage to achieve both goals simultaneously?

How does it provide that modifiability, adding and removing instances easily, and enhance overall system performance and resilience in this distributed failure-prone environment?

That's a great question.

It really forces you to think about how those health checks, the stateless nature of the services behind it, and the routing logic all work together.

Exactly.

Something to ponder for your next system design.

Thank you so much for joining us for this deep dive into cloud architecture.

Been a pleasure.

We'll see you next time on the Deep Dive.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Cloud computing fundamentally transforms how applications access and utilize computing resources by enabling on-demand provisioning and elastic scaling across distributed infrastructure managed by specialized providers. These providers operate geographically dispersed data centers organized into regions and availability zones, a structural approach that minimizes latency for end users while simultaneously protecting against correlated hardware failures that could compromise entire systems. When clients request resources such as virtual machine instances, a management gateway processes these allocation requests and coordinates with underlying hypervisors to assign appropriate computing capacity based on specified instance types and preferred regions. Architects designing systems for cloud environments must confront the reality that failures are inevitable at scale, requiring defensive design patterns throughout the application stack. Timeout mechanisms serve as a critical detection strategy, identifying unresponsive components, though their configuration demands careful calibration to distinguish genuine failures from temporary network degradation. Long tail latency represents another persistent challenge, where a measurable percentage of requests encounter delays substantially exceeding median response times due to unpredictable network congestion or resource contention, making hedged requests and alternative request patterns valuable mitigation strategies. Horizontal scaling emerges as the preferred approach for handling increased traffic, involving the deployment of multiple service instances managed by a load balancer that distributes requests using algorithms such as round-robin and continuously monitors instance health to maintain service availability. 
The architectural principle of statelessness simplifies scaling operations by externalizing application state to persistent storage systems, though scenarios requiring coordinated state management across distributed nodes may necessitate sophisticated algorithms like Paxos or established frameworks such as Apache ZooKeeper. Autoscaling automation completes this infrastructure model, dynamically adjusting the quantity of running instances in response to monitored metrics like CPU utilization and predefined scaling rules, thereby aligning resource consumption with actual workload demands and optimizing operational expenses.
