Chapter 3: Google Maps System Design

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

Today we are taking on a system design challenge that, on the surface, sounds deceptively simple.

Build a simplified version of Google Maps.

Deceptively is the key word there.

Right, because there is absolutely nothing simple about supporting 1 billion daily active users or DAU.

And that scale, you know, that is the entire point of this deep dive.

Every single design decision from data storage to the network protocol, it's all bent toward handling that immense volume.

So what's our mission exactly?

What are the core features?

We have to deliver three things and deliver them well.

First, accurate constant location updates.

Second, fast navigation with reliable ETAs.

And third.

And third, map rendering that is just buttery smooth on any mobile device.

And it's those mobile constraints that make this particularly brutal, I think.

We're not just optimizing for server throughput.

No, not at all.

We are, and have to be, obsessed with minimizing data usage and battery drain for the user.

And that's a fundamentally different design philosophy than, say, building a traditional web service.

Precisely.

We are managing petabytes of static road data, constantly incorporating real -time traffic, and doing it all with millisecond latency,

all while keeping the client happy.

OK, let's unpack this.

Where do we even begin?

I think we have to start with the core concepts that allow us to scale location data across the entire globe.

Yeah, that's the right place to start.

And since our audience knows the basics, like latitude and longitude, we can jump straight into how we manage those coordinates efficiently.

We need a system to turn a location into something a computer can search fast.

And this is where it gets really interesting for efficiency.

Geo hashing.

This is absolutely key to scaling.

Geo hashing.

Geo hashing is essentially an encoding scheme.

It takes the 3D Earth, flattens it, and then recursively divides that space into smaller and smaller rectangular grids.

OK.

Each of those little sections is represented by a unique string of letters and digits.

The longer the string, the more precise the location.

So if I want to find everything within a specific city block, I only need to search for locations that share the first, what, four or five characters of that Geo hash string?

Exactly.

Instead of doing complex, heavy radius calculations using raw latitude and longitude?

You got it.

It organizes the entire world into these clean, searchable buckets.

And that concept, organizing space into fixed, manageable units, that leads directly to our first huge architectural problem, right?

Map rendering.

Yes.

Map rendering via tiling.

You can't just send one giant image of the world map.

So we break the world up, the entire map becomes a mosaic of these smaller, fixed -size image files.

What are they?

Usually 256 by 250 pixel images?

That's the standard, yeah.

Those are known as tiles.

The client only downloads the specific tiles it needs to display the current view and then just stitches them together.

And critically, the tiles change based on zoom level.

At the lowest zoom, the whole world might just be one tile, or maybe four.

As you zoom in, the number of tiles needed just explodes geometrically.

Each zoom level increases the total tile count by a factor of four.

Which is essential for saving bandwidth.

Absolutely essential.

Now, we have to apply this same tiling concept to the road network itself.

A road network is just a classic, massive graph problem, right?

It is.

Intersections are nodes, roads are edges.

And trying to run a path -finding algorithm like A or Dijkstra's on the entire global road graph at once?

Impossible.

It's just impossible.

It would choke your memory and the latency would be unusable.

So what's the solution?

So we introduce routing tiles that are very similar to map tiles.

They cover chunks of geography, but instead of contain images.

They contain data.

Right.

They contain binary files of the specific road graph data.

The nodes, the edges, the travel costs, just for that little area.

Okay, but how do we then move across continents without loading half the planet's road data?

That requires hierarchical routing tiles.

We create layers of tiles based on road importance.

Layers, I see.

So you have detailed tiles for local roads, then medium -sized tiles for major arterial roads, and then the largest tiles which contain only highway and major freeway data.

So when an algorithm calculates a long trip,

it can jump up to that highway layer for 90 % of the trip.

Exactly.

It drastically reduces the number of nodes it has to traverse before it drops down to the local road tiles near the destination.

That seems like a fundamental optimization.

It must just slash the processing time.

It does.

Before we get to the architecture, let's just try to grasp the scale we're talking about here.

Let's do some quick numbers.

Okay.

Storage first.

If you look at the highest zoom level, zoom 21, you're looking at about 4 .4 trillion tiles.

Trillion.

Now, even when you account for the fact that maybe 80 to 90 % of the planet is water or uninhabited and can be compressed away.

You're still left with a huge number.

You're still at around 50 petabytes, key B, for that single zoom level.

And since we need all 21 zoom levels, the total static map tile storage rounds up to, what about 100 pb?

Roughly, yeah.

When you factor in the decreasing size at the lower zoom levels.

Plus, the road graph data on top of that, which is terabytes more, it's a massive asset store.

Wow.

Okay, now throughput.

One billion daily active users.

If they're navigating for an average of,

say, 35 minutes a week, and we naively track their GPS every single second.

Oh.

We're looking at three million queries per second, QPS,

every single day.

That's an impossible burden for any location service.

It's completely unmanageable.

But that mobile constraint, that becomes our best friend here.

How so?

We implement a fundamental optimization, client side batching.

Client side batching.

Instead of sending an update every second, the client just buffers those updates, let's say for 15 seconds, and sends them all as one batch.

And that single change reduces our average required QPS for location updates from three million down to?

Down to 200 ,000.

That's an order of magnitude.

It is.

And even at peak usage, we're probably capping out around one million QPS.

We have to design our database around that one million QPS right peak.

Okay, so that sets the stage.

Let's move to step two, the high level architecture.

We have to separate these three functions, location, navigation, and map serving.

Absolutely.

They have to be different services.

So the mobile user hits a load balancer.

It directs location updates to the location service and route requests to the navigation service.

And the map tiles.

The static map tiles, they bypass the origin servers entirely.

They're served by a dedicated content delivery network, a CDN.

Got it.

Okay, let's focus on that location service first.

With one million QPS of bashed writes, we need speed and just massive scalability.

What database are we choosing and why?

We have to choose a database that's engineered for high write throughput, something like Cassandra.

Cassandra.

Yeah.

Since location data is continuously changing and frankly transient, the location I had 15 seconds ago is already stale.

We prioritize availability over strict consistency.

That's the CAP theorem trade off.

Right.

So if Cassandra temporarily misses a single location update during, say, a network split, it's not catastrophic.

Not at all.

A new update is coming in 15 seconds anyway, and the system just keeps functioning.

That's why prioritizing availability is the correct choice here.

And how do we store it?

We store the data using the user ID and a timestamp as the key.

And the value of collecting all this data is huge.

It feeds our machine learning.

It lets us monitor live traffic.

And it even helps detect new or closed roads automatically.

Precisely.

Okay.

Let's talk about the navigation service.

This is the heavy computation center responsible for finding a reasonably fast route.

Correct.

The latency here is a bit more tolerable because the user is just waiting for that initial route.

So it takes the origin and destination.

Yep.

Loads the relevant routing tiles from object storage, runs our hierarchical pathfinding algorithm, and returns the whole package.

Distance, duration, polyline coordinates, and all the turn -by -turn instructions.

All right.

And now for the 100 petabyte challenge.

Map rendering.

We already said dynamic generation is a non -starter.

We have to serve static, pre -generated tiles.

But how do we handle the sheer volume of requests for them?

This is where the CDN is an absolute hero.

Because static files are perfectly cacheable.

Perfectly cacheable.

We store the 100 PD of map tiles in cloud object storage, like S3.

But they are served almost entirely by the content delivery network.

The CDN pulls the tile from the point of presence closest to the user.

And this reduces latency from, what, maybe 300 milliseconds if you had to hit our origin server?

Down to maybe 10 milliseconds or less?

It's critical for that smooth user experience.

But there's a fascinating trade -off here, isn't there, about how the client finds the tile in the first place?

There is.

Do we calculate the URL on the client or on the server?

So option one, client -side geohash calculation.

It's maximally efficient, super fast.

The mobile app has the algorithm baked right in, calculates the geohash from its current lat -link zoom, and just constructs the URL instantly.

It's high efficiency, but it gives you zero operational flexibility.

What do you mean?

Well, if we ever need to change the tile encoding scheme, or maybe correct a little error in the geohash logic, we have to force an app update on a billion users.

Across dozens of device platforms, oh, that's a logistical nightmare.

It's a huge nightmare.

So the alternative is the map tile service.

An intermediary service.

Exactly.

The client just sends its viewport coordinates to this service, and the service calculates the geohashes and returns the exact URLs for the requested tile, plus the eight neighboring tiles for seamless panning.

That adds a little latency.

A negligible amount, maybe a few milliseconds.

But it gives us complete control.

We can change the encoding or optimize the tile -serving logic on the server -side instantly without forcing app updates.

And for a massive, long -lived system, that operational flexibility is often the winning trade -off.

Almost always.

Okay, let's move to step three, designing for real -time adaptability.

We have our data living in four places, routing tiles in object storage.

User location data in Cassandra, geocoding data in Redis for fast lookups.

And the pre -computed images on the CDN.

Right, and that high -volume batched location data in Cassandra is just too important to sit there.

So we stream it.

We stream it.

Through a high -throughput message queue like Kafka,

this single stream acts as a central nervous system, feeding data simultaneously to our live traffic update service,

our ML personalization models.

And even a service that updates the underlying road graph if it detects a new road has opened or closed.

Exactly.

It's a closed -loop system.

I love that.

We're constantly updating the map based on how people are actually using it.

Okay, back to map rendering.

There's a huge modern optimization that really impacts mobile bandwidth.

Vector tiles.

This is a huge shift.

Instead of serving rasterized PNG images, which are just static pixels.

We serve vector data.

Yes.

We're sending the mathematical descriptions of the roads, the rivers, the polygons.

Which means what for the user?

It means substantial bandwidth savings because vector data compresses far better than raw images.

But maybe more importantly for the user experience, the client is now responsible for drawing the map.

Ah.

So you get that really smooth, non -pixelated zoom experience where labels rotate cleanly and details render sharply no matter what.

That's it.

It's a vastly better experience.

Okay.

Let's look at the ETA service.

Once we have the shortest path, we need the time.

How do we accurately predict travel time, especially in rush hour?

The ETA service relies heavily on machine learning.

It combines historical travel times like this road segment takes 10 minutes on a Tuesday at 5 p .m.

with the current live traffic data we're pulling from that Kafka stream.

But the real challenge isn't just current traffic, is it?

No.

The critical challenge here isn't calculating distance.

It's predicting what traffic will be like 10 or 20 minutes into the future.

It's anticipating blockages or clear paths before the car even gets there.

That prediction element moves this from a simple calculation to a really complex forecasting model.

Which brings us to the final and maybe most complex scaling problem, adaptive ETA and real -time rerouting.

If a massive accident happens on the freeway,

how do we track millions of actively navigating users and notify only the affected ones efficiently?

If you took the naive approach, checking every single one of the thousands of routing tiles that cover the path of every single navigating user,

the computational load would be just overwhelming.

We need a way to filter and filter instantly.

So we use hierarchical tracking again?

Yes.

When a user starts navigating, instead of storing every tiny detail of their route, we store the user's location relative to a hierarchy of tiles that encompasses their destination.

Can you give us an analogy for that?

Sure.

Think about it this way.

If a traffic incident occurs, we assign it a specific geohash, say, for a single city block instead of checking if that block is on every user's route.

You check the big picture first.

We check if that block's geohash falls outside the largest, highest level routing tile we have stored for the user's entire destination district.

If it's outside that big boundary, the user is instantly filtered out.

We don't waste computation time.

That drastically reduces the number of users we have to analyze in detail.

Drastically.

It's brilliant.

It prevents checking a thousand unnecessary routes for every single incident.

And finally, to deliver that real -time rerouting alert to the user's phone, we need a server -to -client push protocol.

Right.

We can rule out standard mobile push notifications.

The payload is too limited.

We can rule out long polling because it's just too heavy on server resources.

So the superior choice is?

The superior choice for this kind of bi -directional communication, something lightweight and low -latency, is WebSocket.

Because it maintains a persistent connection.

It does.

It's perfect for constantly updating the user about traffic changes and even for future features like last -mile delivery tracking.

So the final system is really a testament to prioritizing mobile efficiency, blending static storage and caching with this continuous real -time data pipeline.

The essential knowledge nuggets here are foundational for any massive system design.

The necessity of tiling to break up petabytes of data, both map images and the road graph into manageable units.

Geo hashing for the fast indexing.

Right.

The conscious decision to use Cassandra for location data, prioritizing availability over consistency.

And the huge efficiency gains from client -side batching and vector tiles.

Exactly.

So what does this all mean?

The simple act of you opening a map application is a masterclass in compromise.

It requires managing 100 petabytes of static assets while simultaneously running real -time prediction models, all designed to respect your mobile battery and bandwidth.

The entire system is built around handling the present, but as we've discussed, the Thank you for joining us on this deep dive.

Let's leave you with one provocative thought.

If we can successfully predict traffic 20 minutes into the future,

what other real -world dynamics, weather patterns, parking availability, or even crowd density could integrate into this system to deliver a truly predictive, proactive navigation experience?

Something to ponder until next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Building a mapping platform capable of serving one billion daily active users requires architecting interconnected services that handle location tracking, route computation, and visual rendering at massive scale. The foundational challenge involves converting raw geographic data spanning terabytes into usable formats through map projection, which translates three-dimensional global coordinates onto two-dimensional planes, and geohashing, a recursive subdivision technique that partitions geographic space into grids for efficient spatial indexing. These foundational concepts enable the system to organize petabytes of precomputed map tiles across 21 zoom levels into a manageable structure. The architecture separates concerns into specialized services: a Location Service that captures user movements, a Navigation Service that calculates optimal routes, and a Rendering Service that delivers visual map data. The Location Service employs write-optimized NoSQL databases like Cassandra to absorb millions of position updates per second, reducing peak traffic through client-side batching that transmits updates at intervals rather than continuously. Location data flows through Kafka message queues to downstream consumers that enhance traffic visibility, refine road datasets, and personalize user experiences. Map rendering leverages Content Delivery Networks to serve static, pre-generated tiles with URLs computed by clients using geohashing, while vector tiles provide compression advantages over traditional rasterized formats. The Navigation Service applies pathfinding algorithms, particularly A variants, against hierarchical routing tile structures stored in distributed object storage, enabling rapid route generation. Real-time travel time prediction derives from machine learning models trained on current and historical traffic patterns, feeding adaptive ETA calculations that respond to changing conditions. Supporting dynamic rerouting and live traffic updates requires bidirectional communication; the system employs WebSocket connections for their efficiency and low server overhead compared to traditional polling mechanisms. This multi-layered design balances consistency requirements with the inherent latency and scale challenges of geographic information systems operating at planetary scope.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 3: Google Maps System Design

Related Chapters