Chapter 7: Hotel Reservation System Design

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to The Deep Dive.

Today we're getting into the real foundations of global travel.

We're talking about designing a rock-solid hotel reservation system.

Right.

If you've ever booked a room online, you know the system has to be fast.

But I mean, more importantly, it has to be absolutely accurate.

That's right.

So our mission today is to take on this massive challenge: designing a system that can handle, let's say, a hypothetical empire of 5,000 hotels and a million rooms, and just break down the architecture step by step.

Yeah.

Because the real complexity here is solving those classic distributed systems problems.

Yeah.

Concurrency, financial integrity, all of that.

A million rooms.

That is just massive scale.

So let's establish the core functional needs right away.

We need the obvious things.

Pages to show hotels and rooms, a way for staff to manage inventory through an admin panel,

the ability to reserve and cancel.

And this is the big one.

We have to support 10% overbooking.

Overbooking is, well, it's a financial necessity in this industry.

It means we're actively selling more rooms than we physically have.

Right.

Like 110 rooms when you only have 100.

Exactly.

Because you anticipate a certain cancellation rate.

So our system has to manage this kind of artificial buffer, but without ever breaking consistency.

And that's where the tension comes in from the non-functional requirements.

You need high reliability.

You need to handle high concurrency.

You know, a big sporting event ends and suddenly everyone is trying to book rooms at the same time.

You can't have double booking.

Exactly.

But here's the key trade-off.

Unlike, say, a financial trading system, we can accept moderate latency.

Okay.

While instant is great, if the whole transaction (checking inventory, locking the room, processing the payment) takes two, maybe four seconds, the system is still functional.

So reliability and consistency are way more important than sub-millisecond speed.

Far more important.

Okay.

So let's unpack that initial workload.

We're building for a million rooms.

Can you give us the quick math, the back of the envelope estimation for daily traffic?

Sure.

Let's assume about a 70% occupancy rate and, say, an average stay of three days.

When you run those numbers across the million rooms, we estimate we're generating about 240,000 new reservations every single day.

That number sounds huge, but when you spread it out across a whole day, all 86,400 seconds, the average transactions per second, the TPS, for the actual reserve click is only about three.

It's remarkably low.

Yeah.

For a modern web platform.

That's tiny.

It is low.

And that's the core write bottleneck.
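The estimate above can be reproduced as a short calculation. This is a minimal sketch that uses only the figures quoted in the discussion; the constant names are illustrative. The exact quotient is a little under the roughly 240,000 reservations per day quoted above, which appears to be a rounded figure, and either number rounds to about 3 TPS.

```python
# Back-of-the-envelope estimate using the figures quoted in the discussion.
TOTAL_ROOMS = 1_000_000
OCCUPANCY_PERCENT = 70
AVERAGE_STAY_DAYS = 3
SECONDS_PER_DAY = 86_400

# Rooms occupied on a given night; integer math keeps the result exact.
occupied_rooms = TOTAL_ROOMS * OCCUPANCY_PERCENT // 100

# Each reservation covers AVERAGE_STAY_DAYS nights, so about one third of the
# occupied rooms start a new reservation each day.
daily_reservations = occupied_rooms // AVERAGE_STAY_DAYS

# Spread evenly across the day to get the average write rate.
reservations_per_second = daily_reservations / SECONDS_PER_DAY

print(daily_reservations)
print(round(reservations_per_second, 1))
```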

But if we look at the entire customer flow, the picture changes dramatically.

We need to visualize it as a traffic funnel.

Right.

If only three people per second are hitting that final book now button, how many were looking at the page just before that?

Good point.

So if we assume a pretty typical 90% drop-off as users move through the process, then the order booking page, you know, where you put in your credit card, is seeing around 30 queries per second, or QPS.

And the page before that, the view detail page where you're just browsing the room, sees a massive 300 QPS.

So that ratio, 300 reads for every three final writes, tells us immediately that this is an extremely read-heavy system.
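The funnel arithmetic can be sketched by working backward from the final write rate. This assumes, as in the discussion, that only 10% of users continue from each page to the next; the function and constant names are hypothetical.

```python
# Work backward through the funnel from the final "reserve" click.
RESERVE_TPS = 3
CONVERSION_PERCENT = 10  # a 90% drop-off means 10% of users continue


def upstream_qps(downstream_qps: int, conversion_percent: int = CONVERSION_PERCENT) -> int:
    """Traffic the previous page must see to produce downstream_qps."""
    return downstream_qps * 100 // conversion_percent


booking_page_qps = upstream_qps(RESERVE_TPS)      # order booking page
detail_page_qps = upstream_qps(booking_page_qps)  # view detail page

print(booking_page_qps, detail_page_qps)
```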

That's the whole game right there.

And that has huge implications, doesn't it?

It means our scaling strategy has to focus almost entirely on serving cached info quickly, not just optimizing write throughput.

People want those instant reads.

Precisely.

A slow query to pull up a room image.

That's a user abandoning the site.

But a slow transaction on the back end is, while annoying, survivable.

Totally survivable.

Okay.

Let's move to the high level design then.

Starting with the front door, the APIs.

We're using RESTful conventions, right?

Which gives us clear APIs for hotel data, room data, and reservation data.

And you have to separate access right away.

Immediately.

Hotel and room APIs, anything that involves adding or updating inventory, that's for staff only, through internal APIs.

The public facing APIs are for customers.

GET your reservation history, POST a new reservation, and DELETE to cancel.

Okay.

And in that critical POST request for a new booking, we're introducing a concept that's vital for reliability.

The reservation ID as an idempotency key.

Why is that specific ID so essential?

It's the ultimate insurance policy.

It's for network hiccups, for impatient users.

The double clickers.

The double clickers.

If they hit submit twice, or their network times out and they hit it again, the system gets two identical requests.

So we generate a unique reservation ID on the client side before submission, and make it the primary key in the database.

The first request sails through.

The second one hits the database.

And gets rejected.

Right.

Throws a unique constraint violation.

No double charge, no double reservation.

It just fails gracefully.
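A minimal sketch of that idempotency check follows, using Python's built-in sqlite3 module as a stand-in for the production relational database. The table layout, column names, and the create_reservation helper are assumptions made for illustration; only the idea of a client-generated reservation ID used as the primary key comes from the discussion.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reservation (
        reservation_id TEXT PRIMARY KEY,  -- doubles as the idempotency key
        hotel_id       INTEGER NOT NULL,
        room_type_id   INTEGER NOT NULL,
        start_date     TEXT NOT NULL,
        end_date       TEXT NOT NULL
    )
    """
)


def create_reservation(conn, reservation_id, hotel_id, room_type_id, start_date, end_date):
    """Return True if the row was inserted, False if this ID was already used."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO reservation VALUES (?, ?, ?, ?, ?)",
                (reservation_id, hotel_id, room_type_id, start_date, end_date),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: the same request was already processed.
        return False


# The client generates the ID once, before the first submission.
reservation_id = str(uuid.uuid4())
first = create_reservation(conn, reservation_id, 1, 2, "2025-07-01", "2025-07-04")
second = create_reservation(conn, reservation_id, 1, 2, "2025-07-01", "2025-07-04")
row_count = conn.execute("SELECT COUNT(*) FROM reservation").fetchone()[0]
print(first, second, row_count)
```

The retry is turned into a harmless no-op at the database, so the API layer can report the existing reservation instead of surfacing a raw error.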

Okay.

Now let's talk about the foundation.

The database.

Since this is a system dealing with money and inventory, why are we going with a relational database?

It all comes down to non-negotiable transactional integrity.

ACID guarantees.

ACID guarantees.

Atomicity, consistency, isolation, and durability.

We just cannot afford double-booked rooms or double-charged guests.

And ACID is what guarantees that our data stays financially sound, even when things fail.

So no eventual consistency here.

Absolutely not.

Plus the relationships, hotel, room, guest, are stable, clear, and just perfectly modeled by SQL tables.

That makes perfect sense.

But here's the trap, right?

The one that catches new designers.

Oh yeah.

When you first sketch out the schema, the reservation table might have a specific.

Precisely.

And that realization just simplifies the entire inventory problem dramatically.

Okay.

Moving to the architecture.

We're choosing a microservice architecture.

So what are the key services that a customer interacts with?

Well, up at the front, we've got a content delivery network, a CDN,

for all the static stuff like images and HTML.

That ensures blazing fast load times for those 300 QPS hitting the detail pages.

Okay.

Then there's a public API gateway.

It handles things like rate limiting, authentication, and then it routes requests to our internal components.

And which services are doing the heavy lifting?

We have a few core ones.

The hotel service handles all that static info, like addresses and amenities, which is highly cacheable.

The rate service is pretty cool.

It does dynamic pricing based on occupancy or events.

And then?

Then you have the payment service, the hotel management service for staff, and the star of our show today, the reservation service.

That's what handles the booking requests and inventory validation.

Let's do a deep dive into that inventory solution then, because shifting to room type ID requires a totally new way to track what's available.

It does.

And the most crucial table in our entire system is called room type inventory.

This is where all the magic and the math happens.

How is it structured to handle rooms, types, and time?

I mean, that's a lot of variables.

It stores inventory data per room type, per hotel, and per date.

Per date.

Per date.

So its primary key is a composite of three things.

Hotel ID, room type ID, and date.

And the key columns are total inventory, so the physical room count, and total reserved.

So if I book a room for three nights, the reservation service has to query and check inventory across three separate rows in that table.

One for each date.

Absolutely.

The check has to succeed for every single night in the booking range.

The application logic is basically checking if total reserved plus rooms to reserve is less than or equal to total inventory.

And that 10% overbooking requirement?

We implement it right there.

We just change the validation check to 110% of total inventory.

It's simple, it's contained, and it's critical.
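The per-night check described above can be sketched as a small pure function. The InventoryRow type and the function names are hypothetical; the rule itself (total reserved plus rooms to reserve must not exceed 110% of total inventory, for every night of the stay) is the one from the discussion. Integer math is used to avoid floating-point rounding at the limit.

```python
from dataclasses import dataclass
from datetime import date, timedelta

OVERBOOKING_PERCENT = 110  # 10% overbooking allowance


@dataclass
class InventoryRow:
    total_inventory: int  # physical rooms of this type at this hotel
    total_reserved: int   # rooms already reserved for this date


def stay_dates(check_in: date, check_out: date) -> list[date]:
    """Nights covered by a stay; the check-out day itself is not a night."""
    nights = (check_out - check_in).days
    return [check_in + timedelta(days=i) for i in range(nights)]


def can_reserve(rows: dict[date, InventoryRow], check_in: date, check_out: date,
                rooms_to_reserve: int) -> bool:
    """True only if every night in the range stays within the overbooking limit."""
    for night in stay_dates(check_in, check_out):
        row = rows.get(night)
        if row is None:
            return False  # no inventory row for this date
        if (row.total_reserved + rooms_to_reserve) * 100 > row.total_inventory * OVERBOOKING_PERCENT:
            return False
    return True


rows = {
    date(2025, 7, 1): InventoryRow(total_inventory=100, total_reserved=109),
    date(2025, 7, 2): InventoryRow(total_inventory=100, total_reserved=100),
    date(2025, 7, 3): InventoryRow(total_inventory=100, total_reserved=110),
}
two_nights_ok = can_reserve(rows, date(2025, 7, 1), date(2025, 7, 3), 1)
three_nights_ok = can_reserve(rows, date(2025, 7, 1), date(2025, 7, 4), 1)
print(two_nights_ok, three_nights_ok)
```

The three-night request fails because the third night is already at the 110-room limit, even though the first two nights have space.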

Okay, let's talk scale.

Even with this optimized schema, you've got 5,000 hotels, maybe two years of inventory.

We're looking at around 73 million rows in this one table.

If that's too much for one database, how do we shard it?

Sharding is essential for scaling writes here.

And since almost all of our queries filter by hotel ID, because you're always booking a room at a specific hotel, our strategy is to shard by the hash of the hotel ID, modulo the number of servers.

Which keeps all the data for a single hotel on a single database server.

Exactly.
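Both the row estimate and the shard routing can be sketched in a few lines. Note two assumptions that are not stated in the discussion: the figure of 20 room types per hotel is chosen because it reproduces the 73 million rows quoted above, and SHA-256 is used only as an example of a hash that is stable across processes. The function name is hypothetical.

```python
import hashlib

HOTELS = 5_000
ROOM_TYPES_PER_HOTEL = 20  # assumption: not stated in the discussion
DAYS_TRACKED = 2 * 365     # two years of inventory

# One row per hotel, per room type, per date.
estimated_rows = HOTELS * ROOM_TYPES_PER_HOTEL * DAYS_TRACKED


def shard_for_hotel(hotel_id: int, num_servers: int) -> int:
    """Route a hotel to a shard: hash(hotel_id) % number_of_servers."""
    digest = hashlib.sha256(str(hotel_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


print(estimated_rows)
print(shard_for_hotel(42, 16))
```

Because the shard depends only on the hotel ID, every inventory row and reservation for one hotel lands on the same server, so a booking never has to span shards.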

But doesn't that make it really hard to run global reports?

What if management wants to see the total occupancy across all of Europe?

That now requires a scatter-gather query across dozens of shards.

That's gotta be slow.

That's a great question, and it really highlights a classic trade-off.

We are prioritizing the performance of the core user workflow, the actual reservation, over the complexity of analytical queries.

So you optimize for the customer, not the business analyst.

In this case, yes.

Global reports are usually run as batch jobs anyway.

You can handle them with separate, slower data warehousing systems that pull the data overnight.

For this system, the speed and availability of the booking transaction wins every time.

Excellent.

Okay, now we hit the biggest wall.

Concurrency.

We've solved the single-user double-click with the idempotent reservation ID.

But what about the race condition?

Multiple users trying to book the very last room at the same time.

This is the nastiest problem.

And it happens because of database isolation.

So imagine we have 100 rooms, 99 are reserved.

User 1 checks the inventory, sees one room left.

At the exact same microsecond, user 2 checks, also sees one room left.

Because their transactions are isolated, they can't see each other.

Right.

They both proceed, thinking they got it.

They both try to update total reserved to 100, and both of their transactions commit successfully.

And now you have 101 reservations for 100 rooms.

System integrity violated.

Bingo.

So we need locking to prevent this.

The sources suggest three main strategies, each with some heavy trade-offs.

Let's run through them.

First up is pessimistic locking, using commands like SELECT ... FOR UPDATE.

This is ultra reliable because the moment user 1 reads the row, they lock it.

User 2 is forced to wait until user 1 is done.

The downside.

It's not scalable.

It's prone to deadlocks.

You know, two transactions waiting on each other forever.

And it just kills performance.

We would not recommend this for a high-concurrency website.

Okay, strike 1.

What about optimistic locking with a version number?

So, optimistic locking assumes conflicts are rare.

A user reads the data and its current version number, makes a change, and when they write it back, they check if the version is still the same.

And if it's changed, the transaction aborts.

Right.

And the user has to retry.

It's much faster when contention is low.

But when contention is high, like 100 people fighting for that last room during a flash sale, almost everyone fails.

Which is a horrible user experience, just getting stuck in a retry loop.

A terrible experience, constantly saying, sorry, that room is gone.
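The version-number approach can be sketched with sqlite3 standing in for the real database. The table layout, the stay_date column name, and the helper functions are assumptions for illustration, and the overbooking allowance is left out to match the 100-room example above. The essential part is the UPDATE that only succeeds if the version is still the one that was read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE room_type_inventory (
        hotel_id        INTEGER,
        room_type_id    INTEGER,
        stay_date       TEXT,
        total_inventory INTEGER NOT NULL,
        total_reserved  INTEGER NOT NULL,
        version         INTEGER NOT NULL,
        PRIMARY KEY (hotel_id, room_type_id, stay_date)
    )
    """
)
conn.execute("INSERT INTO room_type_inventory VALUES (1, 1, '2025-07-01', 100, 99, 1)")
conn.commit()

KEY = (1, 1, "2025-07-01")
WHERE = "hotel_id = ? AND room_type_id = ? AND stay_date = ?"


def read_row(conn, key):
    """Return (total_inventory, total_reserved, version) for one inventory row."""
    return conn.execute(
        f"SELECT total_inventory, total_reserved, version FROM room_type_inventory WHERE {WHERE}",
        key,
    ).fetchone()


def try_reserve(conn, key, snapshot, rooms=1):
    """Write back only if nobody changed the row since the snapshot was read."""
    total_inventory, total_reserved, version = snapshot
    if total_reserved + rooms > total_inventory:
        return False
    with conn:
        cur = conn.execute(
            f"UPDATE room_type_inventory SET total_reserved = ?, version = version + 1 "
            f"WHERE {WHERE} AND version = ?",
            (total_reserved + rooms, *key, version),
        )
    return cur.rowcount == 1  # 0 rows updated means the version was stale


# Both users read the same snapshot: 99 reserved, version 1.
snapshot_user1 = read_row(conn, KEY)
snapshot_user2 = read_row(conn, KEY)
user1_ok = try_reserve(conn, KEY, snapshot_user1)
user2_ok = try_reserve(conn, KEY, snapshot_user2)  # stale version, must retry
final_reserved = read_row(conn, KEY)[1]
print(user1_ok, user2_ok, final_reserved)
```

On a failed write the caller would re-read the row and retry, which is exactly the retry loop that becomes painful under high contention.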

Which brings us to option 3, just using simple database constraints.

This is often the most pragmatic solution, especially for a system with a low write TPS like ours.

We just add a constraint to the inventory table.

A CHECK constraint: total inventory minus total reserved must be greater than or equal to zero.

So simple.

Very simple.

If user 1 commits and updates the count to 100, user 2's later attempt to set it to 101 violates that constraint, and the database just instantly rolls it back.

So it guarantees integrity without any explicit locking.
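The constraint approach can be demonstrated with sqlite3, which enforces CHECK constraints on updates. The table layout and the reserve_one helper are assumptions for illustration; the constraint expression is the one described above. The update is written as a relative increment so the database evaluates the constraint against the current committed value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE room_type_inventory (
        hotel_id        INTEGER,
        room_type_id    INTEGER,
        stay_date       TEXT,
        total_inventory INTEGER NOT NULL,
        total_reserved  INTEGER NOT NULL,
        PRIMARY KEY (hotel_id, room_type_id, stay_date),
        CHECK (total_inventory - total_reserved >= 0)
    )
    """
)
conn.execute("INSERT INTO room_type_inventory VALUES (1, 1, '2025-07-01', 100, 99)")
conn.commit()


def reserve_one(conn):
    """Try to reserve one room; the database rejects the write if it would oversell."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE room_type_inventory SET total_reserved = total_reserved + 1 "
                "WHERE hotel_id = 1 AND room_type_id = 1 AND stay_date = '2025-07-01'"
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed


user1_ok = reserve_one(conn)  # 99 -> 100 is allowed
user2_ok = reserve_one(conn)  # 100 -> 101 violates the constraint
total_reserved = conn.execute(
    "SELECT total_reserved FROM room_type_inventory"
).fetchone()[0]
print(user1_ok, user2_ok, total_reserved)
```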

But the user experience flaw is still there, isn't it?

The user sees a room, clicks book, and still gets an error.

Correct.

But given our low average TPS of 3 for that final reservation write, the risk of high contention is pretty minimal most of the time.

The operational simplicity and the absolute guarantee of data integrity often outweigh that occasional poor user experience.

Let's talk about those 300 QPS read queries.

We definitely need caching for that.

How does the inventory cache, maybe on Redis, fit into all this?

The inventory cache is there to handle that immense read volume.

The cache key is structured around the query.

Hotel ID, room type ID, date.

So most read queries hit the cache first and get an immediate response.

Okay, but what's the workflow for updates?

This is crucial.

Updates must hit the relational database first.

It is always the definitive source of truth.

And how does that update get back to the cache quickly?

Asynchronously.

We use a process like Change Data Capture, or CDC.

CDC is basically listening to the database's commit logs.

Anytime a reservation changes the inventory count, that change is immediately packaged up and pushed as an event to the Redis cache.

But there might be a small delay.

A small delay, yes.

A brief inconsistency is acceptable because, and this is the important part, the final definitive inventory validation must still happen at the database level before we commit the transaction.

That ensures durability and consistency right where it matters most.

The money transaction.

Exactly.
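The read and update paths can be sketched with in-memory stand-ins. This is a toy model, not a Redis or CDC integration: a dict plays the relational database, another dict plays the cache, and a deque plays the change stream. The underscore-separated key format and all function names are assumptions.

```python
from collections import deque

database = {}         # stands in for the relational database (source of truth)
cache = {}            # stands in for the Redis inventory cache
change_log = deque()  # stands in for the CDC stream of committed changes


def cache_key(hotel_id, room_type_id, day):
    """Key built from hotel ID, room type ID, and date (separator is an assumption)."""
    return f"{hotel_id}_{room_type_id}_{day}"


def write_reserved(hotel_id, room_type_id, day, total_reserved):
    """Updates go to the database first; the change is published afterward."""
    key = cache_key(hotel_id, room_type_id, day)
    database[key] = total_reserved
    change_log.append((key, total_reserved))


def apply_pending_changes():
    """What the CDC consumer does asynchronously: push changes into the cache."""
    while change_log:
        key, value = change_log.popleft()
        cache[key] = value


def read_reserved(hotel_id, room_type_id, day):
    """Reads hit the cache first and fall back to the database on a miss."""
    key = cache_key(hotel_id, room_type_id, day)
    if key in cache:
        return cache[key]
    return database.get(key)


write_reserved(1, 2, "2025-07-01", 99)
apply_pending_changes()
write_reserved(1, 2, "2025-07-01", 100)
stale_read = read_reserved(1, 2, "2025-07-01")  # cache has not caught up yet
apply_pending_changes()
fresh_read = read_reserved(1, 2, "2025-07-01")
print(stale_read, fresh_read)
```

The stale read is the brief inconsistency mentioned above; it is tolerable only because the booking itself is still validated against the database.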

Okay, finally.

We chose microservice architecture, but we kept the reservation service and the inventory data together in the same relational database.

Why sacrifice that microservice purity of separated databases?

It's pragmatism.

For this core high integrity workflow, the complexity of ensuring transactional consistency across separate service databases was just too high for the payoff.

You mean if we had separated them, we'd have to deal with massive cross-service consistency problems.

Huge problems.

You'd be looking at complex patterns like two-phase commit or the saga pattern.

Can you quickly define those first?

Sure.

Two-phase commit, or 2PC, is a protocol that forces all databases involved to agree to commit or roll back together.

It's blocking and not very performant.

If one part fails, the whole thing stalls.

And saga.

Saga is much looser.

It's eventually consistent.

It's a chain of local transactions.

If, say, step four fails, you have to run a really complex set of compensating transactions to undo everything that came before.

Like issuing a refund, deleting the record.

Yeah, and it is notoriously difficult to get right.

By just using a shared relational database for this one core function, we leverage the built-in ACID guarantees and just avoid all that complexity.
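What the shared database buys can be shown with a single local transaction, again using sqlite3 as a stand-in. The schema and the book helper are assumptions for illustration. The reservation row and the inventory update either commit together or roll back together, so no compensating step is needed when the inventory check fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE room_type_inventory (
        hotel_id        INTEGER,
        room_type_id    INTEGER,
        stay_date       TEXT,
        total_inventory INTEGER NOT NULL,
        total_reserved  INTEGER NOT NULL,
        PRIMARY KEY (hotel_id, room_type_id, stay_date),
        CHECK (total_inventory - total_reserved >= 0)
    );
    CREATE TABLE reservation (
        reservation_id TEXT PRIMARY KEY,
        hotel_id       INTEGER NOT NULL,
        room_type_id   INTEGER NOT NULL,
        stay_date      TEXT NOT NULL
    );
    INSERT INTO room_type_inventory VALUES (1, 1, '2025-07-01', 100, 99);
    """
)
conn.commit()


def book(conn, reservation_id):
    """Insert the reservation and bump inventory in one atomic transaction."""
    try:
        with conn:  # both statements commit together or roll back together
            conn.execute(
                "INSERT INTO reservation VALUES (?, 1, 1, '2025-07-01')",
                (reservation_id,),
            )
            conn.execute(
                "UPDATE room_type_inventory SET total_reserved = total_reserved + 1 "
                "WHERE hotel_id = 1 AND room_type_id = 1 AND stay_date = '2025-07-01'"
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_booked = book(conn, "res-001")   # takes the last room
second_booked = book(conn, "res-002")  # inventory update fails, insert is undone
reservation_rows = conn.execute("SELECT COUNT(*) FROM reservation").fetchone()[0]
print(first_booked, second_booked, reservation_rows)
```

With separate service databases, the failed second booking would leave a reservation row behind that a saga would have to delete explicitly.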

That is a masterclass in pragmatic architecture.

We've covered a lot today.

Choosing the relational DB for ACID, sharding by hotel ID, designing that crucial room type inventory table to handle room types and overbooking.

And solving concurrency with idempotent APIs and database constraints.

The key takeaway, really, is knowing where to draw your integrity boundary and being willing to trade off a little performance for guaranteed reliability.

We focused entirely on a hotel chain that books rooms by type and allows for that 10% overbooking.

Now, here's a provocative thought for you to take on the road.

How might the scaling strategy and the concurrency trade-offs change if we were designing a system like a movie ticket booking platform where the inventory, the specific seats, are truly unique physical objects that cannot be overbooked under any circumstance?

That changes everything.

It changes whether you're managing physical IDs or aggregated counts.

And that decision just shapes the entire system.

Indeed.

We'll leave you to ponder the difference between fungible and specific inventory.

Thank you for joining us on this deep dive into system architecture.

Congratulations on getting this far.

Now give yourself a pat on the back.

Good job.

This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers
Designing a hotel reservation system for a global chain with 5,000 properties and one million rooms presents significant architectural and concurrency challenges that exemplify real-world system design tradeoffs. The foundational scope establishes modest average transaction throughput of 3 reservations per second, though peak booking periods demand support for substantial concurrent traffic. A microservice architecture distributes responsibility across specialized services including hotel information, pricing, reservation processing, payment handling, and administrative operations, all coordinated through API gateways that manage external and internal communication. The critical decision to adopt a relational database stems from both the read-dominant access pattern and the absolute requirement for ACID guarantees to prevent catastrophic failures like duplicate reservations or double charging. Schema design diverges from the intuitive approach of tracking individual rooms, instead modeling inventory at the room type level with date-specific availability counters, reflecting how the business actually operates. Preventing accidental duplicate reservations relies on idempotent API design enforced through unique constraint mechanisms on request identifiers. The more subtle problem of concurrent bookings for limited inventory requires careful evaluation of locking strategies: pessimistic locking creates unacceptable deadlock risks and bottlenecks at scale, whereas optimistic locking using version numbers or database constraints provide practical solutions when query volumes remain manageable. Horizontal scaling through database sharding partitioned by hotel identifier enables the system to grow with the expanding property portfolio. Caching layers using Redis accelerate frequent read operations against inventory data while preserving the relational database as the authoritative source of truth. 
The architecture ultimately rejects complex distributed transaction patterns like sagas and two-phase commit in favor of a pragmatic hybrid approach: keeping inventory and reservation data within a single transactional database to exploit ACID properties where consistency matters most, while allowing acceptable eventual consistency elsewhere in the microservice ecosystem. This design reflects mature engineering judgment about where strict correctness is essential versus where eventual consistency suffices.
