Chapter 7: Transactions & ACID Guarantees

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

If you need a shortcut to being well -informed on the critical infrastructure that runs, well, pretty much everything,

you are in the right place.

Today, we are tackling a really complex but absolutely crucial topic,

the mechanics of transaction processing.

We're drawing on some foundational material about data -intensive applications.

Right.

Our goal here is to really cut through that complexity.

We want to give you a high -definition look at the safety guarantees databases are supposed to give you.

We're moving past just saving data.

We're digging into how systems keep data from becoming a complete mess when faced with the harsh reality of failure.

And when you say failure, we're talking about everything, right?

Like application crashes, software bugs, network cuts in the middle of an operation.

And the big one, concurrent clients, multiple people trying to write to the same piece of data at the same time and just stomping on each other's changes.

Right.

For decades,

the transaction has been the go -to abstraction for this.

It lets the application basically pretend a whole class of these really nasty problems just don't exist because the database is meant to handle them.

I always come back to that great quote from the Google Spanner paper.

They said something like, it's better to make programmers deal with performance issues from transactions than to force them to write tons of code to work around the lack of them.

Exactly.

It just shows you how fundamental this is.

Yeah.

A transaction at its core is just a way to group things.

You take several reads and writes, bundle them into one logical unit, and that whole unit either succeeds completely.

We call that a commit.

Or it fails.

Or it completely fails.

The whole thing gets rolled back as if it never happened.

This gets rid of that catastrophic state of partial failure so your application can just safely retry.

Okay.

That sounds like a huge win for handling errors.

So let's unpack the actual promises here.

I feel like everyone's heard the acronym, but maybe doesn't know what it really means.

ACITY.

And this is where it immediately gets slippery.

Ancet atomicity, consistency, isolation, and durability was coined way back in 1983.

A long time ago.

A very long time ago.

And today it's become, frankly, more of a marketing term because one database's idea of isolation can be wildly different and much weaker than another's.

And you sometimes hear the opposite term, base, for systems that aren't ACID.

But that's even vaguer, isn't it?

Oh, much vaguer.

It basically just means the system cares more about being available than being strictly consistent.

Okay.

So let's break down ACID.

Let's start with A for atomicity.

Right.

So a lot of people mix this up with concurrency, but in ACID, atomicity is all about fault tolerance.

It's the guarantee that if a transaction fails halfway through, it gets aborted and all its rights are just discarded.

So it's about being able to hit the undo button.

The key thing is the abort.

It's not about making it look like one operation.

No, that's isolation.

Right.

It's about the guarantee that you can safely abort and retry.

The source material actually suggests abortability would have been a much better word for it.

I completely agree.

It would have saved a lot of confusion.

Okay.

Next up is C for consistency.

And this is probably the most misunderstood letter in the whole acronym.

Why is that?

Because it refers to an application -specific idea,

like an invariant in your data.

For example, in an accounting system, credits must always equal debits.

And the database itself can't know that rule.

That's business logic.

Exactly.

The application is responsible for maintaining that consistency.

The database just provides the tools, the A, I, and D, to help.

If you write a transaction that breaks your own rules, the database will happily commit it.

So the C doesn't really belong in ACID.

Many experts argue it doesn't.

It's a property of the application, not the database.

Fascinating.

Okay.

Let's get to the big one.

ISA for isolation.

This is all about concurrency, right?

This is all about concurrency.

Isolation's job is to hide the chaos.

It makes it seem like transactions are running serially, you know, one after the other, even if they're executing at the exact same moment.

And we can picture why this is needed with that classic counter example.

You have a counter at 42.

Two clients, A and B, both read 42 at the same time.

Client A adds one and writes 43, then client B adds one and also writes 43.

And you've lost an update.

The counter should be 44, but it's stuck at 43 because B's write just overwrote A's.

Isolation is what's supposed to prevent that.

Got it.

And finally, D for durability.

This one's the most straightforward.

It's the promise that once the database says commit successful, that data is safe.

It won't be forgotten, even if the power goes out a second later.

And how do they actually do that?

On a single machine, it means writing to non -volatile storage, like an SSD, and usually writing to a write ahead log before you even confirm the commit.

And with replication.

It means making sure the data has been copied to other machines.

But, you know, we should be clear.

Perfect durability doesn't exist.

You can have weird correlated failures or SSD firmware bugs.

It's always about risk reduction, not total elimination.

Okay.

So we've got the rules to the road.

Now, where do they apply?

When do you actually need a transaction?

Well, the complexity really starts when you need to coordinate changes across more than one piece of data, more than one object.

What about for a single object, like updating a single user profile?

For a single object, like one JSON document, pretty much every modern system gives you atomicity and isolation out of the box.

They'll just lock that one item while it's being changed to prevent partial updates.

Okay.

So if single object writes are safe, why all the complexity around multi -object transactions?

Because so many real world operations require it.

There are three big use cases.

First, in a relational database, keeping foreign keys valid across different tables.

Makes sense.

Second, and this is huge now, is keeping denormalized data in sync.

Can you give an example of that?

Absolutely.

Think of your email inbox.

When a new email comes in, at least two things have to happen.

You have to insert the email itself and you have to increment the little unread counter in the sidebar.

Ah, two separate writes to two separate objects.

Exactly.

And if this system crashes after it saves the email, but before it updates the counter, your data is now inconsistent.

The user will hate that.

And the third reason.

Secondary indexes.

When you update a record, that change needs to be reflected in every index that points to it.

If that process fails halfway through, the record might just vanish from some of your searches.

So the whole philosophy of an ACEID database is really, it would rather fail the entire operation than leave your data in some weird half -finished state.

Yes, exactly.

It's a trade -off.

It chooses integrity over availability in that moment, giving you a clean slate to just retry.

That retry is powerful, but there's a small catch, right, with the network.

There is.

If the transaction commits, but the network fails before you get the okay message, your application might think it failed and retry, potentially creating a duplicate action.

The app still needs to be smart enough to handle that.

Okay, so this brings us to the harsh reality of all this.

Getting that perfect serializable isolation is expensive, performance -wise.

Very expensive.

So most databases, by default, run at weaker isolation levels.

And this is the danger zone.

The source material is very clear about this, saying,

these weaker levels lead to subtle bugs that have caused, and I'm quoting here, substantial loss of money.

Wow.

Okay, so let's start with the most common weak level.

Read committed.

What does that protect you from?

It gives you two basic guarantees.

First,

no dirty reads.

You can't see data that another transaction has written but hasn't committed yet.

This is huge, because if that other transaction aborts, you won't have made a decision based on data that technically never existed.

Okay, that seems essential.

What's the second guarantee?

No dirty writes.

This prevents you from overriding a value that an uncommitted transaction has already written.

It stops that race condition where two users are trying to buy the last car and stock at the same time, and the system gets totally confused.

So how does read committed actually work?

How does it implement this?

For writes, it's pretty simple.

It uses row level locks.

If you want to change a row, you have to acquire a lock on it.

But here's the key trade -off for performance.

For reads, it doesn't use locks.

It doesn't.

So what happens if a reader wants a row that's locked by a writer?

It just reads the old committed value of that row, the one from before the write transaction started.

This is critical, because it means a long -running write won't block dozens or hundreds of readers.

That's a clever compromise, but it must have a downside.

It does.

It leads to an anomaly we call read skew, or sometimes non -repeatable read.

Okay, what's reads to?

Imagine Alice has two bank accounts with $500 each.

She initiates a transfer of $100 from one to the other.

Now, while that's happening, you run a query to check her total wealth.

Your query might read the first account after the $100 has been withdrawn, so it sees $400, but it reads the second account before the money has arrived, so it still sees $500.

It reports her total balance as $900.

Which is wrong.

Temporarily, but still wrong.

Exactly.

It's temporally inconsistent.

Each individual read was of committed data, so read committed allows it.

But for things like backups or analytics,

this kind of skew is unacceptable.

So what's the fix for read skew?

How do you get a truly consistent view?

That brings us to a much stronger level, snapshot isolation.

Okay.

The core idea here is really revolutionary.

Each transaction reads from a complete, consistent snapshot of the database as it existed at the exact moment the transaction began.

And what's the performance principle?

This is the magic part.

Readers never block writers, and writers never block readers.

That sounds amazing for concurrency.

How is that even possible?

The mechanism is called multi -version concurrency control, or MVCC.

MVC.

Instead of overwriting data, the database keeps multiple versions of a row.

Each transaction gets a unique ID, and every version of a row is tagged with the ID of the transaction that created or deleted it.

The database just applies a visibility rule to show your transaction only the data that was committed before your snapshot was taken.

Okay, this is a critical point from the source material.

This specific level, snapshot isolation,

it has different names in different databases, right?

Oh, it's a mess.

And a dangerous one.

Oracle calls its implementation of SI serializable.

PostgreSQL and MySQL call theirs repeatable read.

So developers could think they're getting a much stronger guarantee than they actually are just because the names are misleading.

They're flying blind.

It's a major flaw that comes from the old SQL standard, and it causes real -world problems.

That is a huge takeaway.

Okay, so SI solves read skew.

Let's move on to the more subtle write conflicts.

What about the loss update problem we talked about earlier?

Right, that classic read -modify -write cycle gone wrong.

Two people editing the same wiki page at once.

The last person to hit save wins, and the other person's work is just gone.

So how do we defend against that?

There are a few ways.

The best is to use the database's built -in atomic write operations like update, counterset, value, value plus one.

That's guaranteed to be safe.

And if you can't do that?

You can use explicit locking, telling the database select for update, which manually locks the rows you plan to change.

Or even better, some databases with SI like Postgresql have automatic lost update detection.

You'll spot the conflict at commit time and just abort one of the transactions.

Okay, that covers lost updates, but there's an even sneakier problem called write skew.

This is the most subtle one.

Write skew is when two transactions read some state,

make a decision based on it, and then update different objects in a way that invalidates the original state they both read.

The on -call doctor example is perfect for this.

It is.

So the rule is at least one doctor must be on call.

Alice and Bob are both on call.

Right.

Alice checks the system, sees their two doctors on call, and decides she can go off duty.

At the same time, Bob does the exact same check, sees two doctors, and also decides to go off duty.

They both update their own status, two separate rows.

Exactly.

And because they aren't touching the same data, snapshot isolation sees no conflict.

Both transactions commit.

And you're left with zero doctors on call, violating the business rule.

And that's write skew.

Because the premise they both read, there are two doctors on call,

was invalidated by their own actions, but the database couldn't see the connection.

Precisely.

It's a phantom.

A write in one transaction changes the result of a search query in another.

This happens with meeting room bookings, preventing double spending.

It's a whole class of bugs.

This all leads us, finally, to the promised land.

Serializability, the strongest guarantee, the one that prevents all these race conditions we've talked about.

Right.

If your application logic works correctly when run one query at a time, serializability guarantees it will work correctly when run concurrently.

So how do databases actually achieve this?

What are the techniques?

There are really three main approaches.

The first is, well, the simplest to understand, actual serial execution.

Just run transactions one at a time.

Literally.

On a single thread.

This only became practical recently with fast and memory databases, because for many applications, the active data set can fit entirely in RAM, making single threaded execution surprisingly fast.

But wait, there's a huge catch there with network latency, isn't there?

You can't have the application sending one query waiting, then sending another.

No, you're right.

That would kill throughput.

The only way it works is if you package the entire transaction up as a stored procedure and send it to the database in one go.

That eliminates the back and forth network waits.

Okay, so that's one way.

What's the second?

The old school traditional method,

two phase locking or 2PL.

And this is different from the locks we discussed earlier.

Very different.

2PL is heavy handed.

It's not just that writers block other writers.

In 2PL,

writers block readers and readers block writers.

Whoa.

Yeah.

Concurrency takes a massive hit.

Locks are held until the very end of the transaction.

Performance is often poor, latency is unpredictable, and you get a lot of deadlocks.

Where two transactions are stuck waiting for each other.

Exactly.

The database has to detect that and just kill one of them.

To deal with phantoms, 2PL has to use something called index range locks, which is basically locking a part of an index to prevent new rows from being inserted where you just read.

That sounds painful.

Is there a better way?

There is.

The modern and I think most promising approach is serializable snapshot isolation or SSI.

So it builds on snapshot isolation.

It does.

It's an optimistic approach.

It lets transactions run concurrently on a snapshot, just like regular SI, but then it adds a detection step at the end.

It hopes for the best, but verifies at commit time.

And what is it detecting?

How does it catch something like right skew?

It's clever.

It basically tracks two things.

First, it detects if a transaction reads something from its snapshot that has since been overwritten by a transaction that committed.

So it detects you read stale data.

Second, it tracks if another transaction has written to a range that you previously read from.

If either of those checks fail at commit time, the transaction is aborted.

And the huge benefit is that you get full serializable safety, but you keep the high performance of snapshot isolation because readers and writers aren't actively blocking each other.

It's really the best both worlds.

So to wrap this all up, transactions are this amazing abstraction for simplifying error handling.

They let you turn a whole universe of potential failures into just a simple abort and retry.

And that ACD acronym is your guide, but you have to be so careful because the implementation of I for isolation varies wildly.

Right.

We walk through that whole hierarchy from the basics of read committed, which stops the really dirty stuff.

To snapshot isolation, which gives you consistent reads, but is still vulnerable to things like writes queue.

All the way up to true serializability, which is the only level that protects you from everything.

So when you're designing an application, you have to assume you're on a weaker level unless you explicitly configured and tested it.

You can't just trust the ACD compliant marketing sticker on the box.

A really important warning.

And maybe a final thought to leave you with.

Everything we've talked about today has been focused on single node databases.

The second you try to provide these same transactional guarantees across a distributed system with multiple machines,

you enter a whole new world of incredibly difficult problems.

A perfect cliffhanger.

Thank you for diving deep with us today.

My pleasure.

We'll catch you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Database transactions provide a mechanism for grouping multiple operations into atomic units that either fully succeed or completely fail, eliminating the need for applications to manually handle partial failures from crashes, network interruptions, or hardware faults. The ACID framework characterizes the reliability guarantees that transactions offer: atomicity ensures all-or-nothing execution where changes are either committed entirely or rolled back completely; consistency represents application-defined rules about valid database states, enforced jointly by the database engine and application code; isolation prevents concurrent transactions from interfering with each other's operations; and durability guarantees that committed data survives system failures through persistent storage or replication. However, strict adherence to all ACID properties incurs significant performance costs, prompting most production systems to use weaker isolation levels that protect against certain concurrency problems while permitting others. Read Committed isolation prevents dirty reads and dirty writes by blocking access to uncommitted data, making it the default in many databases despite allowing other anomalies. Snapshot Isolation operates by giving each transaction a consistent view of the entire database at a single point in time, eliminating read skew and nonrepeatable reads while allowing transactions to proceed concurrently without blocking. Multi-Version Concurrency Control implements snapshot isolation by maintaining multiple versions of data objects, enabling readers and writers to access different versions simultaneously without mutual blocking. More subtle concurrency problems demand stronger guarantees: lost updates arise when concurrent read-modify-write sequences overwrite each other's changes, while write skew occurs when a transaction bases its decision on assumptions that another concurrent transaction invalidates, often involving phantom reads where new rows match a prior query. Achieving true serializability, which prevents all race conditions, requires one of three approaches: Actual Serial Execution queues transactions sequentially, practical only for short transactions on partitioned in-memory systems; Two-Phase Locking uses shared and exclusive locks held until commit, introducing latency unpredictability and deadlock risks; or Serializable Snapshot Isolation combines optimistic concurrency control with MVCC-based conflict detection at commit time, offering superior performance and scalability compared to traditional locking approaches.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 7: Transactions & ACID Guarantees

Related Chapters