Chapter 9: Main Memory: Paging, Allocation, and Swapping

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Have you ever stopped to think about what's actually happening inside your computer, like every single time you click something or open an app?

It's pretty amazing, isn't it?

We totally take for granted how quick it all seems.

Yeah, managing all those tasks at once.

But underneath, there's this incredibly complex system managing the computer's main workspace, its main memory.

It's surprisingly intricate.

It really is.

I mean, every instruction, all the data, it has to be exactly where the CPU expects it right when it needs it.

And the big challenge, CPUs are fast, nanoseconds fast.

But getting stuff from memory takes way longer.

Many, many cycles longer.

So to keep things running smoothly, the operating system has to juggle, you know, maybe dozens or even hundreds of processes in memory all at the same time.

And that juggling act, that's where the really interesting management problems come in, which is exactly what we're diving into today.

Main memory and how operating systems handle it.

We're using Operating System Concepts by Silberschatz, Galvin, and Gagne as our guide.

Right.

Our goal here is to unpack the big ideas behind memory management.

We'll go from the absolute basics right up to the more advanced stuff you see today.

So you can understand how your phone, your laptop, even those massive cloud servers, how they all manage memory to keep running smoothly.

Exactly.

We're hoping for some of those aha moments for you, the kind of insights that make you feel like you really get what's going on under the hood.

OK, let's jump in.

At the most basic level, how does a computer even like see its memory?

What does it look like to the hardware?

Well, you can think of main memory as just this huge continuous list or array of bytes.

Each single byte has its own unique number, its address.

Like houses on a very, very long street.

Pretty much, yeah.

And the CPU, when it's running a program, it just deals with a stream of these addresses.

It needs an instruction from address X or data from address Y.

It doesn't inherently know what's at those addresses, just where they are.

OK, so it's just one giant numbered block.

But wait, if it's all just one big space, what stops, say, my web browser from accidentally reading data from my password manager or, even worse, messing with the operating system itself?

Yeah, that would be total chaos.

And that's why you absolutely need basic hardware protection.

User programs cannot just be allowed to access any memory location they feel like.

Makes sense.

So how does that work?

One of the fundamental ways is using two special hardware registers.

They're called the base register and the limit register.

OK, so they define the boundaries, like a fence.

Exactly, like a digital fence.

The base register holds the starting physical address that a process is allowed to use, its lowest legal address.

And the limit register specifies the size of the allowed range, how many bytes it can use from that starting point.

So anytime the CPU running a user program generates an address, the hardware automatically checks it.

Is this address at least the base address?

And is it less than the base address plus the limit?

If it fails either check, if it's below the base, or greater than or equal to base plus limit, then bam, hardware trap.

A trap.

What's that?

It's an immediate hardware level interrupt that transfers control straight to the operating system.

The OS then usually says, nope, illegal access and terminates the offending program.

Protects everyone else.
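
The base and limit check described above can be sketched in a few lines of Python. This is only an illustration of the comparison the hardware performs: the `MemoryTrap` exception, the `check_access` function, and the base and limit values are all hypothetical names and numbers chosen for the example, not anything from the textbook.

```python
class MemoryTrap(Exception):
    """Stands in for the hardware trap that hands control to the OS."""


def check_access(address: int, base: int, limit: int) -> int:
    """Return the address if base <= address < base + limit, else trap."""
    if address < base or address >= base + limit:
        raise MemoryTrap(f"illegal access to address {address}")
    return address


# Hypothetical process: 1,000 bytes starting at physical address 3,000.
BASE, LIMIT = 3000, 1000

print(check_access(3000, BASE, LIMIT))   # lowest legal address
print(check_access(3999, BASE, LIMIT))   # highest legal address
try:
    check_access(4000, BASE, LIMIT)      # base + limit is already out of range
except MemoryTrap as trap:
    print("trap:", trap)
```

Note that base plus limit itself is illegal: the legal range is half-open, which is why the second test is a strict less-than.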

And I assume only the OS can actually set those base and limit values?

Absolutely.

Critical point, yes.

Setting those registers requires privileged instructions that only the operating system kernel can execute.

User programs can't touch them.

Okay, that's a solid protection mechanism.

So the OS puts up the fence.

Now what about the addresses themselves?

Programs use names for variables, right?

How do those turn into the actual physical memory addresses the hardware uses?

That's address binding.

Exactly.

Address binding.

It's the process of mapping from one address space to another.

You start with symbolic addresses in your code, like my variable.

The compiler turns these into relocatable addresses, sort of like offsets, say, 100 bytes from the start of this code section.

Then finally, these need to become absolute physical addresses.

Now when this final binding happens is really important.

It could happen at compile time, but that's super inflexible.

Yeah, if you move the program, you'd have to recompile it.

Painful.

Or it could happen at load time when the program is loaded into memory.

Better, but still, once it's loaded, it's stuck there.

Right.

The most flexible, and the most common approach today, is execution time binding.

The binding from the relocatable address to an absolute physical address is delayed until the program is actually running.

Ah, so the program could potentially be moved around in memory while it's running.

Precisely.

And that flexibility is crucial for modern systems doing things like swapping or memory compaction.

But it needs hardware support.

Okay, and this is where we get the distinction between logical and physical addresses.

Yes.

A logical address, sometimes called a virtual address in this context, is the address generated by the CPU.

It's what the program thinks it's using.

It sees its own private, continuous address space starting from zero.

Even though it might not actually be starting at physical address zero.

Right.

The physical address is the actual address that gets sent out to the memory hardware, the address the memory unit sees.

With execution time binding, these two are generally different.

And what does this translation?

How does logical become physical on the fly?

That's the job of the memory management unit, or MMU.

It's a piece of hardware, usually right on the CPU chip these days, that dynamically maps the virtual addresses generated by the CPU to physical addresses in RAM.

How does it do that mapping?

Well, a very simple scheme, sort of extending the base register idea, uses what's called a relocation register.

The value in this register is simply added to every logical address generated by the user program before it goes to memory.

So if the relocation register has 10,000, and the program asks for address 500.

The MMU adds them, and the physical address requested from memory is 10,500.

The key is, the user program only ever sees and works with its logical addresses, like 500.

It's completely unaware of the physical relocation.
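
As a minimal sketch of the relocation register idea, using the same numbers as the example above: the function name `relocate` is hypothetical, and a real MMU does this addition in hardware on every memory reference.

```python
def relocate(logical_address: int, relocation_register: int) -> int:
    """Physical address = relocation register value + logical address."""
    return relocation_register + logical_address


# The program only ever sees logical address 500.
print(relocate(500, 10_000))  # 10500
```

If the OS moves the process, it only has to change the register value; the program's logical addresses stay the same.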

Clever.

That separation seems really important.

Now speaking of efficiency, sometimes programs have huge chunks of code for things that rarely happen, like obscure error handling.

Is there a way to avoid loading that stuff, unless it's needed?

Yes, that's dynamic loading.

The idea is simple.

A routine isn't loaded from disk into memory until it's actually called.

So it saves memory initially.

Exactly.

The main program loads, starts running, and only when it tries to call, say, handle rare error does the system go and load the code for that routine from the disk into memory.

It keeps the initial memory footprint smaller, doesn't even strictly need OS support.

The program loader can handle it.

Okay.

But then there's dynamic linking and shared libraries.

Things like DLLs on Windows or .so files on Linux.

That sounds related, but different.

And definitely needs the OS, right?

Very related, but crucially different.

And yes, it typically requires OS support.

Think about standard libraries, like the C library, libc.

Almost every program uses it.

Without dynamic linking, every single running program would need its own private copy of libc loaded into memory.

Imagine you have 40 processes running and libc is, say, 2 megabytes.

That's 80 megabytes of RAM just for copies of the same library code.

Ouch.

Yeah, that's incredibly wasteful.

Dynamic linking solves this.

The linking process for these libraries is postponed until execution time.

When a program starts, the system checks if a copy of the needed shared library, like libc, is already in memory from another process.

And if it is?

It just maps that existing copy into the new process's address space so all 40 processes can share the one single physical copy of the libc code.

So 80 megabytes becomes just 2 megabytes.

Huge difference.

Massive.

Plus, if you update the shared library, like fixing a bug, all applications using it benefit immediately without needing to be recompiled or relinked.

The OS manages the sharing and protection.

Okay, dynamic loading saves memory by loading only when needed.

Dynamic linking saves memory by sharing common code.

Both super important.

Now let's get into actually allocating blocks of memory.

The older way was contiguous allocation.

Right, contiguous memory allocation.

Conceptually simpler.

The memory is divided up, usually with the OS in one part, often low memory, and user processes getting the rest.

Each process gets allocated one single contiguous block or chunk of physical memory.

Okay, simple enough.

We already talked about base and limit registers protecting that block.

But what happens when processes finish and leave?

You get holes.

Gaps of free memory scattered between the allocated blocks.

When a new process needs memory, the OS has to find a hole that's big enough.

Uh, and this leads to problems, I bet.

Like finding the right hole.

Exactly.

There are different strategies.

First fit.

Just scan memory from the beginning and take the first hole that's large enough.

It's fast.

Makes sense.

Best fit.

Search the entire list of holes and find the smallest one that's still big enough.

The idea is to leave the smallest possible leftover hole.

Okay, tries to be efficient with space.

And worst fit.

Find the largest available hole and use part of it.

The thinking here is that the leftover piece will hopefully still be large enough to be useful later.

Huh.

Which one works best?

Well, simulations and experience generally show first fit and best fit are better than worst fit in terms of both speed and reducing wasted space.

First fit is often the fastest.
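
To make the three placement strategies concrete, here is a small sketch that picks a hole from a free list. The hole sizes and the request size are made-up numbers for illustration, and each function only returns the index of the chosen hole; a real allocator would also split the hole and update the list.

```python
from typing import Optional


def first_fit(holes: list, request: int) -> Optional[int]:
    """Index of the first hole that is large enough."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None


def best_fit(holes: list, request: int) -> Optional[int]:
    """Index of the smallest hole that is still large enough."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None


def worst_fit(holes: list, request: int) -> Optional[int]:
    """Index of the largest hole."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(candidates)[1] if candidates else None


holes = [100, 500, 200, 300, 600]   # hypothetical free hole sizes in KB
request = 212

print(first_fit(holes, request))   # 1 (the 500KB hole)
print(best_fit(holes, request))    # 3 (the 300KB hole)
print(worst_fit(holes, request))   # 4 (the 600KB hole)
```

First fit can stop at the first match, while best fit and worst fit have to examine every hole unless the list is kept sorted by size.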

But even with these strategies, you still end up with scattered holes, right?

This is fragmentation.

Yes.

This is the fundamental problem with contiguous allocation: fragmentation.

And there are two main types.

Right.

External and internal.

What's external again?

External fragmentation is when you have enough total free memory available to satisfy a request, but it's not contiguous.

It's broken up into lots of small separate holes.

So you might have 100KB free in total, but it's in ten chunks of 10KB each.

And if you need 20KB, you're stuck.

Exactly.

It can get really bad.

There's a rule of thumb called the 50% rule, which suggests that over time, with first fit allocation, for every N allocated blocks you might lose about another half N blocks to fragmentation, so about one-third of your memory might become unusable.

One-third unusable.

That's terrible.

So what's internal fragmentation then?

Internal fragmentation happens when memory is allocated in fixed size blocks, maybe for simplicity, but a process doesn't need the entire block.

The unused space inside the allocated block is internal fragmentation.

Like getting a large pizza box for just one slice.

Kind of, yeah.

It's wasted space within an allocated region, usually less of a problem than external, but still waste.

How can you fix external fragmentation?

Well, one approach is compaction.

You shuffle all the allocated memory blocks down to one end of memory, consolidating all the free holes into one big block.

Sounds expensive, like stopping everything to reorganize the entire garage.

It is expensive in terms of CPU time, and crucially, it only works if you have execution time binding, because you're changing the physical addresses of running processes.

They need to be able to cope with that.

So compaction is tricky.

There must be a better way to deal with fragmentation.

There is, and it's the dominant approach today.

Paging.

Paging.

Okay, this is the big one.

How does it solve the fragmentation problem?

Paging's core idea is revolutionary.

It allows a process's physical address space to be non-contiguous.

It breaks free from that requirement of needing one single block.

Non-contiguous.

So pieces of the program can be scattered all over physical memory.

Exactly.

This completely eliminates external fragmentation as a problem.

Wow.

Okay.

How does it work?

What are the pieces?

It works by dividing things up into fixed-size chunks.

Physical memory is divided into fixed-size blocks, called frames, and the process's logical or virtual memory is divided into blocks of the same size, called pages.

Frames in physical memory.

Pages in logical memory.

Same size.

Got it.

When a process needs to run, the OS finds enough free frames wherever they happen to be in physical memory and loads the process's pages into those frames.

So page 1 might be in frame 5, page 2 in frame 23, page 3 in frame 2.

Completely scattered.

Potentially, yes.

Then how on earth does the CPU find anything?

If it asks for a logical address, say 5,000, how does it know that's actually in frame 23?

Physical location X.

That's the job of the page table.

Each process gets its own page table, which acts as a translator or a map.

A map, okay.

The page table basically stores for each logical page of the process which physical frame it's currently loaded into.

All right.

Let's walk through that translation.

CPU generates a logical address.

Right.

The MMU hardware automatically splits that logical address into two parts.

A page number, let's call it P, and a page offset D within that page.

Okay.

P tells you which page, D tells you how far into that page.

Exactly.

The MMU then uses the page number P as an index into the process's page table.

It looks up entry number P.

And that entry contains?

It contains the frame number F where that logical page is stored in physical memory.

The MMU takes that frame number F, concatenates the original page offset D onto the end of it, and that forms the final physical address that gets sent to the memory bus.

So logical (p, d) becomes physical (f, d).

Precisely.

And because page and frame sizes are powers of two, like 4KB, 8KB, this splitting and combining is really easy for the hardware to do quickly with bit manipulation.
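
The split, lookup, and concatenate steps above can be sketched with shifts and masks. This assumes 4KB pages and a tiny page table stored as a Python list; the table contents reuse the frame numbers from the earlier example, but numbered from page 0, so the second page (index 1) sits in frame 23. The function name `translate_paged` is hypothetical.

```python
OFFSET_BITS = 12                  # 4KB pages: 2**12 bytes per page
PAGE_SIZE = 1 << OFFSET_BITS


def translate_paged(logical_address: int, page_table: list) -> int:
    p = logical_address >> OFFSET_BITS        # page number: the high bits
    d = logical_address & (PAGE_SIZE - 1)     # offset: the low 12 bits
    f = page_table[p]                         # frame number from the page table
    return (f << OFFSET_BITS) | d             # concatenate frame and offset


page_table = [5, 23, 2]           # page 0 -> frame 5, page 1 -> frame 23, ...

# Logical address 5,000 is page 1, offset 904, so it lands in frame 23.
print(translate_paged(5000, page_table))      # 95112
```

Because the page size is a power of two, no division or multiplication is needed, which is what makes this cheap to do in hardware.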

That's elegant.

Yeah.

It completely separates the programmer's view, this nice contiguous logical space, from the potentially messy, scattered physical reality managed by the OS.

That's the core benefit.

Avoids external fragmentation, provides this clean abstraction.

The OS just needs to keep track of which frames are free using something like a frame table.

Wait.

If the page table itself is stored in main memory,

doesn't that mean for every actual data access the program wants, the system first has to access memory to look up the page table entry, then access memory again for the actual data?

You hit the nail on the head.

Yes.

That would double the effective memory access time.

One look up for the frame, one for the data.

That would be unacceptably slow.

So how do they fix that?

With another piece of specialized fast hardware, the translation look-aside buffer, or TLB.

The TLB.

Right.

It's like a cache for page table entries.

Exactly.

It's a small, very fast hardware cache.

It's associative, meaning it can be searched very quickly based on the page number.

It stores recently used page-to-frame mappings.

So walk me through the lookup with the TLB.

Okay.

CPU generates a logical address, page P, offset D.

The MMU first checks the TLB.

Do I have an entry for page P right here?

If yes.

That's a TLB hit.

TLB hit.

The frame number F is retrieved directly from the TLB super fast.

Combine F and D, get the physical address.

Almost no performance penalty compared to no translation at all.

Nice.

And if it's not in the TLB?

A miss.

TLB miss.

Now we pay the penalty.

The hardware, or sometimes OS, depending on architecture, has to go out to main memory, access the full page table, find the entry for page P to get frame F.

Takes time.

Takes time.

But then, critically, that p-to-f mapping is loaded into the TLB, replacing some older entry.

The hope is that we'll need that page again soon, and next time it'll be a fast TLB hit.

So the TLB relies on locality of reference, just like other caches.

Absolutely.

Programs tend to access the same few pages repeatedly for a while.

TLBs are small, maybe 64 to 1024 entries, but they can have very high hit ratios, like 98% or 99%.

And that makes a huge difference in the average memory access time.

A massive difference.

The closer the effective access time is to the TLB access time plus one memory access, the better.
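
A rough way to see why the hit ratio matters is to compute a weighted average. The sketch below assumes a simple single-level page table, so a hit costs one memory access and a miss costs two; the 10 nanosecond memory access time and the optional `tlb_ns` lookup cost are illustrative assumptions, not measured figures.

```python
def effective_access_time(hit_ratio: float, mem_ns: float, tlb_ns: float = 0.0) -> float:
    """Average access time with a TLB in front of a one-level page table."""
    hit_cost = tlb_ns + mem_ns           # TLB lookup, then the data access
    miss_cost = tlb_ns + 2 * mem_ns      # TLB lookup, page table access, data access
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


for ratio in (0.80, 0.98, 0.99):
    print(ratio, round(effective_access_time(ratio, mem_ns=10), 2))
```

With these assumed numbers, an 80% hit ratio gives about 12 ns per access, while 99% gives about 10.1 ns, only around 1% slower than having no translation at all.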

What about context switching?

If you switch processes, doesn't the TLB now have mappings for the old process?

Do you have to flush it?

That used to be a big problem.

Flushing the TLB on every context switch is expensive.

But modern TLBs often support address space identifiers, or ASIDs.

Each TLB entry gets tagged with the ASID of the process it belongs to.

Ah.

So the TLB can hold entries for multiple processes simultaneously.

Correct.

On a context switch, you just tell the MMU the ASID of the new process.

It ignores TLB entries that don't match that ASID.

No need to flush everything.

Big performance win.
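
One way to picture ASID tagging is a lookup keyed by both the ASID and the page number. The `TaggedTLB` class below is a toy model invented for this illustration: it uses a Python dictionary and has no capacity limit or replacement policy, both of which a real TLB needs.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with an address space identifier."""

    def __init__(self):
        self.entries = {}          # (asid, page) -> frame
        self.current_asid = 0

    def context_switch(self, asid: int) -> None:
        # No flush: we only change which tag counts as a match.
        self.current_asid = asid

    def insert(self, page: int, frame: int) -> None:
        self.entries[(self.current_asid, page)] = frame

    def lookup(self, page: int):
        # Entries tagged with another ASID are simply never matched.
        return self.entries.get((self.current_asid, page))


tlb = TaggedTLB()
tlb.context_switch(1)
tlb.insert(page=3, frame=7)
tlb.context_switch(2)
print(tlb.lookup(3))       # None: process 2 cannot see process 1's mapping
tlb.context_switch(1)
print(tlb.lookup(3))       # 7: the old entry survived the context switches
```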

Clever.

Okay, so TLBs make paging fast.

What about protection in a paged environment?

How do you ensure read-only pages stay read-only?

The page table entries handle that too.

Along with the frame number, each entry typically includes protection bits.

These can mark a page as read-only, read-write, sometimes execute-only.

And the MMU checks these during translation.

Yep.

During the address translation process, when it finds the frame number, it also checks these protection bits against the type of access being attempted.

Read, write, execute.

If there's a mismatch, like trying to write to a read-only page, it triggers a trap to the OS.

And there's also that valid-invalid bit you mentioned.

The valid-invalid bit is crucial.

It indicates whether the page table entry is actually valid or not.

If a process tries to generate an address for a page marked as invalid, that's an illegal address.

It's outside the process's logical address space.

Again, immediate trap to the OS.

This lets the OS define precisely which logical pages the process is allowed to use.

Exactly.

It prevents programs from accessing memory beyond their allocated limits, even within their potential logical range.

Sometimes there's also a page table length register, PTLR, to limit how large the page table itself is considered to be.
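
The protection and valid-invalid checks can be sketched as extra fields on each page table entry. The `PageTableEntry` layout, the `ProtectionTrap` exception, and the `access` function are hypothetical names for this illustration; the length test plays a role similar to a page table length register.

```python
from dataclasses import dataclass


class ProtectionTrap(Exception):
    """Stands in for the trap to the OS on an illegal or disallowed access."""


@dataclass
class PageTableEntry:
    frame: int
    valid: bool = True        # valid-invalid bit
    writable: bool = False    # a single protection bit, for simplicity


def access(page_table: list, page: int, write: bool = False) -> int:
    # Outside the table or marked invalid: not part of the logical address space.
    if page < 0 or page >= len(page_table) or not page_table[page].valid:
        raise ProtectionTrap(f"page {page} is not in the address space")
    entry = page_table[page]
    if write and not entry.writable:
        raise ProtectionTrap(f"page {page} is read-only")
    return entry.frame


table = [
    PageTableEntry(frame=5),                  # read-only code page
    PageTableEntry(frame=23, writable=True),  # data page
    PageTableEntry(frame=0, valid=False),     # unused slot
]

print(access(table, 0))                # 5
print(access(table, 1, write=True))    # 23
try:
    access(table, 0, write=True)
except ProtectionTrap as trap:
    print("trap:", trap)
```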

Got it.

Now, one of the really cool things about paging is sharing, right, like we talked about with dynamic linking.

How does paging enable shared pages?

It's actually quite elegant.

The key concept is reentrant code, code that doesn't modify itself, like those shared libraries, libc, etc.

Let's say you have multiple processes all running code that uses libc.

Each process will have its own page table.

But for the pages corresponding to the libc code, the entries in all those different page tables will point to the exact same physical frames in memory.

Ah.

So multiple logical pages, one in each process, map to the same physical frame.

Precisely.

You load the libc code into physical memory just once, into a set of frames, then you just make the relevant page table entries in every process that needs it point to those frames.

And the protection bits would mark those shared code frames as read-only.

Typically, yes.

The OS enforces that shared code cannot be modified by any of the sharing processes, ensuring integrity.

This saves an enormous amount of physical memory.
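
A small sketch of the sharing idea: three per-process page tables whose first entries all point at the same frames. The frame numbers, the `LIBC_FRAMES` list, and the `build_page_table` helper are made up for the illustration, and each entry is just a (frame, writable) pair.

```python
LIBC_FRAMES = [40, 41, 42]   # hypothetical frames holding the single copy of libc


def build_page_table(private_frames: list) -> list:
    """Entries are (frame, writable); shared code is mapped read-only."""
    shared = [(frame, False) for frame in LIBC_FRAMES]
    private = [(frame, True) for frame in private_frames]
    return shared + private


tables = [
    build_page_table([10, 11]),
    build_page_table([20, 21]),
    build_page_table([30, 31]),
]

frames_in_use = {frame for table in tables for frame, _ in table}
total_entries = sum(len(table) for table in tables)
print(total_entries)          # 15 page table entries across the three processes
print(len(frames_in_use))     # 9 physical frames actually occupied
```

Without sharing, the same three processes would need 15 frames; with the library mapped once, they need 9.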

Okay, paging seems incredibly powerful.

But what happens when address spaces get truly massive, like 64-bit systems?

A simple page table could become enormous itself, right?

Absolutely huge.

A 32-bit address space with 4KB pages could need over a million page table entries.

For a 64-bit space, it's astronomically large.

You can't possibly allocate gigabytes just for a page table contiguously.
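
The "over a million entries" figure is easy to check. The sketch below assumes a flat, single-level page table and 4-byte entries; the entry size is an assumption made for the arithmetic, not something stated above.

```python
def page_table_entries(address_bits: int, page_size_bytes: int) -> int:
    """Number of entries in a flat page table covering the whole address space."""
    return (1 << address_bits) // page_size_bytes


ENTRY_BYTES = 4   # assumed size of one page table entry

entries_32 = page_table_entries(32, 4096)
print(entries_32)                                  # 1048576 entries
print(entries_32 * ENTRY_BYTES // 2**20, "MiB")    # 4 MiB per process

entries_64 = page_table_entries(64, 4096)
print(entries_64 == 2**52)                         # True
```

At 2 to the power 52 entries, a flat table for a full 64-bit space would itself be petabytes in size, which is why flat tables are not an option there.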

So how do you handle that?

The common solution is hierarchical paging, or multi-level paging.

You basically page the page table.

Page the page table.

Okay, my head's starting to spin slightly.

How does that work?

Okay, think of a two-level scheme for 32 bits.

You split the logical address into three parts.

Say 10 bits for an outer page number, p1, 10 bits for an inner page number, p2, and 12 bits for the offset, d.

10, 10, 12.

Got it.

The first part, p1, indexes into an outer page table, or page directory.

The entry you find there doesn't give you a frame number, it gives you the physical address of an inner page table.

Oh, okay.

Then you use the second part, p2, to index into that specific inner page table.

And that entry finally gives you the frame number, f, where the actual data page resides.

Combine f with the offset, d, for the physical address.
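
Here is a minimal sketch of that 10, 10, 12 split and the two lookups. The outer table is modeled as a dictionary of dictionaries so that only the inner tables actually in use exist; the function names and the sample mapping are hypothetical.

```python
def split_two_level(address: int):
    """Split a 32-bit logical address into (p1, p2, d) with 10/10/12 bits."""
    p1 = (address >> 22) & 0x3FF    # top 10 bits: index into the outer table
    p2 = (address >> 12) & 0x3FF    # next 10 bits: index into an inner table
    d = address & 0xFFF             # low 12 bits: offset within the page
    return p1, p2, d


def translate_two_level(address: int, outer_table: dict) -> int:
    p1, p2, d = split_two_level(address)
    inner_table = outer_table[p1]   # first lookup: which inner page table
    f = inner_table[p2]             # second lookup: which frame
    return (f << 12) | d


# Only one inner table exists; the other 1,023 outer slots are simply absent.
outer_table = {1: {2: 7}}
address = (1 << 22) | (2 << 12) | 0x34

print(split_two_level(address))                   # (1, 2, 52)
print(translate_two_level(address, outer_table))  # 28724
```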

So two memory accesses just to find the frame number, potentially, for the data access.

Potentially, yes.

On a TLB miss, that is, and that's the trade-off.

You save space on page tables, you only need inner tables for the parts of the address space actually used, but you might increase lookup time.

And for 64-bit, two levels isn't enough.

Not nearly.

A simple two-level scheme would still result in impractically large outer page tables.

So 64-bit systems often use three, four, or even more levels of paging.

A 64-bit architecture like UltraSPARC would need seven levels with this approach, which really highlights the access time problem.

Wow, seven lookups, potentially.

Okay, are there alternatives to deeply nested hierarchical tables?

Yes.

One is hashed page tables.

Instead of using the page number as a direct index, you hash the virtual page number.

The hash value gives you an index into a hash table.

Okay, like a dictionary lookup.

Each entry in the hash table might contain a linked list of elements, because multiple page numbers might hash to the same slot, that is, collisions.

You search the list for a match on the virtual page number to find the frame.

Good for very large, sparse address spaces.
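
A hashed page table with chaining might look like the sketch below. The class name, the bucket count, and the modulo hash are all illustrative choices; Python lists stand in for the linked lists in each slot.

```python
class HashedPageTable:
    """Toy hashed page table: each bucket holds a chain of (vpn, frame) pairs."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _slot(self, vpn: int) -> int:
        return vpn % len(self.buckets)       # deliberately simple hash function

    def map(self, vpn: int, frame: int) -> None:
        chain = self.buckets[self._slot(vpn)]
        for i, (existing_vpn, _) in enumerate(chain):
            if existing_vpn == vpn:
                chain[i] = (vpn, frame)      # remap an existing page
                return
        chain.append((vpn, frame))

    def lookup(self, vpn: int):
        # Walk the chain, comparing virtual page numbers until one matches.
        for existing_vpn, frame in self.buckets[self._slot(vpn)]:
            if existing_vpn == vpn:
                return frame
        return None


table = HashedPageTable()
table.map(3, 17)
table.map(11, 42)          # 11 % 8 == 3, so this collides with page 3
print(table.lookup(3))     # 17
print(table.lookup(11))    # 42
print(table.lookup(19))    # None: same bucket, but no matching entry
```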

Interesting.

Any other major approaches?

There's also the inverted page table.

This one flips the whole concept around.

Instead of one page table per process, you have one single page table for the entire system.

One table for everything.

How?

This table has one entry for each physical frame of memory.

Each entry says which process ID and which virtual page number of that process is currently residing in that physical frame.

So it maps physical frames back to logical pages instead of the other way around.

Exactly.

It saves a massive amount of memory on page tables, especially if you have many processes, because the table size depends only on the amount of physical memory, not the combined size of all logical address spaces.

But how do you find the right entry?

If the CPU gives you a virtual address, so process ID and page number, how do you find the physical frame?

You'd have to search the whole table, wouldn't you?

That's the main challenge.

A linear search would be far too slow.

So inverted page tables are almost always used in conjunction with a hash table.

You hash the virtual address, the (PID, page number) pair, to quickly find a potential entry or a small set of entries in the inverted table to check.
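
The inverted table plus hash lookup can be sketched as one array indexed by frame and one dictionary keyed by the (PID, page number) pair. The `InvertedPageTable` class and its method names are hypothetical, and Python's built-in dictionary stands in for the hardware or OS hash table.

```python
class InvertedPageTable:
    """One entry per physical frame, plus a hash index for fast lookups."""

    def __init__(self, num_frames: int):
        self.frames = [None] * num_frames    # frame -> (pid, page) or None
        self.index = {}                      # (pid, page) -> frame

    def load(self, frame: int, pid: int, page: int) -> None:
        previous = self.frames[frame]
        if previous is not None:
            del self.index[previous]         # the old page no longer lives here
        self.frames[frame] = (pid, page)
        self.index[(pid, page)] = frame

    def lookup(self, pid: int, page: int):
        # Hash lookup instead of scanning every frame entry.
        return self.index.get((pid, page))


ipt = InvertedPageTable(num_frames=4)
ipt.load(frame=2, pid=7, page=0)
ipt.load(frame=3, pid=9, page=0)    # same page number, different process
print(ipt.lookup(7, 0))             # 2
print(ipt.lookup(9, 0))             # 3
print(ipt.lookup(7, 1))             # None: not resident
```

The table size is fixed by `num_frames`, no matter how many processes exist, which is the memory saving described above.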

Okay.

Hashing comes to the rescue again.

Does this make sharing memory harder?

It can, yeah.

Because each physical frame entry points back to only one (PID, page number) pair, implementing shared memory, where one physical frame needs to be mapped by multiple virtual addresses, requires extra mechanisms.

Different trade-offs everywhere.

So systems like IBM RT, PowerPC, some SPARC used these.

Yes.

And some modern variations exist.

Oracle SPARC Solaris, for instance, uses a pretty complex system with multiple hash tables and things called translation storage buffers, which are like in-memory caches of recently used translation entries, to make lookups incredibly efficient.

Okay.

That covers how we map logical to physical.

But what if all the logical address spaces added together are just bigger than the physical RAM you have?

Ah, now we're talking about swapping.

The basic idea is to allow the total physical memory demands of all processes to exceed the actual physical RAM available.

How?

By using disk space as an extension of RAM?

Essentially yes.

You need a fast backing store, usually a reserved chunk of disk space.

When memory gets full, the OS can move an entire process, or parts of it, from main memory out to this backing store.

That frees up frames for other processes.

And when the swapped out process needs to run again?

It has to be swapped back in from the disk to main memory, potentially swapping out another process to make room.

This standard swapping of entire processes increases the degree of multiprogramming, that is, how many jobs can be in the system.

It sounds really slow, moving whole processes back and forth to disk.

It is very slow due to disk transfer times.

That's why it's not common in its pure form on most modern systems.

What is common is swapping with paging, often just called paging or demand paging.

So you don't swap whole processes, just individual pages?

Exactly.

If the system needs a free frame, it picks a page currently in memory, using some replacement algorithm, like least recently used, writes its contents out to the backing store if it has been modified, and marks its page table entry as invalid, but indicating it's now on disk.

And when the process tries to access that page again?

It causes a page fault, because the valid bit is off.

The OS sees the page fault, checks its info, finds the page on the backing store, reads it back into a free frame, possibly forcing another page out first, updates the page table entry to point to the new frame and mark it valid, and then restarts the instruction that caused the fault.
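
To get a feel for page faults and replacement, here is a small simulation that counts faults under least recently used replacement. It is only a model: the function name, the reference string, and the frame count are made up, and it tracks which pages are resident without moving any real data to or from a backing store.

```python
from collections import OrderedDict


def count_page_faults(reference_string: list, num_frames: int) -> int:
    """Count page faults for a sequence of page accesses under LRU replacement."""
    resident = OrderedDict()      # pages in memory, least recently used first
    faults = 0
    for page in reference_string:
        if page in resident:
            resident.move_to_end(page)          # hit: now the most recently used
            continue
        faults += 1                             # page fault: page is not resident
        if len(resident) >= num_frames:
            resident.popitem(last=False)        # evict the least recently used page
        resident[page] = True                   # "read" the page into a frame
    return faults


# Three frames; pages 1, 2, 3 fault on first touch, then 4 evicts 2, then 2 evicts 3.
print(count_page_faults([1, 2, 3, 1, 4, 2], num_frames=3))   # 5
```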

Much more granular and efficient than swapping whole processes. This is what Linux and Windows do.

Yes, this demand paging is fundamental to how they manage memory and overcommit resources.

What about mobile?

Our phones, do they swap pages to flash storage?

Generally no.

This is a really important difference.

iOS and current Android versions typically do not support swapping in the traditional sense.

Why not?

Flash is fast, isn't it?

It's faster than old spinning disks, but there are issues.

First, flash memory has a limited number of write cycles before cells start to wear out.

Constant swapping could drastically shorten the lifespan of the phone's storage.

Okay.

Wear leveling helps, but maybe not enough.

Right.

And second, the throughput between RAM and flash, while better than hard drives, might still not be good enough for seamless swapping performance.

Plus, limited storage space is often a concern on mobile.

So if an iPhone or Android phone runs out of RAM, what does it do instead of swapping?

It gets aggressive.

The OS will first ask running apps to voluntarily free up memory.

It might discard clean, unmodified data like code pages, knowing it can reload them from storage if needed.

It keeps modified data pages.

But if memory pressure is still too high, the OS will simply start terminating applications, usually starting with background ones or least recently used ones.

Android tries to save the application state before killing it, so it can be restored quickly if you switch back.

Wow, so it just starts killing apps.

That explains why sometimes you switch back to an app and it has to completely reload.

Exactly.

It forces mobile developers to be very mindful of their memory usage, because there's no swap space safety net.

Good to know.

Ultimately, though, whether it's desktop or mobile, if the system is constantly swapping or killing apps, it probably just means you need more RAM or you're trying to run too much stuff at once.

That's generally the bottom line, yes.

Swapping is a mechanism to handle overcommitment.

But heavy swapping usually indicates a performance problem.

This has been fantastic.

Okay, let's bring it home by looking at how two major architectures actually implement some of these ideas.

Let's start with Intel, the classic 32-bit IA-32.

Right.

IA-32 actually used a combination of segmentation and paging.

It was a bit complex.

A logical address generated by the program wasn't just an offset.

It included a segment selector.

Segmentation and paging.

Yep.

The hardware first used the segment selector to look up a segment descriptor in tables like the LDT or GDT.

This descriptor provided the base address and limit for the segment and performed protection checks.

Adding the offset from the logical address to the segment's base address produced what Intel called a linear address.

Okay, so segmentation first giving a linear address.

And then that linear address was fed into the paging unit.

The paging unit treated the linear address as a logical address for paging purposes.

And the paging part was hierarchical.

Yes.

Typically a two -level scheme for 4KB pages.

The 32 -bit linear address was split.

10 bits for page directory index, 10 for page table index, 12 for offset.

The CR3 register pointed to the current page directory.

They also supported larger 4MB pages, skipping the second level table.

What about accessing more than 4GB of RAM on 32-bit?

I remember something called PAE.

Right, Physical Address Extension, PAE.

This allowed 32-bit systems with OS support to handle more physical RAM, up to 64GB.

It introduced a third level of paging, a page directory pointer table, and used larger 64-bit page table entries to accommodate the wider physical addresses, even though the linear address remained 32-bit.

Complex stuff.

What about modern 64-bit Intel, x86-64, which they adopted from AMD, right?

Correct.

x86-64 simplified things by largely sidelining segmentation for paging purposes in 64-bit mode.

It currently uses a 48-bit virtual address space, though the architecture supports larger.

Physical addresses can go up to 52 bits or more, depending on the implementation.

And how many levels of paging?

It uses a four-level paging hierarchy.

This allows it to efficiently manage the huge 48-bit virtual address space.

Page sizes supported are typically 4KB, 2MB, using large pages at the second level, or even 1GB, using huge pages at the third level.
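
For 4KB pages, the 48-bit virtual address breaks down into four 9-bit table indexes and a 12-bit offset. The sketch below only does that bit slicing; the function name `split_x86_64` is hypothetical, and it does not model the page tables themselves or the larger page sizes.

```python
def split_x86_64(virtual_address: int):
    """Split a 48-bit virtual address into four 9-bit indexes and a 12-bit offset."""
    offset = virtual_address & 0xFFF
    # Shifts 39, 30, 21, 12 select the index for levels 4 (top) down to 1.
    indexes = [(virtual_address >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indexes, offset


address = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_x86_64(address))    # ([1, 2, 3, 4], 5)
```

Four indexes of 9 bits each plus 12 offset bits add up to exactly 48 bits, and a 9-bit index means each table holds 512 entries.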

Four levels, okay.

Now the other giant, ARM, dominant in mobile, how does ARMv8 handle this?

ARMv8, their 64-bit architecture, also uses multi-level paging, up to four levels, similar in concept to x86-64.

But it has some interesting flexibility.

One key feature is translation granules.

Granules.

Yeah, the system can be configured to use different base block sizes for translation, typically 4KB, 16KB, or 64KB.

The choice of granule affects the number of paging levels needed and the sizes of larger blocks, regions that can be mapped.

So you can tune it based on needs.

Somewhat, yes.

For example, with a 4KB granule, you can have four levels and map 4KB pages, 2MB blocks, or 1GB blocks.

With a 64KB granule, you might only need three levels and can map 64KB pages or 512MB blocks.
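The block sizes quoted here fall out of simple arithmetic, sketched below. The one assumption is that each table entry is 8 bytes, so a table occupying one granule holds granule / 8 entries; the helper name is hypothetical.

```python
ENTRY_BYTES = 8  # assumption: 64-bit table entries


def mappable_sizes(granule_bytes, levels_above_page):
    """Return the page size followed by the block sizes at each higher level."""
    entries = granule_bytes // ENTRY_BYTES  # entries in one granule-sized table
    return [granule_bytes * entries**k for k in range(levels_above_page + 1)]


KB, MB, GB = 2**10, 2**20, 2**30

print(mappable_sizes(4 * KB, 2))   # [4096, 2097152, 1073741824]
print(mappable_sizes(64 * KB, 1))  # [65536, 536870912]
```

A 4 KB granule gives 512 entries per table, so each level up multiplies the mapped size by 512 (4 KB, 2 MB, 1 GB). A 64 KB granule gives 8192 entries per table, so one level up from a 64 KB page is already 512 MB, which is also why fewer levels are needed to cover the same address space.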

Flexible.

And entries can point to tables or directly to big blocks.

Exactly.

Entries in the higher-level tables can either point to the next-level table down, or they can be a block entry that directly maps a large chunk of memory, like 2 MB or 1 GB, bypassing the lower levels.

This is very efficient for mapping large areas like the kernel or frame buffers.
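The table-or-block idea can be modeled with a toy page walk. This is only a sketch under stated assumptions: a 4 KB granule with 9-bit indices, tables represented as Python dicts, and entries represented as made-up ("table", ...) or ("block", ...) tuples rather than real descriptor formats.

```python
SHIFTS = (39, 30, 21, 12)  # assumed 4 KB granule: four levels of 9-bit indices


def translate(root, vaddr):
    """Walk toy page tables; stop early when an entry is a block mapping."""
    table = root
    for shift in SHIFTS:
        index = (vaddr >> shift) & 0x1FF
        kind, target = table[index]  # a missing key models a translation fault
        if kind == "block":
            # The remaining low bits are the offset inside the mapped block.
            return target + (vaddr & ((1 << shift) - 1))
        table = target               # "table" entry: descend one level
    raise ValueError("last-level entry must map a page")


level3 = {1: ("block", 0x0040_0000)}                         # one 4 KB page
level2 = {0: ("table", level3), 1: ("block", 0x8020_0000)}   # one 2 MB block
level1 = {0: ("table", level2)}
level0 = {0: ("table", level1)}

print(hex(translate(level0, 0x1234)))    # 0x400234
print(hex(translate(level0, 0x212345)))  # 0x80212345
```

The first lookup walks all four levels to reach a 4 KB page. The second stops at the third table it visits, because that entry is a 2 MB block, so the lowest-level table is never consulted.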

ARM also uses multiple levels of TLBs, typically micro-TLBs per core, and a larger shared main TLB, with hardware handling the page table walks on misses.

Wow.

It's really quite something.

We've gone from just, you know, simple base and limit registers all the way to these incredibly complex, multi-level page tables, TLBs, ASIDs.

It's amazing how much work goes on just to figure out where a piece of data actually is in memory.

It truly is.

And understanding these layers, the logical view, the physical reality, the hardware tricks like TLBs, the OS strategies like paging and swapping, it really helps you appreciate what's happening inside any computing device.

It's not just theory.

It dictates performance, stability, everything.

Absolutely.

It forms the bedrock.

I see.

Which leads to a final thought.

We're now dealing with 64-bit addresses.

That gives us, what, 16 quintillion bytes of theoretical address space.
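For a sense of scale, the numbers can be checked directly. This is just arithmetic; the variable names are illustrative, and the binary unit 2**60 bytes is what "16 exabytes" refers to here.

```python
full_64_bit = 2**64   # bytes addressable with a 64-bit address
used_48_bit = 2**48   # bytes addressable with a 48-bit virtual address

print(full_64_bit)            # 18446744073709551616
print(full_64_bit // 2**60)   # 16   (units of 2**60 bytes)
print(used_48_bit // 2**40)   # 256  (units of 2**40 bytes)
```

So the theoretical 64-bit space is 16 binary exabytes, a little over 18 quintillion bytes in decimal, while a 48-bit virtual address space covers 256 binary terabytes of it.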

It sounds practically infinite.

For now, perhaps.

But history tells us technology always seems to expand, to fill, and then exceed whatever capacity we give it.

So what's next?

What kind of applications or computing demands might actually make even a 64-bit address space start to feel cramped?

What's the next big memory management challenge on the horizon?

That is a fascinating question to think about.

What pushes us beyond 16 exabytes?

Something for us all to ponder.

Do think about how these ideas play out on the devices you use every day.

Thank you so much for joining us on this deep dive into main memory.

We really hope you picked up some valuable insights.

Thanks for listening.

Until next time, keep exploring.


Chapter Summary: What this audio overview covers
Main memory management represents one of the most fundamental responsibilities of operating systems, directly influencing how efficiently applications execute and how well system resources are utilized. The chapter begins by establishing the hardware foundation necessary for memory management, including the concepts of address binding and the distinction between logical addresses generated by programs and physical addresses in actual RAM. The memory management unit serves as the critical hardware component that translates these logical addresses into their physical counterparts during program execution. Programs must be loaded into memory through a series of stages, from compilation through execution, and the chapter explains how dynamic loading and linking enable flexibility in this process. Contiguous memory allocation represents an early approach where processes occupy unbroken regions of memory, but this strategy creates significant problems through external fragmentation, where free memory becomes scattered, and internal fragmentation, where allocated space exceeds actual program needs. Swapping techniques allow operating systems to temporarily move entire processes between main memory and slower backing storage, freeing up space for other processes. Paging offers a non-contiguous solution by dividing memory into fixed-size frames and processes into equally sized pages, eliminating external fragmentation through the use of page tables that map logical pages to physical frames. The chapter explores multiple page table structures, including hierarchical approaches for multi-level translation, hashed designs for sparse address spaces, and inverted tables that reduce memory overhead. Segmentation provides an alternative model based on logical program divisions like code and data sections, offering better support for modular design but introducing its own complexity trade-offs. 
The comparison between paging and segmentation reveals fundamental differences in flexibility, performance, and memory protection capabilities. Practical examination of Windows and Linux implementations demonstrates how these theoretical concepts are realized in contemporary operating systems, showing the engineering choices that balance theoretical ideals with performance constraints.
