Chapter 5: CPU Scheduling: Algorithms, Multi-Processor, and Real-Time Scheduling

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to The Deep Dive, the show where we unlock complex topics and, well, distill them into the essential insights you need to be truly well -informed.

And today, we're tackling something that's working away constantly inside your computer, often totally unnoticed.

We're diving deep into the hidden conductor, orchestrating every single interaction you have, CPU scheduling.

Yeah, think about your own day for a sec.

You're probably juggling several things at once, maybe cooking dinner, listening to music, texting a friend, all while sort of keeping an eye on email.

You feel like you're multitasking.

Right.

Well, your computer does the same, but often with way more applications, sometimes on just one single brain, a single CPU core.

How does it even manage to give everything attention without getting totally bogged down?

Exactly, it seems seamless, doesn't it?

Almost like magic, but you know, it's not magic.

It's the operating system's unsung hero, the CPU scheduler.

That's the bit deciding which program gets the CPU's precious attention and exactly when.

So, our mission today is to pull out the really important insights from chapter five of Operating System Concepts by Silberschatz, Galvin, and Gania.

We want to give you a clear, engaging, but still technically accurate understanding of how your computer manages its core processing power.

You know, from the basic ideas right up to the complex systems in your phone or even out in the cloud.

Get ready for some aha moments, hopefully.

Okay, let's unpack this.

At its core, CPU scheduling is really all about maximizing your computer's main engine, the CPU.

Right, and if your computer just has one CPU core, it can literally only execute one instruction, one command at a single point in time.

So, the big goal of something called multiprogramming isn't really about running multiple things truly simultaneously on that single core.

No, it's about keeping that single CPU busy, like all the time, maximizing utilization.

You never want your expensive processor just sitting there doing nothing.

Okay, so how does it do that?

How does it keep busy?

Well, it relies on a really fundamental observation about how programs actually behave.

It's called the CPU IO burst cycle.

Right, programs aren't just crunching numbers nonstop.

Exactly, they alternate between periods of intense CPU work, that's a CPU burst,

doing calculations,

loading data, and periods of, well, waiting for something external.

That's an IO burst.

Waiting for what, like a file or network stuff?

Could be anything external, yeah.

Waiting for a file from the disk, waiting for network data to arrive, even waiting for you to type something on the keyboard.

I like the chef analogy for this.

You know, chopping vegetables, that's your CPU burst, but then they put something in the oven and have to wait for it to cook, that's the IO burst.

And while the oven's busy, the chef doesn't just stand there right.

They switch tasks, start prepping a sauce, maybe work on another dish entirely.

That's exactly what the CPU does, switches tasks while waiting.

It's fascinating because processes really differ here.

Lots of programs, like your web browser or maybe a text editor, they're IO -bound.

They have loads of short CPU bursts because they're constantly waiting for your clicks, for network data, for disk access.

Right, waiting on the user or the network, mostly.

Then you have the other kind,

CPU -bound programs.

Think video rendering software or maybe complex scientific simulations.

They tend to have fewer but much, much longer CPU bursts.

And knowing whether a program is IO -bound or CPU -bound,

that's key for scheduling it efficiently, I guess.

Absolutely crucial.

So when the CPU does become idle, say, a process finishes its CPU work and starts waiting for IO, that's the moment the CPU scheduler jumps in.

It's the part of the operating system that picks the next process to run.

Yeah, chooses from this pool of ready -to -go tasks, usually called the ready queue.

But don't picture just a simple line like first come, first served.

This queue can actually be a pretty sophisticated data structure.

Like a priority system or something even fancier.

Could be a priority queue or even like a balanced tree in some modern systems.

It allows for really quick decisions about who goes next.

Okay, so the scheduler picks, then what?

Then another unsung hero, the dispatcher, takes over.

It does the actual context switch.

Ah, the switch itself.

That involves saving everything about the current process.

Everything.

It's CPU registers, where it's in the program, the program counter, all its memory references, saving all that state.

And then loading the save state of the new process it's switching to.

Exactly.

And then literally jumping to where that new program left off.

Yeah.

The time this takes, the dispatch latency, needs to be incredibly fast.

Because it's happening, how often?

Oh, thousands, even millions of times a second on a busy system.

It's constant juggling.

You know, you can actually see this.

On Linux, you can use commands like VM stat.

You see the system -wide context switches per second.

It's, well, it's kind of wild to see those numbers fly by hundreds of thousands sometimes.

Which brings us to a really core concept.

Preemptive versus non -preemptive scheduling.

Okay, what's the difference there?

So non -preemptive scheduling is like, well, once a process gets the CPU,

it holds onto it.

It keeps it until it either finishes its whole CPU burst or it voluntarily gives it up, maybe to wait for IO.

Like a polite conversation.

You wait for the other person to finish their thought.

Kind of, yeah.

But you can see the problem right.

Imagine one program starting a really long calculation.

It could hog the entire CPU.

Your whole system would just freeze up for everything else.

That doesn't sound very responsive.

Exactly.

And that's why modern operating systems,

pretty much all of them, Windows, Mac OS, Linux,

use preemptive scheduling.

Meaning the OS can step in and take the CPU away.

Precisely.

The OS can interrupt a running process, even if it's not done with its burst.

Maybe a higher priority task just became ready or maybe the current process just used up its allocated time slice.

That preemption sounds absolutely vital for making things feel responsive.

Like when you click something, you want it to react now.

It is.

But it also introduces challenges.

Like what happens if you interrupt the process right in the middle of updating some shared data?

That leads to potential problems called race conditions.

Something we'd get into more in the next chapter's topic, synchronization.

Okay, a little sneak peek there.

So if you're designing an OS, you need to pick a scheduling algorithm.

How do you even judge if one is good?

What makes a good scheduler?

That's a great question, because good really depends on what you're trying to achieve.

It's a balancing act, truly.

There are different goals, often conflicting ones.

Okay, so what are the main things we measure?

Well, first, there's CPU utilization.

Basically, how busy is the CPU?

You wanna keep it as busy as possible, ideally.

In a real system, you might aim for somewhere between,

say, 40 % and 90%.

Makes sense.

Don't want it idle.

What else?

Throughput.

This is simply the number of processes completed per unit of time.

How many jobs finished per hour?

It measures how much work the system's actually getting done.

Okay, throughput.

Then there's turnaround time, right?

Yep.

That's the total time from when a process is submitted until it's completely finished.

It includes everything waiting to get memory, waiting in the ready queue, executing on the CPU, and doing IO.

So the whole lifespan of the process.

Exactly.

And closely related is waiting time.

This is just the amount of time a process spends waiting in the ready queue, specifically waiting for its turn on the CPU.

It's a component of turnaround time.

Got it.

And the last one feels important for users.

Definitely.

Response time.

This is crucial for interactive systems.

It's not about how long the entire task takes, but how quickly the system gives the first response after you make a request.

Like when you type a character, how fast does it appear on screen?

Or you click a button, how fast does the menu pop up?

Precisely.

That's what makes a system feel fast and responsive to you, the user.

Even if the total job takes longer.

And you said these goals often conflict.

Oh yeah.

For example, maximizing throughput might mean running long jobs first, which could kill response time for short interactive tasks.

Or minimizing response time might involve lots of context switching, which adds overhead and potentially reduces overall throughput and CPU utilization.

So it's a trade off.

And for interactive systems, maybe consistency in response time is key too.

Like you don't want it sometimes fast, sometimes slow.

Absolutely.

Minimizing the variance in response time is often just as important, if not more so than minimizing the average.

Predictability matters a lot to users.

Okay, let's talk algorithms then.

What are some of the classics?

The building blocks.

The simplest one is first come, first served, FCFS.

That's exactly what it sounds like.

Like a queue at the grocery store, first in line, first served.

Yep, processes get the CPU in the order they arrive in the ready queue.

Simple to implement.

But it has that problem we mentioned, the convoy effect.

Exactly.

Imagine a really long CPU bound process gets the CPU.

Behind it, you might have several short IO bound processes.

They all have to wait for the long one to finish its entire CPU burst.

Even if they just need the CPU for a tiny moment before going off to do IO again.

Right, so those short processes wait, and the IO devices they would have used sit idle.

It lowers overall CPU and device utilization.

FCFS is non -preemptive, remember, so that long process just keeps chugging along.

Not ideal for mixing long and short tasks.

So what tries to fix that?

Well, their shortest job first, or SJF.

This one seems intuitive.

Give the CPU to the process that has the shortest next CPU burst.

Okay, shortest next burst gets to go first.

Does that help with average waiting time?

It does more than help.

It's provably optimal for minimizing average waiting time.

That's a mathematical guarantee.

If you know the burst lengths, scheduling the shortest one next will always give you the best possible average wait across all processes.

Wow, optimal, but hang on.

How do you know the length of the next CPU burst?

You can't see the future, right?

That's the million dollar question.

You don't know for sure.

So SJF is theoretically optimal, but practically you have to predict the next burst length.

How do you predict it based on past behavior?

Usually, yes.

A common technique is using an exponential average of the previous measured burst lengths.

You calculate a weighted average giving more importance to recent bursts than older ones.

So you're guessing the future based on the recent past.

Pretty much.

There's a formula, 10 plus one is your prediction.

Tn is the latest actual burst length and time was the previous prediction.

I is a waiting factor, usually set to 12, meaning you give equal weight to the last burst and your previous prediction history.

Interesting.

So SJF tries to prioritize short tasks, but it needs this prediction mechanism.

Can it be preemptive?

Yes, it can.

The preemptive version is often called shortest remaining time first.

If a new process arrives with a predicted burst length shorter than the remaining time of the currently running process, the OS preempts the current one and runs the new shorter one.

Okay, that sounds more responsive.

Now, what about systems designed for many users interacting at once, like old time sharing systems or even modern desktops?

For that, round robin or RR is really the classic.

Think of it as FCFS, but with preemption based on time limits.

Time limits, like everyone gets a turn, but only for a short while.

Exactly.

Each process gets a small fixed unit of CPU time.

This is called a time quantum or time slice.

Usually it's somewhere between 10 and 100 milliseconds.

And if it doesn't finish its work in that time?

If its CPU burst is longer than the quantum, it gets interrupted, preempted and moved to the end of the ready queue.

Then the CPU dispatcher picks the next process from the front of the queue.

So everyone gets a little bit of CPU time regularly, keeps things feeling interactive.

That's the goal.

The size of that time quantum is super important though.

How so?

Well, if you made the quantum too large, say larger than most CPU bursts,

then RR just behaves like FCFS because preemption rarely happens.

Okay, it becomes non -preemptive again, effectively.

But if you make the quantum too small, you get way too many context switches.

Remember context switching takes time, it's overhead.

If the quantum is kindy, you might spend more time switching between processes than actually doing useful work.

Ah, the overhead kills performance.

So there's a sweet spot.

There is.

A common rule of thumb is to set the quantum so that maybe 80 % of CPU bursts are shorter than it.

This means most processes finish their burst without being preempted, minimizing context switches, but it's still short enough to provide good response time for interactive tasks.

Makes sense.

It balances responsiveness with efficiency.

Okay, what's next?

Priority.

Priority scheduling.

This one's pretty straightforward, conceptually.

Each process is assigned a priority number.

And the CPU goes to the process with the highest priority.

Exactly.

Higher priority runs first.

If you have multiple processes with the same highest priority, you might use round robin or FCFS among them.

You mentioned SJF is kind of like priority scheduling.

It is, yeah.

You can think of SJF as a priority algorithm where the priority is inversely related to the predicted next CPU burst.

Shorter burst means higher priority.

Okay, but simple priority scheduling has a big obvious potential problem, doesn't it?

It sure does.

Starvation,

or indefinite blocking.

A low priority process might just never get to run.

If there's a steady supply of higher priority processes, yes,

the low priority ones could wait forever.

Imagine a VIP line at a club where VIPs keep arriving.

The regular folks might never get in.

That sounds bad for fairness.

How do systems fix that?

The common solution is called aging.

It's a technique where you gradually increase the priority of processes that have been waiting for a long time.

Ah, so even if you start with low priority, if you wait long enough, your priority eventually gets high enough that you will get scheduled.

Precisely.

It ensures that even the lowest priority tasks eventually get a turn, preventing starvation.

It's a neat way to keep the benefits of priority, but add fairness.

Okay, these seem like fundamental building blocks.

Do real systems just pick one, or do they combine them?

They often combine them and use more complex structures, like multi -level queue scheduling.

Instead of one ready queue, you partition processes into several separate queues.

Based on what?

Like system tasks versus user tasks.

Exactly.

You might have a queue for real -time processes, one for system processes, one for interactive user processes, one for batch background jobs.

Each queue might even have its own internal scheduling algorithm, like RR for interactive, FCFS for batch.

And how do you schedule between the queues?

Typically, there's fixed priority scheduling between the queues, so the real -time queue gets served first.

Only if it's empty, does the system queue get a look in, and so on.

No process in a lower queue runs unless all higher queues are empty.

That sounds like it could still lead to starvation for the batch jobs if the system is busy.

It could, yeah.

Which leads to an even more flexible approach.

Multi -level feedback queue scheduling.

This is probably the most general.

Feedback.

What does that mean?

It means processes can move between the different queues.

It allows a scheduler to adapt.

How?

Based on their behavior?

Yes.

For example, if a process uses up too much CPU time in a higher priority interactive queue,

it might get demoted to a lower priority queue.

This pushes CPU -bound processes down.

And maybe if a process waits too long in a low priority queue.

It could get promoted back up to a higher priority queue.

That's aging built right into the queue structure.

It prevents starvation and tries to automatically classify processes based on whether they behave like interactive short bursts or batch long bursts tasks.

That sounds really sophisticated and adaptive.

Okay, so far we've mostly talked as if we're scheduling whole programs or processes, but modern systems use threads a lot, right?

Absolutely, and that adds another layer.

Modern OZs primarily schedule kernel -level threads, not entire processes.

So the scheduler's picking which thread runs next, not which process.

Right, and this interacts with how threads are managed.

Remember user -level threads versus kernel -level threads?

The scheduling happens differently, depending on the model.

Okay, how does that work?

We talk about contention scope.

It defines which threads are competing with each other for CPU time.

Competing within a single process or system -wide?

Both exist.

With process contention scope, PCS, the thread library schedules user -level threads onto the available kernel threads, or LWPs, within that same process.

The competition is internal to the process.

So the OS scheduler isn't even aware of these user -level threads.

Not directly, it just schedules the kernel thread that the user threads are running on.

Then there's system contention scope, SCS.

Here, the kernel schedules its kernel -level threads directly onto the physical CPUs.

The competition is among all threads in the entire system.

Which one is more common now?

Most general -purpose OCs, like Windows and Linux, primarily use system contention scope.

They schedule kernel threads directly.

Got it.

Now let's really get modern.

Multiple processors, multiple cores on a single chip.

That must change the scheduling game completely.

Oh, it does.

We enter the realm of multiprocessor scheduling.

Having multiple CPUs or cores means true parallelism is possible, but scheduling, it's way more complex.

Do all processors handle scheduling, or does one act as the boss?

Well, there used to be asymmetric multiprocessing, where one CPU, the master, handled all scheduling decisions in IO.

The others just executed code.

But that master CPU could become a bottleneck.

So that's not common anymore.

Not really.

The standard today is symmetric multiprocessing, SMP.

Each processor is self -scheduling.

They all might share a common ready queue, or more commonly now, each processor might have its own private ready queue.

Why private queues?

Does that help?

Private queues avoid the bottleneck of multiple processors trying to access and lock a single shared queue, but it introduces a new problem.

Which is?

Keeping the workload balanced across all processors.

Ah, load balancing.

Making sure one core isn't overloaded while another sits idle.

Exactly.

If you have private queues, you need mechanisms to move threads around to balance the load.

How does that work?

Pushing tasks or pulling them?

Both strategies exist.

Push migration involves a dedicated task that periodically checks the load on each processor and pushes threads from overloaded cores to less busy or idle ones.

And pull.

Pull migration happens when an idle processor actively pulls a waiting task from a busy processor's queue.

They often work together.

But wait, if you move a thread to a different processor,

doesn't that mess with its cache?

Excellent point.

That's the tension with processor affinity.

When a thread runs on a core, its data gets loaded into that core's fast cache memory.

We call that a warm cache.

Accessing that data again is super fast.

But if you move the thread.

You lose that warm cache.

The data has to be fetched from slower main memory or another cache, which takes time.

So operating systems try to maintain processor affinity, keep a thread on the same core if possible to benefit from the warm cache.

So load balancing is kind of at odds with processor affinity.

It is.

Load balancing usually only becomes necessary when the affinity benefits are outweighed by the imbalance or when using those per processor private queues makes it unavoidable.

Okay, cache affinity.

What about memory itself?

I've heard of NUMA systems.

Right, non -uniform memory access, NUMA.

This is common in systems with multiple physical processor chips.

Each chip might have its own block of memory attached directly to it.

And accessing that local memory is faster than accessing memory attached to a different chip.

Significantly faster.

There's a performance penalty for remote memory access.

So NEO may aware scheduling algorithms try hard to keep a thread running on a CPU that is close to the memory it's using most often.

They consider the memory placement when making scheduling decisions.

Wow, that adds another dimension.

And things get even more complex with multi -core chips themselves, right?

Like hyper -threading.

Yes.

So inside a single physical core, we often face the memory stall problem.

Processors are way faster than main memory.

They often have to wait for data to arrive, especially if it's not in the cache, cache miss, that's wasted time.

So how do they fight that wasted time?

By using chip multi -threading, CMT, or what Intel brands as hyper -threading, SMT simultaneous multi -threading.

The idea is to have multiple hardware threads per core.

Each hardware thread has its own architectural state, registers, program counter.

So the core can juggle multiple threads internally.

Exactly.

If one hardware thread stalls waiting for memory, the core can almost instantly switch execution to another hardware thread on that same core that is ready to run.

It keeps the core's execution units busy more often.

And from the operating system's point of view?

From the OS's perspective, each hardware thread looks like a separate logical CPU.

So if you have an eight -core chip with two -way SMT, the OS sees 16 logical processors it can schedule software threads onto.

So there are actually two levels of scheduling happening.

Kind of, yes.

The OS scheduler assigns software threads, kernel threads, to logical CPUs, hardware threads.

Then the processor core itself has its own internal hardware mechanism to decide which of its ready hardware threads to execute in any given cycle.

Maybe simple round robin or something more sophisticated based on urgency.

Fascinating.

Okay, one more modern variation.

Different kinds of cores on the same chip.

Yes.

Heterogeneous multiprocessing, HMP.

Think ARM's big little architecture, common in mobile phones.

We have some high -performance, power -hungry big cores and some slower, energy -efficient little cores.

Why would you want that mix?

Power efficiency.

You can intelligently assign tasks.

High -intensity work like gaming or maybe UI responsiveness goes to the big cores.

Background tasks, notifications, music playback, those can run on the little cores to save significant battery life.

And the OS scheduler needs to be aware of this difference.

Absolutely.

It needs to know which cores are big and which are little and schedule tasks appropriately based on their performance needs and the system's power goals.

Windows 10 and later versions actually support this.

Incredible complexity hidden under the hood.

Okay, let's shift gears slightly.

What about systems where timing is absolutely critical?

Real -time scheduling.

Right, this is where deadlines are paramount.

We usually distinguish between two types.

Soft and hard.

Exactly.

Soft real -time systems provide preference to critical real -time tasks, but they don't offer a strict guarantee that a deadline will always be met.

Think streaming video amidst frame is annoying, but the system doesn't fail catastrophically.

Okay, degrades gracefully maybe.

Whereas hard real -time systems have strict deadlines that must be met, missing a deadline constitutes a total system failure.

Like what?

Anti -lock breaks in a car,

flight control?

Precisely, industrial control systems, medical devices, situations where a timing failure has severe consequences.

So for these systems, what's the main focus for the scheduler?

Minimizing delays.

Minimizing latency and making it predictable.

There are different kinds of latency.

Event latency is the total time from when an event occurs, like a sensor reading, until the task servicing that event starts running.

Okay, event to service start.

Then there's interrupt latency, the time from when a hardware interrupt arrives at the CPU until the first instruction of the interrupt service routine begins executing.

It needs to be very short.

In dispatch latency, we mentioned that before, the time to switch from one process thread to another.

Right, in real -time systems, especially hard real -time, this dispatch latency needs to be minimized and crucially bounded.

You need to know the absolute maximum time it could possibly take.

How do real -time schedulers achieve this priority?

Yes, they almost always use priority -based preemptive scheduling.

The highest priority real -time task ready to run gets the CPU immediately, preempting anything else.

Are there specific algorithms for real -time?

Yes, for tasks that need the CPU periodically at fixed intervals, a classic algorithm is rate monotonic scheduling, RMS.

Rate monotonic.

Sounds like it depends on how often the task runs.

It does.

It assigns priority statically based on the task's period.

Shorter period, meaning it needs to run more frequently, gets higher priority.

It's simple and effective.

Is it optimal?

It's optimal among static priority algorithms.

If a set of periodic tasks can be scheduled using any fixed priority assignment, RMS will be able to schedule it, but it's not guaranteed to use 100 % of the CPU efficiently.

What do priorities need to change?

Then you might use earliest deadline first, EVS scheduling.

This assigns priorities dynamically.

Based on?

The task with the nearest upcoming deadline gets the highest priority.

Priorities are constantly reevaluated as tasks complete or new tasks arrive.

And EDF is better?

Theoretically, yes.

EDF is optimal in the sense that it can schedule any set of tasks as long as their total CPU utilization requirement doesn't exceed 100 % under ideal conditions.

It can achieve higher CPU utilization than RMS.

So dynamic priorities based on deadlines seems powerful.

It is, though it could be slightly more complex to implement than RMS.

There are other approaches too, like proportional share scheduling, where you guarantee each task a certain fraction and T of the total processor time over some interval.

Okay, let's bring it back to the OSes we use every day.

How does, say, Linux handle scheduling?

Does it use these classic algorithms?

Linux has evolved its scheduler quite a bit.

The current default scheduler for normal tasks is the completely fair scheduler, CFS.

Completely fair.

Sounds ambitious.

How does it work?

It's pretty clever.

Instead of using fixed time slices in the traditional round robin sense, CFS tries to give each task a fair proportion of the processor's time.

Based on what?

Priority.

Based partly on a nice value.

Lower nice value means higher priority, meaning it deserves a larger proportion of the CPU.

But the core mechanism is based on something called runtime,

or virtual runtime.

Virtual runtime, what's that?

Think of it as a task's personal stopwatch that only runs when that task is actually executing on the CPU.

CFS simply always picks the runnable task that has the lowest runtime.

So the task that has run the least so far in this virtual sense gets to run next.

Exactly.

It naturally balances CPU time among tasks according to their nice values.

Tasks that sleep a lot, like IO -bound interactive tasks, will naturally have lower runtime when they wake up so they get priority, leading to great responsiveness.

That sounds elegant.

How does it keep track of all these runtimes efficiently?

Not just a simple queue, I guess.

No, definitely not.

CFS uses a red -black tree, which is a type of self -balancing binary search tree.

The tasks are ordered in the tree by their runtime.

Finding the task with the minimum runtime, the next one to run, just means finding the leftmost node in the tree, which is very fast, even with thousands of tasks.

Wow, a tree structure for the ready queue.

And it handles NUMA and load balancing, too.

Yes, CFS is NUMA -aware and integrates with Linux's load balancing mechanisms, which use scheduling domains, higher piece of cores and caches to try and balance load at the most local level possible first.

Okay, what about Windows?

How does it approach scheduling?

Windows also uses a priority -based preemptive scheduler.

It has a 32 -level priority scheme.

Priorities 115 are variable priority classes for normal threads, and 1631 are real -time priorities.

32 levels, how does it manage responsiveness?

Windows uses dynamic priority boosting.

When certain events happen, like completing an IO request, especially keyboard or mouth input, or when a window comes to the foreground, the scheduler temporarily boosts the priority of the relevant threads.

Ah, so the thread handling your typing gets a boost to feel snappy.

Exactly.

It also gives longer time quantums to foreground processes compared to background ones.

And like Linux, it considers processor affinity and tries to keep threads on their ideal processor when possible.

Okay, so both use sophisticated priority schemes with tweaks for responsiveness.

It seems like choosing the right algorithm depends heavily on the goals.

How do OS developers actually evaluate and choose?

That's the final piece of the puzzle,

algorithm evaluation.

And yeah, choosing the best depends entirely on the criteria you care about most for that specific system, throughput, response time, fairness, power consumption, et cetera.

So how do they test them out?

Run benchmarks.

There are several methods.

One is deterministic modeling.

You take a predefined sequence of CPU bursts and IO times, like a specific workload trace, and you manually calculate how each algorithm would perform for that exact input.

Simple, but only tells you about that one specific case, right?

It's exactly.

It's fast, but not very general.

For more general insights, you can use queuing models.

These use mathematical formulas based on statistics,

arrival rates of processes, distributions of CPU burst lengths.

Little's formula, N -A -K -A -W, is a classic example relating average queue length, arrival rate, and waiting time.

So more theoretical, using math and probability.

Right.

It can give you good ballpark estimates and identify potential bottlenecks, but the math can get very complex for realistic systems, and you often have to make simplifying assumptions.

What else?

Simulations.

Simulations are very common.

You write a program that models the computer system, the CPU, the ready queue, the devices.

You can then feed this simulation either randomly generated data based on distributions, or even better, trace files.

Trace files, like recordings of a real system.

Exactly.

You instrument a real system to record major events, process arrivals, CPU bursts, IO requests.

Then you feed this real world trace data into your simulator running different scheduling algorithms to see how they would have performed.

That sounds much more realistic.

Any downsides?

Well, simulations can be complex and time consuming to build and run accurately.

And even trace data only reflects the past.

The system might behave differently with a new scheduler.

Which leaves just trying it out for real.

Implementation, that's the ultimate test.

Code the algorithm into the actual operating system kernel and run it.

Measure its performance in a live environment with real users and applications.

That sounds like the most accurate, but also the most expensive and disruptive way.

It is.

And there's a final interesting twist.

The system's behavior, and even user behavior, can change in response to the scheduler itself.

How so?

Well, if users figure out the scheduling rules, they might try to gain the system.

For example, if they know that processes performing terminal IO frequently get higher priority boosts.

A programmer might add some meaningless screen output just to make their long running calculations seem more interactive and get more CPU time.

Precisely.

It's a fascinating feedback loop between the system's design and how people adapt to use it, or sometimes abuse it.

It really is an intricate dance, isn't it?

We've covered a lot from those basic CPU IO bursts through all the classic algorithms like FCFS, SJF, round robin, priority.

Right up to the complexities of modern multi -core scheduling, NUMA, hyper -threading, real -time demands, and how systems like Linux and Windows actually put these ideas into practice with things like CFS and priority boosting.

It's genuinely a testament to the incredible amount of sophisticated engineering that's hidden just beneath the surface of every single click and swipe we make.

Absolutely.

It just works most of the time without us even thinking about it.

So here's a final thought to leave you with.

Knowing how meticulously your operating system juggles all those tasks,

prioritizing, preempting, trying to keep everything responsive, what unprioritized background processes might be running in your own life, may be secretly impacting your overall response time.

Hmm, that's definitely some food for thought.

Thank you for joining us on this Deep Dive.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Efficient allocation of processor time among competing processes lies at the heart of operating system performance, and CPU scheduling algorithms determine which ready process receives execution at any given moment. The fundamental objectives driving scheduling decisions include maximizing processor utilization and system throughput while simultaneously reducing turnaround time, waiting time, and response time for individual processes—objectives that often create unavoidable trade-offs requiring careful design choices. Operating systems employ three hierarchical scheduling levels: long-term scheduling governs which jobs enter the system, medium-term scheduling manages process suspension and resumption based on memory constraints, and short-term scheduling makes the direct CPU allocation decisions. The discipline encompasses several major algorithmic approaches, each with distinct characteristics and performance implications. First-Come, First-Served scheduling processes jobs in arrival order but can suffer from poor average response times when short jobs queue behind long ones. Shortest Job First minimizes average waiting time but faces practical challenges in predicting job duration; this algorithm may operate with or without preemption, allowing the scheduler to interrupt running processes. Priority scheduling assigns preference levels to processes, though this approach risks starvation of lower-priority work. Round Robin scheduling allocates fixed time slices to each process in rotation, ensuring fairness and responsiveness. More sophisticated approaches like multilevel queue scheduling partition processes into distinct categories, while multilevel feedback queue scheduling dynamically reassigns processes among queues based on observed behavior. Evaluation of these algorithms uses deterministic modeling for precise analysis, queuing theory for probabilistic behavior, and simulation for complex scenarios. Modern systems must address multiprocessor scheduling challenges including load balancing across processors and processor affinity to exploit cache locality. Real-time scheduling introduces additional complexity by distinguishing between hard deadlines requiring guaranteed completion and soft deadlines permitting occasional lateness. Practical implementations in Windows, Linux, and dedicated real-time operating systems demonstrate how theoretical scheduling principles translate into concrete system designs balancing efficiency, predictability, and responsiveness.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 5: CPU Scheduling: Algorithms, Multi-Processor, and Real-Time Scheduling

Related Chapters