Chapter 4: Threads & Concurrency: Multithreading, Libraries, and Implicit Threading

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook, and may not be redistributed or resold.

For complete coverage, always consult the official text.

Have you ever used an app that just feels magical,

like your web browser loads images while you're still typing,

or your word processor checks spelling in the background without missing a beat?

Oh, definitely.

Or streaming video super smoothly while your phone's downloading some huge update in the background.

Exactly.

It feels like all these things are happening at once and everything stays responsive.

How does that work?

It absolutely does feel like magic sometimes.

And that seamless experience, that feeling of responsiveness, often comes down to something really fundamental in how our computers work,

multi -threading.

Multi -threading.

Yeah, it's kind of an unsung hero, always working away behind the scenes, making your digital life smooth.

So today we're doing a deep dive into threads and concurrency.

This is a critical concept, really, and we're drawing from the classic operating system concepts by Silvershots, Galvin, and Gunn.

A foundational text.

Absolutely.

And our mission here is to, well, unpack these complex ideas into insights you can actually use, give you a shortcut to understanding how your devices juggle all those tasks simultaneously.

Yeah, get ready for some aha moments, hopefully.

We'll explore why modern systems rely so much on threads, some of the challenges they bring up for developers, and how different operating systems and programming languages actually manage all this stuff.

Okay, let's get into it.

Let's unpack this idea of a thread.

At its core, what is a thread?

What are we talking about here?

A thread is basically the fundamental unit of CPU utilization.

Think of it like a lightweight worker inside a bigger job.

A worker.

Okay.

Yeah, and each thread has its own unique ID, its own program counter that's like its bookmark, telling you where it is in the code,

a set of registers for, you know, temporary data, and its own stack for local variables and function calls.

Okay, so it's a worker with its own little toolkit.

But here's where it gets, well, really interesting for me.

How is that different from a traditional process?

Because I usually think of launching programs or processes.

That's a great question, and it's a crucial distinction.

And so imagine a traditional process like, say, firing up a simple calculator app.

It usually has just one main line of execution, one thread, and its own dedicated, isolated memory space.

Everything it needs is self -contained.

Right, walled off.

Exactly.

But with multiple threads within the same process, they actually share a lot.

They share the program's instructions, the actual code, they share global variables, what the book calls the data section, and other OS resources like open files and signals.

Okay, so instead of completely separate things, a multi -threaded process is more like a team working together.

Precisely.

Think of a traditional process as one person doing a big job, maybe editing a huge photo album, painstakingly going through one picture at a time.

Slow work.

I'm picturing it.

Now, a multi -threaded process is like that same person suddenly getting a team of assistants.

They're all in the same office.

That's the process boundary.

They're sharing the main resources, the photo albums, the editing software.

That's the shared code and data.

But each assistant, each thread, has their own specific task list.

Maybe one is cropping, one's color correcting, and another's tagging photos.

They each have their own little notepad and workspace, their stack and registers.

But they're collaborating on the overall job.

That analogy really helps.

The textbook, Figure 4 .1, shows this visually right.

A single box for the old way.

And then a box with shared code data files,

but multiple little stacks register SPCs for the threads.

Exactly that visual.

It shows the shared context versus the individual execution context of each thread.

So if that's the structure, why do we do this?

What are the real -world motivations for using threads?

Oh, the benefits are huge, and it boils down to a few key things.

First, and maybe most noticeable for users, is responsiveness.

Ah, yes, the non -freezing app.

Right, if one part of your application gets bogged down doing something long, like saving a massive file or a complex calculation, that other parts, especially the user interface, can stay active.

That long operation runs in its own thread, so you can still click buttons, type text, whatever.

Your web browser is a perfect example.

One thread shows you stuff, another fetches data.

Keeps things snappy.

Totally.

Or think of a word processor.

One thread for your keystrokes, another maybe doing spellcheck in the background, a third rendering graphics.

All happening concurrently without slowing you down.

Okay, responsiveness.

Makes sense.

What else?

Second big one.

Resource sharing.

Because threads in the same process share memory and resources by default, it's just much more efficient.

How so?

Well, compared to having separate processes try to communicate, that usually involves more complex mechanisms like shared memory setup or message passing, which can be slower and harder for programmers to manage.

With threads, it's often as simple as reading and writing to the same variables, though.

Careful synchronization is needed, which we'll get to.

Easier collaboration within the office.

Exactly.

Which leads to the third benefit, economy.

Creating a brand new process is, computationally speaking,

quite expensive.

It needs its own memory map, its own file tables, all that stuff.

Creating a thread is much cheaper and faster because it reuses the existing process environment.

It just needs its own stack and registers,

essentially.

Switching between threads, context switching, is also generally faster than switching between full processes.

Less overhead.

More bang for your buck resource -wise.

You got it.

And finally, scalability.

This is huge with modern hardware.

You mean multiple cores.

Precisely.

Modern computers almost all have multiple processing cores, a single threaded application.

It can only run on one core, no matter how many you have sitting idle.

It's a waste.

Right.

But multi -threading lets a program split its work across multiple threads, and the operating system can schedule those threads to run in parallel on different cores.

You get true simultaneous execution.

That web server example in the book, Figure 4 .2, illustrates this well, doesn't it?

It does.

A single threaded server handles one client, then the next, then the next.

Lots of waiting.

But a multi -threaded server can spin up a new thread for each client request.

So it can handle tons of clients, concurrently.

Thousands.

And doing it with threads is way more efficient than creating a whole separate process for each client, which used to be a common approach, but is much heavier.

Even the OS kernel itself is typically multi -threaded, managing devices, memory, handling interrupts, all concurrently.

Okay.

So responsiveness, resource sharing, economy, scalability, those are some compelling reasons.

Now, you mentioned multi -core systems.

Yes.

The multi -core revolution has really pushed multi -threading to the forefront.

We went from single CPUs to multiple separate CPUs, and now we have multi -core chips, multiple processor cores, right on the same piece of silicon.

Standard everywhere now, from phones to servers.

Absolutely.

And this puts pressure on software developers.

You have to write software that can actually use all those cores effectively, otherwise you're leaving performance on the table.

Which brings us to a really key distinction the book makes.

Concurrency versus parallelism.

These terms get thrown around, sometimes interchangeably, but they're different, right?

They are crucially different.

Concurrency is about managing multiple tasks that are making progress over overlapping time periods.

Even on a single core, you can achieve concurrency.

The CPU just switches between tasks really, really fast.

Giving the illusion of doing things at once, like that single chef juggling multiple pans you mentioned earlier.

Figure 4 .3 in the book shows this interleaving on one core.

Exactly that.

It looks simultaneous, but it's actually interleaved execution.

Parallelism, on the other hand, means tasks are genuinely executing at the exact same time but on different processing cores.

That's the team of chefs, each working simultaneously in their own station.

Figure 4 .4 shows tasks running side -by -side on multiple cores.

Precisely.

Parallelism requires hardware with multiple cores.

Concurrency can happen even with just one.

You need concurrency to achieve parallelism, but they aren't the same thing.

Got it.

But programming for these parallel, multi -core systems, it sounds hard.

It definitely introduces challenges.

Big ones.

First, you have to identify tasks.

Find parts of your program that can actually run independently.

Not everything can be parallelized.

Right.

Some steps depend on previous steps.

Then there's balance.

You want the parallel tasks to do roughly equal amounts of work.

If one task finishes way early, its core just sits idle while others are still crunching.

Wasteful again.

Yep.

You also need to think about data splitting, how to divide the data the tasks operate on.

And crucially, data dependency.

If one task needs the result from another task, you have to manage that dependency,

often requiring careful synchronization.

And I imagine testing and debugging is a nightmare.

Oh, it can be significantly harder.

Because threads can run in slightly different orders each time, bugs might appear only sometimes.

These are called race conditions and other concurrency bugs, and they can be incredibly tricky to find and fix.

That leads into Amdahl's law, which you touched on earlier.

It puts a theoretical limit on speedup, right?

It does.

Amdahl's law is fascinating.

It basically says the maximum speedup you can get by adding more cores is limited by the part of your program that has to run sequentially, the serial portion.

So even with infinite cores?

Even with infinite cores,

if, say, 10 % of your application absolutely must run serially, you can never get more than a 10x speedup.

It highlights that optimizing that serial bottleneck is often just as important, if not more so, than just throwing more cores at the problem.

A dose of reality for parallel computing.

The book also mentions different types of parallelism, like data parallelism and task parallelism in Figure 4 .5.

Right.

Data parallelism is about distributing subsets of the same data across multiple cores and performing the same operation on each piece.

Think of summing a huge array you could give the first half to Core 1 and the second half to Core 2, both doing the summing operation.

Okay.

Same job, different data chunks.

And task parallelism is about distributing different tasks or threads across cores.

Each thread might be performing a unique operation.

Maybe one thread filters data, another sorts it, another logs results.

They could be working on the same or different datasets.

Different jobs, potentially on different data.

And in practice, many applications use a hybrid of both approaches.

So we know why we use threads and the challenges.

How do operating systems actually support them?

You mentioned user threads and kernel threads.

So threads can be managed either by a library in user space, user threads, or directly by the operating system's kernel threads.

Pros and cons.

User threads are generally faster to create and manage because it doesn't involve the OS kernel directly.

But if one user thread makes a blocking system call like reading from a file, the entire process might block, including all other user threads within it, because the kernel only sees the single process kernel thread it's mapped to.

Also, they generally can't run in true parallel on multi -core systems if the kernel isn't aware of them individually.

Okay.

Efficient but limited.

Kernel threads, on the other hand, are managed by the OS.

The kernel knows about each thread.

So if one kernel thread blocks, the kernel can schedule another one from the same or a different process to run.

And crucially, kernel threads can be scheduled independently onto different CPU cores,

allowing true parallelism.

More powerful, maybe slightly more overhead?

Historically, yes, a bit more overhead.

But modern OCs like Windows, Linux, Mac OS, they all primarily use kernel threads because the benefits of true parallelism and better handling of blocking calls are just too significant.

And there's this relationship between them, the mapping.

Figure 4 .6 shows user threads connecting to kernel threads.

Right.

The user threads created by the programmer need to be mapped onto kernel threads to actually execute.

There have been a few models for this mapping.

Like the many -to -one model, Figure 4 .7.

Yeah.

Many -to -one mapped many user threads to just one kernel thread.

Efficient thread management in user space, but it suffered from that blocking problem If one user thread blocked, they all blocked and no parallelism.

It's rarely used today.

Then the one -to -one model, Figure 4 .8.

That's the dominant one now.

Each user thread gets its own dedicated kernel thread.

This solves the blocking problem and allows true parallelism.

Linux and Windows primarily use this.

The main drawback is potentially creating a lot of kernel threads if the application creates tons of user threads, but OS designers have optimized this well.

Makes sense.

There was also many -to -many, Figure 4 .9, and the two -level model, Figure 4 .10.

Yes.

Many -to -many tried to get the best of both worlds, multiplexing many user threads onto a smaller pool of kernel threads.

It's flexible, but complex to implement correctly.

The two -level model was similar, but allowed binding a specific user thread to a kernel thread if needed.

These models saw some use, but are less common now than the straightforward one -to -one approach on major systems.

So if I'm a programmer, how do I actually use these threads?

You use thread libraries.

These provide the API, the application programming interface, the set of functions you call to create, manage, and synchronize threads.

Like toolkits for developers.

Exactly.

The three main ones the book covers are POCxThreads, which is a standard widely used on UNX -like systems like Linux and macOS.

Then there's the Windows Threads API, specific to Windows, and Java Threads, which are part of the Java platform and run on any system with a Java virtual machine, JVM.

And they let you do things like create a thread, wait for it to finish.

Threads has functions like threadCreate to start a new thread and threadJoin to wait for it.

Windows uses createThread and waitForSingleObject.

Java has its thread class with start and join methods.

The book has examples like summing integers from 1 to n, figure 4 .11 for threads, 4 .13 for Windows.

Yeah, those examples show the basic structure.

Define a function for the thread to run, create the thread passing at that function, and then have the main thread wait for the worker thread to complete using the join mechanism.

And Java has a couple of ways, extending thread or implementing runnable.

Implementing the runnable interface is generally preferred now.

You define the task in the run method, create a thread object with your runnable, and call start.

Java 8 and later also have lambda expressions, making it even more concise.

And Java also has this executor framework, figure 4 .14.

Yes, that came in Java 1 .5.

It's a higher level approach.

Instead of managing thread objects directly, you submit tasks, runnable or callable if you need a return value, to an executor service.

This service often manages a thread pool, which leads us nicely into the next idea.

Which is implicit threading.

Okay, what's that about?

With modern apps potentially needing hundreds or even thousands of threads, managing each one manually becomes incredibly complex and error -prone.

Implicit threading is the trend where developers focus on identifying tasks that can run in parallel.

And the compiler or runtime library handles the messy details of creating, managing and scheduling the actual threads.

Taking the burden off the developer.

Exactly.

When the experts who wrote the library handle the tricky bits.

One major technique here is thread pools.

We touched on this with the executor framework.

Right.

The problem is, constantly creating a new thread for every little task and then destroying it is inefficient.

Creating and destroying threads has overhead.

So pool keeps them ready.

Precisely.

A thread pool pre -creates a fixed number of worker threads when the application starts.

When a task arrives, it's handed off to an idle thread in the pool.

If all threads are busy, the task waits in a queue.

When a thread finishes, it doesn't die.

It goes back to the pool, ready for the next task.

Much more efficient, especially for servers handling lots of requests.

Faster response, controlled resource use.

Absolutely.

Windows has APIs for this, queue user work item and Java's executor service, as we mentioned, Figure 4 .15 shows creating different pool types, is built around this concept.

What other implicit techniques are there?

Another important one is fork join.

This is often used for algorithms that naturally break down problems recursively, like certain sorting algorithms, quick sort, merge sort.

How does it work?

A main task forks by splitting its problem into smaller sub -problems and creating new sub -tasks to handle them.

It might do this recursively until the pieces are small enough to solve directly.

Then it joins by waiting for its sub -tasks to complete and combining their results.

Divide and conquer, basically.

Exactly.

Java has a dedicated fork join framework, figures 4 .1c and 4 .18 show this, that's really optimized for this.

It often uses the technique called work stealing.

Work stealing?

Sounds mischievous.

Hey, a little.

It just means that idle threads in the fork join pool can steal waiting tasks from the queues of other busy threads.

It helps keep all the cores busy and balances the load automatically.

Very clever.

Cool.

Any other implicit approaches?

There's OpenMP.

This is used in C, C++, Ray, and Fortran.

It uses special compiler directives like hashtag pragma on parallel if you put in your code.

Legal instructions for the compiler.

Exactly.

You mark a block of code, like a for loop, and tell the compiler, hey, run this part in parallel.

OpenMP handles creating the threads, usually one per core, and distributing the loop iterations or tasks among them.

It makes parallelizing certain kinds of code remarkably simple.

Then there's Grand Central Dispatch, GCD.

That's Apple's thing, right?

Yes.

GCD is Apple's framework for macOS and iOS.

Again, developers define tasks often as blocks or closures.

They submit these tasks to dispatch queues.

Queues?

Yeah, there are serial queues.

Tasks run one after another, good for coordinating access to shared resources, like UI updates on the main queue, and concurrent queues.

Tasks run in parallel.

GCD manages a thread pool behind the scenes to execute tasks from the concurrent queues, often prioritizing based on quality of service hints provided by the developer.

So developers think about tasks and queues, not threads directly.

Pretty much.

And one more is Intel Thread Building Blocks, TBB.

It's a C++ template library.

Like OpenMP and GCD, you define tasks, and TBB's runtime scheduler maps them efficiently onto threads, handling load balancing, and even trying to be cache aware to improve performance.

Lots of ways to abstract away the complexity,

but even with these tools, there are still tricky issues, right?

The book calls them advanced threading issues.

Definitely.

For instance, how do the standard fork and exec system calls behave in a multi -threaded program when you fork to create a new process?

Does the new process get copies of all the parent's threads, or just the one thread that called fork?

Good question.

It actually depends.

Some systems allow choosing, often if the plan is to immediately call exec right after the fork, which replaces the entire process image anyway, it's more efficient to only duplicate the calling thread.

But if the new process needs to continue running with the same concurrency, duplicating all threads might be necessary.

Exec itself always replaces the entire process, including all its threads.

OK, subtle but important.

What about signal handling, like when you hit sterile plus C?

Signals are notifications sent to a process.

In a multi -threaded world, who gets the signal?

It depends on the type of signal.

A signal caused by a specific thread's error, like division by zero, usually goes just to that thread.

An external signal, like sterile plus C, might go to the whole process, may be delivered to one specific thread, or perhaps to all threads, or only those not explicitly blocking it.

Libraries like threads give you control over this.

Complicated.

And thread cancellation.

Stopping a thread early.

Right, like canceling a download.

There are two ways.

Asynchronous cancellation just kills the target thread immediately.

This is dangerous.

Why dangerous?

Because the thread might be halfway through updating shared data, or holding a lock, or might have allocated resources it hasn't cleaned up yet.

Killing it abruptly can leave the application in an inconsistent state, or lead to resource leaks.

So what's the alternative?

Deferred cancellation.

This is preferred.

The target thread periodically checks a flag to see if it should cancel itself.

If the flag is set, the thread cleans up nicely, releases locks, frees memory, and then terminates gracefully.

Thread supports this, and Java's interrupt mechanism works similarly by setting a status flag the thread needs to check.

Safer, definitely.

Then there's thread local storage.

TLS.

TLS is interesting.

We know threads share most data within a process, but sometimes each thread needs its own private copy of some data.

Not just a local variable inside a function, which disappears when the function returns, but something persistent for the life of the thread, yet unique to that thread.

Like an error code specific to that thread's operation, or a transaction ID?

Exactly.

TLS provides storage that is global in scope to the thread, but not shared between threads.

Many libraries and languages provide mechanisms for this.

And briefly, scheduler activations.

This sounds low -level.

It is.

It relates mainly to those many -to -many or two -level threading models we mentioned earlier.

It's about how the kernel communicates efficiently with the user -level thread library, especially when a thread is about to block.

The kernel uses an upcall to notify the library, allowing it to save the blocking thread state and schedule another ready user thread onto the now available kernel resource, often called a lightweight process, or LWP.

It's complex plumbing to make those hybrid models work well.

Okay, let's wrap up with the OS examples.

How does Windows handle threads?

Figure 4 .21 shows some data structures.

Windows uses the one -to -one model.

Each user thread has a corresponding kernel thread.

The key data structures are the E -thread, executive thread block, kernel space, KTUR, kernel thread block, also kernel space, handle scheduling, and the TAB, thread environment block in user space.

These structures hold the thread's context, like its ID, stacks, user and kernel, registers, and its thread local storage.

Pretty structured.

And Linux, you said it was different.

Linux is unique.

It doesn't fundamentally distinguish between processes and threads in the same way.

It just calls everything a task.

The magic is in the clone.

System call.

How so?

When you call clone, you pass flags specifying what resources the new task should share with the parent.

You can choose to share memory space, file system info, signal handlers, open files, etc., or none of them.

So clone can create something that looks like a traditional process, sharing nothing, or something that looks exactly like a thread, sharing almost everything.

Exactly.

It's incredibly flexible.

Internally, the task struct, the data structure for each task, uses pointers to share resources rather than duplicating them if sharing is requested.

This unified approach is very powerful and is also what enables technologies like Linux containers that use specific clone flags to create isolated environments that still share the underlying kernel.

Fascinating.

So Linux uses clone to implement what we conceptually think of as both processes and threads.

Essentially, yes.

It provides the billing blocks for both paradigms through one flexible mechanism.

Wow.

Okay.

We've really covered a lot of ground here.

We journeyed through what threads are, the huge benefits they bring, responsiveness, sharing, economy, scalability, the whole concurrency versus parallelism thing, the challenges for multi -core programming.

The different models for mapping user to kernel threads.

The libraries programmers use like threads, Windows, Java, and then the shift towards implicit threading with thread pools, a fork join, OpenMP, GCD, TBB.

And even tackled some of the tricky advanced issues like fork, signals, cancellation, TLS, and how Windows and Linux actually implement this stuff.

It's pretty amazing when you stop to think about all that hidden activity, all those threads constantly running inside our devices, making everything feel so seamless and powerful.

Absolutely.

From your web browser to the OS kernel itself, threads are truly the unsung heroes, the workhorses of modern computing.

They really are.

So here's a thought to leave folks with.

As multi -core systems become even more dominant and these implicit threading frameworks get smarter, pushing the complexity down into libraries and compilers,

how might that change software development itself?

Could we reach a point where programmers mostly just think about defining tasks and dependencies and almost never directly manage a thread object at all?

That's a really interesting question.

The trend certainly seems to be heading towards higher levels of extraction.

Definitely something to mull over.

Indeed.

Well, thank you for joining us on this deep dive into threads and concurrency.

My pleasure.

It was a great discussion.

This has been a Last Minute Lecture Production.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Threads represent the fundamental unit of CPU scheduling within a process, enabling concurrent execution of multiple control flows that share the same memory space and system resources. Each thread maintains its own thread identifier, program counter, register set, and stack while accessing shared code and data segments, creating opportunities for both efficient resource utilization and complex synchronization challenges. Multithreaded architectures provide significant advantages over single-threaded designs, including improved application responsiveness through concurrent operations, efficient resource sharing among threads, reduced overhead compared to multiple processes, and enhanced scalability on multiprocessor systems. The operating system implements threading at two distinct levels: user-level threads, managed entirely by user-space libraries without kernel intervention, offer low-context-switch overhead but cannot exploit true parallelism on multiprocessors, while kernel-level threads receive direct scheduling support from the operating system, enabling genuine concurrent execution but incurring greater overhead. Threading models establish different relationships between user-level and kernel-level threads, ranging from many-to-one configurations that map multiple user threads to a single kernel thread, to one-to-one models providing dedicated kernel support for each user thread, to many-to-many designs that balance flexibility with efficiency. Practical thread libraries including POSIX Pthreads, Windows threading APIs, and Java's threading framework provide standardized mechanisms for thread creation, synchronization primitives like mutexes and condition variables, and controlled termination. Modern operating systems employ implicit threading strategies such as thread pools that reuse previously created threads, OpenMP frameworks for parallel loop distribution, and Grand Central Dispatch for automatic work queue management, reducing programmer burden while improving performance. Multithreaded programming introduces critical challenges including race conditions arising from unsynchronized access to shared data, deadlock situations from circular lock dependencies, and thread cancellation complexities. Successful multicore programming demands careful attention to thread safety mechanisms, appropriate synchronization strategies, optimal scheduling decisions, and memory consistency protocols that maintain correctness across multiple processors executing simultaneously.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥