Chapter 8: Email Forensics & Investigation Techniques

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

We take a stack of sources, break them down, and pull out the key knowledge you need, fast.

That's the idea.

Today we're diving into email forensics, specifically chapter 8 from Learn Computer Forensics 2nd edition.

A really crucial topic.

Email is just everywhere, isn't it?

Absolutely.

So understanding how it works behind the scenes, how investigators trace things, that's important for anyone in cybersecurity, or anyone just curious about digital evidence.

Exactly.

And we're going to demystify it.

We'll cover the whole chapter: protocols, headers, email clients, webmail, even a bit on the legal side.

Our mission.

Make sense of it all.

Plain and simple.

All right.

Let's jump in.

The chapter starts with the basics.

Email protocols.

What are these fundamentally?

Think of them as the agreed-upon rules, the standards that let different computers exchange emails smoothly.

Without them, it'd be chaos.

Okay.

And the first big one is SMTP, Simple Mail Transfer Protocol.

That's the one.

SMTP is purely for sending email.

It gets your message from your computer to your email server and then onwards, maybe through several other servers until it reaches the recipient server.

And this is all standardized, right?

Through RFCs?

Precisely.

RFCs, or Requests for Comments, are the technical blueprints for internet standards.

SMTP started with RFC 821, but it's been updated.

Now you're mainly looking at RFC 5321, which defines SMTP itself, and RFC 5322, which defines the message format.

They ensure everyone's speaking the same language, electronically speaking.

And it usually uses a specific port?

Yep.

Typically TCP port 25.

The book actually has Figure 8.1, which maps out that journey visually, showing the email hopping between SMTP servers.
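
To make that hand-off concrete, here is a minimal Python sketch of the commands an SMTP client sends when it passes one message to a server (by convention on TCP port 25). It is a simplification under stated assumptions: it only builds the client side of the dialogue, never opens a connection or reads the server's numeric replies, and the function name, hosts, and addresses are hypothetical.

```python
def smtp_client_commands(client_host: str, sender: str,
                         recipients: list[str], message: str) -> list[str]:
    """Build the client-side SMTP command sequence (RFC 5321) for one message.

    Simplified sketch: a real client waits for a reply code after each
    command and usually greets with EHLO to negotiate extensions.
    """
    commands = [f"HELO {client_host}", f"MAIL FROM:<{sender}>"]
    # One RCPT TO per recipient; this envelope can differ from the To: header.
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    commands.append("DATA")
    for line in message.splitlines():
        # Dot-stuffing: a body line starting with "." gets an extra "."
        # so it cannot be mistaken for the end-of-data marker.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")  # a line holding a single dot ends the message data
    commands.append("QUIT")
    return commands


for cmd in smtp_client_commands("client.example.com", "alice@example.com",
                                ["bob@example.org"],
                                "Subject: Hello\r\n\r\nHi Bob"):
    print(cmd)
```

The envelope sender given in MAIL FROM is what the final delivering server records as the Return-Path header, which is one reason that header can differ from the visible From address.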

Okay, so SMTP sends it out, but how do we actually get our email?

That's where POP3 and IMAP come in.

You got it.

POP3, Post Office Protocol version 3, is designed for downloading emails to your local inbox.

Just downloading?

Primarily, yeah.

Not for sending.

It lets you grab your mail and then you can read it, write replies offline if you want.

But here's a key forensic point.

Depending on your settings, POP3 can delete the email from the server once it's downloaded.

Oh, wow.

So the only copy might be on the user's computer.

Exactly.

If that "leave a copy on the server" option isn't ticked, the local machine is your only source for that evidence.

POP3 usually uses port 110.

Figure 8.2 in the book illustrates the SMTP sending, the POP3 download, and that crucial choice about leaving mail on the server.

Big difference from IMAP.
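
The deletion behavior can be seen in a short sketch built on Python's standard `poplib` module. It assumes `conn` is an already-authenticated `poplib.POP3` or `poplib.POP3_SSL` object; the function name and the `leave_on_server` flag are hypothetical. The point it illustrates: `dele()` only marks a message, and the server actually removes marked messages when the session ends with `quit()`.

```python
def pop3_download(conn, leave_on_server: bool = True) -> list[bytes]:
    """Download every message over an authenticated poplib connection."""
    count, _mailbox_size = conn.stat()
    messages = []
    for msg_num in range(1, count + 1):  # POP3 message numbers start at 1
        _response, lines, _octets = conn.retr(msg_num)
        messages.append(b"\r\n".join(lines))
        if not leave_on_server:
            conn.dele(msg_num)  # only flags the message for deletion
    # Flagged messages are removed when the session is closed with QUIT,
    # after which the downloaded copy may be the only one left.
    conn.quit()
    return messages
```

Against a live server you would typically create `conn = poplib.POP3_SSL(host)` and call `conn.user(...)` and `conn.pass_(...)` first; during an acquisition you would keep `leave_on_server=True` so the act of collecting does not destroy the server copy.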

Right, IMAP, Internet Message Access Protocol.

It also gets your mail, but differently.

Yeah, IMAP's more like a remote control for your mailbox.

It's designed for accessing and managing emails directly on the server.

So emails stay on the server by default?

That's the default behavior, yes, until you specifically delete them.

That's why you can check your email on your phone, then your laptop, and see the same stuff.

Everything syncs because it's all managed server-side.

So it's newer than POP3, but both are still used?

Definitely.

Both are very much still out there.

Figure 8.3 shows that SMTP-to-IMAP flow, emphasizing how the mail stays on the server.
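
Because IMAP keeps the mail on the server, simply collecting it can change server state, for example by marking messages as read. Here is a minimal sketch using Python's standard `imaplib`, assuming `conn` is an already-authenticated `imaplib.IMAP4_SSL` object; the function name is hypothetical and error handling is omitted.

```python
def imap_copy_readonly(conn, mailbox: str = "INBOX") -> list[bytes]:
    """Fetch every message in `mailbox` without altering server-side flags.

    `conn` is assumed to be an authenticated imaplib.IMAP4 / IMAP4_SSL object.
    """
    # readonly=True sends EXAMINE instead of SELECT, so the mailbox
    # cannot be modified during this session.
    conn.select(mailbox, readonly=True)
    _typ, data = conn.search(None, "ALL")  # data[0] looks like b"1 2 3"
    messages = []
    for num in data[0].split():
        # BODY.PEEK[] returns the full raw message without setting \Seen,
        # a second safeguard in case the mailbox was opened read-write.
        _typ, msg_data = conn.fetch(num, "(BODY.PEEK[])")
        messages.append(msg_data[0][1])
    return messages
```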

OK, POP3 downloads, maybe deletes.

IMAP manages remotely, keeps mail on the server.

Got it.

What about the other big way people get email: webmail, like Gmail or Outlook.com?

Ah, web-based email.

That's a slightly different model.

You use your web browser, right, and the provider (Google, Microsoft, Yahoo) hosts everything.

The servers, the software, your mailbox.

And deletion works differently there, too.

Typically, yeah.

When you delete something, it usually goes into a trash or deleted items folder.

It stays there for a set period, depending on the provider's policy, before they permanently delete it.

So recovery really hinges on their rules and timing.

Makes sense.

OK, protocols covered.

Now, the chapter gets into actually decoding emails, finding the clues.

Exactly.

This is where the investigation really starts digging in.

Every email has more than just the message text.

Investigators look for unique identifiers, the mailbox addresses, the domain names, and especially the message ID.

And those are key for legal steps.

Absolutely.

Those identifiers are often what you need to get a subpoena or a search warrant to compel a service provider to hand over information.

We all see the basic fields.

Subject, date, from, to.

The book gives an example about background checks.

Seems normal at first.

Right.

The example shows an email between Allison and Jean about an employee spreadsheet with social security numbers.

Looks like a routine request, but potentially sensitive.

But the chapter warns that even these basic fields, the ones the user fills in, aren't always reliable.

That's a crucial point.

The user creates the To, From, Subject, and body content, and the date and time are often pulled from the sender's system clock.

Which can be changed easily.

Exactly.

So while it's a starting point, you can't take it at face value.

You need to dig deeper into the email header.

The hidden layer.

You mentioned needing a special command like "Show original" in Gmail to even see it.

Yep.

It's usually tucked away.

But this header, it's like the email's travel log.

It contains all the technical details about the journey.

So let's break down that example header in the book.

What are the key parts?

You mentioned return path first.

Yeah, often near the top.

The return path is where bounce messages go if the email can't be delivered.

It can actually be different from the from address you see, especially with things like mailing lists.

In the example, it's simsong@xy.dreamhostps.com.

Okay.

Then there are those received lines.

The chapter says read them bottom up.

Seems counterintuitive.

It does.

But here's why.

Each server that handles the email adds its own Received line to the top of the header it received.

So the one at the very bottom.

That's the first server that touched the email after it was sent from the origin.

Ah.

Okay.

So reading upwards traces the path forward in time.

Exactly.

You're reconstructing the journey.
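
Reading the Received lines bottom-up is easy to automate. This sketch uses Python's standard `email` package; the function name and the sample header (with `example.*` hosts) are hypothetical. `get_all()` returns headers in the order they appear in the file, newest hop first, so reversing the list gives the path forward in time.

```python
from email import message_from_string


def received_chain(raw_message: str) -> list[str]:
    """Return the Received header values oldest-first (read bottom-up)."""
    msg = message_from_string(raw_message)
    # In file order, the last server to handle the message is on top.
    received = msg.get_all("Received", [])
    # Unfold multi-line headers into single lines for readability.
    return [" ".join(value.split()) for value in reversed(received)]


sample = (
    "Received: from relay.example.org (relay.example.org [198.51.100.7])\n"
    "\tby mx.example.net (Postfix) with ESMTP id 4F2A1;"
    " Sat, 19 Jul 2008 23:40:05 +0000\n"
    "Received: by host.example.com (Postfix, from userid 1000) id 9C1B2;"
    " Sat, 19 Jul 2008 23:39:57 +0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
for hop, line in enumerate(received_chain(sample), start=1):
    print(hop, line)
```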

Look at that first received line in the example.

Received: by xy.dreamhostps.com (Postfix, from userid 558838).

Postfix.

That's the email software.

Common server software.

And notice xy.dreamhostps.com and userid 558838.

That's huge.

Because you can potentially link that user ID back to an account.

Right.

The next step, as the chapter suggests, is often a subpoena to the ISP, DreamHost in this case, asking who had user ID 558838 active at that specific time.

It's a direct investigative lead.

Wow.

Just from that one line. And the lines above it show the next hops?

They show the subsequent servers, like smarty.dreamhost.com and spunkymail-mx8.g.dreamhost.com in the example.

Each line tells you which server received it ("by"), which server sent it ("from"), often the software used, a unique ID for that transfer, and a timestamp.

It paints a picture of the path across the internet.
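
Pulling those fields out of a single Received line can be sketched with regular expressions. This is a simplified, hypothetical parser: real Received formats vary a lot between mail servers, so treat its output as a starting point for manual reading. It strips parenthesized comments before matching "from"/"by"/"with"/"id" (so Postfix's "(Postfix, from userid ...)" is not mistaken for a "from" clause), but still reads the bracketed IP and the userid from the full text.

```python
import re


def _strip_comments(text: str) -> str:
    """Remove (possibly nested) parenthesised comments, innermost first."""
    while True:
        stripped = re.sub(r"\([^()]*\)", " ", text)
        if stripped == text:
            return stripped
        text = stripped


def parse_received(value: str) -> dict:
    """Split one Received header value into its common clauses (simplified)."""
    clauses, _, date = value.rpartition(";")  # the timestamp follows the last ";"
    if not clauses:  # no semicolon, so no timestamp present
        clauses, date = value, ""
    bare = _strip_comments(clauses)

    def first(pattern: str, text: str):
        match = re.search(pattern, text)
        return match.group(1) if match else None

    return {
        "from": first(r"\bfrom\s+(\S+)", bare),
        "by": first(r"\bby\s+(\S+)", bare),
        "with": first(r"\bwith\s+(\S+)", bare),
        "id": first(r"\bid\s+([^\s;]+)", bare),
        "ip": first(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", clauses),
        "userid": first(r"\buserid\s+(\d+)", clauses),
        "date": date.strip() or None,
    }


print(parse_received("by host.example.com (Postfix, from userid 1000) "
                     "id 9C1B2; Sat, 19 Jul 2008 23:39:57 +0000"))
```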

Very detailed.

What about the Message-ID?

The book stresses its uniqueness.

Think of it like a fingerprint for that specific email instance.

The first server that handles the outgoing email assigns this ID, and it's meant to be globally unique.

Globally unique, so no two emails should ever have the same one.

Ideally, no.

If you find duplicates, it might indicate a non-standard email system or, more concerningly, potential tampering.
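
A quick way to spot repeated IDs across a collection is to count them. This is a minimal sketch with a hypothetical function name. One caution: copies of the same message saved in several places (the sender's Sent folder and the recipient's Inbox, for instance) legitimately share an ID, so a hit is a prompt to compare the messages, not proof of tampering.

```python
from collections import Counter
from email import message_from_string


def repeated_message_ids(raw_messages: list[str]) -> dict[str, int]:
    """Return each Message-ID that occurs more than once, with its count."""
    ids = []
    for raw in raw_messages:
        # Header lookup is case-insensitive, so "Message-Id" also matches.
        message_id = message_from_string(raw).get("Message-ID")
        if message_id:
            ids.append(message_id.strip())
    return {mid: count for mid, count in Counter(ids).items() if count > 1}
```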

The example ID is long: 20080719233957.64c683b1dae@lexy.dreamhostps.com. And you can sometimes decode parts of it, like a date?

Often, yes.

20080719: likely July 19, 2008.

And 233957 is probably the time, 23:39:57 GMT.

So it embeds some useful metadata, much harder to fake than the from address.
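
Decoding that kind of embedded timestamp can be sketched as follows. The sketch assumes the Postfix-style layout `<YYYYMMDDHHMMSS.queueid@host>` from the example and reads the stamp as GMT, as the discussion does; the function name is hypothetical. Many systems generate random or differently structured IDs, in which case it returns `None`, and any decoded time should be corroborated against the Received timestamps.

```python
from datetime import datetime, timezone
from typing import Optional


def message_id_timestamp(message_id: str) -> Optional[datetime]:
    """Decode a leading YYYYMMDDHHMMSS stamp from a Message-ID, if present.

    Assumes the <stamp.queueid@host> layout and treats the stamp as UTC/GMT.
    """
    local_part = message_id.strip().strip("<>").split("@", 1)[0]
    stamp = local_part.split(".", 1)[0]
    if len(stamp) != 14 or not stamp.isdigit():
        return None  # not this format
    try:
        return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    except ValueError:  # digits, but not a real date (e.g. month 13)
        return None


print(message_id_timestamp("<20080719233957.ABC123@host.example.com>"))
```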

Incredible detail.

What about those optional X-headers?

What are they for?

Ah, the X-headers.

They're not part of the official email standard.

They're custom headers added by different email systems or software along the way.

For what kind of info?

All sorts.

Could be spam filter scores (X-Spam-Status), virus scan results, the specific mail client used (X-Mailer, for example PHPMailer), internal server IDs, maybe even user IDs for specific services like Mandrill (X-Mandrill-User).

You might also see an X-Originating-IP.

The sender's actual IP address?

Potentially.

But, as the chapter points out, be cautious.

Major providers like Gmail often replace the user's real IP with one of their own server IPs for privacy or security.

So you might not get the sender's home IP there anymore.
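
Listing the X-headers is a one-liner with Python's standard `email` package; the function name and the sample message below are hypothetical. As noted above, an X-Originating-IP value from a large provider may be one of the provider's own addresses rather than the sender's.

```python
from email import message_from_string


def x_headers(raw_message: str) -> list[tuple[str, str]]:
    """Return every X-* header (name, value) in the order it appears."""
    msg = message_from_string(raw_message)
    return [(name, value) for name, value in msg.items()
            if name.lower().startswith("x-")]


sample = ("From: a@example.com\n"
          "X-Mailer: PHPMailer\n"
          "X-Originating-IP: [198.51.100.23]\n"
          "Subject: hi\n\nbody\n")
print(x_headers(sample))
```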

Right.

And speaking of IPs, the distinction between public and private ones.

Why does that matter?

Big difference for tracing.

A public IP address can, in theory, be traced back to an ISP and potentially a subscriber account.

A private IP address, like 10.x.x.x or 192.168.x.x, belongs to an internal network, like your home Wi-Fi or an office network.

So it doesn't directly point to someone on the wider internet?

Exactly.

It tells you that the email likely originated from within a private network, but not which network unless you're already investigating inside that specific organization.

It's not routable on the public internet.
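
Sorting the addresses found in a header into public and private can be sketched with Python's standard `ipaddress` module. The function name is hypothetical, and only IPv4 is handled. Note that `is_private` is broader than the RFC 1918 ranges mentioned above: it also covers loopback, link-local, and documentation blocks, all of which are equally non-routable on the public internet.

```python
import ipaddress
import re

# Dotted-quad candidates; ipaddress validates them (IPv6 is not handled here).
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def classify_ips(header_text: str) -> dict[str, str]:
    """Label each IPv4 address found in header text as 'private' or 'public'."""
    labels = {}
    for candidate in IPV4_CANDIDATE.findall(header_text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 300.1.1.1 looks like an address but is not one
        # is_private includes 10/8, 172.16/12, 192.168/16 and other
        # non-routable blocks such as loopback and documentation ranges.
        labels[candidate] = "private" if address.is_private else "public"
    return labels
```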

That clarifies things.

We've dissected the header quite a bit.

What about attachments?

How does email handle pictures, documents, things like that?

That's where MIME comes in.

Multipurpose Internet Mail Extensions.

It's the standard that lets email handle more than just plain ASCII text.

So it allows attachments, different languages?

Exactly.

Non -ASCII characters, binary files like images or executables, even email structured with multiple parts.

You'll usually see MIME-Version: 1.0 in the header.

And you get other MIME headers, like Content-Type.

What does content type tell you?

It specifies the type of data in that part of the email, like text/html for formatted text or image/jpeg for a picture.

It tells the email client how to display that part.

And there's encoding involved too, Base64.

Yes, Content-Transfer-Encoding.

Binary files can't travel reliably through some older email systems.

So they need to be encoded into a text safe format.

Base64 is a very common way to do this, converting the binary data into ASCII characters.
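
That round trip can be checked with Python's standard `base64` module. This is a sketch with made-up data: 256 arbitrary bytes stand in for an attachment. `encodebytes()` wraps its output at 76 characters per line, the limit RFC 2045 sets for encoded lines in MIME bodies.

```python
import base64

attachment = bytes(range(256))  # stand-in for binary file content, e.g. an image

# Encode as MIME does: base64 alphabet only, lines wrapped at 76 characters.
encoded = base64.encodebytes(attachment)
print(encoded.splitlines()[0])

# Decoding restores the original bytes exactly; the line breaks are ignored.
decoded = base64.b64decode(encoded)
print(decoded == attachment)

# Every 3 input bytes become 4 output characters (before line breaks).
print(len(attachment), "bytes ->", len(encoded.replace(b"\n", b"")), "characters")
```

The roughly one-third size growth is why an attachment's size in a mail store is larger than the recovered file.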

So the attachment gets turned into text, sent, then turned back into the original file?

Essentially yes.

And if an email has multiple parts, like text and an attachment, MIME defines boundary strings, often containing a word like "Part", to separate them.

Each part gets its own Content-Type and Content-Transfer-Encoding headers.

It's like a container with labeled compartments.
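
Walking those labeled compartments can be sketched with Python's standard `email` package. The function name and the sample message are hypothetical. `walk()` visits every part, the multipart containers are skipped, and `get_payload(decode=True)` undoes each part's Content-Transfer-Encoding so the attachment bytes can be hashed or saved.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Optional


def list_mime_parts(raw: bytes) -> list[tuple[str, Optional[str], int]]:
    """List (content type, filename, decoded size) for each leaf MIME part."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart/* containers only wrap other parts
        # decode=True undoes the Content-Transfer-Encoding (e.g. base64).
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get_filename(), len(payload)))
    return parts


# Hypothetical sample: a text body plus a small binary attachment.
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample["Subject"] = "Spreadsheet"
sample.set_content("See attached.")
sample.add_attachment(b"\x00\x01\x02binary", maintype="application",
                      subtype="octet-stream", filename="data.bin")
raw_bytes = sample.as_bytes()
for entry in list_mime_parts(raw_bytes):
    print(entry)
```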

Fascinating how it's all structured.

Okay, let's move on to where this email data actually lives on our computers: client-based email analysis, programs like Outlook or Thunderbird.

Right.

These are the applications many people use to manage their email.

Outlook, or previously Outlook Express, is super common, often pre-installed.

Thunderbird's a popular free open source alternative.

And forensic analysis involves looking at the files these programs create.

Pretty much.

There are two main approaches.

You can export the client's data file, the container holding all the emails, and open it using the same email client software installed on a dedicated forensic computer.

Or, more commonly now, you use specialized forensic software suites.

Like EnCase or FTK?

Exactly.

Tools like those are designed to understand and parse the file formats used by most major email clients automatically.

And those formats differ.

Outlook uses PST files.

Correct.

Microsoft Outlook mainly uses .PST files, personal storage tables.

They contain emails, calendar items, contacts, everything.

Usually found in the user's app data folder.

What about .ost files?

.ost files are offline storage tables.

They're local caches used when Outlook connects to a Microsoft Exchange server.

Found in a similar location.

There are also .edb files, but those are typically on the Exchange server itself in corporate setups.

And you don't need Outlook running to analyze a PST or OST file?

Nope.

The forensic tools can access them directly.

One thing to watch out for, though: these files can get huge.

If you're trying to recover a deleted one from unallocated space, it might be fragmented, making recovery harder.
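
A common first step when hunting for a deleted Outlook data file in unallocated space is a signature search. The sketch below assumes the raw bytes are already in memory and uses the `!BDN` magic value that begins PST (and OST) files; the function name is hypothetical. A hit only marks a candidate header: because large PST files are often fragmented, the bytes that follow may not belong to the same file.

```python
PST_SIGNATURE = b"!BDN"  # first 4 bytes of a .pst/.ost file header


def candidate_pst_offsets(image: bytes, sector_size: int = 512) -> list[int]:
    """Return sector-aligned offsets where the PST/OST signature occurs."""
    offsets = []
    pos = image.find(PST_SIGNATURE)
    while pos != -1:
        # Files begin at the start of a cluster, hence on a sector boundary,
        # so unaligned hits are most likely embedded data, not file headers.
        if pos % sector_size == 0:
            offsets.append(pos)
        pos = image.find(PST_SIGNATURE, pos + 1)
    return offsets
```

On a real disk image you would scan in chunks (taking care with matches that straddle chunk boundaries) rather than load the whole image into memory.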

Also, Outlook Express isn't really a thing anymore.

It was replaced by Windows Live Mail.

Ah, Windows Live Mail.

How did that store emails?

Differently?

Yes, quite differently.

Windows Live Mail, which came with Windows Vista and 7, stored each email as an individual .eml file.

Oh, separate files for each message.

These .eml files are basically plain text files, formatted according to email standards.

They're usually stored in the Windows Live Mail folder under the user's app data.

Figure 8.4 shows this structure.

The benefit is that .eml files are easily readable by many tools, even simple text editors.

Though Windows Live Mail itself is also discontinued now, replaced by the Windows Mail app in Windows 10 and 11, which is more cloud-focused.
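
Because each .eml file is a complete standard message, reading one takes only Python's standard `email` package. This is a minimal sketch; the function name and the returned field selection are hypothetical choices.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def summarize_eml(path: Path) -> dict:
    """Read one .eml file and return its key header fields and text body."""
    with path.open("rb") as fh:  # read-only access to the file
        msg = BytesParser(policy=policy.default).parse(fh)
    summary = {name: msg[name]
               for name in ("From", "To", "Subject", "Date", "Message-ID")}
    # Prefer the plain-text body, fall back to HTML; None if neither exists.
    body = msg.get_body(preferencelist=("plain", "html"))
    summary["body"] = body.get_content() if body is not None else None
    return summary
```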

Gotcha.

What about Thunderbird?

The open source one.

Thunderbird uses a format generally referred to as mbox.

It's actually a family of related formats, but the key idea is that all emails within a single folder, like Inbox or Sent, are stored together in one large text file.

One big file per folder.

Where are those stored?

Usually in the user's AppData\Roaming directory, inside a Thunderbird Profiles folder.

Figure 8.5 shows this.

You might also find crash dumps or calendar data in there.

And for each MBOX file, there's another file, an MSF file.

Right.

For every folder file, like Inbox, with no extension (that's the mbox file), there's a corresponding .msf file.

MSF stands for Mail Summary File.

An index?

Exactly.

The MSF file contains header information and summaries for the emails in the MBOX file.

Thunderbird uses it to quickly find and display messages without parsing the entire large MBOX file every time.

Forensic tools, like X-Ways, shown in Figure 8.6, can read the mbox file and extract the individual emails, often as .eml files again.

And the MBOX format is used by other clients, too, like Apple Mail, so most forensic suites handle it.
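
Splitting an mbox folder file into individual .eml files can be sketched with Python's standard `mailbox` module; the function name and output naming scheme are hypothetical, and the .msf index is not needed for this. Since `mailbox.mbox` opens the file for reading and writing when it can, run this on a working copy of the evidence file, never on the original.

```python
import mailbox
from pathlib import Path


def export_mbox(mbox_path: Path, out_dir: Path) -> list[Path]:
    """Write each message in an mbox file (e.g. Thunderbird's Inbox) to .eml."""
    out_dir.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(str(mbox_path), create=False)
    written = []
    try:
        for index, key in enumerate(box.keys(), start=1):
            # get_bytes() returns the message without its "From " separator line.
            raw = box.get_bytes(key)
            target = out_dir / f"{index:05d}.eml"
            target.write_bytes(raw)
            written.append(target)
    finally:
        box.close()
    return written
```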

Okay, so understanding PST, OST, EML, MBOX, MSF.

That's key for finding email evidence on a local drive.

Now, the other big area.

Webmail analysis.

This seems trickier, since the mail isn't stored locally in the same way.

It definitely presents different challenges.

Webmail is super convenient, access from anywhere.

But forensically, yeah, the bulk of the data, the emails themselves, address books, resides on the provider's servers.

Google, Microsoft, Yahoo, etc.

So analysis shifts to the user's computer, looking for traces of web activity.

Precisely.

It becomes more about analyzing internet artifacts.

What did the browser store?

Think temporary internet files, cache, history.

Sometimes people do use a client like Thunderbird to access Gmail via IMAP, but often they just use the browser.

This means evidence on the local machine might be limited to fragments and browser data.

And getting the actual email content usually requires legal process served on the provider.

Generally yes.

In the U.S., typically a search warrant is needed to compel providers like Google or Microsoft to turn over the contents of a user's mailbox.

And recovering deleted emails depends entirely on that provider's retention policies and technical capabilities.

It's not always possible.

So on the user's machine, you're hunting in the browser cache.

Cache, history, sometimes even RAM if you can capture it live.

Or the system's page file.

The browser cache stores bits of web pages (images, scripts, text) to speed up loading next time.

But the chapter mentions Gmail's early use of AJAX made reconstructing full emails from the cache difficult.

That's right.

Modern web applications like Gmail load content dynamically.

They don't necessarily save a complete, static HTML file of your inbox or an individual email to the cache in a way that's easy to just double click and view.

So you might find fragments, keywords.

Exactly.

You might find snippets of text, email addresses, or subject lines by keyword searching the cache files, RAM, or page file.

It requires more digging.
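
That digging usually starts with pattern and keyword searches over raw bytes. Here is a minimal sketch; the function names and the deliberately loose address pattern are hypothetical. It searches for each keyword in both UTF-8 and UTF-16LE, because Windows memory and page files often hold text in UTF-16LE, which a plain ASCII search would miss.

```python
import re

# Loose pattern for address-like strings; expect some false positives.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, address) pairs for address-like strings in raw bytes."""
    return [(m.start(), m.group().decode("ascii")) for m in EMAIL_RE.finditer(blob)]


def find_keyword(blob: bytes, keyword: str) -> list[tuple[int, str]]:
    """Case-insensitive keyword hits as (offset, encoding), in offset order."""
    hits = []
    for encoding in ("utf-8", "utf-16-le"):
        pattern = re.compile(re.escape(keyword.encode(encoding)), re.IGNORECASE)
        hits += [(m.start(), encoding) for m in pattern.finditer(blob)]
    return sorted(hits)
```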

The book shows examples with Chrome and Firefox.

For Chrome, the history database is useful.

Very useful.

It's an SQLite database, typically located in the User Data folder.

Figure 8.7 shows it can log Gmail access times, sometimes even unread counts.

It proves access.

The Chrome cache itself, as Figure 8.8 suggests, is harder to make sense of directly, but...

But it might contain clues.

It can.

The example shows finding a JSON snippet in the cache that revealed other email addresses linked to the user, like badguy27@yahoo.com.

Still not the email content, but another lead.
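
Querying Chrome's History database for webmail visits can be sketched with Python's standard `sqlite3` module. Assumptions to verify against the Chrome version at hand: the `urls` table with `url`, `title`, `visit_count`, and `last_visit_time` columns, and timestamps stored as microseconds since 1601-01-01 UTC. The function name is hypothetical. Chrome locks the file while running, so work on a copy; page titles in the results are one place unread counts can show up.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Chrome stores times as microseconds since 1601-01-01 UTC ("WebKit" time).
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_visits(history_db: Path, needle: str = "mail.google.com") -> list[tuple]:
    """List URL rows matching `needle` from a copy of Chrome's History file."""
    uri = history_db.resolve().as_uri() + "?mode=ro"  # open read-only
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT url, title, visit_count, last_visit_time FROM urls "
            "WHERE url LIKE ? ORDER BY last_visit_time",
            (f"%{needle}%",),
        ).fetchall()
    finally:
        con.close()
    return [(url, title, count, WEBKIT_EPOCH + timedelta(microseconds=ts))
            for url, title, count, ts in rows]
```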

And Firefox, similar story, cache and history in the profile folder.

Yes, Firefox stores its cache (the cache2 folder) and history (another SQLite file, places.sqlite) within the user's profile directory under AppData.

Again, Figures 8.9 and 8.10 illustrate the folder structure and the somewhat opaque nature of the raw cache data, but it can provide those investigative breadcrumbs.
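
The same approach works for Firefox's places.sqlite, here listing every individual visit rather than just the last one. Assumptions to verify for the Firefox version at hand: the `moz_places` and `moz_historyvisits` tables joined on `place_id`, with `visit_date` in microseconds since 1970-01-01 UTC. The function name is hypothetical; as with Chrome, query a copy of the file.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Firefox stores times as microseconds since 1970-01-01 UTC (PRTime).
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def firefox_visits(places_db: Path, needle: str = "mail.google.com") -> list[tuple]:
    """List each visit to URLs matching `needle` from a copy of places.sqlite."""
    uri = places_db.resolve().as_uri() + "?mode=ro"  # open read-only
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT p.url, p.title, v.visit_date "
            "FROM moz_historyvisits AS v JOIN moz_places AS p ON p.id = v.place_id "
            "WHERE p.url LIKE ? ORDER BY v.visit_date",
            (f"%{needle}%",),
        ).fetchall()
    finally:
        con.close()
    return [(url, title, UNIX_EPOCH + timedelta(microseconds=ts))
            for url, title, ts in rows]
```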

It sounds like webmail analysis requires adapting techniques as browsers and web apps change.

Absolutely.

It's a moving target.

Software updates, OS changes.

They all affect where and how artifacts are stored.

Flexibility is key.

But ultimately, for the full picture in a webmail case, getting that data from the service provider via legal channels is often the most definitive route.

Makes perfect sense.

Proper legal process is paramount.

Okay, as we wrap up our deep dive into this chapter, let's quickly recap the main takeaways.

Sure.

We covered the fundamental protocols.

SMTP for sending out email, then POP3 and IMAP for retrieving it, noting the key difference that IMAP generally leaves mail on the server while POP3 might download and delete.

We dissected the email header, learning how to read those Received lines bottom-up to trace the path, and the importance of the Message-ID, Return-Path, and various X-headers.

Yep.

Then we looked at client-based email: how programs like Outlook store data in PST or OST files, how Windows Live Mail used individual EML files, and how Thunderbird uses mbox and MSF files.

Knowing where to find these is crucial for local analysis.

And finally, we tackled webmail analysis, focusing on browser artifacts like cache and history as primary sources on the user's machine and the necessity of involving service providers for actual content access.

So if you encounter an email header now, you should be able to make much more sense of it, identify servers, protocols, potential leads, and you'll know the common places to look for stored emails, depending on whether it's a local client or webmail.

I think you now definitely possess the foundational skills covered in this chapter.

You have a solid grasp of the email forensics landscape as laid out here.

Well, that concludes our exploration of Chapter 8 from Learn Computer Forensics, 2nd edition.

Our mission was to cover the protocols, headers, client analysis, webmail, and the related investigative and legal points.

I'd say we've hit all those marks.

Agreed.

We went through the nuts and bolts, hopefully making it clear and relevant for anyone interested in this field.

Definitely.

And before we sign off, here's a final thought, something building on what we discussed, but taking it a step further.

Consider the rise of end-to-end encryption in messaging.

How might widespread adoption of that for email change the forensic techniques and the kind of evidence we've talked about today?

That's a big one.

A really significant challenge for the future of digital investigations.

Definitely something to ponder.

It is indeed.

Thanks for joining us on this deep dive.

We have now fully covered the specified material on email forensics from the source chapter.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers

Email investigation occupies a central position in digital forensics work, requiring investigators to understand both the technical infrastructure underlying message transmission and the practical methods for locating, recovering, and analyzing electronic communications. The foundation rests on mastering email protocols that handle different stages of message movement: SMTP manages the delivery of messages between servers, POP3 enables clients to download messages to local machines, and IMAP maintains messages on remote servers while allowing synchronization across devices. Each protocol creates distinct forensic artifacts and requires different investigative approaches. The distinction between client-based systems such as Outlook and Thunderbird and web-based platforms like Gmail and Yahoo becomes critical because their storage mechanisms differ significantly, necessitating tailored recovery strategies for each environment. Email headers provide crucial investigative data, containing sender information, routing paths that reveal message transmission history, timestamp records, and message identifiers that help establish communication origins and identify spoofing or header manipulation attempts. Analyzing MIME structures and understanding base64 encoding lets investigators extract and examine attachments that may contain evidence. Recovering deleted emails represents a substantial investigative challenge addressed through knowledge of storage file formats including PST, OST, MBOX, and EML, each requiring specific forensic tools and extraction techniques. Web-based email investigations extend beyond the email platform itself, requiring systematic examination of temporary internet files, browser cache repositories, and browsing history logs where residual traces of webmail activity often persist despite user deletion attempts. 
The practical reality of webmail forensics demands knowledge of multiple system locations where evidence fragments accumulate. The chapter also addresses the formal legal requirements for obtaining email data from service providers, emphasizing the necessity of search warrants, proper legal request procedures, and rigorous documentation of evidence chain integrity to ensure investigative findings remain admissible in legal proceedings and compliant with privacy regulations.
