Chapter 5: Computer Investigation Process
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome back to the Deep Dive.
Today, we're really getting into it, fully submerging into a foundational area of digital investigations.
We're looking at a really comprehensive chapter that details, you know, the ins and outs of computer forensics.
Yeah.
Think of this as your accelerated path to grasping how digital evidence is found and, well, interpreted.
Precisely.
And it's about more than just like recovering data.
It's really about systematically piecing events back together.
Okay.
Understanding that digital footprint people leave behind.
This chapter, it gives you a structured framework, which is just essential for anyone trying to understand the process.
Okay.
Let's unpack this structured approach then.
The chapter lays out a whole range of crucial techniques, doesn't it?
It does.
From meticulously building a timeline of user activity to that granular examination of storage devices.
Right.
Using targeted keyword searches and even, well, the intricate art of bringing back deleted data.
Quite a lot to cover.
Yeah.
So our goal here is to pull out the core principles and the, you know, practical things you need to consider in each of these areas.
Ready to jump in?
Absolutely.
Let's start with a cornerstone of pretty much any investigation.
Establishing a reliable sequence of events.
Timeline analysis.
Timeline analysis.
Yeah.
On the surface, it sounds simple, like just looking at file dates, but we know it's quite a bit more involved than that.
Indeed.
It's the chronological examination of system activities, user activities.
The real power, though, comes from the context.
Ah, context.
The chapter illustrates this perfectly, actually, with that example of Google searches for, like, injury treatment.
Oh, right.
Initially, you see that and you think, uh-oh, red flag.
But, as the chapter highlights, a proper timeline showed those searches weren't even done by the person they first suspected, someone else entirely.
Exactly.
It really underscores a critical challenge.
Yeah.
Figuring out who did what and when, attributing actions, especially on shared computers.
And here's where it gets interesting.
You absolutely cannot just look at one isolated artifact, can you?
No, you can't.
And the chapter rightly points out the, well, the inherent limitations of just relying on MAC times.
Those modified, accessed, and created timestamps.
Yeah, the ones you see in the file system metadata.
They're a starting point, sure, but they can be easily altered.
How so?
Well, intentionally sometimes, or just through normal things the system does, like moving files, backups, even some third-party tools can change them without you realizing.
So, if MAC times aren't the definitive answer, what should investigators be looking at?
Well, the chapter really stresses you have to corroborate those MAC times.
Use a multitude of other sources to build a really robust timeline.
Like, what kind of sources?
We're talking system event logs.
They record a huge range of system activities.
Web browser history, obviously, showing visited sites, downloads, logs of programs that were run, even network connection logs.
By cross-referencing all these different records, you can validate the timestamps and get a much, much richer picture of what actually happened.
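To make that cross-referencing idea concrete, here is a minimal Python sketch that merges events from several artifact sources into one chronological list. It is not taken from the chapter: the `Event` record, the `merge_timelines` helper, and all of the sample events and timestamps are hypothetical, and real tools normalize far more fields than this.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True, order=True)
class Event:
    when: datetime      # timestamp, assumed already normalized to UTC
    source: str         # which artifact the event came from
    description: str    # what the artifact recorded


def merge_timelines(*sources):
    """Combine events from several artifact sources into one sorted list."""
    combined = [event for source in sources for event in source]
    return sorted(combined)  # ordered by (when, source, description)


def utc(*parts):
    """Build a timezone-aware UTC datetime."""
    return datetime(*parts, tzinfo=timezone.utc)


# Hypothetical sample data, invented purely for this illustration.
file_system = [Event(utc(2024, 3, 1, 14, 5, 0), "filesystem", "report.xls last modified")]
browser = [Event(utc(2024, 3, 1, 14, 2, 30), "browser", "webmail login page visited")]
event_log = [
    Event(utc(2024, 3, 1, 13, 58, 0), "eventlog", "user logon"),
    Event(utc(2024, 3, 1, 14, 6, 10), "eventlog", "USB device connected"),
]

timeline = merge_timelines(file_system, browser, event_log)
for event in timeline:
    print(event.when.isoformat(), event.source, event.description)
```

Seeing the file modification sit between a logon and a device connection is the kind of context a single timestamp cannot give you on its own.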
Wow.
It sounds like trying to weave together a complete tapestry from loads of different threads.
That's a great analogy.
And with the sheer volume of digital information today, Rob Lee from the SANS Institute actually coined the term super timeline for this massive collection of event data.
A super timeline.
I like that.
Yeah.
The chapter notes it used to be a real challenge using all these separate tools to collect and correlate everything.
But thankfully, the forensic toolkit has evolved, hasn't it?
Oh, significantly.
The chapter talks about the shift towards integrated forensic suites, tools that can pull together data from all those varied sources into one unified chronological view.
Makes sense.
It specifically highlights X-Ways Forensics and its event list feature.
Okay.
Tell us more about this X-Ways event list.
What makes it stand out?
Well, what's particularly powerful is its ability to just consolidate this vast array of data points.
It's not just looking at file system timestamps, right?
It's pulling in internal application timestamps, detailed browser activity, comprehensive OS event logs, registry entries, even email metadata, all lined up chronologically.
The chapter uses that compelling scenario, the one involving Gene, the CFO, and the leaked spreadsheet.
How does the event list help there?
Ah, yes.
That scenario perfectly shows its power.
The spreadsheet, m57plan.xls, has a unique MD5 hash, that digital fingerprint.
To prove it's the right file.
Exactly.
And a last modified time.
Those are your initial identifiers.
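For anyone who has not computed one before, this is roughly how an MD5 fingerprint is produced. It is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_stream` and `md5_of_file` are made up for this example, and the chapter's own tools do this for you.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the MD5 hex digest of the file at path."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


# Demonstration on in-memory bytes so the example runs without any file.
sample_digest = md5_of_stream(io.BytesIO(b"hello"))
print(sample_digest)
```

Two files with the same digest are treated as the same content for identification purposes, which is why the hash, not the file name, is the identifier investigators lean on.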
So by filtering the X-Ways event list to that specific time frame and searching for the file name, investigators can uncover related system activities.
What kind of related activities are we talking about finding?
Well, the example reveals the creation of a link file for that spreadsheet right around the time of the suspected leak.
Okay.
And also the generation of a prefetch file for Microsoft Excel.
These things strongly suggest Gene accessed the file locally on that machine.
But crucially, the event list also shows an email sent by Gene.
Ah, the email that was supposed to go to the company president, Allison.
But there was something off, wasn't there?
A discrepancy.
Precisely.
The recipient address in the actual email header was different from Allison's official company address.
It was tuckgorge@gmail.com.
So by digging deeper into email-related events in that timeline, investigators could potentially uncover other communication with this external address, raising suspicions, maybe a compromised account, perhaps social engineering.
Wow.
So the event list didn't just give the time the email was sent.
It actually connected that timestamp with the email artifact itself and the suspicious recipient.
That paints a much clearer picture than just a file mod time.
Exactly.
And as the chapter details, X-Ways Forensics pulls this info from, well, a huge spectrum of sources, file system, browser data, network activity, user actions.
It gives investigators much higher confidence in the reconstructed timeline and the overall context.
Now, X-Ways is a commercial tool.
The chapter also introduces an open source alternative for timeline analysis, right?
It does: Plaso, which operates mainly through a command line interface.
Right.
And it uses log2timeline as its core engine to extract all those timestamped events and build that consolidated timeline database, usually stored in a .plaso file.
Command line tools, they sometimes seem a bit complicated for people.
How does Plaso structure its commands?
Is it complicated?
The chapter gives a pretty helpful breakdown, actually.
Generally, you have the main executable name, that's the command you type, followed by various modifiers or flags.
Like switches.
Yeah, exactly.
Like switches that tell the tool to do specific things or look at particular data.
And Plaso isn't just one tool.
It's like a collection of utilities.
That's right.
There's image_export.
That's designed for pulling out the raw content of files from a forensic disk image.
You can tell it which files by name or maybe by file extension and say where to save the extracted data.
The chapter even includes an example command for exporting that crucial m57plan.xls file.
Useful.
Then there's log2timeline itself.
What's its main job?
That's really the heart of Plaso for making the timeline.
You point it at your forensic image, it crunches through the data and generates that .plaso database file with all the chronological events.
The chapter also mentions that the -f modifier is really useful.
It lets you apply filters for more focused examinations.
You don't always need everything.
Right.
Narrow it down.
Yeah.
And it even points out resources where you can download pre-built filter files for common things you might look for.
Okay.
So once you have this .plaso file, this database, how do you actually get insights from it?
Well, that's where pinfo comes in handy first.
It gives you metadata about the .plaso file itself.
Like what?
Like when it was created, what command line options were used to make it, how many events it contains, and even some basic info it figured out about the system from the image.
So like a summary of your timeline database.
An overview.
Precisely.
Then you have psort.
That's the tool for filtering, sorting, and ultimately analyzing the events within the .plaso database.
A critical modifier here is -o, which lets you define the output format.
-o l2tcsv is used a lot.
It exports the timeline data into a comma-separated values file, like for a spreadsheet.
CSV.
Though the chapter wisely warns you might hit performance issues if the CSV file gets extremely large.
Good point.
So psort sounds like it allows for really granular analysis of that timeline data.
Absolutely.
It also supports analysis plugins.
The chapter mentions one called tagging.
Tagging.
Yeah.
It can automatically apply tags to events based on predefined rules.
Makes it easier to categorize things and spot important activities.
And the slice modifier is invaluable.
It lets you zero in on a specific event and see what happened immediately before and after it.
Gives you that crucial context.
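The slicing idea is easy to picture in code. The following is a small Python sketch of the concept only, not Plaso's implementation: `slice_events`, the five-minute window, and the sample events are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def slice_events(events, pivot, minutes=5):
    """Return events within `minutes` before or after the pivot time.

    `events` is a list of (timestamp, description) tuples.
    """
    window = timedelta(minutes=minutes)
    selected = [event for event in events if abs(event[0] - pivot) <= window]
    return sorted(selected)


# Hypothetical events, invented for this illustration.
events = [
    (datetime(2024, 3, 1, 13, 40), "user logon"),
    (datetime(2024, 3, 1, 14, 1), "spreadsheet application started"),
    (datetime(2024, 3, 1, 14, 5), "spreadsheet opened"),
    (datetime(2024, 3, 1, 14, 8), "email sent"),
    (datetime(2024, 3, 1, 14, 30), "user logoff"),
]

pivot = datetime(2024, 3, 1, 14, 5)
around_pivot = slice_events(events, pivot, minutes=5)
for when, description in around_pivot:
    print(when.isoformat(), description)
```

Widening or narrowing the window is the main design choice: too narrow and you miss the lead-up, too wide and the event of interest gets buried again.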
Context.
Again, very useful.
Okay.
Finally, there's psteal.
That sounds like it simplifies things.
You got it.
psteal basically combines log2timeline and psort into a single command.
So it does the extraction and the initial processing in one go.
The chapter kind of playfully calls it the kitchen sink approach.
It has fewer modifiers, maybe less control, but it can be useful for a broad initial look.
Okay.
So you've built this detailed timeline, whether with X-Ways or Plaso.
What's the next step?
How do you really leverage this information?
Right.
That leads us straight into the analysis phase.
The chapter highlights several tools you can use to visualize and dig deeper into the timeline data.
What are some options?
Well, in the open source world, there's the ELK stack.
That's Elasticsearch for powerful searching and analysis.
Logstash for processing and pulling in data.
And Kibana for creating really insightful visualizations.
It's a robust solution, but needs some technical know-how to set up.
Okay.
And what about commercial options for analyzing timelines?
There's Timeline Maker Pro.
It's specifically designed for generating visual timeline charts from CSV data.
Makes it easier to spot trends, anomalies, that kind of thing.
Visuals can help.
Definitely.
Timesketch is another open source tool.
Focuses more on collaborative timeline analysis, often used in Linux environments.
Aeon Timeline is a commercial one that's great for visual timelines and showing relationships between events.
And what if you're dealing with, say, smaller, more focused timelines?
Maybe not the whole super timeline.
For that, the chapter recommends Timeline Explorer.
It's a free tool from Eric Zimmerman.
Oh yeah, he makes great tools.
He does.
It's specifically designed for efficiently reading MAC time data and Plaso CSV files without needing a full spreadsheet program.
But the author does advise against using it for extremely massive data sets.
It's better for targeted views.
Right.
So timeline analysis tells you the when.
Now let's dig into examining the raw data itself.
That brings us to media analysis, right?
Exactly.
The chapter explains that while timeline analysis can be applied to lots of different types of digital evidence, media analysis specifically focuses on the direct examination of physical storage devices.
Hard drives, SSDs, USB sticks, that sort of thing.
And within these storage devices, investigators need to understand different kinds of data areas.
It's not all just files.
No, not at all.
There are four primary categories discussed.
There's allocated space that holds the active files currently in use by the OS.
Okay.
Then there's unallocated space.
That's the free space where deleted files might still physically reside waiting to be overwritten.
Slack space, that's the leftover space within the last cluster assigned to a file if the file doesn't fill it completely.
Sometimes bits of old data hang out there.
Interesting.
And finally, bad blocks or sectors, areas the drive controller has marked as faulty.
Sometimes data can be hidden there intentionally, though it's less common.
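Slack space in particular comes down to simple arithmetic. Here is a short Python sketch of that calculation; the function names and the 4,096-byte cluster size are assumptions chosen for the example, not values from the chapter.

```python
def clusters_needed(file_size, cluster_size):
    """Number of whole clusters needed to hold file_size bytes."""
    return (file_size + cluster_size - 1) // cluster_size


def slack_bytes(file_size, cluster_size):
    """Unused bytes left over in the last cluster assigned to a file."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


# Assumed values for illustration: a 5,000-byte file on 4,096-byte clusters.
example_clusters = clusters_needed(5000, 4096)
example_slack = slack_bytes(5000, 4096)
print(example_clusters, "clusters,", example_slack, "bytes of slack")
```

In this example the file occupies two clusters, and whatever was previously stored in the unused tail of the second one may still be sitting there.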
The chapter presents a kind of progression for media analysis, concepts defined by Brian Carrier.
Can you walk us through that?
Sure.
Carrier outlines a logical flow, sort of layers of abstraction.
You start with the disk, the physical thing itself.
The hardware.
Yeah.
Then the volume, a logical section of the disk, like a C drive.
Then the file system, the structure organizing files within that volume, like NTFS or FAT.
Then the data unit, the smallest chunk the file system manages, usually a cluster.
And finally, metadata, the information about the files, like timestamps, permissions, file name, size.
So the ultimate goal of media analysis is basically hunting through all these layers to find relevant digital artifacts.
Exactly.
That can either support or refute whatever is being investigated.
Precisely.
And the artifacts you find doing media analysis, they often provide critical context.
They can corroborate or sometimes contradict what you found in the timeline analysis.
They work together.
Okay.
Makes sense.
Let's shift focus again to another fundamental technique.
String searching.
This is basically like doing a targeted keyword search across the entire digital landscape, right?
That's a pretty accurate way to put it.
String searching, or sometimes byte searching, involves looking for exact sequences of characters or bytes that match specific keywords or patterns you define.
And forensic tools are built to do this searching everywhere?
Generally, yes.
Good tools will search across all accessible areas.
Allocated space, unallocated space, and that slack space we mentioned.
And investigators usually have lists of keywords they use.
Absolutely.
The chapter talks about categorizing these.
You might have generic keyword lists, broad terms relevant to many cases, maybe categorized by investigation type, like fraud or IP theft.
Okay.
And then you have case-specific lists.
These are tailored to the unique details, names of people involved, locations, specific jargon or slang they might use, maybe usernames, email addresses, phone numbers, credit card numbers, things like that.
The chapter also mentions different encoding schemes, like ASCII and Unicode.
Why are those important for string searching?
Ah, yeah, that's crucial because digital text isn't just text.
It's represented in different formats, different encodings.
ASCII is older, more limited.
Unicode is the modern standard, handles a much wider range of characters, different languages.
If your search tool isn't set up to look for the right encoding or both, you might just completely miss relevant keywords even if they're right there.
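Here is a small Python sketch of why encoding matters when searching raw bytes. The `find_keyword` helper and the sample buffer are made up for this illustration; it checks ASCII and UTF-16 little-endian, which is one common way Unicode text is stored, and a real tool would cover more encodings.

```python
def find_keyword(data, keyword, encodings=("ascii", "utf-16-le")):
    """Return sorted (offset, encoding) pairs for each hit of keyword in data."""
    hits = []
    for encoding in encodings:
        try:
            needle = keyword.encode(encoding)
        except UnicodeEncodeError:
            continue  # keyword cannot be represented in this encoding
        start = 0
        while (position := data.find(needle, start)) != -1:
            hits.append((position, encoding))
            start = position + 1
    return sorted(hits)


# Hypothetical buffer: the same word stored once as ASCII, once as UTF-16LE.
buffer = b"xx" + "secret".encode("ascii") + b"\x00\x00" + "secret".encode("utf-16-le")

all_hits = find_keyword(buffer, "secret")
ascii_only = find_keyword(buffer, "secret", encodings=("ascii",))
print(all_hits)
print(ascii_only)
```

Searching with ASCII alone finds only the first copy, because the UTF-16 copy has a zero byte after every letter and never matches the plain byte sequence.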
Now what if the person you're investigating uses misspellings or variations of a keyword?
A standard string search wouldn't catch those, would it?
That's a key limitation, yeah.
Literal string searching is, well, literal.
If you search for secret plan, it won't find secrt plan or secret plans.
This is where pattern matching using regular expressions becomes incredibly valuable.
Ah, regular expressions, regex.
They always seem pretty complex.
Can you give us a simplified idea of how they help in forensics?
They can seem intimidating, but the core idea is powerful.
A regular expression, or regex, is basically a sequence of characters that defines a search pattern.
A pattern, not just a word.
Exactly.
Instead of looking for an exact string, you're describing a template of what you want to find.
The chapter breaks down some common symbols, the metacharacters, like an asterisk often means zero or more of the preceding character.
A dot can stand for any single character.
Okay, I see.
So it allows for much more flexible searches.
The chapter gives examples for finding things like IP addresses and US phone numbers.
It does.
The IP address example uses a pattern to match, you know, four sets of one to three digits separated by periods.
The phone number one uses patterns to catch different formats, parentheses, dashes, spaces, or none.
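To show what such patterns can look like, here is a hedged Python sketch using the standard `re` module. These are not the chapter's exact expressions: the three patterns and the sample text are written for this illustration, and the IP pattern only checks the shape of an address, not that each number is 255 or less.

```python
import re

# Four groups of one to three digits separated by periods.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# US-style phone numbers: optional parentheses, then dash, dot, space, or nothing.
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")

# Tolerates a dropped "e" and an optional plural, ignoring case.
SECRET_PLAN = re.compile(r"secre?t\s+plans?", re.IGNORECASE)

# Hypothetical text, invented for this illustration.
text = "Call (555) 123-4567 or 555.123.4567 from 192.168.1.10 about the secrt plan."

ip_hits = IPV4.findall(text)
phone_hits = PHONE.findall(text)
plan_hits = SECRET_PLAN.findall("secret plan, secrt plan, Secret Plans")

print(ip_hits)
print(phone_hits)
print(plan_hits)
```

The trade-off is the usual one with regex: looser patterns catch more variations but also more false positives, so the hits still need a human look.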
Very handy.
It makes the search much more powerful.
The chapter even recommends a website, regexlib.com, which is a great resource.
A library of pre-built regular expressions for all sorts of common patterns.
It sounds like getting comfortable with regular expressions is a, well, a significant advantage for a digital investigator.
Oh, absolutely.
It lets you find patterns and variations that simple keyword searches would just fly right past.
Significantly boosts your ability to find relevant evidence.
Okay, we've covered timelines, searching live data, pattern matching.
Now let's talk about deleted files.
When something's deleted, is it really truly gone?
Well, usually, no, not right away anyway.
The chapter gives a really detailed explanation of how deleted data recovery works, starting with the older FAT file system.
Okay, FAT.
So when you delete a file in FAT, what actually happens on the disk?
The chapter explains that the file's actual data, the content, often stays right there on the disk, physically.
What changes is primarily in the directory structure.
The first character of the file name in the directory entry gets replaced with a special marker, the hexadecimal value E5.
Okay, so it's marked as deleted.
Right.
And also, the entries in the file allocation table, the FAT itself, which is like the map showing which clusters the file used, those entries get reset to zero.
That tells the OS those clusters are now free, available to be used by new files.
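Here is roughly what spotting that marker looks like in practice. This Python sketch parses one 32-byte FAT short directory entry using the standard field offsets (name at 0, starting cluster low word at 26, size at 28); the `parse_dir_entry` helper and the sample entry are made up for this illustration.

```python
import struct

DELETED_MARKER = 0xE5  # first name byte of a deleted FAT directory entry


def parse_dir_entry(entry):
    """Pull the recovery-relevant fields out of a 32-byte short directory entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    (cluster_high,) = struct.unpack_from("<H", entry, 20)  # used by FAT32 only
    (cluster_low,) = struct.unpack_from("<H", entry, 26)
    (file_size,) = struct.unpack_from("<I", entry, 28)
    return {
        "deleted": entry[0] == DELETED_MARKER,
        "name_remnant": entry[1:8].decode("ascii", errors="replace").rstrip(),
        "extension": entry[8:11].decode("ascii", errors="replace").rstrip(),
        "start_cluster": (cluster_high << 16) | cluster_low,
        "file_size": file_size,
    }


# Hypothetical deleted entry: first character overwritten, rest intact.
sample = bytearray(32)
sample[0:11] = b"\xe5LAN    XLS"
struct.pack_into("<H", sample, 26, 5)       # starting cluster 5
struct.pack_into("<I", sample, 28, 10000)   # 10,000 bytes long

parsed = parse_dir_entry(bytes(sample))
print(parsed)
```

Notice that everything except the first character of the name survives, which is exactly what makes the recovery described next possible.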
So it's more like taking down the signposts pointing to the data, rather than actually erasing the data itself.
That's a perfect analogy, yes.
So to recover the file, you essentially need to reverse that process, rebuild the directory entry, and reconstruct that FAT chain, the map.
How do you do that?
The chapter walks through an example.
You look at the remnants of the directory entry to find the starting cluster number and the file size.
You also need the cluster size, which you get from the system's boot record.
And the recovery involves actually editing these low-level structures, like with a hex editor.
Yes, typically.
Using a hex editor, you'd change the FAT entries from zero back to the values that correctly link the file's clusters together, marking the last cluster as the end of the chain.
Okay.
And you'd replace that E5 marker in the directory entry with a valid character.
The chapter suggests using something neutral like an underscore or a dash, so you're not guessing the original file name.
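The manual steps just described can be sketched in Python on an in-memory copy of the structures. This is only an illustration under strong assumptions: a FAT16 table with two-byte entries, a file that was stored in contiguous clusters, and made-up helper names. A fragmented file cannot be relinked this way, and you would only ever do this on a working copy.

```python
import struct

END_OF_CHAIN = 0xFFFF  # a FAT16 end-of-chain value


def clusters_needed(file_size, cluster_size):
    """Number of clusters a file of file_size bytes occupies."""
    return (file_size + cluster_size - 1) // cluster_size


def relink_contiguous_chain(fat, start_cluster, file_size, cluster_size):
    """Rewrite zeroed FAT16 entries so they chain the file's clusters again."""
    count = clusters_needed(file_size, cluster_size)
    chain = list(range(start_cluster, start_cluster + count))
    for cluster in chain:
        (value,) = struct.unpack_from("<H", fat, cluster * 2)
        if value != 0:
            raise ValueError(f"cluster {cluster} has been reallocated")
    for index, cluster in enumerate(chain):
        is_last = index == len(chain) - 1
        next_value = END_OF_CHAIN if is_last else cluster + 1
        struct.pack_into("<H", fat, cluster * 2, next_value)
    return chain


def restore_first_character(entry, replacement=b"_"):
    """Replace the 0xE5 deletion marker with a neutral character."""
    if entry[0] != 0xE5:
        raise ValueError("entry is not marked as deleted")
    entry[0] = replacement[0]


# Hypothetical working copies: an empty 16-entry FAT and a deleted entry.
fat = bytearray(2 * 16)
entry = bytearray(b"\xe5LAN    XLS" + bytes(21))

chain = relink_contiguous_chain(fat, start_cluster=5, file_size=10000, cluster_size=4096)
restore_first_character(entry)
print(chain, entry[:11])
```

The check for non-zero entries matters: if any cluster in the expected range is already in use, part of the old file has likely been overwritten and the rebuilt chain would point at someone else's data.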
What about files with long file names in FAT?
Does that complicate recovery?
It adds a little wrinkle.
The chapter points out you need to make sure you relink the associated long file name entries back to the short file name entry you just recovered.
There are checksums involved based on the short file name, so consistency matters.
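That checksum is small enough to show in full. This Python sketch follows the commonly documented FAT long file name checksum, a rotate-right-and-add over the 11 bytes of the short name; the function name and the sample short names are made up for this example.

```python
def lfn_checksum(short_name):
    """Checksum stored in each long-name entry, computed from the 11-byte short name."""
    if len(short_name) != 11:
        raise ValueError("short name must be exactly 11 bytes (8 name + 3 extension)")
    checksum = 0
    for byte in short_name:
        # Rotate the 8-bit running value right by one, then add the next byte.
        checksum = (((checksum & 1) << 7) + (checksum >> 1) + byte) & 0xFF
    return checksum


# Hypothetical short names: the original and one recovered with an underscore.
original = lfn_checksum(b"PLAN    XLS")
recovered = lfn_checksum(b"_LAN    XLS")
print(original, recovered)
```

Because the first character feeds into the result, a recovered entry whose first character differs from the original will generally produce a different checksum than the one stored in the long-name entries, which is why consistency has to be restored by hand.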
Okay, that's FAT, it's older.
How does deletion work differently in NTFS?
That's what most modern Windows systems use.
Right, NTFS uses a different, more complex mechanism.
When a file is deleted in NTFS, the system does a few things.
It increments a sequence counter in the file's record in the master file table, the MFT.
MFT record, okay.
The record's allocation status flag is changed to show it's no longer in use.
And if the file's data was stored outside the MFT record itself, which is called non-resident data, common for bigger files, then the bitmap file, which tracks cluster usage, is updated to show those data clusters are now free.
The chapter mentions a file signature within the MFT record.
How can investigators use that for recovery?
Yeah, every valid MFT record starts with a specific signature.
Usually the characters FILE.
Investigators can actually search the unallocated space on the drive for that signature.
Looking for orphaned MFT records.
Exactly.
To identify potentially deleted MFT records. If you find one and, critically, the data clusters it pointed to haven't been overwritten yet, recovery might be possible.
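A signature scan like that can be sketched in a few lines of Python. This is a simplified illustration, not a recovery tool: it assumes records start on 512-byte boundaries, reads the sequence number and flags from their usual header offsets (0x10 and 0x16), and uses a made-up two-record buffer in place of a real disk image.

```python
import struct

MFT_SIGNATURE = b"FILE"
FLAG_IN_USE = 0x0001      # cleared when the record is no longer in use
FLAG_DIRECTORY = 0x0002


def find_mft_records(data, alignment=512):
    """Yield basic header fields for every aligned offset carrying the signature."""
    for offset in range(0, len(data) - 0x18 + 1, alignment):
        if data[offset:offset + 4] != MFT_SIGNATURE:
            continue
        (sequence,) = struct.unpack_from("<H", data, offset + 0x10)
        (flags,) = struct.unpack_from("<H", data, offset + 0x16)
        yield {
            "offset": offset,
            "sequence": sequence,
            "in_use": bool(flags & FLAG_IN_USE),
            "is_directory": bool(flags & FLAG_DIRECTORY),
        }


# Hypothetical buffer: one live record at 0, one deleted record at 1024.
image = bytearray(2048)
image[0:4] = MFT_SIGNATURE
struct.pack_into("<H", image, 0x10, 1)
struct.pack_into("<H", image, 0x16, FLAG_IN_USE)
image[1024:1028] = MFT_SIGNATURE
struct.pack_into("<H", image, 1024 + 0x10, 4)
struct.pack_into("<H", image, 1024 + 0x16, 0)

records = list(find_mft_records(bytes(image)))
deleted = [record for record in records if not record["in_use"]]
print(deleted)
```

Finding the record is only the first step; whether the file comes back depends on what the record still says about where its data lives.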
So if you find an unused MFT record, can you just, like, undelete the file easily?
Well, the chapter explains you need to interpret the information inside that found MFT record.
If the file's content was really small, it might have been stored entirely within the MFT record itself. That's called resident data.
In that case, recovering the MFT record basically is recovering the file.
But for non-resident data, the more common case, you need to parse the data runs recorded within the MFT record.
These data runs are like pointers telling you exactly which clusters on the disk hold the file's data and in what order.
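Data runs are compact, so a worked decoding helps. The Python sketch below follows the commonly documented run-list layout: a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, with each offset signed and relative to the previous run. The function name and the sample bytes are made up for this illustration.

```python
def parse_data_runs(raw):
    """Decode an NTFS run list into (starting_cluster, cluster_count) pairs.

    A starting cluster of None marks a sparse run with no clusters on disk.
    """
    runs = []
    position = 0
    previous_cluster = 0
    while position < len(raw) and raw[position] != 0x00:  # 0x00 ends the list
        header = raw[position]
        length_size = header & 0x0F
        offset_size = header >> 4
        position += 1
        if position + length_size + offset_size > len(raw):
            raise ValueError("run list is truncated")
        count = int.from_bytes(raw[position:position + length_size], "little")
        position += length_size
        if offset_size == 0:
            runs.append((None, count))
            continue
        delta = int.from_bytes(raw[position:position + offset_size], "little", signed=True)
        position += offset_size
        previous_cluster += delta  # offsets are relative to the previous run
        runs.append((previous_cluster, count))
    return runs


# Hypothetical run list: 24 clusters at 0x5634, then 8 clusters 16 clusters earlier.
sample_runs = parse_data_runs(bytes.fromhex("21 18 34 56 11 08 F0 00"))
print(sample_runs)
```

Reading the clusters in the order the runs list them, and trimming to the recorded file size, is what turns a found record back into a file, assuming none of those clusters has been reused.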
But what if the MFT record itself, the one you found, has been partially or fully overwritten?
Then your chances for non-resident data recovery drop significantly.
That's correct.
If the MFT record itself is overwritten, that crucial information about the file's data runs is lost.
And if that's gone, it becomes extremely difficult, often impossible, to reliably recover the associated non-resident data, even if the actual data clusters themselves haven't been overwritten yet.
You've lost the map.
Wow.
It really highlights just how delicate digital evidence can be and why acting quickly in forensic investigations is so important.
Absolutely.
The longer a system keeps running after files are deleted, the greater the chance that the underlying data or the pointers to it will be overwritten and lost for good.
Well, this chapter has certainly given us a really comprehensive look at the fundamental techniques in computer investigations.
We've touched on everything from meticulously building timelines and analyzing storage media to performing those targeted searches and even attempting the recovery of deleted data.
Indeed.
We've covered the core ideas behind timeline analysis, dug into the nuts and bolts of media analysis, examined string searching, including the power of regex, and discussed how deleted data recovery works, or sometimes doesn't work, in both FAT and NTFS systems.
It really drives home how these technical processes are just critical for understanding and reconstructing digital events in, well, all kinds of real world situations.
And I think it leaves us and you listening with a pretty significant thought to mull over.
Oh.
As the amount and the sheer complexity of digital data just continues to explode, I mean, exponentially.
How will these fundamental forensic techniques we've talked about today need to adapt?
How will they evolve to stay effective against future technologies and, well, future challenges?
It's a big question.
And one that's going to keep shaping digital forensics for years to come.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.