Chapter 24: Forensic DNA Databases: Tools for Crime Investigations

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to The Deep Dive.

Today we're getting into something absolutely fundamental to modern forensics, the Combined DNA Index System, CODIS.

We're going to figure out how this isn't just a database, but really a whole network turning genetic bits into major investigative tools.

That's the plan.

Our mission here is threefold, really.

First, get a clear picture of how CODIS is built, its structure.

Second, look at the tech and the search methods, how it actually works.

And finally, we absolutely have to tackle the tricky legal and ethical stuff that comes with its growth.

Okay, and context is always helpful.

It sounds like the US wasn't actually first out of the gate with this.

No, that's right.

The UK actually set up the world's first national DNA database, the NDNAD, back in 1995.

CODIS came online in 98 here in the US, but the groundwork was laid by the DNA Identification Act back in 1994.

So CODIS, at its heart, it's a way for labs across the country to share DNA profiles, standard profiles, right?

From offenders, crime scenes, unidentified remains too.

Exactly.

And the key technical piece established back then in 98 was the use of 13 specific locations on the genome, these short tandem repeats, or STR loci.

Why 13 specifically?

Is that just an arbitrary number?

Well, not arbitrary, no.

13 loci provided an incredibly high power of discrimination.

I mean, the odds of two unrelated people matching at all 13 spots by chance, astronomically low.

It gave the necessary statistical weight for identifications at that time.
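
As a rough illustration of why 13 loci give such low odds, here is a minimal Python sketch of the product rule, which multiplies per-locus genotype frequencies under the assumption that the loci are independent. The frequencies below are made-up illustrative values, not real population data, and the function name is hypothetical.

```python
from math import prod

# Hypothetical genotype frequency at each of the 13 core STR loci.
# Illustrative values only; real casework uses population databases.
genotype_freqs = [0.10] * 13


def random_match_probability(freqs):
    """Product rule: multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(freqs)


rmp = random_match_probability(genotype_freqs)
print(f"Chance that an unrelated person matches at all 13 loci: {rmp:.1e}")
```

Even with a fairly common genotype at every locus, the combined probability shrinks by roughly an order of magnitude per locus in this toy setup.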

Got it.

Okay, so let's unpack the structure.

You said it's a network, not just one big computer.

How's it organized?

It's strictly hierarchical.

Think of a pyramid.

At the very bottom, you have LDIS, the Local DNA Index System.

These are run by your city or county crime labs, police departments, sheriff's offices.

Every single profile starts here.

So LDIS is the entry point.

What happens once a profile is generated and approved there?

It moves up one level to SDIS, the State DNA Index System.

A designated state lab usually runs this.

SDIS does two main things.

It lets all the local labs within that state compare profiles, and it's the secure gateway to the top level.

Which must be NDIS, the National DNA Index System.

Precisely.

NDIS is managed by the FBI.

It's the central hub holding all the profiles contributed from across the country.

This is what allows for those crucial cross-state comparisons.

They run automated searches once a week, checking everything against everything else.
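
The three-tier pyramid described above can be pictured with a small Python sketch. This is a simplification under stated assumptions: the class name, the lab names, and the rule that every profile is forwarded upward automatically are all hypothetical, and real uploads depend on eligibility and quality checks not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexLevel:
    """One tier in the pyramid: an LDIS, SDIS, or NDIS node (hypothetical model)."""
    name: str
    parent: Optional["IndexLevel"] = None
    profiles: List[str] = field(default_factory=list)

    def upload(self, profile_id: str) -> None:
        """Store the profile at this tier, then pass it up to the parent tier."""
        self.profiles.append(profile_id)
        if self.parent is not None:
            self.parent.upload(profile_id)


# Build a tiny pyramid: two local labs feed one state lab, which feeds NDIS.
ndis = IndexLevel("NDIS")
sdis = IndexLevel("SDIS (example state)", parent=ndis)
ldis_a = IndexLevel("LDIS (city lab A)", parent=sdis)
ldis_b = IndexLevel("LDIS (county lab B)", parent=sdis)

ldis_a.upload("P1")
ldis_b.upload("P2")
print(ndis.profiles)  # both local profiles are visible at the national tier
```

The point of the sketch is only the direction of flow: profiles enter locally, and each higher tier sees the union of what sits beneath it.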

Okay, let's go back to LDIS for a sec.

If everything starts local, how do you make sure the quality is consistent everywhere?

A sample from, say, Miami needs to be comparable to one from Seattle, right?

Absolutely.

Critical point.

That's where quality assurance standards come in.

Set by groups like SWGDAM, that's the Scientific Working Group on DNA Analysis Methods, they dictate really detailed protocols for everything.

How you extract the DNA, how you analyze it, even how you name the profile file.

Standardization is key.

It ensures every profile, no matter its origin, is reliable for comparison nationwide.

And of course, access is tightly controlled.

Only FBI-vetted personnel using secure, encrypted networks can get in.

Right.

Structure makes sense.

Now how are these millions of profiles actually sorted?

The organization must be vital for effective searching.

It is.

For the criminal justice side, there are three primary indexes.

The biggest is the convicted offender index.

It holds profiles from people convicted of, usually, felony offenses, and it requires the full 13 core CODIS loci.

Okay.

Then there's the arrestee index.

This one includes profiles from people who've been arrested, but not necessarily convicted yet.

The rules for this vary a lot from state to state, but it also requires the 13 core loci.

And the third one is where the crime scene evidence itself lives.

Exactly.

That's the forensic index.

This holds the DNA profiles developed from crime scene evidence, blood stains, saliva, skin cells, you name it.

Importantly, these aren't profiles from known suspects.

They're from the scene itself.

And because that evidence can often be old or degraded, this index has a slightly lower requirement.

It only needs at least 10 CODIS loci to be present.
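
The loci requirements just described (13 core loci for the offender and arrestee indexes, at least 10 for the forensic index) can be expressed as a simple check. This is a minimal sketch: the index keys and function name are hypothetical, and real NDIS acceptance rules involve more than a locus count.

```python
# The 13 original core CODIS STR loci.
CORE_LOCI = frozenset({
    "CSF1PO", "FGA", "TH01", "TPOX", "VWA", "D3S1358", "D5S818",
    "D7S820", "D8S1179", "D13S317", "D16S539", "D18S51", "D21S11",
})

# Minimum number of core loci per index, as stated in the discussion above.
MIN_CORE_LOCI = {"convicted_offender": 13, "arrestee": 13, "forensic": 10}


def meets_loci_requirement(index_name, typed_loci):
    """Return True if enough core loci were typed for the given index."""
    core_typed = CORE_LOCI & set(typed_loci)
    return len(core_typed) >= MIN_CORE_LOCI[index_name]


# A degraded crime scene sample where only 10 core loci produced results.
degraded = sorted(CORE_LOCI)[:10]
print(meets_loci_requirement("forensic", degraded))  # True
print(meets_loci_requirement("arrestee", degraded))  # False
```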

We should also mention the other side of CODIS, the part focused on missing persons, right?

The NMPDD.

Yes.

The National Missing Person DNA Database Program.

It's separate, but related and incredibly important.

It has its own indexes, one for missing persons, one for unidentified human remains, and one for biological relatives of missing people.

That relatives index is interesting.

It can use different kinds of DNA profiles, like Y-STR for paternal lines or mitochondrial DNA for maternal lines, to help make connections.

It requires the 13 core loci plus amelogenin, which helps determine sex.

Now this is where things get, well, really fast moving.

The technology push.

You mentioned the FBI's rapid DNA program from 2010.

What's the big deal there?

The difference is stark.

Traditional lab processing, with manual extraction, amplification, and analysis, often takes weeks, even months, largely because labs batch samples for efficiency.

The problem: an arrested person might be released long before their DNA results come back from CODIS.

Okay, so rapid DNA aims to change that drastically.

Drastically is the word.

We're talking about a fully automated, often portable machine, think lab in a box, that can take a reference sample, like a cheek swab, and generate a CODIS-compatible profile in less than two hours.

Less than two hours while someone's still being booked?

That's the goal.

It fundamentally changes the game.

The idea is to integrate this into a proposed RDIS, the Rapid DNA Index System, allowing a database search while the arrestee is still in custody.

Imagine getting a hit linking that person to an old, unsolved crime right there at the booking desk.

Wow.

That kind of speed, combined with the push to include more people like arrestees, means the database must be growing exponentially, and that growth inevitably leads to legal challenges.

Absolutely.

By 2013, CODIS already had profiles from over 10 million convicted offenders and half a million crime scenes.

The trend in many places is adding more types of offenses, even misdemeanors, and definitely more arrestees.

The underlying assumption is simple.

Bigger database equals more solved crimes.

But it bumps right up against constitutional rights, doesn't it?

Like the Alonzo King case.

That's the landmark case here.

Alonzo King was arrested in Maryland for assault.

Under Maryland law at the time, they took his DNA.

Later, that DNA profile matched evidence from an unsolved rape committed years earlier.

And King challenged this, arguing that taking his DNA upon arrest for a crime he wasn't yet convicted of and then searching it against unsolved cases violated his Fourth Amendment right against unreasonable searches.

He did.

And it went all the way to the Supreme Court, which was actually quite divided.

They ruled five to four against King.

The majority essentially said that taking DNA upon arrest for a serious crime was a legitimate booking procedure, much like taking fingerprints, primarily for identification purposes.

They argued the state's interest in identifying arrestees correctly and potentially solving other serious crimes outweighed the individual's privacy expectation in that specific context.

But the dissent must have pushed back hard on that comparison to fingerprints.

DNA contains so much more personal information.

Very strongly.

The dissenting justices argued that a DNA cheek swab is a significant search, unlike a fingerprint.

It delves into a person's unique genetic blueprint, revealing potentially sensitive medical and familial information.

They saw it as fundamentally different and requiring more justification than fingerprinting.

That five-four split really highlights the core tension we keep talking about.

It really does.

So, OK, a profile makes it into the system.

How often are searches actually run?

The system runs automated searches once a week.

The whole point is generating investigative leads.

You're looking for two main kinds of hits.

First is an offender hit.

That's when a crime scene profile from the Forensic Index matches a profile in the Convicted Offender or Arrestee Index.

Boom, potential suspect identified.

Like the Leon Dundas case you mentioned in the prep, where his sample, taken after he died, linked him to rapes he denied for years.

Exactly.

Shows the database's power, even posthumously.

The second type is a Forensic Hit.

This is maybe even more powerful.

Sometimes, it links DNA from two different crime scenes.

It tells investigators these scenes might be connected, potentially pointing to a serial offender, even before they have a suspect name.

Like the Dominic Moore case linking separate abductions through DNA, which then connected to a third crime he'd already confessed to.

Precisely.

It connects the dots between seemingly unrelated incidents.

Do these matches need to be perfect?

Like, every single one of those 13 markers has to line up exactly?

Not always.

And that's important.

Because crime scene DNA is often degraded, mixed, or just present in tiny amounts.

The CODIS software uses different search stringencies.

There's high stringency, which does demand a perfect match at every single locus tested.

But NDIS, the national level, primarily operates at moderate stringency.

And moderate means what, in practice? It allows for some imperfections?

It allows for what are called allelic dropouts.

Basically, if the DNA sample was poor quality, the lab's equipment might not have been able to detect one or both alleles at a particular locus.

Moderate stringency says, okay, we're missing a piece here, but if everything else matches perfectly, we'll still flag this as a potential hit.

Ah, so it builds in tolerance for real-world sample problems.

Without that, you'd lose a lot of potential leads from less than perfect crime scene samples.

You'd lose tons.

There's also low stringency, which allows for even more variation, like potential mismatches and dropouts.

But moderate is the workhorse for NDIS because it balances sensitivity with accuracy.
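
To make the difference between high and moderate stringency concrete, here is a minimal Python sketch. It rests on simplifying assumptions: a profile is a dictionary from locus name to the alleles detected, moderate stringency is modeled as "one profile's alleles at a locus are a subset of the other's" to tolerate allelic dropout, and loci missing from either profile are skipped. The function names are hypothetical and this is not the actual CODIS matching code.

```python
def locus_matches(alleles_a, alleles_b, stringency):
    """Compare the alleles seen at one locus under the chosen stringency."""
    a, b = set(alleles_a), set(alleles_b)
    if stringency == "high":
        return a == b               # every allele must line up exactly
    if stringency == "moderate":
        return a <= b or b <= a     # tolerate an allele that dropped out
    raise ValueError(f"unsupported stringency: {stringency}")


def profiles_match(evidence, reference, stringency="moderate"):
    """Check every locus typed in both profiles; loci missing from one are skipped."""
    shared_loci = evidence.keys() & reference.keys()
    return bool(shared_loci) and all(
        locus_matches(evidence[locus], reference[locus], stringency)
        for locus in shared_loci
    )


reference = {"TH01": (6, 9.3), "FGA": (21, 24), "VWA": (16, 17)}
# Degraded evidence: allele 24 at FGA was not detected.
evidence = {"TH01": (6, 9.3), "FGA": (21,), "VWA": (16, 17)}
# A different person: FGA carries an allele the reference does not have.
other = {"TH01": (6, 9.3), "FGA": (22,), "VWA": (16, 17)}

print(profiles_match(evidence, reference, "high"))      # False
print(profiles_match(evidence, reference, "moderate"))  # True
print(profiles_match(other, reference, "moderate"))     # False
```

The dropout case is flagged only at moderate stringency, while a genuinely different allele still rules the candidate out.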

Got it.

Which leads us, I think, to the most complex and maybe controversial use of these databases, familial searching.

Yes.

This is where we move beyond looking for the perpetrator themselves and start looking for their close relatives.

How does that even work?

It's based on basic genetics.

You share significantly more of your DNA profile, more alleles at those STR loci, with your parents, siblings, and children than you do with unrelated people.

So the idea is, if the perpetrator isn't in the database, maybe one of their close relatives is.

Finding that relative can give investigators a crucial lead to the actual person they're looking for.

And this has actually cracked some major cases, hasn't it?

It has.

Some really high profile ones.

In the UK, there's the Craig Harman case. He was convicted of manslaughter after investigators got a lead from a relative's profile in the NDNAD.

Here in the US, it was instrumental in the Darryl Hunt case, not just finding the killer, Willard Brown, but exonerating Hunt after he'd spent 18 years wrongly imprisoned.

And the Grim Sleeper killer in California.

That involved familial searching too, right?

Lonnie Franklin, yes.

That case was particularly complex.

It actually required a second, more refined familial search strategy.

They specifically looked for profiles in the state database that shared a high number of alleles with the crime scene DNA.

I think the threshold was 15.

That led them to Franklin's son.

Police then focused their investigation on the father, Lonnie Franklin, eventually getting his DNA from a discarded pizza slice, which confirmed he was the source of the crime scene evidence.

But this feels like it amplifies the ethical concerns we talked about.

You're essentially searching the genetic information of people not suspected of the crime, the relatives, to get to someone else.

Where does the Fourth Amendment fit in there?

That's the core legal and ethical debate.

Does using a relative's profile constitute an unreasonable search of that relative, or even of the target who wasn't initially identified?

Plus, there's a major concern about disproportionate impact.

Since offender databases already reflect existing disparities in the justice system, familial searching might intensify the focus on certain racial or ethnic minority groups.

So just to be absolutely clear, is this familial searching something that happens routinely at the national NDIS level?

No, absolutely not.

That's a crucial distinction.

Familial searches are not conducted by NDIS.

It's currently permitted by only a handful of states.

California, Colorado, Virginia, Texas are examples, and usually only under very strict conditions for serious violent crimes, and only after all standard database searches have come up empty.

Okay, so when a state does decide to run a familial search, how do they possibly sift through the results?

Even with close relatives sharing more DNA, wouldn't you still get a huge list of potential partial matches?

You do.

The initial list can be massive, hundreds or even thousands long, so they use specific strategies to narrow it down.

The first pass might use something called IBS, identity by state.

It's relatively simple, just counting the number of alleles or loci shared between the crime scene profile and database profiles, looking for those above a certain threshold, like maybe 15 shared alleles.

Just a raw count to start.
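
That raw identity-by-state count can be sketched in a few lines of Python. The assumptions here: each profile maps a locus to a pair of alleles, shared alleles are counted per locus (0, 1, or 2), and the profiles, candidate IDs, and threshold are tiny made-up examples rather than real casework values.

```python
from collections import Counter


def ibs_shared_alleles(profile_a, profile_b):
    """Count alleles shared between two profiles, locus by locus (0, 1, or 2 each)."""
    total = 0
    for locus in profile_a.keys() & profile_b.keys():
        # Counter intersection respects multiplicity, so (16, 16) vs (16, 17) shares 1.
        overlap = Counter(profile_a[locus]) & Counter(profile_b[locus])
        total += sum(overlap.values())
    return total


def shortlist(evidence, database, threshold):
    """Keep database profiles at or above the shared-allele threshold, best first."""
    scored = [(ibs_shared_alleles(evidence, prof), pid) for pid, prof in database.items()]
    return sorted((item for item in scored if item[0] >= threshold), reverse=True)


evidence = {"TH01": (6, 9.3), "FGA": (21, 24), "VWA": (16, 16)}
database = {
    "A": {"TH01": (6, 7), "FGA": (21, 24), "VWA": (16, 17)},   # shares 1 + 2 + 1
    "B": {"TH01": (8, 9), "FGA": (19, 22), "VWA": (16, 18)},   # shares 0 + 0 + 1
}

print(shortlist(evidence, database, threshold=3))  # [(4, 'A')]
```

With full 13-locus profiles the maximum count is 26, which is why a threshold such as 15 shared alleles can serve as a first-pass cutoff.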

Right.

But for better accuracy in ranking, they often use the kinship index, or KI.

This is a more sophisticated statistical method.

It calculates a likelihood ratio.

What's the probability these two profiles came from related individuals versus the probability they're just similar by chance in the population?

It gives a much stronger statistical basis for ranking candidates.
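
Here is one way the kinship index could be sketched as a per-locus likelihood ratio multiplied across loci. It uses the standard identity-by-descent weights for each relationship (the chance of sharing 0, 1, or 2 alleles by descent) and ignores mutation and population substructure. The allele frequencies are hypothetical, and the function names are illustrative rather than taken from any forensic software.

```python
from math import prod

# Probability of sharing 0, 1, or 2 alleles identical by descent.
IBD_COEFFS = {
    "parent_child": (0.0, 1.0, 0.0),
    "full_sibling": (0.25, 0.5, 0.25),
}


def genotype_prob(genotype, freqs):
    """Population probability of a genotype, assuming Hardy-Weinberg proportions."""
    c, d = genotype
    return freqs[c] ** 2 if c == d else 2 * freqs[c] * freqs[d]


def prob_given_one_ibd(g1, g2, freqs):
    """P(g2 | g1) when exactly one allele is shared by descent."""
    c, d = g2
    total = 0.0
    for shared in g1:  # either allele of g1 is the shared one with probability 1/2
        if c == d:
            total += 0.5 * (freqs[c] if shared == c else 0.0)
        else:
            total += 0.5 * ((freqs[d] if shared == c else 0.0)
                            + (freqs[c] if shared == d else 0.0))
    return total


def locus_lr(g1, g2, freqs, relationship):
    """Likelihood ratio at one locus: related as specified versus unrelated."""
    k0, k1, k2 = IBD_COEFFS[relationship]
    p_unrelated = genotype_prob(g2, freqs)
    p_one = prob_given_one_ibd(g1, g2, freqs)
    p_two = 1.0 if sorted(g1) == sorted(g2) else 0.0
    return k0 + k1 * p_one / p_unrelated + k2 * p_two / p_unrelated


def kinship_index(profile_a, profile_b, allele_freqs, relationship):
    """Multiply per-locus likelihood ratios over the loci typed in both profiles."""
    loci = profile_a.keys() & profile_b.keys()
    return prod(locus_lr(profile_a[l], profile_b[l], allele_freqs[l], relationship)
                for l in loci)


# Hypothetical allele frequencies and profiles, for illustration only.
allele_freqs = {"TH01": {6: 0.2, 7: 0.5, 9.3: 0.3}, "FGA": {21: 0.2, 22: 0.7, 24: 0.1}}
evidence = {"TH01": (6, 7), "FGA": (21, 24)}
candidate = {"TH01": (6, 9.3), "FGA": (21, 21)}

ki_parent = kinship_index(evidence, candidate, allele_freqs, "parent_child")
ki_sibling = kinship_index(evidence, candidate, allele_freqs, "full_sibling")
print(ki_parent, ki_sibling)
```

A value above 1 means the pair of profiles is more likely under the stated relationship than under "unrelated", so candidates can be ranked by this number instead of by a raw allele count.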

Are there other tricks they can use to filter the list?

Definitely.

Focusing on rare alleles is one.

If the crime scene DNA has an allele that's very uncommon in the general population, finding database profiles that also share that specific rare allele is a powerful filter.

And another big one, especially since many database profiles are male, is Y-STR screening.

Using the Y chromosome markers.

Exactly.

Since Y-STRs are passed down directly from father to son, if you have a male crime scene profile, you can quickly check the Y-STR profiles of potential male relatives on the candidate list.

If the Y-STRs don't match, you know they aren't related through the paternal line, allowing you to eliminate them quickly.
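
The Y-STR elimination step can be sketched as a simple filter over the candidate list. The assumptions: a Y-STR haplotype is a dictionary from marker name to repeat count, only markers typed in both samples are compared, and mutation between generations is ignored, so this toy version is stricter than real practice. The candidate IDs and values are invented for illustration.

```python
def ystr_consistent(evidence_hap, candidate_hap):
    """True if the two Y-STR haplotypes agree at every marker typed in both."""
    shared = evidence_hap.keys() & candidate_hap.keys()
    return all(evidence_hap[m] == candidate_hap[m] for m in shared)


def screen_candidates(evidence_hap, candidates):
    """Keep only candidates whose Y-STR haplotype is consistent with the evidence."""
    return [cid for cid, hap in candidates.items() if ystr_consistent(evidence_hap, hap)]


# Hypothetical repeat counts at three Y-STR markers.
evidence_hap = {"DYS19": 14, "DYS390": 24, "DYS391": 10}
candidates = {
    "cand_1": {"DYS19": 14, "DYS390": 24, "DYS391": 10},  # same paternal line possible
    "cand_2": {"DYS19": 15, "DYS390": 23, "DYS391": 10},  # excluded
}

print(screen_candidates(evidence_hap, candidates))  # ['cand_1']
```

A consistent haplotype does not prove relatedness, since unrelated men can share a Y-STR haplotype; the filter is useful mainly for ruling candidates out.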

It's an incredible amount of layered complexity from the local lab all the way to these advanced search techniques.

So, summing up, CODIS is this tiered system, LDIS feeding SDIS feeding NDIS.

It relies on those core STR loci, currently 13, and houses different indexes for offenders, arrestees, and forensic samples.

And technologies like rapid DNA are pushing the boundaries of speed and efficiency.

And underlying all of that is the constant balancing act we've discussed.

The undeniable power of this technology to solve horrific crimes, to link serial offenders, even to exonerate the innocent, weighed against profound questions about privacy, civil liberties, the reach of the state, and potential bias, as highlighted by debates around arrestee collection and familial searching.

Which brings us to our final thought for you, the listener.

The push is definitely on, both here and internationally, to expand the number of core CODIS loci, maybe to 20 or more, for even greater discrimination power.

At the same time, there's pressure to include more people, non-felons perhaps, in the database.

So, the question to mull over is, does increasing the technical power always justify expanding the societal reach?

Where do we draw the line?

A critical question as this technology continues to evolve at breakneck speed.

Thank you for joining us on this deep dive into the world of forensic DNA databases.

We hope this gives you a much clearer picture of how it all works.

We'll catch you on the next one.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Forensic DNA databases serve as critical investigative instruments that enable law enforcement agencies to connect crime scene evidence with known offenders and to establish linkages between disparate criminal cases. The Combined DNA Index System (CODIS) operates as the foundational national infrastructure in the United States, organizing DNA profiles into specialized indexes that each serve distinct investigative purposes. The Offender Index contains profiles from individuals with qualifying criminal convictions, the Forensic Index stores unknown profiles recovered from crime scenes, and the Missing Persons Index preserves profiles from disappeared individuals and their biological relatives. Database searches function through two primary mechanisms: case-to-offender searches attempt to match unknown crime scene profiles against the offender population to identify potential suspects, while case-to-case searches detect connections between separate unsolved crimes by identifying matching profiles across multiple incidents. The effectiveness of database searches depends significantly on search parameters and stringency levels, which establish how closely profiles must align before generating investigative leads, and investigators must develop competency in evaluating and interpreting partial match results that fall short of definitive matches. Familial searching represents a more sophisticated and controversial technique deployed when standard database searches fail to identify direct matches but reveal partial genetic similarities suggesting the actual perpetrator may be a biological relative of someone already in the database. This approach depends on statistical frameworks including the Kinship Index and identity-by-state calculations to estimate the likelihood of biological relationships, with particular emphasis on rare alleles that enhance the discriminatory capacity of these inferences. 
Y-chromosome short tandem repeat analysis provides a complementary investigative tool especially useful in cases where male suspects are involved, as this methodology enables investigators to exclude individuals who do not share paternal lineage with the source biological material. Beyond the technical dimensions of database operation, forensic DNA databases raise substantial legal and ethical considerations including genetic privacy protections, the potential scope of law enforcement surveillance capabilities, and the establishment of appropriate limits on database searches within constitutional frameworks designed to protect citizens in democratic societies.
