Chapter 25: Evaluation of the Strength of Forensic DNA Profiling Results

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to The Deep Dive.

So picture this.

A forensic scientist testifies.

The suspect's DNA matches the crime scene sample.

Okay, great.

But what does that really mean?

Right.

Just saying it's a match isn't the whole story.

Not even close.

Exactly.

The crucial question, the one that makes all the difference in court, is how significant is that match?

How rare is it for two people to match just by chance?

And that's where we're going today.

Biology finds the match.

Yeah, but it's the statistics, the math that gives it weight.

We're digging into the probability, the calculations that tell us just how incredibly unlikely that random match really is.

Our mission, for you listening, is to connect those dots.

We'll go from Mendel's peas, believe it or not, all the way to these probabilities like one in a quadrillion.

Step by step.

Yeah, step by step.

So by the end, you'll understand the formulas, sure, but also the important adjustments scientists make to ensure these conclusions are solid.

Okay, let's unpack this. Before we can talk about rarity across millions, we need a quick refresher.

How is genetic stuff passed down?

We start with Mendel, right?

Autosomal DNA.

That's the foundation, yes.

Mendel's laws govern how the DNA we typically test in forensics, the autosomal DNA, is inherited.

Law number one.

The law of segregation.

Basically, when you make sperm or egg cells, your pairs of alleles, the gene variants, split up, so each gamete gets only one copy.

It shuffles the deck.

Got it.

And number two is independent assortment.

Now, I always wondered: if they split up anyway, why is this second law so important?

Don't things just sort independently?

Ah, well, that's the key assumption.

The law of independent assortment says the way one pair of alleles separates doesn't affect how other pairs separate.

Like dealing from two separate decks: getting an ace of spades from one doesn't change your chance of getting a king of hearts from the other.

Okay.

And that independence is what lets us multiply probabilities later on using the product rule.

It's fundamentally important for the math.

Quick context check for everyone.

Human cells are diploid, 46 chromosomes.

Gametes, though, are haploid, just 23.

And this sorting happens during that reduction.

Exactly.

But here's where things get, well, a bit messy.

Real genetics isn't always perfectly Mendelian.

Not all genes actually assort independently.

How come?

If genes are physically very close together on the same chromosome, they tend to get inherited together, like they're linked.

We call them, unsurprisingly, linked genes.

So if genes on the same chromosome stick together, how do we get new combinations of traits from one parent?

Good question.

It's thanks to crossing over.

During meiosis, the process that makes gametes, matching (homologous) chromosomes can swap segments.

Imagine them literally exchanging pieces of their arms.

So genes that are far apart on the same chromosome often get separated by this crossing over.

They can assort independently, or at least more independently than closely linked ones.

It creates more variation.

Right.

So these rules, Mendel's laws and crossing over, apply to the main bulk of our DNA, the autosomal nuclear DNA.

But that's not the whole picture, is it?

Absolutely not.

We have to remember Y chromosome DNA, which passes only from father to son, and mitochondrial DNA, which passes from the mother to all her children.

And they don't follow these rules?

Nope.

Their inheritance is non-Mendelian.

They're passed down, largely unchanged, without that shuffling from segregation and independent assortment.

This means we need totally different statistical approaches for them, which we'll definitely get to.

Okay, so now we shift gears.

We go from how one person inherits DNA to how we measure genetic variation across whole populations.

This is population genetics.

It's how we figure out rarity.

And to measure rarity, we need frequencies.

You mentioned three key ones, usually calculated by just counting genes in a sample.

The gene counting method.

That's the basic idea, yeah.

We need three, because they tell us slightly different things.

First, allele frequency, often written as $p$.

That's just how common a single allele variant is at one specific spot, or locus, in our sample population.

Okay, the basic building block.

Then there's genotype frequency, often written as a capital $P$.

This is the frequency of the pair of alleles an individual has at that locus.

So are they AA, Aa, or aa, for example?

And the third one is heterozygosity.

This one seems really important for forensics.

It really is.

Heterozygosity tells us what proportion of people in the population have two different alleles at that locus.

They're heterozygotes.

So higher heterozygosity means more genetic variety at that spot.

Exactly.

And more variety means that locus is better at discriminating between people.

It has more power to tell individuals apart.
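To make the three frequencies concrete, here is a minimal Python sketch of the gene counting method. It assumes a small hypothetical sample of five people typed at one STR locus (alleles written as repeat counts), and the function names are illustrative, not from the source. Expected heterozygosity is computed as $1 - \sum p_i^2$, a standard definition that the conversation doesn't spell out.

```python
from collections import Counter


def gene_count(genotypes):
    """Allele and genotype frequencies by direct counting (gene counting method).

    genotypes: list of (allele1, allele2) tuples, one per person.
    """
    n_people = len(genotypes)
    # Each diploid person contributes two alleles to the allele pool.
    allele_counts = Counter(a for pair in genotypes for a in pair)
    allele_freq = {a: c / (2 * n_people) for a, c in allele_counts.items()}
    # Sort each pair so (12, 14) and (14, 12) count as the same genotype.
    genotype_counts = Counter(tuple(sorted(pair)) for pair in genotypes)
    genotype_freq = {g: c / n_people for g, c in genotype_counts.items()}
    return allele_freq, genotype_freq


def observed_heterozygosity(genotypes):
    """Proportion of people with two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(allele_freq):
    """Heterozygosity expected under random mating: 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in allele_freq.values())


# Hypothetical sample: five people typed at one STR locus (repeat counts).
sample = [(12, 14), (14, 14), (12, 15), (15, 14), (12, 12)]
alleles, genos = gene_count(sample)
print("allele frequencies:", alleles)
print("observed heterozygosity:", observed_heterozygosity(sample))
print("expected heterozygosity:", round(expected_heterozygosity(alleles), 4))
```

A real database would use a hundred or more people per population group; five are used here only so the counts are easy to check by hand.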

Okay, so we have these frequencies.

Now, how do we use them predictively?

You mentioned the Hardy-Weinberg principle.

Sounds important.

It is.

It's kind of a cornerstone.

What Hardy-Weinberg says is: if, and this is a big if, a population is large, mates totally randomly, and isn't affected by things like mutation, migration, or natural selection.

An ideal theoretical population, basically.

Precisely.

Then you can predict the genotype frequencies just from knowing the allele frequencies, $p$ and $q$ for a locus with two alleles, using that famous equation, $p^2 + 2pq + q^2 = 1$.

Where $p^2$ and $q^2$ are the frequencies of the two homozygotes.

And $2pq$ is the frequency of the heterozygotes.

It gives you the expected ratios.
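A minimal sketch of that prediction for a locus with two alleles, assuming a hypothetical allele frequency of $p = 0.3$; the function name is illustrative. The three expected genotype frequencies should sum to 1.

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies at a two-allele locus under Hardy-Weinberg.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    q = 1 - p
    return {
        "AA": p * p,      # homozygote: p^2
        "Aa": 2 * p * q,  # heterozygote: 2pq
        "aa": q * q,      # homozygote: q^2
    }


expected = hardy_weinberg_expected(0.3)
for genotype, freq in expected.items():
    print(genotype, round(freq, 4))
print("total:", round(sum(expected.values()), 10))
```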

But hold on.

Real human populations are never like that.

We don't mate completely randomly.

People move around.

Populations aren't infinitely large.

It seems, well, unrealistic.

It is unrealistic on its own.

That's why forensic labs don't just assume it works.

They have to test it.

They build databases.

Maybe a hundred, two hundred people from a population group.

And compare what they actually see to what Hardy-Weinberg predicts.

Exactly.

They use a statistical test called the chi-square test.

Symbolized $\chi^2$.

Okay.

Now here's something we need to clarify.

We're talking about the allele frequency p.

But this chi-square test gives a p value.

Those are totally different p's, right?

Excellent point.

Yes.

Very different.

The allele frequency $p$ is a biological measure.

How common is this allele?

The p value from the chi-square test is a statistical measure.

How likely is it that we'd see data at least this far from the predictions if the Hardy-Weinberg assumptions were actually true?

So what does that statistical p value tell the lab?

If that p value is high enough, usually greater than 0.05, it means the observed frequencies are not significantly different from the predicted ones.

It tells the scientists, okay, for practical purposes, this database is behaving as if it meets the Hardy -Weinberg assumptions.
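Here is a minimal sketch of that check for the simplest case, a locus with only two alleles, using hypothetical genotype counts from a database of 100 people. It is an illustration under stated assumptions, not a lab protocol: with one allele frequency estimated from the data, the test has 1 degree of freedom, which lets the p value be computed with the standard library's `math.erfc`. Real STR loci have many alleles, which changes the degrees of freedom and may call for an exact test instead.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for Hardy-Weinberg at a two-allele locus.

    Returns (chi2 statistic, p value). Degrees of freedom: 3 genotype classes
    minus 1, minus 1 more for the allele frequency estimated from the data = 1.
    """
    n = n_AA + n_Aa + n_aa
    # Allele frequency p estimated by gene counting (each person has 2 alleles).
    p = (2 * n_AA + n_Aa) / (2 * n)
    if not 0 < p < 1:
        raise ValueError("both alleles must be observed in the sample")
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for a database of 100 people.
chi2, p_value = hwe_chi_square(36, 48, 16)  # exactly p^2 : 2pq : q^2 with p = 0.6
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")  # p well above 0.05

chi2_b, p_value_b = hwe_chi_square(50, 20, 30)  # too few heterozygotes
print(f"chi2 = {chi2_b:.3f}, p = {p_value_b:.2e}")  # p far below 0.05
```

Note the two different p's side by side: `p` inside the function is the allele frequency, while `p_value` is the test's statistical output.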

So the database gets a green light for calculations.

Essentially, yes.

It means we can trust it enough to calculate the probability of match.

That's the chance that two people picked randomly from that population would have the same genotype at that one specific locus.

And the goal is a low probability of match.

The lower the better.

A lower probability of match means that locus is more discriminating.
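Under Hardy-Weinberg, the probability of match at one locus can be sketched as the sum of the squared genotype frequencies: two unrelated people share genotype $g$ with probability $P_g^2$, summed over every possible genotype. That formula is a standard one rather than something stated in the conversation, and the allele frequencies below are hypothetical, chosen to show why loci with more, evenly spread alleles discriminate better.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freq):
    """Hardy-Weinberg genotype frequencies for a locus with any number of alleles."""
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freq), 2):
        pa, pb = allele_freq[a], allele_freq[b]
        # Homozygote a/a: p_a^2. Heterozygote a/b: 2 * p_a * p_b.
        freqs[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return freqs


def probability_of_match(allele_freq):
    """Chance two random unrelated people share a genotype: sum of P_g squared."""
    return sum(f * f for f in genotype_frequencies(allele_freq).values())


two_alleles = {"A": 0.5, "B": 0.5}
five_alleles = {k: 0.2 for k in "ABCDE"}
print(round(probability_of_match(two_alleles), 4))   # 0.375
print(round(probability_of_match(five_alleles), 4))  # 0.072
```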

And this is where you see the huge difference between marker types.

Right.

Like comparing older markers, maybe.

Yeah, some markers, like certain SNPs used forensically, might give you a combined probability of match across several loci of, say, 1 in 4,000, which is pretty good.

But the standard STRs.

Oh, the 13 CODIS STRs.

The short tandem repeats.

When you multiply their individual probability-of-match values together using the product rule, the combined probability becomes astronomically small.

We're talking numbers like 1 in 10 to the power of 14, or even less.

1 in hundreds of trillions.

Exactly.

That incredible discriminating power is why STRs became the gold standard.
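The product rule step itself is a single multiplication, valid only under the independence assumption discussed earlier. A minimal sketch, assuming a hypothetical probability of match of 0.08 at each of 13 loci (real loci differ from one another); `math.prod` needs Python 3.8 or later.

```python
import math


def combined_probability(per_locus):
    """Product rule: multiply per-locus probabilities, assuming independent loci."""
    return math.prod(per_locus)


per_locus_pm = [0.08] * 13  # hypothetical values, one per locus
combined = combined_probability(per_locus_pm)
print(f"combined = {combined:.2e}, about 1 in {1 / combined:.2e}")
```

Even a modest per-locus value compounds quickly: thirteen factors of 0.08 land on the order of 1 in $10^{14}$.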

Okay, so that power leads us right to the core issue.

A match is found.

Crime scene DNA matches the suspect.

Now we have to evaluate that.

We're essentially weighing two possibilities, aren't we?

Precisely.

Hypothesis 1, $H_1$: the suspect is the source of the DNA, versus hypothesis 2, $H_2$: the suspect is not the source, and they just happen to match by sheer coincidence.

A random match.

And there are two main ways to put a number on that.

Two main statistical approaches, yes.

The first is the profile probability, which is often called the random match probability, RMP.

That sounds like what we were just discussing.

It is, basically.

You calculate the genotype frequency for each locus tested, using $p^2$ for a homozygote or $2pq$ for a heterozygote, from Hardy-Weinberg.

Assuming the database passed the chi-square test.

Right.

And then you multiply all those individual locus frequencies together.

That relies on the assumption of independent assortment, the product rule.

So multiply the frequency at locus 1 by the frequency at locus 2, times locus 3, all the way through the 13 CODIS loci, for instance.

Yep.

And because each individual frequency is usually quite small, multiplying them makes the final number incredibly tiny.

Like that example, maybe $2.76 \times 10^{-14}$.

Vanishingly small.
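A minimal sketch of the RMP recipe just described, assuming a hypothetical three-locus profile whose allele frequencies would, in practice, come from a validated population database. A homozygous locus is passed as a single allele frequency ($p^2$), a heterozygous one as two ($2pq$). The function names are illustrative.

```python
import math


def locus_genotype_frequency(p_i, p_j=None):
    """Hardy-Weinberg genotype frequency at one locus.

    One allele frequency -> homozygote (p^2); two -> heterozygote (2pq).
    """
    if p_j is None:
        return p_i * p_i
    return 2 * p_i * p_j


def random_match_probability(profile):
    """Product rule across loci: multiply the per-locus genotype frequencies."""
    return math.prod(locus_genotype_frequency(*alleles) for alleles in profile)


# Hypothetical profile: (p, q) for a heterozygous locus, (p,) for a homozygous one.
profile = [(0.10, 0.20), (0.15,), (0.05, 0.30)]
rmp = random_match_probability(profile)
print(f"RMP = {rmp:.2e}, about 1 in {1 / rmp:,.0f}")
```

With only three loci the result is about 1 in 37,000; adding the remaining loci of a full profile is what drives it down to the $10^{-14}$ range.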

But wait, that calculation still relies on some big assumptions, doesn't it?

The person is chosen randomly from the general population, and they're totally unrelated to whoever left the DNA.

Exactly.

Those are the ideal assumptions built into the basic RMP calculation.

But what if that's not true?

What if the suspect and the actual source are from the same small, maybe isolated subpopulation?

Wouldn't that increase the chance of sharing rare alleles?

That is the critical caveat.

Real populations have structure.

People tend to marry within their own groups, communities, geographical areas.

This non-random mating, or substructure, means people within that subgroup are slightly more related than random people from the whole population.

And that affects the frequencies.

It does.

It tends to increase the frequency of homozygotes, people with two identical alleles, and decrease heterozygotes, compared to what Hardy-Weinberg predicts for the population as a whole, which means a coincidental match is slightly more likely within that subgroup.

So the basic RMP might be too small, overstating the rarity.

It could be, slightly.

So we need to apply a correction, a conservative adjustment factor called theta.

The sources mention specific values like 0.01 for most US populations, but 0.03 for Native American groups.

Why those specific numbers?

Where do they come from?

They're not like precise biological constants measured perfectly.

They're more like consensus values agreed upon by the scientific community based on studies of population genetics.

They act as a safety factor.

Using a theta of 0.01 basically builds in a buffer to account for unknown relatedness or substructure.

It makes the final probability estimate a bit higher, and therefore more conservative.

So it dampens that extreme rarity number slightly to be safe.

Exactly.

In that example where the RMP was $2.76 \times 10^{-14}$, applying a theta of 0.01 might increase the probability about threefold.

Still incredibly rare, but a more cautious estimate that acknowledges real-world population complexity.
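The conversation doesn't give the correction formulas. This sketch assumes the widely used NRC II (1996) Recommendation 4.10 formulas (often attributed to Balding and Nichols), which raise each per-locus genotype frequency by an amount that grows with theta; with theta = 0 they reduce to plain $p^2$ and $2pq$. The three-locus profile is hypothetical and the function names are illustrative.

```python
import math


def theta_genotype_frequency(p_i, p_j=None, theta=0.01):
    """Per-locus match probability with a substructure (theta) correction.

    Assumed formulas (NRC II Recommendation 4.10), with t = theta:
      homozygote:   [2t + (1-t)p][3t + (1-t)p] / [(1+t)(1+2t)]
      heterozygote: 2[t + (1-t)p_i][t + (1-t)p_j] / [(1+t)(1+2t)]
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom


def corrected_rmp(profile, theta=0.01):
    """Product rule over loci, each locus corrected with theta."""
    return math.prod(theta_genotype_frequency(*alleles, theta=theta) for alleles in profile)


# Hypothetical profile: (p, q) for a heterozygous locus, (p,) for a homozygous one.
profile = [(0.10, 0.20), (0.15,), (0.05, 0.30)]
plain = corrected_rmp(profile, theta=0.0)  # identical to uncorrected Hardy-Weinberg
adjusted = corrected_rmp(profile, theta=0.01)
print(f"theta = 0:    {plain:.3e}")
print(f"theta = 0.01: {adjusted:.3e} ({adjusted / plain:.2f}x higher)")
```

Each locus gets only a modest bump, and homozygous loci get the larger one; compounded across every locus in a full profile, these bumps add up to the kind of overall increase described in the conversation.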

Makes sense.

Okay, that's approach one: RMP with a theta correction.

What's approach two?

The likelihood ratio.

Right, the LR.

This is often preferred because it directly compares those two hypotheses we talked about.

$H_1$, same source, versus $H_2$, different source, a coincidental match.

How does it do that?

It calculates a ratio: the probability of observing the DNA evidence if $H_1$ is true, divided by the probability of observing the DNA evidence if $H_2$ is true.

Okay, let's break that down.

If the profiles match perfectly and $H_1$, same source, is true, then the probability of seeing that match is 100%, or one.

Exactly.

If the suspect is the source, the probability of their DNA matching is one.

And the probability of the evidence if $H_2$, different source, is true?

That's just the random match probability we already calculated, right?

Including the theta correction.

Precisely.

So the likelihood ratio becomes one divided by the RMP.

So it's essentially the inverse of the RMP.

If the RMP is one in a billion, the LR is a billion.

Pretty much, yes.

An LR of, say, one billion means this DNA evidence is one billion times more probable if the suspect is the source than if some random, unrelated person is the source.
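For a clean single-source match, the likelihood ratio reduces to one over the RMP. A minimal sketch with a hypothetical RMP of one in a billion; it does not cover mixtures, relatives, or other situations where $P(E \mid H_1)$ is not simply 1.

```python
def likelihood_ratio(p_e_given_h1, p_e_given_h2):
    """LR = P(evidence | H1) / P(evidence | H2)."""
    return p_e_given_h1 / p_e_given_h2


# Single-source match: P(E | H1) = 1 and P(E | H2) is the (theta-corrected) RMP.
rmp = 1e-9  # hypothetical: one in a billion
lr = likelihood_ratio(1.0, rmp)
print(f"LR = {lr:.3g}: the evidence is about {lr:.3g} times more probable under H1 than H2")
```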

Ah, I see.

So it frames the strength of the evidence in terms of comparing the two explanations.

Yeah.

Exactly.

It directly answers the question: how much more likely is the evidence under the prosecution's hypothesis, $H_1$, than under the defense hypothesis, $H_2$, a coincidental match?

Some find this framing easier for courts and juries to understand than just hearing an incredibly small RMP number, though presenting either requires careful explanation.

Okay, let's circle back quickly to those other types of DNA we mentioned, Y-DNA and mitochondrial DNA.

We said they don't follow Mendel's rules.

Right.

No independent assortment, minimal recombination for Y, none for mtDNA.

So does the Hardy-Weinberg principle just completely break down for them?

Can we even calculate an RMP using $p^2$ and $2pq$?

No.

You absolutely cannot.

The whole basis of the product rule and Hardy-Weinberg calculations is that alleles at different loci are inherited independently.

For Y-STRs or mtDNA sequences, the markers are physically linked together on the same molecule and passed down as a block.

A block, like a haplotype.

Exactly.

We call these blocks haplotypes.

Because there's little or no recombination shuffling things up, certain combinations of alleles along the Y chromosome or mtDNA occur way more often than you'd expect by chance.

This phenomenon is called linkage disequilibrium.

Linkage disequilibrium.

Meaning the loci aren't in equilibrium: their alleles are associated because they're linked.

Correct.

So multiplying individual allele frequencies would give you a wildly inaccurate, usually much too small, estimate of how common that haplotype actually is.
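A toy illustration of why the product rule fails for linked markers, assuming a small hypothetical Y-STR database of ten haplotypes at three loci (real databases are far larger). Because haplotypes travel as blocks, the naive product of per-locus allele frequencies can land far below the haplotype's actual frequency in the database.

```python
import math
from collections import Counter

# Hypothetical Y-STR database: each tuple is one man's repeat counts at three linked loci.
database = [
    (14, 13, 29), (14, 13, 29), (14, 13, 29), (14, 13, 29),
    (16, 14, 30), (16, 14, 30), (16, 14, 30), (16, 14, 30),
    (15, 13, 31), (17, 12, 28),
]


def product_rule_estimate(haplotype, db):
    """Naive estimate: multiply per-locus allele frequencies (invalid for linked loci)."""
    n = len(db)
    return math.prod(
        sum(1 for h in db if h[i] == allele) / n
        for i, allele in enumerate(haplotype)
    )


def counted_frequency(haplotype, db):
    """Direct count of the whole haplotype as a block."""
    return Counter(db)[haplotype] / len(db)


target = (14, 13, 29)
print("product rule:", product_rule_estimate(target, database))
print("direct count:", counted_frequency(target, database))
```

Here the product rule gives about 0.08 while the haplotype actually makes up 0.4 of the database, a fivefold understatement of how common it is.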

So if we can't use Hardy-Weinberg or the product rule, how on earth do we estimate how rare a Y haplotype or an mtDNA sequence, a mitotype, is?

We have to go back to basics.

Simple counting.

We rely heavily on large population databases specific to Y haplotypes or mtDNA sequences.

Just count how many times you see that exact haplotype or mitotype in the database.

Essentially yes.

It's called the counting method.

Or, for mtDNA, sometimes mitotype frequency.

If a specific mitotype appears, say, $x$ times in a database of $N$ people, we use a slight correction formula, often given as $(x + 1)/(N + 2)$, to get a frequency estimate.

Why the plus one and plus two?

It's a statistical adjustment, particularly important if $x$ is small or zero.

It avoids giving a frequency of zero if something hasn't been seen, and provides a slightly more conservative estimate.
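The counting-method correction as described, read here as $(x + 1)/(N + 2)$ for a haplotype seen $x$ times among $N$ database profiles, in a minimal sketch; the database size of 1,000 and the count of 6 are hypothetical.

```python
def counting_method_frequency(x, n):
    """Haplotype or mitotype frequency estimate (x + 1) / (N + 2).

    x: number of times the haplotype appears in the database.
    n: number of profiles in the database.
    Adding 1 and 2 keeps the estimate above zero even when x is 0.
    """
    if x < 0 or n < 0 or x > n:
        raise ValueError("need 0 <= x <= n")
    return (x + 1) / (n + 2)


print(counting_method_frequency(6, 1000))  # seen 6 times in 1,000 profiles
print(counting_method_frequency(0, 1000))  # never seen: still not zero
```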

Right.

What do you do if the haplotype is so rare it's never been seen in your database?

If $x$ is zero, you can't say the frequency is zero.

You definitely cannot.

That would be scientifically unsound.

If a haplotype isn't in the database, it just means it's rare, not impossible.

In that situation, we have to calculate a conservative upper bound for its frequency.

An upper bound?

Yes.

Using statistical methods based on the size of the database, and a chosen confidence level, like 95%, we calculate the maximum frequency that haplotype could plausibly have in the population even though we haven't observed it yet.

So you're saying: we haven't seen it, but based on our database size, we're 95% confident its frequency is no higher than this number.
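One common way to get that bound, assumed here since the conversation doesn't name the method, solves $(1 - p)^N = 1 - C$ for $p$, where $C$ is the confidence level: any frequency above the result would make seeing zero copies in $N$ profiles less likely than $1 - C$. For zero observations this matches the one-sided exact (Clopper-Pearson) upper bound, and at 95% it is close to the familiar "rule of three", $3/N$. The database size below is hypothetical.

```python
def zero_count_upper_bound(n, confidence=0.95):
    """Upper bound on the frequency of a haplotype never seen among n profiles.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("database must contain at least one profile")
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n)


n = 1000  # hypothetical database size
bound = zero_count_upper_bound(n)
print(f"95% upper bound: {bound:.5f} (rule of three: {3 / n:.5f})")
```

Doubling the database roughly halves the bound, which is one concrete way database size feeds into how strong an unobserved-haplotype result can be.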

Precisely that.

It's all about providing cautious, statistically sound estimates, especially when dealing with these linked markers where the standard assumptions don't apply.

Wow.

Okay, so we've gone from Mendel's simple laws of inheritance through the complexities of real populations.

We've seen how forensic scientists don't just declare a match.

No, it's built on a foundation.

Testing the population data with chi-square against Hardy-Weinberg.

Then using the product rule carefully to estimate a random match probability.

And crucially, applying corrections like theta to account for things like population substructure, making the final number more robust.

Or using the likelihood ratio to directly compare the probability of the evidence under different scenarios.

The bottom line for you, the listener, is that the incredible power of DNA evidence comes from the ability to attach a statistical weight, a measure of extreme rarity to that match.

That number, whether an RMP or an LR, is the strength of the evidence.

It's not just "it matches"; it's "it matches, and the chance of a random person matching is one in billions or trillions."

So here's a final thought for you to chew on.

We've discussed how critical these calculations and corrections are.

We know adjustments are needed for relatives, for mixed DNA samples, even for how databases are searched.

Given all that, how much does the ultimate strength and reliability of any DNA conclusion depend on the quality, the size, the representativeness, the accuracy of those initial population databases that all these calculations are built upon?

Something to think about.

This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers

Statistical and genetic interpretation of forensic DNA profiles requires understanding both the theoretical population genetics that underpin frequency calculations and the practical methods for communicating evidentiary weight to legal decision makers. Mendelian inheritance patterns and population genetic principles form the foundational framework, establishing how allele frequencies and genotype frequencies distribute within defined populations. The Hardy-Weinberg Principle provides the mathematical basis for predicting expected genotype frequencies under equilibrium conditions, allowing forensic analysts to determine whether observed genetic variation matches theoretical expectations. The Profile Probability approach quantifies rarity by calculating the chance that an unrelated individual would randomly possess an identical genetic profile, offering a straightforward measure of exclusionary power for common autosomal markers. Beyond this single-value metric, the Likelihood Ratio framework represents a more sophisticated statistical tool that evaluates competing propositions: the probability of the observed DNA evidence given that the suspect is the true biological source compared to the probability of that same evidence under the alternative hypothesis that an unrelated person contributed the sample. This comparative approach provides stronger inferential reasoning than probability estimates alone because it directly addresses the question courts need answered. Specialized statistical adjustments become necessary when analyzing uniparental markers such as mitochondrial DNA and Y chromosome short tandem repeats, where discrimination power differs substantially from nuclear DNA and haplotype frequency calculations depend critically on reference database composition. Rare haplotypes require different treatment than common variants when databases provide incomplete coverage of genetic diversity. 
Population substructure and mutation rate estimation further refine these calculations, acknowledging that allele frequencies vary across demographic groups and that rare variants arise through generational change. Collectively, these statistical frameworks enable forensic professionals and legal stakeholders to present DNA evidence with scientific precision while remaining comprehensible to juries and judges evaluating the strength of genetic links between suspects and biological evidence.
