Chapter 20: Autosomal Short Tandem Repeat Profiling

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

Today, we're really getting into the nitty-gritty, the molecular foundation of criminal justice, short tandem repeat profiling.

That's right.

This is the tech that really changed forensic biology, moved it beyond older methods into something much more precise.

We want to cut through some of the complexity today, really understand the genetic blueprint that crime labs are using now.

And we've pulled together the key info on this, focusing on STR profiling.

You really can't understand modern forensics without getting what an STR is and why it was such a big step forward.

Yeah, our goal here is to give you a look at the big databases and importantly, talk about the real world issues, the artifacts, the interpretation challenges that can muddy the waters.

Let's kick off with what makes STRs so revolutionary, their size.

That's absolutely central.

An STR is just a short, repeating bit of DNA.

This is what's called a microsatellite.

We used to use VNTRs, variable number tandem repeats.

Right, the older method.

Yeah, and they were big fragments, maybe a thousand base pairs long.

STRs, the ones we use forensically, are tiny in comparison, usually 100 to 500 base pairs.

Okay, and why is being small such a game changer, especially for legal evidence?

Well, it's really about survival.

Think about crime scene DNA.

It often gets degraded, broken down by heat, moisture, sunlight, just time.

Right, fragmented.

Exactly.

The short length of STRs means they're much more likely to survive that kind of damage and still be copied using PCR.

That ability to work with degraded samples, plus being able to amplify lots of STR regions at once, multiplexing, that's what really transformed things.

Got it.

So let's break down the structure of one of these STR regions, or loci.

What are the key parts we need to understand for how they work?

Okay, you've basically got two main parts.

First, there's the core repeat region.

That's the bit with the sequence that repeats over and over.

And the number of repeats is the key, right?

That defines the allele.

That's it.

The number gives you the genotype.

Then around that core region, you have the flanking regions.

And those are?

They don't usually vary much between people.

But they're critical because that's where the PCR primers attach.

They need to bind there to start copying the repeat region.

I see.

Now, these repeats come in different lengths: dimeric, trimeric, tetrameric, meaning two, three, or four base pair units.

Forensics seems heavily focused on the four unit ones, the tetramerics.

Why is that?

It really boils down to getting clean data.

Shorter repeats, like the dimerics and trimerics, well, they tend to cause more problems during PCR.

Problems like what?

They generate more of these artifact peaks called stutter.

It kind of clouds the profile, makes it harder to be sure what you're seeing.

Tetrameric repeats, and there are at least 10,000 of them scattered through our genome, are very variable between people, which is good for identification, and they produce much less stutter.

So less noise, more reliable signal.

Exactly.

You get that cleaner profile you need for court.

So let's say we're looking at a straightforward one, a simple repeat like D5S818.

How does the lab actually assign that allele number?

For a simple repeat, where it's the same unit repeating, like AGAT in D5S818, it's literally just counting.

If that AGAT sequence repeats 10 times...

Then it's allele 10?

Precisely, allele 10.

Simple as that.

But biology loves complexity, right?

I read that there are more complicated structures that aren't just simple repeats.

Oh, absolutely.

That's where it gets more challenging.

You move beyond simple repeats to things like compound repeats.

These have more than one type of simple repeat sequence stuck together.

D8S1179 is an example.

Okay, so mixed repeats.

Yeah.

And then you have complex repeats like D21S11.

These are even trickier.

They might have different clusters of repeats mixed with other non-repeating sequences in between.

Wow.

So the software analyzing this has to be pretty sophisticated.

Definitely.

And it gets even more complex with something called microvariants, or non-consensus alleles.

Microvariants.

They don't have whole number repeats.

Exactly.

They have partial repeat units.

So the classic example is the TH01 locus, allele 9.3.

Okay, 9.3.

What does the .3 mean?

It means it has nine full repeats plus an extra three nucleotides from the next repeat unit.

It's not a full 10th repeat.

Right.

So simple counting goes out the window there.

It really does.

The system has to be calibrated to recognize these fractional repeats.

It shows how automated analysis became essential.
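
The naming convention just described (whole repeats, then a decimal point, then any leftover bases) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the idea of passing in the repeat-region length directly are assumptions for the example, not how any particular genotyping software works.

```python
def allele_designation(repeat_region_bp: int, repeat_unit_bp: int = 4) -> str:
    """Name an allele from the length of its repeat region (hypothetical helper).

    Whole repeat units give the integer part; leftover bases from an
    incomplete unit are written after the decimal point (a microvariant).
    """
    full_repeats, extra_bases = divmod(repeat_region_bp, repeat_unit_bp)
    if extra_bases == 0:
        return str(full_repeats)
    return f"{full_repeats}.{extra_bases}"


print(allele_designation(40))  # 10 full tetrameric repeats -> 10
print(allele_designation(39))  # 9 full repeats plus 3 bases -> 9.3
```

Real software works from measured fragment sizes rather than a known repeat-region length, which is where the allelic ladder discussed later comes in.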

Okay, let's zoom out from the single locus to the big picture.

The power of STRs really exploded with standardized databases, right?

How did that evolve?

Yeah, the standardization was key.

It started back in the early 90s in the UK.

They had a system called the quadruplex, which tested just four loci.

Four.

That doesn't sound like much discriminating power.

It wasn't huge, but it was a start.

Then came the SGM system, second generation multiplex, that used six loci plus the sex marker.

That brought the chance of a random match, the population match probability, or PM, down to about one in 10 million.

Okay, now we're talking real identification power.

That low PM is the whole point.

It is.

And that statistical power really pushed things forward internationally.

By 1998, Europe established the European standard set, the ESS loci.

And in the US?

At the same time, the FBI set up the big one, CODIS, the Combined DNA Index System.

CODIS.

Everyone's heard of that.

It uses 13 core STR loci plus amelogenin for sex.

The source mentioned a match probability as low as 10 to the minus 15.

That's astronomical.

It's incredibly discriminating.

One in a quadrillion, roughly.

How do you even explain that level of certainty to someone like a jury?

It's hard to grasp.

It is abstract.

Which is why choosing those core loci was so critical.

The power comes from multiplying the probabilities across independent loci.

Independent being the key word.

Absolutely.

The loci had to meet tough criteria.

Yeah.

Highly variable between people, obviously.

Short amplicon lengths, as we discussed.

And crucially, they had to be unlinked.

Unlinked meaning they're inherited independently, like on different chromosomes.

Usually, yes.

Being located on different chromosomes is the easiest way to ensure they're unlinked.

Or if they're on the same chromosome, they need to be far enough apart that they essentially get shuffled independently during inheritance.

Like flipping separate coins?

The result of one doesn't affect the others.

That's the idea.

A match at one locus gives you statistical weight, and the next independent locus adds more, multiplying the certainty.
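
The multiplication across independent loci can be sketched as below. The per-locus genotype frequencies are hypothetical numbers chosen only to make the arithmetic visible, and the function name is an assumption for the example; real casework uses population frequency databases and more careful statistics.

```python
from math import prod


def random_match_probability(genotype_frequencies):
    """Multiply per-locus genotype frequencies together (the product rule).

    This is only valid when the loci are unlinked, i.e. inherited independently.
    """
    return prod(genotype_frequencies)


# Hypothetical genotype frequencies at three unlinked loci (illustrative only).
example_frequencies = [0.1, 0.05, 0.08]
rmp = random_match_probability(example_frequencies)
print(f"Random match probability: {rmp:.1e}")  # 4.0e-04
```

With only three loci the result is modest; every additional independent locus multiplies in another small fraction, which is how a 13-locus profile reaches such extreme values.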

Okay.

Let's trace the journey of a sample through the lab.

After extraction and checking how much DNA you have, the big step is multiplex PCR amplification.

Right.

This is where the profile gets generated.

You amplify all those different STR loci simultaneously.

The primers used have different fluorescent dyes attached.

Different colors for different loci.

Sort of, yeah.

Or different combinations.

Then comes electrophoresis.

The amplified fragments, the amplicons, are separated by size in a thin capillary tube.

Smaller ones move faster.

And the dyes.

A detector reads the fluorescence as the fragments pass by.

That data creates the electropherogram, basically a graph, showing peaks of fluorescent signal versus the size of the fragment.

And the height of those peaks tells you something, right?

Measured in RFU?

RFU, yeah.

Relative fluorescence units.

It's a measure of the signal intensity, which relates to how much of that specific DNA fragment was amplified.

Is there a minimum threshold?

Like, how do you know a peak is real signal and not just, you know, background noise?

There is.

Labs set an analytical threshold.

A common one, often recommended by kit manufacturers, is around 150 RFU.

Peaks below that might be ignored.

But can you have too much signal?

Oh, definitely.

If the peaks are too high, say, above 6000 RFU, the detector gets saturated.

That can cause artifacts, like seeing signal bleed into other color channels, which we call pull-up.
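
As a rough sketch of how those two limits bracket a usable peak, the snippet below sorts peak heights into three bins. The 150 and 6000 RFU values are the figures mentioned in the discussion; actual thresholds are set and validated by each lab, and the function and label names here are made up for illustration.

```python
ANALYTICAL_THRESHOLD_RFU = 150  # figure quoted in the discussion; lab-specific in practice
SATURATION_RFU = 6000           # figure quoted in the discussion; instrument-specific


def classify_peak(height_rfu: float) -> str:
    """Sort a peak into a coarse category by its height in RFU."""
    if height_rfu < ANALYTICAL_THRESHOLD_RFU:
        return "below analytical threshold"
    if height_rfu > SATURATION_RFU:
        return "possible saturation"
    return "callable"


for height in (80, 1200, 7500):
    print(height, "->", classify_peak(height))
```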

Okay.

So you've got these peaks on the graph.

How does the analyst assign the actual allele number, like allele 10?

They run a standard alongside the sample called an allelic ladder.

A ladder?

Yeah.

It's not a real DNA sample.

It's a manufactured mix containing synthetic DNA fragments that correspond to all the common alleles known for that specific STR locus.

Ah, like a ruler.

Exactly like a ruler.

The software sizes the peaks from the sample and compares them to the rungs on this allelic ladder to assign the correct allele number.

What if a peak doesn't line up with the ladder?

That would be designated an off-ladder allele.

It happens with rare variants.

Needs careful checking.
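
The ruler comparison can be sketched as a nearest-rung lookup. Everything concrete here is an assumption for illustration: the ladder sizes are invented, the ±0.5 bp window is a commonly used tolerance rather than something stated in this discussion, and `call_allele` is a hypothetical name.

```python
# Hypothetical ladder for one locus: allele name -> fragment size in base pairs.
EXAMPLE_LADDER = {"9": 175.0, "9.3": 178.0, "10": 179.0}


def call_allele(peak_size_bp: float, ladder: dict, tolerance_bp: float = 0.5) -> str:
    """Assign the nearest ladder allele, or 'OL' (off-ladder) if none is close enough."""
    nearest = min(ladder, key=lambda name: abs(ladder[name] - peak_size_bp))
    if abs(ladder[nearest] - peak_size_bp) <= tolerance_bp:
        return nearest
    return "OL"


print(call_allele(178.1, EXAMPLE_LADDER))  # 9.3
print(call_allele(176.4, EXAMPLE_LADDER))  # OL
```

Note how a one-base difference between 9.3 and 10 only resolves because the tolerance window is narrower than the gap between rungs.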

So the whole point of this process is to end up with one of three results.

Inclusion, it's a match.

Exclusion, it's not a match.

Or inconclusive.

That's the goal.

But it sounds like getting that clean result can be complicated by, well, reality.

Biological glitches and lab process artifacts.

Let's talk about the biological ones first.

Right.

You've got errors that come from nature and errors that come from the process.

On the biological side, a key one is germline mutations.

Mutations in the STRs themselves.

Yeah.

These are inheritable changes, usually gaining or losing just one repeat unit.

It's not super common.

Maybe happens about once in every 10,000 times an STR is passed from parent to child.

But it can complicate things, especially in kinship testing, like paternity cases.

Any other biological oddities?

Occasionally you see triallelic patterns.

That's three distinct peaks at one locus, where you'd normally expect one or two.

Three.

How does that happen?

It could be due to things like gene duplications or certain chromosomal abnormalities, like trisomy, where there's an extra copy of a chromosome segment carrying that locus.

It tells the analyst something unusual is going on genetically at that spot.

Okay.

And then there's the one that sounds particularly tricky.

The null or silent allele.

Ah, yes.

That's a real potential pitfall.

A null allele happens if there's a mutation, like a single base change, right in the flanking region where the PCR primer is supposed to bind.

So the primer can't attach.

Correct.

And if the primer can't bind, that allele doesn't get amplified at all.

It just disappears from the profile.

What does that look like on the results?

It makes a heterozygous person, someone who actually has two different alleles, look homozygous, like they only have one, because only one of their alleles amplified successfully.

It's a critical interpretation challenge.

Okay.

Shifting gears to the artifacts from the lab process itself.

Stutter seems like a big one.

Stuttering is probably the most common PCR artifact.

It happens when the polymerase enzyme kind of slips during copying, usually creating a smaller peak that's one repeat unit shorter than the main true allele peak.

So it's expected noise to some extent.

It is.

It's a known byproduct.

That's why labs have rules about it.

They look at the stutter ratio, the height or area of the stutter peak, compared to the main allele peak.

And there's a threshold.

Yeah.

A common rule of thumb is that the stutter peak should be less than, say, 15% or 0.15 of the height of the true allele peak.

What if it's higher than that?

If it's significantly higher, that's a red flag.

It might not be stutter at all.

It could indicate that there's actually DNA from a second person mixed in.
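
The stutter check amounts to one division and one comparison. In this sketch the 0.15 cutoff is the rule-of-thumb figure from the discussion, not a universal standard, and the function names and peak heights are hypothetical.

```python
def stutter_ratio(stutter_height_rfu: float, parent_height_rfu: float) -> float:
    """Height of the candidate stutter peak relative to its parent allele peak."""
    return stutter_height_rfu / parent_height_rfu


def exceeds_stutter_threshold(stutter_height_rfu: float,
                              parent_height_rfu: float,
                              threshold: float = 0.15) -> bool:
    """True when the smaller peak is too tall to be dismissed as ordinary stutter."""
    return stutter_ratio(stutter_height_rfu, parent_height_rfu) > threshold


# Hypothetical peak heights in RFU.
print(exceeds_stutter_threshold(120, 1500))  # False: ratio 0.08, consistent with stutter
print(exceeds_stutter_threshold(450, 1500))  # True: ratio 0.30, possible second contributor
```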

Ah, so it helps spot mixtures.

What other lab artifacts pop up?

Well, there's non-template adenylation, sometimes called the plus A peak.

The polymerase enzyme likes to add an extra A nucleotide onto the end of the fragment.

Making it one base pair longer.

Exactly.

Most modern forensic kits are designed to encourage this, actually, so that nearly all fragments have that extra A.

It makes the sizing more consistent.

Okay.

What about peak heights not being equal when someone is heterozygous?

Right.

That's heterozygote imbalance.

You have two different alleles at a locus, but one peak is significantly taller than the other.

Why does that happen?

Often it's due to preferential amplification.

Smaller DNA fragments tend to amplify more efficiently than larger ones.

So if someone has one short allele and one long allele at a locus, the shorter one might produce a much stronger signal.

And if that imbalance is really extreme?

The extreme version is allelic dropout.

This is where the imbalance is so severe that one of the alleles, usually the larger one, fails to amplify above the detection threshold altogether.

So again, a heterozygote looks like a homozygote.

Same problem as a null allele, but a different cause.

Precisely.

Both lead to the same potential misinterpretation if not carefully considered.

Okay.

Let's tackle the really tough samples labs face.

Starting with degraded DNA.

The molecules are broken into small pieces.

How do they get a profile from that?

Yeah.

Degradation is a huge challenge.

Those longer STR fragments just aren't intact anymore.

Larger alleles are more likely to drop out because the target DNA is too fragmented.

So what's the workaround?

The solution is to use miniSTRs.

These kits use primers that bind much closer to the core repeat region.

So the whole piece you need to amplify is smaller.

Exactly.

By shrinking the overall amplicon size, you increase the chance of successfully amplifying it, even if the DNA is badly broken down.

You might still only get a partial profile, but it's better than nothing.

Makes sense.

Then there's low copy number or LCN testing, tiny amounts of DNA.

All right.

We're talking less than a hundred picograms of starting DNA, maybe just a few cells.

To get a signal, you have to increase the number of PCR cycles, maybe from the standard 28 up to 34 cycles.

Boost the amplification.

Yeah.

But boosting it that much comes with risks.

It significantly increases the likelihood of all those artifacts we talked about.

Stutter becomes more pronounced, dropout is more likely, heterozygote imbalance gets worse.

And contamination.

And contamination becomes a huge issue.

You might accidentally amplify a stray bit of DNA that isn't from the sample, leading to allele drop-in.

So with all those increased risks, how can labs be confident in an LCN result?

Strict protocols.

The key is replication.

An LCN profile generally isn't considered reliable unless it can be reproduced in independent amplification reactions from the same original extract.
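
One way to picture the replication requirement is a consensus profile that keeps only alleles seen in more than one amplification. This is a sketch under assumptions: the `min_replicates=2` default is an illustrative choice, not a figure from this discussion, and the replicate data are invented.

```python
from collections import Counter


def consensus_alleles(replicates, min_replicates=2):
    """Keep alleles observed in at least `min_replicates` independent amplifications.

    `replicates` is a list of allele collections, one per amplification of the
    same extract, all for a single locus.
    """
    counts = Counter(allele for rep in replicates for allele in set(rep))
    return sorted(allele for allele, n in counts.items() if n >= min_replicates)


# Hypothetical results at one locus from three amplifications of the same extract.
replicates = [{"10", "12"}, {"10"}, {"10", "12", "14"}]
print(consensus_alleles(replicates))  # ['10', '12']
```

Allele 14 appears only once, so it is treated as possible drop-in, while allele 12 survives even though it dropped out of the second replicate.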

Okay.

Multiple checks.

Finally, the Mount Everest of interpretation.

Mixtures.

DNA from more than one person.

Very common, I imagine.

Extremely common, especially in sexual assault cases.

But also touch DNA, fingernail scrapings, lots of scenarios.

Interpreting mixtures is complex.

How do you even know you have a mixture?

There are telltale signs.

Seeing more than two peaks, more than two alleles, at multiple different STR loci is the clearest sign.

But also really severe heterozygote imbalance across several loci, or stutter peaks that are consistently above that typical 15% threshold we mentioned.

Those are strong indicators, too.

So once you suspect a mixture, what's the process?

It's methodical.

First, confirm it is a mixture.

Then try to determine the genotype possibilities present at each locus.

You have to figure out the minimum number of contributors.

Remember, each person can only contribute at most two alleles per locus.

Then you try to estimate the ratio of contribution.

Are the peak heights suggesting a major contributor and a minor one?

Or is it more balanced?

The sex marker, amelogenin, can sometimes help here if it's a male-female mixture.

After that, you consider all the possible genotype combinations that could explain the observed peaks.

And finally, you compare those possible combinations to the profiles from known individuals like the victim or suspects to see who could be included or excluded as a contributor.
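
Because each person contributes at most two alleles per locus, the locus with the most distinct alleles sets a lower bound on how many people are in the mixture. The sketch below shows that counting step only; the profile is invented and `min_contributors` is a hypothetical helper, not part of any guideline or software package.

```python
from math import ceil


def min_contributors(alleles_by_locus: dict) -> int:
    """Lower bound on contributors: ceil(max distinct alleles at any locus / 2)."""
    max_alleles = max(len(set(alleles)) for alleles in alleles_by_locus.values())
    return ceil(max_alleles / 2)


# Hypothetical mixture profile (locus -> alleles observed above threshold).
mixture = {
    "D5S818": ["10", "11", "12"],
    "TH01": ["6", "9.3"],
    "FGA": ["20", "22", "23", "24", "25"],
}
print(min_contributors(mixture))  # 3, because five alleles at FGA need at least three people
```

This is only a floor: shared alleles, dropout, and stutter can all hide additional contributors, which is why the later interpretation steps are still needed.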

It sounds incredibly challenging, requiring careful judgment.

It requires rigorous adherence to established guidelines set by groups like SWGDAM or ISFG.

It's definitely one of the most complex areas.

This has been a really insightful look behind the curtain at forensic DNA.

Let's quickly recap the main points for everyone.

Sure.

I'd say takeaway number one is that STRs really changed the game because they're small.

That allows multiplexing and makes them work even on degraded DNA, which was a huge leap.

Okay.

Number two.

Number two, getting an accurate profile depends on comparing the sample's peaks to that allelic ladder standard.

And analysts have to be constantly vigilant about interpretation errors from things like stutter or null alleles or dropout.

Right.

Avoiding the pitfalls.

And third, those really tough samples, the low amounts of DNA, LCN, the degraded stuff, and especially mixtures, they need special handling.

Techniques like miniSTRs help, but careful guideline-based interpretation, often involving replicate testing, is absolutely essential.

Fantastic summary.

Thank you for breaking down these molecular markers that are so fundamental to the justice system today.

It's clear that reliability comes from understanding and mitigating both the biological quirks and the chemical artifacts.

My pleasure.

It's fascinating stuff.

And to leave you with something to think about, we've talked about the incredible power of forensic DNA, achieving those astronomical odds against a random match by testing 13 or more independent loci.

But we also heard that a small mutation, a germline change, can happen at one of those loci roughly once every 10,000 times it's passed on.

So how does the forensic world constantly balance the immense statistical weight of a multi-locus profile against the undeniable biological reality that tiny errors and genetic surprises do occur?

Something to mull over.

Until next time on the Deep Dive.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Autosomal short tandem repeat profiling represents the gold standard technique for forensic identification and genetic comparison in modern laboratories. STR markers consist of tandemly repeated DNA sequences—typically 2 to 6 base pair units—flanked by distinctive sequences that provide the structural framework for individual variation. The discriminatory power of STR analysis depends on utilizing standardized locus sets such as CODIS, SGM Plus, and ESS systems, which target highly polymorphic regions including markers like TH01, VWA, FGA, and D3S1358 across multiple chromosomes to establish unique profiles. Capillary electrophoresis serves as the primary separation methodology, enabling forensic scientists to visualize amplified DNA fragments and assign allele counts based on peak positions and heights in an electropherogram. The interpretation process yields categorical conclusions of match, exclusion, or inconclusive results depending on whether profiles exhibit concordance at all tested loci. However, STR analysis confronts numerous complicating factors that challenge accurate interpretation. Genuine biological variations include mutations within repeat sequences themselves, point mutations affecting flanking regions, and chromosomal duplications that generate unexpected three-allele patterns in individuals. Technical artifacts frequently complicate analysis, with stuttering creating spurious minor peaks positioned one repeat unit below true alleles, nontemplate adenylation adding artificial adenine bases that generate false peaks, and heterozygote imbalance causing unequal peak heights between alleles of a heterozygous individual. Electrophoretic instrumentation can produce pull-up peaks from spectral bleed-through and electronic noise that must be distinguished from genuine allelic signals. 
Practical casework complications further challenge forensic practitioners, particularly when analyzing degraded DNA samples that yield incomplete profiles with missing loci, conducting low copy number testing where stochastic variation and random sampling effects reduce reliability, and interpreting mixed DNA profiles originating from multiple contributors where determining the number of contributors, their respective allele frequencies, and relative mixture proportions demands sophisticated statistical approaches and careful evaluation of alternative explanations.
