Chapter 22: Single Nucleotide Polymorphism Profiling

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

So today, we're going a bit beyond the usual forensic markers.

You know, the STRs, short tandem repeats, that always come up.

Instead, we're zooming in on

the most abundant type of variation in our DNA,

the single nucleotide polymorphism.

You probably know them as SNPs.

That's right.

And we're talking about tiny changes here.

Think of it like a single typo in the massive book that is your genome, maybe a T where there should be a C, or sometimes a base just gets added or deleted.

It all starts from spontaneous mutations.

A single letter difference.

And the scale of this is huge, isn't it?

Oh, absolutely massive.

The estimate is around 10 million SNPs across the human genome.

They are by far the most common type of genetic variation we see.

And something important is that most of them are biallelic.

Biallelic, meaning?

Meaning, for most SNP locations, there are typically only two possibilities, two alleles in the population.

Think of it like a coin flip, heads or tails.

It simplifies things in one way, but complicates them in others, as we'll see.

Okay, got it.

So our goal today for you listening is to really break down how forensic science has actually used these tiny, tiny differences for human ID.

We'll trace the tech from the very first kits that used them right up to the potential of things like next generation sequencing today, and how these little variations might even help investigators build a picture of someone.

Exactly.

From basic identification to potentially predicting physical traits.

Okay, so let's jump right in.

We know STRs are incredibly powerful.

They're the standard backbone of CODIS.

But SNPs, you said they're mostly biallelic, less polymorphic.

So why would forensic scientists even turn to them?

What's the big advantage here?

Yeah, it's a great question.

There are a few advantages, but the absolute key one, especially for the kind of evidence you often get in forensics, is the size of the DNA fragment you need to analyze.

With SNPs, the amplified bits, the amplicons, are usually just 50 to 100 base pairs long.

Okay, and compared to STRs.

STR analysis needs much longer, intact pieces of DNA.

But think about forensic samples.

They might be really old, degraded, exposed to the environment, basically shredded.

It's just much easier to find a short, intact 50 base pair piece in that mess than a longer 300 base pair piece needed for STRs.

Okay, so SNP profiling really shines when the DNA quality is poor.

Exactly.

It's tailor-made for those highly fragmented or degraded samples where STRs might just, well, fail completely.

Right, that makes perfect sense.

And you also mentioned their sheer abundance, 10 million to choose from, and a low mutation rate.

Yep, the low mutation rate is particularly useful for things like complex paternity testing, or maybe identifying remains through distant relatives.

And because they're just single point changes, the testing is very amenable to automation.

You can design assays to check many SNPs at once, high throughput style.

Okay, so great for bad samples, easy to automate.

But then why aren't they the main tool?

Why did STRs end up dominating the scene?

What are the downsides?

The main one goes back to that biallelic nature, the low polymorphism.

Each SNP just doesn't give you as much identifying power as a multi-allelic STR locus does.

Less bang for your buck, basically.

You could put it that way.

To give you an idea, you'd need to analyze somewhere between 50 and 60 different SNP loci just to get the same statistical power, the same ability to discriminate between people, as the standard 13 core STR loci used in CODIS.

Wow, 50 or 60 versus 13, that's a big difference.
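
The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every locus is biallelic, genotypes follow Hardy-Weinberg proportions, loci are independent (so the product rule applies), and the allele frequency of 0.5 is a made-up best case rather than a figure from the chapter. The function names are hypothetical.

```python
def locus_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic locus.

    Assumes Hardy-Weinberg proportions; p is the frequency of one allele.
    """
    q = 1.0 - p
    genotype_freqs = [p * p, 2 * p * q, q * q]  # AA, AB, BB
    # Two random people match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def panel_match_probability(allele_freqs) -> float:
    """Product rule across loci, assuming the loci are independent."""
    result = 1.0
    for p in allele_freqs:
        result *= locus_match_probability(p)
    return result


# Illustrative best case: both alleles equally common at every locus.
best_case = locus_match_probability(0.5)
panel_50 = panel_match_probability([0.5] * 50)
print(f"one locus: {best_case:.3f}, fifty loci: {panel_50:.2e}")
```

Even in this idealized case a single SNP leaves a match probability above one in three, which is why a panel needs dozens of loci before the combined figure becomes small.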

It is.

And that same low polymorphism makes it incredibly difficult to interpret DNA mixtures, samples with DNA from more than one person.

If the contributors share the common SNP alleles, which is likely, it gets very messy to pull them apart.

Okay, that's a major hurdle.

And the other one must be the databases, right?

Absolutely.

All the major national and international DNA databases, like CODIS here in the U.S., are built entirely on STR profiles.

So if you get an SNP profile from a crime scene and you don't have a suspect to compare it to directly, you can't search it against the database.

It's incompatible.

So no cold hits using SNPs alone.

That's a massive practical limitation.

A huge one.

It forces labs into a different workflow if they only have SNP data.

Okay.

But despite these limitations, SNPs actually have a really interesting history in forensics.

They were used in some of the earliest commercial DNA kits, weren't they?

They were indeed pioneers.

The very first locus targeted for this kind of typing was HLA-DQA1 on chromosome 6.

It's part of the immune system.

And a specific region within it was known to be quite variable, making a good starting point.

And the first commercial kit came out of that.

Yeah.

In the late 1980s, the DQα AmpliType kit. It could distinguish seven different alleles, leading to 28 possible genotype combinations.
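
The jump from seven alleles to 28 genotypes follows from counting unordered pairs of alleles, homozygotes included, which is n(n + 1)/2. A minimal check in Python, using placeholder allele labels rather than the kit's real allele names:

```python
from itertools import combinations_with_replacement

# Placeholder labels; the real allele designations are not given in the transcript.
alleles = [f"allele_{i}" for i in range(1, 8)]

# A genotype is an unordered pair of alleles, and the two may be identical.
genotypes = list(combinations_with_replacement(alleles, 2))

n = len(alleles)
print(len(genotypes), n * (n + 1) // 2)
```

Both numbers printed should agree: 7 homozygous genotypes plus 21 heterozygous ones.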

Which sounds okay, but how powerful was it really?

Not very, by today's standards.

The probability of two random people matching, the PM, was around 5 in 100, or 5 × 10⁻².

So pretty high chance of a random match.

Exactly.

It was useful mainly as a quick screening tool, mostly to exclude suspects rather than definitively link someone to a scene.

So the next logical step was to add more markers to boost that power.

That led to the Polymarker system.

Correct.

That came along in 1993.

The AmpliType PM kit, the Polymarker kit, added five more loci to the mix: LDLR, GYPA, HBGG, D7S8, and GC.

Some of these actually had three alleles, which helped a bit.

And adding those five loci must have improved the odds significantly.

It did.

It dropped that random match probability, the PM, down to about 1 in 10,000, or 10⁻⁴.

Much better.

And this was all PCR-based, right?

Which was a big deal compared to older methods.

Huge deal.

Compared to the older RFLP-based VNTR methods, which were complex and needed lots of DNA, this PCR-based Polymarker system was much more sensitive, needing only about two nanograms of DNA.

Plus, it could handle more degraded samples and was way faster and less labor intensive.

So it was a major step forward, even got accepted in courts, but ultimately it still got replaced.

Yeah, by the late 1990s, STRs took over.

Even with the improvements, the Polymarker system just didn't have the discrimination power of STRs.

And that difficulty with mixtures was still a major issue.

It became a kind of bridge technology.

Interesting.

Now, to really understand how these early kits like Polymarker worked, we need to dig into the technique they used, allele-specific oligonucleotide hybridization, or ASO.

Can you walk us through that?

Sure.

It relies on short synthetic DNA probes, the allele-specific oligonucleotides, each usually just 14 to 17 bases long.

The key is that under very specific lab conditions, these probes will only stick or hybridize to the target DNA sequence if the match is absolutely perfect.

So that single base difference, the SNP itself, is enough to prevent the probe from binding properly.

Exactly.

A single mismatch and the probe won't stick, or at least not stably under the test conditions.

It acts like a very precise lock and key system.
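
The lock and key idea can be mimicked with plain string matching. This is a toy model, not the kit's chemistry: it treats hybridization as all or nothing (a probe binds only if its exact reverse complement is present in the single-stranded target), whereas real binding depends on temperature, salt, and wash stringency. The 15-base probe sequences below are invented for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def hybridizes(probe: str, target_strand: str) -> bool:
    """Idealized stringent hybridization: bind only on a perfect match."""
    return reverse_complement(probe) in target_strand


# Two hypothetical probes that differ only at the central SNP position (C vs T).
probe_c = "GATTACACGTTAGCA"
probe_t = "GATTACATGTTAGCA"

# A made-up single-stranded amplicon from a person carrying the C allele.
target = "CCGG" + reverse_complement(probe_c) + "AATT"

print(hybridizes(probe_c, target))  # perfect match
print(hybridizes(probe_t, target))  # single mismatch at the SNP
```

Only the probe that matches the sample's allele at every position reports a hit, which is the behavior the strip assay depends on.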

And you mentioned these kits used a reverse blot format.

What does that mean?

Right.

Instead of putting the sample DNA onto the membrane and washing probes over it, they did the reverse.

They took the ASO probes, the little keys for each specific allele, and chemically attached them, immobilized them in specific spots on a solid strip, usually made of nylon.

Okay.

So the probes are fixed onto the strip.

Then how do you see which allele from the sample DNA stuck to which probe?

That's the detection part.

And it involves a few steps leading to a color change.

First, you amplify the regions of the sample DNA containing the SNPs using PCR.

But the primers you use for this PCR have a little chemical tag on one end, biotin.

Biotin, okay.

So all your amplified DNA fragments end up being biotinylated.

Then you denature this tagged DNA, make it single stranded, and wash it over the membrane strip that has all the different ASO probes stuck to it.

And the sample DNA will only stick to the probe spots where there's a perfect match.

Precisely.

Then to see where it's stuck, you add a detection system.

This usually involves streptavidin, a protein that binds incredibly tightly to biotin.

And this streptavidin is linked to an enzyme, often horseradish peroxidase or HRP.

So the streptavidin HRP complex latches onto any biotinylated DNA that's hybridized to a probe on the strip.

You got it.

Then the final step is adding a chemical substrate, something like TMB, tetramethylbenzidine.

It's normally colorless.

But if HRP is present at a spot, because the DNA is stuck there, the HRP enzyme catalyzes a reaction that turns the TMB into a visible blue precipitate.

A distinct blue dot appears right on the strip at the location of that specific probe.

A blue dot tells you the allele is present.

That's a really neat visual readout.

And I remember the source mentioning control dots, C dots, or S dots.

Yeah, those are crucial quality controls.

They're spots on the strip designed to always light up if enough DNA was amplified overall.

It helps ensure you don't mistakenly think an allele is missing just because the PCR amplification might have failed or been too weak.

You need that baseline check.

Makes sense.

You need to trust the negative results, too.

Okay, so that's the historical context.

Let's shift to now and the future.

Given that STRs handle most routine ID, where do SNPs fit into the modern forensic toolkit?

They're still really important, especially in a few key areas.

First, as we talked about, they remain vital for those really degraded samples where STRs just won't work.

That includes using SNP panels on autosomal DNA.

Also, mitochondrial DNA SNPs are very useful.

Instead of sequencing the entire mitochondrial genome, which takes time, analyzing specific mtDNA SNPs can be a faster way to identify remains, especially by comparing to maternal relatives.

Okay, so degraded samples and mtDNA, what else?

A big area now is using SNPs for forensic intelligence, generating leads when investigators have no suspect.

This is where ancestry informative markers, or AIMs, come in.

These are specific SNPs known to vary in frequency between different populations around the world.

Analyzing a panel of AIMs from a crime scene sample can give investigators clues about the likely biogeographical ancestry or ethnic origin of the contributor.

That could be a huge help in narrowing down suspect pools or focusing investigations.

Absolutely.

And then there's the really cutting edge stuff, forensic phenotyping.

Predicting physical appearance from DNA.

Exactly.

This focuses on what are called non-synonymous SNPs, or nsSNPs.

These are SNPs located within the coding parts of genes, the exons, and they actually change the amino acid that the gene codes for.

And changing the protein can change a physical trait.

Precisely.

For instance, certain SNPs in the MC1R gene are strongly linked to red hair, fair skin, and freckles.

Similarly, SNPs in another gene involved in pigmentation, the P gene, are associated with differences in eye color.

Wow.

So potentially, from a tiny DNA sample, you could tell investigators, we think the person has red hair and blue eyes.

That's the direction it's heading.

It's not perfect yet, but the potential for lead generation is enormous.

Incredible.

Are there other applications maybe in forensic medicine?

Yes.

Briefly, there's research into SNPs linked to certain health conditions that might be relevant in death investigations.

For example, SNPs in genes like KCNH2 or SCN5A might be useful in cases of sudden cardiac death, potentially indicating long QT syndrome.

And in toxicology, SNPs in genes that code for drug metabolizing enzymes like CYP2D6 could help explain why someone might have overdosed at a particular drug level: their genetics might make them metabolize the drug much slower or faster than average.

That's fascinating detail.

Okay, doing all this complex SNP analysis, especially phenotyping and ancestry, sounds like it needs serious technological power.

Which brings us to next generation sequencing, or NGS.

Right.

NGS is really where the field is moving for this kind of large scale SNP analysis.

It offers much higher throughput.

You can analyze vastly more markers simultaneously and potentially at a lower cost per marker compared to older methods like Sanger sequencing.

But there must be challenges for using it in forensics.

The main ones right now are, well, NGS technologies tend to have a slightly higher error rate than the gold standard Sanger sequencing.

That's being addressed, but it's a factor.

And perhaps more practically, forensic samples often yield tiny amounts of DNA, maybe just nanograms.

Whereas many NGS workflows traditionally prefer micrograms of starting material.

That's a thousand fold difference.

A huge gap.

How do labs bridge that?

How do you prepare a tiny forensic sample for NGS?

It involves a few key steps, often called library preparation.

First, the DNA you have is fragmented into smaller pieces.

Then special DNA sequences called adapters are attached or ligated to the ends of these fragments.

These adapters contain sites for universal primers so you can amplify everything later.

Okay.

Fragment, add adapters.

Sometimes these adapters also include unique molecular barcodes or index tags.

This allows you to pool samples from different cases together in one sequencing run and then computationally sort them out later, which improves efficiency and helps track samples.
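
The sorting step, usually called demultiplexing, can be sketched as follows. The layout is an assumption made for simplicity: each read is taken to begin with a fixed-length index that must match a known barcode exactly, whereas real instruments often read the index separately and tolerate a mismatch. The barcodes, sample names, and the `demultiplex` function are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group pooled reads by the sample their leading barcode belongs to."""
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode are set aside, not guessed at.
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Invented 4-base index tags for two pooled cases.
barcodes = {"ACGT": "case_A", "TGCA": "case_B"}
reads = ["ACGTGGATCC", "TGCATTAGGC", "ACGTCCATGG", "GGGGAAAAAA"]

bins = demultiplex(reads, barcodes, barcode_len=4)
for sample, inserts in bins.items():
    print(sample, inserts)
```

Keeping an explicit "undetermined" bin matters in casework, since a read that cannot be assigned should never be silently credited to a sample.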

Smart.

Then you need to amplify this prepared DNA.

Massively amplify it.

Because the sequencing machines need dense clusters of identical DNA molecules to get a detectable signal.

Two common ways are emulsion PCR, where DNA fragments attached to beads are amplified inside tiny oil droplets.

Or solid phase PCR, maybe more common now, using techniques like bridge amplification, where DNA fragments attached to a flow cell surface bend over and create bridges, which are then copied, forming dense clusters right there on the surface.

Millions of copies packed together.

And then the sequencing machine reads them.

Can you give us a simple idea of how that reading happens?

What's the chemistry?

Sure.

One well-known example is pyrosequencing, which is a type of sequencing by synthesis.

The machine tries adding one type of nucleotide, A, T, C, or G at a time, to the waiting DNA template strands in those clusters.

If that nucleotide is the correct one to be added next, it gets incorporated.

This incorporation releases pyrophosphate, or PPi.

This PPi is converted to ATP, which then fuels a reaction involving luciferase, the same enzyme fireflies use, which produces a flash of light.

So a flash of light means the correct base was added.

Exactly.

The machine detects the light flash, and knows which base was just incorporated.

It repeats this cycle, adding different nucleotides one by one, and records the sequence of flashes to read the DNA sequence.
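
That dispense-and-watch cycle can be imitated with a short simulation. It is a simplification under stated assumptions: the input is the strand being synthesized (the complement of the template), nucleotides are dispensed in a fixed A, T, C, G order chosen here for illustration, and the light signal is taken to be exactly proportional to the number of bases incorporated in one dispensation, so a run of identical bases gives one brighter flash. The function names are hypothetical.

```python
def simulate_pyrosequencing(new_strand: str, dispense_order: str = "ATCG"):
    """Return a pyrogram: (nucleotide dispensed, light signal) per dispensation."""
    if any(base not in dispense_order for base in new_strand):
        raise ValueError("strand contains a base that is never dispensed")
    position = 0
    pyrogram = []
    while position < len(new_strand):
        for base in dispense_order:
            incorporated = 0
            # A run of identical bases is incorporated in a single dispensation.
            while position < len(new_strand) and new_strand[position] == base:
                incorporated += 1
                position += 1
            pyrogram.append((base, incorporated))  # 0 means no flash
            if position == len(new_strand):
                break
    return pyrogram


def read_pyrogram(pyrogram) -> str:
    """Rebuild the synthesized sequence from the recorded flashes."""
    return "".join(base * signal for base, signal in pyrogram)


pyrogram = simulate_pyrosequencing("GGATC")
print(pyrogram)
print(read_pyrogram(pyrogram))
```

The dispensations that return a signal of zero are as informative as the flashes: they tell the machine which bases were not next in line.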

That's a really clear way to picture it.

Okay, you mentioned NGS has a higher error rate.

How do scientists ensure the final sequence is accurate, especially for forensic work?

This comes down to a critical concept, coverage, or sometimes called sequencing depth.

It's basically the average number of times each specific base in your target region gets sequenced independently in the run.

So you're reading the same spot over and over again.

Precisely.

Because errors can happen randomly during the sequencing chemistry.

If you only sequence a base once or twice, you can't be sure if a variation you see is real, or just an error.

But if you sequence that same base, say, 30 times, achieving 30x coverage, and 28 of those reads say it's a G, and maybe only 2 say it's an A, you can be very confident that the true base there is G.

You need that high coverage, often 10x to 30x or even more, for forensic SNP analysis to filter out noise and get a reliable result.
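
The 28-out-of-30 example translates directly into a small base-calling rule. This sketch assumes a single true base at the position, as in the example above; it does not handle a heterozygous SNP, where roughly half the reads would show each allele. The thresholds `min_depth` and `min_fraction` are illustrative values, not validated forensic settings.

```python
from collections import Counter


def call_base(observed_bases, min_depth=10, min_fraction=0.9):
    """Call the majority base at one position, or None if support is too weak."""
    depth = len(observed_bases)
    if depth < min_depth:
        return None  # too few reads to separate signal from random error
    base, count = Counter(observed_bases).most_common(1)[0]
    return base if count / depth >= min_fraction else None


# The example from the discussion: 30x coverage, 28 reads say G, 2 say A.
pileup = ["G"] * 28 + ["A"] * 2
print(call_base(pileup))

# Two reads alone are not enough to make any call.
print(call_base(["G", "G"]))
```

Returning None instead of a best guess keeps low-coverage positions visibly uncalled, which is the conservative choice for evidence.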

That makes complete sense.

Build confidence through repetition.

Wow.

Okay.

We've covered a lot of ground here today.

We started with defining SNPs, these tiny single base changes.

We looked at their history, how the Polymarker system used that clever ASO hybridization with the blue dots.

And now we've explored their modern power for degraded DNA, ancestry, even predicting appearance, all leading towards the high throughput capabilities of NGS.

It really shows how forensic DNA analysis is constantly evolving.

These SNPs, these minute variations, are opening up new ways to not just identify someone, but to generate really valuable investigative leads from potentially very challenging evidence.

The ability to maybe sketch someone's features or trace their ancestry from DNA alone is quite remarkable.

It really is.

And that brings us to a final thought for you, our listeners, to chew on as we wrap up this deep dive.

If the technology is getting so powerful that forensic science might soon routinely predict physical traits, ancestry, maybe even health predispositions from just a trace amount of DNA, what does that mean for us?

What are the ethical lines we need to consider regarding genetic privacy and the potential for this kind of detailed genomic surveillance in the context of justice and investigations?

Definitely something to think about as the science continues to advance.

Thanks for joining us on this exploration of forensic SNP analysis.

We appreciate you tuning in.

Catch you on the next deep dive.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Single nucleotide polymorphisms represent the most abundant form of genetic variation occurring naturally within human populations and have emerged as essential tools for forensic identification and investigative purposes. Early forensic SNP analysis relied on targeted approaches such as allele-specific oligonucleotide hybridization techniques focusing on specific loci like HLA-DQA1, with commercially developed systems including DQα AmpliType and Polymarker platforms that established foundational protocols for genetic profiling work. Over time, forensic practitioners expanded their analytical capabilities to incorporate autosomal SNPs, enabling greater discrimination power when distinguishing between individuals in criminal investigations. A major innovation in forensic genetics involves ancestry informative markers, which leverage population-specific allele frequency distributions to estimate the geographic origin of biological evidence, thereby generating valuable investigative leads about a person's likely ancestral background. Beyond ancestry estimation, forensic phenotyping has become a transformative application wherein SNPs in genes that influence observable traits, such as MC1R and P, allow analysts to make inferences about an individual's likely hair color, eye color, and other visible physical characteristics that may aid investigative direction. The SNaPshot assay represents a widely adopted detection methodology that employs primer extension chemistry to genotype multiple SNP loci simultaneously with efficiency and accuracy. Detection approaches have further evolved to include array-based platforms capable of analyzing hundreds or thousands of markers in a single assay run, providing scalable solutions for high-volume casework.
Next-generation sequencing technologies have fundamentally transformed SNP analysis workflows, requiring careful attention to multiple technical stages including initial DNA sample handling, library preparation, incorporation of fragment labeling systems, and implementation of sample indexing strategies that enable simultaneous processing of multiple evidence samples. Achieving and maintaining sufficient sequencing depth remains critical for ensuring reliable and consistent results across forensic case samples and appropriate reference populations used in comparative analysis.
