Chapter 21: Population Genetics and Hardy–Weinberg Principles

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to The Deep Dive, the show where we take a complex stack of sources and give you the distilled knowledge you need to be immediately informed.

Today we are undertaking a massive mission.

We're plunging into the mathematics and the mechanisms that drive evolution itself.

We're focusing on population genetics.

This feels like a really foundational deep dive.

I mean, for a long time, evolution was taught almost as a descriptive concept.

It was, and population genetics just transforms it into something measurable, something quantifiable.

We are completely shifting our perspective here, moving beyond the individual organism or the single cell.

Okay, so let's try to unpack this.

I think it helps to situate it.

Genetics is usually broken down into, what, four main areas?

That's right.

They sort of build on each other.

You start with transmission genetics.

That's your classic Mendel, right?

How traits pass from a parent to their immediate kids.

Then you zoom in to the subcellular level with molecular genetics.

You're looking at the structure of DNA, RNA, and proteins, and how genes are actually turned on and off inside the cell.

And then there's quantitative genetics, which is super relevant here.

It deals with those really complex traits like human height or say crop yield that are influenced by dozens or even hundreds of genes all interacting.

Yeah.

Tying all of those threads together, the thing that finally reconciled Mendel's neat, discrete rules with Darwin's slow, continuous change, that is population genetics.

And the crucial shift, you said, is the unit of study.

Exactly.

We stop looking at the individual.

We even stop looking at the family tree.

We start looking at the Mendelian population.

The Mendelian population.

That's just any group of individuals that are interbreeding, right?

They share reproductive ties.

They do.

And their collective genetic material, all of it together, is called the gene pool.

So it's the total collection of all the shared genes and all their different versions, their alleles, in that entire population.

Correct.

And that means the central mission, the whole point of population genetics is to understand how the genetic makeup of that shared pool changes over time.

Because that change, that is the definition of evolution.

That is, by definition, the process of evolution, a change in allele frequencies across generations.

So we're asking these huge questions like how much variety is there even in a population?

What processes?

Is it chance?

Is it migration?

Is it selection?

What controls that variety?

And how do populations diverge and become distinct from each other?

How does something as basic as their mating system shape the structure of their genes?

You can see the complexity just by looking at a single species.

The source has this great example of the Cuban tree snail.

Oh, it's a perfect illustration.

The range of shell colors and patterns is just astonishing.

That visible variation is the direct outcome of this massive underlying genetic structure that we can now decode with mathematics.

But this field, it didn't just appear fully formed.

It's rooted in what biologists call the neo-Darwinian synthesis.

It was a monumental unification happening mostly in the early 20th century.

You had Darwin, who gave us the mechanism, natural selection, but he had no idea how traits were inherited.

He was missing the engine.

And Mendel provided the engine, the discrete particulate nature of genes.

But at first, his work didn't seem compatible with Darwin's gradual evolution.

So the neo-Darwinian synthesis was the fusion of those two ideas.

Propelled by the mathematical work of giants like Sir Ronald Fisher, Sewall Wright, and J.B.S. Haldane, it truly established the rigorous, modern foundation of biology as we know it.

And to study any kind of change, you first need a baseline.

You need a state of no change to measure against.

You do.

And that benchmark, the fundamental theoretical tool for this entire discipline, is the Hardy-Weinberg law.

It's so elegant in its simplicity.

It's basically a mathematical model that sets up the null hypothesis.

The absolute null hypothesis.

What would happen to the genes in a population if no evolutionary forces were acting on it at all?

It lets us isolate the effects of evolution by seeing what happens when we start breaking its rules, one by one.

Okay, so before we get into the law itself, we have to start with the language of population genetics, which is all about quantification.

We're moving from just counting individuals to calculating frequencies.

And a frequency fundamentally is just a proportion.

It's a number between zero and one.

If you find, say, 150 people out of a group of 500 have a specific trait, the frequency is just 150 divided by 500, or 0.30.

We apply this to the gene pool at two levels, the genotypes and the alleles.

That's right.

Let's start with genotype frequencies.

How do you actually calculate those in the field?

Well, it's the most direct application of that idea.

You just count.

You count the number of individuals that have a specific genotype, and then you divide that by the total number of individuals you sampled, which we call N.

And the book has a great example here.

With the scarlet tiger moth, Panaxia dominula, researchers collected a sample, and the visible patterns on the moths corresponded to three distinct genotypes at one gene.

And in that specific sample, they collected a total of 497 moths.

Now, only two of those individuals had the genotype bb, 43 were heterozygous, so Bb, and the vast majority, 452 of them, were homozygous dominant, BB.

So if we run the calculation for those frequencies.

Okay, so the frequency of the rare one, f(bb), is 2 divided by 497, which is a very, very low 0.004.

Right.

And the heterozygous frequency, f(Bb), that's 43 over 497, or about 0.087.

And finally, the dominant one, f(BB), is 452 out of 497, which comes to 0.909.

And if you add those three up, 0.004, 0.087, and 0.909, they sum to 1.0, which means we've accounted for the entire population.
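
As a minimal sketch, the moth calculation can be written in a few lines of Python. The counts are the ones quoted in the discussion; the genotype labels BB, Bb, and bb and the function name are assumptions made for illustration.

```python
# Genotype counts from the scarlet tiger moth sample (N = 497).
# The labels BB / Bb / bb are an assumed notation for the three genotypes.
counts = {"BB": 452, "Bb": 43, "bb": 2}

def genotype_frequencies(counts):
    """Return each genotype's share of the sample: count / N."""
    n = sum(counts.values())
    return {genotype: c / n for genotype, c in counts.items()}

freqs = genotype_frequencies(counts)
for genotype, f in freqs.items():
    print(f"f({genotype}) = {f:.3f}")
# f(BB) = 0.909
# f(Bb) = 0.087
# f(bb) = 0.004
```

Because every individual falls into exactly one genotype class, the three frequencies always sum to 1.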

Perfect.

Now, here's the key question.

Why do population geneticists say that allele frequencies are fundamentally more important than these genotype frequencies we just calculated?

This is maybe the single most important concept to get right.

We can see and count genotypes, sure.

But the individual organism is just a temporary vehicle for its genes.

Just a temporary combination.

Exactly.

When that organism reproduces, its genotype gets broken apart.

It only passes on individual alleles through its sperm or eggs.

So it's the alleles, the A or the a, that have continuity over evolutionary time.

A population is only truly evolving if the frequency of those underlying alleles is changing.

So if the genotype frequencies shift around, but the underlying allele frequencies stay the same, that's not really evolution.

It's just a reshuffling.

That's a perfect way to put it.

So calculating these allele frequencies has to be done really carefully.

The best way is called direct gene counting.

It is.

You determine the total number of copies of a given allele in the entire population, and then you divide that by the total number of all alleles available at that locus.

And since we're mostly dealing with diploid organisms, with two alleles at each gene, the total number of alleles is just two times the population size, or 2N.

Let's use that 1,000-individual population example from the text.

We have 353 AA individuals, 494 Aa, and 153 aa.

So the total number of alleles is 2,000.

Right.

And to find the frequency of the A allele, which we'll call P, we just have to count up all the A copies.

So the AA individuals, all 353 of them, they each contribute two A copies.

So that's two times 353.

706.

And the Aa heterozygotes, all 494 of them, contribute one A copy each.

So that's another 494.

Okay, so 706 plus 494 gives us a total of 1,200 A alleles.

And P, the frequency of A, is that 1,200 divided by the total of 2,000 alleles, which equals 0.60.

And since there are only two alleles and their frequencies have to add up to one, if P is 0.6, then Q, the frequency of the a allele, must be 0.4.

It has to be.

But you could double check by counting.

Yeah.

You take two times 153 from the aa individuals plus the 494 from the heterozygotes, which gives you 800.

And 800 over 2,000 is, in fact, 0.40.

I know there's another way to do it, sort of a shortcut, using the genotype frequencies we already have.

You can, yeah.

You can find P by taking the frequency of the AA homozygote and adding half the frequency of the heterozygotes.

So P equals f(AA) plus one half of f(Aa).

In our example, that would be 0.353 plus half of 0.494, which, yep, it gives you 0.60.

It works mathematically, but I have to say, in real world data analysis, especially with huge data sets, you always want to use the direct gene counting method.

Relying on the frequency calculations can introduce compounding rounding errors.

It's just cleaner to count.
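
Here is a short Python sketch of both routes, using the 1,000-individual example. The function name and the A/a labels are assumptions for illustration; the counts come from the discussion.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Direct gene counting at one locus with two alleles in a diploid sample."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # 2N
    copies_A = 2 * n_AA + n_Aa                # homozygotes carry two copies
    copies_a = 2 * n_aa + n_Aa
    return copies_A / total_alleles, copies_a / total_alleles

# Route 1: count allele copies directly.
p, q = allele_frequencies(353, 494, 153)
print(p, q)  # 0.6 0.4

# Route 2: the shortcut, p = f(AA) + 0.5 * f(Aa).
n = 353 + 494 + 153
p_shortcut = 353 / n + 0.5 * (494 / n)
print(round(p_shortcut, 3))
```

Both routes agree here. Counting works from integers until the final division, which is one way to see why it is less exposed to rounding than a chain of pre-rounded genotype frequencies.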

Okay.

What if we step up the complexity?

What about a system with multiple alleles?

Three alleles, say A1, A2, and A3, with frequencies P, Q, and R.

The ABO blood type system is the classic human example of this.

The principle is identical.

It doesn't change at all.

You just count the contribution of each allele across all the possible genotypes.

So to find the frequency of A1, you'd count two copies from every A1A1 individual, plus one copy from every heterozygote that has an A1.

Right.

So the A1A2 and the A1A3 individuals, and you add all that up and divide by the total number of alleles, 2N.

And just like before, P plus Q plus R must sum to one.
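
The same counting rule extends to any number of alleles. The sketch below is one way to write it in Python; the genotype counts are hypothetical, invented only to exercise the function, and are not data from the book.

```python
from collections import Counter

def allele_frequencies_multi(genotype_counts):
    """Gene counting for any number of alleles at one diploid locus.

    genotype_counts maps a pair of allele names, e.g. ("A1", "A2"),
    to the number of individuals with that genotype.
    """
    copies = Counter()
    for (first, second), n_individuals in genotype_counts.items():
        # Each individual contributes one copy per allele slot, so a
        # homozygote adds two copies of the same allele.
        copies[first] += n_individuals
        copies[second] += n_individuals
    total = sum(copies.values())  # equals 2N
    return {allele: c / total for allele, c in copies.items()}

# Hypothetical counts for illustration only (N = 100).
example = {
    ("A1", "A1"): 10, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 30, ("A2", "A3"): 20, ("A3", "A3"): 10,
}
freqs = allele_frequencies_multi(example)
print(freqs)  # A1 = 0.25, A2 = 0.5, A3 = 0.25
```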

The book has a detailed example of this with an enzyme called PGM in milkweed beetles.

Yes.

In a sample of 274 beetles, they found six different genotypes from the combinations of three different alleles.

Once you run the counts across all 548 alleles in that sample, you get the frequencies.

P, for A1, is 0.135, Q, for A2, is 0.542, and R, for A3, is 0.323.

It's pretty amazing that we can characterize that level of hidden variation just by using these straightforward counting methods.

It is.

And it applies directly to clinical situations.

There's a study of human hemoglobin variants in Nigerian populations involving the A, S, and C alleles.

A huge sample too, over 3,000 people.

A massive sample.

And you have six possible genotypes to count.

AA, AS, AC, SS, SC, and CC.

The final result gave them a quantitative baseline.

The frequency of the normal allele was 0.831, the sickle cell S allele was 0.134, and the C variant was 0.035.

It shows that 83% of the alleles in that gene pool are the standard type, but these other variants persist at really significant measurable levels.

Exactly.

Okay, before we get to Hardy-Weinberg, we have to talk about X-linked genes because the math for counting alleles is fundamentally different.

Absolutely.

You can't just use 2N anymore.

Because females are XX, they're diploid for the X chromosome, and males are XY, or hemizygous, the total number of alleles has to be calculated as 2 times the number of females plus 1 times the number of males.

And crucially, the way the genotypes are distributed in the population is different too.

For females, if the population is in equilibrium, their genotypes still follow the standard P squared, 2PQ, and Q squared proportions.

But for males, it's simpler.

Since they only have one X chromosome, their genotype frequency is exactly the same as the allele frequency in the population.

If the frequency of a recessive allele is Q, then the frequency of males who show that recessive trait is just Q.

And that asymmetry has a huge, powerful consequence.

It means that recessive X-linked traits are dramatically more common in males than in females.

Look at X-linked red-green colorblindness in African American populations.

The frequency of the recessive allele Q is about 0.039, so the expected frequency of colorblind males is simply 3.9%.

But for a female to be colorblind, she needs two copies of that allele, so her expected frequency is Q squared.

Which is 0.039 squared, or about 0.0015.

Wow, so that's a difference between 3.9% in males and just 0.15% in females.

That's profound.

It is, and this simple adjustment to the calculation explains a massive pattern of inheritance we see across all human populations.

Why males disproportionately suffer from these recessive X-linked conditions.
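
To make the X-linked bookkeeping concrete, here is a small Python sketch. The function names and the sample counts in the first example are hypothetical; only the allele frequency of 0.039 comes from the discussion.

```python
def x_linked_allele_frequency(females, males):
    """Frequency q of allele a at an X-linked locus, by gene counting.

    females: counts for genotypes "AA", "Aa", "aa" (two X copies each)
    males:   counts for hemizygous genotypes "A", "a" (one X copy each)
    """
    n_females = sum(females.values())
    n_males = sum(males.values())
    total_alleles = 2 * n_females + n_males  # not 2N
    copies_a = 2 * females["aa"] + females["Aa"] + males["a"]
    return copies_a / total_alleles

def expected_affected(q):
    """Expected frequency of a recessive X-linked trait in each sex under HW."""
    return {"males": q, "females": q ** 2}

# Hypothetical sample, invented for illustration: 50 females and 50 males.
q_sample = x_linked_allele_frequency(
    {"AA": 40, "Aa": 8, "aa": 2}, {"A": 45, "a": 5}
)
print(round(q_sample, 3))  # 17 copies of a out of 150 alleles

# Colorblindness example from the discussion, q = 0.039.
expected = expected_affected(0.039)
print(f"males: {expected['males']:.4f}, females: {expected['females']:.4f}")
```

With q = 0.039, affected males are expected to outnumber affected females by a factor of 1/q, roughly 26 to 1.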

Okay, so now that we have the tools, we have P and we have Q, we can finally introduce the Hardy-Weinberg law.

Yes.

Developed independently back in 1908 by Godfrey H. Hardy, a British mathematician, and Wilhelm Weinberg, a German physician.

And the law is really structured in three parts.

The non-negotiable assumptions, and then the two critical results that flow from them.

Let's start with those ideal, almost impossible conditions.

What are the five core assumptions?

For the HW law to hold, a population must be, one, infinitely large.

Two, it has to be randomly mating, at least for the gene we're looking at.

Three, there can't be any new mutations.

Four, no migration in or out, no gene flow.

And five, absolutely no natural selection.

Those are five extremely strict conditions.

Now, if a population does meet all of them, what are the two results that make this law the null hypothesis of evolution?

Result number one.

The allele frequencies, P and Q, will not change from one generation to the next.

They stay perfectly constant.

And result two.

The genotypic frequencies will stabilize after just one single generation in these predictable proportions.

P squared for the homozygous dominant, 2PQ for the heterozygous, and Q squared for the homozygous recessive.

And if a population meets those results, it's in what we call Hardy-Weinberg equilibrium.

That's right.

It's not evolving.

Let's dig into those assumptions, because they're obviously not realistic.

An infinite population size, for instance.

Why can we still use this model for large populations?

Well, the infinite part is a mathematical abstraction.

It's a way to completely eliminate the role of chance or what we call sampling error.

Which is genetic drift.

Which is random genetic drift.

In a really large population, hundreds of thousands, millions, any random fluctuation in which gametes happen to make it to the next generation is irrelevant.

The effects of chance are so minimal that the HW model holds for all practical purposes.
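
A quick simulation can illustrate why population size matters. The sketch below assumes a simple binomial sampling model of drift (often called a Wright-Fisher model), which is not named in the source; the function names, sizes, and seed are arbitrary choices for illustration.

```python
import random

def next_generation_p(p, n_individuals, rng):
    """Draw 2N gametes at random from the gene pool; return the new frequency of A."""
    n_gametes = 2 * n_individuals
    copies_A = sum(1 for _ in range(n_gametes) if rng.random() < p)
    return copies_A / n_gametes

def mean_abs_change(p, n_individuals, replicates, rng):
    """Average one-generation |change in p| over independent replicate populations."""
    changes = [abs(next_generation_p(p, n_individuals, rng) - p)
               for _ in range(replicates)]
    return sum(changes) / replicates

rng = random.Random(1)  # fixed seed so the sketch is repeatable
small = mean_abs_change(0.5, 10, 200, rng)
large = mean_abs_change(0.5, 1000, 200, rng)
print(f"N=10: {small:.4f}   N=1000: {large:.4f}")
```

Under this model the typical one-generation change shrinks roughly with the square root of 2N, so the fluctuation at N = 1,000 should come out around a tenth of the fluctuation at N = 10.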

What about the random mating assumption?

This gets misunderstood, especially for humans.

It doesn't mean we just choose partners completely randomly for everything.

Precisely.

It only means mating has to be random with respect to the specific gene you're studying.

We know humans don't mate randomly for things like height or IQ.

But for a trait that's invisible, like your MN blood type, which has zero influence on who you choose as a partner, mating is effectively random for that gene.

So a human population could be in Hardy-Weinberg equilibrium for MN blood types, even if it's not for genes related to height.

Exactly.

The law is locus specific.

So fundamentally, the HW law is just showing what happens when inheritance is driven only by Mendelian segregation and the random combination of gametes.

That is its power.

You can imagine the entire gene pool as this huge bucket of gametes.

A proportion P of them carry allele A, and a proportion Q carry allele a.

Random mating is like reaching in and pulling out two gametes at random to make a new individual.

And that leads directly to the binomial expansion.

It does.

The probability of pulling an A and then another A is P times P, which is P squared.

The probability of pulling an a and then another a is Q times Q, or Q squared.

And for the heterozygotes, Aa, you can get it two ways.

You can pull an A, then an a, which is P times Q, or you can pull an a, then an A, which is Q times P.

You add those together and you get 2PQ.

So the total genotypic structure is the quantity P plus Q, squared, which equals P squared plus 2PQ plus Q squared.

And the math proves this structure is incredibly stable.

Even if a population starts way out of whack, say, it's 90% heterozygotes, just one single generation of random mating is enough to instantly snap the genotype frequencies into those HW proportions.

And they'll stay there forever, as long as none of the other forces are acting on them.

Guaranteed stability.

The algebraic proof, which can look a bit intimidating, just shows that if you sum up the outcomes of all possible matings, AA with AA, AA with Aa, and so on, each weighted by how often that pairing occurs under random mating, the next generation's P and Q are identical to what you started with.
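
The sum over all possible matings can also be checked numerically. This Python sketch enumerates every mother-by-father genotype pair, weights it by its frequency under random mating, and applies Mendelian segregation. The starting frequencies (5% AA, 90% Aa, 5% aa) are an assumed example of a population with 90% heterozygotes, and the names are illustrative.

```python
from itertools import product

# Probability that a parent of each genotype transmits allele A.
P_TRANSMIT_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}

def next_generation(freqs):
    """Offspring genotype frequencies after one round of random mating."""
    offspring = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for mother, father in product(freqs, repeat=2):
        pair_freq = freqs[mother] * freqs[father]  # random mating weight
        a_m = P_TRANSMIT_A[mother]
        a_f = P_TRANSMIT_A[father]
        offspring["AA"] += pair_freq * a_m * a_f
        offspring["Aa"] += pair_freq * (a_m * (1 - a_f) + (1 - a_m) * a_f)
        offspring["aa"] += pair_freq * (1 - a_m) * (1 - a_f)
    return offspring

def allele_freq_A(freqs):
    """p = f(AA) + 0.5 * f(Aa)."""
    return freqs["AA"] + 0.5 * freqs["Aa"]

gen0 = {"AA": 0.05, "Aa": 0.90, "aa": 0.05}  # far from HW proportions
gen1 = next_generation(gen0)
gen2 = next_generation(gen1)
for label, gen in (("gen0", gen0), ("gen1", gen1), ("gen2", gen2)):
    rounded = {g: round(f, 3) for g, f in gen.items()}
    print(label, rounded, "p =", round(allele_freq_A(gen), 3))
```

The allele frequency stays at 0.5 throughout, while the genotype frequencies move to 0.25, 0.50, 0.25 after one generation and then stop changing.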

This relationship between allele and genotype frequencies is so important.

If you picture it as a graph, you see that the maximum possible frequency of heterozygotes is 0.5.

It peaks right when the two allele frequencies are equal, when P is 0.5 and Q is 0.5, but the most profound insight, especially for human genetics, comes when one allele is rare.

Right.

When Q is very low, the homozygous recessive genotype, Q squared, becomes by far the rarest of the three.

And this translates immediately into a really powerful application, estimating how many people are carriers for rare recessive diseases.

It explains what you could call the rarity paradox.

Exactly.

When a recessive allele is rare, most of the copies of that allele aren't in the people who are sick.

They're hidden away, unexpressed, in the heterozygous carriers.

Let's use the albinism example.

For tyrosinase-negative albinism in North American whites, the frequency of the disease itself, f(aa), is incredibly low.

It's about 1 in 40,000.

Which is a frequency of 0.000025.

So if we assume this population is in HW equilibrium for this gene,

we can use that frequency of affected people, Q squared, to work backwards and find the allele frequency Q.

Right.

Q is just the square root of 0.000025, which gives us Q equals 0.005.

That's the frequency of the recessive albinism allele.

So P, the frequency of the normal allele, is 1 minus that, or 0.995.

But the real aha moment comes when you calculate the carrier frequency, which is 2PQ.

You calculate 2 times 0.995 times 0.005, and you get a carrier frequency of about 0.00995.

Which is roughly 1 in 100 people.

Think about that.

The disease itself affects 1 in 40,000.

It's almost invisible.

But the carriers, the people walking around with that hidden allele, are 400 times more common.

Wow.

That mathematical insight, which comes entirely from the HW model, is just...

It's foundational.

It reveals this huge hidden reservoir of genetic variation.
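
The carrier arithmetic is short enough to check in Python. This is a sketch that assumes Hardy-Weinberg proportions hold at the locus; the function name is illustrative and the 1 in 40,000 figure is the one quoted above.

```python
import math

def carrier_estimate(affected_frequency):
    """Estimate q, p, and the carrier frequency 2pq from q^2, assuming HW."""
    q = math.sqrt(affected_frequency)
    p = 1 - q
    return q, p, 2 * p * q

affected = 1 / 40_000
q, p, carriers = carrier_estimate(affected)
print(f"q = {q:.4f}, p = {p:.4f}, carriers = {carriers:.5f}")
# q = 0.0050, p = 0.9950, carriers = 0.00995
print(f"carriers per affected individual: {carriers / affected:.0f}")
# carriers per affected individual: 398
```

The ratio of carriers to affected individuals is 2p/q, so it grows as the allele gets rarer.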

So since the Hardy-Weinberg law gives us these exact predictable numbers for genotype proportions, the logical next step is to take real world data and test it against those predictions.

Right.

If what we observe in nature is significantly different from what HW expects, that's a huge sign that evolution is happening.

And to measure that deviation, we use the chi-square test, which is a standard statistical tool in genetics.

But there's a strict rule we have to follow when we set up this test, to avoid a kind of statistical cheating.

We have to avoid circular reasoning.

We need P and Q to calculate our expected numbers, but we absolutely cannot get Q by taking the square root of the frequency of the recessive individuals.

Because that assumes the population is already in HW equilibrium, which is the very thing we're trying to test.

Exactly.

So you must start by using the direct gene counting method on your observed numbers to find P and Q.

Okay.

Once we have P and Q from gene counting, we calculate the expected numbers for AA, Aa, and aa by multiplying the total population size N by those HW proportions, P squared, 2PQ, and Q squared.

And then it's just the standard calculation.

You sum up the observed minus expected squared divided by the expected for each category.

We can walk through the data from the red-backed vole.

They were looking at a protein called transferrin, which had three genotypes, MM, MJ, and JJ.

The total sample size N was 77 voles.

Okay.

So the observed counts were 12 MM, 53 MJ, and 12 JJ.

First step.

Find P, the frequency of the M allele.

So we calculate 2 times 12 from the MMs plus 53 from the MJs.

That's 24 plus 53, which is 77.

And we divide that by the total number of alleles, which is 2 times 77, or 154.

So 77 divided by 154 gives us P equals 0.50.

And if P is 0.5, then Q must also be 0.5.

Right.

So now we can calculate our expected counts.

The expected number of MMs is P squared times N, so 0.25 times 77, which is about 19.3.

The expected MJs would be 2PQ times N, so 0.5 times 77, or 38.5.

And the expected JJs is the same as the MMs, also 19.3.

So when we compare what we saw, 12, 53, 12, to what we expected, 19.3, 38.5, 19.3, using the chi-square formula, we get a final statistic of 10.98.

And this brings us to a crucial statistical point, calculating the degrees of freedom.

It's a bit different here.

It's not just the number of categories minus 1.

No.

We have to account for the fact that we estimated a parameter.

We estimated P from the data itself.

So we start with our three genotype classes.

We lose one degree of freedom for the total size constraint, and we lose a second one for estimating P.

So our degrees of freedom is 3 minus 2, which is just 1.

And a chi-square value of 10.98 with only one degree of freedom is…

That's highly significant.

The P value is way less than 0.01.

The conclusion is firm.

This red-backed vole population is not in Hardy-Weinberg equilibrium.

Something is going on.

And in this case, the number of observed heterozygotes, 53, is way higher than the expected 38.5.

Which suggests some force, maybe selection that favors heterozygotes, or some kind of non-random mating, is actively shaping this gene pool.
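
The whole vole test fits in a short Python sketch. The function name is illustrative, and the P value uses the closed form for a chi-square distribution with one degree of freedom, which is an addition not spelled out in the discussion. Because the code keeps the expected counts unrounded (19.25 rather than 19.3), the statistic comes out near 10.92 instead of the 10.98 quoted above; the conclusion is the same.

```python
import math

def hw_chi_square(n_MM, n_MJ, n_JJ):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions."""
    n = n_MM + n_MJ + n_JJ
    # Allele frequency by gene counting, never by the square root of a
    # genotype frequency, which would assume the thing being tested.
    p = (2 * n_MM + n_MJ) / (2 * n)
    q = 1 - p
    observed = [n_MM, n_MJ, n_JJ]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = 3 classes - 1 (fixed total) - 1 (p estimated from the data) = 1.
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, expected, chi2, p_value

p, expected, chi2, p_value = hw_chi_square(12, 53, 12)
print(f"p = {p:.2f}, expected = {[round(e, 2) for e in expected]}")
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```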

Okay, now let's flip the application around.

Let's look at the Hopi albinism example.

In this case, we have to use the HW model to estimate allele frequencies because we can't count them directly.

Right, you use this inverse method when the heterozygote looks exactly like the dominant homozygote.

We can't tell them apart.

And the Hopi tribe in Arizona has a really high rate of albinism.

The frequency of affected individuals, Q squared, is about 0.0043.

So assuming HW holds, we can estimate Q by taking the square root of that, which is about 0.065.

And that leads to a carrier frequency, 2PQ, of about 0.122, which means roughly 1 in 8 Hopi individuals carries the allele for albinism.

An incredibly useful estimate for understanding the genetic load in that community.

But I have a critical question here.

If we know that there are strong social factors that affect albino individuals in the Hopi community, like, historically they had a special ceremonial role, doesn't that suggest that mating might not be random?

How much can we trust that 1 in 8 number if a core assumption is possibly being violated?

That is an excellent, excellent point.

You hit on the major limitation of this method.

That estimate is only as good as the assumption it's built on.

We use it in the Hopi case because it's the best estimate we can get.

But you are absolutely right.

If non-random mating or selection were strongly at work, that 2PQ value could be off.

We have to acknowledge we're making a calculation based on an assumption that we can't then go back and test.

How do population geneticists look at variation across space?

We often look for what are called allele frequency clines.

A cline is just a gradual, systematic change in allele frequency across a geographic distance.

The classic case is the blue mussel along the east coast of the United States.

Yes, they were tracking the alleles for an enzyme called LAP.

And they found that the frequency of certain LAP alleles changes smoothly and systematically as you move from the cold waters in the north to the warmer waters in the south.

And that change is tightly correlated with the temperature change itself.

It is.

It's a strong suggestion of an adaptive response to a changing environment over a geographic space.

And what about through time?

Does variation stay constant from one generation to the next?

Rarely.

Especially not in small populations.

There is an example with the prairie vole and an esterase allele that shows these clear fluctuations in frequency generation after generation.

It highlights how dynamic evolution is in a seemingly stable population.

And finally, let's touch on how this variation is partitioned, especially in our own species.

If you look at all the genetic variation in humans, how is it distributed across the globe?

This is one of the most powerful and, I think, counterintuitive findings of modern population genetics.

Geneticists consistently find that only about 12 to 13 percent of all human genetic variation exists between different geographic populations.

Between, say, Europeans, Asians, and Africans.

Exactly.

Which means the remaining 87 to 88 percent of our total genetic variation is found within any one of those geographic populations.

So if you just sampled people from a single large population, you would capture almost 90 percent of all the genetic diversity found on Earth.

You would.

And this realization, which came first from those early protein studies and has been confirmed by genomics, has had profound implications for our traditional ideas about human classification and race.

It shows that the genetic differences among groups are actually quite small compared to the vast variation that exists within them.

It's important to remember that for the first half of the 20th century, the math was there, but the tools to actually see the variation weren't.

The dominant idea was the classical model.

Right.

The classical model basically said that most populations are highly uniform.

The idea was that the genome was mostly homozygous for one superior wild type allele with just a few rare bad mutant alleles floating around.

And natural selection was thought to just quickly purify the gene pool, getting rid of those bad mutations.

Leaving very little standing variation behind.

But then, in 1966, the technological revolution hit.

Lewontin and Hubby applied protein electrophoresis to population genetics.

And that was a complete game changer.

It really was.

Electrophoresis let them separate different versions of the same protein based on their electrical charge.

It was a way to visualize the genetic variation that was hidden in the amino acid sequences.

Suddenly, they could quickly and cheaply measure genotype and allele frequencies at dozens of genes at once.

What were the key metrics they used and what did they find?

They looked at two main things.

The proportion of polymorphic loci, or P, which is just the percentage of genes that actually have more than one allele, and heterozygosity, or H, the average proportion of individuals that are heterozygous across all the genes they looked at.

And their results just shattered that classical model.

Completely.

They found widespread variation far more than anyone expected.

In organisms like Drosophila, fruit flies, they found that P was often over 50% and H was over 10%.

So over half of all the genes they examined were variable, and individuals were heterozygous at more than 10% of their genes.

The gene pool was not uniform at all.

Not at all.

It was a complex, deep reservoir of extensive variation.
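
For a sense of how those two summary statistics are computed, here is a minimal Python sketch. The five-locus survey is hypothetical, invented only for illustration, and the function names are assumptions; P is taken as the share of loci with more than one allele, as described above.

```python
# Hypothetical survey of five loci, invented for illustration only.
# Each entry: locus name -> (number of alleles observed,
#                            fraction of sampled individuals heterozygous).
survey = {
    "locus1": (1, 0.00),
    "locus2": (2, 0.30),
    "locus3": (3, 0.20),
    "locus4": (1, 0.00),
    "locus5": (2, 0.10),
}

def proportion_polymorphic(survey):
    """P: share of surveyed loci with more than one allele."""
    polymorphic = sum(1 for n_alleles, _ in survey.values() if n_alleles > 1)
    return polymorphic / len(survey)

def mean_heterozygosity(survey):
    """H: heterozygote fraction averaged over all loci, variable or not."""
    return sum(het for _, het in survey.values()) / len(survey)

print(f"P = {proportion_polymorphic(survey):.2f}")  # P = 0.60
print(f"H = {mean_heterozygosity(survey):.2f}")     # H = 0.12
```

Note that H averages over every locus surveyed, including the invariant ones, which is why it is always lower than the heterozygosity at the polymorphic loci alone.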

So if the classical model was wrong, what was maintaining all this variation?

Was it all some form of selection that favored heterozygotes?

That was one idea called the balance hypothesis.

But in 1968, Motoo Kimura proposed something radically different.

The neutral theory of molecular evolution.

The neutral theory suggested that most of this protein variation they were seeing was actually neutral.

Exactly.

Kimura argued that the different forms of these proteins were often functionally equivalent.

They had no effect on the organism's fitness.

So the frequencies of these neutral alleles weren't being driven by selection at all.

They were just being shaped by the constant input of new mutations and the random loss or fixation caused by genetic drift.

It fundamentally shifted the focus from adaptation to just chance.

It did.

Now the modern consensus has moved beyond a simple choice between selection or neutrality.

It's more of a synthesis.

Right.

The modern view is that mutation, drift, migration, and selection are all constantly interacting.

The relative strength of each of those forces is what determines the level of variation you see at any particular gene.

We've since moved beyond proteins to measuring variation directly at the DNA level.

How did the early molecular methods do this?

One of the first powerful techniques was using restriction fragment length polymorphisms or RFLPs.

You use these special enzymes that cut DNA only at specific nucleotide sequences.

If a mutation changes one of those sites, the enzyme can't cut anymore.

And that results in a different pattern of DNA fragments, which you can see on a gel.

So you can literally see the genotypes.

If you look at five mice, you might see some are genotype 11, where the cut site is on both chromosomes.

Others might be 22, where it's absent on both.

And some would be 12, the heterozygotes.

And this allows for a more precise measure, nucleotide heterozygosity, or H-nuc.

Which is the proportion of nucleotide sites in the genome where an average person is heterozygous.

For the human genome, that number is pretty low, around 0.0008.

Which means an individual is likely to have different bases on their two homologous chromosomes at about one in every 1,000 positions.

That's right.

The ultimate tool, though, is DNA sequencing.

The study of the Adh gene in Drosophila by Martin Kreitman was a landmark moment.

It really was.

Kreitman sequenced 11 copies of this one gene and found 43 different sites that varied.

But the real breakthrough was how he categorized this variation into synonymous and non-synonymous changes.

Synonymous changes being the silent ones.

They don't change the amino acid sequence because of the wobble in the genetic code.

Right.

Often in that third position of the codon.

While non-synonymous changes are the ones that do change the amino acid, and therefore might change the protein's function.

And Kreitman found that the diversity was way, way higher at the synonymous sites.

Vastly higher.

And that difference is a powerful signature of natural selection at work.

Specifically, purifying selection.

Yes.

Random mutation should actually create about three times more non-synonymous changes than synonymous ones, just by chance.

But Kreitman found only a single non-synonymous polymorphism, compared to 13 synonymous ones.

So selection must be rapidly eliminating most of the mutations that change the protein, because they're probably bad for its function.

While the neutral silent mutations are just free to accumulate, governed only by mutation and drift.

It was definitive proof that, even if the neutral theory explains a lot of the variation we see, selection is constantly at work, purging the gene pool of functional mistakes.

This is where we could bridge into some more advanced concepts.

Genetic hitchhiking sounds technical, but it's really important for understanding how selection in one spot can affect a whole region of a chromosome.

Genetic hitchhiking happens when a neutral allele is physically located very close to a really advantageous mutation that's being strongly selected for.

So as that good mutation sweeps through the population towards fixation.

The neutral allele next to it gets dragged along for the ride, it hitchhikes to high frequency, even though it does nothing itself.

They're sort of stuck together.

They are.

And how stuck they are depends entirely on the local recombination rate.

If recombination is low between them, the hitchhiking effect is strong, and a selective sweep can wipe out variation across a large chunk of the chromosome.

If recombination is high, that link gets broken quickly and the effect is much more localized.

Recombination is like the eraser for this process.

It is.

And of course, now we have high throughput methods that let us catalog millions of these variable sites, mostly SNPs, single nucleotide polymorphisms.

Which fueled massive projects like the 1000 Genomes Project.

The goal there was to catalog all common human alleles, ones with a frequency above, say, one percent, in diverse human populations.

And that massive data set let us reconstruct human demographic history with incredible detail.

Confirming the out of Africa migration pattern and the bottlenecks that happened along the way.

And those findings about bottlenecks have been really illuminating.

The data showed that populations that went through strong bottlenecks during that migration, specifically the ancestors of modern European and Asian populations, now carry substantially more rare, slightly bad mutations than the ancestral African populations do.

Why would a bottleneck cause bad mutations to build up?

Because in a very small population, the power of selection gets weaker and the power of genetic drift gets much, much stronger.

A slightly bad mutation that would normally be weeded out can, just by random chance, increase in frequency or even become fixed before selection has a chance to act.

And just to round this out, we should mention DNA length polymorphisms, like insertions and deletions, especially short tandem repeats, or STRs.

Right.

STRs, or microsatellites, are these little sections of DNA where a short sequence, maybe two to six base pairs, is repeated over and over.

They mutate really rapidly, which makes them incredibly variable and therefore indispensable tools for things like forensic science, genetic mapping and conservation genetics.

So we established that the Hardy-Weinberg law is our non-evolving baseline.

Now we're going to systematically violate those five assumptions to understand the four forces that actually drive evolution.

And we start by violating no mutation.

This is force one.

Mutation, the ultimate raw material for all evolutionary change.

We define it by the rate of forward mutation, A to a, at rate u, and reverse mutation, a to A, at rate v.

Mutation provides all the novelty.

Every single advantageous allele, like the one that gave insects DDT resistance, started as a single random point mutation.

Without it, evolution has nothing to work with.

But if mutation is the only thing happening, how fast can it actually change allele frequencies?

The math shows that it is exceedingly slow.

If you have an allele with a frequency p of 0.9 and a typical mutation rate of, say, five times 10 to the minus five, the change in p in one generation is tiny.

It's a change of about negative 0.000043.

Which means it would take thousands and thousands of generations for mutation alone to significantly shift the allele frequency.

This tells us that mutation rarely determines the final frequency of an allele.

Its effects are almost always swamped by the much stronger pressures of selection or drift.

But it will eventually reach an equilibrium on its own.

It will.

The system stabilizes when the number of alleles mutating forward equals the number mutating back.

The equilibrium frequency for the recessive allele, q-hat, is just the forward rate divided by the sum of both rates.

q-hat equals u over (u plus v).
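
As a worked check on the mutation arithmetic, here is a minimal sketch. It assumes the standard one-generation model, delta p = vq - up, with u the A-to-a rate and v the a-to-A rate. The forward rate is the one quoted above; the reverse rate of 2 x 10^-5 is an assumption that is not stated in the discussion, chosen because it reproduces the quoted change of about -0.000043. Function names are illustrative.

```python
def delta_p_mutation(p, u, v):
    """One-generation change in the frequency p of allele A.

    u is the forward rate (A -> a); v is the reverse rate (a -> A).
    """
    q = 1.0 - p
    return v * q - u * p


def equilibrium_q(u, v):
    """Equilibrium frequency of allele a when mutation is the only force."""
    return u / (u + v)


u = 5e-5  # forward rate quoted in the discussion
v = 2e-5  # assumed reverse rate, not stated in the discussion

change = delta_p_mutation(0.9, u, v)
q_hat = equilibrium_q(u, v)

print(f"delta p per generation: {change:.6f}")
print(f"equilibrium q-hat:      {q_hat:.3f}")
```

With these rates, the change in p is zero at q-hat, which is what equilibrium means here.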

Okay, let's move to the second violation.

Getting rid of the infinite population size.

When populations are small, we get force two, random genetic drift.

Genetic drift is evolutionary change caused purely by sampling error.

It's just chance.

The book has this great analogy of a small island population.

Imagine 10 people on an island.

If five of them carry a recessive allele, say the frequency q is 0.5.

But if, purely by chance, a random typhoon kills half the population, the five survivors might all just happen to be the ones carrying only the A allele.

In that extreme case, the recessive allele goes from 50% frequency to zero instantly.

And it had nothing to do with selection.

Nothing at all.

It was just bad luck.

That's drift.

The smaller the population, the bigger the impact of these random sampling events.

This really highlights the importance of the effective population size, Ne.

Why can't we just count the number of breeding adults?

Because drift is driven by how many individuals are actually genetically contributing to the next generation.

And that number can be skewed by things like an unequal sex ratio.

The formula for that is Ne equals four times the number of females times the number of males, all divided by the number of females plus the number of males.

Right.

And if you have a population of, say, 70 breeding females, but only two breeding males.

You plug those numbers in, and you get an effective population size, Ne, of only about 7.8.

So even though there are 72 adults alive, the gene pool of the next generation is being funneled through this tiny bottleneck.

It's subject to the massive chance fluctuations you'd expect in a population of only eight individuals.

Those two males have a huge disproportionate influence, which amplifies drift.
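
The sex-ratio formula is easy to check by hand. Here is a minimal sketch, with an illustrative function name, that plugs in the 70 females and 2 males from the example:

```python
def effective_size(n_females, n_males):
    """Effective population size under an unequal breeding sex ratio."""
    return 4 * n_females * n_males / (n_females + n_males)


ne = effective_size(70, 2)
print(f"Ne with 70 females and 2 males: {ne:.1f}")

# For comparison: the same 72 adults split evenly between the sexes
print(f"Ne with 36 females and 36 males: {effective_size(36, 36):.1f}")
```

With an even sex ratio the effective size equals the head count of 72, so the entire reduction to about 7.8 comes from the skew toward two males.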

The classic experimental proof of drift comes from Peter Buri's experiment with Drosophila back in 1956.

Oh, it's a landmark study.

He set up 107 identical lines of flies, each with only 16 individuals, and he tracked an eye color allele that started at a frequency of 0.5 in every single line.

And what happened over the 19 generations?

They immediately started diverging like crazy.

Just by chance, some lines quickly went to a frequency of 1.0.

They fixed the allele.

Others quickly went to 0.0.

They lost it.

By the end of the experiment, more than half of his 107 populations had either lost or fixed the allele entirely.

So this confirms the three core effects of drift.

First, allele frequencies fluctuate randomly.

Second, drift leads to a loss of genetic variation within a population.

And third, because each small population is drifting on its own random path, it causes significant genetic divergence among populations.
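
The three effects just listed can be seen in a small simulation. This is a minimal sketch that assumes an idealized Wright-Fisher model: each generation, the 32 gene copies in a line of 16 diploid flies are drawn at random from the previous generation's allele frequency. It borrows the design numbers from the experiment (107 lines, 16 individuals, 19 generations, starting frequency 0.5), but it is not a reproduction of the actual data, and the seed and function name are arbitrary.

```python
import random


def simulate_drift(n_lines=107, n_individuals=16, generations=19,
                   p0=0.5, seed=1):
    """Return the final allele frequency in each independent line."""
    rng = random.Random(seed)
    copies = 2 * n_individuals  # diploid: two gene copies per individual
    final = []
    for _line in range(n_lines):
        p = p0
        for _gen in range(generations):
            # Sample the next generation's gene copies at random
            count = sum(1 for _copy in range(copies) if rng.random() < p)
            p = count / copies
        final.append(p)
    return final


final = simulate_drift()
fixed = sum(1 for p in final if p == 1.0)
lost = sum(1 for p in final if p == 0.0)
print(f"lines fixed: {fixed}, lines lost: {lost}, "
      f"still segregating: {len(final) - fixed - lost}")
```

Every line starts at the same frequency and experiences no selection, so any spread in the final frequencies is sampling error alone. Real fly populations typically drift faster than this idealized model because not all individuals breed equally.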

And bottlenecks and founder effects are just special cases where drift gets a massive sudden boost.

The founder effect is perfectly illustrated by the Tristan da Cunha Island community.

Yes, Tristan da Cunha was settled by just a few individuals in the early 1800s.

And even when the population grew, its genetic makeup was still heavily defined by that tiny founding group.

Generations later, the original founding couple still contributed 14% of all the genes on the island.

And the population also went through two major bottlenecks.

It did.

One when a bunch of people left and a second, tragic one, when a boating accident killed 15 adult males, which just crushed the effective population size.

Each event amplified drift, which is why rare genetic disorders that were present in the founders are now much more common there.

Another great human example is the Dunkers, a religious sect in Pennsylvania.

They emigrated from Germany and then for 200 years, they remained isolated and only married within their small community.

When their ABO blood group frequencies were studied, they were wildly different from both the general US population and their ancestral German population.

A perfect example of a founding event followed by generations of drift in a small isolated group.

Exactly.

So what happens when you balance mutation and drift?

This leads to a steady state equilibrium.

Mutation is constantly creating new alleles and drift is constantly getting rid of them by chance.

The resulting level of heterozygosity, H, is given by the formula H equals 4Neμ over 4Neμ plus 1.

Where Ne is the effective population size and μ is the mutation rate.

And that term, 4Neμ, is the key.

It shows that the amount of genetic variation you have is determined by the combined action of population size and mutation rate.

This elegant formula showed that you can maintain variation in a population even if it's completely neutral, just from this dynamic balance between its random creation and its random loss.
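
Here is a minimal sketch of that mutation-drift balance: heterozygosity equals theta over theta plus one, where theta is four times the effective population size times the mutation rate. The population sizes and the mutation rate below are made-up illustrative values, not figures from the source.

```python
def neutral_heterozygosity(ne, mu):
    """Expected heterozygosity when neutral mutation and drift balance."""
    theta = 4 * ne * mu
    return theta / (theta + 1)


mu = 1e-6  # hypothetical per-locus mutation rate
for ne in (1_000, 10_000, 100_000, 1_000_000):
    h = neutral_heterozygosity(ne, mu)
    print(f"Ne = {ne:>9,}: H = {h:.4f}")
```

Because only the product matters, doubling the population size has the same effect on H as doubling the mutation rate.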

Okay, on to our third violation of HW, no migration.

This is force three, migration or gene flow.

Right.

And gene flow is the movement of genes from one population to another.

It happens when individuals migrate and then successfully interbreed, adding their gametes to the local gene pool.

And gene flow has two main effects, right?

It does.

First, it can introduce new alleles, so it increases genetic variation within the recipient population.

But second, and this is crucial, migration is a powerful homogenizing force.

It makes populations more similar to each other.

Exactly.

By sharing alleles, it actively reduces the genetic divergence that drift or local selection might be creating between them.

The model for the change in allele frequency is pretty simple.

It is.

The change is driven by just two things.

The migration rate, m, the proportion of the population that are new migrants, and the difference in allele frequency between the two populations, p-x minus p-i.

The formula, delta p equals m times (p-x minus p-i), shows that if the allele frequencies are already the same, nothing changes.

If they're different, migration will constantly pull them closer together.
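
A minimal sketch of that one-way migration model follows. The names `p_source` and `p_recipient` are illustrative labels for the migrants' population and the population receiving them, and the numbers are made up. The source population is assumed to stay constant.

```python
def delta_p_migration(m, p_source, p_recipient):
    """One-generation change in the recipient population's allele frequency."""
    return m * (p_source - p_recipient)


m = 0.1            # hypothetical: 10% of the recipient population are migrants
p_source = 0.8     # hypothetical allele frequency among the migrants
p_recipient = 0.2  # hypothetical starting frequency in the recipient population

for generation in range(1, 11):
    p_recipient += delta_p_migration(m, p_source, p_recipient)
    print(f"generation {generation:2d}: p = {p_recipient:.4f}")

gap = p_source - p_recipient
```

Each generation closes a fraction m of the remaining gap, so the difference shrinks geometrically and the change is exactly zero once the two frequencies match.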

So if migration is the homogenizer, force four, natural selection is the engine of specific local adaptation.

Natural selection is the differential reproduction of genotypes.

And we quantify this using Darwinian fitness, which we call W.

It's just the relative reproductive success of one genotype compared to the others.

The most successful genotype is assigned a fitness of W equals one.

That's the benchmark.

And all other genotypes have a fitness somewhere between zero and one relative to that best one.

And the intensity of selection against a genotype is measured by the selection coefficient, S.

Which is just one minus W.

So if a genotype has a fitness of 0.6, the selection coefficient against it is 0.4.

The most famous, most dramatic case study of natural selection in action has to be industrial melanism in the peppered moth.

Oh, it's the textbook example of directional selection.

Before the Industrial Revolution, the typical form of the moth was this speckled grayish-white color, perfectly camouflaged on lichen-covered trees.

A rare dark mutant carbonaria was around, but at a very low frequency.

But then industrial pollution killed the lichens and darkened the trees with soot.

And suddenly, the tables were turned.

The typical form became incredibly conspicuous to bird predators, while the dark carbonaria form was now perfectly camouflaged.

H.B.D. Kettlewell did this brilliant series of mark-and-recapture experiments in the 1950s to prove it.

Describe that fieldwork.

It's such a great story.

He released known numbers of both light and dark moths into two different woods.

A heavily polluted, dark wood near Birmingham, and a clean, lichen-covered wood in Dorset.

In the polluted wood, he recaptured twice as many dark moths as light ones.

The birds were picking off the light ones.

And in the clean wood, the opposite was true.

Exactly.

It was the direct, quantitative proof that predation was the selective force driving the rapid increase of that dark allele in polluted areas.

Okay, so selection systematically changes allele frequencies.

We can calculate this change in q using a four-step process.

Right.

You list your initial frequencies, multiply by fitness, normalize the frequencies after selection, and then calculate the new allele frequency.

So let's look at directional selection against a recessive trait.

The fitness of the dominant homozygote, AA, and the heterozygote, Aa, is 1.

But the fitness of the affected recessive, aa, is 1 minus s.

And the formula for the change in q that comes out of that is delta q equals minus s p q squared, over 1 minus s q squared.

What does that equation tell us about how efficient selection is?

It reveals this really profound evolutionary constraint.

When the recessive allele a is common, selection against the aa individuals is very efficient, and the allele frequency drops fast.

But as Q gets smaller and smaller, as the disease becomes rare, the change in Q slows to a crawl.

Because selection can't see the allele anymore.

Exactly.

The vast majority of the a alleles are now hidden, protected inside the Aa heterozygotes, which have a fitness of 1.

They're invisible to selection, and that's why recessive genetic diseases persist at these low, stable frequencies, even when there's strong selection against the people who have them.
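
The four-step process and the shortcut formula can be checked against each other. This is a minimal sketch for selection against a recessive, with fitness 1 for AA and Aa and 1 - s for aa. The selection coefficient of 0.5 is a made-up value and the function names are illustrative.

```python
def q_after_selection(q, s):
    """Apply the four-step process for selection against a recessive."""
    p = 1.0 - q
    # Step 1: initial (Hardy-Weinberg) genotype frequencies
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    fitness = {"AA": 1.0, "Aa": 1.0, "aa": 1.0 - s}
    # Step 2: multiply each frequency by its fitness
    weighted = {g: freqs[g] * fitness[g] for g in freqs}
    # Step 3: normalize by the mean fitness
    mean_fitness = sum(weighted.values())
    after = {g: w / mean_fitness for g, w in weighted.items()}
    # Step 4: new frequency of a = all of aa plus half of Aa
    return after["aa"] + 0.5 * after["Aa"]


def delta_q_formula(q, s):
    """Shortcut: delta q = -s p q^2 / (1 - s q^2)."""
    p = 1.0 - q
    return -s * p * q * q / (1.0 - s * q * q)


s = 0.5  # hypothetical selection coefficient against aa
for q in (0.5, 0.1, 0.01):
    step_by_step = q_after_selection(q, s) - q
    print(f"q = {q:<5}: delta q = {step_by_step:.6f} "
          f"(formula: {delta_q_formula(q, s):.6f})")
```

The q squared term in the numerator is what makes the change collapse as the allele becomes rare: going from q = 0.5 to q = 0.01 shrinks the per-generation change by more than a thousandfold.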

This leads us right into the concept of balancing selection, or heterozygote superiority.

Balancing selection happens when the heterozygote has the highest fitness of all three genotypes.

And unlike directional selection, which gets rid of variation, balancing selection actively maintains two or more alleles in the population.

And the classic life-or-death example is sickle cell anemia in malarial regions.

It is.

The HbS allele, if you're homozygous for it, SS, causes lethal sickle cell anemia.

The HbA allele, if you're homozygous, AA, gives you normal hemoglobin, but no protection against malaria.

But the heterozygote AS is the sweet spot.

It is.

The AS heterozygote has the highest Darwinian fitness in an environment with malaria.

They have enough normal hemoglobin to not get sick, but their blood chemistry makes them resistant to the malaria parasite.

This advantage maintains the otherwise deadly S allele in the population at a high frequency.

So the two competing selective forces, against SS from anemia and against AA from malaria, balance each other out.

They do, creating a stable equilibrium.
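
To see that the equilibrium really is stable, here is a minimal sketch that iterates the same multiply-by-fitness-and-normalize procedure with the heterozygote as the fittest genotype. The fitness values are invented for illustration and are not measurements. The closed-form prediction s / (s + t), where s and t are the selection coefficients against AA and SS, is the standard textbook result and is not stated in the discussion.

```python
def next_q(q, w_AA, w_AS, w_SS):
    """Frequency of the S allele after one generation of selection."""
    p = 1.0 - q
    # Hardy-Weinberg genotype frequencies weighted by fitness
    aa = p * p * w_AA
    het = 2 * p * q * w_AS
    ss = q * q * w_SS
    mean_fitness = aa + het + ss
    # S alleles: all of SS plus half of AS, after normalizing
    return (ss + 0.5 * het) / mean_fitness


# Hypothetical fitnesses in a malarial environment
w_AA, w_AS, w_SS = 0.85, 1.0, 0.1

results = []
for start in (0.01, 0.9):
    q = start
    for _ in range(500):
        q = next_q(q, w_AA, w_AS, w_SS)
    results.append(q)
    print(f"start q = {start}: after 500 generations q = {q:.4f}")

s, t = 1.0 - w_AA, 1.0 - w_SS
predicted = s / (s + t)
print(f"predicted equilibrium s / (s + t) = {predicted:.4f}")
```

Whether the S allele starts rare or common, it is pulled to the same intermediate frequency, which is what distinguishes balancing selection from directional selection.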

We also have to consider the balance between mutation and selection.

This is another stable equilibrium, where the constant input of new bad alleles from mutation is perfectly matched by their removal through selection.

For a recessive deleterious allele, the equilibrium frequency, q-hat, is the square root of u over s.

And this dynamic is why dominant deleterious diseases are so much rarer than recessive ones.

Because a dominant allele is exposed to selection in both the heterozygote and the homozygote.

Selection is much more efficient at purging it, so its equilibrium frequency is much, much lower.
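
A quick numerical comparison makes the gap concrete. This is a minimal sketch: the recessive case uses the square-root formula from the discussion, while the dominant case uses the common textbook approximation of u over s, which is an assumption here since the discussion does not give it. The mutation rate and selection coefficient are made-up values.

```python
import math


def recessive_equilibrium(u, s):
    """Mutation-selection balance for a recessive deleterious allele."""
    return math.sqrt(u / s)


def dominant_equilibrium(u, s):
    """Approximate balance for a dominant deleterious allele (assumed form)."""
    return u / s


u = 1e-6  # hypothetical mutation rate
s = 1.0   # hypothetical selection coefficient (lethal)

q_recessive = recessive_equilibrium(u, s)
q_dominant = dominant_equilibrium(u, s)
print(f"recessive: q-hat = {q_recessive:.6f}")
print(f"dominant:  q-hat = {q_dominant:.6f}")
print(f"ratio: {q_recessive / q_dominant:.0f}x")
```

With these inputs the recessive allele sits at an equilibrium frequency a thousand times higher than the dominant one, for the same mutation rate and the same strength of selection.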

Finally, let's revisit that random mating assumption and talk about nonrandom mating.

This affects how alleles are combined into genotypes, but it doesn't necessarily change the allele frequencies P and Q on its own.

The most powerful form of this is inbreeding, which is preferential mating between relatives.

And inbreeding has one major effect.

It dramatically increases homozygosity.

The most extreme example is self-fertilization in plants.

In a selfing population, the number of heterozygotes is cut in half, exactly every single generation, until the population is almost entirely homozygotes.

So inbreeding exposes all those hidden recessive alleles we were just talking about.

Exactly.

And this leads directly to inbreeding depression, a reduction in fitness that happens when all those bad recessive alleles, which were previously masked in heterozygotes, are suddenly exposed in homozygous form and feel the full force of natural selection.
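
The halving of heterozygotes under selfing follows directly from Mendelian ratios: a selfed Aa plant produces one quarter AA, one half Aa, and one quarter aa, while selfed homozygotes breed true. Here is a minimal sketch of that bookkeeping, starting from a hypothetical population at Hardy-Weinberg proportions with p = 0.5. The function name is illustrative.

```python
def self_one_generation(freq_AA, freq_Aa, freq_aa):
    """Genotype frequencies after one generation of complete selfing."""
    return (freq_AA + freq_Aa / 4,
            freq_Aa / 2,
            freq_aa + freq_Aa / 4)


genotypes = (0.25, 0.5, 0.25)  # hypothetical start: AA, Aa, aa
history = [genotypes]
for generation in range(1, 6):
    genotypes = self_one_generation(*genotypes)
    history.append(genotypes)
    freq_AA, freq_Aa, freq_aa = genotypes
    p = freq_AA + freq_Aa / 2
    print(f"generation {generation}: Aa = {freq_Aa:.5f}, p = {p:.2f}")
```

Heterozygosity drops by half each generation while the allele frequency p never moves, which is the sense in which nonrandom mating rearranges genotypes without changing allele frequencies.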

We've spent a lot of time breaking down the forces one by one, but in nature, they're all happening at once.

Mutation, drift, migration, selection, they're all interacting.

And they're often antagonistic.

Drift drives populations apart, but migration pulls them back together.

Directional selection removes variation, but balancing selection or mutation-selection balance maintains it.

It's a complex dynamic.

Let's add one more layer of complexity.

How crossing over affects all of this.

This brings us to linkage disequilibrium, or D.

Linkage disequilibrium happens when alleles at two different genes are found together on the same chromosome more often than you'd expect by chance.

They aren't segregating independently.

And that can be because they're physically close together.

Or it can be because of a recent demographic event like a bottleneck or a migration.

If one of the founding individuals just happen to carry, say, allele A1 and allele B2 on the same chromosome, those two alleles will be associated in the new population, at least for a while.

And how does that association break down over time?

Recombination.

It gets broken down by recombination.

The rate at which that disequilibrium D decays depends on the recombination rate R between the two genes.

If R is high, the association disappears in just a few generations.

But if R is low, meaning the genes are very tightly linked, that association can persist for a very long time.

Precisely.

It means that population history, like a recent selective sweep or a bottleneck, leaves a physical signature on the chromosome that takes time for recombination to erase.
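
The dependence on recombination rate can be made concrete with the standard decay model, in which disequilibrium shrinks by a factor of one minus r each generation under random mating. That formula is a textbook result and is assumed here, since the discussion does not state it. The starting value and recombination rates are illustrative.

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium remaining after some generations of random mating."""
    return d0 * (1.0 - r) ** generations


d0 = 0.25  # hypothetical starting disequilibrium
for r in (0.5, 0.1, 0.001):
    remaining = ld_after(d0, r, 10)
    print(f"r = {r:<6}: D after 10 generations = {remaining:.5f} "
          f"({remaining / d0:.1%} of the original)")
```

Unlinked genes (r = 0.5) lose nearly all association within ten generations, while tightly linked ones keep almost all of it, which is why old sweeps and bottlenecks stay visible longest in regions of low recombination.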

Moving to a very real-world application.

How is all of this indispensable for conservation biology?

For endangered species, population genetics provides the essential data.

Long-term survival depends on having genetic diversity.

So conservationists use these principles to manage small populations.

A huge focus is on maximizing the effective population size, Ne, to fight against the effects of genetic drift.

So when zoos manage captive breeding programs, they're using pedigrees to avoid inbreeding and maintain as much diversity as possible.

Yes.

Genetic data lets them quantify relationships, as they do with the Galapagos tortoises, to make sure their breeding policies are actually maximizing diversity and avoiding that inbreeding depression.

Habitat loss is the number one threat, of course, but these genetic tools are essential for crafting effective management policies.

And our final concept brings us to the ultimate outcome of all these forces acting over immense time.

Speciation.

Speciation is, fundamentally, the erection of barriers that stop gene flow between two populations.

Once gene flow is cut off, the two populations are free to evolve on their own separate paths, driven by drift and selection, until they eventually become distinct species.

And these barriers are either postzygotic or prezygotic.

Which ones tend to show up first?

Generally, postzygotic barriers, the ones that happen after fertilization and result in hybrids that are sterile or just don't survive, tend to arise first, just as a byproduct of random genetic incompatibilities that build up.

The source mentions Haldane's rule here.

Right.

Haldane's rule just notes that if you do see hybrid sterility or inviability, it usually affects the heterogametic sex.

So in humans, where males are XY, it would be the hybrid males that are sterile.

And how do these costly postzygotic barriers lead to the evolution of prezygotic ones?

Through a process called reinforcement.

If two populations occasionally hybridize and produce unfit offspring, that's a waste of reproductive effort.

So natural selection will strongly favor any individual in the parent populations that avoids those wasteful matings.

So selection reinforces the evolution of mechanisms that prevent fertilization in the first place.

Exactly.

Things like temporal isolation or behavioral isolation or mechanical isolation.

And at the finest level, you get gametic isolation.

Like in the abalone.

A perfect example.

Abalone just release their sperm and eggs into the water.

For fertilization to happen, a protein on the sperm has to perfectly fit a protein on the egg like a molecular key in a lock.

If the sperm of one species bumps into the egg of another, the key doesn't fit.

And fertilization is completely blocked.

A very elegant final barrier to gene flow.

This has been an incredibly detailed journey into the quantification of evolution.

Let's try to quickly synthesize the core principles.

Well, we started with the gene pool, which we quantify with allele frequencies P and Q.

And we established the Hardy-Weinberg law.

p squared plus 2pq plus q squared, as our essential null hypothesis.

The baseline of stasis against which all real evolution is measured.
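
As a reminder of what that baseline predicts, here is a minimal sketch that turns an allele frequency into the expected genotype frequencies. The value p = 0.7 is an arbitrary example and the function name is illustrative.

```python
def hardy_weinberg(p):
    """Expected frequencies of AA, Aa, and aa for allele frequency p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.7)
print(f"AA = {freq_AA:.2f}, Aa = {freq_Aa:.2f}, aa = {freq_aa:.2f}")
print(f"sum = {freq_AA + freq_Aa + freq_aa:.2f}")
```

Observed genotype counts that depart from these proportions are the signal that one of the assumptions has been violated.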

And then we systematically identified the four forces that drive that change.

Mutation, the slow but indispensable source of all new variation.

Genetic drift, the power of chance which dominates in small populations.

Migration, the homogenizing force.

And finally, natural selection, the adaptive force measured by fitness.

We saw how modern genomics from that first Drosophila gene study to the 1000 Genomes Project revealed this massive hidden world of variation.

And we found that the patterns we see today are a consequence of these complex balances, mutation balancing selection, or heterozygote superiority balancing fitness extremes like with sickle cell.

It really is a complex dynamic tapestry.

And that leads to a final thought for you to consider.

We talked about how human populations that went through recent bottlenecks like those migrating out of Africa accumulated a higher load of slightly deleterious mutations because drift was so strong.

Now selection is much more efficient in the massive modern populations we live in today.

So how might that historical accident, the genetic ghost of those ancient bottlenecks, continue to shape the rate and direction of natural selection in human populations today long after those migrations ended?

That is a profound challenge.

How the chance events of the distant past are still influencing our fitness right now.

Thank you for guiding us through this expert deep dive into population genetics.

And thank you for listening to the deep dive.

We'll be back next time with another essential topic, distilled just for you.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Population genetics merges Mendelian inheritance patterns with evolutionary biology to understand how genetic composition changes across groups of organisms rather than within individuals. The foundation rests on quantifying genetic structure through calculations of allele and genotype frequencies within defined populations and gene pools. The Hardy-Weinberg equilibrium principle provides a mathematical baseline for detecting evolutionary change by predicting that allele frequencies remain constant across generations when populations are infinitely large, randomly mate, and experience no mutation, migration, or natural selection. This equilibrium serves as a null hypothesis; deviations from predicted frequencies indicate that evolutionary forces are actively reshaping the genetic landscape. Measuring genetic variation in natural populations has evolved from historical techniques like protein electrophoresis to modern molecular approaches including DNA sequencing of SNPs and microsatellites, revealing far greater diversity than early geneticists anticipated. The neutral theory of molecular evolution explains how much variation persists without adaptive significance. Four primary forces alter allele frequencies and drive evolution: mutation introduces new genetic variants as the ultimate source of novelty; genetic drift causes random changes in allele frequencies particularly pronounced in small populations through mechanisms like founder effects and bottlenecks that can fix alleles independently of fitness; gene flow homogenizes genetic differences between populations by introducing alleles through migration; and natural selection, quantified through Darwinian fitness and selection coefficients, consistently favors advantageous variants. 
Classic examples illustrate these mechanisms: industrial melanism demonstrates directional selection responding to environmental change, while sickle-cell anemia exemplifies heterozygote advantage where heterozygotes have higher fitness than both homozygotes. Nonrandom mating patterns including assortative mating and inbreeding alter genotype frequencies and increase homozygosity without necessarily changing underlying allele frequencies. Linkage disequilibrium describes nonrandom associations between nearby genetic variants shaped by recombination rates. These population genetic principles extend to practical applications in conservation biology and speciation, where reproductive isolating mechanisms including prezygotic and postzygotic barriers reduce gene flow and allow populations to diverge into distinct species.
