Chapter 10: The Nature of the Gene and the Genome

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Okay, let's unpack this.

We are embarking on a truly enormous deep dive today.

That's a big one.

It really is.

We're tracing the entire intellectual and really the physical history of the gene.

We're starting from an idea, really.

An abstract concept that started in a monastery garden.

And we're going to end up in the just unbelievably complex and dynamic landscape of the modern genome.

Right.

Our source material is chapter 10, the nature of the gene and the genome, and it is a beast of a chapter.

So our mission is to guide you, the listener, step by step through every major experiment, every molecular mechanism, and every historical turning point in here.

And this is more than just us reading the text.

This is a chronological journey.

We're moving through what?

Almost two centuries of scientific history?

At least.

And the goal is to make every single concept, from Mendel's little factors all the way to non-coding RNAs, perfectly clear and put it all in context.

And to ground this whole historical mission in something immediate, something with real world stakes, let's start where our chapter does.

It's a story that just shows how personal this whole quest has become.

It's called The Bea Project.

This story starts back in 2003 with the birth of a baby girl, Bea Rienhoff.

She seemed mostly healthy, but she had this very specific, unusual collection of physical traits.

Things like wide-set eyes, curled hands and feet, and poor muscle tone.

And her father, Hugh Rienhoff, was in this unique position to notice these things because he was a trained geneticist himself.

So for two years, two exhausting years, the family went to specialist after specialist.

They did every test available, but they just kept hitting dead ends.

Nothing matched her symptoms.

No diagnosis.

The frustration got so intense that Rienhoff finally just decided he had to solve the puzzle himself.

He literally converted a room in his home into a lab, a rudimentary genetics lab.

Incredible.

His first clinical suspicion, based on her symptoms, was a problem with the TGF-beta signaling pathway.

And that's a pathway that's just crucial for regulating growth and differentiation in all sorts of tissues.

So he started sequencing key genes in that pathway, comparing Bea's sequence to the standard reference genome.

But even with this really focused hunting, it was like looking for a needle in a genomic haystack.

He found nothing.

The breakthrough didn't come from luck, though.

It came from technology.

Right.

In 2009, Rienhoff started working with Illumina, a company that was testing these advanced sequencing methods.

And instead of trying to sequence the entire 3.2 billion base pairs of the genome...

The whole thing.

The whole thing, yeah.

They decided to focus on just the exome.

Which is the tiny part of the genome, what, about one and a half percent, that actually codes for proteins.

Exactly.

So by sequencing the exomes of Bea and her family, they just dramatically reduced the search space.
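To put rough numbers on that reduction, here is a minimal arithmetic sketch using only the two figures quoted in the discussion (3.2 billion base pairs, and an exome of about 1.5 percent). The variable names are illustrative, and the percentage is the approximate spoken figure, not a precise measurement.

```python
# Approximate figures quoted in the discussion
GENOME_BP = 3_200_000_000   # whole human genome, in base pairs
EXOME_PER_THOUSAND = 15     # "about one and a half percent", kept as an integer ratio

# Size of the protein-coding portion under these assumptions
exome_bp = GENOME_BP * EXOME_PER_THOUSAND // 1000

# How many times smaller the search space becomes
fold_reduction = GENOME_BP / exome_bp

print(f"Exome size: about {exome_bp:,} bp")
print(f"Search space shrinks roughly {fold_reduction:.0f}-fold")
```

Under these assumptions the search covers tens of millions of base pairs instead of billions, which is why a single base change became findable.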

And the results were, I mean, immediate.

Oh, absolutely.

The analysis found the smoking gun, a single point mutation, just one base pair change in the TGF-beta3 gene.

And it was only in Bea.

Then lab experiments confirmed it.

That one little mutation resulted in a non -functional protein.

It just crippled that crucial signaling pathway.

So they had the diagnosis and the molecular reason, but, you know, what did that mean for her future?

The prognosis was still a huge question mark.

The really critical piece, the final discovery, came after years of more searching.

Rienhoff managed to find a 68-year-old woman in Europe who had the exact same mutation.

And she was living a full, healthy life.

Could you imagine the relief?

That one woman's existence was this profound reassurance.

It's the perfect way to start this deep dive.

It really is, because that whole story encapsulates the entire chapter.

The quest to understand heredity, defining what a gene physically is, and then using this cutting-edge genomics to solve mysteries at the single base pair level.

And to understand how he got to sequencing an exome in 2009, we have to go way, way back.

150 years earlier, yeah.

To the fundamental question of just,

how does heredity even work?

10.1 The Concept of a Gene as a Unit of Inheritance

Okay, so if the modern genome is our destination,

our starting point is this.

This really abstract idea of a factor.

We have to go back to the 1860s, to Gregor Mendel and his monastery garden.

Mendel was just so meticulous.

I mean, his experimental design was brilliant.

He chose the garden pea because it had these very distinct, easily trackable traits.

Right.

Our chapter has Table 10.1, which lists them out.

There were seven of them.

Seven traits.

Things like seed color, seed shape, plant height.

And crucially, each of these traits only came in two alternate forms.

So for height, it was either tall or dwarf.

For seed color, yellow or green, no in-betweens.

And by systematically cross-breeding these plants, and this is the key, by rigorously counting the offspring generation after generation, he built the entire foundation for modern genetics.

It was that quantitative approach that let him propose these conclusions that just fundamentally changed everything.

Let's walk through them.

He had four key conclusions, which we can talk about in our modern terms.

Okay, so the first one.

He concluded that traits are governed by these discrete physical units, which we now call genes.

So not some kind of blended essence, but actual little packets of information.

Exactly.

Packets that get passed down unchanged.

Second, he figured out that organisms have two copies of each gene, one from each parent.

And the alternate forms of that gene, say the tall version and the dwarf version, we call those alleles.

And he noticed one allele could be dominant, meaning it completely masked the presence of the other, which he called the recessive allele.

Right.

His third conclusion is absolutely central to how this all works.

He proposed that the reproductive cells, the gametes...

Sperm and egg.

Sperm and egg.

They contain only one copy of a gene for each trait.

So the two copies that the parent has must physically separate or segregate when those gametes are formed.

And that's his famous law of segregation.

It's what explained the neat mathematical ratios he kept seeing.
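As a minimal sketch of how segregation produces those ratios, the snippet below enumerates every pairing of gametes from two hybrid parents. The allele letters ("T" for tall, "t" for dwarf), the uppercase-means-dominant convention, and the function names are illustrative assumptions, not notation taken from the chapter.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Enumerate offspring genotypes from a one-gene cross.

    Each parent is a two-letter string of alleles, e.g. "Tt".
    Each gamete carries exactly one of the two alleles (segregation).
    """
    offspring = Counter()
    for a, b in product(parent1, parent2):
        # Sort so that "tT" and "Tt" count as the same genotype
        genotype = "".join(sorted(a + b))
        offspring[genotype] += 1
    return offspring

def phenotype_counts(offspring, dominant="T"):
    """Collapse genotypes into phenotypes: one dominant allele is enough."""
    counts = Counter()
    for genotype, n in offspring.items():
        label = "dominant" if dominant in genotype else "recessive"
        counts[label] += n
    return counts

f2 = cross("Tt", "Tt")          # two hybrid (heterozygous) parents
ratio = phenotype_counts(f2)
print(dict(f2))                 # genotype counts: TT, Tt, tt
print(dict(ratio))              # phenotype counts
```

The four equally likely gamete pairings give one TT, two Tt, and one tt, so three of four offspring show the dominant trait: the 3:1 ratio.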

And his fourth big idea, the law of independent assortment, was a huge conceptual leap, even though we now know it has some exceptions.

Right.

This one stated that the segregation of alleles for one trait, like flower color, had absolutely no influence on the segregation of alleles for another trait, like plant height.

One was totally independent of the other.

They sorted randomly.
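Independent assortment can be sketched the same way, by letting each gene contribute one allele to a gamete regardless of the other gene. The gene letters ("T/t" for height, "P/p" for flower color) and the uppercase-means-dominant convention are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """All gametes a parent can make.

    genotype is a list of allele pairs, one per gene, e.g. ["Tt", "Pp"].
    Independent assortment: any allele of gene 1 can travel with
    any allele of gene 2.
    """
    return list(product(*genotype))

def dihybrid_phenotypes(parent1, parent2):
    """Count offspring phenotypes as (trait 1, trait 2) tuples."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        phenotype = tuple(
            "dominant" if (a.isupper() or b.isupper()) else "recessive"
            for a, b in zip(g1, g2)
        )
        counts[phenotype] += 1
    return counts

counts = dihybrid_phenotypes(["Tt", "Pp"], ["Tt", "Pp"])
for phenotype, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(phenotype, n)
```

Sixteen equally likely combinations fall into the 9:3:3:1 pattern, which only holds when the two genes really do sort independently.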

I mean, it's just a tremendous body of work, which makes the historical irony here so profound.

It's just incredible.

He presents his findings in 1866.

He publishes them in a journal and crickets.

The world completely ignored him.

His paper just sat on library shelves all over Europe for 34 years.

And it wasn't until 1900, 16 years after he died, that three different botanists, all working independently, came to the same conclusions.

And only then did they go back and rediscover Mendel's original paper.

And that was the moment.

That's when genetics as a field was truly born.

10.2 The Discovery of Chromosomes

So once Mendel's work was finally rediscovered, the next big question was obvious.

OK, we have these discrete factors, these genes, but what are they?

And where are they physically inside the cell?

Right.

And all eyes turn to the nucleus, because that had to be where the information was passed from cell to cell and from generation to generation.

And biologists in the 1880s, you know, with better microscopes and staining techniques, they were watching cell division very closely.

A German anatomist, Walther Flemming, made this crucial observation.

He noticed that when a cell divides, it seems to kind of randomly shuttle its cytoplasm to the two daughter cells.

OK.

But it goes to incredible, meticulous lengths to divide the contents of the nucleus equally and systematically.

So there's something special about the nucleus.

There's something very special.

He saw the nuclear material organizing itself into these visible dark threads during division, threads that came to be called chromosomes, which literally just means colored bodies.

Because they soaked up the dyes he was using.

That's it.

And this precise equal division strongly suggested that these threads, these chromosomes, were housing the hereditary material.

And that idea got a big boost from the work of Theodor Boveri.

He was using sea urchin eggs.

Right.

He did the famous polyspermy experiment.

Normally, one sperm fertilizes an egg.

Boveri figured out how to get two sperm to fertilize one egg.

And what happened?

Chaos.

The cell divisions were completely abnormal, and the embryo died very early on.

So what did that tell him?

He concluded that for normal development to happen, you don't just need a set of chromosomes.

You need a very specific combination of chromosomes.

This was the first real proof that chromosomes weren't all the same.

They had different or qualitative genetic properties.

The wrong mix was lethal.

And there was more evidence coming from the roundworm Ascaris.

Yes, which was a great organism to study because it only has four big, easy-to-see chromosomes.

So Edouard van Beneden noticed in 1883 that the worm's body cells, the somatic cells, had four chromosomes, but the gametes, the sperm and egg...

They only had two each.

So that ties right back to Mendel's segregation.

It does.

And it led August Weismann in 1887 to propose the necessity of a reduction division.

He reasoned that if gametes kept the full number of chromosomes, the count would double every single generation.

Four, then eight, then 16.

It would be unsustainable.

Exactly.

So he predicted there must be a special kind of cell division, which we now call meiosis, that halves the chromosome number before gametes are formed.

It was the physical mechanism that made Mendel's laws make sense.
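Weismann's reasoning is simple enough to sketch as a loop. This is a toy model under stated assumptions: the function name is hypothetical, every generation is one fertilization of two gametes, and the starting count of four matches the Ascaris figure quoted above.

```python
def chromosome_counts(start, generations, reduction_division):
    """Track the somatic chromosome number across generations.

    With a reduction division, each gamete carries half the somatic
    number; without one, each gamete carries the full number.
    """
    counts = [start]
    for _ in range(generations):
        somatic = counts[-1]
        gamete = somatic // 2 if reduction_division else somatic
        # Fertilization fuses two gametes into the next generation
        counts.append(gamete + gamete)
    return counts

without_meiosis = chromosome_counts(4, 3, reduction_division=False)
with_meiosis = chromosome_counts(4, 3, reduction_division=True)
print("No reduction division:", without_meiosis)
print("With reduction division:", with_meiosis)
```

Without the halving step the count doubles every generation; with it, the number stays constant.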

10.3 Chromosomes as the Carriers of Genetic Information

So by the turn of the 20th century, you've got Mendel's abstract factors on one hand and Flemming's visible chromosomes on the other.

And the person who definitively linked the two, who put it all together, was Walter Sutton in 1903.

He was a grad student studying grasshopper sperm cells.

A grad student.

Amazing.

Yeah.

His observation was the keystone.

He saw that the spermatogonia, the cells that make sperm, had 23 chromosomes.

But he noticed they existed as 11 identical-looking homologous pairs, plus one unpaired X chromosome.

And that clicked for him.

Instantly.

He realized this physical pairing of chromosomes perfectly mirrored Mendel's pairs of inheritable factors, the two alleles for every trait.

And he actually saw the mechanism for segregation.

He did.

He watched these cells begin meiosis and saw the homologous pairs physically join up into what he called a bivalent.

And then the meiotic division would pull those homologous chromosomes apart, sending them to different cells.

Just like Mendel's law of segregation predicted,

the abstract gene now had a physical home on the chromosome.

But Sutton was sharp, and he immediately saw a logical problem with this.

A contradiction.

A potential one.

He realized that if genes were arranged like beads on a string along a chromosome, then all the genes on that one chromosome should be inherited together as a single package.

A linkage group.

Exactly.

And that directly contradicts Mendel's law of independent assortment, which said traits sort randomly.

But wait, Mendel was right about his pea plants.

He was, but he was also incredibly lucky.

The seven traits he picked just happened to be on different chromosomes, or so far apart on the same chromosome that they acted independently.

Sutton's prediction of linkage was right, and it set the stage for the next big phase of research.

Which brings us to the fruit fly.

Drosophila melanogaster.

Thanks to Thomas Hunt Morgan, starting around 1909,

the fruit fly was the perfect model organism.

Quick generation time, small, easy to breed.

But his first problem was that he only had normal, wild-type flies.

He needed variation.

And he got it.

After observing thousands and thousands of flies, he found his first spontaneous mutant.

It was a single male fly with white eyes instead of the normal red.

And that was a monumental discovery.

It was the first concrete example of a mutation, the raw material for variation and evolution.

By 1915, his lab had found 85 different mutants.

And here's the critical observation, the one that confirms Sutton's prediction.

These 85 mutants didn't sort randomly.

They fell into exactly four linkage groups.

And Drosophila has... four pairs of homologous chromosomes.

The correlation was perfect and irrefutable.

Genes are on chromosomes, period.

But the story gets more complex, right?

Because they found the linkage wasn't perfect.

Not at all.

Even genes on the same chromosome didn't always stick together.

They'd see a small percentage of offspring that were recombinants, showing new combinations of traits that weren't in the parents.

They needed to explain this incomplete linkage.

And Morgan found the answer in the work of another scientist, F.A. Janssens.

Yes.

Janssens had observed that during meiosis, the homologous chromosomes physically wrap around each other.

Morgan proposed in 1911 that this physical wrapping led to the breakage and exchange of corresponding pieces between the maternal and paternal chromosomes.

He called it crossing over, or genetic recombination.

And that explained incomplete linkage perfectly.

The closer two genes were to each other on a chromosome, the less likely a crossover event would happen in that tiny space between them.

And the farther apart they were, the more often recombination would happen.

And this is where Morgan's undergraduate student, Alfred Sturtevant, had a stroke of genius.

Another student. I know.

He realized that this recombination frequency, the percentage of recombinant offspring, was constant for any pair of genes.

So he could use that frequency as a direct measure of the physical distance between the two genes.

So a higher frequency means they're farther apart.

Exactly.

And that let him build the first chromosomal maps, showing the relative positions, or loci, of genes.

If wing length and body color had a high recombination frequency, he mapped them far apart.

If eye color and body color had a low frequency, he mapped them close together.

That's brilliant.

It's the foundation of gene mapping.
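Sturtevant's logic can be sketched in a few lines: turn offspring counts into recombination frequencies, then place the pair with the highest frequency at the two ends of the map. The counts below are hypothetical numbers chosen for illustration, the three trait names simply echo the examples in the discussion, and the function names are assumptions.

```python
def recombination_frequency(recombinants, total):
    """Percentage of offspring that show a new combination of traits."""
    return 100.0 * recombinants / total

def order_three_genes(freqs):
    """Order three linked genes from their pairwise frequencies.

    freqs maps frozenset({gene_a, gene_b}) to a frequency in percent.
    The pair with the highest frequency is farthest apart, so the
    remaining gene must sit between them.
    """
    outer = max(freqs, key=freqs.get)
    genes = set().union(*freqs)
    middle = (genes - outer).pop()
    left, right = sorted(outer)
    return [left, middle, right]

# Hypothetical counts out of 1000 offspring per cross
freqs = {
    frozenset({"body", "eye"}): recombination_frequency(10, 1000),
    frozenset({"eye", "wing"}): recombination_frequency(320, 1000),
    frozenset({"body", "wing"}): recombination_frequency(327, 1000),
}

gene_order = order_three_genes(freqs)
print(gene_order)
```

In this toy data set the body and wing genes recombine most often, so they are mapped farthest apart, with the eye gene between them and close to the body gene. Real maps need more care, since measured frequencies level off for genes that are very far apart.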

Then in 1927, H.J. Muller discovered that X-rays dramatically sped up the mutation rate, giving geneticists a powerful new tool, but also highlighting the dangers of radiation.

And the final piece of visual proof came in 1933.

From Theophilus Painter.

He rediscovered these things called polytene chromosomes in the salivary glands of Drosophila larvae.

They are massive, a hundred times thicker than normal chromosomes.

Why are they so big?

Because the cells stop dividing but keep replicating their DNA.

So you get up to, like, a thousand strands of DNA all bundled together.

Wow.

And when you stain them, they show about 5,000 distinct bands.

Painter was able to match individual bands to specific gene locations,

and the order of the bands perfectly matched the genetic maps Sturtevant had made using recombination frequencies.

A stunning visual confirmation of the whole theory.

Absolutely.

They even saw these dynamic regions called chromosome puffs where the DNA was unspooled and being actively transcribed.

It was like watching genes in action.

The gap between the abstract gene and its physical place was finally closed.

10.4 The Chemical Nature of the Gene

Okay, so by the 1940s, we know where the gene is.

It's on the chromosome.

But the defining question became,

what is it made of chemically?

Right.

And chromosomes only have two main components, DNA and protein.

And for decades, I mean decades, protein was the overwhelming favorite to be the genetic material.

It seems so obvious to us now that it's DNA, but why was protein the front runner?

We'll get to that.

But the story of DNA actually starts way earlier, back in the 1860s and 70s with Friedrich Miescher.

Miescher was a chemist, and he wanted to figure out the chemical makeup of cell nuclei.

And he needed a good source of nuclei, so he used pus from discarded surgical bandages.

You're kidding.

Nope.

It was a great source of white blood cells, which are mostly nucleus.

From that, he isolated this large, acidic, phosphorus-rich stuff he called nuclein.

Later, he purified it from salmon sperm, and in 1889, it got renamed nucleic acid.

And even back then, people noticed that chromosomes stained just like Miescher's nuclein.

As early as 1884, a scientist named Oscar Hertwig proposed that nuclein was the stuff responsible for heredity.

So the idea was there, but it didn't catch on.

It hit a major, years-long roadblock, thanks mostly to a scientist named Phoebus Levene.

He made some important discoveries, like figuring out the sugar in DNA is deoxyribose, but he also championed this completely wrong idea called the tetranucleotide theory.

Okay, what was that?

The theory said that DNA was just a simple, boring, monotonous repeat of the four bases, like ATGC, ATGC, over and over again.

So if that were true, it couldn't hold any complex information.

It would be like a book written with only four words in a repeating pattern.

Exactly.

It lacked the complexity to be the genetic material.

Proteins, on the other hand, are made of 20 different and complex amino acids.

So proteins seem like the only candidate that could carry all that information.

So that theory just completely stalled research on DNA.

For a long time, yes.

But the first real clue that DNA was the stuff came from, of all places, bacteriology, from Frederick Griffith in the 1920s.

He was studying Streptococcus pneumoniae,

the bacterium that causes pneumonia, and it came in two forms.

Right.

There was the S strain, which was smooth and had a capsule, and it was virulent.

It killed mice.

And there was the R strain, which was rough, had no capsule, and was non-virulent, harmless.

And his experiment in 1928 was just.

It was simple genius.

It was.

If you injected mice with living R bacteria, they were fine.

If you injected them with heat-killed S bacteria, they were also fine.

But when he injected a mixture of the two, both harmless on their own...

The mice died.

And when he took bacteria from the dead mice, he found living virulent S cells.

So the harmless R bacteria had somehow transformed into the deadly S bacteria.

Exactly.

He concluded there must have been some kind of transforming principle that passed from the dead S cells to the living R cells and permanently changed their genetic identity.

But he didn't know what that principle was.

He didn't.

That took another decade of work by Oswald Avery and his team at the Rockefeller Institute.

They painstakingly purified this transforming principle.

And in their 1944 paper, what did they conclude?

They were cautious, but the evidence was overwhelming.

The active substance behaved chemically, just like DNA.

And crucially, it was only inactivated by enzymes that digest DNA, not by enzymes that digest protein.

Their conclusion was that the transforming principle was DNA.

So that should have been it, right?

Case closed.

But it wasn't.

No.

The scientific establishment was hugely skeptical.

For one, this was bacteriology and many geneticists didn't think it was relevant to higher organisms like flies or people.

And the protein bias was still really strong.

Incredibly strong.

Most people assumed Avery's DNA prep must have been contaminated with a tiny amount of protein and that the protein was the real transforming agent.

So the tide only really started to turn in 1950.

With Erwin Chargaff.

He's the one who finally definitively shattered the old tetranucleotide theory by showing that the base composition of DNA was different in every species.

It wasn't a simple repeat.

It was complex.

But the final absolute nail in the coffin for the protein theory came in 1952.

From Alfred Hershey and Martha Chase. They used bacteriophages, viruses that infect bacteria.

Right, the tiny syringes.

They inject their genetic material and leave their protein coat outside.

And their experiment was so elegant, they used radioisotopes to label the two components separately.

They used radioactive phosphorus, ³²P, to label the DNA because DNA has phosphorus but protein doesn't.

And they used radioactive sulfur, ³⁵S, to label the protein coat because protein has sulfur but DNA doesn't.

Then they let these labeled viruses infect bacteria.

And after a few minutes, they used a kitchen blender to shear off the empty viral coats from the outside of the cells.

A kitchen blender?

A Waring blender, yeah.

And the results were decisive.

When they used the protein-labeled viruses, the radioactive sulfur stayed outside with the empty coats.

But when they used the DNA-labeled viruses?

The radioactive phosphorus went inside the bacterial cell.

And, most importantly, it was passed on to the next generation of viruses.

That's it.

That's the smoking gun.

DNA, not protein, is the genetic material.

It removed all doubt.

DNA was finally accepted.

The Structure of DNA

Okay, so with DNA finally confirmed as the blueprint for life, the stage was set for the big one, figuring out its three -dimensional structure.

And that brings us to James Watson and Francis Crick at Cambridge in 1953.

Their job was really to synthesize all the existing data into a coherent model.

And what data did they have to work with?

They had a few key facts.

They knew DNA was a polymer of nucleotides, each made of a sugar, a phosphate, and a base.

They knew the bases were either pyrimidines, T and C, or purines, G and A.

They also knew each strand had a direction, a polarity, a five prime end, and a three prime end.

Right.

And they had the X-ray diffraction data from Rosalind Franklin and Maurice Wilkins, which was crucial.

It suggested a helical structure with very specific, regular dimensions.

And, of course, they had Chargaff's rules.

The most important clue of all, the amount of adenine always equals thymine, and the amount of guanine always equals cytosine.

A equals T, G equals C.

So what did the Watson-Crick model look like?

It's beautiful in its simplicity.

First, the double helix.

Two nucleotide chains spiraling right-handed around a central axis.

And a key feature is that they run in opposite directions.

They're antiparallel.

Exactly.

If one strand runs five prime to three prime, its partner runs three prime to five prime.

Structurally, the sugar phosphate backbone is on the outside of the helix.

Which makes perfect sense chemically.

The negatively charged phosphates can interact with the water in the cell.

The bases project inward, stacked on top of each other like a spiral staircase.

And that stacking provides a lot of stability.

It does.

But the real genius of the model was how it explained what holds the two strands together.

Hydrogen bonding.

Right.

Adenine on one strand always pairs with thymine on the other, using two hydrogen bonds.

And guanine always pairs with cytosine, using three hydrogen bonds.

This specific pairing instantly explains Chargaff's rules.

It also means the two strands are complementary.

If you know the sequence of one strand, you automatically know the sequence of the other.

Absolutely.

If one strand is 5'-GATACA-3', the other has to be 3'-CTATGT-5'.
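The pairing rules are mechanical enough to write down directly. The sketch below is illustrative only: the function names are assumptions, and it uses the short sequence GATACA as a worked example. It builds the partner strand, shows the same partner read in the conventional 5' to 3' direction, and counts hydrogen bonds using two per A-T pair and three per G-C pair.

```python
# Watson-Crick pairing: A with T, G with C
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Partner strand, base by base (so it reads 3' to 5')."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand):
    """The same partner strand written 5' to 3'."""
    return complement(strand)[::-1]

def hydrogen_bonds(strand):
    """Total hydrogen bonds holding this strand to its partner."""
    return sum(3 if base in "GC" else 2 for base in strand)

top = "GATACA"                       # read 5' to 3'
bottom = complement(top)             # read 3' to 5'
duplex = top + bottom                # both strands pooled together

print("5'-" + top + "-3'")
print("3'-" + bottom + "-5'")
print("Partner written 5' to 3':", reverse_complement(top))
print("Hydrogen bonds:", hydrogen_bonds(top))
print("A == T in duplex:", duplex.count("A") == duplex.count("T"))
print("G == C in duplex:", duplex.count("G") == duplex.count("C"))
```

Pooling both strands and counting bases shows why Chargaff's rules fall out of the model: every A has a T opposite it, and every G has a C.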

And the model also described the grooves on the outside of the helix.

Yes.

A wider major groove and a narrower minor groove.

And these are incredibly important because proteins that regulate genes can bind in these grooves and read the DNA sequence without having to pull the strands apart.

So this elegant structure immediately suggested how DNA could do its three main jobs.

Exactly.

First, storage of genetic information.

That's in the linear sequence of the bases.

Second, replication and inheritance.

The complementarity immediately suggests the mechanism.

You pull the strands apart and each one serves as a perfect template to build a new partner strand.

And the third function, expression of the genetic message.

How that sequence actually builds a protein.

That was the next great mystery for biology to solve.

DNA Supercoiling

So the double helix isn't just this static, rigid ladder.

It has to twist and contort itself to fit inside the cell and for its own function.

Right.

Which brings us to DNA supercoiling discovered by Jerome Vinograd in 1963.

Supercoiling is just the DNA twisting upon itself, which makes it much more compact.

Like twisting a rubber band until it coils up on itself.

That's a perfect analogy.

And this supercoiled state actually makes the DNA move faster in experiments like electrophoresis.

And the most important state for biology is negative supercoiling.

What does that mean?

It means the DNA is underwound.

It's twisted in the direction opposite to the natural winding of the helix.

This creates a kind of torsional strain and the molecule relieves that strain by twisting up into supercoils.

Most circular DNAs in nature are negatively supercoiled.

And why is that so important?

Because that underwinding, that strain, actively helps the two strands separate.

And you have to separate the strands for replication and transcription to happen.

It gives the process a head start.

So something has to control all this twisting and untwisting?

Enzymes do.

They're called topoisomerases and they were discovered by James Wang in 1971.

They're basically the genome's stress management system.

And they come in two main types.

Yes.

Type I topoisomerases are the more gentle ones.

They make a temporary break in just one strand of the DNA.

A little nick.

A little nick.

And that allows the other strand to rotate around it, which relaxes the supercoiling.

It's essential for preventing the DNA from getting tangled up ahead of the replication machinery.

And type II?

Type II topoisomerases are the heavy-duty machinery.

They make a transient break in both strands of the DNA.

A double strand break.

That sounds dangerous.

It would be, but it's very controlled.

It creates a gate, allowing another segment of DNA to pass right through the break before it's resealed.

And why would a cell need to do that?

It's absolutely essential for cell division.

After replication, you have two identical chromosomes that can be interlinked like two rings in a chain.

This is called catenation.

And the type II enzyme has to cut one ring to let the other one out.

Precisely.

To make sure each daughter cell gets a complete, separate chromosome.

And because these enzymes are so critical for rapidly dividing cells.

They're a major target for cancer drugs.

Drugs like etoposide work by trapping the enzyme after it has cut the DNA, preventing it from resealing the break.

This leads to massive DNA damage and triggers cell death, specifically in the fast-dividing cancer cells.

10.5 The Complexity of the Genome

Okay, let's zoom out now from a single molecule to the entire genome, all the genetic information in a species.

This is a massive jump in scale and complexity.

And it requires different tools to analyze.

One of the first was looking at its thermal properties.

Right.

DNA denaturation or melting.

If you heat DNA, the hydrogen bonds holding the two strands together break and the strands separate.

And you can watch this happen by measuring how much UV light the DNA solution absorbs.

Single stranded DNA absorbs more UV light than double stranded DNA.

And the melting temperature, or Tm, is the temperature where half the DNA is single stranded.

Exactly.

And a key finding is that the Tm is directly related to the DNA's composition, specifically its GC content.

Because GC pairs have three hydrogen bonds and AT pairs only have two.

Right.

So DNA with a higher GC content is more stable, has more bonds to break and therefore requires more heat to melt.

It has a higher Tm.
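To make the GC effect concrete, here is a minimal sketch that computes GC content and a rough melting estimate. The estimate uses the Wallace rule of thumb (2 degrees per A or T, 4 degrees per G or C), which is not from the chapter and is only a crude approximation for short oligonucleotides; the three sequences and the function names are made up for illustration.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

def wallace_tm(seq):
    """Rough melting estimate in degrees Celsius for a short oligo.

    Wallace rule of thumb: 2 degrees per A/T plus 4 degrees per G/C.
    An assumption used for illustration, not a value from the chapter.
    """
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc

# Three hypothetical 14-base sequences with increasing GC content
at_rich = "ATATTAATATTATA"
mixed = "ATGCATGCATGCAT"
gc_rich = "GCGCCGGCGCCGCG"

for name, seq in [("AT-rich", at_rich), ("mixed", mixed), ("GC-rich", gc_rich)]:
    print(f"{name}: GC = {gc_content(seq):.0%}, estimated Tm = {wallace_tm(seq)} C")
```

The ordering is the point rather than the exact numbers: the more G-C pairs, the more heat it takes to separate the strands.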

And what's really amazing is that if you cool it down slowly, the process reverses.

It does.

It's called renaturation or re-annealing.

The complementary single strands can actually find each other in solution and zip back up into a double helix.

And that discovery became the basis for nucleic acid hybridization, a foundational tool in molecular biology.

And when researchers looked at the kinetics of this re-annealing process, how fast it happened, they saw something really interesting.

Viral and bacterial DNA re-annealed in a simple, smooth curve.

As you'd expect for a genome of mostly unique genes.

But when they looked at eukaryotic DNA, the re-annealing curve had three distinct steps.

It was lumpy.

Which told them there had to be three different classes of DNA sequences in there.

Three Classes of Eukaryotic DNA

So the fastest re-annealing stuff, the first step on the curve, belongs to the highly repeated DNA sequences.

Making up about one to ten percent of our total DNA.

These are short sequences that are present in massive tandem clusters, one after another after another.

Because there are so many copies, they find a partner to re-anneal with almost instantly.

And there are a few types within this class.

Right.

First are the satellite DNAs.

These are short repeats, five to a few hundred base pairs that form these enormous arrays, sometimes millions of base pairs long.

And they're found mostly in the centromeres of our chromosomes, the constricted part in the middle.

Correct.

Then you have minisatellite DNAs.

They're a bit longer, ten to a hundred base pairs, and they're very unstable.

Their length tends to change from one generation to the next.

And that high variability, that polymorphism, is what makes them the basis for DNA fingerprinting.

Exactly.

The pattern is unique to each individual.

Finally, you have the shortest ones, the microsatellite DNAs or STRs.

They're just one to nine base pairs long, scattered all over the genome, and also highly unstable.

And we use their variability to trace human migration patterns and ancestry.

This instability in the repeats isn't just a useful tool, though.

It's also a direct cause of a whole class of human diseases.

Right.

The ones caused by dynamic mutations, where the number of these repeating units,

often trinucleotides like CAG or CGG,

expands dramatically from parent to child.

These diseases fall into two broad types.

Type I diseases are almost all neurodegenerative, and they're caused by a CAG repeat expansion that happens within the coding region of a gene.

The classic example being Huntington's disease.

Right.

In a normal person, the Huntington's gene has between 6 and 35 CAG repeats.

This codes for a stretch of the amino acid glutamine in the protein.

But if that number expands above about 35...

The resulting protein gets this long, sticky polyglutamine tract, and it acquires a new, toxic gain of function.

It misfolds, clumps together, and clogs up the neurons, eventually killing them.

And a really tragic feature of these diseases is genetic anticipation.

Yes.

The disease tends to get more severe and appear at an earlier age in successive generations, because the repeat number tends to keep expanding with each transmission.
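To make the repeat threshold concrete, here is a minimal Python sketch. It assumes toy sequences rather than the real gene, and the function names and the single cutoff of 35 are illustrative simplifications of the numbers mentioned above, not a clinical rule.

```python
import re

# Threshold taken from the discussion above: more than about 35 CAG repeats
# is treated as expanded. Real clinical cutoffs are more nuanced.
EXPANSION_THRESHOLD = 35


def longest_cag_run(sequence: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(sequence: str) -> str:
    """Label a sequence by comparing its longest CAG run to the threshold."""
    count = longest_cag_run(sequence)
    return "expanded" if count > EXPANSION_THRESHOLD else "normal range"


# Hypothetical toy alleles, not real gene sequences.
normal_allele = "ATG" + "CAG" * 20 + "CCGCCA"
expanded_allele = "ATG" + "CAG" * 45 + "CCGCCA"

print(longest_cag_run(normal_allele), classify(normal_allele))      # 20 normal range
print(longest_cag_run(expanded_allele), classify(expanded_allele))  # 45 expanded
```

Genetic anticipation would show up here as the repeat count climbing from one generation's allele to the next.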

And the second type of these diseases.

Type II diseases are different.

They're caused by trinucleotide expansions in the non-coding regions of a gene, and they typically cause a loss of function.

Like fragile X syndrome.

The most common cause of inherited intellectual disability.

In this case, you have a massive expansion of a CGG repeat in the 5' non-coding region of the FMR1 gene.

And this huge expansion somehow shuts the gene off.

It does.

The gene gets chemically modified and is completely silenced.

It's not transcribed or translated, so you lose the FMR1 protein, which is essential for normal brain development.

So we can actually see where these different sequences are located on the chromosomes.

We can, using a technique called in situ hybridization, which basically means hybridization in place.

The modern version is FISH, or fluorescence in situ hybridization.

You use fluorescently labeled probes that bind only to their complementary sequence on the chromosome.

And this is how we confirmed, visually, that satellite DNA is located exclusively at the centromeres.

And it's how we can see that a unique, non-repeated gene, like the lamin B2 gene, maps to a single specific spot.

OK, so after the highly repeated DNA, what's the next class?

The next chunk of the genome, the second step in the re-annealing curve, is the moderately repeated DNA sequences.

This is a huge portion, 20 to 80% of the genome.

And it includes some coding sequences, right, for things you need a lot of, like ribosomal RNAs and histones.

It does, but the vast majority of this fraction is non-coding, and these sequences are scattered or interspersed throughout the genome.

These are mostly the mobile elements we'll talk about in a minute, the SINEs and LINEs.

And that leaves the last class, the final, slow re-annealing portion of the DNA.

The non-repeated DNA sequences, or single-copy DNA. This is what we traditionally think of as genes.

They follow Mendelian inheritance, and they code for almost all of our proteins, except for histones.

And many of these exist in multigene families, like the globin genes or actin genes.

Right, related but distinct single-copy genes that arose from ancient duplication events.

Which brings us to the big, shocking reveal from the Human Genome Project.

The big reveal is that all of this, all the single-copy protein-coding DNA, accounts for less than 1.5% of the human genome.

One and a half percent.

The other 98.5% is non-coding.

And figuring out what all of that other stuff does is one of the biggest questions in biology today.

10.6 The stability of the genome.

We tend to think of DNA as this incredibly stable, unchanging molecule.

But the genome as a whole is actually incredibly dynamic.

Oh, it's constantly changing, both over short and long evolutionary timescales.

One of the most dramatic ways it can change is through whole genome duplication, or polyploidization.

This is where an error in cell division leads to an offspring getting a whole extra set of chromosomes.

So instead of being diploid with two copies of each chromosome, they might become tetraploid with four copies.

It's very common in flowering plants, things like wheat, bananas, coffee, they're all polyploids.

It can be a very rapid way to create a new species.

And the evolutionary advantage is that you suddenly have all this extra genetic material to work with.

Exactly.

You have redundant copies of every gene.

So while one copy has to maintain the original essential function, the extra copies are free to be lost, inactivated, or most interestingly, evolve new functions.

This is the idea behind the 2R hypothesis, right?

Susumu Ohno proposed that two rounds of whole genome duplication early in our history helped drive the evolution of complex vertebrates from simpler invertebrates.

And the genomic evidence seems to support this.

But whole genome duplication is pretty rare.

What's more common?

Gene duplication.

The duplication of just a small segment of a single chromosome.

And this usually happens because of a mistake during meiosis called unequal crossing over.

So the two homologous chromosomes don't line up perfectly.

Exactly.

They misalign.

So when they exchange genetic material, the exchange is unequal.

One chromosome ends up with a duplicated segment, and the other one ends up with a deletion.
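The misalignment just described can be sketched in a few lines of Python. This is a toy model under stated assumptions: chromosomes are plain lists of hypothetical gene labels, and the function name and breakpoint parameters are made up for illustration.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the segments to the right of each breakpoint.

    chrom_a and chrom_b are lists of gene labels. Equal breakpoints model a
    normal crossover; different breakpoints model a misaligned one.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two homologous chromosomes carrying the same hypothetical genes.
homolog_1 = ["A", "B", "C", "D"]
homolog_2 = ["A", "B", "C", "D"]

# Misalignment: break after gene C on one homolog but after gene B on the other.
duplicated, deleted = unequal_crossover(homolog_1, homolog_2, 3, 2)
print(duplicated)  # ['A', 'B', 'C', 'C', 'D']
print(deleted)     # ['A', 'B', 'D']
```

One product carries two copies of gene C and the other carries none, which is the duplication and deletion pair described above.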

And over evolutionary time, these duplicated genes can evolve into families.

That's the idea.

Most of the duplicates just accumulate mutations and become non-functional pseudogenes, which are like genetic fossils in our genome.

But every once in a while, a duplicate acquires a new, useful function, creating the multigene families we see today.

And the textbook example of this is the evolution of the globin genes.

The genes that make hemoglobin.

Right.

Through a series of ancient duplication events, an ancestral globin gene gave rise to the separate alpha-globin and beta-globin gene clusters we have now on different chromosomes.

And this allows for specialization.

We have different globin genes that are turned on at different stages of life, embryonic, fetal, and adult forms.

The fetal form, for instance, has a higher affinity for oxygen, which is essential for pulling oxygen across the placenta from the mother's blood.

It's a beautiful example of evolution creating complexity through duplication and divergence.

Okay, so that's duplication.

But the genome is dynamic in another, even stranger way.

The jumping genes.

Yes.

This is the story of Barbara McClintock.

She was studying maize, or corn, back in the 1940s, and she noticed that some mutations would spontaneously appear and then disappear in later generations.

It was bizarre.

And her explanation was that there were mobile genetic elements, genes that could physically move from one place in the genome to another.

She called it transposition by transposable elements.

And the scientific community, which believed the genome was a stable, static thing, completely ignored her.

For decades.

She finally got the Nobel Prize for it in 1983, after her discovery was confirmed in bacteria.

And now we know these transposons are central to understanding the genome.

Many of them encode an enzyme called transposase, which is what allows them to move.

And they move in two main ways.

The first is the DNA transposons, which use a cut and paste mechanism.

The transposase enzyme literally cuts the transposon out of one spot and pastes it into another.

The second, and much more common type in our genome, are the retrotransposons.

These use a copy and paste mechanism.

The retrotransposon's DNA is first transcribed into an RNA copy.

Then a special enzyme called reverse transcriptase uses that RNA as a template to make a new DNA copy.

And it's this new DNA copy that gets integrated somewhere else in the genome.

So the original copy stays put.

This is how they can multiply so effectively.

It's exactly why.
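The difference between the two mechanisms can be sketched as list operations in Python. This is only an illustration: the genome is a list of made-up labels, the function names are hypothetical, and the RNA intermediate and reverse transcription step are not modeled, only the net effect on copy number.

```python
def cut_and_paste(genome, element, target_index):
    """DNA transposon style: remove the element, then insert it elsewhere.

    Assumes the genome holds a single copy of the element.
    """
    remaining = [item for item in genome if item != element]
    return remaining[:target_index] + [element] + remaining[target_index:]


def copy_and_paste(genome, element, target_index):
    """Retrotransposon style: the original stays and a new copy is inserted."""
    return genome[:target_index] + [element] + genome[target_index:]


genome = ["gene1", "TE", "gene2", "gene3"]

moved = cut_and_paste(genome, "TE", 2)
copied = copy_and_paste(genome, "TE", 3)

print(moved, moved.count("TE"))    # ['gene1', 'gene2', 'TE', 'gene3'] 1
print(copied, copied.count("TE"))  # ['gene1', 'TE', 'gene2', 'TE', 'gene3'] 2
```

Repeating the copy step is what lets the element count grow, while the cut step only relocates it.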

And the numbers are staggering.

Something like two thirds of the human genome is derived from these transposable elements.

Most of them are dead, just molecular fossils.

But their legacy is everywhere.

And the two major families are the LINEs and SINEs.

LINEs are the long interspersed elements.

A few of them, like the L1 sequence, are still potentially active because they encode their own reverse transcriptase.

And SINEs are the short interspersed elements.

The most common one is the Alu sequence.

We have over a million copies of it.

And Alu elements are parasites on the LINEs.

They don't have their own machinery.

So they hijack the L1 reverse transcriptase to copy and paste themselves around the genome.

And we used to call this junk DNA.

We did.

But now we know that over evolutionary time, these transposons have contributed to our genome in all sorts of ways.

They've created new regulatory sequences, new protein domains, and may have even been the ancestors of essential genes like telomerase.

They're a major engine of genomic change.

Right.

So the development of sequencing technology was the real game changer that allowed us to move from studying single genes to entire genomes.

It happened so fast.

First prokaryote in '95, yeast in '96, and then the rough draft of the human genome in 2001.

The finished sequence in 2004, all 3.2 billion base pairs.

And when they finally had the whole sequence, the biggest shock was the gene count.

It was a huge shock.

The estimates had been anywhere from 50,000 to maybe 150,000 protein-coding genes.

And the actual number was?

Around 20,000, just 20,000 protein-coding genes.

To put that in perspective, that's roughly the same number as a tiny microscopic worm, C. elegans.

Exactly.

It just forced this massive shift in thinking.

The complexity of an organism like a human isn't about the number of parts, the number of genes.

It's about the complex regulatory software that controls those genes.

So if it's not the gene number, what is driving our complexity?

Well, there are three major factors.

The first one is alternative splicing.

This is where a single gene can produce multiple different proteins by splicing its messenger RNA in different ways.

Right.

It's like one gene can have a whole menu of protein options, and over 90% of our genes probably do this.

So while we only have 20,000 genes, we might have 100,000 or more different proteins.
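A small Python sketch shows how the menu of options multiplies. It assumes the simplest case, where each optional exon is independently kept or skipped. The exon names and the function are hypothetical, and real splicing has more modes than this.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """List every mRNA made by keeping or skipping each optional exon.

    exons is the ordered list of exon names; optional is the set of exons
    that may be skipped. Exon order is always preserved.
    """
    optional_in_order = [exon for exon in exons if exon in optional]
    isoforms = []
    for choices in product([True, False], repeat=len(optional_in_order)):
        kept = dict(zip(optional_in_order, choices))
        isoforms.append([exon for exon in exons if kept.get(exon, True)])
    return isoforms


# One hypothetical gene with four exons, two of which can be skipped.
isoforms = splice_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"})
for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms))  # 4
```

With n independently skippable exons this toy model gives 2 to the power n transcripts from one gene, which is how a fixed gene count can still yield a much larger protein count.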

Okay, so that's one factor.

What's the second?

The second is non-coding RNAs, ncRNAs.

Remember, only 1.5% of our genome codes for protein, but over 70% of it gets transcribed into RNA.

And a lot of that is regulatory RNA.

A huge amount.

Things like microRNAs, which are these tiny little RNAs that act like dimmer switches to fine-tune the expression of other genes.

And the number and complexity of these regulatory RNAs correlates much better with organismal complexity than gene number does.

And the third factor?

Systems biology.

The idea that complexity isn't about the parts themselves, but the interactions between them.

The complexity of the networks that our proteins and genes form.

Small increases in the number of nodes in a network can lead to an exponential increase in its overall complexity.

So if only 1.5% of the genome codes for protein, how on earth do we find the other important functional bits in that other 98.5%?

We use comparative genomics.

We compare our genome to the genomes of other species like mice or fish or chimps.

And the logic is that if a sequence is important for function, it should be conserved by evolution.

Precisely.

It's under purifying selection.

Natural selection weeds out changes in important sequences.

Non-functional DNA, on the other hand, is free to mutate and change rapidly.

It evolves neutrally.

So we can find the important stuff by looking for the sequences that have stayed the same across millions of years of evolution.

And when you do that, when you compare the human and mouse genomes, for example, you find that about 5% of our DNA is highly conserved.

5%.

But we already said that protein-coding and known regulatory regions only account for about 2%.

Exactly.

Which means there's another 3% of our genome that consists of these mysterious conserved non-coding regions.

And the logic is, if it's conserved, it must be important.

It must be doing something crucial.

We just don't know what it is yet.

It could be new types of regulatory RNAs or sequences involved in chromosome structure.

Finding the function of that conserved 3% is a huge frontier in biology right now.

All of this deep genomic understanding has led us directly into the age of genome engineering.

We hear a lot about tools like CRISPR-Cas9 for making small, precise edits.

But the field of synthetic biology has gone even further into building entire genomes from scratch.

And the landmark achievement here came from the J. Craig Venter Institute.

In 2010, yeah, they synthesized the first self-replicating synthetic bacterial cell.

It was an incredible feat of engineering.

They chemically synthesized the entire 1.2 million base pair genome of a bacterium in pieces, stitched it all together, and then transplanted that synthetic genome into an empty host cell.

And it booted up.

The cell came to life.

It booted up and started replicating.

It was a cell controlled entirely by a synthetic genome.

And after that, their next goal was to figure out the minimal genome.

What's the absolute smallest set of genes required for life?

Right.

And they didn't try to rationally design it from first principles.

They took a more empirical approach.

They used a large-scale mutagenesis technique to systematically knock out genes one by one to see which ones were absolutely essential for life in the lab.

And the result was a new synthetic bacterium called JCVI-syn3.0.

Created in 2016, it's a viable replicating bacterial cell that has only 473 genes.

That's about half the size of the original genome.

That's amazing.

But what's even more amazing is that of those 473 essential genes, 149 of them had completely unknown functions.

Wow.

So even in the absolute minimal, stripped-down version of life, almost a third of the essential components are a complete mystery to us.

It's incredibly humbling.

It just shows how much fundamental biology we still have to learn.

And this ambition isn't just for bacteria.

There's a huge international project, Sc2.0, that's working on synthesizing an optimized version of the entire yeast genome.

The goal for all of this is twofold.

To understand life at its most basic level and to engineer organisms to do useful things for us, for medicine and industry.

10.9 The genetic basis of being human.

OK, so if comparing genomes tells us what's conserved and what we share with other species, we have to look at the differences to understand what makes us uniquely human.

And our closest living relatives are chimpanzees.

We diverged from a common ancestor about five to seven million years ago.

And our genomes are about four percent different.

But that's not just four percent single letter changes.

No, a lot of that difference comes from large deletions and especially segmental duplications, where whole chunks of the genome have been copied.

So researchers have been hunting for genes that evolved particularly fast on the human lineage.

And they found hundreds of them.

And very often these fast-evolving genes are transcription factors, master regulators that control big networks of other genes.

So they're prime candidates for driving major evolutionary changes like the expansion of our brain.

The most famous example of one of these genes is FOXP2.

Right.

It was initially linked to human speech because there are two amino acid changes in the protein that are specific to humans compared to chimps.

But the story got more complicated.

A lot more complicated.

Once we got the genomes of our extinct relatives, Neanderthals and Denisovans, we saw that they had the exact same human-specific version of FOXP2 that we do.

So that version of the gene must be much older than modern humans.

It didn't arise with us.

Exactly.

So it might have been necessary for language, but it wasn't the final central switch that made modern human cognition unique.

It just shows how science is constantly revising these stories as we get more data.

Yes.

Humans have three extra truncated copies of another gene, SRGAP2, that chimps don't have.

And the protein from these extra copies actually inhibits the original full length protein.

And the result of that inhibition is an increase in dendritic spines on our neurons.

The little connections between brain cells.

So this duplication event may have been a key step in wiring up our larger, more complex brains.

And we have examples of very recent evolution driven by duplication too.

A fantastic one is the AMY1 gene, which codes for the amylase enzyme in our saliva that digests starch.

Chimps who don't eat a lot of starch usually have just one copy of this gene.

But human populations, especially those that adopted agriculture and started eating a lot of starchy foods like grains and tubers, show strong selection for having multiple copies of the AMY1 gene.

More copies means more enzyme, which means you can digest starch more efficiently.

It's a perfect example of copy number variation driving a very recent adaptation in human history.

And speaking of recent history, one of the most exciting stories has been unraveling our relationship with our archaic relatives.

For a long time, based on mitochondrial DNA, the thinking was that modern humans and Neanderthals never interbred.

But when Svante Pääbo and his team sequenced the full nuclear genome from Neanderthal bones in 2010, the story changed completely.

They found that one to four percent of the DNA in all modern non -African populations is derived from Neanderthals.

So there was interbreeding.

After modern humans left Africa, they met and mixed with Neanderthals.

It's low levels, but significant.

And a lot of that Neanderthal DNA that's been retained in our genomes is related to the immune system.

It probably gave the newly arrived modern humans some ready-made adaptations to local pathogens.

We now know there was gene flow between all three groups, sapiens, Neanderthals, and Denisovans.

So within our own species, we're 99.9 percent similar.

But that tiny 0.1 percent difference is where all the variation is.

Right.

The most common type of variation is the single nucleotide polymorphism, or SNP, a single letter change that's present in at least one percent of the population.

We each differ by about three million SNPs from any other random person.

But the big sequencing projects have also found a huge number of single nucleotide variants, or SNVs.

These are the rare SNPs, the ones with a frequency below half a percent.

And the reason we have so many of these rare and often slightly deleterious variants is probably because our population has grown so explosively in recent history.

There just hasn't been enough time for natural selection to weed them all out.

So how do we apply all this genomic knowledge to understand human disease?

Well, traditional genetics was very good at finding the genes for Mendelian disorders, like Huntington's, where a single high-penetrance mutation virtually guarantees you'll get the disease.

But most common diseases, heart disease, diabetes, cancer, they're not like that.

They're genetically complex.

They involve many different genes, each contributing a small amount of risk, plus a large environmental component.

And to find those small-effect genes, we use genome-wide association studies, or GWAS.

Right.

You take thousands of people with the disease and thousands without, and you scan their genomes for common SNPs that are statistically associated with the disease.

The assumption is that common diseases are caused by common variants.

And there have been some big successes, like finding a strong link between the CFH gene and macular degeneration.

A huge success.

But there's also a big problem with GWAS.

They tend to find variants that confer only a very tiny amount of risk, like 1.2 times the odds.
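To show what "1.2 times the odds" means, here is a minimal Python sketch of an odds ratio computed from a case-control count table. The counts are invented for illustration and the function name is hypothetical. A real GWAS would also test significance across many SNPs at once and correct for multiple comparisons.

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds of carrying the variant among cases divided by the odds among controls."""
    case_odds = case_carriers / case_noncarriers
    control_odds = control_carriers / control_noncarriers
    return case_odds / control_odds


# Made-up counts for illustration only; these are not data from any study.
ratio = odds_ratio(
    case_carriers=1200, case_noncarriers=3800,
    control_carriers=1040, control_noncarriers=3960,
)
print(round(ratio, 2))  # about 1.2
```

In this made-up table, 24% of cases and about 21% of controls carry the variant, so a risk variant of this size only shifts the odds slightly.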

This is the problem of missing heritability.

Exactly.

We know from twin studies that these diseases have a large genetic component.

But when you add up all the risk variants found by GWAS, they only explain a small fraction of that heritability.

So where's the rest of it?

That's the million-dollar question.

It could be that there are thousands more of these tiny-effect common variants we haven't found yet.

Or it could be that the risk is driven by rare variants with a larger effect, which GWAS isn't designed to find.

Or it could be complex interactions between genes, something called epistasis.

To make these huge studies more manageable, we use the concept of haplotypes.

Right.

Because of how recombination works, SNPs are not inherited independently.

They're inherited in large blocks, sometimes tens of thousands of base pairs long, called haplotypes.

So you don't need to test every single SNP in the block.

You just need to test a few tag SNPs, and they serve as a proxy for the entire block.

The HapMap project mapped all these common haplotypes, which made GWAS much cheaper and more efficient.
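The tag SNP idea can be sketched in Python under a strong simplifying assumption: SNPs are grouped only when their alleles travel together perfectly across every sampled haplotype. Real studies use a correlation threshold instead. The function name and the four toy haplotypes are hypothetical.

```python
def choose_tag_snps(haplotypes):
    """Group SNP positions that are perfectly correlated and pick one tag per group.

    haplotypes is a list of equal-length strings, one allele per SNP position.
    Returns a dict mapping each tag position to the positions it stands in for.
    """
    n_snps = len(haplotypes[0])
    groups = {}
    for position in range(n_snps):
        column = [hap[position] for hap in haplotypes]
        # Relabel alleles by order of first appearance, so columns that differ
        # only in which letters they use (A/G versus C/T) share one pattern.
        labels = {}
        pattern = tuple(labels.setdefault(allele, len(labels)) for allele in column)
        groups.setdefault(pattern, []).append(position)
    return {positions[0]: positions for positions in groups.values()}


# Four toy haplotypes over four SNP positions.
sample = ["ACGT", "ACGA", "GTAT", "GTAA"]
print(choose_tag_snps(sample))  # {0: [0, 1, 2], 3: [3]}
```

Here the first three positions always change together, so genotyping position 0 alone predicts the other two, while position 3 needs its own tag.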

And these haplotypes can also tell us about recent evolution.

Yes, like the lactose-tolerance haplotype in European populations.

It's an unusually long block of SNPs that haven't been broken up by recombination, which is a classic signature of very strong recent positive selection.

Linked to the history of dairy farming.

Exactly.

And finally, beyond single base changes, we also have larger structural variations, or SVs.

Big deletions, duplications, and inversions.

The most common type being copy number variation, or CNV.

Which is just having extra or missing copies of a gene or a DNA segment.

CNVs affect a huge portion of our genome.

We talked about the AMY1 gene for starch digestion.

Another example is that extra copies of the APP gene are linked to early onset Alzheimer's disease.

So this structural variation is a major source of human diversity and disease risk.

For our last topic, we're going to come back to nature and look at a case of natural genetic engineering.

And a mechanism that we've now co-opted for our own use in biotechnology.

We're talking about the bacterium Agrobacterium tumefaciens.

This is the bug that causes crown gall disease in plants.

It's basically a plant cancer.

It is.

And the bacterium is an incredible natural genetic engineer.

Here's how it works.

When a plant gets a wound, it releases these phenolic compounds.

Which act as a signal to the bacteria.

Exactly.

It activates a set of genes in the bacteria called the vir genes.

These genes make the proteins that cut a specific piece of DNA called the T-DNA out of a plasmid in the bacterium.

So it cuts out this little transferable piece of DNA.

And then it uses this amazing machine called a type IV secretion system.

It's like a molecular hypodermic needle to inject that single strand of T-DNA into the plant cell.

Wow.

Once it's inside, the T-DNA travels to the plant cell's nucleus and integrates itself into the plant's own genome.

And what does that T-DNA do once it's in there?

The T-DNA contains genes that code for the production of plant growth hormones, auxin and cytokinin.

So the bacterium hijacks the plant cell and forces it to make hormones.

Which causes the plant cells to divide uncontrollably, forming the tumor, or gall.

And this is precisely the mechanism that scientists have harnessed to make GMOs.

It is.

Researchers just take out the tumor-inducing genes from the T-DNA and replace them with a gene of interest, say a gene for herbicide resistance or drought tolerance.

Then they let the Agrobacterium do its thing and deliver that new gene into the plant genome, creating a transgenic plant.

It's a powerful tool that came straight from nature.

Outro.

What an incredible journey.

I mean, we've charted this whole trajectory from Mendel's purely abstract factors.

All the way to their physical location on chromosomes, their chemical basis as the double helix.

Acknowledging this dynamic, almost chaotic nature of duplication and transposition, and then ending up in the just mind-boggling complexity of modern genomics.

I think the central realization of this century is maybe the most profound.

That the complexity of an organism is driven so much less by the sheer number of protein coding genes.

That tiny one and a half percent of our genome.

Than by the incredibly elaborate dynamic systems that regulate them.

The alternative splicing, the vast network of non-coding RNAs, all the systems-level interactions.

It's like we have this huge operating system, 98.5% of our genome, dedicated to running the 1.5% that are the actual applications.

So to leave you with one final provocative thought, let's go back to the globin gene evolution.

We saw how the gene structure changed over time.

We know our human beta globin gene has two introns.

What if we found a new primate species and its beta globin gene was missing one of those introns?

Huh.

So the question is, what would that mean?

Does it mean that primate's lineage split off from ours before that second intron appeared?

Or could it be a case where that primate's ancestor had the intron but then lost it later on in its own evolution?

Right.

It forces you to think about evolution not as this simple linear progression, but as this constant process of both adding things and taking them away over deep time.

Every piece of DNA is a record of those choices.

It really is.

This has been a monumental deep dive into the nature of the gene and the genome.

We tried to explore every concept and experiment from the source material.

We hope this conversation provided the Last Minute Lecture level of clarity you were looking for.

Thank you for letting us dive deep into your source material with you.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary
What this audio overview covers
Mendelian inheritance establishes that hereditary traits are governed by discrete units of inheritance, later identified as genes existing in paired allelic forms that segregate during the formation of gametes. The chromosomal theory of inheritance connects these genetic units to physical structures on chromosomes, with their pairing and separation during meiosis directly mirroring Mendelian patterns of inheritance. Genes positioned on the same chromosome create linkage groups, though incomplete linkage occurs through crossing over, a reciprocal exchange of genetic material that enables the creation of genetic maps based on recombination frequency. Definitive identification of DNA as the genetic material came from landmark experiments including bacterial transformation studies and bacteriophage research, which conclusively demonstrated that DNA rather than protein carries hereditary information. The Watson-Crick model characterizes DNA as an antiparallel, complementary double helix stabilized by hydrogen bonds between specific base pairs, consistent with empirical patterns of base composition. Topoisomerase enzymes regulate DNA topology by managing the supercoiling necessary for proper packaging and replication. Genome complexity analysis through DNA denaturation and renaturation techniques reveals that eukaryotic genomes contain highly repetitive sequences including short tandem repeats, moderately repetitive dispersed sequences such as SINEs and LINEs, and unique single-copy genes. Evolutionary changes in genomic structure arise from polyploidization and unequal crossing over events that generate multigene families and pseudogenes. Comparative analysis between human and chimpanzee genomes demonstrates extensive sequence conservation alongside substantial structural variation and copy number variations, exemplified by differences in amylase gene duplication. 
Genetic diversity within human populations is characterized primarily by single nucleotide polymorphisms, which occur in inherited blocks termed haplotypes and form the basis for genome-wide association studies investigating the genetic basis of complex diseases. Contemporary approaches in genomic science include the construction of minimal genomes and the exploitation of natural gene transfer mechanisms for practical genetic modification applications.
