Chapter 35: DNA Organization, Replication, & Repair

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

Today we're tackling what has to be one of the most incredible feats

of cellular engineering.

It really is.

We have sources here detailing the structure, replication and repair of DNA,

and the core challenge is just staggering.

How do you take three billion base pairs of genetic information, I mean meters of material, and organize it, copy it perfectly, and maintain it flawlessly all inside a nucleus that's just microns across?

It seems impossible.

Right.

So our mission today is to unpack the cellular mechanics that somehow solve this massive paradox.

And, you know, solving that paradox is critical because the stakes just could not be higher.

The systems that manage DNA replication, recombination, crossover, they provide the genetic adaptability we need for life, but they are also the primary sources of disease when they fail.

So every time a cell divides, we're running a small calculated risk.

A very calculated risk.

It's not abstract at all.

We know that on average, a mutation occurs at a frequency of about one in every million cell divisions.

And we look at those in two ways.

There's vertical transmission,

so it's hereditary because it happened in a germ cell, and then there's horizontal transmission, which happens in somatic cells.

That causes diseases like most cancers that affect the individual, but you know they aren't passed down.

And that's just the baseline.

External factors can make it worse.

So much worse.

We are constantly surrounded by things, viruses, industrial chemicals, UV light, ionizing radiation, that just push that already risky rate much, much higher.

That sets the stage beautifully.

We have to start with the organization problem, because if the DNA isn't packaged correctly, nothing else can possibly work.

The material is called chromatin.

So what are the basic ingredients for this miracle of compaction?

Chromatin is the functional stuff of the chromosome.

It's essentially a blend.

You've got the very long double-stranded DNA.

It's coiled up with a nearly equal mass of these small, highly basic proteins called histones.

And then you have a smaller group of non-histone proteins, which are mainly the enzymes you need for the next steps,

like replication and RNA synthesis.

So the histones are like the scaffolding crew,

but they do more than just wrap the DNA, don't they?

I mean, if they just wrapped it, it would be completely inaccessible.

That is the first major insight you need to take away.

The role of histones is fundamentally dual.

Yes, their primary job is physical condensation, packing it all down.

Right.

But they are also integral participants in gene regulation.

They're the ones that determine which sections of the genome are accessible at any given time.

We can see evidence of this under a microscope, right?

The classic beads on a string look.

Yeah, that's the one.

Tell us about those beads, the nucleosomes.

So the nucleosome is the fundamental packaging unit.

The core is a disc-shaped histone octamer, which is two copies each of the core histones H2A, H2B, H3, and H4.

And the wrapping around that core is incredibly precise.

Incredibly.

We're talking 1.75 superhelical left-handed turns of DNA wrapped tightly around that octamer surface.

This specific wrapping protects about 145 to 150 base pairs of DNA.

That level of precision, 1.75 turns, 145 base pairs.

Why is the cell so meticulous about these measurements?

Because it's trying to balance protection with availability.

That tight wrapping shields the DNA from damage.

But by leaving the DNA in the linker region, that's the roughly 30 base pairs between the beads, it allows regulatory machinery to get in and, you know, engage with the genetic material when it needs to.

So if it were wrapped too tightly, we couldn't replicate or transcribe anything.

Exactly.
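
As a rough sanity check on those numbers, the sketch below estimates how many nucleosomes one copy of the genome would need. It assumes a 147 bp core (a midpoint of the 145 to 150 bp range quoted above) and a uniform 30 bp linker. Real linker lengths vary, so the result, roughly 17 million, is an order-of-magnitude figure only. The function and constant names are illustrative.

```python
# Back-of-envelope estimate, not a measured value.
GENOME_BP = 3_000_000_000  # three billion base pairs, as quoted in the discussion
CORE_BP = 147              # assumption: midpoint of the 145-150 bp wrapped on the octamer
LINKER_BP = 30             # roughly 30 bp of linker DNA between beads


def estimate_nucleosomes(genome_bp: int, core_bp: int, linker_bp: int) -> int:
    """Count the whole core-plus-linker repeats that fit into genome_bp."""
    return genome_bp // (core_bp + linker_bp)


count = estimate_nucleosomes(GENOME_BP, CORE_BP, LINKER_BP)
fraction_wrapped = CORE_BP / (CORE_BP + LINKER_BP)

print(f"{count:,} nucleosomes per genome copy")
print(f"{fraction_wrapped:.0%} of the DNA sits on histone cores")
```

Under these assumptions most of the DNA is wrapped at any moment, which is why the exposed linker matters so much for access.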

And this is where the regulation really kicks in.

These four core histones are highly conserved, which means their function is ancient.

But they all have these long amino-terminal tails that kind of poke out of the core particle.

Like little signal flags for the rest of the cell.

That's a perfect way to put it.

It's the cell's primary access control system.

Those tails are heavily modified by what we call post-translational modifications, or PTMs.

We've identified at least six types of covalent modifications.

There's acetylation, methylation, phosphorylation.

And some others, like ADP-ribosylation and sumoylation.

It's a huge list of chemical tags.

Instead of listing every single one, can we maybe group them functionally?

Like what are the basic jobs these PTMs do?

Absolutely.

Think of them in terms of speed and purpose.

You have modifications like acetylation, often on H3 and H4, and phosphorylation.

These act fast.

Acetylation of histones neutralizes the positive charge, which loosens that DNA-histone interaction.

So it's strongly associated with gene activation.

It opens the gate.

It opens the gate.

And then other modifications, like ADP-ribosylation, are specifically linked to tagging DNA for the slower multi-step processes of, say, DNA repair.

So these PTMs allow the cell to instantly change chromatin structure from closed to open, or to mark a spot for later.

So if the histones control that first level, the 10-nanometer fibril, the beads on a string, how does the cell achieve even higher levels of compaction to get ready for division?

We move to step two, the formation of the 30-nanometer chromatin fiber.

The 10-nanometer fibril supercoils itself, forming this helical structure that packs six or seven nucleosomes into every single turn.

And the H1 histones are involved here, right?

Right.

The less tightly bound H1 histones appear to act like staples, stabilizing this 30-nanometer structure and locking it all down.

That is a huge jump in density.

You mentioned earlier the scale difference is just astonishing.

That 8,000-fold linear decrease in length: how on earth does the cell manage to decondense that massive organized structure so quickly, in just hours, without creating a catastrophic tangle?

It's a marvel of molecular machinery.

It relies on enzymes like the topoisomerases and helicases working together with histone modification and chromatin remodeling complexes.

The key insight is that DNA compaction isn't just folding string.

It's a dynamic, highly regulated change in state dictated by the cell cycle.

And this structure is what dictates whether a gene is turned on or off.

So walk us through the two main states, active versus inactive.

We classify chromatin into two fundamental activity states.

First, you have euchromatin.

This is the transcriptionally active or potentially active material.

It stains less densely, it's replicated earlier in the cell cycle, and it's generally more open.

And then there's heterochromatin.

The opposite.

Densely packed and transcriptionally inactive.

And heterochromatin even has subcategories.

It does.

Constitutive heterochromatin is the stuff that is always condensed.

Think of the structural parts, like at the centromeres and telomeres.

Then you have facultative heterochromatin, which can condense and decondense depending on what the cell needs.

The classic example being the inactive X chromosome in females.

Exactly.

It's globally silenced by being converted into facultative heterochromatin.

Then we can get clues about which genes are active by using enzymes, can't we?

Precisely.

We find that DNA in active regions is relatively more sensitive to nucleases like DNase I.

But if you look closer, you find these specific zones called hypersensitive sites.

Hypersensitive.

What does that mean exactly?

These are small regions, maybe 100 to 300 nucleotides long, that are 10 times more sensitive to digestion.

And the reason is that the nucleosomes have been explicitly pushed out of the way, exposing the DNA so that critical regulatory proteins can bind right upstream of an active gene.

So it's like a molecular billboard that says access here.

That's it.

Conversely, in truly inactive regions, we often find high levels of 5-methyldeoxycytidine, or meC.

This is a common DNA modification that acts as a long-term chemical silence switch, further correlating with gene repression.

Okay, let's zoom out to the macroscopic level, the chromosomes themselves.

We know the centromeres connect the sister chromatids, but the ends, the telomeres, are fascinating.

Telomeres are vital.

They protect the ends of the linear chromosome from degradation and from fusing with each other.

They're defined by short, repeating TG-rich sequences.

In humans, it's 5'-TTAGGG-3', and their length is maintained by the enzyme telomerase.

And telomere shortening is basically a molecular clock for aging.

It is.

And on the flip side, inappropriately high telomerase activity is a hallmark of many cancers, which lets tumor cells become essentially immortal.

Okay, so moving beyond the structure, let's talk content.

It still blows my mind that, after all that complexity, only about 1% of the entire human genome actually codes for protein.

It just highlights how much regulatory information we're still figuring out.

We estimate about 25,000 proteins come from that 1%, the exonic DNA.

The other 99% is all non-protein-coding material, regulatory sequences,

structural elements, and vast repetitive sequences.

And within those coding genes, we have the exon-intron architecture.

What's the functional benefit of having these long, non-coding introns interrupting the coding exons?

Why is our genome structured like that?

It's a great question about evolutionary efficiency.

Introns have to be precisely cut out after transcription, and the exons are spliced back together to make the mature mRNA.

But the huge advantage is differential splicing.

So by using different combinations of exons, one gene can produce multiple related proteins.

Exactly.

It multiplies the output from a fixed amount of genetic material.

They also probably facilitate faster evolution by providing these safe spaces for genetic rearrangement to happen.
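
To make the idea of differential splicing concrete, here is a minimal sketch that enumerates the isoforms a single gene could produce if every internal exon were optional. The exon names and the rule "always keep the first and last exon" are simplifying assumptions for illustration; real splicing is far more constrained and regulated.

```python
from itertools import combinations


def cassette_isoforms(exons: list[str]) -> list[tuple[str, ...]]:
    """Return every isoform that keeps the first and last exon and any
    subset of the internal exons, always in their genomic order."""
    if len(exons) < 2:
        return [tuple(exons)]
    first, *internal, last = exons
    isoforms = []
    for keep in range(len(internal) + 1):
        # combinations() preserves input order, so exons never get shuffled.
        for chosen in combinations(internal, keep):
            isoforms.append((first, *chosen, last))
    return isoforms


# Hypothetical four-exon gene: two internal exons give 2**2 = 4 isoforms.
isoforms = cassette_isoforms(["E1", "E2", "E3", "E4"])
for isoform in isoforms:
    print("-".join(isoform))
```

Each additional optional exon doubles the count in this toy model, which is the sense in which splicing multiplies the output of a fixed amount of DNA.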

And what about the truly repetitive non-coding sequences?

We break these into two main groups.

First, highly repetitive DNA.

This is usually short clustered repeats you find at the centromeres and telomeres, and they serve a structural role.

Then there is the massive group of moderately repetitive interspersed sequences.

These are the mobile elements, like LINEs and SINEs.

Right, and SINEs like the Alu family are just everywhere.

They're everywhere.

The Alu family alone makes up about 10% of your total genome.

What's crucial is that both LINEs and SINEs are generally classified as retroposons.

They move through an RNA intermediate using reverse transcriptase.

They essentially copy themselves and paste the copy somewhere else.

Which can be disruptive.

Very.

An Alu insertion into a functional gene, for example, is a known cause of genetic disease, like neurofibromatosis.

It's like having this jumping DNA that can accidentally land in a really critical spot.

And we also use the even smaller repeats, the microsatellite repeats, two to six base pairs, like AC on one strand and TG on the other, for genetic linkage mapping, because their number is so variable from person to person.

But even these tiny elements can cause catastrophe.

We know that unstable trinucleotide repeats, like CGG, CAG, CTG, are the underlying cause of severe neurologic disorders like fragile X syndrome and Huntington's chorea.

Before we move on to the copying mechanism, we should briefly mention the other genome hiding out in the cell, the mitochondrial DNA.

Right.

mtDNA.

It's small, circular, and double-stranded, only about 16 kilobases long.

And while it's only 1% of the cellular DNA, it codes for essential things.

13 respiratory chain proteins, rRNAs, and tRNAs.

And its most critical feature is how it's transmitted.

Yes.

It is exclusively inherited from the mother through maternal, non-Mendelian inheritance.

This means mutations here are passed down purely on the maternal side, and they typically cause myopathies and neurologic disorders.

Okay.

Organization complete.

Now for the copying machine.

DNA replication.

The core principle seems deceptively simple.

You need a single-stranded template.

Simple.

In theory, incredibly complex in execution.

Replication requires

identifying multiple starting points, the origins of replication.

Then DNA helicases, like the MCM complex in eukaryotes, have to unwind that tight double helix.

And then other proteins have to hold it open.

Exactly.

Single-strand binding proteins, or RPA in eukaryotes, have to stabilize the resulting single strands so they don't just immediately snap back together.

The ultimate difficulty, though, comes from the rule of DNA polymerase.

It can only synthesize a new strand in the 5' to 3' direction.

How does the cell copy two antiparallel strands at the same time?

This leads to the fundamental asymmetry at the replication fork.

One strand, the leading strand, is lucky.

It runs in the 5' to 3' direction, facing the unwinding fork, so it can be synthesized continuously.

But the other one, the lagging strand, is the cell's worst-case scenario.

It is.

It has to be synthesized discontinuously.

Which means it has to constantly restart.

Constant restarting.

The lagging strand is built in these short segments called Okazaki fragments, which are only 100 to 250 nucleotides long in eukaryotes.

Think of the leading strand as a smooth zipper pull, and the lagging strand as a thousand tiny zippers, each needing its own separate starting point.

And each one of those tiny fragments needs a whole crew to clean up after it.

A whole crew.

Each fragment needs an RNA primer to be synthesized by Pol α-primase.

After the fragment is made, those RNA primers have to be removed, the gaps filled in by a polymerase, and then finally, the entire fragment has to be sealed by DNA ligases.

And that sealing step is energy intensive.

Very.

It specifically requires ATP hydrolysis to make that final bond.
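
The asymmetry described above can be sketched in a few lines. The first helper shows what antiparallel means by building the partner strand as a reverse complement. The second splits a lagging-strand template into Okazaki-sized pieces, each of which would need its own primer and its own ligation. The 200-nucleotide fragment length is an assumption picked from the 100 to 250 range mentioned above, and all names are illustrative.

```python
# Illustrative sketch only; it models bookkeeping, not enzyme chemistry.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the antiparallel partner strand, also written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def okazaki_intervals(template_len: int, fragment_len: int = 200) -> list[tuple[int, int]]:
    """Split a lagging-strand template into half-open (start, end) intervals,
    one per Okazaki fragment. Each interval needs its own RNA primer."""
    if template_len < 0 or fragment_len <= 0:
        raise ValueError("lengths must be non-negative and fragment_len positive")
    return [
        (start, min(start + fragment_len, template_len))
        for start in range(0, template_len, fragment_len)
    ]


template = "AAAC"
print(template, "pairs with", reverse_complement(template))

fragments = okazaki_intervals(1_000)
# Simplification: within one fork the leading strand is primed once,
# while the lagging strand is primed once per fragment.
print(f"leading-strand primers: 1, lagging-strand primers: {len(fragments)}")
```

Every interval in the list also implies one primer removal, one gap fill, and one ligation, which is the cleanup crew the discussion refers to.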

We also need to talk about the speed issue here.

We're replicating 3 billion base pairs.

If we only had one origin, like a bacterium, it would take over 150 hours to copy the human genome.

Which is why mammalian cells use multiple origins.

We proceed bidirectionally from hundreds of these sites all at once, creating these replication bubbles all along the chromosome.

This parallel processing cuts the time down to about 9 hours.

Which fits neatly into the S phase of the cell cycle.

Exactly.
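
The scaling behind those two figures can be checked with a deliberately idealized model: if every origin were evenly spaced, fired at the same instant, and drove forks at the same speed as in the single-origin estimate, copying time would fall in proportion to the number of origins. That assumption is not how real cells behave (origins fire at different times through S phase), so the number it gives is only a lower bound on the parallelism needed, consistent with the far larger number of sites mentioned above. The names are illustrative.

```python
import math

SINGLE_ORIGIN_HOURS = 150.0  # "over 150 hours" with one origin, from the discussion
TARGET_HOURS = 9.0           # "about 9 hours", from the discussion


def idealized_hours(single_origin_hours: float, origins: int) -> float:
    """Copying time if all origins fire at once and are evenly spaced.
    This is an idealization, not a model of real origin timing."""
    if origins < 1:
        raise ValueError("need at least one origin")
    return single_origin_hours / origins


# Smallest origin count that meets the target under the idealized model.
min_origins = math.ceil(SINGLE_ORIGIN_HOURS / TARGET_HOURS)

print(f"lower bound on simultaneous origins: {min_origins}")
print(f"idealized time with that many: {idealized_hours(SINGLE_ORIGIN_HOURS, min_origins):.1f} h")
```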

And while helicase is opening the strands, the coil ahead of it must be getting impossibly tight, creating massive torsional strain.

Who handles that stress?

That is the job of the DNA topoisomerases.

They act as a molecular swivel.

They introduce temporary nicks in the DNA to release the tension, letting the strands unwind, and then they quickly reseal the nicks.

And there's a critical point about topoisomerase I.

Yes, this is remarkable.

Topo I, which introduces single-strand nicks, is ATP-independent for its resealing function.

The energy needed is stored in a temporary covalent bond to the DNA itself.

This makes it mechanistically distinct from DNA ligase, which always needs ATP.

And all this has to be perfectly timed and controlled to make sure the DNA is copied only once per cycle.

Right, and that control is managed by cyclins and cyclin-dependent kinases, or CDKs.

These protein complexes rise and fall, driving the cell through the G1, S, G2, and M phases.

When it's time to replicate, DNA is briefly licensed to be copied, and once S phase is done, that license is revoked, preventing disastrous re-replication until the cell completes mitosis.

So we have the copying machine running, but given the scale and speed, mistakes are, well, they're inevitable.

This brings us to the quality control system,

DNA repair.

It's an elaborate, redundant network of five major repair pathways.

We can group them based on what they fix.

First, for small localized damage, you have BER, or Base Excision Repair.

It fixes simple things like an abasic site.

Then there's NER, Nucleotide Excision Repair, which is specialized for fixing bulky structural problems, like the pyrimidine dimers caused by UV radiation.

And NER is famous for its clinical correlation.

Indeed.

A defect in NER causes the disease Xeroderma pigmentosum.

Patients are exquisitely sensitive to sunlight because they can't repair UV damage, leading to a dramatically increased risk of skin cancer.

And what about mistakes made during replication itself?

That's the third pathway, MMR, or Mismatch Repair.

It's the proofreader that catches errors missed by the polymerase.

Defects here lead directly to Hereditary Non-Polyposis Colorectal Cancer,

or HNPCC.

Okay, so that covers the small defects.

But a double-strand break seems like the most catastrophic form of damage possible.

How does the cell deal with a snapped chromosome?

It has two options, depending on the cell cycle stage.

If it's in the S, G2, or M phases, the cell uses HR, or homologous recombination.

This is the high-fidelity, highly accurate path.

It uses the intact sister chromatid as a perfect template to repair the break.

And this pathway is so crucial that tumor suppressor proteins like BRCA1 and BRCA2 are part of it.

They are core components of the HR repair machinery, yes.

But what if there's no sister chromatid template available, say in G0 or G1?

Then the cell has to resort to NHEJ, non-homologous end joining.

This is the quick and dirty method.

It essentially glues the broken ends back together with little regard for lost information.

It's better than letting the chromosome fall apart, but it can introduce errors.

And defects there lead to diseases like SCID.

Severe combined immunodeficiency disease, yes.

It's remarkable how the cell chooses between this accurate but slow fix, and a fast but messy one.
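
The triage described in this part of the discussion can be summarized as a small lookup. This is a study aid that encodes only the pairings stated above (abasic site to BER, pyrimidine dimer to NER, replication mismatch to MMR, and double-strand breaks to HR or NHEJ depending on cell cycle phase). The lesion labels and function names are made up for illustration, and real pathway choice involves much more than a table.

```python
from enum import Enum


class Phase(Enum):
    G0 = "G0"
    G1 = "G1"
    S = "S"
    G2 = "G2"
    M = "M"


# Pairings taken from the discussion; the string keys are illustrative labels.
LESION_PATHWAY = {
    "abasic_site": "BER",
    "pyrimidine_dimer": "NER",
    "replication_mismatch": "MMR",
}


def double_strand_break_pathway(phase: Phase) -> str:
    """HR needs a sister chromatid template, so it is used in S, G2, or M;
    otherwise the cell falls back to NHEJ."""
    if phase in (Phase.S, Phase.G2, Phase.M):
        return "HR"
    return "NHEJ"


def choose_pathway(lesion: str, phase: Phase) -> str:
    """Return the repair pathway abbreviation for a lesion in a given phase."""
    if lesion == "double_strand_break":
        return double_strand_break_pathway(phase)
    if lesion not in LESION_PATHWAY:
        raise ValueError(f"unknown lesion label: {lesion}")
    return LESION_PATHWAY[lesion]


print(choose_pathway("pyrimidine_dimer", Phase.G1))
print(choose_pathway("double_strand_break", Phase.G1))
print(choose_pathway("double_strand_break", Phase.G2))
```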

And all these repair processes are overseen by a management team.

The checkpoint controls.

The checkpoints are the ultimate security system.

They constantly monitor DNA integrity, especially at the G1 and G2 stages.

And if damage is found, they just slam the brakes on cell cycle progression.

They ensure that replication or division does not proceed until the repair is complete.

And the molecule that stands as the ultimate arbiter, the one that makes the decision: repair, delay, or self-destruct?

That is the tumor suppressor protein, p53.

When DNA damage stabilizes it, p53 rapidly accumulates and acts as a transcription factor.

It has two main programs.

Program 1, induce genes that cause a cell cycle delay, often by activating p21, which is a potent inhibitor of the CDKs.

And program 2.

If the damage is just too extensive, p53 activates genes that trigger apoptosis, or programmed cell death.

It prevents the damaged cell from becoming malignant.
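
The two programs can be pictured as a simple branch. In the sketch below, the damage score between 0 and 1 and the 0.7 cutoff are entirely hypothetical stand-ins for "too extensive"; nothing in the discussion gives a numeric threshold, and the real decision is not a single number. The names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "no damage detected, cycle continues"
    ARREST_AND_REPAIR = "cell cycle delay via p21, then repair"
    APOPTOSIS = "programmed cell death"


def p53_response(damage: float, apoptosis_threshold: float = 0.7) -> Outcome:
    """Toy decision rule. `damage` is a made-up score in [0, 1] and the
    threshold is a hypothetical placeholder, not a measured quantity."""
    if not 0.0 <= damage <= 1.0:
        raise ValueError("damage must be between 0 and 1")
    if damage == 0.0:
        return Outcome.PROCEED
    if damage < apoptosis_threshold:
        # Program 1: induce p21, inhibit the CDKs, buy time for repair.
        return Outcome.ARREST_AND_REPAIR
    # Program 2: damage too extensive, remove the cell.
    return Outcome.APOPTOSIS


for score in (0.0, 0.3, 0.9):
    print(score, "->", p53_response(score).value)
```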

The centrality of p53 just can't be overstated, can it?

Not at all.

Clinically, p53 is the single most frequently mutated gene in human cancers.

Over 80% of human cancers carry loss-of-function mutations in p53.

Wow.

It just underscores the fact that disrupting this single checkpoint, that decision to stop and fix, is one of the most effective ways for a tumor to start forming.

This has been a true deep dive into molecular resilience.

So to recap for you.

DNA is packaged with incredible precision into regulatory nucleosomes via histones, which use PTMs as their communication system.

Replication has to be semi-discontinuous and rely on parallel processing from multiple origins to be fast enough.

And finally, the cell's integrity is fiercely protected by redundant repair pathways, all managed by these critical checkpoint proteins like p53.

And remember the clinical connection.

Every disease we discussed, from xeroderma pigmentosum to Huntington's chorea to cancer, is fundamentally a breakdown in this machinery.

The link between this basic biochemistry and human pathology is absolute.

And here is a final provocative thought for you to carry forward.

We spent a lot of time on the 1 % of the genome that codes for protein.

But the sources reveal that the vast majority of that non-coding 99%, the supposed dark matter, is actually transcribed in some cell types, generating regulatory long non-coding RNAs and other elements.

If we are only now starting to define the function of this massive regulatory landscape, how many more critical, currently undefined sequences are waiting to be linked directly to human health and disease?

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary
What this audio overview covers
Genomic organization and faithful transmission of genetic information depend on intricate mechanisms that package, replicate, and protect DNA with remarkable precision. The human genome's three billion base pairs are compacted into the nucleus through a hierarchical system beginning with nucleosomes, where DNA winds around histone octamers to form elementary structural units, then progresses through increasingly condensed chromatin fibers to fully condensed metaphase chromosomes during cell division. This packaging creates functionally distinct domains: euchromatin remains transcriptionally active and accessible to the cellular machinery required for gene expression, while heterochromatin adopts a tightly condensed, transcriptionally silent state. The genomic landscape extends far beyond protein-coding sequences, encompassing introns and numerous repetitive elements, particularly long and short interspersed nuclear elements that contribute to genetic variability and occasionally cause disease when they relocate within the genome. Mitochondrial DNA, structurally distinct as a circular molecule, follows a separate inheritance pattern transmitted exclusively through the maternal lineage. During replication, DNA synthesis occurs semiconservatively in the S phase of the cell cycle through coordinated action of specialized enzymes: helicases unwind the double helix, primase synthesizes short RNA primers, and multiple DNA polymerases construct leading and lagging strands with high accuracy, with Okazaki fragments stitched together on the lagging strand. Cell cycle regulation involves checkpoints controlled by cyclins and cyclin-dependent kinases, with tumor suppressors such as p53 and the retinoblastoma protein functioning as gatekeepers to prevent propagation of damaged DNA.
Protecting genomic integrity requires multiple repair pathways: nucleotide excision repair removes bulky lesions caused by ultraviolet radiation and chemical damage, base excision repair addresses small modified bases, and mismatch repair corrects replication errors. These safeguarding mechanisms are critical barriers against the genomic instability underlying cancer and hereditary genetic disorders.
