Chapter 4: DNA, RNA & Flow of Genetic Information

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to The Deep Dive.

Today we are undertaking an expedition into, well, the molecular blueprints of life itself.

We're moving past general biology and zooming right into the fundamental architecture of heredity.

We really are.

We're talking about the nucleic acids, DNA and RNA.

Exactly.

And our source material today takes us on a deep exploration of the covalent and the three-dimensional structures of these molecules.

So this isn't just about what they are.

No, not at all.

Our mission is to understand exactly how their physical atomic arrangement, I mean, the subtle differences in their sugars, the angles of their bonds, even the charge distribution dictates every single process of information flow.

From storage to synthesis.

From stable storage all the way to accurate protein synthesis.

It's really the ultimate lesson in biochemistry.

Structure defines function.

Okay, so let's unpack this with the central dogma as our frame.

We know DNA and RNA are long linear polymers.

They're basically molecular strings or nucleic acids.

Right, but they aren't just, you know, amorphous blobs.

They're incredibly precise chains built from these simple repeating units called nucleotides.

And that precision starts right at the monomer level.

It has to.

A nucleotide is a perfect three-part system.

You have a sugar, either ribose or deoxyribose, a phosphate group, and then one of four distinct nitrogenous bases: A, T (or U), G, or C.

And the sugars and phosphates are the ones that link up.

They join hands, so to speak, to create this repeating structural backbone.

It's the scaffolding of the entire molecule.

Okay, so that backbone provides the structure, the robustness.

Yeah.

But the actual data, the information, that's in the bases.

That's where it is.

Adenine, thymine, guanine, cytosine, or uracil.

It's linear information, sequential, just like letters spelling out a sentence.

That specific order, that sequence, spells out the complete instruction set for building and operating an entire organism.

And the flow of all that information was formalized by Francis Crick way back in 1958.

That's right.

His proposal, the central dogma of molecular biology, is still, for the most part, the organizing principle for all of life.

So the dogma lays out this standard sort of one-way path for information in a cell.

Precisely.

It begins with replication.

That's where DNA copies itself, you know, with incredible accuracy, DNA to DNA.

And that ensures the information gets passed to daughter cells perfectly.

Exactly.

Next up is transcription.

This is where the archival DNA information is copied into a temporary messenger molecule.

So DNA to mRNA.

And finally, translation.

Right, where the sequence encoded in that mRNA guides the synthesis of specific proteins, mRNA to protein.

And a really crucial insight here is that DNA, the master blueprint, it's not the direct template for making proteins.

No, it stays safely archived in the nucleus.

It sends out a working copy, that RNA intermediate, to go guide the protein assembly line out in the cytoplasm.

So our deep dive today is really focused on drilling down into that molecular architecture, the covalent bonds, the 3D geometry, the transfer mechanisms.

Yes, to truly grasp why DNA is the ideal archive and how these temporary RNA intermediates manage their really complex, sometimes even catalytic, roles.

Okay, let's start with the covalent structure of the raw materials then, the building blocks.

We're jumping into section one, which focuses entirely on these monomer units, the nucleotides.

Right.

So while all nucleic acids are linear polymers built from nucleotides, the fundamental difference between the building blocks of DNA and RNA is chemically subtle, but it's biologically profound.

Let's nail those distinctions down then, starting with the sugar component, which literally gives the molecules their names.

Okay, so DNA contains deoxyribose, and that deoxy prefix is the key.

If you look at the sugar ring, specifically at the two prime carbon atom, we number sugar carbons with primes, so two prime, three prime, five prime.

Right.

The deoxyribose sugar is missing an oxygen atom that's present in the ribose sugar used in RNA.

So RNA has a hydroxyl group, an OH, at that two prime position.

So RNA has a two prime hydroxyl group and DNA does not, that one missing oxygen atom.

It sounds so minor, but it has massive implications for the molecule's purpose, right?

Oh, absolutely huge.

We'll come back to the stability point in a second, but let's just confirm the second difference, which is the bases.

Okay, both molecules use the large double-ringed purine derivatives, adenine A and guanine G.

Correct.

But they diverge when it comes to the smaller single-ringed pyrimidines.

They do.

DNA uses cytosine C and thymine T.

RNA, on the other hand, replaces thymine with uracil.

So that's the classic giveaway.

If you're looking at a sequence and you see a U, you know you're dealing with RNA; if you see a T, it's almost certainly DNA.

Barring a few weird viral exceptions, yes, that's the rule.
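The T-versus-U rule of thumb is simple enough to sketch in a few lines of Python. The function name `guess_nucleic_acid` is hypothetical, and the sketch assumes plain sequences written with only the letters A, C, G, T, and U; it ignores the viral exceptions just mentioned.

```python
def guess_nucleic_acid(seq: str) -> str:
    """Classify a sequence as 'DNA', 'RNA', or 'ambiguous' from its pyrimidines."""
    bases = set(seq.upper())
    unknown = bases - set("ACGTU")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    has_t, has_u = "T" in bases, "U" in bases
    if has_t and not has_u:
        return "DNA"
    if has_u and not has_t:
        return "RNA"
    # Neither T nor U present (or both): the rule of thumb cannot decide.
    return "ambiguous"


print(guess_nucleic_acid("ATGCGT"))
print(guess_nucleic_acid("AUGCGU"))
```

A sequence with no T and no U at all, such as "GGCC", comes back as ambiguous, which is a limit of the rule rather than of the code.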

Okay, now let's connect these monomers to form the chain.

They're linked via phosphodiester bridges.

This linkage is really the structural mortar of the nucleic acid, and it's always a three prime to five prime linkage.

What does that mean specifically?

It means the three prime hydroxyl, the three prime OH group of the sugar on one nucleotide, is covalently attached to a phosphate group.

That phosphate then links to the five prime hydroxyl group of the sugar on the next nucleotide in the chain.

This repeating three prime five prime linkage forms that super robust backbone structure.

And what's the immediate chemical consequence of this specific backbone structure?

Well, the phosphodiester bridge is not neutral.

It carries a really crucial negative charge at physiological pH.

And why is that charge so important?

That negative charge is essential for structural stability.

It actively repels other negatively charged molecules, especially hydroxide ions, OH minus, that might otherwise attack the phosphate linkage and you know break down the backbone through hydrolysis.

That's a fascinating point.

If the backbone is negatively charged, wouldn't that mean the entire DNA molecule is highly repulsive to itself?

How does the cell manage to cram billions of these negative charges together into a tiny nucleus without the molecule just flying apart?

That is a fantastic question and it speaks to the very first step of packaging.

Because the DNA backbone is so strongly negative, the cell is chemically forced to recruit massive numbers of positively charged molecules, mostly proteins, yes, specifically proteins called histones.

These basic proteins essentially neutralize the negative charge of the DNA backbone, which then allows the long molecule to fold up tightly and condense.

That's the foundational first step in the whole packaging challenge, turning a linear chain into a compact chromosome.

Wow, so charge dictates function right from the start.

Now let's circle back to that small sugar difference, the absence of the two prime OH in DNA.

Right.

That difference is the absolute key to DNA's archival stability.

If you take RNA and put it in a mildly alkaline solution, it just breaks down.

It self-cleaves very quickly.

Wow.

Because that two-prime hydroxyl group acts as an internal nucleophile, it attacks the adjacent phosphodiester bond and basically facilitates the molecule's own destruction.

Because DNA lacks that two-prime OH group, it is fundamentally far, far more resistant to hydrolysis.

So the simple chemical stability makes DNA the optimal choice for permanent hereditary material.

Exactly.

And RNA, which is structurally less stable, is better suited for its role as a temporary regulated messenger.

It's a perfect division of labor, and it's all defined by a single oxygen atom.

Now, just on nomenclature, we should probably clarify the terms we use for these units.

Okay.

We start with a nucleoside.

Right.

That's just a base bonded to a sugar.

So adenosine or deoxyguanosine, the base attaches to the C1 prime carbon of the sugar.

There's a small convention note too.

We often call the nucleoside with thymine just thymidine.

And by convention, that already implies it has a deoxyribose sugar, so the deoxy prefix is often left off.

And then when we add the phosphate, we get the actual building block, the nucleotide.

Correct.

A nucleotide is a nucleoside joined to one or more phosphoryl groups, like a monophosphate, such as deoxyadenylate, or a triphosphate.

And crucially, it's the nucleoside triphosphates, such as dATP, GTP, UTP, and CTP, that are the activated precursors, the high-energy building blocks the cell actually uses to build the chains.

And we absolutely have to give a shout out to ATP, adenosine 5 prime triphosphate.

The celebrity of the family, for sure.

While it is technically a precursor for RNA synthesis, its most common and vital role is as the primary energy currency of the cell.

The energy released when its triphosphate group is cleaved drives countless cellular processes that would otherwise be thermodynamically impossible.

Now that we have the chain, we need to recognize its structure.

Like a polypeptide, a nucleic acid chain has directionality or polarity.

This polarity is non-negotiable.

It defines everything.

One end of the chain will always have a free 5 prime OH group or a 5 prime phosphoryl group, and the other end will always have a free 3 prime OH group.

And that defines the reading standard for everything that comes next.

Everything.

The universal convention is that sequences are always written, read, and synthesized in the 5 prime to 3 prime direction.

If you see a sequence like ACG, you know the 5 prime end is the adenylate and the 3 prime end is the guanylate.

This convention is absolutely vital for understanding how enzymes like polymerases operate.

Finally, let's just appreciate the sheer scale of information storage we're talking about here.

These polymers need to encode the instructions for an entire functioning cell.

The lengths are just staggering.

A simple polyomavirus has strands of about 5,100 nucleotides.

A common bacterium like E. coli has a single circular DNA molecule of 4.6 million nucleotides per strand.

And for us, in the human genome?

We have roughly 3 billion nucleotides in each strand, distributed across 24 distinct chromosomes.

And to really grasp that scale, consider the Indian muntjac deer.

It has this remarkably large chromosome with over a billion nucleotides.

If you could take that one molecule and stretch it out end to end, it would physically be over a foot long.

A foot long inside a single cell.

An incredible amount of linear information that has to be packaged, accessed, and copied perfectly.
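The "over a foot long" claim can be checked with back-of-the-envelope arithmetic. This is a rough sketch that assumes the B-DNA spacing of 3.4 angstroms (0.34 nanometers) per base pair quoted later in this chapter and treats the molecule as a fully stretched, straight helix; the function name `contour_length_m` is made up for the example.

```python
RISE_NM_PER_BP = 0.34      # B-DNA spacing between adjacent base pairs, in nanometers
METERS_PER_FOOT = 0.3048


def contour_length_m(base_pairs: float) -> float:
    """Stretched-out length in meters of a duplex with the given number of base pairs."""
    return base_pairs * RISE_NM_PER_BP * 1e-9


muntjac = contour_length_m(1e9)    # "over a billion nucleotides"
e_coli = contour_length_m(4.6e6)   # E. coli chromosome

print(f"muntjac chromosome: {muntjac:.2f} m ({muntjac / METERS_PER_FOOT:.2f} ft)")
print(f"E. coli chromosome: {e_coli * 1e3:.2f} mm")
```

A billion base pairs works out to about 0.34 meters, a little more than one foot, and the E. coli chromosome to roughly a millimeter and a half.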

That incredible length brings us to section two.

The three-dimensional structure that defines life.

The double helix.

This is the icon of molecular biology.

It is.

And while Watson and Crick get the primary credit for the final model, we have to acknowledge that their work stood on the shoulders of some critical data.

Right, specifically the X-ray diffraction photographs.

Yes, from Maurice Wilkins and Rosalind Franklin, which clearly indicated a regular helical structure with repeating units.

Their final model, B-DNA, which is the most common form in our cells, immediately offered these remarkable functional insights.

Let's break down the key features of that structure.

Okay, feature one.

It's made of two helical polynucleotide strands.

They coil around a common axis with a right-handed screw sense.

So like a standard screw, it turns to the right as it goes up.

Exactly.

And critically, the two strands are antiparallel.

They run in opposite directions.

If one strand is read five prime to three prime, its partner is read three prime to five prime.

Why is that antiparallel orientation so vital?

Is it just a structural thing, or does it actually help with the function of reading and copying molecules?

It's absolutely crucial for function, especially for replication.

Because the strands run in opposite directions, it means the templates are read correctly, and the complementary bases can align perfectly in a way that just wouldn't be possible if they ran parallel.

And the enzymes that do the copying?

Well, the molecular machinery, the polymerases, they're highly directional.

They can only synthesize a new strand in the five prime to three prime direction, which means they read the template three prime to five prime.

So the antiparallel nature ensures both strands can be copied at the same time using that fixed enzyme directionality.

Okay.

Feature two covers the internal arrangement.

The structure is, you know, a perfect spiral staircase.

The polar, charged sugar-phosphate backbones are on the outside, exposed to the water in the cell.

And conversely, the purine and pyrimidine bases lie flat on the inside of the helix, like the steps of that staircase.

Feature three is all about the dimensions.

Let's get specific here.

Right.

The bases are stacked almost perfectly on top of each other.

They sit nearly perpendicular to the central helix axis, tilted by only about one degree.

Adjacent bases are separated by 3.4 angstroms.

So 0.34 nanometers.

Yes.

And the whole structure repeats roughly every 34 angstroms, with about 10.4 bases per full turn of the helix.

That means each base pair is rotated by roughly 35 to 36 degrees relative to the next one.
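The helix dimensions just quoted are rounded, and a quick calculation shows how they fit together. The sketch below takes only the 3.4 angstrom spacing as given and compares an idealized 10 base pairs per turn with the 10.4 figure; the function name `helix_geometry` is made up for the example.

```python
RISE_ANGSTROM = 3.4  # spacing between adjacent base pairs, as quoted above


def helix_geometry(bp_per_turn: float) -> tuple[float, float]:
    """Return (rotation per base pair in degrees, length of one full turn in angstroms)."""
    rotation_deg = 360.0 / bp_per_turn
    pitch_angstrom = RISE_ANGSTROM * bp_per_turn
    return rotation_deg, pitch_angstrom


for bp_per_turn in (10.0, 10.4):
    rotation, pitch = helix_geometry(bp_per_turn)
    print(f"{bp_per_turn} bp/turn: {rotation:.1f} degrees per bp, {pitch:.1f} angstroms per turn")
```

With exactly 10 base pairs per turn the numbers are 36 degrees and 34 angstroms; with 10.4 they come out near 34.6 degrees and 35.4 angstroms, so the round figures are approximations of the same geometry.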

And feature four addresses the uniformity of the diameter.

The helix is consistently 20 angstroms wide.

This seems a little counterintuitive since purines A and G are much larger than pyrimidines T and C.

How does the helix maintain that perfectly uniform width?

That is the genius of complementary base pairing.

The geometry only permits two specific pairings.

Guanine G always pairs with cytosine C.

And adenine A always pairs with thymine T.

A big one with a small one.

A purine, the large one, always pairs with a pyrimidine, the small one.

This specific combination ensures that every base pair, whether it's GC or AT, occupies the exact same physical space, maintaining that perfectly regular 20 angstrom diameter.
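Taken together, the pairing rules and the antiparallel orientation mean that one strand fully determines its partner. Here is a minimal Python sketch of that idea; the name `partner_strand` is hypothetical, and it assumes a DNA sequence written 5 prime to 3 prime using only A, T, G, and C.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    complement = [PAIR[base] for base in strand_5_to_3.upper()]
    # The partner runs antiparallel, so reverse it to read it 5' to 3'.
    return "".join(reversed(complement))


print(partner_strand("ACG"))  # CGT
```

Applying the function twice returns the original sequence, which is the property that makes each strand a usable template for the other.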

And this base pairing is the physical realization of Chargaff's rules, which were established years before the structure was even solved.

Exactly right.

Erwin Chargaff's observation from 1950, that the ratios of A to T and G to C are close to 1.00 whether you look at human DNA or salmon DNA, was a massive clue.

The double helix proved that this was a physical necessity for the regular structure.
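To see why pairing forces Chargaff's ratios, count the bases across both strands of any duplex. The following is an illustrative sketch, not a reconstruction of Chargaff's measurements: the sequence is arbitrary, the name `chargaff_ratios` is made up, and the input is assumed to contain at least one A or T and at least one G or C so the ratios are defined.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chargaff_ratios(strand: str) -> tuple[float, float]:
    """Return (A/T, G/C) counted over both strands of the duplex built on `strand`."""
    strand = strand.upper()
    partner = "".join(PAIR[base] for base in reversed(strand))
    counts = Counter(strand + partner)
    return counts["A"] / counts["T"], counts["G"] / counts["C"]


# A deliberately lopsided single strand: two A, three G, one C, no T.
print(chargaff_ratios("AAGGGC"))  # (1.0, 1.0)
```

However skewed the single strand is, every A brings a T on the partner and every G brings a C, so both ratios come out to exactly 1 for the duplex.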

So we have the covalent phosphodiester bonds holding the individual strands together.

But what stabilizes the two complementary strands against each other?

There are three main forces, and they are all non-covalent, which is important because it makes the strands easy to separate when you need to.

Okay, what's the first one?

First, the hydrogen bonds.

GC pairs form three hydrogen bonds and AT pairs form two.

Individually, these bonds are pretty weak, maybe 4 to 21 kilojoules per mole, but their cumulative effect across millions and millions of base pairs makes the helix extremely stable.
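The three-versus-two count makes it easy to tally the hydrogen bonds in a stretch of duplex. This is a small sketch under the simplifying assumption that every position forms a standard Watson-Crick pair; `duplex_hydrogen_bonds` is a made-up name.

```python
# Hydrogen bonds contributed by the pair each base forms: AT pairs give 2, GC pairs give 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def duplex_hydrogen_bonds(strand: str) -> int:
    """Total Watson-Crick hydrogen bonds in the duplex formed by `strand` and its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


print(duplex_hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3 = 10
```

Only one strand is needed as input, because each base on it accounts for exactly one pair.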

And the other two forces relate to the environment and the bases themselves.

The second force is base stacking, which results from van der Waals interactions.

Because the non-polar base rings are stacked so precisely one on top of the other inside the helix, they sit in close van der Waals contact, and each individual interaction is weak.

But the net effect is substantial and provides a major portion of the helix's stability.

And the third force, the hydrophobic effect, is the same stabilizing principle we see in protein folding.

Yes, exactly.

The hydrophobic, non-polar bases cluster together in the interior, shielding themselves from the surrounding water.

The polar, charged sugar-phosphate backbones are then left exposed to the aqueous environment, which is highly favorable, and that helps drive the formation of the double helix structure.

We've talked about B-DNA as the standard, but this is where it gets really interesting.

DNA is not rigid.

It's dynamic and can adopt different forms.

Let's talk about A-DNA.

A-DNA is usually only seen when you dehydrate DNA fibers in a lab, but it is critically important because it is the structure that's naturally adopted by double-stranded RNA and, significantly, RNA-DNA hybrids.

What's the physical difference between B-DNA and A-DNA?

A-DNA is shorter, it's wider, and its base pairs are steeply tilted, about 19 degrees relative to the axis, compared to B-DNA's almost perpendicular alignment.

And the major reason for this difference lies in something called the sugar pucker.

Sugar pucker.

It sounds a bit like a biochemical term for folding, but tell us what that actually means in structural terms.

Okay, so think of the ribose sugar ring.

It's not perfectly flat.

One of the carbon atoms is always pushed slightly out of the plane defined by the others.

In physiological B-DNA, the sugar is in what we call the C2-prime endo conformation.

The C2-prime atom is out of the plane.

But in A-DNA, the sugar is in the C3-prime endo conformation.

And why does that C3-prime endo pucker dictate the A-DNA structure for RNA and RNA-DNA hybrids?

It's purely a matter of physical necessity.

It's a consequence of that 2-prime hydroxyl group we talked about earlier.

In RNA, that 2-prime OH group creates substantial steric hindrance, molecular crowding.

The only way the RNA molecule can physically avoid crowding between that 2-prime oxygen atom and the adjacent phosphoryl group is by adopting the C3-prime endo sugar pucker, which in turn forces the entire helix into that wider, tilted A-form configuration.

It's fascinating that one atom dictates the stability, and its position then dictates the entire geometry.

And then we have the black sheep, Z-DNA.

Z-DNA is a really intriguing structural anomaly.

It's a left-handed double helix, compared to the standard right-handed A and B forms.

Its name comes from the characteristic zigzag path of the phosphoryl groups in its backbone.

Well, its function is still a major area of research, but it's not just a curiosity.

We know Z-DNA-binding proteins exist, and one has been isolated that's involved in the pathogenesis of poxviruses.

It suggests that certain regions of the genome might temporarily switch to the Z-form for regulatory purposes.

So it underscores that DNA is a flexible, highly dynamic molecule.

It's not a static textbook drawing.

Not at all.

And that flexibility is essential because of that packaging problem we discussed.

In many organisms, the DNA isn't even linear.

Correct.

In bacteria and archaea, the DNA molecule is usually circular.

And since the ends are closed, the axis of the double helix itself can twist into a superhelix, a process we call supercoiling.

I often think of supercoiling like taking an old phone cord and twisting it until it starts to coil on itself.

What's the biological importance of that stress?

It's vital for two reasons.

First, compaction.

The E. coli chromosome is about a millimeter long.

That's a thousand times the length of the cell itself.

So it must be compacted.

And supercoiling helps achieve that.

And the second reason?

Second, it affects the DNA's chemistry.

The torsional stress in the superhelix affects the ability of the double helix to unwind locally.

This winding and unwinding directly impacts its interaction with regulatory molecules like polymerases.

It basically controls access to the genetic information.

Finally, before we leave structure, we have to address single-stranded nucleic acids, specifically RNA.

They don't typically form that giant double helix.

No, but RNA is the master of three-dimensional folding.

Single-stranded nucleic acids can fold back on themselves to create these highly complex, well-defined 3D structures.

The simplest and most common motif is the stem-loop, or hairpin, which forms when complementary sequences within the single strand find each other and base pair.
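The stem-loop idea can be illustrated with a toy check for whether the two ends of an RNA strand can pair with each other. This is a deliberately naive sketch, not a real folding algorithm: it only considers standard A-U and G-C pairs, walks inward from the two ends, and assumes a minimum loop of three unpaired bases. The `min_loop` value and the function name `stem_length` are both assumptions made for the example.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def stem_length(rna: str, min_loop: int = 3) -> int:
    """Count consecutive Watson-Crick pairs formed between the 5' and 3' ends, moving inward."""
    rna = rna.upper()
    i, j = 0, len(rna) - 1
    pairs = 0
    # Pair position i with position j only if at least `min_loop` bases remain between them.
    while j - i - 1 >= min_loop and RNA_PAIR[rna[i]] == rna[j]:
        pairs += 1
        i += 1
        j -= 1
    return pairs


print(stem_length("GGGAAAACCC"))  # 3 pairs closing a 4-base loop
```

Real RNA structure prediction also has to handle the mismatches, bulges, and non-standard pairs that this sketch ignores.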

But unlike DNA, which relies pretty much purely on Watson-Crick pairing, RNA structures can be much more creative.

Oh, exactly.

They can include non-standard folding elements like mismatched base pairs, bulged bases, or really complex interactions where three or more bases interact using non-Watson-Crick hydrogen bonds.

And these elaborate structures?

They're often stabilized by metal ions like magnesium.

The key insight here is that this capacity for complex folding allows RNA to perform sophisticated functions that we once thought were exclusive to proteins, including catalysis.

We've established the structure of the DNA archive.

Now we move to section three, how that structure facilitates the transmission of information through replication.

The complementary nature of the double helix immediately suggested the perfect copying mechanism.

It did.

It suggested the semi-conservative hypothesis.

The core idea is simple, but really elegant.

The two parent strands separate, and because the sequence of one dictates the sequence of the other, A with T, G with C, each parent strand serves as a flawless template for synthesizing a new complementary daughter strand.

And the result?

Each new DNA molecule is half old, half new.

That's semi-conservative.

But how do you prove that this specific model, and not a conservative or a dispersive model, is the right one?

You need a clever way to physically distinguish the parent DNA from the newly synthesized DNA.

And this was the brilliant task undertaken by Matthew Meselson and Franklin Stahl in 1958 in what is probably the most elegant experiment in all of molecular biology.

Walk us through the methodology of the Meselson-Stahl experiment.

Okay.

They needed to make the parent DNA heavy.

So they grew E. coli for many generations in a growth medium that contained heavy nitrogen, the stable isotope N15.

The bacteria incorporated this N15 into all their bases, making the parental DNA measurably denser than normal DNA.

So all the starting DNA is heavy.

Then what?

Then they suddenly transferred the bacteria to a medium containing only ordinary lighter nitrogen, N14, and they just let them grow for one or two rounds of division.

And how do they analyze the resulting DNA mixture?

They used a technique called density gradient equilibrium sedimentation.

They mixed the extracted DNA with a concentrated cesium chloride solution and spun it in a centrifuge at extremely high speeds for days.

What did that do?

The cesium chloride forms a density gradient, and the DNA molecules migrate until they reach the exact point where the solution density matches their own.

They appear as narrow, distinct bands under UV light.

Okay, so what happened after one generation in that light N14 medium?

After one generation, all the DNA molecules should be hybrids.

One heavy N15 parent strand and one newly synthesized light N14 strand.

And that's what they saw.

That's exactly what they saw.

A single band located precisely halfway between where purely heavy DNA and purely light DNA would band.

This result was incredibly powerful because it immediately ruled out conservative replication, which would have left the original heavy DNA intact and created a separate new light DNA band.

And what happened after two generations?

The results became definitive.

They saw two distinct bands in roughly equal amounts.

One was that hybrid band, N14, N15, and the other was a purely light band, N14, N14.

This result ruled out dispersive replication, which would have created intermediate density strands across all generations.

And it perfectly confirmed the semi-conservative model that Watson and Crick had proposed.
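The band pattern predicted by the semi-conservative model can be reproduced with a tiny simulation. This is a sketch under simplifying assumptions: each duplex is represented only by the labels of its two strands ("H" for N15, "L" for N14), every duplex replicates once per generation, and the function names are made up for the example.

```python
from collections import Counter


def replicate(duplexes):
    """One round of semi-conservative replication in light (N14) medium."""
    daughters = []
    for strand_a, strand_b in duplexes:
        # Each parental strand is kept and paired with a newly made light strand.
        daughters.append((strand_a, "L"))
        daughters.append((strand_b, "L"))
    return daughters


def band_fractions(duplexes):
    """Fraction of duplexes in the heavy (HH), hybrid (HL), and light (LL) bands."""
    counts = Counter("".join(sorted(duplex)) for duplex in duplexes)
    total = len(duplexes)
    return {band: counts[band] / total for band in ("HH", "HL", "LL")}


population = [("H", "H")]  # fully heavy starting DNA
for generation in (1, 2, 3):
    population = replicate(population)
    print(generation, band_fractions(population))
```

Generation one is all hybrid, generation two is half hybrid and half light, and the hybrid fraction keeps halving after that, matching the bands described above.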

It's an incredible example of using density isotopes to track molecular fate.

But for this replication, and also for transcription, to happen, those two tightly wound parent strands need to physically separate, at least locally.

They do.

And that separation process is called melting or denaturation.

In the lab, in vitro, we can achieve this just by heating the DNA solution.

The hydrogen bonds and stacking forces weaken, and the strands just come apart.

And we can monitor this process very precisely, right?

Yes, using UV light at 260 nanometers, through an effect called hypochromism.

Stacked bases absorb less UV light than bases that are unstacked and free.

So as the double helix melts?

As it melts and the bases become unstacked, the absorption at 260 nanometers increases sharply.

The temperature at which half the helical structure is lost is called the melting temperature, or Tm.

And the beauty of this system is that it's completely reversible.

It is.

If you cool the solution, the complementary strands can spontaneously reassociate, or anneal, which is also called renaturation.

And that has practical applications.

Oh, huge.

This ability to melt and reanneal is the foundation of powerful lab techniques like hybridization experiments.

They allow us to locate specific genes corresponding to specific RNA molecules, or even measure the genetic similarity between two different species by seeing how well their DNA strands can hybridize.

So if that's how we do it in the lab, how does the cell achieve strand separation without, you know, cooking itself?

In the living cell, in vivo, separation is managed by proteins called helicases.

These enzymes act like molecular zippers.

They use the chemical energy from ATP hydrolysis to physically unwind and disrupt the hydrogen bonds of the helix at specific points, generating the replication forks needed for synthesis.

That sets us up perfectly for section four, the molecular engines of information transfer, the polymerases.

Let's start with DNA polymerase, the synthesizer, which was first isolated by Arthur Kornberg in 1958.

Right.

DNA polymerases catalyze the step-by-step addition of deoxyribonucleotide units onto a growing DNA chain.

The core reaction is simply taking an existing chain, (DNA)n, and adding a deoxyribonucleoside triphosphate, a dNTP, to create a longer chain, (DNA)n+1, releasing pyrophosphate in the process.

What are the absolute requirements for this synthesis to happen?

First, it needs all four of the activated precursors, dATP, dGTP, dCTP, and TTP, plus a magnesium ion.

Second, and this is crucial, it is strictly template-directed.

Meaning?

Meaning the phosphodiester linkage only forms efficiently if the incoming dNTP is perfectly complementary to the base on the template strand.

This is what ensures the high chemical fidelity of the copy.

And the third requirement is unique to DNA polymerase compared to its RNA counterpart.

Yes, it requires a primer.

DNA polymerase cannot initiate synthesis de novo, from scratch.

It absolutely has to have a pre-existing primer strand, a short piece of nucleic acid that is already base-paired to the template and provides a free 3' OH group to start adding onto.

Let's detail the mechanism of elongation.

We know it only proceeds in one direction.

Synthesis is rigorously 5' to 3'.

Mechanistically, the free 3' OH terminus of that growing strand acts as a nucleophile.

It launches an attack on the innermost phosphorus atom of the incoming dNTP.

And that forms the new bond.

That forms the new phosphodiester bridge and releases pyrophosphate, PPi.

The subsequent enzymatic hydrolysis of that pyrophosphate into two orthophosphate ions provides the thermodynamic energy to make the polymerization reaction highly favorable and essentially irreversible.
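The three requirements, a template, a base-paired primer, and 5' to 3' growth from the free 3' OH, can be sketched as a toy primer-extension routine. This is a string-level illustration only, not a model of the enzyme: the function name `extend_primer` is made up, the template is assumed to be written 3' to 5' so that it lines up position by position with the product, and proofreading is left out.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3_to_5: str, primer_5_to_3: str) -> str:
    """Extend a primer 5'->3' along a template written 3'->5'; return the product 5'->3'."""
    if not primer_5_to_3:
        raise ValueError("DNA polymerase cannot start without a primer")
    # The primer must already be base-paired to the start of the template.
    for template_base, primer_base in zip(template_3_to_5, primer_5_to_3):
        if PAIR[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")
    product = primer_5_to_3
    # Each step adds the complementary nucleotide to the free 3' end of the growing strand.
    for template_base in template_3_to_5[len(primer_5_to_3):]:
        product += PAIR[template_base]
    return product


print(extend_primer("TACGGT", "AT"))  # ATGCCA
```

Calling it with an empty primer raises an error, which mirrors the point that DNA polymerase cannot begin a chain on its own.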

Given that the cell has to copy billions of bases perfectly every single time it divides, what specialized mechanism ensures the extraordinarily high fidelity that requires?

That's the fourth key characteristic, proofreading.

Many DNA polymerases have a distinct nuclease activity, usually a 3' to 5' exonuclease activity.

So it can go backwards.

It can.

If a mismatched nucleotide is incorporated by mistake, the polymerase stalls, reverses direction, and excises the incorrect base using this nuclease activity.

Then it resumes synthesis.

This meticulous check and repair process contributes to an astonishingly low error rate, less than one mistake per billion base pairs.

That is robust error correction.

We focus on cellular genomes, which are DNA, but what happens when the genetic material is RNA, like in some viruses?

The whole flow of information changes.

Take the tobacco mosaic virus.

It has a single -stranded RNA genome.

Its replication is mediated by an RNA-directed RNA polymerase.

The virus basically hijacks the host cell's machinery to crank out new RNA copies, which often induces cell death in the process.

And then there are the famous retroviruses, like HIV-1, which famously defy the traditional central dogma by going backward.

This is a spectacular twist on the dogma.

Retroviruses contain two copies of their single-stranded RNA genome.

When they enter a host cell, they deploy this vital viral enzyme called reverse transcriptase.

What makes reverse transcriptase such a central player in the infection cycle?

Reverse transcriptase is a chemical powerhouse.

It acts as both a polymerase and an RNase.

First, it uses the viral RNA as a template to synthesize a complementary DNA strand.

Then, its RNase activity degrades the original RNA template.

And finally, it copies that newly formed DNA strand to create a double-stranded DNA molecule.

This finished viral DNA can then integrate into the host chromosome, becoming part of the host's own genetic material and replicating alongside it, ensuring long-term persistence and expression of new viral particles.
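The three steps of reverse transcription can be followed at the sequence level with a short sketch. It is only an illustration of the information flow, with a made-up function name and an arbitrary four-base "genome"; it assumes sequences are written 5' to 3' and says nothing about primers or the enzyme's actual mechanism.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5_to_3: str) -> tuple[str, str]:
    """Return (first DNA strand, second DNA strand), both written 5'->3'."""
    # Step 1: copy the RNA into a complementary, antiparallel DNA strand.
    first = "".join(RNA_TO_DNA[base] for base in reversed(rna_5_to_3.upper()))
    # Step 2 (RNA template degraded) has no sequence-level effect here.
    # Step 3: copy the first DNA strand to complete the duplex.
    second = "".join(DNA_PAIR[base] for base in reversed(first))
    return first, second


print(reverse_transcribe("AUGC"))  # ('GCAT', 'ATGC')
```

The second strand carries the same sequence as the original RNA, with T in place of U, which is how the viral information ends up stored in DNA form.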

The function of that viral DNA, or any DNA, is to produce functional molecules.

Section 5 covers gene expression, the two-step process of converting the DNA blueprint into proteins.

Right.

DNA is the archival storage optimized for stability.

Gene expression is the conversion process.

Transcription, so DNA to an RNA copy, or mRNA, and then translation, mRNA to protein.

And RNA molecules were once seen as just passive transient things, but we now know they are absolute molecular multitaskers.

They are critical.

RNA molecules perform roles from carrying genetic information to highly specific regulation and even catalysis.

We divide the major RNA species into three categories based on their function, using E. coli as the standard model.

Let's start with the most abundant, ribosomal RNA.

rRNA makes up about 80% of total cellular RNA.

It forms the backbone of the ribosome, which is the cellular factory for protein synthesis.

And although it was once considered purely structural, the key insight now is that the rRNA component, not the protein components, is the actual catalyst for forming the peptide bonds during protein synthesis.

So the RNA is the enzyme.

The RNA is the enzyme.

In prokaryotes, you have the 23S, 16S, and 5S species.

Okay, next up, the adapter molecule.

Transfer RNA, tRNA.

This accounts for about 15% of total RNA.

tRNA molecules are the chemical linkers.

They carry activated amino acids to the ribosome, matching them up to the mRNA template.

They're small, typically around 75 nucleotides long, and every single one of the 20 amino acids has at least one corresponding kind of tRNA.

And finally, the blueprint itself, messenger RNA, mRNA.

Right, mRNA is the template for translation.

It's the least abundant, only about 5% of total RNA.

A distinct mRNA molecule is generated for each gene or group of genes.

In prokaryotes, they average about 1.2 kilobases in length.

All three types of RNA are synthesized by the enzyme RNA polymerase during transcription.

What are the reaction requirements for this enzyme?

Well, RNA polymerase catalyzes the reaction where a ribonucleoside triphosphate, an NTP, is added to a growing RNA chain.

The requirements are pretty similar to DNA polymerase.

You need a double-stranded DNA template, all four ribonucleoside triphosphates, ATP, GTP, UTP, and CTP, and a divalent metal ion, usually magnesium or manganese.

And the direction and chemistry remain the same as DNA synthesis?

Yep.

Synthesis proceeds strictly in the 5' to 3' direction.

The elongation mechanism is identical.

That 3' OH nucleophilic attack on the innermost phosphorus atom of the incoming NTP with subsequent pyrophosphate hydrolysis driving the reaction forward.

But the key difference from DNA polymerase is the initiation.

That's the big one.

RNA polymerase does not require a primer.

It can start synthesis de novo just by binding to the template and initiating the chain.

Also, RNA polymerase has a less extensive mistake correction ability than DNA polymerase, which is acceptable because RNA is transient.

Errors are less critical than in the permanent DNA archive.

And in E. coli?

In E. coli, a single RNA polymerase enzyme synthesizes all three types of RNA.

When RNA polymerase is synthesizing the transcript from the DNA template, it has to distinguish between the two strands of the double helix.

It does.

One strand is the template strand.

This is the strand used to guide RNA synthesis.

So the RNA transcript sequence is complementary to it.

G on the template gives you C in the RNA, and A gives you U.

And the other strand?

The other strand is the coding strand, which is the non-template strand.

It's called the coding strand because its sequence is identical to the RNA transcript, except that thymine, T, is replaced by uracil, U.
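The template and coding strand relationship can be sketched in a few lines of Python. This is only an illustration: the function names and the nine-base example sequence are hypothetical, and the template is written 3' to 5' so that the RNA reads 5' to 3' from left to right.

```python
# Illustrative sketch; names and the example sequence are hypothetical.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a template written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5)

def coding_strand(template_3_to_5: str) -> str:
    """Return the coding (non-template) DNA strand, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in template_3_to_5)

template = "TACGGCATT"            # template strand, written 3'->5'
rna = transcribe(template)
coding = coding_strand(template)

# The coding strand matches the transcript once T is swapped for U.
print(rna, coding, coding.replace("T", "U") == rna)
```

Writing the template 3' to 5' avoids an explicit reverse step; with a 5' to 3' template you would reverse the result as well.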

The polymerase has to know exactly where to begin the gene.

It can't just start randomly.

No, it binds to highly specific base sequences known as promoter sites.

In prokaryotes, we see two primary upstream consensus sequences on the 5' side of the transcription start site.

There's the Pribnow box, with a consensus of TATAAT centered at minus 10, and the minus 35 region, with a consensus of TTGACA.

Why are those specific sequences like TATAAT so important, chemically speaking?

Well, they're AT rich.

AT base pairs are held together by only two hydrogen bonds, compared to the three bonds in GC pairs.

And they're weaker?

They're mechanically weaker and therefore easier for the RNA polymerase to pry apart locally, which is essential for initiating the unwinding of the helix needed for transcription to begin.

The structure dictates the binding preference.
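As a rough illustration of that two-versus-three bond difference, here is a small Python sketch that counts Watson-Crick hydrogen bonds along one strand of a duplex. The function name and the GC-rich comparison sequence are hypothetical, and the count is a crude proxy only: real helix stability also depends on base stacking.

```python
# Crude proxy for duplex strength: hydrogen bonds per base pair.
# A-T pairs contribute two bonds, G-C pairs contribute three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(strand: str) -> int:
    """Count Watson-Crick hydrogen bonds for a duplex given one strand."""
    return sum(H_BONDS[base] for base in strand)

pribnow = hydrogen_bonds("TATAAT")   # AT-only hexamer
gc_rich = hydrogen_bonds("GCGCGC")   # hypothetical GC-only hexamer
print(pribnow, gc_rich)
```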

And in eukaryotes, the promoter sites are more complex.

Much more complex.

You still have a core promoter, the TATA box, with a TATAAA consensus centered at minus 25, but you often also see regulatory elements like the CAAT box around minus 75.

And on top of that, eukaryotic transcription can be regulated by enhancer sequences, which can be thousands of base pairs away from the gene itself, yet still stimulate transcription.

Once transcription starts, the polymerase needs to know where to stop.

What signals termination in prokaryotes?

Terminator sites.

These are specific sequences in the DNA that cause the RNA polymerase to dissociate.

The sequence often leads to the formation of a GC-rich, self-complementary structure in the brand new RNA.

A hairpin.

It folds into a stable hairpin.

This hairpin structure, frequently followed by a sequence of U residues, physically causes the RNA molecule to release from the DNA template.

Sometimes termination also requires the help of a specific protein factor called rho.

In eukaryotes, the initial RNA transcript, the pre-mRNA, is immediately modified before it can leave the nucleus.

Yes, these post-transcriptional modifications are vital for stability and recognition.

First, a unique structure called the 5' cap is added.

This is an unusual guanosine nucleotide attached via a rare 5'-5' triphosphate linkage.

And that's important for translation.

It's necessary for recognition during translation initiation.

And second, a sequence of adenylate residues, the polyA tail, is added to the 3' end.

What's the function of that polyA tail?

It greatly enhances the stability of the mRNA molecule, and it's also involved in translation initiation and export from the nucleus.

The longer the polyA tail, generally, the longer the mRNA survives in the cytoplasm before it gets degraded.

We have to revisit tRNA, Crick's predicted adapter molecule.

How does it physically manage to link a chemical, an amino acid, to an information molecule, the mRNA template?

tRNA is a perfect double agent.

It has two functional ends.

One end is the amino acid attachment site, located at the terminal adenylate of the CCA arm, at the 3' end of the molecule.

That's where the specific amino acid is attached.

The other end is the template recognition site, a three-base sequence called the anticodon.

And how is the amino acid actually joined to the tRNA?

This activation step is catalyzed by a highly specific enzyme called an aminoacyl tRNA synthetase.

There's typically at least one unique synthetase for each amino acid.

And this reaction, which forms the aminoacyl tRNA, is driven forward by the cleavage of ATP, linking the energy currency directly to the information transfer process.

Now we move to section 6, the final translation.

This is the genetic code, the set of rules that relates the sequence of bases in the RNA transcript to the sequence of amino acids in the functional protein.

Yes, and the structure of the code was worked out by Nirenberg, Khorana, Crick, and Brenner.

By 1961, they had established five non-negotiable characteristics of the code.

Number one, it is a triplet code.

Why three nucleotides, a codon, for one amino acid?

It's pure mathematics.

Since you only have four bases, A, U, G, C, but you need to encode 20 different amino acids, a doublet code, four squared, only gives you 16 combinations.

That's not enough.

A triplet code, four cubed, gives you 64 combinations, which is the smallest possible unit that provides enough variability.
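The counting argument is easy to check directly. A minimal Python sketch, with hypothetical variable names, that finds the shortest codon length able to cover 20 amino acids with a four-letter alphabet:

```python
BASES = 4          # A, U, G, C
AMINO_ACIDS = 20

# Number of distinct codons for each candidate codon length.
for length in (1, 2, 3):
    combos = BASES ** length
    print(length, combos, "enough" if combos >= AMINO_ACIDS else "too few")

# Smallest codon length whose combinations cover all 20 amino acids.
smallest = next(n for n in range(1, 10) if BASES ** n >= AMINO_ACIDS)
```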

Number two, the code is not overlapping.

Right.

Meaning that the reading frame dictates that bases are read in sequential groups of three.

If the sequence is ABCDEF, ABC specifies the first amino acid and DEF specifies the second.

They don't share bases.

It's not BCD.

Exactly, not BCD.

Number three, there is no punctuation.

It's read continuously, sequentially, from a fixed starting point without any intervening bases or commas to signal separation.
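Non-overlapping, comma-free reading amounts to slicing the sequence into consecutive groups of three from a fixed start. A short Python sketch, where the `codons` helper is a hypothetical name and the letters are the placeholder sequence used in the discussion:

```python
def codons(seq: str, frame: int = 0) -> list[str]:
    """Split seq into consecutive, non-overlapping triplets from `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

in_frame = codons("ABCDEF")          # reads ABC, then DEF
shifted = codons("ABCDEF", frame=1)  # a different start gives a different frame
print(in_frame, shifted)
```

The starting point alone fixes every later codon, which is why a shifted start changes the entire reading.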

Number four, it is nearly universal.

This is a truly profound observation.

The code is essentially the same in all organisms studied, from bacteria to humans.

This universality is why we can successfully clone and express human genes, like the gene for insulin, inside a bacterial host cell.

And number five, it is degenerate.

With 64 triplets, but only 20 amino acids to encode, most amino acids are specified by more than one codon.

61 of the 64 possible triplets specify amino acids, and the remaining three serve as stop signals.

Let's analyze the structure of this redundancy.

Only two amino acids, tryptophan and methionine, have a single triplet.

But others, like leucine, arginine, and serine, are highly degenerate, specified by six codons each.

Codons that specify the same amino acid are called synonyms.

For example, CAU and CAC are both synonyms for histidine.

And if you look at the genetic code table, a clear pattern emerges.

Synonyms often differ only in the third base of the triplet.

The wobble position.

That's right.

XYC and XYU almost always encode the same amino acid, and XYG and XYA usually do as well.
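These counts can be checked against the standard codon table. The sketch below builds the table from the usual 64-letter amino acid string in U, C, A, G order (one-letter codes, `*` for stop); the variable names are illustrative assumptions, not part of the source material.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons specify each amino acid (stops excluded).
synonyms = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sense_codons = sum(synonyms.values())
stop_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]
single_codon = sorted(aa for aa, n in synonyms.items() if n == 1)
six_codons = sorted(aa for aa, n in synonyms.items() if n == 6)

# Third-base pattern: how often XYC/XYU agree, and how often XYA/XYG agree.
pairs = ["".join(xy) for xy in product(BASES, repeat=2)]
cu_same = sum(CODON_TABLE[xy + "C"] == CODON_TABLE[xy + "U"] for xy in pairs)
ag_same = sum(CODON_TABLE[xy + "A"] == CODON_TABLE[xy + "G"] for xy in pairs)
print(sense_codons, stop_codons, single_codon, six_codons, cu_same, ag_same)
```

In the standard table the two XYA/XYG exceptions are AUA versus AUG and UGA versus UGG.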

What is the immense biological benefit of this degeneracy?

Why did evolution select for a code that has so much redundancy?

It's an internal system of molecular error correction.

It maximizes genetic robustness.

If the code were not degenerate, only 20 codons would specify amino acids and the other 44 of the 64 would signal chain termination, so a single nucleotide change, a point mutation, would often result in immediate chain termination.

But with degeneracy.

Because the code is degenerate, many single nucleotide changes result either in a synonymous codon, which means no change to the protein sequence, or they change the amino acid to one with very similar chemical properties, a conservative mutation.

It buffers the genome against the harmful effects of random mutations.
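One way to see this buffering is to enumerate every single-base change to every sense codon in the standard table and sort the results into synonymous, missense, and nonsense. This is a rough sketch with hypothetical names; it treats all substitutions as equally likely and does not try to judge which missense changes are conservative.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_point_mutations(table: dict[str, str]) -> dict[str, int]:
    """Classify every single-base substitution of every sense codon."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in table.items():
        if aa == "*":
            continue  # only mutate codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = table[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts

counts = classify_point_mutations(CODON_TABLE)
total = sum(counts.values())
print(counts, total)
```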

Translation occurs on ribosomes.

How does the system ensure the ribosome starts reading the non-overlapping triplet code at precisely the right location on the mRNA?

It requires a highly specific initiation signal.

In prokaryotes, polypeptide synthesis begins with formylmethionine, or fMet, which is carried by an initiator tRNA that recognizes the start codon AUG.

But AUG also codes for internal methionine residues.

It does.

So the key to initiation is the purine-rich Shine-Dalgarno sequence.

Tell us more about the Shine-Dalgarno sequence and its interaction.

It's typically located several nucleotides upstream of the initiating AUG codon.

This sequence is rich in purines, and it base pairs directly with a complementary sequence located within the ribosomal RNA molecule itself, specifically the 16S rRNA component.

So it's an RNA-RNA interaction.

Yes.

This base pairing interaction precisely locks the ribosome onto the correct starting AUG, which then establishes the crucial reading frame, the correct grouping of three non-overlapping nucleotides for the rest of the chain.

So it's the rRNA component of the ribosome acting as the final checkpoint for translation initiation.

In eukaryotes, is the initiation signal simpler?

Generally, yes.

Eukaryotes don't rely on the Shine-Dalgarno sequence.

They use the 5' cap on the mRNA to recruit the ribosome, which then scans the mRNA until it finds the AUG closest to the 5' end.

That usually acts as the start signal.

Once the first AUG is located, the reading frame is fixed.

And for termination, the stop codons.

The three stop codons are UAA, UAG, and UGA.

They're unique because they are not recognized by any tRNA molecule.

Instead, they're recognized by specific proteins called release factors.

When a release factor binds to a stop codon in the ribosome, it triggers the release of the newly synthesized polypeptide chain, ending synthesis.
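Putting start and stop signals together, a very simplified translation routine might look like the Python sketch below. It follows the eukaryote-style rule of starting at the first AUG from the 5' end and reading triplets until a stop codon. The function name and the example mRNA are hypothetical, and the sketch ignores the cap, Shine-Dalgarno pairing, fMet, and release factors entirely.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # UAA, UAG, or UGA ends synthesis
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: 5' leader GG, then AUG CAU CAC UGG UAA, then GC.
peptide = translate("GGAUGCAUCACUGGUAAGC")
print(peptide)
```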

We need to acknowledge the caveat that the genetic code is only nearly universal.

Where are the exceptions found?

They're rare but important.

For example, some ciliated protozoa read UAA and UAG as amino acids, instead of stop codons, leaving UGA as their only termination signal.

But the most common and significant exceptions occur in mitochondria of various species, including our own.

Why do mitochondria have variations?

Mitochondrial DNA encodes its own distinct, smaller set of transfer RNAs that recognize alternative codons.

So for example, in human mitochondria, the codon UGA, which means stop in the universal code, actually codes for tryptophan.

AUA codes for methionine instead of isoleucine, and AGA codes for stop instead of arginine.
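In code, a variant genetic code is just the standard table with a few entries overridden. The sketch below applies only the three human mitochondrial reassignments named here; the variable names are hypothetical, and it is not a complete mitochondrial table.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the three reassignments mentioned above; other differences, if any,
# are deliberately left out of this sketch.
HUMAN_MITO = dict(STANDARD)
HUMAN_MITO.update({"UGA": "W", "AUA": "M", "AGA": "*"})

changed = sorted(c for c in STANDARD if STANDARD[c] != HUMAN_MITO[c])
for codon in changed:
    print(codon, STANDARD[codon], "->", HUMAN_MITO[codon])
```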

So these are localized evolutionary adaptations.

Exactly, adaptations in the genetic machinery.

Our final section, section 7, deals with the discovery made in 1977 that completely overturned our understanding of eukaryotic genes.

The fact that they are structured as mosaics of introns and exons.

For decades it was just assumed that genes in higher organisms were continuous stretches of coding information, just like in bacteria.

What was the experimental evidence that shattered that assumption?

The key evidence came from electron microscopic studies.

Researchers created hybrids between mature mRNA and the corresponding genomic DNA.

When they viewed these hybrids under the electron microscope, they saw segments of the genomic DNA that did not pair with the mRNA.

So they just looped out.

They looped out from the hybrid structure, indicating they were present in the DNA, but completely absent in the mature messenger RNA.

Let's nail the terminology for these two types of sequences.

The non-coding segments, the ones that intervene within the gene sequence, are called introns, for intervening sequences.

The coding sequences, the segments that are actually expressed and appear in the mature mRNA, are called exons, for expressed sequences.

And the scale of this mosaic structure is vast, even for relatively simple genes.

Oh yeah.

Consider the beta-globin gene, which encodes a component of hemoglobin.

It's split into three coding sequences, three exons, by two large introns, one spanning 550 base pairs and the other 120.

In the human genome, the average gene contains eight introns, and some complex genes can have over 100.

So the initial transcription product must be huge.

It is.

The newly synthesized RNA is called the primary transcript, or pre-mRNA, and this transcript is much larger than the final mature mRNA, because it contains both the introns and the exons.

For that beta-globin gene, the primary transcript is 1600 nucleotides long, but the mature mRNA that goes to the ribosome is only 900 nucleotides.

The complex operation to cut out those introns and stitch the exons back together is called splicing.

How does the cell ensure this is carried out with absolute precision?

Splicing is an immensely complex and precise operation.

It's carried out by these large molecular machines called spliceosomes, which are assemblies of proteins and several small RNA molecules known as snRNAs, for small nuclear RNAs.

And here again, the RNA is doing the work.

The vital insight here is that the RNA components within the spliceosome, the snRNAs, are actually the molecules that perform the catalytic steps required for intron removal.

And how does this machinery know where to cut?

Are there specific recognition signals?

Yes, and they have to be precise.

Introns must be cut out with zero error.

Otherwise, the reading frame of the subsequent exons would be completely destroyed.

Introns almost always begin with the consensus sequence GU and end with the consensus sequence AG.

The AG at the 3' end of the intron is also preceded by a pyrimidine-rich tract, which serves as a recognition signal for the spliceosome to bind and begin the excision process.
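To make the precision requirement concrete, here is a toy Python splicer. It removes introns at given coordinates and refuses to cut unless each intron begins with GU and ends with AG. Everything here is a hypothetical illustration: the function name, the sequences, and the idea of passing coordinates in by hand. Real splice-site recognition involves much more than these two dinucleotides.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as sorted (start, end) half-open intervals."""
    exons = []
    position = 0
    for start, end in introns:
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])   # final exon after the last intron
    return "".join(exons)

# Hypothetical two-exon transcript with one pyrimidine-rich intron.
exon1, intron, exon2 = "AUGGCU", "GUAAGUCCUUUCAG", "GAAUAA"
pre_mrna = exon1 + intron + exon2
boundaries = [(len(exon1), len(exon1) + len(intron))]
mature = splice(pre_mrna, boundaries)
print(mature)
```

Shifting either boundary by one base either fails the GU...AG check or leaves a stray base behind, which would shift the reading frame of everything downstream.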

The streamlined prokaryotes lost these introns.

Why do they remain in eukaryotes?

Are they just junk or do they confer some kind of advantage?

It's hypothesized that introns were present in the earliest ancestral genes and were likely lost in fast-growing organisms like bacteria to create a more efficient streamlined genome.

But their retention in higher organisms provides two immense evolutionary advantages.

Let's start with the first one, exon shuffling.

Right.

Exons often encode discrete structural and functional units of proteins, what we call molecular domains.

Because the introns allow the gene structure to be easily broken up, these exons can be shuffled or rearranged during evolution, mixing and matching domains from different genes.

So you can build new proteins from old parts.

Exactly.

It allows new proteins with novel combinations of functional units to arise rapidly and efficiently, speeding up the pace of evolution.

Do we have a good example of a protein that arose through this exon shuffling?

Yes.

The tissue plasminogen activator or TPA gene is a classic example.

TPA is an enzyme involved in dissolving blood clots.

An analysis shows its gene structure resulted from fusing exons that encode domains borrowed from other existing proteins.

Like what?

An F domain from fibronectin, an EGF domain from epidermal growth factor, and a K domain from plasminogen.

Evolution didn't have to invent these domains from scratch.

It simply rearranged existing proven functional modules.

That's a powerful evolutionary mechanism.

And the second major advantage is alternative splicing.

Alternative splicing vastly increases the coding capacity of the genome.

A single primary transcript can be spliced in different ways, resulting in a series of related but functionally distinct proteins from one gene.

Can you give us an example of how one transcript can yield two completely different protein functions?

The regulation of antibody expression is a perfect case study.

A precursor B cell uses one splicing pattern for its primary antibody transcript.

This splicing includes an exon that encodes a hydrophobic membrane anchoring domain, which results in an antibody that stays attached to the cell surface.

Okay, so it's a receptor.

It acts as a receptor.

But when that B cell is activated, it switches to an alternative splicing pattern that excludes that specific membrane anchoring exon.

This new mRNA transcript now codes for a soluble antibody that is secreted into the bloodstream rather than being bound to the cell.

One gene, two radically different functional outcomes, based purely on how the cell splices the RNA.

That was an incredible journey.

Moving from the singular building block all the way to the complex modularity of eukaryotic genes, let's quickly consolidate the key molecular takeaways from this foundational deep dive.

First, the importance of structure.

The absence of the 2' OH group optimizes DNA for stable, long-term archival storage, reinforced by its negative charge, which dictates packaging with histones.

The double helix, stabilized by all those non -covalent forces, is the perfect template.

Second, the fidelity mechanisms.

The double helix structure enables that elegant, precise, semi-conservative replication, proven definitively by Meselson and Stahl.

We also learned that the polymerases, while sharing the 5' to 3' synthesis chemistry, are differentiated by their need for a primer, in the case of DNA polymerase, and their dedication to proofreading, ensuring maximum fidelity for the permanent genome.

Third, the versatility of RNA and the code.

Gene expression relies on highly specialized RNA molecules: rRNA as the key peptide bond catalyst, and tRNA as that crucial chemical-to-information adapter.

The genetic code itself is highly optimized through degeneracy, providing a robust molecular defense against common mutations.

And finally, the efficiency of higher life.

The mosaic structure of eukaryotic genes, with their introns and exons, provides the evolutionary flexibility necessary for complex life, via rapid exon shuffling, and the capacity for a single gene to encode multiple protein variants through alternative splicing.

And you know, we noted earlier that the ribosomal RNA, the rRNA, is the actual catalyst for protein synthesis, and that all RNA, unlike DNA, is capable of complex folding and structure formation.

So given that RNA molecules are capable of complex structure, information storage, as we see in viral genomes, and key catalytic roles in the ribosome, it raises an important question for you, the listener, to ponder.

Do the specialized roles of DNA for archival storage and proteins for efficient catalysis represent the original state of life?

Or did life perhaps start in an RNA world, where RNA molecules alone performed both the information storage and the catalytic functions before these more specialized molecules evolved?

A truly profound thought to mull over how chemistry and structure dictate, not just function, but perhaps the origin and complexity of life itself.

Thank you for joining us on this deep dive into the foundations of molecular biochemistry.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Nucleic acids serve as the molecular repositories and messengers of genetic information, with their structure fundamentally determining their biological roles. DNA and RNA are polymeric chains constructed from nucleotides, each unit containing a nitrogenous base, a five-carbon sugar, and a phosphate group linked through phosphodiester bonds to create a directional backbone with distinct 5' and 3' ends. The iconic double helix configuration of DNA emerges from two antiparallel strands held together by hydrogen bonding between complementary base pairs and reinforced by van der Waals stacking interactions between adjacent bases. Beyond the standard B-form DNA, alternative conformations including A-DNA and the left-handed Z-DNA exist under different conditions, while single-stranded nucleic acids fold into functional stem-loop and hairpin structures. Prokaryotic chromosomes exhibit topological supercoiling to compact their genetic material, a property revealed through density gradient centrifugation and other analytical techniques. The perpetuation of genetic information occurs through semiconservative replication, where each strand of the parental DNA serves as a template for a new complementary strand, a mechanism confirmed elegantly by the Meselson-Stahl experiment using isotopic labeling. DNA polymerases catalyze phosphodiester bond formation in the 5' to 3' direction and require a primer to initiate synthesis, while reverse transcriptase, an enzyme present in certain viruses, performs the reverse reaction by synthesizing DNA from an RNA template. The expression of genetic information follows the central dogma, initiated by RNA polymerase binding to promoter regions containing conserved sequences such as the Pribnow box in prokaryotes or the TATA box in eukaryotes, resulting in the synthesis of messenger RNA, transfer RNA, and ribosomal RNA. 
The genetic code, read as consecutive triplet codons, exhibits degeneracy where multiple codons specify the same amino acid, ensuring robustness against mutations. Eukaryotic genes possess a segmented architecture with coding exons interspersed among non-coding introns, necessitating RNA processing through spliceosomes that remove introns and join exons. This modular organization enables alternative splicing pathways and exon shuffling during evolution, generating proteomic diversity from a limited number of genes.
