Chapter 29: RNA Synthesis & Processing

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the deep dive.

If you think of DNA as, you know, the master blueprint, the thing that's perfectly archived and kept stable, then RNA has to be the dynamic workflow.

It's the set of immediate instructions that really directs the moment-to-moment business of life.

And that's exactly what we're tackling today.

We have a massive stack of research all focused on one core idea.

How does genetic information get from that stable storage to, well, to active expression?

This is the deep dive into RNA synthesis, what we call transcription, and then, and this is so crucial, the just astonishing molecular modifications that happen afterward.

It's funny, for decades, the focus was so much on DNA replication.

Like, once you have the copy, the hard work is done.

Right, but the real, you know, the structural and regulatory complexity of life, that all begins the moment that first RNA transcript starts to emerge from the polymerase.

Okay, let's unpack this a bit.

We're focusing on the central machinery of expression.

When you look at the human genome, you see we only have about, what, 21,000 protein-coding genes?

Yeah, that number always seems surprisingly small to people.

It feels tiny, and it makes you wonder, where does all the complexity come from?

It's the processing.

It's not the count of the genes, it's the processing that holds the secret.

The initial RNA transcript, what we call pre-mRNA, is often littered with these non-coding segments, introns.

And those have to be removed with incredible precision.

Through a process called splicing, yeah.

And the revelation that really, I mean, truly changed our view of the genome is that the splicing isn't a fixed one-time event.

It's flexible.

It's incredibly flexible.

There's a process called alternative splicing, and that allows different cells, or even the same cell at a different time, to stitch together different combinations of the coding segments, the exons.

So one gene doesn't equal one protein anymore.

Not at all.

It means one gene can encode multiple distinct mature mRNAs, and from that, a huge repertoire of unique proteins.
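
To make that combinatorial point concrete, here is a minimal Python sketch. It assumes a deliberately simplified toy model that is not from the lecture: the first and last exons are always kept, exon order never changes, and each internal "cassette" exon is independently kept or skipped. The function name and exon labels are hypothetical.

```python
from itertools import product


def cassette_isoforms(first_exon, cassette_exons, last_exon):
    """List every exon combination when each cassette exon is kept or skipped.

    Toy model (an assumption for illustration): the first and last exons are
    always included and exon order is never rearranged.
    """
    isoforms = []
    for choices in product((True, False), repeat=len(cassette_exons)):
        kept = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([first_exon, *kept, last_exon])
    return isoforms


# One hypothetical gene with three optional internal exons.
isoforms = cassette_isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(isoforms))  # 2**3 = 8 possible mature mRNAs
```

Real splicing is regulated and far from this free-for-all, but the count grows as 2 to the power of the number of optional exons, which is why a modest gene count can still yield a large protein repertoire.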

Our real genomic potential is kind of hidden in the massive enzyme machines, the RNA polymerases, that physically read the DNA.

We need to understand how they're regulated, especially the profound difference between simple prokaryotes and complex eukaryotes.

And then we'll dedicate a good chunk of time to those structural RNA modifications.

We're talking capping, polyadenylation, the wild flexibility of editing, and the physics of the spliceosome itself.

We'll start with the universal chemistry that's shared across all life, then we'll move to the multilayered regulation in eukaryotes, and we'll finish up with the groundbreaking idea that RNA itself can be an enzyme.

A molecular discovery that literally forced us to rewrite the history of life.

Okay, let's start at the very beginning with the engine of expression, RNA polymerases.

What strikes me immediately is the deep, deep evolutionary connection here.

It's really remarkable.

I mean, you have this massive gulf between a simple bacterium and, say, a human cell, but the core machinery looks fundamentally the same.

It's a fantastic example of what we'd call a shared evolutionary origin.

If you actually look at the three-dimensional structures of RNA polymerases, say, from a simple prokaryote like Thermus aquaticus.

The one from hot springs.

The very same.

And you compare it to a complex eukaryote like yeast, Saccharomyces cerevisiae.

The structural similarities are just undeniable.

So what are those shared features?

Well, both of them have this massive central cleft where the DNA template is fed in.

And critically, both rely on a central metal ion, usually magnesium, that's located precisely in the active site.

That shared structure, it suggests that this core catalytic function is not only essential, but maybe it's just the most efficient way to get the job done.

It seems so.

The basic chemistry that happens inside that cleft has been conserved for billions of years.

So describe that for us.

What is the fundamental reaction?

The basic reaction is just forming a phosphodiester bond.

So you can imagine the growing RNA chain, it's sitting inside the enzyme.

Right.

The three prime hydroxyl group of the very last nucleotide that was added performs a nucleophilic attack on the alpha-phosphoryl group of the incoming ribonucleoside triphosphate, the NTP.

So the new NTP is coming in with its own energy baggage, right?

It's got three phosphates.

How does the enzyme manage that energy so efficiently and make sure the reaction only goes one way?

And that is where the elegant energy management comes in.

So that nucleophilic attack, it cleaves the NTP and it releases a pyrophosphate molecule, PPi.

Which is two phosphates linked together.

Exactly.

Now that initial bond formation is already thermodynamically favorable.

But what truly locks the entire process into forward motion, what prevents it from just reversing.

What's the lock?

It's the immediate subsequent degradation of that pyrophosphate.

There are enzymes in the cell that just instantly hydrolyze that PPi into two separate orthophosphate molecules.

Two Pi.

Ah.

So breaking down that waste product is what provides this huge drop in free energy.

That's what makes the whole thing essentially irreversible.

It's a beautiful system.

The synthesis drives the reaction and then the breakdown of the byproduct makes sure it stays driven forward.

It's so clever.

And you mentioned those magnesium ions are critical for this.

Absolutely critical.

You usually have two of them.

One magnesium ion is held really tightly by the enzyme and it helps to orient that three prime hydroxyl group for the attack.

It lines up the shot.

And the second one comes in with the new NTP and helps shield the negative charges on the triphosphate group.

It just helps facilitate that transition state for the bond to form.

So we have the machinery.

Now let's look at the actual stages, the synthesis and elongation.

What are the key operational differences between making an RNA chain versus, say, replicating a DNA chain?

Well, RNA synthesis still has the familiar stages, you know, initiation, elongation, termination.

But the critical difference is that RNA polymerase can start de novo.

It doesn't need a primer.

No primer required, unlike DNA polymerase, which is completely dependent on a primer.

And how do we know that for sure?

What's the evidence?

We know because the very first base that gets added to an RNA chain, it actually keeps its triphosphate tag.

The five prime end of a brand new RNA transcript is always either pppG or pppA.

So that three phosphate tag is like a little "I was here first" sign.

Exactly.

And it confirms that growth starts at the five prime end and moves to the three prime end, all without needing a separate enzyme to build a primer.

And once this process gets going, the enzyme is committed.

It doesn't just let go.

We call that characteristic processivity.

RNA polymerase is fully processive.

Once it initiates transcription, a single enzyme molecule is going to synthesize the entire transcript, which could be thousands of nucleotides long.

Start to finish.

Start to finish without falling off the DNA template.

Okay.

To maintain that kind of processivity, it has to constantly manage the DNA.

The sources describe this thing called the transcription bubble.

Can you visualize that for us?

So the polymerase acts kind of like a zipper.

It locally unwinds the DNA double helix to expose the template strand.

This unwound region, the bubble, it spans about 17 base pairs.

And inside that bubble, the new RNA gets made.

Right.

And crucially, the newly synthesized RNA chain forms a temporary, a transient hybrid helix with the DNA template strand.

But this little hybrid is short.

How short?

Typically only about eight base pairs long.

That's just under one full turn of the helix.

So it's held in place just long enough to make sure it's accurate, but not so long that it causes a traffic jam.

Precisely.

As the polymerase chugs along, it has structures inside it that physically force this little RNA-DNA hybrid apart.

The new RNA strand exits through a channel, and the two DNA strands immediately zip back up behind it.

And that forward movement bringing in the next base, that requires translocation.

Right.

After you add a nucleotide, the whole RNA-DNA hybrid complex has to effectively ratchet forward relative to the stationary active site of the enzyme.

Which puts the three prime end in the perfect spot for the next one.

Exactly.

It moves it into what's called a pre-insertion site, ready for the next NTP to come in and attack.

And even though you can get little reversals, the overall powerful free energy change from breaking down that PPi ensures the reaction just plows forward, usually at about 50 nucleotides per second.

50 nucleotides per second sounds fast, but when you compare it to DNA replication, where polymerases are racing along at like 800 nucleotides a second, it's quite a bit slower.

It is.

And that brings us to proofreading, which is also, let's say, less rigorous than in DNA replication.

Why is the cell okay with a higher error rate in RNA?

That's a really critical question.

The error rate for RNA polymerase is something like one mistake in every 10,000 or 100,000 nucleotides.

Which is way higher than for DNA.

Way higher.

But the cell accepts this trade-off for speed and energy conservation.

The fundamental reason is that mistakes in RNA are not inherited.

They're not passed on to progeny.

They are transient errors in a disposable molecular message.

But wait, one mistake in 10,000 still seems like it could generate a lot of defective enzymes, especially for really long genes.

You would think so, but that risk is mitigated.

Because for most genes, the cell is making many, many copies of the transcript.

So even if a few defective messenger RNAs get made, the overall supply of functional protein is usually just fine.
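
A quick back-of-the-envelope calculation shows what those error rates mean per transcript. The sketch below assumes, purely for illustration, that misincorporations are independent and occur at a fixed per-nucleotide rate; the function name and the example transcript lengths are hypothetical.

```python
def error_free_fraction(length_nt: int, error_rate: float) -> float:
    """Fraction of transcripts of a given length expected to carry no error.

    Assumes independent errors at a constant per-nucleotide rate
    (a simplification, not a measured model).
    """
    return (1.0 - error_rate) ** length_nt


# Error rates quoted in the discussion: 1 in 10,000 and 1 in 100,000.
for rate in (1e-4, 1e-5):
    for length in (1_000, 10_000):
        fraction = error_free_fraction(length, rate)
        print(f"rate={rate:g}, length={length} nt: {fraction:.1%} error-free")
```

Under these assumptions a 1,000-nucleotide message is error-free roughly 90% of the time even at the higher error rate, and because many copies are made, plenty of correct transcripts are available for translation.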

So the cell is all about hyperfidelity for the blueprint, the DNA.

But it optimizes for speed and resource allocation for the working copies, the RNA.

That's a perfect way to put it.

So how does RNA polymerase actually correct the errors it does manage to catch?

It uses a mechanism called backtracking.

The polymerase can actually move backward along the DNA-RNA hybrid, which is a reversal of its normal direction.

And what triggers that?

The error itself.

If you incorporate an incorrect nucleotide, a non-Watson-Crick base pair, you create a weak spot in that little hybrid helix.

Because it's less stable, the energetic cost of breaking it is lower, which makes the polymerase much more likely to reverse or backtrack right past the point of the error.

So the mistake itself is the signal for the correction?

Right, precisely.

Once the enzyme has backtracked, that segment with the mismatched base gets repositioned near the active site.

Then the enzyme, sometimes with help from other factors, can perform a hydrolysis reaction.

A water molecule comes in, attacks the phosphodiester bond, and you cleave off a little dinucleotide that includes the error.

Wipes the slate clean.

And clears the site for the next, hopefully correct, NTP to come in.

All right, let's transition now from that general mechanism to the very specific world of prokaryotic initiation, which is all defined by the famous sigma subunit.

Right, so the bacterial transcription engine, the RNA polymerase, really exists in two forms.

You have the core enzyme, which is just the alpha, beta, beta prime, and omega subunits.

And that can make RNA, but it's kind of dumb.

It's dumb, exactly.

It lacks the ability to find specific start sites.

To do that, you need the holoenzyme.

That's the core enzyme plus the sigma subunit, which is like the essential GPS that guides the polymerase to the right promoter sites on the DNA.

So how does sigma recognize where to land?

What's the address it's looking for?

It recognizes core promoter features.

Specifically, two consensus sequences that are upstream of the transcription start site, the plus one site.

And these are the famous boxes?

These are the famous boxes.

You have the minus 10 sequence, which is TATAAT, and the minus 35 sequence, which is TTGACA.

And the strength of a promoter, meaning how often a gene gets transcribed, is directly related to those sequences.

Directly.

It's proportional to how closely those two sequences match the consensus and how optimal the spacing is between them, which is typically about 17 nucleotides.

So if you have a promoter that's a perfect match, you get a strong promoter.

Exactly.

Strong promoters in bacteria can fire off a new transcript as often as every two seconds.

But a weak promoter, which might only initiate once every 10 minutes, will have a bunch of deviations from the consensus or non-optimal spacing.

And a single mutation can kill it.

We know that mutating even a single base in either the minus 10 or minus 35 region can just drastically reduce or even abolish promoter activity completely.
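
The idea that promoter strength tracks similarity to the consensus can be sketched in a few lines of Python. This is only a toy scoring scheme, not a published model: it counts matches to the standard consensus hexamers TTGACA (minus 35) and TATAAT (minus 10) and subtracts one point per nucleotide of deviation from the 17-nucleotide spacing. The function names and the penalty weight are assumptions made for illustration.

```python
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"
OPTIMAL_SPACING = 17  # nucleotides between the two hexamers


def consensus_matches(site: str, consensus: str) -> int:
    """Count positions where the site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def promoter_score(minus_35: str, minus_10: str, spacing: int) -> int:
    """Toy promoter score: hexamer matches minus a spacing penalty.

    The one-point-per-nucleotide penalty is an arbitrary illustrative choice.
    """
    matches = (consensus_matches(minus_35, MINUS_35_CONSENSUS)
               + consensus_matches(minus_10, MINUS_10_CONSENSUS))
    return matches - abs(spacing - OPTIMAL_SPACING)


print(promoter_score("TTGACA", "TATAAT", 17))  # perfect consensus promoter
print(promoter_score("TTTACA", "TATGAT", 16))  # two mismatches, spacing off by one
```

A higher score here stands in for a stronger promoter. Real promoter strength depends on which positions are mutated, not just how many, so the equal weighting is a simplification.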

So Sigma lands the polymerase on the promoter.

What's its role in actually starting the synthesis?

Sigma is essential for making the transition from what we call the closed complex, where the DNA is still double-stranded.

To the open complex.

To the open complex, where the DNA is unwound and stabilized inside the polymerase cleft.

Then once the new RNA chain gets to be about 9 or 10 nucleotides long, the Sigma subunit usually undergoes a conformational change and just dissociates from the core enzyme.

And that frees it up.

So Sigma becomes catalytic in a way.

It can go find another core enzyme and start another round of transcription.

It's a very efficient recycling system.

And what's even cooler is that bacteria have evolved specialized Sigma factors to adapt to different environmental cues.

Like a stress response.

A perfect example.

E. coli uses Sigma 70 as its standard everyday factor.

But if the cell is suddenly hit with stress, like high temperature, it starts making an alternative factor, Sigma 32.

And Sigma 32 recognizes a whole different set of promoters.

A completely different set of minus 10 regions.

And this allows the cell to instantly shift its entire gene expression program over to making the heat shock genes that it needs to survive.

That's an incredibly rapid, built-in regulatory response.

And some bacteria take this to an extreme.

Oh yeah.

You look at something like Streptomyces coelicolor.

A soil bacterium.

Its environment is constantly changing.

Nutrients vanish, temperatures swing.

Other bacteria compete with it.

It's a tough life.

It is.

And to manage that massive adaptive burden, Streptomyces encodes over 60 distinct Sigma factors.

This huge repertoire gives it the fine-grained control it needs to survive in a really dynamic and challenging ecosystem.

Okay.

We've seen how transcription starts.

Now, how does the cell know where the stop sign is in this prokaryotic system?

There are two main ways to terminate.

The first one, and the simplest, is called rho-independent termination, or intrinsic termination.

The signal is entirely encoded in the DNA sequence.

So what happens when that sequence gets transcribed?

The resulting RNA transcript immediately folds up on itself into a self-complementary, very stable, GC-rich hairpin structure.

And the GC richness is key because it makes it super stable.

Three hydrogen bonds, exactly.

It's a very robust stem-loop structure.

The formation of this hairpin creates a physical obstacle.

And it causes the RNA polymerase to just pause.

It hits the brakes.

And what follows immediately after that hairpin sequence in the transcript is a run of four or more uracil residues.

The U-A pairing, the weakest of all the base pairs.

That's the Achilles heel.

So while the polymerase is paused by the super stable hairpin, those weak rU-dA base pairs in that little RNA-DNA hybrid are just not strong enough to hold the new RNA to the template.

It just peels off.

The RNA spontaneously dissociates.

The DNA zips back up.

And transcription terminates.

Simple as that.
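
The two ingredients of an intrinsic terminator, a GC-rich hairpin followed by a run of U's, can be turned into a simple sequence scan. The Python sketch below is a rough illustration under stated assumptions: it only accepts perfectly complementary stems of a fixed length, and the stem length, loop sizes, and GC threshold are arbitrary defaults chosen for the example, not values from the lecture. Real terminator prediction uses thermodynamic folding rather than exact matching.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def find_intrinsic_terminators(rna, stem_len=6, loop_lengths=range(3, 9),
                               min_gc=0.6, min_u_run=4):
    """Find (stem_start, loop_length) pairs that look like intrinsic terminators.

    A hit needs a GC-rich stem whose two arms are perfectly complementary,
    separated by a short loop and immediately followed by a run of U's.
    """
    hits = []
    for start in range(len(rna) - stem_len + 1):
        left = rna[start:start + stem_len]
        gc_fraction = (left.count("G") + left.count("C")) / stem_len
        if gc_fraction < min_gc:
            continue
        for loop_len in loop_lengths:
            right_start = start + stem_len + loop_len
            right_end = right_start + stem_len
            right = rna[right_start:right_end]
            tail = rna[right_end:right_end + min_u_run]
            if right == reverse_complement(left) and tail == "U" * min_u_run:
                hits.append((start, loop_len))
    return hits


# Hypothetical transcript: stem GCCGCC, loop UUCG, stem GGCGGC, then six U's.
transcript = "CC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU" + "AA"
print(find_intrinsic_terminators(transcript))
```

On this made-up transcript the scan reports a single hairpin starting at index 2 with a four-nucleotide loop. Replacing the U run with A's removes the hit, mirroring the point that the weak rU-dA stretch is what lets the RNA peel away.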

OK.

But the second method needs help.

It relies on an external protein, the Rho protein.

Right.

Rho-dependent termination requires this protein, which is an ATP-dependent RNA-DNA helicase.

So it uses energy.

It does.

Rho binds to specific regions on the new RNA, usually sequences that are rich in cytosine and poor in guanine.

And it uses the energy from ATP hydrolysis to literally chase the RNA polymerase down the transcript.

It's a molecular chase.

It is.

And when the polymerase pauses, maybe because of a tricky sequence or a little secondary structure, Rho catches up.

And then what?

Once it reaches the transcription bubble, its helicase activity goes to work.

It just rips apart the short RNA-DNA hybrid, breaking those hydrogen bonds, which forces termination and releases the RNA transcript.

Now, before we cross over into the eukaryotic world, we have to pause for one of the most stunning regulatory discoveries of all.

The riboswitches.

This is where the RNA transcript itself becomes the sensor.

Riboswitches are a total paradigm shift in how we think about regulation.

They're segments of the nascent mRNA transcript that fold into these complex structures that are capable of binding small metabolites directly.

So they can sense the cellular environment without needing any regulatory proteins?

None at all.

It's an immediate response.

So give us the classic example, the one involving riboflavin.

How does the presence of a metabolite directly flip a switch?

Okay, so you look at the genes for riboflavin biosynthesis in, say, Bacillus subtilis.

The key metabolite that the riboswitch senses is flavin mononucleotide, FMN.

A product of that pathway.

Right.

So if FMN levels in the cell are high, the metabolite will actually bind to a specific pocket in the folded RNA transcript.

And this binding stabilizes a structure that includes the formation of a terminator hairpin.

Just like the intrinsic terminator we just talked about.

Exactly like it.

So if that terminator hairpin forms, transcription stops prematurely and you don't make the enzymes needed to synthesize more FMN.

It's a perfect negative feedback loop built right into the RNA.

It's an immediate, incredibly elegant feedback loop.

If FMN is low, it doesn't bind.

An alternative non-terminator structure forms and transcription goes all the way to the end to make the full-length mRNA.

The cell has basically deployed a sophisticated RNA structure as a metabolic rheostat.

Wow.

Okay, now moving from prokaryotes to eukaryotes.

This feels like going from a studio apartment to a sprawling mansion with multiple floors and complex security systems.

That's a great way to put it.

The first and most profound difference is structural.

It's the nuclear membrane.

Right.

Transcription happens in the nucleus.

Translation is exiled to the cytoplasm.

And that spatial and temporal separation is what enables this massive increase in regulatory complexity and all the post-transcriptional processing we're about to talk about.

It's just not possible in bacteria where the two processes are coupled.

And that regulatory complexity is obvious right away in the control elements.

We're moving way past the simple minus 10 and minus 35 boxes.

Oh, yeah.

We're in a new world.

Eukaryotes use multiple types of promoter regions and they also rely on these incredibly far-reaching regulatory sequences called enhancers.

And an enhancer can be thousands of base pairs away from the gene it's controlling.

Thousands.

It can be upstream, downstream, even in the middle of an intron, and it can still exert a powerful influence.

This is the kind of complexity you need for the cell-specific, time-specific expression that a multicellular organism requires.

And that complexity starts with the machinery itself.

We don't just have one RNA polymerase.

We have three.

That's right.

And they're functionally segregated.

You've got RNA Pol I, which lives in the nucleolus, and it just transcribes the large ribosomal RNA precursors.

RNA Pol II is the real workhorse.

It's in the nucleoplasm, and it transcribes all the messenger RNA precursors and most of the small nuclear RNAs.

And Pol III.

Pol III is also in the nucleoplasm, and it focuses on the small stuff, like transfer RNAs and the 5S ribosomal RNA.

And this differentiation is so distinct that we can actually use one of the most famous and deadliest molecules in biology to tell them apart.

Alpha-amanitin.

A toxin from the infamous death cap mushroom, Amanita phalloides.

It's amazing that this single chemical, a little cyclic octapeptide, became such a fundamental tool for biochemists.

What does it do?

It binds with incredible affinity, meaning it works at super low concentrations, specifically to RNA Pol II.

And it just locks it down, blocking the elongation phase of transcription.

And it basically ignores the other two.

Pol I is completely resistant to it, which highlights how structurally unique it is.

Pol III is only inhibited at much, much higher concentrations.

This differential sensitivity is what underscores the critical nature of Pol II for transcribing all our protein-coding genes.

So ingesting that mushroom is lethal because it just shuts down all new mRNA production.

Exactly.

In vital organs like the liver, that's a death sentence.

Now, Pol II also has this unique structural tag that sets it apart.

The carboxyl-terminal domain, or CTD.

The CTD is this long, unstructured tail on the largest subunit of Pol II, and it is absolutely crucial for life.

It's made of multiple tandem repeats of the consensus sequence YSPTSPS.

And it acts like a central hub.

It is the central integration hub.

It coordinates the entire transcription and processing lifecycle, and it does that through its phosphorylation state.

We'll come back to that.

Okay, so speaking of initiation, let's look at the DNA sequences that Pol II uses.

What are the common cis-acting promoter elements?

The most famous one is the TATA box.

It's usually located somewhere between minus 30 and minus 100 base pairs upstream of the start site.

It looks a bit like the prokaryotic minus 10 sequence, but it's much further away.

Much further.

And that's necessary to accommodate the massive initiation complex that has to assemble there.

But the TATA box alone doesn't pinpoint the exact start site, does it?

No.

For that, it often needs the initiator element, or Inr, which is right on the start site, from minus three to plus five.

Alternatively, some genes that don't even have a TATA box will use something called the downstream core promoter element, or DPE.

And you also see things like the CAAT box and the GC box.

Right, the GC box is really common in what we call housekeeping genes, the ones that are just always on at a baseline level in most cells.

But here's the absolute key functional difference from prokaryotes.

These sequences are not recognized by the polymerase itself.

That is the critical distinction.

Unlike bacteria, where the sigma factor binds the promoter elements directly, in eukaryotes these cis-acting elements are recognized by this huge army of other proteins called transcription factors.

And they have to assemble first.

They have to assemble first to recruit and activate RNA Pol II.

Okay, let's follow that assembly cascade.

It all starts with the factor that nucleates the entire complex at the TATA box.

That initiating factor is the TATA box binding protein, or TBP.

It's a relatively small protein, but it's part of the much, much larger TFIID complex.

TBP is saddle-shaped, and it binds the TATA box with an extraordinarily tight affinity.

And TBP is famous for what it does to the DNA when it binds.

It doesn't just sit there.

It changes it dramatically.

TBP is not a placeholder.

It's a molecular wedge.

It drives structural changes.

When it binds that AT-rich TATA sequence, it forces the DNA to unwind slightly and bend very sharply.

How does it do that?

The sources describe how four specific phenylalanine residues on the TBP molecule physically slip or intercalate between the base pairs.

They're like little structural stakes being driven into the DNA.

Wow, that sounds pretty violent.

What's the functional point of that dramatic forced bend?

The bend creates an asymmetry.

It specifies a unique unidirectional starting platform.

By widening the minor groove and creating this structural landmark, TBP dictates exactly where the rest of the machinery, all the TFII factors, must assemble.

Ensuring the polymerase only starts in the correct direction.

So you have this whole list of acronyms, TFIIA, B, F, E, H, the basal transcription apparatus.

Which one is the gatekeeper that finally starts the engine?

That would be TFIIH.

Once the whole complex is assembled, TFIIH gets activated.

And it has two absolutely essential jobs.

First, it has helicase activity to fully open up the DNA double helix around the start site.

And second, it has kinase activity to phosphorylate the serine residues on the CTD of RNA Pol II.

And that phosphorylation is the final signal.

It is the green light.

The shift in the CTD's phosphorylation state marks the transition from initiation to elongation.

It allows the polymerase to finally leave the promoter, shed most of those transcription factors, and begin its run down the template.

But even with all of that, the basal transcription complex only gives you a low kind of baseline level of expression.

The core trickles, yeah.

To get those high tissue-specific rates, we need those distant regulatory sequences.

The enhancers.

Enhancers are so fascinating because they just break our normal understanding of proximity control.

These are sequences that can be several thousand base pairs away from a gene's promoter.

And they still work.

And they powerfully stimulate transcription.

They can even operate if you flip them 180 degrees.

It's amazing.

So how does a sequence so far away exert control?

It seems almost like magic until you think about the physical mechanics.

It's all about architecture.

The DNA molecule, even though it's immense, is highly flexible.

The only way an enhancer can interact with a distant promoter is if the DNA in between physically loops out.

Bringing them close together.

Exactly.

It brings the specific regulatory proteins bound to the enhancer into close physical contact with the basal transcription apparatus that's anchored at the promoter.

And those enhancer-bound proteins interact with the factors at the promoter.

They stabilize the complex, and they boost the frequency of initiation.

Right.

And this interaction is highly cell-specific.

For example, an immunoglobulin enhancer is only active in B lymphocytes because only those cells make the specific transcription factors needed to bind that enhancer.

And that's how a single genome can generate such different expression profiles in a liver cell versus a skin cell.

It's all about combinatorial control.

There's a powerful and tragic example of this complex architectural control being broken in Burkitt lymphoma.

It's a very stark reminder of how important this control is.

In Burkitt lymphoma, a specific chromosomal translocation occurs.

This event physically moves the MYC proto-oncogene, a gene vital for cell growth.

It moves it from its normal neighborhood.

Exactly.

From its normal regulatory context.

And it places it directly under the influence of an incredibly powerful immunoglobulin enhancer.

So the MYC gene suddenly gets the signal, start producing vast amounts of protein immediately, but in a cell that shouldn't be growing that fast.

Precisely.

The deregulation of MYC, now being subjected to this inappropriate high-frequency stimulation, is what drives the rapid and uncontrolled cell proliferation that defines the cancer.

It just shows that the DNA sequence is the code.

But the 3D organization is what determines a cell's fate.

That is incredible.

And as a final note here, the similarity of the TBP complex to components found in archaea is just a profound reminder of our molecular past.

It really is.

Archaea are prokaryotic.

They don't have a nucleus.

But their transcription initiation process uses a TATA-binding protein that is structurally very similar to our TBP.

Which suggests this whole complex system didn't just appear out of nowhere.

It strongly suggests that elements of this complex, multi-factor, bending-based transcription control system evolved from an ancestor we share with archaea, even though bacteria went down the much simpler sigma factor route.

Okay.

So the pre-mRNA made by Pol II is just one of many transcripts that need molecular surgery.

Before we get to mRNA, let's briefly look at the Pol I and Pol III products, rRNA and tRNA.

They also need extensive processing.

Right.

Pol I produces that large rRNA precursor, the 45S precursor in mammals, which contains the sequences for the 18S, 5.8S, and 28S rRNAs, all in one long chain.

And you can see this happening under an electron microscope, right?

The famous Christmas tree images.

It's a perfect visual.

The DNA is the trunk and the growing RNA transcripts are the branches, getting progressively longer as the polymerase moves down the DNA trunk.

And on the end of those branches, you see little knobs.

Those terminal knobs, those correspond to the SSU processome, which is this massive ribonucleoprotein complex required for the synthesis and modification of the 18S rRNA.

So what do those modifications entail?

Before it gets cleaved, the precursor undergoes intense chemical modification, lots of methylation, and the conversion of uridine into pseudouridine.

And these changes are guided by small nucleolar ribonucleoproteins, or snoRNPs.

Then, after all that, nucleases come in and precisely cleave the precursor to release the three mature rRNAs.

And the tRNA from Pol III also requires some heavy editing.

tRNA maturation is a multi -step process.

Just like in bacteria, the five prime leader sequence is cleaved off by the famous ribozyme, RNase P, the three prime trailer sequence is removed, and then there's that critical CCA terminus.

The part that holds the amino acid?

The attachment site, exactly.

That's added post-transcriptionally by a template-independent CCA-adding enzyme.

But eukaryotic tRNAs often have an extra hurdle, right?

Splicing.

Yes.

Many eukaryotic tRNAs contain introns that have to be removed.

This needs a specific endonuclease for the cleavage and a ligase to stitch the two halves back together.

And on top of that, tRNAs get extensive base and ribose modification.

Sometimes up to a quarter of the bases are modified, converting standard uridylates into exotic forms like ribothymidylate.

Which helps with stability and decoding.

It enhances the whole structure and its function, yeah.

Okay.

Now for the main event.

The Pol II product, pre-mRNA, which undergoes the most radical makeover.

Let's start with the ends, beginning with a five-prime cap.

This starts almost immediately.

It does.

It's a three-step process designed to protect and tag the transcript.

First, the five-prime triphosphate end of the new RNA is hydrolyzed down to a diphosphate.

Then what?

Second, this diphosphate attacks an incoming GTP molecule.

And this creates a highly unusual chemical linkage.

What makes this five-prime-to-five-prime triphosphate linkage so unique compared to the standard three-prime-to-five-prime bonds we see everywhere else?

It means the terminal guanosine is linked head to head to the rest of the mRNA chain via three phosphates, instead of head to tail.

And this structural anomaly is resistant to the standard five-prime to three-prime exonucleases that would otherwise just chew up the message.

So it's a protective shield.

And after that reverse linkage, the terminal guanine gets methylated.

That's cap zero.

And sometimes the riboses on the next few bases get methylated too.

Cap one, cap two.

This cap is absolutely essential for stability.

And it's recognized later by the translation machinery.

At the other end, we have the three-prime poly-A tail.

And we need to remember, this tail is not in the DNA code.

That's right.

The DNA template just keeps on going past the final coding sequence.

The signal for polyadenylation is a sequence in the RNA itself, usually AAUAAA.

And what happens when that signal is recognized?

That signal is recognized by cleavage factors.

It triggers a specific endonuclease to cleave the new transcript about 10 to 30 nucleotides downstream.

This cleavage creates a free three-prime hydroxyl group.

And then the tail gets added.

Then poly-A polymerase takes over.

This enzyme uses ATP as a donor.

And without needing any template, it just adds a run of approximately 250 adenylate residues.
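As a rough illustration of the cleavage and tailing steps just described, here is a minimal Python sketch. The function name, the toy transcript, and the fixed 20-nucleotide offset are assumptions made for the example; the real cut position varies within the 10 to 30 nucleotide window, and the tail length is only approximately 250.

```python
def cleave_and_polyadenylate(pre_mrna, signal="AAUAAA", offset=20, tail_length=250):
    """Toy model: cut `offset` nt downstream of the poly-A signal, then add an A tail.

    The tail is appended without consulting any template, mirroring
    what poly-A polymerase does with ATP as the donor.
    """
    start = pre_mrna.find(signal)
    if start == -1:
        # No signal found: this toy model leaves the transcript unprocessed.
        return pre_mrna
    cut = min(start + len(signal) + offset, len(pre_mrna))
    return pre_mrna[:cut] + "A" * tail_length


# Hypothetical transcript: 10 nt, the signal, then 36 nt of downstream sequence.
transcript = "GGCUAUGGCC" + "AAUAAA" + "CUGAUCCGUAGCUUAGCGAUCGUACGGAUCCGAUGG"
mature = cleave_and_polyadenylate(transcript)
print(len(transcript), len(mature))
```

Everything past the cut site is discarded, which is why the tail never appears in the DNA template.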

What's the main function of this long tail of A's?

Is it just about getting it out of the nucleus?

That's part of it.

But its main roles are enhancing translation and acting as a kind of molecular timer for stability.

A long poly-A tail strongly enhances how efficiently a protein gets made.

And the timer part.

Crucially, as the mRNA gets older in the cytoplasm, exonucleases start gradually chewing away at that poly-A tail from the end.

The half-life of the mRNA, how long it survives before it's degraded, is often directly determined by how fast that tail is getting degraded.

Okay, and before we finally get to splicing, we have to talk about the molecular surprise that overturns a core principle of genetics.

RNA editing.

RNA editing is truly mind-bending.

It is a post-transcriptional, enzyme-mediated change in the actual nucleotide sequence.

It's distinct from splicing.

It shows that the DNA sequence is just a suggestion, not a guarantee.

Exactly.

It harnesses the basic chemical reactivity of the nucleotide bases for regulatory purposes.

The classic example is apolipoprotein B, which has two wildly different forms, apoB-100 and apoB-48.

And the gene is identical in the liver and in the small intestine.

Liver cells make apoB-100, a huge protein involved in cholesterol transport.

Small intestine cells make apoB-48, which is less than half the size.

And the difference is a single enzyme.

A single enzyme that's only active in the small intestine, an RNA-editing deaminase.

In the intestinal cells, this enzyme catalyzes the deamination of a specific cytidine, a C, within the apoB mRNA.

It chemically converts it into a uridine, a U.

So a C becomes a U.

What does that do to the code?

That single change converts the codon CAA, which codes for glutamine, into UAA, which is a universally recognized stop codon.

So a single enzyme acting on a single base just truncates the protein, making it perfectly suited for that tissue's specific need for lipid absorption.
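To make the C-to-U edit concrete, here is a small Python sketch on a made-up 15-nucleotide message. The sequence, the five-entry codon table, and the function names are assumptions for illustration only; this is not the real apoB sequence.

```python
# Only the codons used in the toy message below; "*" marks a stop codon.
CODONS = {"AUG": "M", "GCU": "A", "CAA": "Q", "GGU": "G", "UAA": "*"}


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODONS[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def deaminate_c_to_u(mrna, position):
    """Model the editing deaminase: convert one specific C into a U."""
    if mrna[position] != "C":
        raise ValueError("editing site must be a cytidine")
    return mrna[:position] + "U" + mrna[position + 1:]


toy_mrna = "AUGGCUCAAGGUUAA"           # AUG GCU CAA GGU UAA
edited = deaminate_c_to_u(toy_mrna, 6)  # CAA (Gln) becomes UAA (stop)

print(translate(toy_mrna))  # unedited message, the "liver" outcome
print(translate(edited))    # edited message, the "intestine" outcome
```

The unedited message reads through to the final stop, while the edited one terminates at the new UAA, which is the same logic that shortens apoB-100 to apoB-48.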

It's an incredible mechanism for diversity and tissue specificity.

You see another powerful example in the nervous system with the glutamate receptor mRNA.

What happens there?

Editing changes a CAG for glutamine into a CGG for arginine.

And that one amino acid substitution at a critical point in the ion channel alters its properties, preventing calcium flow while still allowing sodium flow, which is crucial for certain types of neural transmission.

Okay.

Now let's tackle the grand complexity, splicing.

The excision of non-coding introns and the joining of coding exons.

Why is precision so absolutely paramount here?

Precision is everything because splicing operates within the framework of the genetic code's reading frame.

If the splicing machinery is off by even a single nucleotide at either splice site, the reading frame for every single codon downstream of that mistake is immediately shifted.

A frame shift.

And that almost always results in a string of nonsense amino acids and a premature stop codon.

You get a completely non-functional protein.
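A quick way to see the frameshift problem is to join two exons correctly and then again with the junction off by one nucleotide. This is a sketch with invented exon sequences, chosen so that the shifted frame happens to hit a stop codon right away.

```python
def codons(seq):
    """Split a sequence into complete triplets, reading from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


exon1 = "AUGGCU"        # hypothetical upstream exon
exon2 = "CUGACCGGAUAA"  # hypothetical downstream exon

correct = exon1 + exon2         # precise junction
off_by_one = exon1 + exon2[1:]  # splice site missed by a single nucleotide

print(codons(correct))     # frame preserved, stop codon only at the end
print(codons(off_by_one))  # every codon after the junction is different
```

In the mis-spliced version the third codon becomes UGA, a stop codon, so translation would end almost immediately.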

So what are the three molecular markers that define an intron and guide this precision?

Introns are identified by very strict consensus sequences.

At the very beginning, the five-prime splice site always has the invariant dinucleotide GU.

At the very end, the three-prime splice site always has the invariant dinucleotide AG.

And that's usually preceded by a long stretch of pyrimidines.

Then the internal anchor.

That's the branch site.

It's an internal adenosine, an A, that's located about 20 to 50 nucleotides upstream of that three-prime splice site.

The two-prime hydroxyl group of this specific adenosine is the nucleophile that kicks off the whole reaction.
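The three landmarks can be checked on a candidate intron with a few lines of Python. This is a simplified sketch: the function name and the toy intron are assumptions, and real branch-site recognition depends on U2 snRNA base pairing rather than on distance alone.

```python
def intron_landmarks(intron):
    """Report the three features described above for a candidate intron.

    Returns (starts_with_GU, ends_with_AG, branch_candidates), where
    branch_candidates are indices of A residues lying roughly 20 to 50 nt
    upstream of the intron's three-prime end.
    """
    n = len(intron)
    window_start = max(0, n - 50)
    window_end = max(0, n - 20)
    branch_candidates = [
        i for i in range(window_start, window_end) if intron[i] == "A"
    ]
    return intron.startswith("GU"), intron.endswith("AG"), branch_candidates


# Hypothetical 60-nt intron: GU, filler, one branch A, a pyrimidine stretch, AG.
toy_intron = "GU" + "C" * 30 + "A" + "U" * 25 + "AG"
print(intron_landmarks(toy_intron))
```

In the toy intron the only A inside the window sits 28 nucleotides from the three-prime end, followed by the run of pyrimidines that precedes the AG.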

OK, let's detail the chemistry.

It happens through two sequential transesterification reactions.

It's a remarkable piece of molecular choreography.

In the first transesterification, that two-prime OH group of the branch site adenosine attacks the phosphodiester bond right at the five-prime splice site.

And what does that do?

It simultaneously cleaves the bond and links the branch site A to the five-prime end of the intron.

This creates that unique branch structure we call the lariat intermediate.

It's defined by a very rare two-prime to five-prime phosphodiester bond right at the branch point.

And that initial cleavage is so important because it frees the upstream exon, exon one.

Yes, the newly liberated three-prime OH group of exon one is now highly reactive.

So in the second transesterification, this three-prime OH acts as the new nucleophile, attacking the phosphodiester bond at the three-prime splice site.

Which joins the exons.

It simultaneously joins exon one and exon two, and it releases the intron in its lariat shape.

And what's crucial to remember about the energy of these chemical steps?

The splicing chemistry itself is energy neutral.

You're just swapping existing phosphodiester bonds.

The number of bonds broken equals the number of bonds formed.

So the two transesterification reactions themselves don't require an external energy source like ATP or GTP.
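The bond bookkeeping behind that energy-neutral claim can be checked with a short Python sketch. It assumes a simple counting model (a linear RNA of n nucleotides has n - 1 backbone bonds, and the lariat carries one extra two-prime to five-prime bond); the sequences and function names are invented for the example.

```python
def linear_bonds(length):
    """A linear RNA of n nucleotides has n - 1 backbone phosphodiester bonds."""
    return max(length - 1, 0)


def splice(exon1, intron, exon2):
    """Return the spliced mRNA plus bond counts before and after splicing."""
    bonds_before = linear_bonds(len(exon1) + len(intron) + len(exon2))

    # Step 1: the branch-site 2'-OH attacks the 5' splice site.
    #   One bond broken (exon 1 to intron), one formed (2'-5' branch bond).
    # Step 2: exon 1's free 3'-OH attacks the 3' splice site.
    #   One bond broken (intron to exon 2), one formed (exon 1 to exon 2).
    mrna = exon1 + exon2
    lariat_bonds = linear_bonds(len(intron)) + 1  # backbone + branch bond
    bonds_after = linear_bonds(len(mrna)) + lariat_bonds
    return mrna, bonds_before, bonds_after


toy_intron = "GU" + "C" * 20 + "A" + "U" * 25 + "AG"
mrna, before, after = splice("AUGGCU", toy_intron, "CUGACC")
print(mrna, before, after)
```

The two counts come out equal, which is the sense in which the chemistry only swaps bonds rather than making new ones.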

The machinery that does the splicing, the spliceosome, is one of the largest and most dynamic assemblies in the cell.

And it absolutely burns through ATP.

It does.

So if the chemistry is energy neutral, where is all that energy being spent?

The energy is spent on the relentless structural rearrangement and assembly of the machine.

The spliceosome is this massive 60S complex made of five small nuclear ribonucleoprotein particles.

snRNPs, we call them.

U1, U2, U4, U5, and U6.

Plus hundreds of other protein factors.

So we should think of the U snRNAs not just as components, but as, like, molecular GPS units and structural girders directing the whole process.

That is the perfect analogy.

The energy from ATP hydrolysis, which is done by RNA helicases, is needed to unwind all the temporary RNA duplexes that form, allowing for these crucial structural shifts.

The proteins manage the transitions, but the U snRNAs are the functional core.

OK, so let's detail their specific roles in this choreography.

All right.

U1 snRNA is the scout.

It finds and binds the five-prime splice site through complementary base pairing.

And U2?

U2 snRNA binds to the branch site and it actually helps to push that key adenosine residue out.

So its two-prime OH is exposed and ready for the attack.

And the catalytic center.

Most critically, U2 and U6 snRNAs base pair extensively with each other.

They form the structural and catalytic active site.

This is the place where the two transesterification reactions actually happen, likely involving coordinated magnesium ions.

Then what about U5?

What's its job?

U5 snRNA is believed to act as the precision jig.

It binds to both the upstream and the downstream exons, holding them in perfect alignment so that when the intron is cut out, the free three-prime OH of exon one can attack the three-prime splice site of exon two with just exquisite accuracy.

So all this complex multi-step processing: capping, splicing, polyadenylation.

It all has to be tightly integrated with the polymerase itself.

How does the cell make sure these events happen in the right sequence at the right time?

This is where that Pol II CTD comes back in.

It acts as the molecular bridge.

The phosphorylation state of the CTD is the signal.

So as the polymerase moves down the DNA, the CTD is changing.

Exactly.

And it sequentially recruits the necessary processing factors in a precise handoff sequence.

Describe that sequence for us.

Okay, so initially, early phosphorylation signals recruit the capping enzymes almost immediately as the five-prime end of the transcript emerges.

Then during mid-elongation, the CTD recruits components of the splicing machinery.

So you get co-transcriptional splicing.

The intron is being excised as it's still being synthesized.

Which is super efficient.

Highly efficient.

And finally, when the polymerase hits the cleavage site at the three-prime end, the CTD recruits the endonuclease and the polyadenylation factors.

This molecular linkage ensures timely and efficient processing before the transcript ever leaves the nucleus.

But unfortunately, this tightly regulated system is vulnerable to mutations, which can cause some really serious genetic diseases.

They absolutely can.

A classic cis-acting mutation, so a mutation in the RNA sequence itself, is found in some forms of thalassemia, a group of anemias.

A single base change in an intron sequence can create a cryptic, brand-new five-prime splice site that competes with the correct one.

The result is aberrant splicing that incorporates extra intron sequence into the mature mRNA, which often has a premature stop codon.

You get non -functional hemoglobin.

And you can also have problems if the splicing machinery itself is mutated.

Yes, a trans-acting mutation.

A mutation in a protein component of the spliceosome.

For example, mutations in the hPrp8 protein, a critical part of the U5-U6 complex, are linked to retinitis pigmentosa, a progressive form of blindness.

Which shows that a defect in a totally general essential machine can still lead to a very tissue-specific disease.

Right, and the reasons for that tissue specificity are still really complex and being worked out.

And now we arrive at what is really the true genius of the eukaryotic genome, alternative splicing.

Alternative splicing is the reason we can generate a massive proteome from a pretty modest gene count.

It's all about selecting different combinations of exons from a single pre-mRNA transcript and stitching them together to make distinct mature mRNAs.

Which then encode functionally different proteins.

The calcitonin gene is a perfect illustration of this cell-specific choice.

It is.

In thyroid cells, the local regulatory factors dictate a splicing pathway that includes exon 4.

The result is the hormone calcitonin, which lowers blood calcium.

But in the brain.

But in neuronal cells, the local regulatory factors trigger a different splicing pattern that completely skips exon 4.

This leads instead to the production of calcitonin gene-related peptide, or CGRP, which is a potent vasodilator.

One single gene, two radically different peptide hormones, and it's all dependent on which cell is doing the splicing.
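The cell-specific choice can be sketched as a lookup from cell type to the exons that are kept. The placeholder exon labels and the exact exon lists are assumptions for illustration; the discussion above only states that thyroid cells include exon 4 and neurons skip it.

```python
# Placeholder labels stand in for the real exon sequences.
EXONS = {1: "E1", 2: "E2", 3: "E3", 4: "E4", 5: "E5", 6: "E6"}

# Assumed splicing patterns: exon 4 retained in thyroid, skipped in neurons.
SPLICE_PATTERNS = {
    "thyroid": [1, 2, 3, 4],    # calcitonin mRNA
    "neuron": [1, 2, 3, 5, 6],  # CGRP mRNA
}


def mature_mrna(cell_type):
    """Join the exons selected in the given cell type, in genomic order."""
    return "-".join(EXONS[n] for n in SPLICE_PATTERNS[cell_type])


for cell_type in SPLICE_PATTERNS:
    print(cell_type, mature_mrna(cell_type))
```

Both outputs start from the same exon set, which is the point: the gene is identical and only the selection differs.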

It's all about context.

And the potential for this kind of combinatorial power is just staggering.

The Dscam gene in Drosophila is often cited as the pinnacle of this.

The Drosophila Dscam gene, which plays a huge role in wiring the fly's complex neural circuitry, has the potential to produce over 38,000 different combinations of exons.

38,000 from one gene.

Why does a single gene need the potential to make that many different versions of its protein?

This extreme versatility is believed to be essential for giving a specific molecular identity to individual neurons.

For the fly's nervous system to make the thousands of precise connections it needs for complex behaviors, each neuron has to have a unique cell surface recognition molecule.

And alternative splicing gives that gene the combinatorial power to achieve that specificity.
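The 38,000 figure is just a product of independent choices. The sketch below uses the commonly cited counts of mutually exclusive alternatives for four Dscam exon clusters; those per-cluster numbers are not given in this discussion and are included here as an outside assumption.

```python
from math import prod

# Commonly cited numbers of alternative versions per variable exon cluster
# (an assumption for this example, not stated in the discussion above).
DSCAM_ALTERNATIVES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is chosen per cluster, so the choices multiply.
total_isoforms = prod(DSCAM_ALTERNATIVES.values())
print(total_isoforms)
```

With one version picked independently from each cluster, the product comes to 38,016, in line with the "over 38,000" quoted above.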

Exactly.

It illustrates how the complexity of the final organism is engineered not at the level of DNA, but at the level of RNA processing.

Okay, so if the spliceosome shows us that RNA is the catalyst at the core of gene processing, then the next step has to be the revolutionary discovery that RNA can be a full-on enzyme, a ribozyme, all by itself.

This discovery from Thomas Cech and Sidney Altman just fundamentally changed biochemistry.

The idea of RNA molecules acting as catalysts, or ribozymes, showed that the long-held belief that only proteins could be enzymes was just wrong.

We see it everywhere now.

Everywhere.

The RNA component of RNase P cleaves tRNA precursors.

Even the core catalytic function of the ribosome is carried out by ribosomal RNA.

But the breakthrough proof came from studying the ribosomal RNA precursor in the single-celled eukaryote Tetrahymena.

Cech showed that the 414-nucleotide intron in this precursor could precisely splice itself out, or self-splice, in a test tube with absolutely no proteins involved.

This is what we call group I self-splicing.

And what were the only two requirements for this reaction to happen?

It needed two things.

A precise folded 3D structure and the presence of an external guanosine nucleotide: GTP, GMP, or even just G.

And the guanosine isn't an energy source.

Not at all.

It acts as the initial nucleophile.

It transiently becomes incorporated into the RNA structure to kick things off.

So let's compare this chemistry to the spliceosome mechanism.

It's similar in that it also uses two sequential transesterification steps.

First, the external guanosine binds to a specific pocket in the intron and it attacks the 5' splice site making the first cut.

And then the second step.

The newly freed 3' OH of the upstream exon then attacks the 3' splice site.

This joins the two exons and releases the intron.

But the key difference here is that the intron is released as a linear molecule, not a lariat.

And the intron itself is doing all the work.

Yes.

The intron RNA folds into this complex 3D structure of helices and loops that creates a defined active site capable of coordinating magnesium ions, just like a protein enzyme.

It even has an internal guide sequence, an IGS, that makes sure the guanosine cofactor and the splice sites are perfectly aligned for catalysis.

That realization that RNA can be both information and catalyst, it really strengthened the idea that this might be an ancient relic.

And that leads directly to group II introns.

Right.

Group II self-splicing introns are found mainly in organelles like mitochondria and chloroplasts.

And mechanistically, they're fundamentally different from group I.

How so?

Instead of using an external guanosine, group II splicing is initiated by the 2'-OH group of an internal adenylate residue, an A that's part of the intron itself.

Wait a second.

That internal attack leading to a lariat product, that sounds exactly like what the spliceosome does to nuclear pre-mRNA.

It is nearly identical chemically.

The initiation by an internal adenosine, the formation of the lariat intermediate, the two transesterification reactions that join the exons.

That mechanism is precisely shared between group II introns and the modern spliceosome.

So the evolutionary hypothesis is that our complex multi-component spliceosome may have actually arisen from a simpler self-splicing group II intron.

That is the leading hypothesis.

It's suggested that the cellular machinery sort of took over the catalytic role.

The catalytic power shifted from the massive sequence-constrained intron itself to the separate external snRNAs like U2 and U6.

It's domestication of the intron.

Right.

And that freed the intron sequence from having to maintain its own precise catalytic structure, which in turn allowed introns to become much larger, more random, giving genes greater freedom to evolve new regulatory elements inside them.

So whether it's shared ancestry or just convergent evolution settling on the most efficient path, the discovery that RNA can be a catalyst is just monumental.

It completely shattered the protein-first dogma of biology and provided powerful molecular evidence for the RNA world hypothesis.

The idea that RNA was the primary molecule of ancient life capable of carrying both genetic information and performing essential enzymatic tasks.

And we are still uncovering these echoes of the RNA world.

Hidden deep within our most fundamental cellular machinery.

This deep dive into gene expression beyond the DNA has just been explosive.

We've seen how the universal chemistry of RNA polymerases, driven by that elegant energy lock of PPi breakdown, forms the basis of transcription across all domains of life.

We then explored the complexity of eukaryotes, with multiple polymerases, TBP-mediated DNA bending, and distant enhancers creating this system of highly specific architectural control.

And then the vast power of post-transcriptional processing, from the protective reverse-linked five-prime cap to the stability-conferring poly-A tail and the radical sequence changes introduced by RNA editing.

And at the heart of it all is splicing.

A precise RNA-driven machine performing two energy-neutral transesterification reactions that define the core function of the transcript.

So what does this all mean?

I think the most profound realization is that the modest count of human genes is deeply, deeply misleading.

It is.

The true complexity and diversity of the human proteome, the full set of proteins that actually do the work of life, is not determined by the gene count.

It's determined by the layers of processing we talked about today.

Especially alternative splicing.

Alternative splicing especially allows one transcript to become multiple functional tissue-specific proteins.

It effectively multiplies our 21,000 genes many, many times over.

Okay.

So here's where it gets really interesting.

We've established that the fundamental engines of molecular life, the spliceosome, the ribosome, are driven by RNA catalysts.

We've seen the molecular fossils of the RNA world in group I and group II introns.

Which leads to our final provocative thought for you to explore.

If RNA was the key functional molecule of early life, and if its catalytic power is still hidden deep within our most essential complex machinery.

Then how much more catalytic or regulatory RNA is yet to be discovered, silently controlling the vast non-coding regions of the transcriptome that we currently just dismiss as junk?

The RNA world might not be a relic at all.

It might still be running the show from behind the scenes.

That's a powerful question for your own continuing deep dive.

Thank you so much for joining us.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

RNA synthesis represents a fundamental process in gene expression wherein RNA polymerase catalyzes the formation of phosphodiester bonds between nucleotides, creating complementary RNA transcripts from DNA templates. Transcription occurs through three sequential stages: initiation, where the enzyme recognizes and binds specific promoter regions; elongation, where the polymerase processively adds nucleotides to the growing chain; and termination, which concludes transcript production through defined mechanisms. In prokaryotic organisms, the sigma subunit functions as a specificity factor that directs the RNA polymerase holoenzyme to recognize conserved promoter elements such as the minus 10 and minus 35 boxes, while termination proceeds either through formation of hairpin structures within the transcript or through action of the rho protein, which causes the enzyme to dissociate from the template. The clinical significance of transcription is evident in antibiotic action, as compounds like rifampicin and actinomycin D inhibit bacterial RNA synthesis and are used therapeutically to suppress pathogenic gene expression. Eukaryotic transcription involves three specialized RNA polymerase enzymes, each with distinct targets: RNA polymerase I produces ribosomal RNAs, RNA polymerase II generates messenger RNAs, and RNA polymerase III synthesizes transfer RNAs. Assembly of eukaryotic transcription initiation requires the TATA-binding protein and numerous transcription factors that facilitate polymerase recruitment and promoter melting. Following transcription, eukaryotic mRNAs undergo extensive post-transcriptional modifications that enhance their stability and translational efficiency, including addition of a 5-prime cap structure and a 3-prime poly-A tail. 
Splicing, catalyzed by the spliceosome complex containing small nuclear ribonucleoproteins, removes non-coding intron sequences and ligates exons through transesterification chemistry involving lariat-shaped intermediates. Alternative splicing patterns allow generation of multiple protein isoforms from a single gene, increasing proteomic complexity without expanding genome size. Additional post-transcriptional modification occurs through RNA editing, wherein specific nucleotides within transcripts are altered after synthesis. The chapter also discusses catalytic RNA molecules known as ribozymes, which demonstrate that RNA possesses enzymatic capacity and can catalyze its own removal from precursor transcripts through self-splicing mechanisms.
