Chapter 11: Transcription & RNA Processing
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome back to the Deep Dive.
You know, we often picture the central dogma, that flow of genetic information, as this really neat linear process, DNA to RNA to protein.
Simple.
Right, like an assembly line.
Exactly.
But chapter 11 really pulls back the curtain, doesn't it?
It shows that the middle step, getting from DNA to RNA, is, well, more like a super busy, highly regulated post office than a simple line.
That's a great analogy, actually.
Lots of sorting, checking, modifying.
Yeah.
So our mission today is really to follow that messenger molecule, the RNA.
We're going to explore how that static blueprint in the DNA gets transcribed, and crucially, why the resulting transcript needs so much quality control and processing, especially in eukaryotes.
It's a fascinating journey, involves some amazing molecular machines.
Let's dive in.
Okay, so starting with the basics, the central dogma lays out that fundamental information flow.
You've got DNA making copies of itself, that's replication, and then you have DNA being used to make proteins, that's gene expression.
And expression has those two key stages.
Right.
First, transcription, taking the DNA information and writing it into an RNA molecule. Then translation, using that RNA message to build a protein.
Understanding that flow is just, well, fundamental.
Absolutely.
And while it generally goes DNA to RNA to protein, there are important exceptions, right?
Like RNA viruses, HIV, for example.
Oh yeah, reverse transcriptase.
Exactly.
They use that enzyme to go backwards, RNA back into DNA.
But importantly, that last step, RNA to protein.
That's basically a one-way street.
No going back from protein to RNA.
Okay.
So if DNA is the master blueprint, you said RNA is more like the whole construction crew.
Yeah.
It does way more than just carry the message.
Oh, definitely.
RNA is incredibly versatile.
It's not just one molecule type.
Chapter 11 introduces five main players.
Okay, let's break those down.
So first, think about the ones directly involved in making the protein, the translation machinery.
You've got mRNA, that's messenger RNA.
It carries the actual genetic code copied from DNA.
The message itself.
Precisely.
Then there's tRNA, transfer RNA.
These are the adapters.
They physically link the amino acids to the code words on the mRNA.
We'll see more of them in Chapter 12.
Okay.
And rRNA, ribosomal RNA.
These are actually part of the ribosome itself, both structurally and surprisingly catalytically.
They help make the peptide bonds.
Wow.
Okay, so RNA is even building the protein-making machine?
In large part, yes.
Then you have the RNAs involved more in regulation and processing.
There's snRNA, small nuclear RNA.
These are critical parts of the spliceosome complex.
Splicing.
We'll definitely get to that.
Sounds important.
Very.
And finally, miRNA, or microRNA.
These are tiny little RNA strands that act like gene silencers.
They can bind to mRNAs and either block their translation or mark them for destruction.
A key layer of regulation.
So many roles.
And you mentioned location being key.
Prokaryotes versus eukaryotes.
Big difference.
In prokaryotes, meaning bacteria and archaea, there's no nucleus separating the DNA from the ribosomes.
So transcription and translation can actually be coupled.
They happen almost simultaneously in the cytoplasm.
Super efficient.
But in eukaryotes, like our cells.
Everything's compartmentalized.
Transcription happens inside the nucleus.
The RNA has to be processed, then shipped out to the cytoplasm for translation.
It's a much more involved, regulated process.
Okay, let's unpack transcription itself.
Starting with a simpler prokaryotic model, like E. coli.
How does RNA synthesis actually work at the molecular level?
It shares some features with DNA synthesis, right?
It does.
The building blocks are similar: ribonucleoside triphosphates, or rNTPs, instead of dNTPs.
And synthesis always happens in the five prime to three prime direction, adding on to that three prime hydroxyl group.
Okay, same directionality.
Yep.
But key differences too.
First, only one of the two DNA strands acts as the template for any given gene.
Not both, like in replication.
Correct.
Second, RNA polymerase doesn't need a pre-existing primer to get started.
It can initiate synthesis de novo, just start from scratch.
Ah, okay.
No primer needed.
And the RNA strand produced.
How does it relate to the two DNA strands?
So the RNA is synthesized to be complementary to the DNA strand it's reading, the template strand.
Think A pairing with U, G with C.
Right.
Which means the RNA sequence ends up being essentially identical to the other DNA strand, the non-template strand, just with uracil, U, swapped in wherever there was a thymine, T.
Wait, hang on.
So the strand called the coding strand isn't actually the template?
Exactly.
It's a bit confusing, but yeah.
The non-template strand is often called the coding strand or sense strand, precisely because its sequence directly matches the RNA sequence with U for T.
The polymerase reads the template strand to make an RNA copy of the coding strand.
Huh.
Okay, got it.
So the coding strand is like the sense of the gene, but the template is what's physically used.
You've got it.
It's an important distinction.
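The template versus coding strand relationship can be checked with a few lines of Python. This is a minimal sketch: the nine-base sequence is a made-up example rather than one from the chapter, and the function name `transcribe` is only illustrative.

```python
# Base-pairing rules for reading a DNA template into RNA, and for DNA-DNA pairing.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made by pairing with a template read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


# Hypothetical coding (non-template) strand, written 5'->3'.
coding_5_to_3 = "ATGGCTTAA"
# Its partner, the template strand, written 3'->5' so the two line up base for base.
template_3_to_5 = "".join(DNA_COMPLEMENT[base] for base in coding_5_to_3)

rna = transcribe(template_3_to_5)
print(template_3_to_5)                          # TACCGAATT
print(rna)                                      # AUGGCUUAA
print(rna == coding_5_to_3.replace("T", "U"))   # True
```

The polymerase only ever touches the template strand here, yet the output reads like the coding strand with U in place of T.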
All right.
So how does the enzyme, the RNA polymerase in E. coli, find the right place to start on the DNA?
You mentioned promoters.
Yes.
The complete E. coli RNA polymerase enzyme is called the holoenzyme.
It has several subunits, but one, the sigma factor, is crucial just for initiation.
Sigma finds the starting line.
Basically, yes.
Sigma recognizes specific DNA sequences called promoter sites, which are located just upstream from where transcription should begin.
It scans the DNA until it finds one.
And are these promoter sequences conserved, like signals?
Very much so.
In E. coli, there are two key consensus sequences on the non-template strand that sigma looks for.
There's the -35 sequence, consensus TTGACA, which seems to be mainly for recognition, and then the -10 sequence, consensus TATAAT, often called the Pribnow box.
Ah, lots of As and Ts.
Easier to pull apart.
Exactly.
The AT base pairs only have two hydrogen bonds compared to GC's three, so that AT-rich -10 region helps the DNA double helix unwind locally, forming the transcription bubble and allowing the polymerase to access the template strand.
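Promoter recognition and the AT-richness argument can both be sketched in Python. The code below assumes the standard E. coli consensus sequences TTGACA (-35) and TATAAT (-10); the promoter string, its 17-base spacer, and the function names are invented for illustration and are not from the chapter.

```python
MINUS_35 = "TTGACA"   # -35 consensus
MINUS_10 = "TATAAT"   # -10 consensus (Pribnow box)


def matches(window: str, consensus: str) -> int:
    """Count positions where the window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def best_hit(seq: str, consensus: str) -> tuple:
    """Slide along seq and return (start index, score) of the best-matching window."""
    k = len(consensus)
    best_start, best_score = 0, -1
    for start in range(len(seq) - k + 1):
        score = matches(seq[start:start + k], consensus)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


def h_bonds(seq: str) -> int:
    """Hydrogen bonds holding a duplex together: 2 per A/T pair, 3 per G/C pair."""
    return sum(2 if base in "AT" else 3 for base in seq)


# Hypothetical promoter: flank + (-35 box) + 17-base spacer + (-10 box) + flank.
promoter = "GGC" + MINUS_35 + "GCTAGCTCAGTCCTAGG" + MINUS_10 + "GCGCA"

print(best_hit(promoter, MINUS_35))            # (3, 6)
print(best_hit(promoter, MINUS_10))            # (26, 6)
print(h_bonds("TATAAT"), h_bonds("GCGCGC"))    # 12 18
```

The last line shows why an AT-rich hexamer is easier to melt than a GC-rich one of the same length: 12 hydrogen bonds against 18.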
Smart.
And once it starts, sigma leaves?
Pretty quickly, yeah.
After the first few phosphodiester bonds are formed, the sigma factor usually detaches.
The remaining core enzyme, the alpha, beta, and beta-prime subunits, then moves along the DNA, elongating the RNA chain within that transcription bubble, which is about 18 nucleotide pairs long.
Okay, so initiation is finding the promoter with sigma, unwinding, starting the chain.
Elongation is the core enzyme treading along.
How does it know when to stop?
Termination.
Right.
Termination also relies on specific signals in the DNA, which get transcribed into the RNA.
There are two main mechanisms in prokaryotes.
The first is called rho-independent termination.
It doesn't need any extra protein factors.
It just relies on the RNA sequence itself.
How does that work?
The termination sequence contains inverted repeats.
When the RNA polymerase transcribes this region, the RNA folds back on itself immediately, forming a stable hairpin structure, like a little loop.
Okay, a hairpin in the RNA.
Yeah.
This hairpin physically bumps, or somehow causes the polymerase to pause, and critically, right after the hairpin sequence in the DNA template, there's usually a stretch of adenines.
So the RNA has a stretch of U's, AU pairs.
Right.
And AU base pairs are the weakest ones.
So when the polymerase pauses because of the hairpin, that weak RNA-DNA hybrid in the active site just destabilizes.
The U's let go of the A's, and the whole RNA transcript just peels off.
Clever.
It builds its own release signal.
Pretty much.
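The hairpin-plus-U-run signal just described can be searched for in a transcript. The following is a rough sketch only: the stem length, loop range, and minimum U count are arbitrary illustrative thresholds, the toy transcript is invented, and real terminator prediction tools score folding energy rather than demanding perfect inverted repeats.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would base-pair with rna when the strand folds back on itself."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_terminator(rna: str, stem: int = 6, max_loop: int = 8, min_us: int = 4) -> int:
    """Return the start index of a perfect stem-loop followed by a U run, or -1."""
    for i in range(len(rna) - 2 * stem):
        left = rna[i:i + stem]
        for loop in range(3, max_loop + 1):
            j = i + stem + loop                       # where the right arm would start
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_us]    # bases right after the hairpin
            if right == reverse_complement(left) and tail == "U" * min_us:
                return i
    return -1


# Hypothetical transcript: body + left arm + loop + right arm + U run.
transcript = "AUGAAC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU"

print(find_terminator(transcript))            # 6
print(find_terminator("AUGGCAUCGAUCGAUCG"))   # -1
```

Both ingredients are required by the check: an inverted repeat that can fold into a hairpin, and the weak U-rich stretch immediately downstream.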
The second type is rho-dependent termination.
This one does require an additional protein factor called rho.
Okay, so rho gets involved.
How?
Rho protein recognizes and binds to a specific sequence on the newly synthesized RNA molecule called a rut site, a rho utilization site.
Rho then acts like a motor, using ATP hydrolysis to kind of race along the RNA transcript, chasing after the polymerase.
It chases the polymerase.
Yeah.
And it also has helicase activity.
It can unwind nucleic acids.
So the polymerase will eventually hit a different kind of termination sequence, often another hairpin, that causes it to pause.
When it pauses, rho catches up.
And then rho uses its helicase activity to basically unwind the RNA-DNA hybrid duplex within the transcription bubble.
It physically separates the RNA from the DNA template and the polymerase, forcing termination.
Wow, like a little molecular bulldozer clearing the track.
Kinda, yeah.
It actively pushes the transcript off.
So rho-independent uses RNA structure, rho-dependent uses the rho protein machine.
That's the gist of it.
And all this efficiency in prokaryotes leads to that amazing phenomenon you mentioned earlier, coupled transcription and translation.
The pictures of this are incredible.
They really are.
Miller and Hamkalo's electron micrographs from the late 60s were revolutionary.
Because there's no nucleus, as soon as the 5′ end of an mRNA molecule emerges from the RNA polymerase...
Ribosomes just hop on.
Ribosomes jump right on and start translating it into protein.
Even while the 3′ end of that same mRNA is still being transcribed further down the gene. You see these structures that look like feathers.
The DNA strand is the quill.
The growing RNA transcripts branch off, and each RNA branch is already covered in ribosomes, making protein.
Yes, simultaneous production.
Incredible speed.
Exactly.
It allows prokaryotic cells to respond really rapidly to environmental changes.
Okay.
But now, let's brace ourselves for eukaryotic complexity.
We bring in the nucleus, compartmentalization, everything changes, right?
Control becomes paramount.
Absolutely.
The eukaryotic system is, well, significantly more elaborate.
First off, instead of basically one main RNA polymerase, eukaryotes have multiple types.
Multiple polymerases.
For different jobs.
Yep.
The main ones are RNA polymerase I, Pol I, which is dedicated to making most of the ribosomal RNAs, and it works in a specific nuclear region called the nucleolus.
Okay.
The rRNA factory.
Then RNA polymerase II, Pol II.
This is the one we focus on most because it transcribes all the protein-coding genes into messenger RNA precursors, pre-mRNAs.
It also makes some other small RNAs.
So Pol II makes the mRNA.
Got it.
And RNA polymerase III, Pol III, synthesizes the transfer RNAs, tRNAs, one type of ribosomal RNA, the 5S rRNA, and some other small, stable RNAs.
Plants have Pol IV and V too, involved in gene silencing, but I, II, and III are the core ones.
Right.
So focusing on Pol II making mRNA, you said it needs help finding the promoter.
No sigma factor here.
No sigma factor.
Instead, Pol II relies on a whole suite of proteins called basal transcription factors or general transcription factors, often designated TFIIA, TFIIB, TFIID, et cetera.
TF for transcription factor, II for Pol II.
An entourage, you said.
Pretty much.
Pol II cannot recognize the promoter DNA on its own.
These factors have to assemble on the promoter first, creating a landing pad for the polymerase.
And what do eukaryotic promoters look like? Similar to the -10 and -35?
There are some conserved elements, but they can be more varied.
One key one is the TATA box.
It has a consensus sequence like TATAAA, and it's typically found around position -25 or -30 relative to the transcription start site, +1.
TATA again.
Still AT rich.
Still AT rich, yes.
And its location is critical for positioning Pol II correctly to start transcription at the right nucleotide.
The first step is usually the binding of TFIID, which actually contains the TATA-binding protein, TBP, to the TATA box.
That distorts the DNA and serves as a scaffold for the other factors and Pol II to assemble into the pre-initiation complex.
So TBP grabs the TATA box, everyone else piles on, then Pol II can start.
That's the basic idea, yes.
There are other elements, too, like the CAAT box or GC boxes further upstream that influence the efficiency of initiation, but the TATA box is often key for positioning.
Okay.
Now, Pol II starts making the RNA transcript, the pre-mRNA.
But you said it can't just go straight to the ribosome.
It needs processing inside the nucleus.
What happens to it?
Three major things happen to almost all eukaryotic pre-mRNAs before they're considered mature mRNA ready for export and translation.
And much of this happens while transcription is still ongoing.
It's co-transcriptional.
Okay, what's first?
First is the 5′ cap.
Very early on, as soon as the 5′ end of the RNA emerges from Pol II, capping enzymes add a modified guanine nucleotide, 7-methylguanosine, backwards, essentially.
It's linked via an unusual 5′-to-5′ triphosphate bridge.
A cap on the front end, why?
Two main reasons.
It protects the 5′ end from being degraded by exonucleases.
And crucially, it acts as the recognition signal for the ribosome to bind and initiate translation later in the cytoplasm.
Protection and recognition.
Got it.
What's next?
Second is the 3′ poly(A) tail.
Unlike prokaryotes where termination is often precise, eukaryotic Pol II often transcribes well past the actual end of the gene coding sequence.
Then the transcript is cleaved internally at a specific site.
Cleaved, not just stopped.
Right.
There is a signal sequence in the RNA, typically AAUAAA, recognized by cleavage factors.
After cleavage, another enzyme called poly(A) polymerase comes in and adds a long string of adenosine nucleotides, anywhere from 50 to 250 As, onto that newly created 3′ end.
This is the poly(A) tail.
A long tail of As.
What's that for?
Again, multiple roles.
It enhances the stability of the mRNA, protecting the 3′ end from degradation.
It also plays a role in the export of the mRNA from the nucleus and is important for initiating translation.
The length of the tail can even regulate how long the mRNA persists.
So cap on the front, tail on the back.
Like packaging it up securely.
Exactly.
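The cleave-then-tail step at the 3′ end can be sketched as a simple string operation. This is an illustration under stated assumptions: the canonical AAUAAA signal is used, the 15-nucleotide gap between signal and cut site is an illustrative value (the real distance varies), the pre-mRNA is invented, and `process_3_end` is a hypothetical helper name.

```python
def process_3_end(pre_mrna: str, signal: str = "AAUAAA",
                  offset: int = 15, tail_len: int = 200):
    """Cleave `offset` nt downstream of the signal, then add a poly(A) tail.

    Returns None when the transcript carries no recognizable signal.
    """
    pos = pre_mrna.find(signal)
    if pos == -1:
        return None
    cut = pos + len(signal) + offset        # cleavage site, downstream of the signal
    return pre_mrna[:cut] + "A" * tail_len  # tail is added, not templated by the DNA


# Hypothetical pre-mRNA: coding part + signal + 15 nt + extra read-through transcript.
pre_mrna = "AUGGCUUAA" + "GCGC" + "AAUAAA" + "C" * 15 + "GUGUGUGUGU"

mature = process_3_end(pre_mrna, tail_len=50)
print(len(pre_mrna), len(mature))   # 44 84
print("GUGU" in mature)             # False
```

The read-through sequence past the cut site is discarded, and the As at the end never existed in the gene.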
And then there's a third, really quite startling modification process called RNA editing.
Editing.
You mean changing the sequence after it's been copied from the DNA?
That's exactly what it means.
The information in the final RNA molecule doesn't perfectly match the DNA sequence it came from because specific bases are altered, inserted, or deleted after transcription.
Whoa.
That seems counterintuitive to the whole blueprint idea.
It was definitely a surprise.
A classic example is the gene for apolipoprotein B in humans.
The same gene is transcribed in both liver and intestinal cells.
But in intestinal cells, a specific enzyme edits the mRNA, changing a single cytosine (C) nucleotide into a uracil (U).
This change, CAA to UAA, creates a premature stop codon.
So it stops translation early?
Yes.
The result is that the intestine produces a much shorter, functionally distinct apoB protein compared to the full-length version made in the liver from the unedited mRNA.
Same gene, different proteins in different tissues, thanks to RNA editing.
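The effect of a single C-to-U edit on translation can be shown with a toy message. This is a sketch only: the 18-base mRNA is invented and is not the apoB sequence, the codon table holds just the four codons the example needs, and the function names are illustrative.

```python
# Tiny codon table covering only the codons used below ("*" marks a stop codon).
CODONS = {"AUG": "M", "GCU": "A", "CAA": "Q", "UAA": "*"}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def edit_c_to_u(mrna: str, index: int) -> str:
    """Return a copy of the mRNA with the C at `index` changed to U."""
    if mrna[index] != "C":
        raise ValueError("editing site must be a C")
    return mrna[:index] + "U" + mrna[index + 1:]


# Hypothetical message: AUG GCU CAA GCU GCU UAA
unedited = "AUGGCUCAAGCUGCUUAA"
edited = edit_c_to_u(unedited, 6)    # CAA -> UAA at the third codon

print(translate(unedited))   # MAQAA  (full-length, the "liver" form)
print(translate(edited))     # MA     (truncated, the "intestine" form)
```

One base changed after transcription, and the protein stops at the new UAA.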
That's incredible.
It adds another layer of control.
One gene, multiple protein possibilities.
Precisely.
Another dramatic form involves insertion or deletion of uridines, especially seen in the mitochondrial RNAs of trypanosomes, guided by other small RNAs.
It can radically alter the final message.
Mind -boggling complexity.
But maybe the biggest bombshell in eukaryotic gene structure was realizing genes weren't continuous.
This idea of non-colinearity.
Oh, absolutely.
The discovery of introns and exons was revolutionary.
It shattered the assumption that the DNA sequence mapped directly base for base onto the amino acid sequence of the protein.
So define those terms for us again.
Sure.
Exons are the segments of a gene that are expressed.
They contain the coding sequences that end up in the mature mRNA and dictate the protein sequence.
Introns are the intervening sequences.
They lie between the exons within the gene, are transcribed into the pre-mRNA, but then are removed before translation.
Removed.
So they're just cut out.
What was the evidence for this?
How did they figure this out?
The key technique was R-loop hybridization.
Researchers took the final mature mRNA molecule from the cytoplasm and hybridized it back to the gene's DNA.
Okay.
Mixing the RNA message with the DNA blueprint.
Right.
DNA-RNA hybrids are actually more stable than DNA duplexes under certain conditions.
So where the RNA could find its complementary sequence on the DNA template strand, it would bind, displacing the other DNA strand and forming a visible DNA-RNA hybrid region, an R-loop.
What did they see?
Instead of seeing one continuous R-loop spanning the whole gene, they saw multiple R-loops, the exon regions, separated by loops of unhybridized double-stranded DNA.
Those loops of DNA that the mRNA couldn't bind to, those were the introns.
Exactly.
Because those sequences had been cut out of the mature mRNA. When they did the same experiment using the unprocessed pre-mRNA taken straight from the nucleus, they saw just one continuous R-loop covering the entire gene.
Proof that introns are transcribed initially, but then removed.
That's elegant proof.
And the scale of these introns can be huge, right?
Phenomenal.
Some genes are mostly intron.
The human dystrophin gene, DMD, mutated in muscular dystrophy is a classic example.
It spans about 2.5 million DNA base pairs and has 78 introns.
The final mRNA is only about 14,000 bases.
Most of the gene is non-coding sequence that gets removed.
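The dystrophin numbers make for a quick back-of-the-envelope calculation. The sketch below uses only the two rounded figures quoted above; the variable names are illustrative.

```python
gene_bp = 2_500_000    # approximate length of the DMD gene, in base pairs
mrna_nt = 14_000       # approximate length of the mature mRNA, in nucleotides

kept_fraction = mrna_nt / gene_bp      # share of the gene that survives splicing
removed_fraction = 1 - kept_fraction   # share removed as intron sequence

print(f"{kept_fraction:.2%}")      # 0.56%
print(f"{removed_fraction:.2%}")   # 99.44%
```

Roughly half a percent of the transcribed sequence ends up in the message; the rest is spliced out.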
Managing that removal, the splicing, must be precise.
You can't be off by even one base.
Absolutely critical.
A single nucleotide error would shift the reading frame for all downstream exons, leading to a completely garbled protein.
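Why a one-base slip is so damaging becomes clear when the codons are listed out. This is a minimal sketch with invented exon sequences; `codons` is an illustrative helper, and the "error" is modeled simply as a splice junction landing one base too far into the second exon.

```python
def codons(seq: str) -> list:
    """Split a sequence into complete triplets, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical exons.
exon1 = "AUGGCU"
exon2 = "CAAGCUGCU"

correct = exon1 + exon2           # junction exactly at the exon boundary
off_by_one = exon1 + exon2[1:]    # junction one base too far into exon 2

print(codons(correct))      # ['AUG', 'GCU', 'CAA', 'GCU', 'GCU']
print(codons(off_by_one))   # ['AUG', 'GCU', 'AAG', 'CUG']
```

Every codon after the junction changes, which is the frameshift described above.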
So how does this splicing happen?
There are actually three main mechanisms, though one dominates for pre-mRNAs.
What are they?
The first two are a bit more specialized.
Some tRNA introns are removed by a fairly straightforward enzymatic process.
A specific endonuclease cuts the intron out, and then a ligase joins the two tRNA halves back together.
Protein enzymes do the job.
Simple cut and paste.
Relatively, yes.
The second was a huge discovery.
Autocatalytic splicing.
Found in some RNA precursors, like in the protist Tetrahymena, and also in some mitochondrial and chloroplast RNAs.
Here, the RNA molecule itself performs the splicing reaction.
The RNA acts as the enzyme, a ribozyme.
Exactly.
It catalyzes its own excision.
The Tetrahymena rRNA intron splicing needs a free guanosine nucleotide, G-OH, as a cofactor, but requires no protein enzymes for the cleavage and ligation steps.
This discovery proved RNA could be catalytic, challenging the protein-only enzyme dogma.
That's amazing.
But most mRNA introns aren't self-splicing.
No.
The vast majority of introns in nuclear pre-mRNAs in eukaryotes are removed by the third mechanism, which involves a massive molecular machine called the spliceosome.
The spliceosome.
You mentioned it earlier.
What is it?
It's a huge dynamic complex, about the size of a ribosome.
It's composed of many proteins, plus five small nuclear RNAs, the U1, U2, U4, U5, and U6 snRNAs.
These snRNAs don't function alone.
They are complexed with proteins to form small nuclear ribonucleoprotein particles, or snRNPs, pronounced "snurps."
Snurps.
Okay.
How do they work?
The snRNPs are the key players.
They recognize specific, highly conserved, short sequences at the boundaries between exons and introns.
There's usually a GU sequence at the 5′ splice site, the beginning of the intron, and an AG sequence at the 3′ splice site, the end of the intron, within a larger consensus context.
The GU-AG rule.
Generally, yes.
The U1 snRNP binds to the 5′ splice site.
U2 binds to a specific branch point adenine within the intron, and then U4, U5, and U6 join to form the active spliceosome.
The snRNAs themselves likely catalyze the reaction, probably through base-pairing interactions that bring the key sites together.
And the process.
It involves two main cutting and joining steps.
First, the 5′ splice site is cleaved, and that free 5′ end of the intron is looped around and covalently linked to the branch point adenine, forming a characteristic lariat structure, like a lasso.
Loop, okay.
Then the 3′ splice site is cleaved, releasing the intron lariat, and simultaneously, the two exons are ligated together.
Precisely.
Seamlessly.
The lariat intron is then degraded.
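The net outcome of splicing, with the GU-AG rule as a sanity check, can be sketched in a few lines. This models only the result (introns out, exons joined), not the lariat intermediate or the snRNP steps. The sequences and the helper name `splice` are invented, and intron positions are supplied by hand rather than discovered.

```python
def splice(pre_mrna: str, introns: list) -> str:
    """Remove introns given as (start, end) half-open index pairs and join the exons.

    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} breaks the GU-AG rule")
        pieces.append(pre_mrna[cursor:start])   # exon sequence before this intron
        cursor = end                            # skip over the intron
    pieces.append(pre_mrna[cursor:])            # final exon
    return "".join(pieces)


# Hypothetical two-exon gene with one intron.
exon1 = "AUGGCU"
intron = "GUAAGUCCUACUAACUUUCAG"
exon2 = "CAAGCUUAA"
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)                    # AUGGCUCAAGCUUAA
print(mature == exon1 + exon2)   # True
```

The coordinates have to be exact; shifting either boundary by one base would fail the GU-AG check or leave stray bases in the message.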
Incredible precision from this huge snRNP machine.
It has to be perfect, every single time, for thousands of introns and potentially tens of thousands of genes.
And you hinted that this complexity isn't just about removing junk DNA.
It offers an advantage.
A massive advantage.
Alternative splicing.
Because genes are split into exons and introns, the spliceosome doesn't always have to connect the exons in the same way.
You mean it can skip an exon?
Or use a different splice site?
Exactly.
By including or excluding certain exons or using alternative splice sites, a single pre-mRNA transcript from one gene can be processed into multiple different mature mRNAs.
Which means one gene can encode multiple distinct proteins, often with different functions or properties, perhaps expressed in different tissues or at different times.
It vastly expands the coding potential of the genome without needing vastly more genes.
It's a major source of protein diversity in complex organisms.
Wow.
So the introns aren't just junk.
They're punctuation that allows for recombination of the coding parts.
That's a great way to put it.
It allows for modular protein construction and evolution.
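The combinatorial payoff of exon skipping can be counted directly. The sketch below makes a simplifying assumption that is not from the chapter: the first and last exons are always kept, every internal exon is independently optional, and exon order never changes. Real genes are far more constrained, and `isoforms` is only an illustrative name.

```python
from itertools import combinations


def isoforms(exons: list) -> list:
    """List every exon combination that keeps the first and last exon, in order."""
    first, *middle, last = exons
    result = []
    for k in range(len(middle) + 1):
        for chosen in combinations(middle, k):   # combinations preserves exon order
            result.append([first, *chosen, last])
    return result


# Hypothetical four-exon gene.
for isoform in isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
# E1-E4
# E1-E2-E4
# E1-E3-E4
# E1-E2-E3-E4
```

Under these assumptions a gene with n optional internal exons could yield up to 2 to the power n different messages, which is the sense in which one gene can encode many proteins.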
Okay, let's synthesize this.
We've gone from the central dogma's basic flow to the nuts and bolts of transcription.
We saw the prokaryotic model: sigma factor for initiation, rho-dependent or rho-independent termination, super fast coupling with translation.
Streamlined efficiency.
Then the eukaryotic leap in complexity. Multiple polymerases, especially Pol II, needing basal factors like TBP for initiation.
Then the crucial post-transcriptional processing.
The 5′ cap for protection and recognition.
The 3′ poly(A) tail for stability and export.
And even RNA editing changing the message itself.
Layers of regulation and preparation.
And finally, the reality of interrupted genes, exons and introns, and the absolute necessity of precise splicing, carried out by tRNA enzymes, self-splicing RNAs, or most commonly the massive spliceosome complex built from snRNPs.
Which itself enables the power of alternative splicing, generating protein diversity from single genes.
Looking across all of this, what really stands out is the, I guess, the sheer molecular engineering involved.
It really is engineering.
Think about the rho protein physically chasing down the polymerase, or the intricate dance of the snRNAs within the spliceosome, recognizing those tiny GU-AG signals millions of times over.
These are molecular machines managing vast amounts of information with incredible fidelity.
It's truly mind -boggling.
Which brings us to that final thought for you, our listener.
Given the enormous cellular resources poured into making spliceosomes, and accurately removing potentially millions of bases of intron sequence from transcripts, why?
Why keep introns?
Is this elaborate exon-intron structure just an evolutionary leftover, some ancient baggage?
Or, as alternative splicing suggests, is the very presence of introns, and the complex machinery needed to deal with them, actually a key innovation?
Perhaps the key innovation that unlocked the functional complexity of eukaryotes by allowing genes to be mixed and matched.
Something to ponder, definitely.
The cost versus the benefit.
Think about that level of biological design, or perhaps emergent complexity.
It's quite something.
Thank you for diving deep with us today into the world of transcription and RNA processing.
It's complex, but hopefully seeing the pieces connect makes it a bit clearer.
Thank you for being a part of our little Last Minute Lecture family.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.