Chapter 5: Gene Expression: Transcription
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome back to the Deep Dive.
Our mission here is pretty simple.
We slice through the complexity of heavy scientific research and just deliver the core knowledge pulled directly from some really comprehensive source material.
Today we are cracking open the master blueprint for life itself.
We are diving into a process called gene expression.
This is the molecular choreography that transforms that silent information stored away in DNA into the active working parts that run every single cell in your body.
Right.
And if you've ever just marveled at modern genetics, I mean how researchers can clone an entire organism or design these super specific drug therapies that target one single gene defect.
Or even forensics.
Or forensics, yeah.
Identifying a person from a tiny sample.
It all comes back to a deep, deep understanding of this one process.
It's really the central mechanism of how life works.
It absolutely is.
And our deep dive today is focusing on the very first and I'd argue most critical step, transcription.
That is the process of synthesizing a working copy of RNA from a DNA template.
And our sources for this are coming from a really solid college-level genetics chapter.
So we're going to map out the whole molecular mechanism, and we'll draw some pretty sharp comparisons between the, let's say, efficient bacterial system and the much more layered, complex eukaryotic machine.
Okay.
So let's start at the very foundation.
I think everything really kicks off with the central dogma, right?
A concept that Francis Crick coined way, way back in 1956.
Yep.
It's the simple, almost elegant path of information flow.
It goes DNA, which then hands the baton to RNA.
And then RNA hands it off to protein.
Exactly.
DNA to RNA to protein.
Transcription is that first absolutely crucial arrow.
And here's maybe the first key point you need to internalize.
When a cell does transcription, it is an act of selective, purposeful copying.
It doesn't just copy both strands of the DNA double helix.
Okay.
So why be so selective?
Why not just copy everything and sort it out later?
Well, because the RNA molecule you get in the end needs to have a very specific sequence to do its job, whether that's, you know, encoding a protein or becoming part of a ribosome.
If you transcribed the other DNA strand, the complementary one, you'd get an RNA molecule with a completely different base sequence.
Which would be useless.
Functionally incorrect.
Totally useless to the cell.
So transcription is all about choosing the right source material to create the correct functional product.
Got it.
So before we jump into the enzyme that's actually doing all this copying, let's look at the finished products, the RNA molecules themselves.
We always think about genes coding for proteins, but our sources detail four major classes of functional RNA.
Right.
And it's a great point because only one of these is actually destined to be translated into a protein.
The one we're most familiar with is mRNA, which stands for messenger RNA.
The messenger?
Yep.
These are the temporary transcripts of those protein coding genes.
They carry the instructions, including the specific amino acid sequence for a polypeptide.
They're like the mobile blueprint that you take from the DNA vault down to the factory floor.
Okay.
And then you have the other three, which are more like structural and transport players.
Exactly.
First up is rRNA, or ribosomal RNA.
These molecules are a core component.
They combine with special ribosomal proteins to form the ribosomes themselves.
And the ribosomes are the factories.
The ribosomes are the factory.
They're the massive complex machinery where protein synthesis, what we call translation, actually happens.
Makes sense.
What's next?
Next is tRNA or transfer RNA.
These are visually maybe the most interesting.
You often see them drawn in this kind of folded clover leaf shape.
Right.
And they function like molecular delivery trucks.
They go out, grab a specific amino acid, and ferry it over to the ribosome, making sure the amino acids line up in the right order.
The last one is a class that's specific to eukaryotes, so organisms like us.
It's snRNA, which stands for small nuclear RNA.
These little guys are essential for the massive cleanup job that has to happen after transcription in our cells.
They combine with proteins to form complexes called snRNPs, which are the core of the splicing machine.
Okay.
We'll definitely have to come back to splicing.
So let's talk about the workhorse, the enzyme doing the actual transcribing.
To be super precise, it's a DNA-dependent RNA polymerase, which just means it absolutely has to read a DNA template to synthesize RNA.
It can't just make it up.
And right away, this enzyme has a problem, right?
How does it get to the information that's all locked up inside that tightly wound double helix?
That's the first hurdle.
The enzyme has to locally unwind the DNA helix to even see the bases it needs to read.
And our sources point out a really critical divergence right here between bacteria and eukaryotes.
Exactly.
In simpler organisms like bacteria, the RNA polymerase is kind of a beast.
It's robust enough to handle the unwinding all by itself.
But in complex organisms like us, the job is shared.
Other proteins have to come in and help separate the strands near the start site, making sure the helix is properly untwisted.
All right.
Let's nail down the directionality.
I feel like this is always the part where people, you know, get a little tripped up.
Oh, absolutely.
So which way is the RNA actually built and how does that relate to how the DNA is read?
Right.
So the fundamental rule, and this is for all nucleic acid chemistry, is that RNA is always synthesized in the five prime to three prime direction.
Always.
5' to 3'.
Okay.
So to do that, the RNA polymerase has to move along the DNA template strand and read it in the opposite direction.
Three prime to five prime, 3' to 5'.
So we've got two strands of DNA, but only one is actually being read at any given time for a particular gene.
Let's clarify the terminology for these strands because I remember it being confusing.
It can be.
The strand the enzyme is actually reading, we call that the template strand.
Simple enough.
And the other one?
The other one, the complementary strand, has a couple of names.
You can call it the non template strand or, and this is maybe more logical, the coding strand.
Why coding strand?
It's not being used.
Because it has the exact same base sequence and the same polarity, the same 5' to 3' direction, as the RNA transcript that gets made.
Oh, that's a great shortcut.
It is.
The only difference is that DNA has thymine T and RNA has uracil U.
So you can look at the coding strand sequence,
mentally swap every T for a U and you have your RNA transcript.
That's why if you look up a gene sequence in a database, you're almost always looking at the coding strand.
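As a quick illustration of that shortcut, here is a minimal Python sketch. The function name and the example sequence are made up for illustration and do not come from the source chapter.

```python
def coding_to_rna(coding_strand: str) -> str:
    """Return the RNA transcript for a coding strand written 5' to 3'.

    The transcript has the same sequence and polarity as the coding
    strand, with every thymine (T) replaced by uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


# Hypothetical example sequence, chosen only to show the T-to-U swap.
print(coding_to_rna("ATGGCCTTA"))  # AUGGCCUUA
```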
That makes so much sense.
Okay.
So what's the actual chemistry of the synthesis?
What are the building blocks?
The precursors are what we call ribonucleoside triphosphates or NTPs.
So you have ATP, GTP, CTP and UTP.
The RNA polymerase just moves along the template, sees a base,
selects the correct matching NTP from the cellular soup and then catalyzes a phosphodiester bond to link it to the growing chain.
Now, here's a point I find really interesting, especially when you compare it to DNA replication.
Our sources stress that RNA polymerase can start a new chain completely on its own.
It doesn't need a primer.
Right.
Why is it so much less fussy than DNA polymerase, which absolutely needs a primer to get started?
That's a fantastic question.
It really gets to the core of their different roles.
DNA polymerase has to be, I mean, virtually flawless.
DNA is the permanent genetic archive.
It's the master blueprint.
So it needs a primer for a stable, accurate starting point.
And it has all this proofreading capability to fix almost every mistake.
But RNA is different.
RNA is temporary.
It's just a working copy.
If one RNA molecule has an error, the cell just degrades it and makes a new one.
It's no big deal.
So that lower requirement for fidelity means it can afford to just start synthesis de novo without a pre-existing primer sequence to build off of.
But the basic rules of pairing still apply just with the RNA twist.
Absolutely.
Adenine A in the DNA pairs with uracil U in the RNA.
So if your DNA template strand reads, say, 3'-[sequence]-5', the RNA that gets synthesized will be...
5'-[the complementary sequence, with U wherever the template has A]-3'.
Exactly.
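The pairing rule can be sketched in a few lines of Python. This is only an illustration: the function name and the example template are assumptions, not the sequence used in the source chapter. Because the template is written 3' to 5' and the RNA is written 5' to 3', the two read left to right in register, so no reversal is needed.

```python
# DNA template base -> RNA base that pairs with it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3_to_5: str) -> str:
    """Template strand written 3'->5'; returns the RNA written 5'->3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, written 3' to 5'.
print(transcribe_template("TACCGGAAT"))  # AUGGCCUUA
```

The output matches what you would get by taking the coding strand 5'-ATGGCCTTA-3' and swapping each T for a U.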
So this whole selective, directional, primer-less process is the engine.
But the controls, how the engine starts, runs, and stops, that's where things get really different between bacteria and us.
And I think it's best if we start with the streamlined, efficient approach of bacteria.
All right.
Let's dive into the E. coli model.
In a bacterium, the whole process from finding the gene to finishing the transcript is super fast and broken down into three really clear stages, right?
Initiation, elongation, and termination.
That's it.
And to understand how it starts, we first have to define the layout of a bacterial gene.
Our sources lay out three key functional regions that control transcription.
Okay.
Starting from the beginning, or upstream, we have the promoter.
This is a stretch of DNA that doesn't actually get transcribed, but it's where the RNA polymerase first docks.
And crucially, the promoter dictates not only the start site, but also the direction the polymerase will travel.
And that, in turn, is what determines which of the two DNA strands gets used as the template for that specific gene.
Got it.
And after the promoter?
Right after the promoter is the RNA coding sequence.
This is the main event.
It's the actual stretch of DNA that's going to be transcribed into the new RNA molecule.
And then at the very end downstream, you find the terminator.
The stop sign.
It's the sequence that tells the RNA polymerase, okay, you're done, stop synthesizing.
And we use this numbering system where the first nucleotide that gets transcribed is called +1, and everything upstream, so the whole promoter region, gets negative numbers.
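To make the numbering concrete, here is a small Python sketch that converts a gene-relative position into a 0-based string index. It assumes the usual convention that there is no position 0, so -1 sits immediately upstream of +1. The function and parameter names are made up for illustration.

```python
def position_to_index(position: int, plus_one_index: int) -> int:
    """Map a gene-relative position (+1, -10, -35, ...) to a 0-based index.

    plus_one_index is the 0-based index of the +1 nucleotide in the
    sequence string. Assumes the convention with no position 0.
    """
    if position == 0:
        raise ValueError("there is no position 0 in this numbering")
    if position > 0:
        return plus_one_index + position - 1
    return plus_one_index + position


# If +1 sits at index 40, then -35 is at index 5 and -10 at index 30.
print(position_to_index(-35, 40), position_to_index(-10, 40))  # 5 30
```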
Okay, let's focus on that first stage.
Initiation at promoters.
Bacteria are, as you said, models of efficiency.
They use just one single type of RNA polymerase for everything.
mRNA, tRNA, rRNA, all of it.
But here's the catch.
That core enzyme can't find the promoter by itself.
This is where the famous sigma factor comes in.
So the main enzyme, the core machinery, which is made of a few polypeptide subunits, two alphas, a beta, a beta prime, it has to bind to the sigma factor first.
Right.
And only when they're bound together do you get the active form, which is called the holoenzyme.
So what's the sigma factor's job?
Think of the core enzyme as a powerful motor.
And the sigma factor is like a specifically shaped key.
Its job is to fit into the ignition, the promoter, and only the promoter.
It ensures the holoenzyme binds stably only at the correct starting points.
Without sigma, the core enzyme is just this aimless machine that starts transcribing randomly all over the DNA, which is biologically useless.
So what is the sigma factor actually looking for?
What's the shape of the ignition?
It's recognizing two very specific, highly conserved sequence regions in the promoter.
We call them consensus sequences.
Okay, what are they?
The first one is called the -35 box.
It's located, as you'd guess, about 35 base pairs upstream of the start site.
The consensus sequence there is 5'-TTGACA-3'.
And the second one?
The second is the -10 box, which is often called the Pribnow box.
It's about 10 base pairs upstream, and its consensus is 5'-TATAAT-3'.
These two little sequences are the molecular street signs that basically tell the polymerase, land here.
And initiation is an actual physical process, a change in shape?
It's a physical transition, yes.
Initially, the holoenzyme finds and makes contact with these -35 and -10 regions, while the DNA is still fully double helical.
This is what we call the closed promoter complex.
But you can't transcribe from a closed helix.
Exactly.
So the next step is the transition to the open promoter complex.
The holoenzyme binds super tightly and actually untwists the DNA helix, specifically around that -10 region, which is rich in A's and T's and easier to pull apart.
Once it's unwound, the enzyme is perfectly positioned, oriented to read the template strand, and begin making RNA right at the +1 position.
That's fascinating.
And what's really cool from a regulatory standpoint is that our sources point out that these consensus sequences are rarely perfect.
Right, they're not.
And those variations in the promoter sequence lead to huge differences in how well the polymerase binds.
A weak match to the consensus means RNA polymerase binds less often, it initiates transcription more slowly, and that gives the cell this really subtle but powerful layer of control over how much of that gene's product gets made.
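One rough way to picture "match to the consensus" is to count matching positions. The sketch below uses the standard E. coli consensus sequences for the -35 box (TTGACA) and the -10 box (TATAAT). The scoring scheme itself is a toy assumption for illustration: real promoter strength also depends on things like the spacing between the two boxes, which this ignores.

```python
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"


def matches(site: str, consensus: str) -> int:
    """Count positions where the site agrees with the consensus."""
    return sum(a == b for a, b in zip(site.upper(), consensus))


def promoter_score(minus_35_site: str, minus_10_site: str) -> int:
    """Toy score from 0 to 12: higher means closer to consensus."""
    return (matches(minus_35_site, MINUS_35_CONSENSUS)
            + matches(minus_10_site, MINUS_10_CONSENSUS))


# A perfect promoter versus a hypothetical weaker one.
print(promoter_score("TTGACA", "TATAAT"))  # 12
print(promoter_score("TTTACA", "TATGTT"))  # 9
```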
And that control gets amplified even more through different types of sigma factors, right?
Yes.
This is a brilliant system.
The standard one, sigma-70, recognizes the consensus sequences we just talked about.
It handles most of the everyday, what you might call housekeeping genes.
But what if the cell gets stressed, like say it gets too hot?
That's when the cell needs to dramatically change what genes are being expressed.
So if an E. coli cell experiences a heat shock, it quickly starts producing a different sigma factor.
A new key for the same motor.
A new key for the same motor.
That sigma factor, sigma-32, recognizes a completely different set of consensus sequences at different positions.
And suddenly, the same core enzyme is directed to a whole new set of genes, specifically the genes that encode proteins needed to cope with heat stress.
It's an instant system-wide reprogramming, all orchestrated just by swapping out the sigma factor.
Wow.
And on top of all that, you still have other proteins like activators and repressors that fine tune things even further.
Of course.
Activators can bind nearby and help the polymerase bind more strongly, while repressors can physically block the polymerase's path, shutting transcription down.
It's a really layered and efficient system.
Okay.
Let's move on to stage two.
Elongation.
So initiation is successful.
The first few nucleotides of the RNA chain are made.
And once it gets to about eight or nine nucleotides long, the sigma factor has done its job.
It's no longer needed, so it just dissociates from the complex.
And the core enzyme takes over.
The core enzyme takes over, and it's incredibly processive.
It just chugs along the DNA.
In E. coli, at body temperature, it's moving at roughly 40 nucleotides per second.
It's continuously unwinding the DNA ahead of it and re-zipping the helix behind it, maintaining this little localized, untwisted region.
The transcription bubble of about 25 base pairs.
And inside that bubble, the new RNA chain is, what, temporarily stuck to the DNA template?
For a very short stretch, yes.
About nine bases of the new RNA remain base paired to the DNA, forming a short RNA-DNA hybrid helix.
The rest of the growing RNA transcript just kind of peels off and exits out the side of the enzyme as a single strand.
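To get a feel for that speed, here is a back-of-the-envelope calculation in Python. The rate of about 40 nucleotides per second is the figure quoted above; the 1,200-nucleotide gene length is a made-up example.

```python
def seconds_to_transcribe(length_nt: int, rate_nt_per_s: float = 40.0) -> float:
    """Rough elongation time, ignoring pauses, initiation, and termination."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_nt / rate_nt_per_s


# A hypothetical 1,200-nucleotide transcript at ~40 nt/s.
print(seconds_to_transcribe(1200))  # 30.0
```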
Now, you mentioned earlier that RNA polymerase has a lower fidelity requirement than DNA pol, but it must have some kind of quality control, right?
It can't just be making mistakes left and right.
It does.
Our sources describe two types of proofreading.
They're a bit different from the classic proofreading you see in DNA replication.
The first is a simple reversal.
If the enzyme adds the wrong nucleotide, it can actually reverse the polymerization reaction, backtrack one step, cut out the wrong base, and insert the correct one.
A quick little backspace and delete.
Pretty much.
The second one is a bit more dramatic.
The enzyme detects an error, pauses, and then backs up one or more steps.
Then it acts like a nuclease and cleaves the RNA chain to remove the whole segment with the error before it resumes synthesis going forward.
It's like a molecular editor making a bigger correction.
Okay, so after all that elongation, we finally hit stage three.
Termination.
The enzyme needs a clear signal to stop.
And in bacteria, there are two different ways this can happen.
Right.
Two distinct molecular mechanisms, and both are signaled by specific sequences in the DNA called terminators.
The first are the rho-independent terminators.
Independent, meaning they don't need any help.
No extra proteins required.
It's a self-terminating mechanism that relies purely on the structure of the RNA transcript itself.
So what does that structure look like?
The key is in the DNA sequence.
It contains a region called an inverted repeat, maybe 16 to 20 base pairs long, and it's immediately followed by a short string of AT base pairs.
An inverted repeat.
So the sequence on one strand is the reverse complement of itself further down.
Exactly.
So when the RNA polymerase transcribes that region, the new RNA transcript has a sequence that is self-complementary.
It immediately folds back on itself and base pairs, forming a very stable stem loop structure, which we call a hairpin.
Okay, so you have this big bulky hairpin structure forming right as it's coming out of the polymerase.
And that big structure physically jams up the works.
It causes the RNA polymerase to pause, to stall on the DNA.
And that pause is the key.
That pause is the tripwire.
Because right after the hairpin sequence, the polymerase is transcribing that AT-rich region.
This means you have a string of uracil nucleotides in the RNA paired with adenine nucleotides in the DNA.
And AU base pairs are weak.
They only have two hydrogen bonds.
Ah, so you've got the enzyme stalled by the hairpin, and at the same time, the connection between the RNA and the DNA is super flimsy.
Precisely.
The combination of the physical drag from the hairpin and the mechanical instability of that weak AU hybrid is enough to cause the whole complex to just fall apart.
The RNA polymerase dissociates, and the new RNA strand is released.
It's a really elegant, self-contained mechanism.
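The two ingredients described above, a self-complementary stretch that can fold into a hairpin and a run of U's right after it, can be searched for with a short Python sketch. Everything here is a simplified assumption for illustration: the function names, the 6-base stem, the allowed loop sizes, and the example RNA are all made up, and the stem is deliberately much shorter than the 16 to 20 base pair inverted repeats mentioned above.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A-U and G-C pairs only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_hairpin_terminator(rna: str, stem: int = 6, min_loop: int = 3,
                            max_loop: int = 8, u_run: int = 4) -> int:
    """Return the start index of the first stem whose partner lies a short
    loop downstream and is immediately followed by a run of U's, or -1."""
    rna = rna.upper()
    for i in range(len(rna) - 2 * stem - min_loop - u_run + 1):
        left = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = i + stem + loop
            right = rna[right_start:right_start + stem]
            tail = rna[right_start + stem:right_start + stem + u_run]
            # Slices that run off the end come back short and never match.
            if right == reverse_complement(left) and tail == "U" * u_run:
                return i
    return -1


# Hypothetical transcript: stem GCCGCC, loop AGAA, stem GGCGGC, then U's.
print(find_hairpin_terminator("CCGCCGCCAGAAGGCGGCUUUUUU"))  # 2
```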
Okay, so what's the second mechanism?
The second mechanism needs a helper protein.
These are the rho-dependent terminators.
The terminator sequences here look different.
They're rich in Cs and poor in Gs, and they don't have those inverted repeats, so they can't form a hairpin.
So this is where the rho protein comes in.
This is where rho comes in.
Rho is a molecular motor.
Specifically, it's a type of enzyme called a helicase, which unwinds nucleic acids.
It recognizes and binds to that C-rich sequence on the new RNA transcript upstream of where termination will actually happen.
So it hops on the RNA, and then what?
Once it's bound and starts moving, it burns ATP for energy to chase after the RNA polymerase, tracking along the RNA strand.
It's a race.
It's a race.
And when the polymerase reaches the actual termination site, it tends to pause for a moment, often due to some subtle cues in the DNA sequence.
That pause gives rho the chance to catch up.
And when it catches the polymerase?
When it catches up to the stalled polymerase, it uses its helicase activity to literally unwind the RNA-DNA hybrid helix inside the transcription bubble.
It just rips the two apart, and that forces the dissociation of the RNA, the polymerase, and the DNA.
Termination complete.
That is just so cool.
The sheer elegance of the bacterial system.
One polymerase, a simple sigma factor for starting, and then two totally distinct ways to stop, one that's purely mechanical and structural, and one that's enzymatic and requires energy.
It's incredibly streamlined.
But now we have to shift gears completely, because if the bacterial system is a sleek utility vehicle, the eukaryotic system is like a vast interconnected industrial complex.
Right.
The complexity is just on another level, which makes sense, I guess.
We have enormous genomes all packed up into chromatin, and you need specialized tools to handle all of that.
The first sign of that specialization is the division of labor.
Unlike bacteria with their one-size-fits-all polymerase, eukaryotes have three different nuclear RNA polymerases, and each one is dedicated to a specific job.
Okay, let's break them down.
Who's first?
First is RNA polymerase I.
It lives in a specific part of the nucleus called the nucleolus, which is basically the ribosome factory.
So its one and only job is to synthesize most of the ribosomal RNA components.
The 28S, 18S, and 5.8S rRNAs.
A dedicated specialist.
Okay, who's next?
Next is the rock star, RNA polymerase II.
It's found in the main part of the nucleus, the nucleoplasm, and it's responsible for synthesizing all of the mRNAs, so all the protein-coding genes, plus some of those essential snRNAs we mentioned earlier.
Our sources even note its structure is a bit like a U-shaped clamp that helps it hold on to the DNA as it moves.
This is the polymerase that transcribes the vast majority of genes that make us who we are.
And the third one?
RNA polymerase III.
Also in the nucleoplasm, it handles the rest of the small structural RNAs.
All the tRNAs, the last little ribosomal component called 5S rRNA, and the rest of the snRNAs.
This level of specialization is really the first big clue that eukaryotic control is going to be way more intricate.
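That division of labor can be written down as a small lookup table. This Python sketch just restates the assignments described above; the dictionary and function names are made up for illustration.

```python
# Products of each nuclear RNA polymerase, as described in the discussion.
POLYMERASE_PRODUCTS = {
    "RNA polymerase I": ["28S rRNA", "18S rRNA", "5.8S rRNA"],
    "RNA polymerase II": ["mRNA", "snRNA"],
    "RNA polymerase III": ["tRNA", "5S rRNA", "snRNA"],
}


def polymerases_for(rna_class: str) -> list[str]:
    """Return the polymerases that synthesize the given class of RNA."""
    return [pol for pol, products in POLYMERASE_PRODUCTS.items()
            if rna_class in products]


print(polymerases_for("mRNA"))   # ['RNA polymerase II']
print(polymerases_for("snRNA"))  # ['RNA polymerase II', 'RNA polymerase III']
```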
Okay, so let's focus on Pol II, the protein coding machine.
A key difference you mentioned is that these eukaryotic genes have promoters, but they don't have those specific dedicated terminator sequences like bacteria do.
That's right.
There's no hairpin loop or rho protein.
Termination for Pol II is a much messier, less defined process.
And because of that and other things, the initial product that Pol II makes is not a finished mRNA.
It's what we call a precursor mRNA or pre-mRNA.
So it's a rough draft.
A very rough draft.
It has to go through extensive post-transcriptional processing in the nucleus before it's mature and ready to be translated.
Let's look at the starting line then.
The promoters and regulatory elements for these Pol II genes.
They're a lot bigger and more modular than the simple bacterial ones, right?
Much bigger.
They can stretch 200 base pairs or even more upstream.
And we can categorize the different parts based on what they do.
First, you have the core promoter elements.
These are right near the start site within about 50 base pairs.
And their job is just to position the enzyme correctly.
And this includes the famous TATA box.
It does.
The core promoter usually includes the Inr, or initiator element, which is right at the +1 start site, and the TATA box, also called the Goldberg-Hogness box, which is found at about the -30 position.
Its consensus sequence is 5'-TATAAAA-3'.
So if you have just that core promoter, can you get transcription?
Barely.
The core elements are like the physical anchor.
The machinery can assemble there, but the level of transcription you get is extremely low, almost background noise.
To really turn the gene on, you need a serious boost from the promoter-proximal elements.
Okay.
And these are further upstream.
Further upstream, from about -50 to -200 base pairs.
These are the real volume control knobs for the gene.
Classic examples are the CAAT box around -75, and the GC box at around -90.
If you get a mutation in one of these proximal elements, transcription efficiency plummets.
They are absolutely critical for high-level expression.
And this allows for specialization, right?
Different cells turning on different genes.
Precisely.
Housekeeping genes, the ones that every cell needs for basic metabolism, they have proximal elements that are recognized by activators found in all cells.
But cell-specific genes, say a gene only needed in a liver cell, will have elements that are only recognized by activators unique to the liver.
That's how you get tissue-specific gene expression.
But the control doesn't even stop there.
There's another layer.
The enhancers.
Ah, yes.
The molecular remote controls.
These are other DNA sequences that can modulate transcription from, I mean, a stunning distance.
Thousands of base pairs away.
They can be upstream, downstream, even located inside the gene itself within an intron.
So how on earth does a sequence thousands of bases away influence what's happening at the promoter?
It's all about the physical flexibility of DNA.
Special activator proteins bind to these short sequences within the enhancer.
Then the DNA itself forms a massive loop, which physically brings that distant enhancer and its bound activators right up next to the core promoter and the RNA polymerase II machinery.
That physical contact is what's required to stimulate transcription to its maximum level.
So you have this layering of control.
The TATA box for location, the proximal elements for volume, and the enhancers for this kind of remote-controlled super boost.
It's an incredibly sophisticated system, but it also presents a huge challenge for, say, genomics researchers,
which our sources touch on.
You can't just scan the human genome for TATA boxes and expect to find all the genes.
The signals are too variable, and the enhancers are too far away.
You have to combine that with other computational and experimental evidence.
All right, let's get into the nitty gritty of how this all starts.
Initiation by RNA polymerase II.
And the absolute key difference, the thing you have to remember, is that eukaryotic polymerases cannot bind to DNA directly.
They are completely helpless on their own.
They are totally dependent on a whole crew of helper proteins called general transcription factors, or GTFs.
These GTFs have to assemble on the promoter first, building a kind of landing pad for the polymerase.
And they all have these systematic names like TFIIA, TFIIB, and so on, for transcription factor for Pol II.
So can you walk us through the step-by-step assembly of this pre-initiation complex, or PIC, as we understand it from experiments done in vitro?
Okay, so the very first commitment, the first step, is made by a GTF called TFIID.
TFIID itself is a complex that contains the TATA-binding protein, or TBP.
TBP is like a molecular saddle that recognizes and sits right down on the TATA box, and in doing so, it actually puts a sharp bend in the DNA.
That binding is the anchor point for everything else.
So TFIID is the first one on the scene.
Then what?
Once TFIID has staked its claim, TFIIA and TFIIB come in and bind.
TFIIB is particularly important because it acts as a bridge to help recruit the polymerase itself.
Ah, so that leads to the next step.
RNA polymerase II then shows up, along with another factor, TFIIF, and they bind to the growing complex.
Right, and notice that Pol II is being physically brought to the promoter.
It's completely incapable of finding it on its own.
So now we have the polymerase in place, but the motor's not on yet.
What's the final step?
To complete the PIC and get things started, you need two more factors,
TFIIE and, crucially, TFIIH.
TFIIH has a very special job.
It has helicase activity.
A helicase.
So its job is to unwind the DNA.
It's the engine starter.
Just like the bacterial polymerase had to untwist the DNA to make the open complex, Pol II needs help.
TFIIH uses its helicase activity to unwind the promoter DNA right near the start site, creating that little bubble that's necessary for transcription to begin.
Without TFIIH, the whole complex can assemble, but the DNA stays closed and nothing happens.
But you mentioned this nice, neat, sequential assembly is what we see in a test tube, in vitro.
What about in a real cell?
In a living cell in vivo, it's way more complicated.
For one thing, the DNA is wrapped up in nucleosomes.
It's actually more likely that many of these GTFs, and maybe even the polymerase itself, arrive at the promoter as one giant preformed complex ready to dock.
But the main takeaway is the same.
To get efficient transcription, you absolutely need those activator proteins bound to the upstream elements and distant enhancers to give the go signal.
Okay, so transcription has successfully started.
We're making this raw, unprocessed pre-mRNA.
Now we get into that essential phase of RNA processing.
So let's recap the big differences between the final mRNA product in bacteria versus eukaryotes.
Right, so in both, a mature mRNA has three main parts: a 5' untranslated region, the protein-coding sequence in the middle, and then a 3' untranslated region.
But in prokaryotes, it's simple.
The transcript that's made is the final mRNA.
It's collinear with the gene.
And it's often polycistronic, meaning one mRNA molecule can code for several different proteins.
And this is critical.
Bacteria don't have a nucleus.
There's no physical barrier.
So a ribosome can hop onto the 5' end of the mRNA and start translating it into protein, while the RNA polymerase is still busy transcribing the 3' end.
We call that coupled transcription and translation.
But in eukaryotes, it's the complete opposite.
Everything's inverted.
The pre-mRNA has to be heavily processed in the nucleus first.
It's almost always monocistronic, so one gene per mRNA.
And after all that processing, it has to be actively exported out of the nucleus to the cytoplasm for translation.
There's no coupling.
This processing involves three big steps, right?
Capping, tailing, and splicing.
Let's start with modification one, the 5' cap.
This happens almost immediately.
When RNA Pol II has made just 20 or 30 nucleotides of the pre-mRNA, a special capping enzyme complex comes in.
It adds a guanine nucleotide, which then gets methylated to form 7-methylguanosine, or m7G.
And the chemistry of how it's attached is really important.
The chemistry is the crucial part.
This m7G is added with a really unusual 5'-to-5' linkage.
The normal bond in an RNA chain is a 5'-to-3' phosphodiester bond.
This one is essentially put on backwards.
Why the weird backward bond?
It's a molecular roadblock.
There are enzymes in the cell called exonucleases whose job is to chew up RNA from the ends.
They work by recognizing that normal 5' end.
This unnatural linkage basically hides the end from them, protecting the mRNA from being rapidly degraded.
So it's a protective shield.
It's a protective shield, but it also has another job.
The cap is the primary recognition signal for the ribosome in the cytoplasm.
It's the start here sign for translation.
No cap, no protein.
It's that simple.
Okay.
Next up is modification two, the 3' poly-A tail.
This is a long string of adenine nucleotides, right?
A very long string.
Anywhere from 50 to 250 adenines added to the 3' end after transcription is done.
And importantly, there's no long string of T's in the DNA that codes for this.
It's added post-transcriptionally.
So how does the cell know where to add it?
There are specific consensus sequences in the RNA transcript.
The polymerase actually transcribes past the eventual end of the mRNA.
In that extra bit of RNA, there's a key sequence, 5'-AAUAAA-3'.
And that's the signal.
That's the signal.
A whole team of cleavage factor proteins binds to that region and then physically cuts the RNA about 10 to 30 nucleotides downstream of that signal sequence.
So now you have a fresh 3′ end.
A fresh 3′ end.
And an enzyme called poly-A polymerase, or PAP, takes over.
It uses ATP as a building block and just starts adding adenines one by one, building that tail.
And as the tail gets longer, other proteins bind to it to help manage and protect it.
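The cleave-then-tail sequence just described can be sketched in a few lines of Python. This is a toy illustration, not a model of the real enzymes: the function name, the default cut offset, and the default tail length are made up for the example, and the signal is taken to be the widely reported AAUAAA consensus.

```python
# Assumed consensus poly-A signal (widely reported; treated here as an assumption).
POLY_A_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(transcript: str, offset: int = 20, tail_length: int = 200) -> str:
    """Cut `offset` nt downstream of the signal, then append a poly-A tail.

    `offset` and `tail_length` are hypothetical knobs limited to the ranges
    quoted in the discussion (10-30 nt downstream, 50-250 adenines).
    """
    if not 10 <= offset <= 30:
        raise ValueError("offset should be 10-30 nucleotides downstream of the signal")
    if not 50 <= tail_length <= 250:
        raise ValueError("tail_length should be 50-250 adenines")

    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no poly-A signal found in transcript")

    cut_site = signal_start + len(POLY_A_SIGNAL) + offset
    if cut_site > len(transcript):
        raise ValueError("transcript ends before the cleavage site")

    cleaved = transcript[:cut_site]      # everything past the cut is discarded
    return cleaved + "A" * tail_length   # adenines are added without a DNA template


# The polymerase has transcribed well past the eventual end of the mRNA.
pre_mrna = "GGCUAC" + POLY_A_SIGNAL + "C" * 40
mature_end = cleave_and_polyadenylate(pre_mrna, offset=20, tail_length=50)
print(len(pre_mrna), len(mature_end))
```

Note that the tail is built by plain string repetition, which mirrors the point that no run of T's in the DNA codes for it.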
What's the function of this tail?
Is it just for protection too?
It's critical for a few things.
First, it's like a passport.
It's required for the mature mRNA to be efficiently exported from the nucleus.
Second, once it's in the cytoplasm, it acts like a molecular timer.
Exonucleases start chewing away at that tail.
And the longer the tail, the longer the mRNA survives and the more protein can be made from it.
So it's key for mRNA stability.
And it helps with translation.
Yes.
It interacts with proteins on the 5′ cap to help form a loop, which makes translation much more efficient.
And this whole cleavage process also explains how Pol II finally terminates.
It does.
The leading model is that after the pre -mRNA is cleaved at that poly -A site, the bit of RNA still attached to the polymerase is now unprotected.
It has a raw 5′ end.
Which is a target for those exonucleases.
Exactly.
A special 5′-to-3′ exonuclease latches onto that raw end and starts degrading it, racing along the RNA towards the polymerase.
Eventually it just catches up to the polymerase and causes the whole complex to destabilize and fall off the DNA.
It's called the torpedo model.
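The chase in that model is a simple catch-up problem, and a rough sketch makes the logic concrete. The rates and the head start below are invented numbers chosen only to make the arithmetic easy; they are not measured values, and the function name is hypothetical.

```python
def torpedo_catch_up(head_start_nt: float, pol_rate: float, exo_rate: float) -> float:
    """Seconds until the exonuclease reaches the polymerase.

    head_start_nt: RNA between the cleavage site and the polymerase (nt)
    pol_rate:      polymerase speed in nt per second (assumed)
    exo_rate:      exonuclease speed in nt per second (assumed)
    """
    if exo_rate <= pol_rate:
        raise ValueError("the exonuclease must be faster than the polymerase to catch it")
    # The gap closes at the difference between the two speeds.
    return head_start_nt / (exo_rate - pol_rate)


# Illustrative numbers only.
seconds = torpedo_catch_up(head_start_nt=1000, pol_rate=25, exo_rate=75)
extra_transcribed = 25 * seconds  # nucleotides made after cleavage, before termination
print(seconds, extra_transcribed)  # 20.0 500.0
```

The only point the sketch makes is qualitative: termination requires the exonuclease to outpace the polymerase, and the polymerase keeps transcribing some distance past the poly-A site before it is knocked off.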
Okay, so finally, we get to the most complex and I think the most revolutionary modification.
Modification three, intron removal or RNA splicing.
Yeah, this one completely changed our understanding of what a gene is.
Before the late 70s, everyone just assumed that the sequence of a gene in the DNA was perfectly collinear with the protein it coded for.
And that idea got blown up by Philip Leder's group studying the mouse β-globin gene.
It did.
They found that the pre-mRNA in the nucleus was about 1.5 kilobases long, but the final mature mRNA they found in the cytoplasm was only 0.7 kilobases.
Almost half of it was missing.
And those missing pieces were the proof for introns, the intervening non-coding sequences that separate the exons, the expressed sequences that actually make it into the final mRNA.
Right.
And the cell now has this incredibly difficult task.
It has to remove every single intron with surgical precision and stitch the remaining exons together.
If you're off by even one nucleotide.
You get a frameshift and the protein is garbage.
Total garbage.
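To see why a one-nucleotide slip is so damaging, here is a small sketch that reads a spliced message in triplets. The exon sequences are invented for the example, and the "sloppy" splice simply leaves one intron nucleotide behind.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical exons; the intron between them is assumed to start with G.
exon1 = "AUGGCU"
exon2 = "GAACCGUAA"

correct = exon1 + exon2        # precise splice
sloppy = exon1 + "G" + exon2   # one intron nucleotide left in by mistake

print(codons(correct))  # ['AUG', 'GCU', 'GAA', 'CCG', 'UAA']
print(codons(sloppy))   # ['AUG', 'GCU', 'GGA', 'ACC', 'GUA']
```

Every codon downstream of the bad junction changes, and in this example the stop codon at the end is lost as well.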
The machine that does this is called the spliceosome.
And the spliceosome is this enormous complex made of several of those snRNPs we mentioned earlier, the small nuclear ribonucleoprotein particles.
So that's U1, U2, U4, U5, U6.
It's a massive dynamic machine.
And the whole process is guided by recognizing consensus sequences at the boundaries of the introns, the 5′ splice junction, which is usually a GU, and the 3′ splice junction, which is usually an AG.
Can you walk us through the basic steps of how it works?
Sure.
So step one, the U1 snRNP is the initial spotter.
It binds to the 5′ splice junction, mostly through base pairing.
U1 marks the beginning of the intron.
Step two, the U2 snRNP binds to something called the branch point sequence.
This is a special sequence inside the intron upstream of the 3′ end.
And it always contains a very important adenine nucleotide.
That adenine is going to be key for the chemistry.
Absolutely essential.
Step three, a big complex containing U4, U6, and U5 snRNPs comes in.
This acts like a molecular crane, pulling the two ends of the intron together, causing the intron to loop out.
So now the beginning and the end of the intron are held close together.
Right.
Step four, U4 snRNP leaves.
Its job was to bring U6 to the party.
And once U6 is in place, U4 dissociates.
This creates the active spliceosome.
It's now ready to cut.
And now the chemistry happens.
Now the chemistry.
Step five, the spliceosome makes the first cut right at the 5′ junction.
The free 5′ end of the intron is then immediately swung over and attached to that special adenine at the branch point.
That's a very weird bond.
A 2′-to-5′ phosphodiester bond.
This creates a distinctive loop structure called an RNA lariat.
Like a cowboy's rope.
Exactly like a lariat.
Step six, with the lariat formed, the spliceosome makes the second cut at the 3′ junction, releasing the lariat, which gets degraded.
And in that same instant, the two flanking exons are perfectly ligated or spliced together.
And this whole process repeats for every single intron.
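The net result of those six steps, stripped of all the snRNP choreography, can be sketched as a string operation. This is a deliberately simplified illustration: the sequences are invented, intron positions are handed in rather than discovered, and the branch-point check only asks that some adenine exists inside the intron.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open intervals and join the exons."""
    pieces = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Boundary consensus: GU at the 5' splice junction, AG at the 3' junction.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        # Crude stand-in for the branch point: an adenine somewhere inside.
        if "A" not in intron[2:-2]:
            raise ValueError(f"intron {start}-{end} has no candidate branch-point adenine")
        pieces.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the excised intron
    pieces.append(pre_mrna[position:])           # final exon
    return "".join(pieces)


# Hypothetical three-exon gene.
exons = ["AUGGCU", "GAACCG", "UAA"]
intron_seqs = ["GUAAGUCUAACUUUCAG", "GUGAGUACUAACCCAG"]
pre_mrna = exons[0] + intron_seqs[0] + exons[1] + intron_seqs[1] + exons[2]

i1_start = len(exons[0])
i1_end = i1_start + len(intron_seqs[0])
i2_start = i1_end + len(exons[1])
i2_end = i2_start + len(intron_seqs[1])

mature = splice(pre_mrna, [(i1_start, i1_end), (i2_start, i2_end)])
print(mature)  # AUGGCUGAACCGUAA
```

The loop handles one intron per pass, which matches the point that the process repeats for every intron in the transcript.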
And this incredible mechanism opens the door to one of the most important concepts in modern biology.
Alternative splicing.
It really does, because the splicing machinery itself can be regulated by other proteins.
A single gene's pre-mRNA can be spliced in different ways in different cells.
So a heart cell might decide to include exon four, but a brain cell might decide to skip it.
Exactly.
And by doing that, they produce two slightly different or variant polypeptides from the very same gene.
This is how humans with only around 20,000 protein-coding genes can produce a proteome, a collection of proteins, that's estimated to have over a hundred thousand distinct proteins.
It's a massive force for generating complexity.
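The heart-versus-brain example can be written out as a tiny sketch. The exon sequences and the `isoform` helper are hypothetical, and the closing count assumes each optional exon is included or skipped independently, which gives an upper bound rather than a description of any real gene.

```python
def isoform(exons: dict[int, str], skipped: tuple[int, ...] = ()) -> str:
    """Join exons in numerical order, leaving out any exon numbers in `skipped`."""
    return "".join(seq for number, seq in sorted(exons.items()) if number not in skipped)


# Hypothetical five-exon gene; exon 4 is the one that gets regulated.
gene_exons = {1: "AUGGCU", 2: "GAACCG", 3: "UUCGGA", 4: "CCAAAG", 5: "UGA"}

heart_mrna = isoform(gene_exons)                # includes exon 4
brain_mrna = isoform(gene_exons, skipped=(4,))  # skips exon 4

print(heart_mrna)  # AUGGCUGAACCGUUCGGACCAAAGUGA
print(brain_mrna)  # AUGGCUGAACCGUUCGGAUGA

# If n exons were each independently optional, one gene could yield up to 2**n mRNAs.
optional_exons = 3
print(2 ** optional_exons)  # 8
```

Because the skipped exon here is a multiple of three nucleotides long, the reading frame downstream is preserved and the two products differ only by the amino acids that exon encodes.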
So the modern view is that all these things, transcription, capping, tailing, splicing, they aren't separate events happening one after another.
Not at all.
They are functionally and physically coupled.
The machinery for all these processing steps actually rides along on the tail of RNA polymerase II, so that the modifications can happen as the pre-mRNA is being synthesized.
It's a continuous integrated production line.
That brings us to some really specialized, almost strange RNA phenomena that challenge our whole view of RNA as just a passive messenger.
These discoveries really turned the field on its head.
They show that RNA can be an active player, a catalyst, an editor.
The first one is the wild discovery of self-splicing introns.
This came out of Tom Cech's group studying the rRNA gene in a little protozoan called Tetrahymena, a Nobel Prize-winning discovery.
And what they found was that the intron in this pre-rRNA molecule could cut itself out with no proteins involved.
It was a protein-independent reaction.
The RNA molecule itself folds into a very specific 3D structure that has the catalytic activity to cut and ligate itself.
So the RNA was acting like an enzyme.
It was acting like an enzyme.
This led to the coining of the term ribozymes for catalytic RNA.
Now, technically, the self-splicing intron isn't a true enzyme because it only acts on itself once and then it's done.
But later, other RNA molecules were found that could act as true catalysts, cleaving other RNA substrates over and over again.
And this discovery had huge implications for how we think about the origin of life.
Massive.
It gave rise to the RNA world hypothesis, the idea that the very earliest life forms might not have had DNA or proteins.
They might have relied on RNA for everything, to store genetic information like DNA and to perform catalytic reactions like proteins.
It's the perfect two-in-one molecule for starting life.
Wow.
OK, what's the second big phenomenon?
The second is RNA editing.
This is any process that happens after transcription that results in an RNA sequence that does not match the DNA sequence it came from.
The cell is literally changing the genetic instructions after they've been written down.
And the classic, most extreme example of this comes from the mitochondria of parasitic protozoa, like Trypanosoma brucei, the bug that causes sleeping sickness.
It's an insane example.
In their COI gene, the final mature mRNA can be over 50% made up of uracil nucleotides that were inserted after transcription.
They weren't in the original gene at all.
How is that even possible?
How can
That is just unbelievably complex.
It's mind-boggling.
But it's not just a weird thing.
RNA editing happens in us, in mammals, and it has really important functions.
The textbook example is C-to-U editing in the mRNA for a protein called apolipoprotein B.
In your liver, the full -length mRNA is translated to make a very long protein that's involved in transporting cholesterol.
But in your intestine, the exact same pre-mRNA undergoes a single C-to-U editing event.
Just one base change.
One single base change.
It converts a CAA codon, which codes for the amino acid glutamine, into a UAA codon.
UAA is a stop codon.
It's a stop codon.
So that one little edit introduces a premature stop signal, and the intestine ends up making a much, much shorter, but functionally distinct protein that's needed for absorbing lipids from your food.
All from the same gene.
It's another incredible way to create protein diversity from a limited number of genes.
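That single edit is easy to demonstrate with a toy translation. The message below is invented and the codon table covers only the handful of standard-code codons the example needs; only the CAA-to-UAA change itself comes from the discussion.

```python
# Small subset of the standard genetic code, enough for this example.
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "CAA": "Gln", "GAA": "Glu", "UAA": "STOP"}


def translate(mrna: str) -> list[str]:
    """Read codons from the first nucleotide and stop at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


def edit_c_to_u(mrna: str, position: int) -> str:
    """Convert the C at `position` (0-based) into a U."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a C at the target position")
    return mrna[:position] + "U" + mrna[position + 1:]


# Hypothetical message: AUG GCU CAA GAA GCU UAA
unedited = "AUGGCUCAAGAAGCUUAA"
edited = edit_c_to_u(unedited, 6)  # CAA (Gln) becomes UAA (stop)

print(translate(unedited))  # ['Met', 'Ala', 'Gln', 'Glu', 'Ala']
print(translate(edited))    # ['Met', 'Ala']
```

The unedited message stands in for the liver version and the edited one for the intestinal version: same transcript, one base changed, and a much shorter product.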
So we finish this deep dive with RNA looking a lot less like a passive messenger,
and a lot more like an active participant.
A structure that can catalyze its own reactions, and a transcript that can be fundamentally rewritten after it's been made.
The passive conveyor belt idea is long gone.
RNA is a dynamic, central player in the flow of genetic information.
Okay, that journey was incredible.
From the simple guidance of a sigma factor, all the way to the insane complexity of the spliceosome.
Let's try to recap the absolute highest yield principles for you, the learner.
First, just remember the fundamental rules of directionality.
Transcription is selective, only one strand is copied, and synthesis is always, always 5′-to-3′.
Second, the bacterial system is all about efficiency.
One polymerase that relies entirely on the sigma factor to find the correct -10 and -35 promoter boxes.
Third, the eukaryotic system is about specialization and control.
You have three polymerases, Pol I, Pol II, and Pol III, and Pol II initiation is completely dependent on that big assembly of general transcription factors, or GTFs, to get it to the TATA box.
Fourth, the eukaryotic pre-mRNA is just a rough draft.
It has to undergo those three essential coupled processing events, the protective 5′ cap with its weird 5′-to-5′ bond, the stabilizing 3′ poly-A tail, and the precise removal of introns by the spliceosome.
And finally, we learned that RNA has a life of its own.
It can be catalytic, as in ribozymes, and it's subject to extensive editing, which creates functional protein diversity that you could never predict just by looking at the DNA sequence alone.
Which brings us to our provocative final thought built directly on those last two points, alternative splicing and RNA editing.
If one single gene sequence in the DNA can be processed and edited to create dozens of different mature RNA molecules, and each of those encodes a slightly different protein,
what does that do to our classic one gene, one protein definition?
Right.
Does it even hold up anymore?
Should we maybe pivot our entire focus?
Instead of just counting the number of genes in a genome,
maybe the true measure of an organism's complexity is the vast and dynamic landscape of its mature RNA and protein products.
That's where all the final molecular decisions are actually being made.
The blueprint in the DNA is fixed, but the real artistry happens in the editing room.
I love that.
The artistry happens in the editing room.
Thank you so much for joining us on this deep dive into transcription.
My pleasure.
We'll catch you next time.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.