Chapter 18: Gene Expression I: Genetic Code & Transcription

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

We spent a lot of time navigating the double helix,

that massive elegant structure of DNA.

We understand how this grand blueprint replicates itself faithfully, making sure every new cell gets the exact same instructions.

The archive, yeah.

But an archive is just static information until you actually use it.

So if DNA is the master vault, the protected source material, how does the cell actually pull out a specific scroll, make a temporary working copy, and then turn that into a functional machine?

That question right there, that transition from archived instruction to active worker,

that is, I mean, that's the core functional challenge for every single living thing.

And the mechanisms for that are what we call gene expression.

Exactly.

So our mission today is a deep dive into the very foundation of this, of how information gets used.

We're covering the rules of the genetic code and the whole process of converting that DNA blueprint into an RNA copy, a process called transcription.

Okay, so let's unpack that.

At the heart of it all is a principle coined by Francis Crick, not long after the double helix was discovered.

He called it the central dogma of molecular biology.

And it dictates this very strict directional flow of information inside the cell.

And that direction is linear.

It's DNA makes RNA and RNA makes protein, DNA to RNA to protein.

That's it.

It's the main operating principle for how all cells use their genetic information to

build and maintain themselves.

And the terminology we use here is actually really telling.

Okay, so when DNA converts its information into RNA, we call that transcription.

Yes.

And that term is deliberate.

It's just a transfer of information from one nucleic acid, DNA, to another, RNA.

The basic language, the alphabet of nucleic acids, it stays the same.

A, G, and C, with T just replaced by U.

You're just copying a text from one medium to another.

You've got it.

It's a transcription.
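The "same alphabet, T swapped for U" idea can be sketched in a few lines of Python. This is only an illustration: the sequence is made up, and the function name `transcribe` is our own. It takes the coding (non-template) strand as input, since that strand reads the same as the RNA apart from T versus U.

```python
# Minimal sketch: the RNA transcript reads like the DNA coding strand,
# with every T replaced by U. The example sequence is hypothetical.

def transcribe(coding_strand: str) -> str:
    """Return the RNA corresponding to a DNA coding (non-template) strand."""
    return coding_strand.upper().replace("T", "U")

dna = "ATGGCCTTTTAA"
rna = transcribe(dna)
print(rna)  # AUGGCCUUUUAA
```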

But when that RNA working copy finally turns into a protein, that's called translation.

Translation.

Because now you're changing languages completely.

You're going from the nucleotide sequence of RNA, which is just those four bases,

to the amino acid sequence of a polypeptide chain, which has 20 different building blocks.

A total switch.

From the language of nucleic acids to the language of proteins.

It's a fundamental switch, yeah.

And this whole transcription process, it doesn't just create the one messenger molecule.

It actually creates three distinct types of RNA, right?

That's right.

And they're all essential.

The one carrying the protein recipe is mRNA, or messenger RNA.

The messenger.

It carries the genetic message from the DNA archive out to the assembly line.

But the assembly line itself is made of rRNA, or ribosomal RNA.

And that's not just structural, is it?

Not at all.

It's the functional catalytic core of the ribosome.

And then finally we have tRNA, or transfer RNA.

These are the crucial intermediaries.

They're the translators.

They are.

They read the coded sequence in the mRNA and physically bring the right amino acids to the ribosome to be assembled.

So it's super important to remember that the genes for rRNA and tRNA, they don't code for proteins.

They code for functional RNA molecules that are the final product themselves.

Exactly.

Which makes them just as vital.

And you know, if we connect this flow to the bigger picture of cell architecture,

where and when this happens differs dramatically between cell types.

Okay, let's start with prokaryotes.

In prokaryotes, there is no nucleus.

Their DNA is just right there in the cytosol, so it's not physically separated from the ribosomes.

So they can start building the protein before the message is even finished being written.

Instantly.

Transcription and translation are coupled.

They don't have to wait.

We have these incredible electron micrographs where you can see it happening.

You see the DNA being transcribed and attached to that growing strand of RNA are these little dark particles.

The ribosomes.

The ribosomes already translating the incomplete mRNA.

This whole cluster is called a polyribosome.

The process is immediate, it's simultaneous, and it's unbelievably fast.

That structural reality, no nucleus, just enables pure speed.

But now let's jump to us, the eukaryotes.

Things slow down a lot, but we gain precision.

We introduce compartmentalization.

Transcription happens inside the nucleus.

And after that, there's a ton of RNA processing and editing.

Only then can the mature mRNA be exported out to the cytosol where the ribosomes are waiting.

So that separation in space and time, it allows for way more complex regulation.

Vastly more.

The nucleus is both the protected archive and a dedicated editing suite.

Okay, you mentioned the central dogma has been refined a bit.

It sounds pretty linear, but you're saying information can sometimes flow backward.

Indeed.

While DNA to RNA to protein is the main highway, there are these crucial side roads.

The most important one is reverse transcription, where RNA actually serves as a template to make DNA.

Which completely reverses the flow.

Completely.

And this process requires a highly specialized, and at the time, really shocking enzyme.

Right, this was a huge deal when it was discovered.

A massive conceptual shift.

It was found independently by Howard Temin and David Baltimore.

The enzyme is called reverse transcriptase.

Its discovery in the early 70s really violated the established dogma and just fundamentally changed how we view genetic information.

Let's talk about the classic example.

Retroviruses.

Like HIV.

Okay, so retroviruses carry their genetic material as RNA, along with a few molecules of their own reverse transcriptase enzyme.

When the virus gets into a host cell, that enzyme does two things.

First, it makes a complementary DNA strand from the viral RNA template.

Second, it makes another DNA strand, complementary to the first one.

The result is a stable, double-stranded DNA version of the viral genome.
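Those two steps, RNA to a complementary DNA strand and then a second DNA strand complementary to the first, can be sketched with plain base-pairing rules. This is a toy model under stated assumptions: the RNA sequence is invented, the function names are ours, and the real enzyme's priming and RNA-degrading activities are ignored.

```python
# Toy model of reverse transcription. All strands are written 5'->3'.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna: str) -> str:
    """Step 1: build the DNA strand complementary to the RNA template."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))

def second_strand(first_strand: str) -> str:
    """Step 2: build the DNA strand complementary to the first DNA strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))

viral_rna = "AUGGCUUAC"          # hypothetical fragment
cdna = reverse_transcribe(viral_rna)
plus_strand = second_strand(cdna)

print(cdna)         # GTAAGCCAT
print(plus_strand)  # ATGGCTTAC, the RNA sequence with U replaced by T
```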

And that DNA then becomes a permanent part of the host cell.

The infection is indefinite.

Precisely.

This viral DNA goes into the nucleus and integrates itself right into the host cell's chromosomes.

At this point, we call it a provirus.

And the cell's own machinery just treats it like any other gene.

It does.

It transcribes it to make new viral RNA.

And that RNA does two jobs.

It acts as mRNA to make viral proteins, and it gets packaged into new virus particles.

So reverse transcription lets viruses basically weaponize the host's own systems against itself.

Yeah.

But you said this isn't just a viral trick.

It happens in our cells, too, with something called retrotransposons.

This is where it gets really, really interesting, because it shows that this violation of the dogma is actually a fundamental engine of evolution inside our own cells.

Okay.

So what are they?

Retrotransposons are segments of DNA that can move and duplicate themselves using an RNA intermediate and reverse transcription.

Walk me through the life cycle of one.

So it starts with the transcription of the retrotransposon DNA into RNA.

That RNA gets translated, making a protein that usually has reverse transcriptase activity.

Then that protein and the RNA bind to DNA somewhere else in the genome.

The protein cuts the target DNA, and the reverse transcriptase uses the RNA as a template to make a new DNA copy, which then gets integrated into that new spot.

So they copy themselves through an RNA intermediate and paste that new DNA copy somewhere else.

It's like we're part retrovirus.

It definitely suggests a deep evolutionary link, and what's even crazier is the scale of these things in our genome.

Take the Alu family of sequences.

They're short, about 300 base pairs, and they don't even encode their own reverse transcriptase.

So they have to borrow one.

They do.

And yet they've been incredibly successful.

The human genome has about a million Alu sequences.

That's roughly 11% of our entire DNA.

11%.

That is a massive, and I think often overlooked, evolutionary driver.

And they're not even the most common.

The L1 element, a LINE, is even more dominant.

L1 accounts for about 17% of human DNA.

These are bigger, and they do encode their own reverse transcriptase.

So all told, these elements have just profoundly shaped our genomes, contributing to variability, to regulation, and sometimes to disease.

It shows the genome isn't a static library at all.

It's a dynamic, self-editing ecosystem.

Okay, so we've got the roadmap.

DNA to RNA to protein, with some exceptions.

But before we get into the how of transcription, we have to talk about the code itself.

How does an alphabet of four letters, A, T, G, and C, specify the sequence of 20 different amino acids?

Right.

This is the transition that forces us to understand the rules of the code.

And that journey really began by linking a physical trait directly to a gene.

A connection that was famously made in the 1940s by Beadle and Tatum using, of all things, bread mold, Neurospora.

Their experiments are absolute cornerstones of molecular biology.

So Neurospora can normally grow on a minimal medium, just sugar, salts, and biotin.

It's self-sufficient.

Right.

A prototroph.

Beadle and Tatum zapped the mold with X-rays to create random mutations.

Then they screened for strains that could no longer survive on that minimal medium.

The crucial step was figuring out what nutrient those mutants needed.

They were mapping metabolic pathways.

Exactly.

By giving these mutants single nutrients, like a specific amino acid, they could pinpoint where the pathway was broken, and they inferred that each mutation disabled a single enzyme.

This led to their famous conclusion, the one gene-one enzyme hypothesis.

One gene controls one enzyme.

A huge step.

But that hypothesis needed a little refinement.

It did.

And that came in the 50s with Linus Pauling and Vernon Ingram's work on sickle cell anemia.

They focused on hemoglobin, which is a transport protein, not an enzyme.

Right.

They found that sickle cell hemoglobin behaved differently than normal hemoglobin, suggesting a chemical difference.

And Ingram's contribution was to break the protein into smaller pieces to analyze it.

That was the key.

He used an enzyme to chop up the hemoglobin, and he found that only a single little fragment differed between the normal and sickle cell forms.

And the difference was just one amino acid substitution.

A glutamic acid was replaced by a valine.

One base pair change in the DNA caused one amino acid change, which resulted in this dramatic, life-altering disease.

And that discovery refined the idea to the one gene-one polypeptide theory.

A gene determines the amino acid sequence of a specific polypeptide chain.

OK, so we know the product is a polypeptide chain.

But how is the recipe read?

Which brings us back to the math of the code.

Why did it have to be a triplet?

Mathematically, there was no other way.

You have an alphabet of four bases, and you need at least 20 unique words for the 20 amino acids.

A singlet code, 4 to the power of 1, only gives you four words.

Not enough.

A doublet code, 4 squared, gives you 16.

Still not enough.

Still four short.

But the triplet code, 4 cubed, gives you 64 possible combinations.

More than sufficient.

Exactly.

More than sufficient, which provides coding capacity plus a lot of redundancy.

That mathematical pressure is why biologists in the 50s were convinced it had to be a triplet code.
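The counting argument is easy to check directly. The sketch below just evaluates 4 to the power n for n = 1, 2, 3 and finds the smallest word length that covers 20 amino acids. The helper name `min_codon_length` is ours.

```python
# Counting argument for the triplet code: how many "words" can an alphabet
# of 4 bases form at each word length, and which length first covers 20?

def min_codon_length(alphabet_size: int, words_needed: int) -> int:
    """Smallest n such that alphabet_size ** n >= words_needed."""
    n = 1
    while alphabet_size ** n < words_needed:
        n += 1
    return n

for n in (1, 2, 3):
    print(n, 4 ** n)  # 1 4, then 2 16, then 3 64

print(min_codon_length(4, 20))  # 3
```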

But conviction isn't proof.

The rigorous evidence came in 1961 from Crick and Brenner, using the elegant logic of frameshift mutations.

One of the most beautiful experiments ever.

They used dyes that would cause the addition or deletion of a single base pair.

Let's use the sentence analogy.

If the message is read in three-letter words, like THE FAT CAT ATE THE RAT, and you insert one letter, say, an X.

It becomes THX EFA TCA TAT ETH ERA T.

The entire message downstream becomes total gibberish.

That's a plus mutation.

And a deletion, a minus mutation, does the same thing, just shifts the frame the other way.

Right.

And in both cases, you get a non-functional mutant phenotype.

But the truly critical finding came when they started combining these mutations.

When they made a double mutant, a plus and a minus, the effect of the second mutation largely canceled out the first.

The message was only garbled in the little bit between the mutations, but the reading frame was restored for the rest of the gene.

And that often resulted in a viable organism.

It did.

But the final proof, the one that screamed three, was the triple mutant.

If they made triple mutants of the same type, three additions or three deletions, the original phenotype was often restored.

Three changes restore the reading frame perfectly.

Three changes restoring the message means the reading unit has to be a group of three.

That established it's a triplet code read in a non-overlapping sequence.
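The sentence analogy can be played out in code. This is only the analogy, not the actual phage experiment: we read a string in groups of three and watch what a single insertion, an insertion plus a deletion, and three insertions do to the downstream words. The helper names are ours.

```python
# Reading-frame analogy: a message read strictly in three-letter words.

def read_frame(message: str, size: int = 3) -> list[str]:
    """Split a message into consecutive, non-overlapping words."""
    letters = message.replace(" ", "")
    return [letters[i:i + size] for i in range(0, len(letters), size)]

def insert(seq: str, pos: int, letter: str = "X") -> str:
    """Return seq with one letter inserted at pos (a 'plus' mutation)."""
    return seq[:pos] + letter + seq[pos:]

original = "THEFATCATATETHERAT"

plus = insert(original, 2)                       # one insertion
plus_minus = plus[:7] + plus[8:]                 # insertion, then one deletion downstream
triple = insert(insert(insert(original, 2), 4), 6)  # three insertions

print(read_frame(plus))        # ['THX', 'EFA', 'TCA', 'TAT', 'ETH', 'ERA', 'T']
print(read_frame(plus_minus))  # ['THX', 'EFA', 'TAT', 'ATE', 'THE', 'RAT']
print(read_frame(triple))      # ['THX', 'EXF', 'XAT', 'CAT', 'ATE', 'THE', 'RAT']
```

In the double and triple mutants only the words near the changes are garbled; everything after them is back in frame, which is the pattern that pointed to a reading unit of three.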

And those results immediately told us a few things about the nature of the code.

First, it must be degenerate.

We have 64 codons, but only 20 amino acids.

So most amino acids must be specified by more than one codon.

Which isn't a flaw, it's a huge biological advantage, a built-in redundancy.

It's a fantastic strategy for minimizing the impact of mutations.

A change in the third base of a codon often doesn't even change the amino acid.

Second, the code is non-overlapping.

If it were overlapping, a single base change would only affect a few amino acids, not garble the entire downstream message like they saw.

So it has to be read in discrete, non-overlapping groups of three.

And finally, it's unambiguous.

Every codon has one and only one meaning.

So how did they crack the full dictionary?

That was a race that took only five years, catalyzed by the idea that the code is read from the single-stranded mRNA sequence in the 5′-to-3′ direction.

And the experimental breakthrough came from Marshall Nirenberg and Heinrich Matthaei.

Yes, they used an enzyme that could make RNA without a DNA template.

So they fed it only one type of nucleotide, UTP, and it made a long chain of U's, poly-U.

And when they put that in their cell-free system?

It made a polypeptide consisting solely of phenylalanine.

First assignment cracked.

UUU codes for phenylalanine.

Incredible.

They quickly showed poly-A codes for lysine, poly-C for proline.

Then Har Gobind Khorana's group refined it by making polymers with defined, repeating sequences, which narrowed down the possibilities even further.

And by combining all these results, the entire codon dictionary was completed by 1966.

It was.

The full dictionary showed 64 codons, 61 of which encode amino acids.

Three of them, UAA, UAG, and UGA, are essential stop codons.

And one, AUG, is the universal start codon.

Correct.

It codes for methionine, and it's the signal to start protein synthesis.
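The dictionary itself fits in a small lookup table. The sketch below builds the standard code from a 64-letter string of one-letter amino acid symbols (with `*` for stop, bases ordered U, C, A, G), checks the 61-plus-3 split, and translates a made-up message from the first AUG to the first stop. The `translate` helper is a simplification of ours: it ignores ribosome binding and everything else about real initiation.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(len(CODON_TABLE), len(sense_codons), stop_codons)  # 64 61 ['UAA', 'UAG', 'UGA']
print(translate("GGAUGUUUAAACCCUAGGG"))                  # MFKP
```

The poly-U, poly-A, and poly-C results fall out of the same table: UUU is F (phenylalanine), AAA is K (lysine), and CCC is P (proline).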

And the universality of that code is just profound.

It really suggests a shared origin for almost all life.

It does.

The code is shared by viruses, prokaryotes, and eukaryotes.

But we have to note, there are a few specialized exceptions, mostly in organelles like mitochondria.

Tell us about those anomalies.

In our mitochondria, the codon UGA, which is normally a stop codon, is translated as tryptophan instead.

And AGA, which is usually arginine, is a stop codon in mitochondria.

So they have their own slightly different dialect to the code?

They do, thanks to specialized tRNAs that only exist inside the mitochondria.
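A dialect like this is just the standard table with a few entries overridden. The sketch below applies only the two reassignments mentioned here (UGA to tryptophan, AGA to stop); real mitochondrial codes differ in a few other codons as well, which are left out. The table construction, the example message, and the `translate` helper are illustrative assumptions.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the two changes discussed above; other mitochondrial differences omitted.
MITOCHONDRIAL = {**STANDARD, "UGA": "W", "AGA": "*"}

def translate(mrna: str, table: dict[str, str]) -> str:
    """Translate in frame from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

message = "AUGUGAAGAUUU"  # hypothetical: AUG UGA AGA UUU
print(translate(message, STANDARD))       # M   (UGA read as stop)
print(translate(message, MITOCHONDRIAL))  # MW  (UGA read as Trp, AGA as stop)
```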

And what about the 21st and 22nd amino acids, selenocysteine and pyrrolysine?

Those are fascinating cases where a specific stop codon, UGA or UAG, is repurposed to incorporate a new non-standard amino acid.

But it requires a special signal in the mRNA, like a unique folded structure, and a corresponding unique tRNA to override the stop signal.

So even a code this ancient and universal has some built-in flexibility.

It has profound regulatory flexibility.

With the code established, let's get into the mechanics of making the RNA.

Transcription.

Four key stages.

Binding, initiation, elongation and termination.

And the whole thing is driven by RNA polymerase.

Right.

And to understand the fundamentals, we have to start with the bacterial model, E. coli.

The basic steps here are largely conserved everywhere.

So what does bacterial RNA polymerase look like?

And how does it find the one correct starting point in the whole genome?

The E. coli RNA polymerase is a big machine called a holoenzyme.

It has a core enzyme with a few subunits and a critical dissociable part called the sigma factor.

And the sigma factor is the key to finding the start.

It's the specificity guide.

The core enzyme does the actual synthesis, but you need the whole holoenzyme with sigma attached for the first step, initiation at the correct site.

And that correct site is the promoter.

Exactly.

Stage one, binding to the promoter.

The sigma factor mediates the tight binding of the holoenzyme to the DNA promoter site, which is a sequence that says start transcription here.

What are the key sequences that sigma is looking for?

It looks for consensus sequences.

The start site is designated plus one.

Upstream of that, at position −10, is the crucial Pribnow box with the consensus TATAAT.

And further upstream, at −35, is the −35 sequence with the consensus TTGACA.
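As a rough picture of what "looking for consensus sequences" means, the sketch below scans a DNA string for an exact TTGACA followed by an exact TATAAT. Two things here are assumptions of ours rather than anything stated above: the 16 to 18 base pair spacer between the boxes, and the demand for perfect matches. Real promoters usually deviate from the consensus at a few positions, so treat this as a cartoon of the search, not a promoter predictor.

```python
import re

# -35 box, spacer (assumed 16-18 bp), -10 box. Exact matches only.
PROMOTER = re.compile(r"(TTGACA)([ACGT]{16,18})(TATAAT)")

def find_promoters(dna: str) -> list[tuple[int, int]]:
    """Return (start of -35 box, start of -10 box) for each consensus match."""
    return [(m.start(1), m.start(3)) for m in PROMOTER.finditer(dna)]

# Hypothetical sequence with a 17 bp spacer between the two boxes.
spacer = "GC" * 8 + "G"
dna = "GGC" + "TTGACA" + spacer + "TATAAT" + "GCGCA"

print(find_promoters(dna))  # [(3, 26)]
```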

How did scientists know exactly where sigma lands?

Using a really clever technique called DNA footprinting.

Okay, how does that work?

You take DNA that's bound to your protein and a sample of unbound DNA.

You treat both with an enzyme that randomly cuts DNA.

The unbound DNA gets chopped up everywhere, but the DNA covered by the protein is protected from being cut.

Ah, so it leaves a footprint.

It leaves a blank spot on the gel where the enzyme couldn't cut.

And that blank spot identifies the exact sequence where the protein was sitting.

It's how they confirmed sigma binds at the negative 10 and negative 35 boxes.

So once it's bound, the holoenzyme moves into stage two, initiation.

Right.

It unwinds the DNA locally.

And unlike DNA replication, RNA polymerase does not need a primer to get started.

But initially, it often engages in this weird process called abortive synthesis.

Abortive synthesis, it sounds like wasted effort.

Why would it make and release little pieces of RNA over and over?

It seems to be a necessary step to build up enough mechanical force to escape the promoter.

During this phase, the polymerase is pulling downstream DNA into itself, creating a bulge in a process called scrunching.

It's building up tension so it can finally break free.

Exactly.

Once the RNA chain gets to be about 10 nucleotides long, the complex finally achieves promoter escape, releases the sigma factor, and begins to move along the DNA.

That's the start of stage three, elongation.

And the core polymerase just marches along, adding nucleotides.

It moves along, unwinding the helix ahead and rewinding it behind.

Inside the polymerase, a little transcription bubble of about 18 base pairs is kept open, and a short RNA-DNA hybrid of eight or nine pairs stabilizes the growing chain.

What about quality control?

You said it's less strict than DNA polymerase.

It is, because errors aren't inherited.

But it does have proofreading.

It can do a reverse reaction to remove a wrong nucleotide.

Or, if it stalls, it can engage in RNA backtracking.

Backtracking.

Yep.

It backs up a little bit, and that reverse movement actually helps remove the incorrect nucleotide.

Then it can resume moving forward.

And finally, stage four, termination.

How does it know when to stop?

It copies a specific termination signal.

In bacteria, there are two main types.

The first is rho-independent termination.

The signal is built into the RNA itself.

It is.

The RNA contains a GC-rich sequence that folds into a tight, stable hairpin loop.

This hairpin acts like a physical brake, pulling the RNA away from the DNA.

And right after that hairpin is a string of weak A-U base pairs, which easily break, releasing the transcript.

A stable loop followed by a weak anchor point.

Very clever.
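The "stable loop followed by a weak anchor" signal can be described as a pattern: a GC-rich stretch, a short loop, the reverse complement of that stretch, then a run of U's. The sketch below searches an RNA string for exactly that. The stem length, loop sizes, U-run length, and the "at least half G or C" cutoff are arbitrary values chosen for illustration, and the sequence is invented.

```python
# Toy detector for a hairpin followed by a U-run. All thresholds are
# illustrative assumptions, not measured values.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_terminator(rna, stem=6, loop_sizes=range(3, 9), u_run=4):
    """Return (hairpin start, hairpin end) of the first match, else None."""
    for i in range(len(rna) - stem):
        left = rna[i:i + stem]
        if sum(base in "GC" for base in left) < stem / 2:
            continue  # stem arm is not GC-rich enough
        for loop in loop_sizes:
            j = i + stem + loop
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if right == revcomp(left) and tail == "U" * u_run:
                return i, j + stem
    return None

# Hypothetical transcript end: stem GCCGCC, loop UUCG, stem GGCGGC, then U's.
transcript = "AAGCAU" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUAC"

print(find_terminator(transcript))                # (6, 22)
print(find_terminator("AUGGCUACGAUCGAUCGAUAGC"))  # None
```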

And the second type needs help.

That is rho-dependent termination.

These genes rely on the rho factor protein.

It's an ATP-dependent helicase, an unwinding enzyme.

It binds to the new RNA and zips along it toward the polymerase.

When it reaches the transcription bubble, it unwinds the RNA-DNA hybrid, and everything gets released.

So if the bacterial mechanism is this stripped-down racing machine, eukaryotic transcription is the highly specialized luxury yacht.

That's a perfect analogy.

The core difference is the need for separation and specialization.

Eukaryotes have different polymerases for different types of RNA.

Promoters are way more complex.

And it requires a whole team of external proteins or transcription factors.

Let's go through those polymerases.

There are three main ones.

Three main nuclear polymerases, Pol I, Pol II, and Pol III.

And they're often categorized by their sensitivity to the mushroom toxin α-amanitin.

Let's start with RNA Pol I.

RNA polymerase I is in the nucleolus, the ribosome factory.

It just makes one thing, a big precursor for the major rRNAs.

And it is completely resistant to that toxin.

And Pol III.

RNA polymerase III is in the nucleoplasm.

It makes smaller RNAs, mostly tRNAs, and the 5S rRNA.

It's only moderately sensitive to the toxin.

Which leaves the star of the show.

The one that makes all the protein-coding messages.

RNA polymerase II.

Pol II is also in the nucleoplasm.

It makes pre-mRNAs, snRNAs, and microRNAs.

And it is very sensitive to α-amanitin.

If you want to shut down protein synthesis, Pol II is your target.

What makes Pol II so special, so highly regulated?

It's a unique part of its structure called the C-terminal domain, CTD.

It's a long unstructured tail.

And this tail is the cell's master switchboard.

It can be phosphorylated at different spots.

And that phosphorylation pattern acts like a code, dictating which other processing factors can bind to it.

So it's an organizational platform for all the complexity to come.

And speaking of complexity, the promoters that Pol II uses are way different from the bacterial ones.

Far more varied.

For Pol II, the core promoters are modular.

They often have a short initiator, or Inr, sequence right at the start site.

And the famous TATA box, about 25 nucleotides upstream.

But those core elements only allow for a really low basal level of transcription, right?

Precisely.

To get high efficiency, Pol II needs help from other control elements, like proximal control elements that are nearby and distant enhancer elements.

Enhancers are amazing.

They can be thousands of base pairs away and still influence transcription.

How do they do that?

It's all about DNA structure.

Proteins called activators bind to the enhancer.

And the DNA in between loops out, bringing the enhancer-bound proteins into direct physical contact with the machinery at the promoter.

So it physically bends the DNA to make that connection?

It does.

It helps recruit all the other necessary proteins.

Now what about Pol III promoters?

You said they're the weirdest.

They're unique because for tRNA and 5S rRNA genes, the essential control sequences are located entirely downstream of the start site, within the part that actually gets transcribed.

Okay, let's get to the intricate process of initiation for Pol II.

This involves the general transcription factors, or GTFs.

Right, these are separate proteins that have to assemble sequentially to form the pre-initiation complex.

Pol II can't even find the promoter without them.

Which one starts the process?

TFIID is the first one to bind.

It has a crucial subunit called the TATA-binding protein, TBP.

TBP recognizes the TATA box and binds to it, causing a severe, sharp, 80-degree kink in the DNA.

And that kink is the signal?

It's the anchor point.

It recruits all the other GTFs to assemble.

And which factor acts as the ignition key, finally releasing the polymerase?

That would be TFIIH.

Once the whole complex is built, TFIIH comes in.

It has helicase activity to unwind the DNA, and it has protein kinase activity.

And that kinase is the trigger.

Absolutely.

TFIIH phosphorylates that long CTD tail of RNA Pol II.

That phosphorylation changes its shape, allowing the polymerase to finally break free from the GTFs and start synthesis.

And termination for Pol II is also different.

It's more about a cleavage event.

That's right.

For Pol II, the end of the mature RNA is determined by RNA cleavage.

A signal sequence, AAUAAA, gets copied into the RNA.

That recruits cleavage factors, which cut the transcript about 10 to 35 nucleotides downstream of that signal.

And the polymerase just keeps going.

It can, but that downstream piece of RNA gets rapidly degraded.

The important part is that the new 3′ end created by the cleavage is immediately ready for the addition of the poly-A tail.
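The cleavage step can be sketched as string surgery: find AAUAAA, cut some distance downstream of it, and append a run of A's to the new 3′ end. The function below is a hypothetical illustration. The `offset` parameter stands in for the 10 to 35 nucleotide window, and the default values and example transcript are invented.

```python
# Toy model of Pol II transcript cleavage and polyadenylation.
SIGNAL = "AAUAAA"

def cleave_and_polyadenylate(pre_mrna: str, offset: int = 20, tail_length: int = 50):
    """Cut `offset` nt past the AAUAAA signal and add a poly-A tail.

    Returns None if there is no signal or the transcript is too short.
    """
    if not 10 <= offset <= 35:
        raise ValueError("offset should fall in the 10-35 nt window")
    site = pre_mrna.find(SIGNAL)
    if site == -1:
        return None
    cut = site + len(SIGNAL) + offset
    if cut > len(pre_mrna):
        return None
    # Everything past the cut is the downstream piece that gets degraded.
    return pre_mrna[:cut] + "A" * tail_length

transcript = "GGCAUC" + SIGNAL + "C" * 40   # hypothetical primary transcript
mature_end = cleave_and_polyadenylate(transcript, offset=20, tail_length=5)
print(mature_end)
```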

So now we enter the editing room.

RNA processing.

We're turning that raw primary transcript into a functional, mature molecule.

And we should start with the most abundant RNAs, the rRNAs, which brings us back to the nucleolus.

This is the cell's dedicated ribosome factory.

And you said you could actually see the production happening.

You can.

The nucleolus contains the Nucleolus Organizer Region, or NOR, which has hundreds of tandem copies of the pre-rRNA gene.

Under an electron microscope, you can see these bottle brush structures, which are just hundreds of Pol I enzymes all transcribing the same gene at once.

And Pol I makes one single long 45S pre-rRNA.

But that's way too big.

It is.

The 45S precursor contains the sequences for the 18S, 5.8S, and 28S rRNAs, all separated by long regions called transcribed spacers.

Almost half of that precursor is just spacer that has to be removed.

And how is that precise cutting guided?

It's guided by small RNA molecules called snoRNAs, small nucleolar RNAs.

They bind to the pre-rRNA and direct both the cleavages and a lot of chemical modifications like methylation, which helps stabilize the functional parts.

OK, moving on to tRNA processing.

What do they go through?

Pre-tRNAs have to be processed to get their final functional L shape.

This involves a few steps.

Cleaning up the ends first.

Correct.

A leader sequence at the 5′ end is removed.

And at the 3′ end, the terminal CCA trinucleotide, which is the attachment site for the amino acid, is often added by enzymes.

And then there are all those weird modified bases.

Extensive chemical modifications.

They're essential for the tRNA's function and stability.

And finally, some tRNA precursors actually have an RNA intron that has to be precisely removed.

Which brings us to the biggest processing event of all.

mRNA processing.

This is why the nucleus is so critical.

Eukaryotic pre-mRNA gets three major modifications.

First, the 5′ end receives a 5′ cap.

This is a methylated guanosine nucleotide that's added backward via a unique 5′-to-5′ linkage.

And what does this weird backward cap do?

Two critical things.

It provides stability, protecting the mRNA from being degraded.

And it's absolutely required for translation initiation.

It's the signal for the ribosome to attach.

Second, the 3′ end gets the poly-A tail.

This is a string of 50 to 250 adenine nucleotides added entirely post-transcriptionally.

It also provides stability.

The longer the tail, the longer the mRNA lasts.

And it's needed for nuclear export and translation.

And the third and most revolutionary processing event is intron removal or splicing.

This was a shocking discovery in 1977.

Everyone assumed genes were continuous.

But experiments using R-looping showed something totally unexpected.

So in R-looping, you mix mature mRNA with its DNA gene.

If it were continuous, it would just match up perfectly.

But that's not what they saw.

Not at all.

They saw these distinct loops of single-stranded DNA extending out from the hybrid.

These loops were sequences in the gene that were not present in the final mRNA.

And those intervening sequences were named introns.

And the expressed sequences that get spliced together are exons.

And the scale is just staggering.

The human dystrophin gene is over 99% intron.

The cell has to remove almost all of it with absolute nucleotide-level precision.

How does the cell manage that precision surgery?

A single mistake would ruin the whole protein.

The precision is governed by short conserved sequences called splice sites.

There's the GU-AG rule.

The 5′ end of an intron almost always starts with GU.

And the 3′ end terminates with AG.

There's also a critical branch point adenine residue inside the intron.

And the machine that does the surgery is the spliceosome.

A massive dynamic molecular complex.

It's made of hundreds of proteins and five specialized RNAs, packaged into particles known as snRNPs.

And it's those snRNAs that do the critical recognition and catalytic work.

Walk us through the lariat mechanism.

It starts when the U1 snRNP binds to the 5′ splice site.

Then the U2 snRNP binds to that branch point adenine.

The other snRNPs join in, bringing the two ends of the intron together.

And that creates the signature loop shape.

Yes.

The spliceosome cuts at the 5′ splice site.

And that cut end is linked to the branch point adenine, forming a loop structure called a lariat.

Then the 3′ splice site is cut, the two exons are joined together, and the lariat-shaped intron is released and degraded.
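The net effect of splicing, as opposed to its chemistry, is simple to sketch: cut out each intron and join the flanking exons. The function below takes intron coordinates as given, checks the GU-AG rule on each one, and returns the joined exons. It does not model the lariat, the branch point, or how the spliceosome finds the sites. The sequences and the `splice` helper are illustrative.

```python
# Sketch of the outcome of splicing: remove introns, join exons.

def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open index pairs."""
    mature, position = [], 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} breaks the GU-AG rule")
        mature.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                           # skip over the intron
    mature.append(pre_mrna[position:])           # keep the final exon
    return "".join(mature)

exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACUUUCAG", "UUUUAA"  # hypothetical
pre_mrna = exon1 + intron + exon2
intron_coords = [(len(exon1), len(exon1) + len(intron))]

print(splice(pre_mrna, intron_coords))  # AUGGCCUUUUAA
```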

And the little marker is left behind to show the job is done.

A critical quality control step.

The exon junction complex, or EJC, is deposited at the new exon-exon boundary.

This helps with exporting the mRNA and making sure it's stable.

Given how complex this is, the theory is that the spliceosome must have evolved from something simpler.

That's the consensus.

And it's supported by the existence of self-splicing RNA introns, or ribozymes.

Group II introns, in particular, remove themselves as lariats, using a mechanism that's almost identical to the spliceosome.

It suggests the snRNAs in the spliceosome just took over the catalytic role from these ancient self-splicing RNAs.

So why have introns at all?

What's the function if they're just thrown away?

Their presence allows for two huge evolutionary strategies that are critical for vertebrate complexity.

The most immediate one is alternative splicing.

The genius strategy that lets one gene make many different proteins.

Exactly.

By choosing to include or skip certain exons, the same pre-mRNA can be spliced in multiple ways to generate hundreds of different mature mRNAs, and thus different polypeptides, from a single gene.

It's a major reason for our biological complexity.
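The combinatorics behind "one gene, many proteins" can be sketched by treating some exons as optional. In this simplified model, which is our assumption and not a description of any particular gene, the first and last exons are always kept and each middle "cassette" exon is independently included or skipped, so n cassette exons give 2 to the power n possible mRNAs.

```python
from itertools import combinations

def isoforms(first: str, cassette: list[str], last: str):
    """Yield every mRNA made by keeping any subset of cassette exons, in order."""
    for k in range(len(cassette) + 1):
        for kept in combinations(cassette, k):
            yield "".join((first, *kept, last))

# Hypothetical exon sequences.
first_exon, last_exon = "AUGGCC", "UUUUAA"
cassette_exons = ["GAAGAA", "CCUCCU"]

variants = list(isoforms(first_exon, cassette_exons, last_exon))
print(len(variants))  # 4
for mrna in variants:
    print(mrna)
```

With ten cassette exons the same function would yield 1,024 variants, which is how a modest number of optional exons reaches the "hundreds" range.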

And the long-term benefit is exon shuffling.

Since introns are long, non-coding regions, genetic recombination can happen within them without messing up the exons.

This allows evolution to mix and match existing exons from different genes, creating new gene combinations, and accelerating the evolution of new proteins.

And as if that wasn't enough, there's a final layer of modification.

RNA editing.

Yes.

This involves changing individual nucleotides after the RNA has been transcribed.

It means the DNA sequence doesn't always perfectly predict the final protein sequence.

What are some examples of that?

In mammals, a great example is the C-to-U conversion in the mRNA for the apolipoprotein B gene.

In liver cells, you get the full-length protein.

But in intestinal cells, editing creates a premature stop codon, resulting in a much shorter, different protein.
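
Here is a minimal Python sketch of how a single C-to-U change can truncate a protein. The mRNA is a short invented sequence, not the real apolipoprotein B message, and the codon table is deliberately partial. What it borrows from the real case is the codon change itself: CAA (glutamine) becomes UAA (stop).

```python
# Partial codon table, just enough for this toy example.
CODONS = {"AUG": "Met", "GGU": "Gly", "CAA": "Gln", "UAA": "Stop"}

def translate(mrna: str) -> list[str]:
    """Translate codon by codon from position 0 until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

def edit_c_to_u(mrna: str, position: int) -> str:
    """Return a copy of the mRNA with the C at `position` converted to U."""
    if mrna[position] != "C":
        raise ValueError("no C at that position")
    return mrna[:position] + "U" + mrna[position + 1:]


unedited = "AUGGGUCAAGGUUAA"         # hypothetical message: Met-Gly-Gln-Gly-stop
edited = edit_c_to_u(unedited, 6)    # CAA -> UAA at the third codon

print(translate(unedited))  # full-length peptide (the "liver" outcome)
print(translate(edited))    # truncated peptide (the "intestine" outcome)
```

The DNA is identical in both cases; only the edited transcript carries the early stop.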

So how is all this coordinated?

Capping, splicing, cleavage: it must be impossibly complex.

This brings us right back to the C-terminal domain, CTD, of RNA polymerase II.

It's the master switchboard for co-transcriptional processing.

So its phosphorylation state acts like a ticket, determining which machinery can jump on board while the RNA is still being made.

Precisely.

Early in transcription, phosphorylation of one spot, serine 5, recruits the capping machinery.

Later on, the pattern shifts to favor serine 2, which recruits the splicing and cleavage factors.

It's a seamless coordination that ensures everything happens at the right time.

So to wrap up, we should touch on two key aspects of the mRNA life cycle that regulate protein production.

Its stability and the idea of amplification.

Stability first.

We know rRNA and tRNA are really stable.

They last for a long time.

But mRNA is the complete opposite.

Messenger RNA has a very short half-life.

It's rapidly synthesized, translated, and then degraded.

Minutes in bacteria, maybe hours to a few days in eukaryotes.

This high turnover is a crucial regulatory mechanism.

By controlling how long the message sticks around, the cell controls how much protein is made.

Exactly.

If a cell needs to shut down production of a protein, it just rapidly degrades the mRNA.
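
The link between half-life and how fast a message disappears can be sketched in a few lines of Python. This assumes simple first-order decay with transcription switched off, and the 3-minute half-life is a made-up value chosen only to make the arithmetic easy.

```python
def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of an mRNA pool left after `elapsed` time, assuming
    first-order decay and no new synthesis (a simplifying assumption)."""
    return 0.5 ** (elapsed / half_life)


HALF_LIFE_MIN = 3.0  # hypothetical half-life in minutes

for minutes in (0, 3, 6, 15):
    left = fraction_remaining(minutes, HALF_LIFE_MIN)
    print(f"{minutes:>2} min: {left:.1%} of the message remains")
```

After five half-lives only about 3% of the original message is left, so a short half-life lets the cell shut off protein production quickly once transcription stops.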

And finally, the incredible power of amplification.

Because you can make multiple mRNA copies from one DNA gene, and then each of those mRNAs can be translated many times by polyribosomes, it acts as a massive amplification cascade.

Give us an example.

The silkworm fibroin gene.

There are only two copies in the cell, but those two copies can generate about 10,000 mRNA molecules.

And each of those is translated about 100,000 times.

The result is over a billion fibroin protein molecules.

An unbelievable cascade, and that's why protein-coding genes generally only need one or a few copies.

Whereas genes for rRNA or tRNA, whose final product is the RNA itself, have to exist in hundreds of copies because they don't have that translational amplification step.
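
The fibroin arithmetic can be checked with a short Python sketch. It uses the round figures quoted above; the variable names are just labels for this example, and the rRNA comparison assumes the same transcript count purely for contrast.

```python
# Two-step amplification for a protein-coding gene (figures as quoted above).
mrna_molecules = 10_000          # transcripts made from the fibroin gene copies
translations_per_mrna = 100_000  # times each transcript is translated

fibroin_proteins = mrna_molecules * translations_per_mrna
print(f"{fibroin_proteins:,} fibroin molecules")  # on the order of a billion

# For an RNA whose final product is the RNA itself there is no translation
# step, so the yield is just the number of transcripts (same count assumed
# here only to make the comparison direct).
rna_products = mrna_molecules
print(f"{fibroin_proteins // rna_products:,}x more product with translation")
```

That missing multiplier is why rRNA and tRNA genes make up the difference with many gene copies instead.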

So we started with the central dogma, navigated the elegant triplet code, wrestled with the complexity of bacterial and eukaryotic transcription, and then witnessed the radical post-transcriptional processing.

The cap, the tail, the sophisticated removal of introns.

And what really stands out is the sheer energy the cell invests in making sure that final messenger RNA is precisely correct and ready at exactly the right time.

The complexity of things like the CTD, coordinating all these steps.

It really reflects the enormous stakes involved in moving from a simple survival mechanism to the highly regulated gene expression needed for multicellular life.

So given that the core mechanisms of transcription are nearly universal, yet organisms like viruses and organelles like mitochondria can maintain their own slightly divergent codes, how significant are those minor differences for understanding the early divergent evolution of life on Earth?

Are they just evidence of ancient, isolated genetic drift?

Or were they an active modern adaptation?

Something to mull over until our next deep dive.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Gene expression encompasses the cellular mechanisms that translate genetic information encoded in DNA into functional proteins and regulatory RNA molecules. The central dogma of molecular biology provides the foundational framework, establishing the unidirectional flow of information from DNA through RNA to protein, though notable exceptions exist in organisms employing reverse transcription, such as retroviruses and retrotransposons that synthesize DNA from RNA templates. Understanding how genes function as discrete units capable of producing multiple distinct polypeptides or diverse non-coding RNA molecules represents a modern refinement from earlier views that attributed a one-to-one relationship between genes and enzymes. The genetic code operates as a nonoverlapping triplet system in which consecutive three-nucleotide sequences, termed codons, specify individual amino acids among twenty possibilities, with the system exhibiting degeneracy allowing multiple codons to encode the same amino acid. Transcription, the process of synthesizing RNA from a DNA template, unfolds through four coordinated stages of binding, initiation, elongation, and termination, with fundamentally different organizational strategies between prokaryotic and eukaryotic cells. Bacterial transcription relies on a single RNA polymerase enzyme guided by sigma factors for promoter recognition, with termination relying on separate signals such as intrinsic terminators or the rho factor. Eukaryotic cells deploy three specialized nuclear RNA polymerases differentiated by function and require numerous general transcription factors to navigate the complex promoter architecture exemplified by TATA box sequences. Post-transcriptional modifications distinguish eukaryotic gene expression, including the covalent attachment of protective 5-prime cap structures and 3-prime poly-A tail sequences to messenger RNA molecules. 
The spliceosome, a sophisticated ribonucleoprotein complex, catalyzes the removal of non-coding intron sequences and joins remaining exons through precise phosphodiester bond chemistry. Alternative splicing mechanisms substantially amplify proteomic diversity by enabling individual genes to generate numerous protein variants from a single DNA coding sequence. The regulation of mRNA stability and the capacity for multiple transcription events from a single gene enable cells to rapidly synthesize large quantities of specific proteins in response to physiological demands.
