Chapter 13: The Genetic Code and Transcription
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome to the Deep Dive.
Today we are cracking open the instruction manual of life itself.
Our subject is gene expression.
Specifically, how that blueprint stored deep in your DNA gets successfully converted into the actual working machinery of the cell.
This is really the foundational concept, often summarized as the central dogma.
DNA makes RNA and RNA makes protein.
And that's basically our mission for you today: to sort of synthesize the rules and the mechanisms that govern those very first steps.
We need to understand the language, this elegant genetic code, and then track how it gets transferred from the DNA to its messenger RNA complement, that process, transcription.
And we'll look at how it happens in simpler systems like bacteria compared to the much more regulated layered approach in eukaryotes like us.
Exactly.
The code itself is almost universal, stored in DNA, and then transferred to RNA as a complement.
Let's start with those rules then, the genetic code.
When scientists first started digging into how this information was stored,
they found this system that's pretty much the same across almost all life.
Yeah, remarkably consistent.
It's written linearly using the RNA bases, and the big breakthrough was figuring out it uses a triplet code.
Three bases.
Every three bases form what we call a codon, and each codon specifies exactly one of the 20 amino acids used to build proteins.
And the real elegance, I think, is in its features.
We have 64 possible codons, right?
Four bases, groups of three, so four cubed.
Sixty-four possibilities.
But we only need codes for 20 amino acids plus start and stop signals.
So this means the code is highly degenerate.
Most amino acids have more than one codon spelling them out.
That degeneracy sounds like a kind of built-in safety net, doesn't it?
In a way, yes.
Like if you get a small mutation, a single base change, maybe you still end up with the right amino acid.
Protects against errors.
Precisely.
And crucially, while it is degenerate, it's also unambiguous.
A specific codon will always code for the same amino acid.
It never means two different things.
Okay, unambiguous.
Got it.
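A quick aside for anyone who likes to see the arithmetic spelled out. The Python sketch below builds all 64 codons and counts how many of them spell each amino acid. The compact amino-acid string is the standard codon table written in U, C, A, G order with one-letter codes and `*` for stop; the names `BASES`, `AMINO_ACIDS`, and `CODE` are illustrative choices, not anything from the chapter.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons ordered
# UUU, UUC, UUA, UUG, UCU, ... ('*' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every triplet to its amino acid: 4 ** 3 = 64 entries.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: how many codons spell each amino acid (or stop).
degeneracy = Counter(CODE.values())
stops = sorted(codon for codon, aa in CODE.items() if aa == "*")

print(len(CODE))                      # number of possible codons
print(stops)                          # the three stop codons
for aa, count in sorted(degeneracy.items()):
    print(aa, count)
```

Leucine, serine, and arginine come out with six codons each, while methionine and tryptophan have exactly one. Because `CODE` is a plain dictionary, each codon maps to a single value, which is the "unambiguous" property in code form.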
Plus, the reading process is continuous.
It's commaless, no punctuation between codons, and non-overlapping.
Once translation starts, the machinery reads three bases, then the next three, then the next three in sequence.
Like reading words in a sentence without spaces, but you know each word is three letters long.
A good analogy.
And to read the whole message correctly, you need clear signals.
There's a start signal, almost always AUG, which also codes for methionine.
So methionine usually starts things off.
Right.
And then there are three specific stop signals.
UAG, UAA, and UGA.
These basically tell the ribosome, okay, protein chain finished, stop here.
But figuring out this triplet structure,
that wasn't easy, was it?
It took some serious experimental work.
Oh, absolutely.
For years, the mechanism was a big question mark.
In the early 60s, you had Sydney Brenner making the theoretical case.
He argued, pretty convincingly, that a doublet code, pairs of bases, only gives you 16 combinations.
Not enough for 20 amino acids plus signals.
Exactly.
So mathematically, a triplet code giving 64 words was the minimum you'd need.
Makes sense.
But theory needs proof.
And the experimental proof, the real clincher for the reading-in-threes idea, came from Francis Crick's brilliant work with T4 bacteriophage.
This sounds like some serious code breaking.
What did he do?
He introduced what are called frameshift mutations.
He deliberately added or deleted nucleotides from the phage's genes.
Okay.
Now if he added or deleted just one base or even two bases, the whole reading frame shifted downstream.
The resulting protein sequence was complete gibberish from that point on.
Right, because you're reading the wrong sets of three.
Exactly.
And this was the absolute kicker: if he added or deleted three nucleotides...
Ah, the reading frame would be restored, right?
Yeah.
Maybe one or two amino acids are wrong at the spot of the change, but the rest of the message is back on track.
Precisely.
That was just irrefutable evidence.
The cell reads genetic information in non-overlapping units of three bases.
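For readers who want to replay the logic of the frameshift experiment, here is a minimal Python sketch. It is not Crick's actual data: the message `original` is a made-up sequence, and `translate` and `insert_bases` are hypothetical helpers that simply read non-overlapping triplets using the standard codon table.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna):
    """Read non-overlapping triplets from the first base; ignore leftover bases."""
    return "".join(CODE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def insert_bases(rna, position, bases):
    """Return a copy of rna with extra bases inserted at the given index."""
    return rna[:position] + bases + rna[position:]


original = "AUGGCUGAACGUUUCAAAGGU"               # made-up message
plus_one = insert_bases(original, 6, "C")        # one extra base: frame shifts
plus_three = insert_bases(original, 6, "CAC")    # three extra bases: frame restored

print(translate(original))    # MAERFKG
print(translate(plus_one))    # MARTFQR
print(translate(plus_three))  # MAHERFKG
```

One inserted base changes what is read from the insertion point onward, while three inserted bases add a single extra amino acid and leave the rest of the message intact. In this toy message one downstream residue survives the single insertion by coincidence, which is degeneracy at work: UUC and UUU both code for phenylalanine.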
Wow.
Okay, so the triplet nature was established, but that still left the huge task.
Which specific triplet codes for which amino acid?
That massive project really took off with Marshall Nirenberg and J. Heinrich Matthaei.
They had two key tools.
First, a cell-free system, basically.
They could get protein synthesis happening in a test tube outside of a living cell.
Okay, an in vitro system.
And second, they used an enzyme called polynucleotide phosphorylase to create synthetic mRNAs.
This enzyme was handy because it just strung bases together randomly.
It didn't need a DNA template.
Randomness turned out to be useful here.
Very useful.
They started simple with homopolymers, RNA made of just one repeating base, like poly-U, just UUUUUU.
So they feed this poly-U into their cell-free system.
And see what polypeptide chain gets made.
And with poly-U, they got a chain made only of phenylalanine.
Boom.
First assignment.
UUU codes for phenylalanine.
Incredible.
So simple.
So powerful.
And they quickly found AAA for lysine, CCC for proline, using the same method.
Okay.
So homopolymers got them started.
What next?
Well, then they used mixed heteropolymers.
Say they'd make an RNA with a known ratio of two bases, like one part A to five parts C.
So you could predict the probability of getting different triplets, like CCC, CCA, say, CAC, ACC, and so on.
Exactly.
By analyzing the amino acids incorporated, they could figure out the base composition of the codons for those amino acids.
Like maybe histidine needed two Cs and one A, but not the exact sequence here.
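The probability argument behind the mixed-copolymer experiments can be worked through in a few lines. The Python sketch below assumes only what was just described, a random RNA made from one part A to five parts C, and computes how often each triplet and each base composition should appear. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Base frequencies for a random copolymer made from 1 part A to 5 parts C.
base_freq = {"A": Fraction(1, 6), "C": Fraction(5, 6)}

# Probability of each specific triplet: the product of its base frequencies.
triplet_prob = {}
for triplet in product(base_freq, repeat=3):
    p = Fraction(1)
    for base in triplet:
        p *= base_freq[base]
    triplet_prob["".join(triplet)] = p

# Group triplets by composition only (for example, CCA, CAC, ACC all become "ACC").
by_composition = {}
for triplet, p in triplet_prob.items():
    key = "".join(sorted(triplet))
    by_composition[key] = by_composition.get(key, Fraction(0)) + p

for key, p in sorted(by_composition.items()):
    print(key, p, round(float(p), 3))
```

CCC alone should make up 125/216 of the triplets (about 58%), and the three "two C, one A" triplets together 75/216 (about 35%). Comparing those expected frequencies with how much of each amino acid shows up in the polypeptide gives the composition of a codon, but not the order of its bases.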
Still a bit fuzzy.
How did they nail down the specific sequences?
That needed another clever technique.
The triplet binding assay, developed by Nirenberg again, this time with Philip Leder.
Okay.
How did that work?
It was quite ingenious.
They synthesized very short, specific RNA triplets, just three bases long, representing a single codon.
Like UUC, for example.
Right.
Then they mixed these tiny synthetic codons with ribosomes and various tRNAs.
Now crucially, these tRNAs were charged with their specific amino acid, and one amino acid type would be radioactive.
So you could track it.
Yes.
The idea was, if the synthetic codon, like UUC, caused the ribosome to bind the specific tRNA that carried, say, radioactive phenylalanine, then you knew UUC coded for phenylalanine.
Ah.
It's like fishing.
The codon is the bait, the ribosome is the hook, and the radioactive tRNA is the fish you catch.
That's a great way to put it.
This assay, combined with work by Har Gobind Khorana, using synthetic RNAs with known repeating sequences.
Like UCUCUC.
Exactly.
Repeating di-, tri-, and tetranucleotides.
Analyzing the polypeptides produced from these allowed them to logically deduce many more codon assignments.
Together, these methods cracked the entire 64-word dictionary.
So analyzing that finished dictionary, what patterns emerged?
You mentioned degeneracy.
Yes.
And the pattern of degeneracy led Crick, in 1966, to propose the Wobble hypothesis.
Wobble.
What's that about?
He noticed that for many amino acids coded by multiple codons, the difference was often just in the third base of the triplet.
The first two bases seemed more critical for specifying the amino acid.
So the third position is less strict in its pairing with the tRNA.
Precisely.
The pairing between the codon on the mRNA and the anticodon on the tRNA follows normal base pairing rules for the first two positions.
But the third position can wobble.
It allows a single tRNA anticodon to recognize and bind to more than one codon.
Like one tRNA can handle multiple assignments, as long as the first two letters match.
Essentially, yes.
Often involving modified bases like inosine in the tRNA anticodon, which can pair with U, C, or A.
It's kind of biological economy, you see.
Reduces the number of different tRNAs the cell needs to make.
Clever.
An efficiency measure built into the system.
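To make the wobble idea concrete, here is a small Python sketch. It assumes Crick's wobble pairing rules for the 5′ base of the anticodon (only the inosine rule was spelled out above; the others are added here from the standard rules), and the function name `codons_recognized` is hypothetical. Anticodons and codons are both written 5′ to 3′, so the first anticodon base pairs with the third codon base.

```python
# Strict Watson-Crick pairing for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble rules: 5' anticodon base -> codon third-position bases it can pair with.
# "I" is inosine, a modified base found in some tRNA anticodons.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_recognized(anticodon):
    """List the codons (5'->3') that one anticodon (5'->3') can read."""
    first, second, third = anticodon
    codon_pos1 = WATSON_CRICK[third]    # antiparallel: anticodon 3' base pairs with codon base 1
    codon_pos2 = WATSON_CRICK[second]
    return [codon_pos1 + codon_pos2 + codon_pos3 for codon_pos3 in WOBBLE[first]]


print(codons_recognized("IGC"))  # ['GCU', 'GCC', 'GCA']
print(codons_recognized("CAU"))  # ['AUG']
```

With inosine in the wobble position, a single anticodon covers three of the four alanine codons, which is the economy just described. An anticodon starting with C gets no such flexibility and reads exactly one codon.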
Definitely.
And overall, this code was confirmed to be remarkably universal, especially after sequencing the RNA genome of phage MS2.
But, like many rules in biology, there are minor exceptions.
Okay, where do we see differences?
Mostly in mitochondrial DNA.
For instance, in human mitochondria, UGA, which is normally a stop codon.
Yeah, one of the three stop signals.
It actually codes for the amino acid tryptophan, and AUA codes for methionine instead of isoleucine in mitochondria.
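One way to picture these exceptions is as a small patch applied on top of the standard table. The Python sketch below assumes only the two human mitochondrial changes mentioned here (UGA to tryptophan, AUA to methionine); the full mitochondrial code has further differences that are left out. The names `STANDARD`, `MITO_PATCHED`, and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Copy of the standard table with only the two reassignments discussed above.
MITO_PATCHED = dict(STANDARD, UGA="W", AUA="M")


def translate(rna, code):
    """Translate non-overlapping triplets using the supplied codon table."""
    return "".join(code[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


message = "AUGAUAUGA"  # made-up sequence: AUG AUA UGA
print(translate(message, STANDARD))      # MI*
print(translate(message, MITO_PATCHED))  # MMW
```

The same nine bases give Met-Ile-stop under the standard code and Met-Met-Trp under the patched table, so a message has to be read with the right dictionary for its compartment.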
Small tweaks, but significant.
And you also mentioned viruses sometimes break the non-overlapping rule.
Right.
Some viruses, like phiX174, have overlapping genes.
To pack maximum information into their tiny genomes, they read the same stretch of DNA in two or even three different reading frames, producing completely different proteins from the same sequence.
Wow, talk about efficiency.
Squeezing every bit of meaning out.
It's a strategy born of necessity for them.
So, okay, that's the code itself, this amazing, nearly universal language.
Now, how does the cell actually read the DNA and make that RNA copy, the transcription process?
Let's start with bacteria again, the simpler model.
Right.
In bacteria, the main enzyme doing the work is RNA polymerase, specifically the holoenzyme, which has several subunits.
It needs the building blocks, nucleoside triphosphates, ATP, GTP, CTP, UTP, but unlike DNA polymerase, it crucially doesn't need a primer to get started.
It can just jump on the DNA and go?
Well, it needs to know where to jump on, but yes, no primer needed for initiation.
And when we talk about the DNA, there are two strands.
How does the polymerase use them?
Good point.
The polymerase reads only one of the DNA strands, that's the template strand.
It moves along this template and synthesizes an RNA molecule that's complementary to it.
Complementary.
So if the DNA template has an A, the RNA gets a U.
Exactly.
A pairs with U, T pairs with A, G with C, C with G.
Now, the other DNA strand, the one not being read, is called the coding strand.
Why coding?
Because its sequence actually matches the sequence of the RNA transcript being made.
Well, almost matches.
You just swap the T's in the DNA coding strand for U's in the RNA.
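The template and coding strand relationship is easy to check in code. This is a minimal Python sketch with a made-up nine-base gene; `template_from_coding` and `transcribe` are hypothetical helper names. Every sequence is written 5′ to 3′, so the template is read in reverse while the RNA grows 5′ to 3′.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Pairing used during transcription: DNA template base -> RNA base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3):
    """Return the template strand (written 5'->3') for a coding strand."""
    return coding_5to3.translate(DNA_COMPLEMENT)[::-1]


def transcribe(template_5to3):
    """Read the template 3'->5' and build the complementary RNA 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5to3))


coding = "ATGGCTTAA"                     # made-up coding strand
template = template_from_coding(coding)  # TTAAGCCAT
rna = transcribe(template)

print(template)
print(rna)                               # AUGGCUUAA
print(rna == coding.replace("T", "U"))   # True
```

The last line is the point of the naming: the transcript matches the coding strand with U in place of T, even though the polymerase never reads that strand.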
Okay.
Template strand is read, coding strand matches the RNA product with U for T.
Got it.
So how does initiation work?
How does RNA polymerase find the right starting spot on that long DNA molecule?
That's where a special subunit of the holoenzyme comes in.
The sigma factor. Sigma's job is recognition.
What does it recognize?
Specific DNA sequences located just upstream, that's in the 5′ direction, from the actual start site of the gene.
This whole region is called the promoter.
The promoter region, and sigma finds it.
Yes.
And this brings up an important concept.
The promoter DNA sequence itself is a cis-acting element.
It's on the same DNA molecule it's regulating.
Cis, meaning on the same side or nearby.
Right.
The sigma factor, however, is a protein that binds to the DNA.
It's a trans-acting factor.
It comes from elsewhere to act on the DNA sequence.
Okay.
Cis elements are sequences; trans factors are usually proteins that bind them.
Exactly.
Sigma recognizes key consensus sequences within the promoter. Famous ones are the Pribnow box around the -10 position, TATAAT, and another sequence around -35, TTGACA.
Binding here positions the RNA polymerase correctly to start transcription right at the plus one site.
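As a rough illustration of what "recognizing consensus sequences" means, the Python sketch below scans the coding strand for an exact TTGACA followed by an exact TATAAT. Two things here are assumptions rather than anything stated above: real promoters usually match the consensus only approximately, and the 15 to 19 base spacer between the two boxes is a commonly quoted range chosen for this example. The name `find_promoter` is hypothetical.

```python
import re

# -35 box, a spacer of 15-19 bases (assumed range), then the -10 Pribnow box.
PROMOTER = re.compile(r"TTGACA[ACGT]{15,19}TATAAT")


def find_promoter(coding_strand):
    """Return start indices of the -35 and -10 boxes, or None if no exact match."""
    match = PROMOTER.search(coding_strand)
    if match is None:
        return None
    return {
        "minus35_start": match.start(),
        "minus10_start": match.end() - len("TATAAT"),
    }


# Made-up sequence: 3 bases, -35 box, 17-base spacer, -10 box, then downstream DNA.
example = "GGC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGCGCA"
print(find_promoter(example))
print(find_promoter("GGGCCCGGGCCC"))  # None
```

A real search would score partial matches instead of demanding exact ones, since sigma tolerates quite a bit of variation around the consensus.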
So sigma acts like a guide, getting the polymerase lined up.
What happens then?
Once the polymerase successfully synthesizes the first few RNA nucleotides, maybe eight or nine, the sigma factor usually just dissociates.
It's done its job.
Right.
Now the core enzyme, the polymerase without sigma, takes over and enters the elongation phase.
It moves along the DNA template, unwinding the helix ahead of it, synthesizing the RNA chain in the 5′ to 3′ direction and rewinding the DNA behind it.
It even does a bit of proofreading as it goes.
Chugging along, making the RNA copy.
How does it know when to stop?
Termination.
Good question.
Bacteria use two main termination strategies.
About 80% of the time it's through something called intrinsic termination.
Intrinsic, meaning it doesn't need extra helper proteins.
Exactly.
It relies purely on sequences transcribed into the RNA itself.
Near the end of the gene, there's a sequence that includes an inverted repeat, rich in Gs and Cs.
An inverted repeat.
So it can fold back on itself?
Precisely.
As soon as the sequence is transcribed into RNA, it snaps into a stable hairpin structure.
Right after this hairpin-forming sequence in the RNA, there's typically a stretch of uracil (U) residues.
Okay, hairpin followed by U's.
The polymerase tends to pause or stall when it hits that hairpin structure.
And the stretch of U's in the RNA are paired with A's in the DNA template strand.
A-U base pairs are the weakest ones.
Only two hydrogen bonds compared to three for G-C.
Right.
So the combination of the polymerase stalling at the hairpin and the weak A-U bonds holding the very end of the transcript to the template, it causes the whole complex to destabilize and fall apart.
The RNA transcript is released.
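The two ingredients of an intrinsic terminator, an inverted repeat and a run of U's, can be searched for with a simple scan. The Python sketch below is deliberately crude: the stem length, loop sizes, GC threshold, and minimum U run are all assumed parameters picked for illustration, and `find_terminator` is a hypothetical name. Real terminator prediction scores folding energy rather than demanding a perfect repeat.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the sequence that would base-pair with rna, written 5'->3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def find_terminator(rna, stem=6, loop_sizes=range(3, 7), min_u=4):
    """Return (stem_start, loop_length) of the first GC-rich hairpin
    that is immediately followed by a run of U's, or None."""
    for i in range(len(rna) - 2 * stem - min_u):
        left = rna[i:i + stem]
        gc_fraction = sum(base in "GC" for base in left) / stem
        if gc_fraction < 0.5:
            continue  # stem is not GC-rich enough
        for loop in loop_sizes:
            j = i + stem + loop
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u]
            if right == reverse_complement(left) and tail == "U" * min_u:
                return i, loop
    return None


# Made-up transcript end: stem GCCGCC, loop UUCG, stem GGCGGC, then six U's.
transcript = "AAUC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU" + "AG"
print(find_terminator(transcript))               # (4, 4)
print(find_terminator("AUGGCUGAACGUUUCAAAGGU"))  # None
```

The second half of the stem is the reverse complement of the first, which is what lets the RNA fold back on itself once it has been transcribed.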
Wow, elegant.
The sequence itself triggers the stop.
What about the other 20%?
That requires a helper protein called the Rho factor.
This is Rho-dependent termination.
Okay, so this one needs a trans-acting factor.
Yes.
Rho is a protein with RNA helicase activity.
It can unwind RNA-DNA hybrids.
It recognizes and binds to a specific sequence on the growing RNA transcript called the rut site.
Rho utilization site.
Binds to the RNA.
Then what?
Then it basically chases after the RNA polymerase.
Moving along the transcript.
When the polymerase encounters a termination sequence,
often another hairpin that causes it to pause,
Rho catches up.
And Rho uses its helicase activity to actively break the hydrogen bonds between the RNA transcript and the DNA template.
It essentially pulls the transcript away from the polymerase and the DNA, terminating transcription.
So two ways to stop.
One built into the sequence, one needing Rho protein to chase and unwind.
That's the bacterial picture.
Relatively straightforward.
Okay, now let's make the leap.
Eukaryotes.
Things get more complicated.
Oh boy, do they ever.
Several key differences.
First, transcription happens inside the nucleus, physically separated from the ribosomes in the cytoplasm where translation occurs.
So transcription and translation can't happen simultaneously like they can in bacteria.
Correct.
Second, eukaryotes don't have just one RNA polymerase.
They have three main types, RNA polymerase I, II, and III, each specializing in different types of RNA.
And we're mostly interested in the one that makes messenger RNA.
Yes, that's RNA polymerase II, RNAP II.
It transcribes all the protein-coding genes.
Third, eukaryotic DNA isn't naked.
It's tightly packaged with proteins into chromatin.
Right.
Wound around histones.
Which means the cell needs mechanisms for chromatin remodeling just to make the promoter regions accessible to the polymerase and its helper factors.
It adds a whole layer of regulation.
And speaking of helper factors, RNAP II initiation is way more complex than the bacterial sigma factor.
Absolutely.
RNAP II cannot recognize and bind to promoter sequences on its own at all.
Can't do it solo.
No.
It relies heavily on a suite of cis-acting regulatory elements.
There's the core promoter, which might include the famous TATA box around position -30.
TATAAA or something similar.
Right.
But also other elements.
And then there are proximal promoter elements further upstream, and even very distant elements called enhancers and silencers that can be thousands of base pairs away, but loop around to influence transcription.
Wow.
Regulation from afar.
But what about the proteins that bind these elements?
That's where the trans-acting factors come in.
For RNAP II, you need a whole set of general transcription factors, or GTFs.
Proteins like TFIID, which contains the TATA-binding protein (TBP), TFIIB, and others.
An entire assembly crew.
Pretty much.
These GTFs have to bind to the core promoter in a specific order, building up a platform called the pre-initiation complex, PIC.
Only then can RNAP II be recruited and correctly positioned to start transcription.
It's a much more elaborate setup than the bacterial sigma factor.
Makes sense for more complex organisms needing finer control.
So RNAP II makes the transcript, called pre-mRNA initially,
but the journey is not over yet, is it?
Not by a long shot.
In eukaryotes, that initial pre-mRNA transcript undergoes significant processing before it's considered mature mRNA ready for export to the cytoplasm.
Okay, what kind of processing are we talking about?
Three major things happen, often while transcription is still ongoing.
First, addition of a 5′ cap.
A cap on the front end?
Yes.
A modified guanosine nucleotide, 7-methylguanosine, or m7G, is added to the very first nucleotide of the transcript, but it's added kind of backwards, through an unusual 5′-to-5′ triphosphate linkage.
Weird linkage.
Why do that?
This cap is absolutely crucial.
It protects the transcript from degradation by exonucleases, it's essential for efficient export out of the nucleus, and later it's recognized by the ribosome to initiate translation.
So protection, export, and translation signal.
Very important cap.
What's the second?
Second is the addition of a 3′ poly-A tail.
A tail at the other end.
Lots of A's.
Lots of A's.
After the polymerase transcribes past a specific signal sequence, often AAUAAA, the transcript is cleaved downstream of that signal by an enzyme complex.
Then another enzyme, poly-A polymerase, comes in and adds a long string, maybe 50 to 250, of adenine residues to that new 3′ end.
And what's the point of the tail?
Similar functions to the cap, really.
It aids in stability, protecting the 3′ end from degradation, helps with nuclear export, and also plays a role in efficient translation initiation.
Cap and tail, like protective gear for the message.
What's the third major processing event?
This is the big one, right?
This is the really mind -blowing one.
Splicing.
Discovered in 1977 by Phillip Sharp and Richard Roberts, completely changing our view of what a gene looks like.
What did they find?
They found that eukaryotic genes are often interrupted.
The coding sequences, called exons, expressed sequences, are separated by non-coding intervening sequences called introns.
So the initial pre-mRNA contains both exons and introns all strung together.
Exactly.
And the introns have to be precisely removed, and the exons perfectly stitched back together to create the mature mRNA that actually codes for the protein.
That sounds incredibly complex and prone to error.
Why even have introns?
It's a huge question, and the scale is surprising.
In humans, something like 94% of our genes contain introns, and the introns are often much, much longer than the exons they separate.
94%.
So most of our genes are broken up like this.
Yep.
As for why?
Well, introns aren't just junk.
One major benefit is alternative splicing.
Alternative splicing.
Yeah.
The splicing machinery can sometimes choose to skip certain exons, or include introns that are usually removed, or use different splice sites.
This means that a single gene from one stretch of DNA can actually produce multiple different versions of the mature mRNA.
And therefore, multiple different protein isoforms.
Exactly.
It vastly increases the coding potential of the genome.
One gene can lead to a whole family of related but functionally distinct proteins.
Introns also facilitate something called exon shuffling over evolutionary time, allowing functional domains coded by exons to be mixed and matched to create new proteins.
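A toy model shows how quickly exon skipping multiplies the number of possible messages. In the Python sketch below the exon names and sequences are made up, the first and last exons are assumed to be always included, and `isoforms` is a hypothetical helper that enumerates every combination of kept and skipped cassette exons.

```python
from itertools import product

# Made-up exons, listed in genomic order. Each is 6 bases long so that
# skipping one does not shift the reading frame in this toy example.
exons = {"E1": "AUGGCU", "E2": "GAACGU", "E3": "UUCAAA", "E4": "GGUUAA"}
cassette = ["E2", "E3"]  # exons the splicing machinery may skip


def isoforms(exons, cassette):
    """Return {isoform name: mature mRNA} for every exon-skipping choice."""
    result = {}
    for choice in product([True, False], repeat=len(cassette)):
        skipped = {name for name, keep in zip(cassette, choice) if not keep}
        kept = [name for name in exons if name not in skipped]
        result["-".join(kept)] = "".join(exons[name] for name in kept)
    return result


for name, mrna in isoforms(exons, cassette).items():
    print(name, mrna)
```

Two optional exons already give four distinct messages from one gene, and each additional cassette exon doubles the count. Real genes also use the other options just mentioned, such as retained introns and alternative splice sites, which this sketch ignores.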
Okay.
So introns provide flexibility and evolutionary potential.
But how does the cell actually do the splicing?
Cut out the introns precisely.
There are a couple of mechanisms.
Some introns, particularly certain ones found in ribosomal RNA or in mitochondria and chloroplasts, group I and group II introns, are actually self-splicing.
They splice themselves out.
The RNA does the cutting.
Yes.
These RNAs act as catalytic molecules.
We call them ribozymes.
They can orchestrate their own excision through a series of chemical reactions, often using a free guanosine nucleotide as a cofactor to initiate the first cut.
No proteins needed.
RNA acting like an enzyme.
Fascinating.
But that's not how most introns in our nuclear genes are removed, is it?
No.
The vast majority of introns in nuclear pre-mRNAs require a large, complex molecular machine called the spliceosome.
The spliceosome.
Sounds important.
It is.
It's huge, composed of several protein-RNA complexes called snRNPs, small nuclear ribonucleoproteins, pronounced "snurps."
Snurps.
Okay.
What do they do?
Each snRNP, like U1, U2, U4, U5, U6, contains a specific small nuclear RNA molecule and associated proteins.
These snRNPs recognize specific short sequences at the boundaries of the intron, typically a GU sequence at the 5′ splice site and an AG sequence at the 3′ splice site, plus a branch point sequence within the intron.
So they find the beginning and end markers of the intron.
Right.
They assemble onto the pre-mRNA, looping the intron out.
The spliceosome then catalyzes two sequential cutting and joining reactions, transesterifications.
The intron is excised in a characteristic looped structure called a lariat.
A lariat, like a cowboy's rope.
Exactly like that shape.
And the two flanking exons are perfectly ligated or stitched together.
It's an incredibly precise molecular surgery.
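The bookkeeping side of splicing, cut at GU, cut at AG, join the exons, can be sketched in Python. This models only the sequence outcome, not the chemistry or the lariat. The pre-mRNA is made up, the intron coordinates are supplied by hand rather than predicted, and `splice` is a hypothetical helper that checks the GU and AG boundary rule before removing each intron.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) half-open index pairs.

    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        mature.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the intron
    mature.append(pre_mrna[position:])           # keep the final exon
    return "".join(mature)


exon1 = "AUGGCU"
intron = "GUAAGUCUAACUUUCAG"   # made-up intron: starts with GU, ends with AG
exon2 = "GAACGUUAA"
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)  # AUGGCUGAACGUUAA
```

Even in this toy version the precision requirement is visible: shift either coordinate by a single base and the boundary check fails or the downstream reading frame changes.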
Wow.
Cap, tail, and splicing.
That's a lot of processing.
Is that everything?
Almost.
There's one more layer of post-transcriptional modification we should mention.
RNA editing.
Editing the RNA after it's been transcribed and possibly even spliced.
Yes.
It's like going back and changing the letters in the message after it's already been written.
It's less common than splicing, but critically important in some cases.
How does that work?
What kind of edits happen?
Two main types.
First, substitution editing, where one base is chemically converted into another.
A classic example is the mRNA for apolipoprotein B.
In the liver, the C is left a C, making the full-length protein.
But in the intestine, an enzyme, APOBEC1, changes a specific C to a U.
C to U.
What does that do?
That C to U change creates a premature stop codon, UAA.
So the intestine makes a much shorter version of the apoB protein, which has a different function in lipid transport.
Another example is A to I editing, adenosine to inosine, by ADAR enzymes, which is widespread in the nervous system and affects neurotransmitter receptors.
Inosine is read by the ribosome as guanosine, G.
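The C to U example can be traced with a short Python sketch. The sequence below is made up and is not the real apolipoprotein B message; it only contains a CAA codon at the edited position so that the same CAA to UAA change occurs. `translate_until_stop` and `c_to_u` are hypothetical helper names.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_until_stop(rna):
    """Translate triplets from the first base and halt at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def c_to_u(rna, position):
    """Model substitution editing: convert the C at this index into a U."""
    if rna[position] != "C":
        raise ValueError("substitution editing here expects a C at the target position")
    return rna[:position] + "U" + rna[position + 1:]


unedited = "AUGGCUCAAGAACGUUAA"   # made-up message: AUG GCU CAA GAA CGU UAA
edited = c_to_u(unedited, 6)      # CAA (glutamine) becomes UAA (stop)

print(translate_until_stop(unedited))  # MAQER
print(translate_until_stop(edited))    # MA
```

A single edited base truncates the product, which is the same logic by which one gene yields a full-length protein in one tissue and a shorter one in another.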
So changing a single letter can completely change the protein product or its function.
What's the other type of editing?
Insertion-deletion editing. This is more dramatic:
nucleotides, usually uracils, U's, are added into or removed from the mRNA sequence.
This is particularly common in the mitochondria of single-celled organisms like trypanosomes.
Adding or removing bases, how does it know where?
It often uses small guide RNAs, gRNAs, as templates to direct the insertion or deletion of U's at specific locations.
It can massively alter the final protein sequence compared to what the DNA originally encoded.
So editing provides yet another way to generate protein diversity from a limited number of genes or to fine-tune function, maybe in response to the environment.
Exactly, it adds a layer of dynamic regulation after the gene has been transcribed.
We see it used, for example, in squids and octopuses to modify ion channel proteins in their nervous system, potentially helping them adapt to different water temperatures.
Amazing flexibility, it really paints a picture of layers upon layers of information handling.
It really does.
We've gone from this beautifully simple, almost mathematical triplet code that Nirenberg and others deciphered in test tubes.
Yeah, the foundational rules.
All the way to the incredibly complex, highly regulated reality of transcription in eukaryotes, with promoters, enhancers, transcription factors, chromatin remodeling, capping, tailing, splicing by spliceosomes, and even RNA editing.
It's quite a journey for that genetic message.
It absolutely is, and it brings us back to that really interesting biological question that sort of underpins this whole chapter, doesn't it?
Which is?
Why?
Why did complex organisms evolve this system with genes full of introns that need this incredibly elaborate and energetically costly splicing machinery to remove them?
Right, especially when you compare it to those viruses you mentioned earlier, like phiX174, that prioritize hyper-efficiency by overlapping their genes.
It seems like two completely opposite strategies, maximum economy versus maximum complexity.
It really highlights a fundamental trade-off in evolution, I think.
Viruses often operate under extreme pressure to keep their genome small and replicate fast, so genetic economy is paramount.
Overlapping genes make sense for them.
But for complex multicellular organisms, the ability to generate diversity and regulate gene expression in sophisticated ways seems to have been more valuable.
The cost of maintaining introns and the splicing machinery is apparently outweighed by the huge benefit of alternative splicing.
That ability to create multiple protein isoforms from a single gene.
Exactly.
It allows for the incredible functional complexity needed to build and operate a complex organism like us.
It's the difference between a stripped-down minimalist blueprint and one that has built-in options for creating many different variations on a theme.
A fascinating perspective.
Economy versus complexity and flexibility.
Well, this has been an incredible journey through the genetic code and the intricate process of transcription.
Thank you for joining us on this deep dive.
We really hope this detailed look helps you grasp how the fundamental instructions of life are read, transferred, and refined.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.