Chapter 3: Protein Production in Bacteria & Yeast

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive, the place where we take complex science and break it down until you're the most well-informed person in the room.

Today, we're embarking on a mission that is really critical to modern medicine.

We're gonna understand how we fundamentally reprogram life.

We're talking about the production of therapeutic and industrial proteins, things like hormones, enzymes, lymphokines, by essentially turning simple, fast-growing microbes into these high-tech living factories.

And when you stop and think about it, the scale is just staggering.

The human body is this incredible symphony orchestrated by thousands of these proteins.

And if any single one of them is in short supply, or if you're making a faulty version, you're looking at chronic disease, sometimes life-threatening conditions.

Historically, and we're talking before the recombinant revolution in the early 80s, getting your hands on pharmaceutical-grade versions of these was, well, it was a logistical and financial nightmare.

Right, you were entirely dependent on sourcing them directly from animal or even human tissues.

A process that was both incredibly inefficient and, frankly, hideously expensive.

Because those target proteins, they exist at such low trace concentrations.

Exactly, and it wasn't just expensive, it was incredibly risky.

Isolating these macromolecules from animal tissues meant you were constantly running the risk of viral or prion contamination.

We saw real tragedies play out with that, for instance, with pituitary growth hormone that was sourced from human cadavers.

And even when you managed to find a similar protein in an animal, a lot of the time, it just wasn't structurally correct or functionally effective in humans.

The animal version of pituitary growth hormone is a perfect example.

It had just enough structural differences that it was biologically useless for people.

So the moment that changed everything was the introduction of recombinant DNA techniques.

A total game changer.

Suddenly, scientists realized they could clone the specific DNA segment for a critical protein, let's say human insulin, stick it into a common microorganism like E. coli or baker's yeast, and just like that, turn that microbe into a sustainable, scalable factory.

A factory that needs only cheap culture ingredients, can be scaled almost infinitely in a bioreactor, and completely eliminates that risk of human or animal viral contamination.

Because the whole process is contained within the microbial world.

So our mission today is to go deep into the fundamentals of applied microbiology to understand exactly how this magic happens.

We're going to walk through the complete engineering workflow step by step.

Right.

How do scientists get that foreign gene into a microbe, make sure it replicates reliably, and then crank up its expression for high yield?

And finally, how do you efficiently recover and purify that final protein product?

It's the whole pipeline.

Let's start with the workhorse.

Bacteria, specifically Escherichia coli.

So why did this unassuming gut bacterium become the absolute global standard for genetic engineering almost overnight?

Well, it really comes down to three things.

History, efficiency, and scale.

First, knowledge.

After Homo sapiens, E. coli is the most thoroughly studied and best understood organism in the entire living world.

We know its entire genetic map, its physiology, its biochemistry.

Exactly.

That deep understanding means you get fewer surprises when you start messing with its internal processes.

And second, efficiency and speed are just unparalleled.

You can culture them so fast in cheap, minimal media.

They double their mass roughly every 20 minutes in a rich culture.

That's just staggering scalability when you're talking about commercial production.

And the third reason, selection power.

Right.

Because they are microscopic, you can plate up to a billion individual cells on a small 10 centimeter Petri dish.

And that ability to test enormous populations lets biotechnologists find that one rare, successful recombinant, the one cell in a million that actually took up and expressed your carefully engineered DNA.

Okay, so let's unpack that core challenge.

Getting the DNA into the bacterial cell in the first place.

Geneticists have found three natural ways that bacteria exchange genes and biotechnologists have figured out how to capitalize on all three.

The first, and you could argue the most historically important method is transformation.

Just the direct introduction of naked DNA.

This was first seen by Griffith way back in 1928, right?

When he showed that harmless pneumococcus could pick up the virulent trait from heat-killed cells.

A stunning observation that ultimately led Avery, MacLeod, and McCarty in 1944 to prove that the transforming substance was DNA itself.

That discovery really launched the whole field of molecular biology.

So some species are just naturally good at this.

Yes, some like Bacillus subtilis are naturally competent.

They have this complex active machinery built in to just suck up DNA from the environment.

But for E. coli, which isn't naturally competent, we have to invent an artificial transformation process.

So we have to chemically force the DNA in, which sounds pretty aggressive.

How do you make the E. coli membrane permeable enough to accept a big, highly charged molecule like DNA?

We shock them with chemistry and temperature.

The cells are made competent by resuspending them in a very cold buffer that has a high concentration of calcium chloride, CaCl2, typically held right at zero degrees Celsius.

What's the calcium doing at a molecular level?

The Ca2+ ions, being positively charged, bind tightly to the negatively charged parts of the lipids in the membrane, especially the lipopolysaccharide, or LPS, on the outside of gram-negative E. coli.

So it neutralizes the charge and sort of makes the surface sticky for the DNA.

Right, while also chilling and freezing the membrane's interior.

And that freezing, it essentially creates these tiny brittle cracks for macromolecules like DNA to pass through.

Then you add the DNA, you give it that brief, high temperature heat shock at 42 degrees Celsius, and then chill it again.

And that thermal shift somehow promotes the final uptake.

Though, you know, it's surprising, the precise molecular mechanism of how the DNA actually crosses the membrane during that heat pulse is still a bit obscure even today.

That's a great example of applied science just racing ahead of our fundamental understanding.

Now, the CaCl2 method is a classic, but there's a better physical method for this, isn't there?

Oh yes, electroporation.

This method applies short, high-voltage electrical pulses to the cells.

Zapping them.

Basically.

It's believed this reorients charged components in the cell membrane, creating temporary holes.

The DNA then enters through those holes, often driven by the electrical charge itself.

And it's just more efficient.

More efficient, and it works reliably across a wider range of bacterial species than the chemical method.

Okay, so that's method one.

Let's move to the second mode of gene transfer.

Conjugation.

This is critical when you wanna get DNA into a species that's really difficult to transform.

Conjugation is the unidirectional transfer of DNA, discovered by Lederberg and Tatum in 1946.

It requires direct cell-to-cell contact, which is often mediated by a structure called the sex pilus.

A donor cell with the F plasmid, the fertility factor, transfers a copy to a recipient cell that doesn't have it.

Precisely, and the mechanism itself is remarkable.

It involves something called rolling-circle replication.

So how does that work?

One strand of the F plasmid is cut at a spot called the origin of transfer, or oriT.

Rolling-circle replication then synthesizes a new strand, and as it does, it displaces the old one, which is simultaneously fed, 5' end first, into the recipient cell.

And the recipient then synthesizes a complementary strand on the incoming template, which turns it into a donor cell itself.

Yep, so in biotech, if our target production species is hard to work with, we can clone our DNA into a shuttle plasmid inside an easy host like E. coli, and then use conjugation to shuttle that plasmid over.

It's a process called plasmid mobilization.

That is an elegant workaround, but it raises a massive safety constraint.

If that F plasmid has all the information for self-transfer, couldn't our engineered DNA just spread uncontrollably into natural microbial populations if the strain escaped the lab?

That's exactly right.

And the rule of environmental containment is incredibly strict.

Unmodified sex plasmids are never used as vectors.

You have to use non-conjugative plasmids.

Yes, non-self-transferring plasmids that have had the transfer genes removed.

We then supply the missing information externally in the lab to mobilize the plasmid only when we need to, which guarantees containment.

Okay, that brings us to the third mode of transfer, transduction, which is basically injection by a bacteriophage.

And this is all about efficiency.

Transformation might only be successful in one in hundreds of thousands of cells.

Phage infection, on the other hand, approaches 100% efficiency.

It's the highest efficiency delivery system that nature provides.

So how does the phage replication cycle make this possible?

Well, in the lytic cycle, the phage attaches to the cell, injects its DNA, synthesizes more copies of its DNA and its capsid proteins, and packages that new DNA into new capsids.

And then the cell lyses, it bursts, releasing all the new phages.

Right, and transduction, which is where we get our hands on this delivery system, happens when the phage accidentally packages a piece of bacterial DNA instead of its own.

So in what's called generalized transduction, the phage head gets filled with host chromosomal DNA and that whole unit is then injected into a new cell.

And biotechnologists leverage this hyper-efficiency by taking our recombinant DNA, our gene of interest, in a special lambda phage vector and mixing it with purified phage capsid proteins in vitro.

And this causes spontaneous packaging of our recombinant DNA into functional phage particles.

Exactly, these highly efficient particles then inject the DNA into host bacteria, bypassing all those tricky membrane permeability issues that plague transformation and allowing us to reliably deliver huge quantities of DNA.

So we successfully engineered a way to get the foreign DNA into the cell, but now we hit a fundamental biological roadblock.

If we just injected a random fragment of DNA into the cytoplasm, why won't it replicate on its own?

It's missing the necessary blueprint components.

A random piece of foreign DNA won't be replicated unless it contains a specific origin of replication sequence, or ORI, that the host E. coli machinery recognizes.

And even if it did somehow manage to integrate into the host's large chromosome, which is a very rare event, it would only exist as a single copy.

And a single copy means low expression, impossible scalability.

And it's really difficult to isolate the DNA again.

So we absolutely must use a vector.

A vector, which is usually a small circular piece of DNA derived from a plasmid or a phage, is essentially the autonomously replicating carrier that we need to hold and protect our inserted foreign DNA.

Let's visualize the initial strategy here, shotgun cloning.

If we wanna find a tiny gene, say a one or two kilobase gene from a 4,000 kilobase bacterial genome, how do we even begin?

We start by fragmentation.

We use a restriction endonuclease like EcoRI to cut the vector DNA open and to chop the massive genomic DNA into thousands of smaller fragments.

The real power of these enzymes is that they create complementary sticky ends.

These sticky ends are crucial for the second step, which is ligation.

Right, we mix the fragments and the open vector.

The sticky ends anneal perfectly and DNA ligase acts as the glue, covalently connecting the strands and creating a vast library of recombinant DNA.
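
To make the cut-and-paste step concrete, here's a minimal Python sketch of a restriction digest. It assumes the enzyme is EcoRI, which recognizes GAATTC and cuts between the G and the first A on each strand, leaving AATT overhangs. The function name `digest` and the toy sequence are made up for illustration, and only the top strand of a linear molecule is modeled.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (assumed enzyme)
CUT_OFFSET = 1         # top strand is cut after the G: G^AATTC


def digest(seq: str, site: str = ECORI_SITE, offset: int = CUT_OFFSET) -> list[str]:
    """Cut a linear top-strand sequence at every occurrence of the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


# Hypothetical 26-base "genome" containing two EcoRI sites.
toy_genome = "ATGCGAATTCTTAGGCCGAATTCAAT"
pieces = digest(toy_genome)
```

In this toy model, every fragment after the first starts with AATTC, so all the cut ends carry the same overhang. That shared overhang is what lets any genomic fragment pair with a vector opened by the same enzyme.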

Then we transform this entire library into E. coli and plate it on selective media.

Every surviving bacterium forms a colony and that colony is a pure clone containing one specific plasmid from that initial library.

And the lesson here is pretty profound.

A good vector isn't just a carrier, it's a tool that radically reduces the time you spend screening.

Right, if we're searching for a gene in a 4,000 kilobase genome and we only clone small 4 kilobase fragments, we might have to test 4,500 clones to have a 99% chance of finding our gene.

Exactly, but if the vector can handle a massive 40 kilobase fragment, the number of clones you need to check drops almost tenfold down to about 465.
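
Library-size estimates like these are commonly made with the formula N = ln(1 - P) / ln(1 - f), where P is the desired probability of finding the gene and f is the fraction of the genome carried by one insert. The sketch below is an illustration under that assumption, not necessarily the exact calculation behind the figures quoted here; the function name `clones_needed` is hypothetical.

```python
import math


def clones_needed(genome_kb: float, insert_kb: float, probability: float = 0.99) -> int:
    """Number of random clones needed to include a given sequence
    with the stated probability, assuming random, equal-sized inserts."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


small_inserts = clones_needed(4000, 4)    # 4 kb inserts, 4,000 kb genome
large_inserts = clones_needed(4000, 40)   # 40 kb inserts, same genome
```

Under these assumptions the formula gives roughly 4,600 clones for 4 kilobase inserts and roughly 460 for 40 kilobase inserts, the same ballpark as the rounded numbers in the discussion.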

That shifts the engineering burden from the lab bench to the design of the vector itself.

Which is why primary cloning uses vectors designed for large fragments.

Once you find the gene, you can take just that small relevant portion and move it into specialized vectors.

That's called subcloning.

Now, let's look at the special challenge with eukaryotic genes, especially from higher animals.

These genes have introns, these non-coding intervening sequences that have to be removed from the RNA transcript via splicing.

And the problem is bacteria cannot perform splicing.

If you stick a eukaryotic gene with introns into an E. coli factory, the resulting messenger RNA will be garbage and the protein will be nonsensical or truncated.

So how do we get the correct mature genetic instructions into the bacterial factory?

We bypass the entire problem.

We start with the mature messenger RNA, the mRNA, which has already had its introns removed in the eukaryotic cell.

And then you use an enzyme called reverse transcriptase to convert that mature mRNA template into a double-stranded, intron-free complementary DNA, or cDNA.
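
The mRNA-to-cDNA route can be sketched in a few lines of Python. This is a toy model only: the exon coordinates, the sequence, and the function names are invented for illustration, and real splicing and reverse transcription involve far more than string operations.

```python
# Base pairing used when a DNA strand is copied from an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon slices (start, end) and drop everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def first_strand_cdna(mrna: str) -> str:
    """DNA strand complementary to the mRNA, written 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


# Hypothetical transcript: exon 1, an 11-base intron, exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "UUUUAA"
mature_mrna = splice(pre_mrna, [(0, 6), (17, 23)])
cdna = first_strand_cdna(mature_mrna)
```

The point of the sketch is that the intron never reaches the DNA copy: the bacterium only ever sees the already-spliced message.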

That makes the process so much cleaner from the start.

Since each eukaryotic mRNA codes for just one protein, the cDNA also codes for just one protein, which makes it much easier to insert directly into an expression vector.

Sometimes even skipping the whole shotgun cloning stuff entirely.

Yeah, though we should probably note that this complex process of cutting and pasting is increasingly being replaced by PCR-based amplification now that we have these massive genomic databases.

Let's move into the vector toolbox itself, starting with the classic standard, the plasmid.

Using the historical vector pBR322 as our model, what are the three non-negotiable features of any general-purpose cloning plasmid?

First, the origin of replication, the ORI.

That's the sequence recognized by the host machinery that allows it to replicate on its own.

Second, antibiotic resistance genes.

Plasmids like pBR322 carry genes for resistance to, say, ampicillin and tetracycline.

Since transformation efficiency is so low, these markers are absolutely vital for selection.

Only the rare cell that actually received a plasmid will survive the antibiotic plating.

And third, those same antibiotic genes give us the mechanism for screening, which is called insertional inactivation.

For example, pBR322 has a single restriction site, BamHI, located right inside the tetracycline resistance gene.

So if we insert a piece of foreign DNA at that BamHI site, we physically disrupt the tetracycline resistance gene.

And the cell remains ampicillin resistant because it has the plasmid, but it becomes tetracycline susceptible because that gene is now broken.

We select the cells that got the plasmid on ampicillin and then we screen those survivors for the loss of tetracycline resistance.

That combination of selection and screening identifies our successful recombinants.
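
The selection-then-screening logic boils down to a small decision table. Here's a hedged Python sketch of it; the function name and the label strings are hypothetical, and it assumes the insert went into the tetracycline resistance gene as described.

```python
def classify_colony(grows_on_ampicillin: bool, grows_on_tetracycline: bool) -> str:
    """Interpret a colony's growth pattern under insertional inactivation."""
    if not grows_on_ampicillin:
        # Selection step: no plasmid, so no ampicillin resistance.
        return "no plasmid"
    if grows_on_tetracycline:
        # Tetracycline gene still intact: the vector closed without an insert.
        return "empty vector"
    # Ampicillin resistant but tetracycline sensitive: the insert broke the gene.
    return "recombinant"


results = [
    classify_colony(True, True),
    classify_colony(True, False),
    classify_colony(False, False),
]
```

Only the middle pattern, resistant to ampicillin and sensitive to tetracycline, marks the clones worth keeping.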

And it's often done quickly using a technique called replica plating.

I find it fascinating that a useful vector has to have only a single restriction site for the enzymes you plan to use.

If it had two, cutting it would create two fragments.

And when you add the DNA ligase, those two vector fragments could just religate in some undesired, complex combination, and it would just destroy your yield.

That control is essential.

And to minimize the chance of the open vector just closing back up on itself without taking in any foreign DNA, we often treat the vector with phosphatase.

Right, phosphatase removes the 5' phosphate groups, which prevents the vector from self-ligating.

The foreign DNA though, it still has its phosphate groups intact so it can bridge and covalently connect to the vector ends.

And we also use specialized host strains of E. coli for this work.

Yeah, they're engineered with two critical defects.

A defective restriction system so the host doesn't destroy the foreign DNA.

It would think it's under viral attack.

Right, and a defective homologous recombination system which prevents the host from altering our recombinant plasmid structure.

But plasmids aren't perfect, especially when you're dealing with massive chunks of DNA, say inserts over 20 kilobases.

If your insert is just too big, you need a completely different delivery vehicle.

And that brings us to the fascinating world of phage lambda vectors.

Phages are superior because they use their own highly efficient packaging and injection machinery.

The lambda genome is about 50 kilobases.

So we delete non -essential regions, about 15 kilobases total.

And we can replace that space with up to 20 kilobases of foreign DNA.

The genius here is that the foreign DNA is inserted in vitro and then the resulting recombinant DNA is mixed with purified phage capsid proteins.

They just spontaneously package the DNA into functional lambda particles.

And this is the high efficiency difference.

These packaged phages then infect the host bacteria, injecting the DNA with nearly 100% efficiency.

That high efficiency delivery is necessary for working with large fragments that are extremely difficult to get into a cell using simple transformation.

And because the phage lytic cycle kills the host cell so quickly, lambda vectors are also ideal for cloning genes whose products might be highly toxic to the bacterium.

The host just doesn't survive long enough for the toxicity to become a serious problem.

Next up we have cosmids, which are hybrid vectors designed to carry even larger inserts up to 40 kilobases.

They're sort of the best of both worlds.

How do they blend the two?

Cosmids contain the lambda cos sites.

Those are the cohesive sites required for packaging into the phage head along with a plasmid origin of replication and an antibiotic marker.

So they get packaged into phage heads in vitro, just like lambda DNA, ensuring efficient high yield delivery into the host cell.

But once inside the host cell, they circularize and replicate as normal plasmids.

So the delivery is phage based, but the maintenance is plasmid based.

Correct, which also means that like regular plasmids, they're difficult to use for genes that code for highly toxic proteins as they have to propagate for many generations.

For genome sequencing projects that need truly massive segments, we're talking hundreds of kilobases, we need to move up to bacterial artificial chromosomes or BACs.

BACs are specialized plasmid vectors based on the F factor origin of replication.

They are the go-to choice for massive cloning because they are low copy number, just one or two per cell, and contain partitioning genes, which ensures their highly stable maintenance over generations.

They're also engineered to be non-conjugative.

Precisely, the transfer genes are removed and crucially, BACs are easier to isolate than other systems and they rarely incorporate chimeric DNA, which is a common and destructive problem when you're cloning huge fragments.

Finally, we'll talk about vectors derived from single-stranded DNA phages, like the filamentous phage M13.

M13 phages only infect E. coli cells that carry the F sex factor.

Their unique biological quirk is that they continuously extrude their progeny without causing lysis or death of the host cell.

They just elongate the filamentous particle when foreign DNA is inserted.

Yep, these were historically foundational for methods like DNA sequencing and site-directed mutagenesis, but their current highest-impact application?

It has to be phage display.

Absolutely, you insert the foreign gene into the gene for protein three, which is located at the very tip of the phage.

This physically links the gene sequence, which is inside the phage particle, with the expressed potentially mutated protein on the surface.

Why is that physical linkage so powerful?

It allows for high-throughput selection.

You can subject the gene to random mutagenesis and then use the expressed protein on the surface to select for affinity to a target, say, a specific cell receptor.

So you select the whole phage based on that successful interaction.

You're effectively evolving high-affinity antibodies or binding partners right there in the lab.

The M13 system also pioneered two features that are now ubiquitous in nearly all modern vectors, the first being that elegant blue-white screening method, alpha complementation.

Right, the vector has a non-functional fragment of the lacZ gene.

If you insert foreign DNA, you interrupt the gene and the colonies stay white.

But if the vector just closes up on itself without an insert, the lacZ fragment is functional and the colonies turn blue in the presence of the substrate X-gal.

Which allows researchers to instantly distinguish between empty vectors and successful recombinants.

And the second feature is the polylinker, or multiple cloning site.

The polylinker is a short sequence near the lac promoter that contains single cleavage sites for many different restriction enzymes, which dramatically simplifies the process of gene insertion by offering maximum flexibility.

And these features are often combined into phagemids, these chimeric vectors, with both a plasmid origin and an M13 origin.

They multiply stably as plasmids but can be packaged into phage-like particles when the host is superinfected with helper phages.

We've established the tools for cutting, pasting, and transporting DNA.

But now the real detective work begins.

We have successfully forced billions of random plasmids into billions of cells.

How do we find the one colony out of a million that has the specific gene we're looking for?

Even with large vectors, trying to find one specific gene from a big eukaryotic genome is just daunting.

The strategy has to start by trying to stack the odds in our favor before the search even begins.

Okay, strategy one, use a better template.

We talked about cDNA.

If we use mRNA harvested from cells that naturally express our target gene very strongly, we are starting with a library that is already highly enriched.

And converting that enriched mRNA to cDNA just drastically minimizes the screening effort you need later.

And of course, the ultimate strategy is to avoid the library screening entirely using PCR amplification.

Right, if genomic databases have revealed even short sequences flanking the gene, PCR is your escape clause.

It allows you to isolate the sequence of interest exponentially, completely circumventing that complex, time-consuming workflow of creating libraries and screening individual clones.

So if we don't know the sequence, the next best option is testing the function of the protein itself.

The most efficient strategy here is a selection procedure called the complementation assay.

This is applicable if the cloned gene performs a function that is also found in E. coli, even if the gene itself comes from a totally different organism.

For instance, imagine you're cloning the gene for anthranilate synthase, which is required for tryptophan synthesis.

So you start with an E. coli mutant, say a trpE- strain, which can't make tryptophan and therefore starves unless you supplement the medium.

You introduce your library of recombinant plasmids into this trpE- strain.

You then plate them on a medium that lacks tryptophan.

And only the rare cells that receive the foreign plasmid with the functional trpE homolog will suddenly gain the ability to make tryptophan and thrive.

Every other cell dies.

That is an extremely powerful, efficient selection.

But what if the foreign gene is for something entirely unique, like a complex human hormone, where complementation is biologically impossible in E. coli?

Then you have to rely on antibody screening.

You detect the presence of the target protein by its specific reactivity with a highly purified antibody.

And this is a screening, not a selection process.

You have to check every colony individually.

Exactly.

And lambda phage vectors are particularly useful for antibody screening, aren't they?

They are.

Phage expression vectors, like lambda gt11, are designed to express the foreign protein at high levels.

And crucially, the lytic cycle kills the host and releases the proteins directly into the plaque medium.

Which greatly facilitates the easy detection of the positive clones using the antibody.

And what if the gene's product won't function or fold correctly in E. coli?

We're back to using shuttle vectors.

Exactly.

If the protein functions in the source organism, let's call it organism A, but not in E. coli, you use a shuttle vector with origins of replication for both species.

You do all the cloning and manipulation easily in E. coli.

But the functional testing, the screening for success, happens back in the source organism, organism A.

Now, functional screening fails if the protein product is toxic, if the host machinery doesn't recognize the foreign promoter, or if the initial cloning accidentally included eukaryotic introns.

In those cases, we have to rely entirely on identifying the DNA sequence itself using hybridization.

Hybridization involves annealing a labeled probe, radioactive or fluorescent, to the fixed DNA from the colonies we're testing.

The challenge is, how do you design that probe if the exact sequence is unknown?

There are two main approaches if we don't have the sequence ready-made.

The first is using homology.

If the gene has known sequences in related organisms, we can design a probe corresponding to the most conserved regions and hybridize it under conditions of low stringency.

Which means you're tolerating some degree of mismatching or imperfection in the annealing.

Right.

The second way is if we know at least a partial amino acid sequence of the final protein.

But since the genetic code is degenerate, meaning multiple codons code for the same amino acid, how do we account for all the possibilities?

You synthesize a mixture of degenerate DNA probes that correspond to all the possible codon combinations for that short amino acid sequence.

And that mixture is guaranteed to contain a probe that perfectly matches the target sequence, allowing you to successfully identify it.
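
Here's a small Python sketch of how a degenerate probe mixture can be enumerated from a short peptide. The peptide `MKWFD` is an invented example, the codon table is deliberately partial (only the residues used), and the helper names are hypothetical.

```python
from itertools import product
from math import prod

# Partial standard codon table (DNA, coding strand): only residues used below.
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "D": ["GAT", "GAC"],
}


def probe_count(peptide: str) -> int:
    """Size of the mixture: product of the codon choices at each position."""
    return prod(len(CODONS[aa]) for aa in peptide)


def degenerate_probes(peptide: str) -> list[str]:
    """Every DNA sequence that could encode the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


probes = degenerate_probes("MKWFD")  # 1 * 2 * 1 * 2 * 2 combinations
```

The mixture size is the product of the codon choices at each position, which is why stretches rich in methionine and tryptophan (one codon each) are attractive, while residues like leucine, serine, and arginine (six codons each) make the mixture balloon.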

The procedure involves replica plating colonies onto a filter, lysing the cells and fixing the DNA, then incubating that filter with the labeled probe mixture.

Detection then reveals which colonies have DNA that's successfully hybridized.

And you can also ingeniously combine these approaches.

For instance, transposon mutagenesis is really useful for cloning genes from distantly related bacteria where the existing genetic toolset is very scarce.

How does inserting a mobile drug resistance element help us find the original gene?

Well, you introduce the transposon, a piece of mobile DNA with a drug resistance gene, into the source bacterium, where it randomly disrupts the target gene, gene X.

Now you have a mutant with a recognizable phenotype.

You then clone this disrupted genomic DNA, which contains gene X plus the transposon, into an E. coli plasmid.

Because the transposon's drug resistance marker is designed to express well in E. coli, we can easily select for the presence of the fragment.

And now we have a piece of gene X.

Exactly.

We use the cloned gene X fragments flanking the transposon as a perfect, reliable DNA probe to screen a separate library that contains the original, wild-type, uninterrupted gene X clone.

It's a clever bootstrapping method to generate a probe without any sequence knowledge.

Now, for the modern revolution in this area: PCR. If we know the sequence, this just bypasses the entire complex, messy process of library creation and screening.

PCR, or polymerase chain reaction, relies on short oligonucleotide primers, which are complementary to opposite strands flanking the target gene.

The cycle involves heating the genomic DNA to denature the strands.

Cooling to allow the primers to anneal, and then allowing a heat-resistant DNA polymerase, like Taq polymerase from Thermus aquaticus, to elongate the primers.

And since the primers anneal to the newly synthesized strands in the next cycle, the process exponentially amplifies the target sequence defined by the two primers.

The extreme sensitivity of PCR is its major advantage.

Theoretically, it can amplify a single copy of DNA into billions.
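
The "single copy into billions" claim is just repeated doubling. As a rough sketch, assuming ideal doubling each cycle, with an `efficiency` parameter added here as a hypothetical knob for less-than-perfect cycles:

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles; efficiency=1.0 means perfect doubling."""
    return initial_copies * (1 + efficiency) ** cycles


ideal_30 = pcr_copies(1, 30)          # perfect doubling for 30 cycles
realistic_30 = pcr_copies(1, 30, 0.8)  # assumed 80% per-cycle efficiency
```

With perfect doubling, 30 cycles turn one template into 2 to the 30th power, a little over a billion copies. Real reactions fall short of that and eventually plateau, which this simple model ignores.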

Which has revolutionized everything from forensics to diagnostics, allowing us to detect specific DNA signatures in hours, not weeks.

And the availability of massive genomic databases, like GenBank, has turbocharged this.

We don't necessarily need to know our specific gene.

We can search for homologs.

Alignment programs like BLAST let us find homologous sequences for our target protein from related organisms.

By aligning those conserved protein sequences, we can identify regions that are genetically identical across species, and then design our degenerate primers based on those conserved regions for PCR amplification.

And if that only gives you an internal fragment of the gene, you can recover the entire sequence using the ingenious technique of inverse PCR.

Inverse PCR is a way of reversing the standard PCR process.

You cut the genomic DNA with a restriction enzyme, and then use DNA ligase to self-ligate those fragments into tiny DNA circles.

And then you use primers that face outward divergently from the known internal segment.

Because the DNA is circularized, the divergent primers run into each other and amplify the entire surrounding chromosomal fragment, thus recovering the full gene sequence, both upstream and downstream of the known segment.
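
The geometry of inverse PCR is easier to see with a toy string model. The sketch below is a deliberate simplification: it treats the restriction fragment as a string, ignores the primer sequences themselves, and just shows which flanking DNA ends up in the product once the fragment is circularized. The function name and sequences are hypothetical.

```python
def inverse_pcr_flanks(fragment: str, known: str) -> str:
    """Unknown DNA recovered by outward-facing primers on a circularized fragment.

    Reading outward from the right edge of the known segment runs through the
    downstream flank, across the ligation junction, and into the upstream flank.
    """
    i = fragment.index(known)
    upstream = fragment[:i]
    downstream = fragment[i + len(known):]
    return downstream + upstream


# Hypothetical restriction fragment: unknown flanks around a known core.
fragment = "AAACCC" + "GATTACA" + "GGGTTT"
product_flanks = inverse_pcr_flanks(fragment, "GATTACA")
```

Notice the product is rearranged relative to the chromosome: the downstream flank comes first, then the upstream flank, joined at the point where the restriction ends were ligated.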

We have successfully cloned the right gene and put it into a plasmid.

But a general cloning vector isn't designed for yield.

If the goal is commercial production, we need an expression vector, which is specifically engineered to maximize protein synthesis.

Maximization relies on the gene dosage effect.

More copies of the gene mean more mRNA produced, leading to higher protein yield.

And for that, we need to fine tune the architecture of the vector itself.

The first critical component has to be a strong promoter to drive efficient transcription.

In E. coli, this means having the two consensus sequences, TTGACA, the -35 region, and TATAAT, the -10 region, or Pribnow box, separated by the optimal distance of 16 to 18 base pairs.
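
As a quick illustration of that architecture, here's a Python sketch that scans a sequence for an idealized promoter. It assumes the textbook consensus hexamers TTGACA and TATAAT with a 16 to 18 base spacer; real promoters rarely match the consensus exactly, so an exact-match regex like this is a teaching toy, and the function name is hypothetical.

```python
import re

# -35 hexamer, 16-18 base spacer, -10 hexamer (exact consensus only).
PROMOTER = re.compile(r"TTGACA([ACGT]{16,18})TATAAT")


def find_promoters(seq: str) -> list[tuple[int, int]]:
    """Return (start position, spacer length) for each consensus match."""
    return [(m.start(), len(m.group(1))) for m in PROMOTER.finditer(seq)]


good = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
too_close = "GG" + "TTGACA" + "A" * 12 + "TATAAT" + "CC"

hits = find_promoters(good)
misses = find_promoters(too_close)
```

The second sequence has both hexamers but the wrong spacing, so it isn't reported. That mirrors the point that the distance between the two boxes matters, not just their presence.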

Phages like T7 are also a fantastic source of strong promoters because they naturally require massive rapid protein production during their short infection cycle.

But the second component is regulation.

You can't just run a strong promoter constitutively all the time.

If the foreign protein is toxic or even just energy-intensive to produce, non-producing cells will quickly out-compete the producing cells, and your yield will plummet over time.

So promoters have to be regulatable.

We delay the induction until the culture has reached a high, dense concentration.
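
Why constitutive expression backfires can be shown with a toy competition model. This sketch assumes simple exponential growth of two subpopulations; the growth rates and starting fraction are hypothetical numbers chosen only for illustration.

```python
from math import exp

def producer_fraction(f0: float, mu_producer: float, mu_nonproducer: float, hours: float) -> float:
    """Fraction of producing cells after both subpopulations grow exponentially.

    f0 is the starting fraction of producers; mu values are specific growth
    rates per hour (hypothetical inputs, not measured values).
    """
    producers = f0 * exp(mu_producer * hours)
    nonproducers = (1.0 - f0) * exp(mu_nonproducer * hours)
    return producers / (producers + nonproducers)

# Start with 99% producers that grow at half the rate of non-producers.
print(producer_fraction(0.99, mu_producer=0.5, mu_nonproducer=1.0, hours=24))
```

Even a 1% minority of faster-growing non-producers dominates within a day in this toy setup, which is the argument for keeping the promoter off until the culture is dense.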

Common examples are Plac, Ptrp, and the synthetic hybrid Ptac, which is the strongest.

The lac system is often used with specific genetic tweaks, like the lacIq allele, which increases production of the repressor protein to suppress any leaky transcription of toxic genes.

The third, and maybe the most critical element, is ensuring the mRNA is efficiently translated by the ribosome.

This is governed by the ribosome binding site, RBS, or Shine-Dalgarno sequence.

This sequence, typically AGGA, is complementary to a region at the 3' end of the 16S rRNA, allowing the mRNA to physically associate with the 30S ribosomal subunit.

But here's where the precision engineering comes in.

The distance between the RBS and the start codon, ATG, is absolutely paramount for translation efficiency.

If that distance is off by just a couple of nucleotides, translation can just crash dramatically.

Correct.

The optimal distance is about seven nucleotides apart.

And since eukaryotic mRNA, our primary template, lacks an RBS, the vector must supply the E. coli RBS and ensure the eukaryotic gene's five prime terminus is placed in that perfect seven-nucleotide sweet spot.
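
A quick way to sanity-check that spacing in a construct is to measure it. The sketch below assumes an AGGA core as the RBS motif and AUG as the start codon; both defaults and the function name `rbs_to_start_spacing` are assumptions for illustration.

```python
from typing import Optional

def rbs_to_start_spacing(mrna: str, rbs: str = "AGGA", start: str = "AUG") -> Optional[int]:
    """Count nucleotides between the end of the first RBS motif and the next start codon.

    Accepts DNA or RNA letters; returns None if either element is missing.
    """
    mrna = mrna.upper().replace("T", "U")
    rbs_pos = mrna.find(rbs)
    if rbs_pos == -1:
        return None
    rbs_end = rbs_pos + len(rbs)
    start_pos = mrna.find(start, rbs_end)
    if start_pos == -1:
        return None
    return start_pos - rbs_end

# Toy leader with seven nucleotides between the RBS core and the start codon.
print(rbs_to_start_spacing("GGAGGACCUUCCUAUGGCU"))
```

A construct that reports five or ten here instead of about seven would be a candidate for re-engineering the junction.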

The pUC series is a classic example of combining these features, using a regulatable lac promoter and its natural RBS.

The foreign gene is expressed as a fusion protein, often with the LacZ N-terminal fragment.

But for true industrial scale, the pET series is the gold standard.

It is designed for incredibly high expression, often achieving 40 to 50% of the total cellular protein.

What makes the pET system so much more powerful than a simple lac promoter?

It uses the powerful T7 promoter, which has one huge advantage.

It is not recognized by the host E. coli RNA polymerase.

Instead, the T7 RNA polymerase itself is supplied by a specialized host strain where its production is tightly controlled by a separate, regulatable promoter.

So the system is completely inert until the moment you want mass production.

You induce the host to start making T7 RNA polymerase and that enzyme then exclusively transcribes your target gene from the T7 promoter on the plasmid.

It's the ultimate off-switch for toxic genes.

And for absolute control, those host strains might also express T7 lysozyme, a natural inhibitor of T7 RNA polymerase, which keeps basal transcription exceptionally low until induction.

Finally, even with the strongest promoter and a perfect RBS, if the foreign gene comes from a distant organism, we run into the problem of codon usage.

Foreign genes, especially eukaryotic ones, may contain rare codons.

Codons that are rarely used by E. coli for specific amino acids like arginine.

Since the host has low levels of the corresponding transfer RNA or tRNA to match those rare codons, translation stalls.

The translational arrest leads to ribosomal pile-up and then rapid mRNA degradation.

The main solution is meticulous engineering.

We have to use site-directed mutagenesis to substitute those rare codons with preferred, common E. coli codons, optimizing the gene for the specific host machinery.
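
The codon swap itself is mechanical once the rare codons are known. This minimal sketch handles only the arginine codons AGA and AGG, which are rare in E. coli, and replaces them with the common arginine codon CGT; the table and function name are illustrative, and a real optimization would cover more codons and check for unwanted sequence features.

```python
# Rare E. coli arginine codons mapped to a commonly used synonymous codon.
RARE_TO_PREFERRED = {"AGA": "CGT", "AGG": "CGT"}

def replace_rare_codons(cds: str, table=RARE_TO_PREFERRED) -> str:
    """Swap rare codons for preferred synonyms, reading the sequence in frame."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(table.get(codon, codon) for codon in codons)

print(replace_rare_codons("ATGAGAAGGTAA"))
```

Because the sequence is split in frame, an AGA that merely spans two neighboring codons is left alone, so the encoded protein is unchanged.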

Our microbe factory is turning out the target protein at 40% of total cellular protein.

Now comes the challenge of getting it out and ensuring it's functional and pure.

The first tool we employ is often expressing the protein as a fusion protein.

Why?

Primarily for two reasons, protection and purification.

Short peptides expressed alone in the bacterial cytoplasm are like targets.

They're rapidly recognized and degraded by endogenous peptidases.

But fusing the peptide to a large, stable E. coli carrier protein protects it from degradation.

Exactly, and once it's protected, you need to cut the target protein free from the carrier.

That requires highly selective, site-specific cleavage.

For simple products, you might use chemical cleavage sites.

But for complex therapeutics, you have to use high-specificity proteases like Factor Xa, which recognize stringent amino acid sequences.

Guaranteeing the cleavage happens exactly at the fusion junction and nowhere else.

And the fusion partner also serves as a crucial purification aid, often called an affinity tag.

Exactly.

These tags dramatically simplify recovery via affinity chromatography.

Fusing the protein to protein A lets you recover it using immobilized IgG columns.

Fusing it to GST lets you recover it with immobilized glutathione.

And the most widely used system is the hexahistidine tag, or His-tag.

Which binds tightly to columns packed with immobilized nickel ions.

That single step can achieve nearly complete purity.

Now for the biggest headache in bacterial production, inclusion bodies.

When we express eukaryotic proteins in the E. coli cytoplasm, they often aggregate into these dense insoluble clumps.

What causes this catastrophic misfolding?

High expression levels lead to a massive localized concentration of nascent polypeptide chains.

This high concentration favors intermolecular interaction between the hydrophobic patches of the incompletely folded chains, causing aggregation.

And the E. coli environment itself doesn't help.

Three major factors hinder correct folding.

First, the E. coli cytosol is a highly reducing environment, which actively prevents the formation of stabilizing disulfide bonds that are necessary for most secreted eukaryotic proteins.

Second, the conditions, such as pH and ionic strength, differ drastically from the protein's native environment.

And third, and this is critical, E. coli often lacks the appropriate molecular chaperones and foldases that eukaryotic cells use to guide complex folding pathways.

So inclusion bodies are a failure of the system, yet you mentioned they offer a paradoxical advantage in purification.

They do.

They're dense and easily purified by simple centrifugation after you lyse the cells, almost completely bypassing traditional purification columns.

You just isolate the aggregate pellet and wash it.

But then you have to perform the difficult, expensive work of renaturation.

You're essentially unspooling a complex protein using harsh chemicals and then asking it to refold itself perfectly in a dilute bath.

That sounds less like molecular biology and more like high-stakes alchemy.

That's a perfect description.

Inclusion bodies are solubilized using strong chemical denaturants, like six molar urea.

The protein is then gradually renatured by slowly removing the denaturants.

And this step is extremely concentration-sensitive.

It has to be done at low protein concentrations to minimize new aggregation, which significantly increases costs and reduces the final yield.
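
The cost problem is easy to see with back-of-the-envelope arithmetic. The sketch below simply divides the mass of protein by the highest concentration at which refolding is tolerated; the example figures are hypothetical and not taken from any real process.

```python
def refolding_volume_liters(protein_grams: float, max_conc_g_per_l: float) -> float:
    """Volume of refolding buffer needed to keep the protein at or below a given concentration."""
    if max_conc_g_per_l <= 0:
        raise ValueError("concentration must be positive")
    return protein_grams / max_conc_g_per_l

# Hypothetical batch: 100 g of denatured protein refolded at 0.05 g per liter.
print(refolding_volume_liters(100, 0.05))
```

Under these assumed numbers the batch needs on the order of two thousand liters of buffer, and every liter has to be handled, held, and concentrated again afterward.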

That high cost of renaturation is often the commercial bottleneck.

It's why many large complex proteins like tissue plasminogen activator are produced in far more expensive animal cell cultures rather than cheap E. coli.

To prevent inclusion bodies in the first place, scientists try tricks like lowering the growth temperature or co-expressing the host's own chaperones.

But one of the most effective molecular approaches is using solubilizer fusions, linking the target gene to a highly soluble protein like E. coli thioredoxin, or maltose-binding protein, which often forces the entire fusion protein to remain soluble.

And if we could get the protein secreted, purification would become immensely simpler, right?

The bacteria could be grown in cheap protein-free media.

To be secreted, the protein has to be synthesized with a leader sequence at its N-terminus.

This sequence guides the protein to the secretory apparatus and is cleaved off by an enzyme called leader peptidase as the protein translocates across the membrane.

But E. coli is Gram-negative, which means export usually results in secretion into the periplasmic space between the inner and outer membranes, not the media itself.

That's the limitation.

However, the periplasm is a critical location because it is a more oxidizing environment than the cytosol.

It contains the enzymes required for disulfide bond formation, which is vital for the correct folding of most mammalian secreted proteins.

We saw the case study of IGF-1, insulin-like growth factor one production, which required extreme precision.

IGF-1, a small peptide, was produced using a secretion vector by fusing it to the leader sequence and IgG-binding domain of protein A.

This affinity handle was essential for purifying the product from the large volume of culture supernatant.

And after purification, they used a chemical agent to cleave the final product at a specific sequence they engineered right before the IGF-1 sequence.

So while the periplasm aids folding, getting robust secretion directly into the medium remains a fundamental challenge in E. coli.

Since bacteria struggle with the complexity of folding and these post-translational requirements, we turn to the eukaryotic kingdom, specifically yeast, often Saccharomyces cerevisiae.

And yeast offers several critical advantages that bacteria simply cannot replicate.

The primary advantage being the ability to perform post-translational modifications that are essential for the protein's function and stability in humans.

Yes, specifically glycosylation, the addition of oligosaccharide units.

Most secreted eukaryotic proteins must be glycosylated.

This helps guide correct folding.

It protects the protein from degradation by proteases and it's absolutely crucial for ensuring the protein has a proper circulation half-life in animals.

Without it, the protein is often prematurely destroyed.

Yeast also uses the same secretory pathway as human cells, the endoplasmic reticulum-Golgi system.

This is key.

The pathway involves a signal recognition particle and translocating the protein across the ER membrane.

The ER lumen is an oxidizing environment, strongly favoring the formation and isomerization of crucial disulfide bonds.

Plus, yeast is safer than E. coli.

It lacks the toxic lipopolysaccharide, the LPS endotoxin, and can still be grown to high density in inexpensive media.

But even in yeast, a eukaryotic host, we still use cDNA, not genomic DNA, for mammalian expression.

Why is that?

Because yeast has a minimal number of introns itself and it can't be relied upon to correctly splice complex mammalian introns.

So cDNA, which is already intron-free, remains the safest, most reliable template for expression.

Getting DNA into yeast is different from bacteria, where we had three primary modes.

What are the methods for yeast?

Transformation is the only practical means.

We can use enzymatic digestion to remove the rigid cell wall, creating fragile cells called spheroplasts, which are then incubated with DNA and polyethylene glycol, or PEG, to promote uptake.

Alternatively, you can treat intact cells with lithium ions, followed by PEG, though the mechanism is poorly understood.

As with bacteria, electroporation is also a highly effective method.

Because genetic manipulation is so much easier in E. coli, most yeast vectors are shuttle vectors.

They contain origins of replication and selection markers for both E. coli and yeast.

How do we select for the plasmid in yeast?

We use nutritional complementation.

We employ a host strain that is mutant for an essential biosynthetic gene, say, a leu2 mutant that can't make leucine.

The vector carries the wild-type LEU2 gene, and only cells that receive the plasmid can grow on media lacking leucine.

Let's categorize the five major vector types, focusing on that trade-off between stability and copy number.

First up, yeast integrative plasmids, YIps.

YIps lack a yeast origin of replication, an ARS, and are maintained only by integration into the yeast chromosome via homologous recombination.

And this integration is a rare event, which means low transformation frequency and a low copy number, typically one copy per cell, limiting expression.

But the advantage is stability.

They are stably inherited and maintained without the need for continuous selection pressure.

Okay, next we have yeast replicating plasmids, YRps, which contain an ARS from the yeast chromosome, allowing them to replicate on their own.

These yield a much higher transformation frequency, but they have a fatal flaw for production purposes.

They partition very poorly during yeast budding, right?

The division process is unequal.

They are rapidly lost unless you apply constant selection, making them unreliable for robust long-term expression.

The critical class for high expression, then, is the yeast episomal plasmids, YEps.

YEps are derived from the high copy number endogenous 2-micron plasmid found in S. cerevisiae.

They exist in high copy numbers, typically 30 to 50 copies per cell, sometimes reaching over 200.

This makes them the most suitable for high-level expression.

And while they still segregate poorly, the high copy number dramatically improves their stability compared to YRps.

We even saw a clever trick to force super high copy retention.

That was the leu2-d plasmid.

It contains a promoterless, defective LEU2 gene.

To produce enough of the LEU2 enzyme to complement the host, the plasmid is forced to maintain an extremely high copy number, upwards of 200 copies per cell, just to satisfy the nutritional demand.

The fourth type, yeast centromeric plasmids, YCps, offer the highest stability.

YCps are essentially YEps or YRps that have had a yeast centromeric sequence, a CEN, inserted.

This forces them to behave like normal chromosomes, ensuring faithful distribution to daughter cells, which provides high stability.

But the copy number is kept very low, usually one to three per cell.

Right, and finally, we have yeast artificial chromosomes, YACs.

These are linear plasmids containing the ARS, the CEN, and a telomere sequence at each end.

Which is necessary to prevent the linear DNA from shortening with each replication cycle.

Because they are linear, there's virtually no limit to the size of foreign DNA they can clone, often used for massive segments over 100 kilobases in genome sequencing.

They're a cloning tool, not typically an expression tool.
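
The five vector classes can be lined up side by side in a small data structure. This is just a study aid built from the figures mentioned in this discussion; the class name, field names, and the 10-copy threshold are choices made for this sketch, and copy numbers are left as None where no figure was given.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class YeastVector:
    name: str
    key_elements: str
    copies_per_cell: Optional[Tuple[int, int]]  # (low, high); None if not stated
    note: str

VECTORS = [
    YeastVector("YIp", "no ARS; integrates by homologous recombination", (1, 1),
                "stable without selection, low expression"),
    YeastVector("YRp", "chromosomal ARS", None,
                "rapidly lost without constant selection"),
    YeastVector("YEp", "2-micron plasmid origin", (30, 50),
                "most suitable for high-level expression"),
    YeastVector("YCp", "replication origin plus CEN", (1, 3),
                "segregates like a chromosome, very stable"),
    YeastVector("YAC", "linear; ARS, CEN, telomeres", None,
                "cloning tool for inserts over 100 kilobases"),
]

def high_copy_vectors(min_copies: int = 10):
    """Names of vector classes whose stated copy number meets the threshold."""
    return [v.name for v in VECTORS
            if v.copies_per_cell is not None and v.copies_per_cell[0] >= min_copies]

print(high_copy_vectors())
```

Filtering on copy number picks out the episomal class, which matches the point that YEps are the usual choice when expression level is the priority.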

To maximize protein yield, we have to insert the foreign gene behind a strong yeast promoter, as foreign promoters are unlikely to be recognized.

What makes yeast promoters fundamentally different from bacterial ones?

They're structurally much more complex.

While they have a TATA box, it's located much, much further upstream, 40 to 120 bases away, compared to the very tight 10-base-pair spacing in E. coli.

And they also require a far upstream activating sequence, UAS, which is analogous to an enhancer, located hundreds of bases away.

The TDH3 promoter is one of the preferred, strong constitutive promoters, because others, like ADH1, can be repressed at high cell density.

And for toxic proteins, we rely on tight regulation.

Promoters like GAL1 are induced by galactose and repressed by glucose, while CUP1 is induced by copper ions.

Hybrid promoters can even combine a regulatable UAS with a strong TATA box region to achieve high controlled expression.

Efficient transcription also requires proper termination and polyadenylation.

Exactly, efficient polyadenylation, adding that A-stretch, is essential for stable mRNA.

Because the termination process in yeast is less understood than in bacteria, researchers typically just insert a long native yeast terminator sequence downstream of the gene to ensure a stable, protected mRNA transcript.

And when it comes to translation initiation in the absence of a Shine-Dalgarno sequence, yeast relies on Kozak's rule.

Translation efficiency depends heavily on the sequence context surrounding the AUG initiation codon.

The consensus sequence is AXXAUGG.

Translation can be severely inhibited by G residues upstream of the AUG, or if the five prime untranslated region forms complex secondary structures.
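
That initiation context can be checked programmatically. The sketch below tests the two fixed positions of the AXXAUGG consensus (an A three bases before the AUG and a G right after it) and counts G residues in a short upstream window; the 10-base window and the function name `kozak_context` are assumptions made for this example.

```python
def kozak_context(mrna: str, aug_index: int, upstream_window: int = 10) -> dict:
    """Report how well the sequence around a given AUG fits the AXXAUGG consensus."""
    mrna = mrna.upper().replace("T", "U")
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else None
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else None
    upstream = mrna[max(0, aug_index - upstream_window):aug_index]
    return {
        "minus3_is_A": minus3 == "A",
        "plus4_is_G": plus4 == "G",
        "upstream_G_count": upstream.count("G"),
    }

print(kozak_context("CCACCAUGGCU", aug_index=5))
```

A leader that fails the position checks or shows a high upstream G count is a candidate for the kind of context tuning described here.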

The huge conceptual takeaway, though, is that foreign proteins often fold correctly in yeast, which is a sharp contrast with E. coli.

It's a striking difference.

Proteins that form inclusion bodies in E. coli, like hepatitis B core protein, often remain completely soluble in yeast, even when expressed at 20 to 40% of total protein.

This is largely due to the presence of those efficient eukaryotic molecular chaperones and foldases.

What about glycosylation?

Is the yeast version perfectly mammalian?

Not natively.

Yeast adds high-mannose-type oligosaccharides.

While that's helpful for folding and stability, these are not the complex-type glycosylation found in higher animals.

This difference can affect the protein's circulation time in the human body.

But recent genetic engineering efforts have successfully modified specialized yeast species like Pichia pastoris to produce complex human-like glycosylation.

Let's put all this into perspective with two commercial case studies that highlight the engineering choices involved.

First, the production of chymosin, or rennin, the enzyme critical for cheese making.

Which traditionally came from limited supplies in calf stomachs.

Researchers cloned the cDNA for prochymosin, the inactive precursor, from calf mRNA.

Initial trials in E. coli were high yield, up to 5% of total protein, but the product accumulated entirely as inclusion bodies.

And because chymosin is an agricultural product with relatively low profit margins compared to human therapeutics, the expensive, low-yield renaturation process just made E. coli production commercially prohibitive.

Secretion attempts also failed, as the protein folded too quickly in the cytosol to be efficiently exported.

So they moved to Saccharomyces cerevisiae.

And while initial trials still showed some aggregates, the use of secretion vectors fusing the gene to the invertase leader sequence resulted in correctly folded and glycosylated prochymosin being secreted through the ER-Golgi pathway.

But the final yield was still too low.

S. cerevisiae is just not a natural high-level secretor.

This forced the major breakthrough, moving to non-conventional yeast and fungi.

Using species like Kluyveromyces lactis and fusing the prochymosin to a powerful signal sequence, the yield increased 50 to 70 fold, reaching approximately one gram per liter of correctly processed secreted product.

The chymosin story is a perfect illustration that success requires matching the gene to the right host based on its native folding and secretion demands, even if that means moving past the traditional E. coli and S. cerevisiae workhorses.

Our second case study is the hepatitis B virus surface antigen, HBsAg, which was the component used in the first recombinant DNA vaccine licensed in the US.

This was a massive triumph for Saccharomyces cerevisiae.

HBsAg was cloned into high-copy YEps.

Remarkably, it folded correctly in the yeast cytoplasm, even without complex glycosylation, and spontaneously assembled into stable 22 nanometer particles.

Those are the empty viral envelopes needed for the vaccine.

And this correct folding crucially avoided the formation of inclusion bodies, which would have impaired host cell growth due to chaperone sequestration.

The subsequent optimization was a testament to meticulous engineering.

They learned, for instance, that attempts to force super high copy numbers using the leu2-d system failed because HBsAg expression was somewhat toxic.

The production strains were outcompeted by non -producing mutants.

It proved that high yield isn't always sustained yield.

They optimized the promoter, moving to the powerful, consistently expressed TDH3 promoter.

They found that adding a strong termination sequence dramatically improved mRNA stability and yield.

And they fine-tuned the translation initiation context, reducing inhibitory G residues upstream of the AUG codon to satisfy Kozak's rule for maximum translation.

But the biggest commercial jump wasn't molecular engineering at all.

No, it was fermentation conditions.

By switching from leu2 mutant strains to robust prototrophic strains, they massively increased the final cell density from about one gram per liter to 60 or 70 grams per liter.

This combined increase in biomass and expression rate made the vaccine commercially viable.

It's a powerful conclusion.

Success in biotechnology depends as much on the scale of the living factory as it does on the molecular blueprints you put inside it.

We started today contrasting the nightmare of rare contaminated animal source proteins with the promise of the living factory.

We charted the sophisticated molecular geography of gene transfer from basic transformation and conjugation to the high efficiency of transduction.

We examined the essential vector toolkit from pBR322 plasmids to large fragment BACs and powerful phage display systems.

We detailed how getting the gene in is only the start.

Maximization requires precisely engineered expression systems with strong regulated promoters and optimized ribosome binding sites, exemplified by the potent pET system.

Finally, we contrasted the bacterial struggle against misfolding, which often yields inclusion bodies requiring expensive renaturation, with the eukaryotic advantage of yeast, which provides the chaperones and secretion pathways necessary for correct folding and essential glycosylation, leading directly to commercial success stories like the HBsAg vaccine and secreted chymosin.

The ability to program life has redefined our pharmaceutical and industrial landscape.

But considering that even our highly optimized systems still sometimes fail due to misfolding and toxicity, the E. coli prochymosin failure being a prime case, what fundamental universal protein folding principles are still missing from our understanding that, if mastered, would allow us to express any complex human protein perfectly and efficiently in the simplest, cheapest host like E. coli?

That quest to truly master folding remains the ultimate frontier in microbial engineering.

A truly compelling thought for future research.

Thank you for diving deep with us.

Always a pleasure.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Bacterial and yeast cell factories represent the foundation of modern protein manufacturing, enabling the production of pure, cost-effective biopharmaceuticals that would be prohibitively expensive or impossible to extract from animal tissues. Escherichia coli dominates bacterial host selection due to its rapid proliferation rate and extensively characterized genetic architecture, making it ideal for high-volume recombinant protein synthesis. DNA introduction into bacterial cells occurs through multiple pathways: transformation employs chemical induction with calcium chloride or physical disruption via electroporation, conjugation transfers plasmids between bacterial cells through direct contact, and transduction leverages bacteriophage machinery to inject foreign DNA. The selection and design of cloning vectors directly impacts production efficiency and scale; general-purpose vectors like pBR322 and pUC series incorporate selectable antibiotic resistance markers alongside blue-white screening mechanisms based on alpha-complementation, while specialized vectors including bacteriophage lambda, cosmids, and Bacterial Artificial Chromosomes accommodate progressively larger DNA inserts for complex genomic applications. Amplification strategies such as shotgun cloning, reverse transcriptase-mediated cDNA synthesis for eliminating eukaryotic introns, and Polymerase Chain Reaction enable efficient target sequence isolation and multiplication. Expression optimization depends on multiple interconnected factors: strong, tightly regulated promoters including lac, trp, and T7 variants control transcription initiation, Shine-Dalgarno ribosome binding site sequences determine translation efficiency, and codon usage patterns must align with host tRNA availability. 
Production bottlenecks emerge when recombinant proteins aggregate into insoluble inclusion bodies; resolution strategies include fusion protein engineering using solubility tags like GST or protein A, recruitment of molecular chaperone systems for assisted refolding, and deployment of secretion vectors that direct proteins toward the periplasmic space for improved solubility and post-translational processing. Eukaryotic yeast systems, particularly Saccharomyces cerevisiae, overcome bacterial limitations by executing complex post-translational modifications including glycosylation and disulfide bond formation essential for many therapeutics. Yeast vector architectures span integrative plasmids for genomic insertion, replicating plasmids that carry an ARS but are readily lost without selection, episomal plasmids derived from the 2-micron plasmid for high-copy expression, centromeric plasmids enabling stable inheritance, and Yeast Artificial Chromosomes for megabase-scale DNA handling. Industrial applications demonstrate practical advantages: chymosin production for cheese manufacturing and Hepatitis B surface antigen vaccine synthesis showcase optimization of secretion signals like the alpha-mating factor, while alternative yeasts including Pichia pastoris and Kluyveromyces lactis achieve exceptionally high-yield secretory expression for commercial scale manufacturing.
