Chapter 9: DNA-Based Information Technologies: Cloning, Genomics, and the Human Genome
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome back to the Deep Dive.
We're the show that tries to, you know, cut through the noise and really get to the heart of complex subjects.
And today, we're definitely diving deep.
We are. Into the really incredible world of DNA-based information technologies.
It feels like this huge universe of constant discovery.
It really does.
And what we've learned about our biology, our blueprint, it feels like just the tip of the iceberg, really.
Exactly.
Despite everything we don't know, the tools we have developed are just accelerating things like crazy.
Oh, absolutely.
In ways you couldn't have imagined, you know, even a few decades back, giving us access to a cell's entire DNA library.
So today, our Deep Dive is focusing on those foundational tools.
We're looking mainly at Chapter 9 of Lehninger Principles of Biochemistry.
A really key chapter.
It lays out the techniques that, well, they basically transformed how we study biology.
And it seems to boil down to, what, three big ideas about genomic information?
Yeah, kind of three core principles.
First, the obvious but crucial one, an organism's DNA, its genome, that's the ultimate source, the blueprint for everything.
Billions of base pairs dictating structure, function, everything.
Right.
Second, maybe more surprisingly, is how accessible this information has become.
With sequencing and all the other tech advances.
Exactly.
We can read it.
We can interpret how that information is used, how it's expressed.
You know, the clues to what a protein does are right there in the gene sequence.
Which leads to the third point.
It's not just static information we're reading.
No, not at all.
It's incredibly malleable.
We can actually change it.
We're not just observers anymore.
We can go in and tweak things.
Alter metabolism, structure, pretty much anything, potentially.
Okay, but how do you even start with something so massive?
These DNA molecules are the largest things in the cell.
That's the challenge, right?
How do you study one tiny part of this enormous molecule?
Like finding one specific sentence in, I don't know, the Library of Congress?
A good analogy.
And the breakthrough technology that lets us do that is DNA cloning.
Which basically means making lots and lots of identical copies of just the specific bit you're interested in.
Precisely.
Making many identical copies of a specific gene segment or some other DNA fragment.
And there's a kind of standard procedure, right?
Like five steps.
Generally, yes.
A classic five-step process.
First, you've got to get your hands on that DNA segment.
Cut it out somehow.
Usually using restriction endonucleases.
Think of them as molecular scissors.
They cut DNA at very specific sequences.
Or you could use PCR amplification too, right?
Or PCR, yes.
A different approach to get many copies of your target sequence.
Step two is picking a cloning vector.
A carrier molecule.
Right.
A small DNA molecule that can replicate itself inside a host cell.
Plasmids are the classic example, those little circular DNAs in bacteria.
Okay, so you have your gene fragment.
You have your plasmid vector.
Step three.
You join them together.
This uses another enzyme.
DNA ligase.
It's like molecular glue.
Sticking the gene into the plasmid.
Forming those phosphodiester bonds.
And what you create is called recombinant DNA.
A hybrid molecule.
Made from two different sources.
Okay, step four.
Moving that recombinant DNA into a host organism.
Usually bacteria, like E. coli.
Why bacteria?
Well, they grow fast.
They have all the machinery needed to replicate the plasmid DNA, making millions of copies for you.
And finally, step five must be finding the bacteria that actually took up the recombinant DNA.
Exactly.
Selecting or identifying the right host cells.
Vectors usually have selectable markers built in.
Like antibiotic resistance.
That's the common one.
So only the cells that actually contain your plasmid will survive when you grow them on media with that antibiotic.
Let's back up to those restriction enzymes for a sec.
They sound crucial.
Oh, they are.
Discovered by Werner Arber.
Their natural job in bacteria is defense, chopping up foreign DNA, like from viruses.
Restricting the invader, hence the name.
Precisely.
And the bacterium protects its own DNA using methylation, adding methyl groups to its recognition sites so its own enzymes don't cut it.
Clever system.
And it's the type II enzymes that are most useful for cloning.
Yes, Hamilton Smith isolated the first ones.
They're great because they cut within their specific recognition sequence.
Very precise.
And they often create sticky ends.
Right.
Short single-stranded overhangs.
If you cut your gene fragment and your plasmid with the same enzyme, they'll have complementary sticky ends that can easily pair up.
Making the ligation step much more efficient than joining blunt ends.
Much more efficient.
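To make the cutting step concrete, here is a minimal Python sketch of a complete digest of a linear DNA sequence. It assumes EcoRI (recognition site GAATTC, cut after the first G), which is an illustrative choice since the discussion doesn't name a specific enzyme; the function name `digest` and the example sequence are likewise made up for this sketch. It only tracks the top strand.

```python
# Sketch: cut a linear DNA sequence at every EcoRI site (G^AATTC).
# EcoRI is an assumed example enzyme; the discussion does not name one.
SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts the top strand right after the first G


def digest(seq, site=SITE, cut_offset=CUT_OFFSET):
    """Return the top-strand fragments after a complete digest."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset          # position of the cut on the top strand
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)   # look for the next recognition site
    fragments.append(seq[start:])       # whatever is left after the last cut
    return fragments


if __name__ == "__main__":
    # Hypothetical insert flanked by two EcoRI sites.
    for fragment in digest("TTGAATTCCGATCGGAATTCAA"):
        print(fragment)
```

Every fragment that follows a cut begins with AATT on the top strand. That is the single-stranded overhang, and any other DNA cut with the same enzyme carries the complementary one, which is why the two pieces pair up so readily.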
Though you can ligate blunt ends too.
Sometimes researchers even add synthetic linkers.
Short DNA pieces with extra restriction sites.
Or use plasmids with a multiple cloning site.
Or MCS.
Giving you more options for cutting and pasting later on.
Exactly.
More flexibility.
This whole field really took off back in the early 70s.
People like Peter Lobban, Dale Kaiser, Paul Berg.
They made the very first recombinant DNA molecules.
It sounds pretty fundamental.
It is.
For you listening, understanding this basic cut-paste-copy mechanism, it really underpins almost everything in modern biotech.
Making insulin, GMOs, forensic analysis.
It all starts here.
Okay, so let's talk more about those carriers, the vectors.
You mentioned plasmids are the workhorses.
Definitely the most common.
Especially for smaller DNA pieces.
These circular DNAs replicate independently in bacteria.
Like pBR322.
Is that a classic example?
A very early and famous one.
Yeah, from 1977.
It has all the key features.
An origin of replication.
The ori site.
So the bacteria can copy it.
And the antibiotic resistance genes you mentioned for selection.
Ampicillin and tetracycline resistance, I think.
ampR and tetR.
That's right.
And importantly, specific restriction enzyme sites located within those resistance genes, which was clever for screening.
How do you actually get the plasmids into the bacteria?
They don't just soak them up, do they?
No, you need to make the bacterial cell walls temporarily permeable.
The standard method is transformation.
That's the calcium chloride and heat shock thing.
Exactly.
Or you can use electroporation.
A jolt of electricity punches temporary holes in the membrane.
But it's not super efficient, right?
Only some cells take up the plasmid.
Right.
Which is why those selectable markers, the resistance genes, are so vital.
You kill off all the bacteria that didn't get the plasmid.
Okay, but what if your DNA fragment is huge?
You said plasmids have limits?
They do.
Usually under, say, 15,000 base pairs, or 15 kilobases.
For bigger chunks, you need different vectors.
Like BACs, bacterial artificial chromosomes.
Right.
For inserts in the range of 100 to 300 kilobases.
So much larger.
How do they manage that stably?
Key thing is they maintain a very low copy number.
Usually just one or two copies per cell, like the bacterial chromosome itself.
Why is low copy number good?
It reduces the chances of unwanted recombination events messing up your large DNA insert.
They also have special par genes to make sure they get distributed properly when the cell divides.
And for even bigger pieces, like megabases?
Then you move to YACs, yeast artificial chromosomes.
For cloning in yeast, Saccharomyces cerevisiae.
Baker's yeast.
The very same.
It's a eukaryote, so it handles large DNA structures well.
YACs can hold up to maybe two million base pairs, or two megabases.
Wow.
And they must need eukaryotic parts.
They do.
They include essentials like telomeres for the chromosome ends and a centromere for proper segregation during cell division.
Some vectors, by the way, can work in multiple species.
They're called shuttle vectors.
So it's like having different size delivery trucks for different size cargo.
That's a great way to put it.
Plasmids for small packages.
BACs for medium ones.
YACs for the really massive freight.
Indispensable for mapping whole genomes.
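The delivery-truck idea can be written down as a tiny lookup. This is only a sketch using the size ranges quoted in this conversation (plasmids under about 15 kilobases, BACs for roughly 100 to 300 kilobases, YACs up to about 2 megabases). The function name `suggest_vector` is hypothetical, treating 300 kilobases as the point where YACs take over is an assumption, and sizes that fall between the quoted ranges are reported as unspecified rather than guessed.

```python
# Sketch: pick a vector class from insert size, using only the ranges
# quoted in the discussion. Gaps between those ranges are left unspecified.
UNSPECIFIED = "not specified in this discussion"


def suggest_vector(insert_bp):
    """Return the vector class quoted for an insert of insert_bp base pairs."""
    if insert_bp <= 0:
        raise ValueError("insert size must be a positive number of base pairs")
    if insert_bp < 15_000:
        return "plasmid"
    if 100_000 <= insert_bp <= 300_000:
        return "BAC"
    # Assumption: anything above the BAC range, up to about 2 Mb, goes to a YAC.
    if 300_000 < insert_bp <= 2_000_000:
        return "YAC"
    return UNSPECIFIED


if __name__ == "__main__":
    for size in (5_000, 50_000, 150_000, 1_000_000):
        print(size, "bp:", suggest_vector(size))
```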
Okay, so we've cloned our gene.
We have lots of copies.
But often, the real goal isn't just the DNA, it's the protein it makes, right?
Absolutely.
For many researchers, cloning is just the first step towards producing and studying the protein product.
So how do you get the host cell, like our E. coli, to actually make that protein, and hopefully lots of it?
For that, you need a specialized vector called an expression vector.
It's still a cloning vector, but it has extra sequences designed to trick the host cell into overproducing your specific protein.
Like promoter sequences?
Strong promoters, yes.
To tell the cell's RNA polymerase, start transcribing here and do a lot of it.
It also needs ribosome binding sites for efficient translation into protein.
And if you're putting, say, a human gene into bacteria, you need bacterial signals, not human ones.
Exactly.
You have to engineer the vector with the right bacterial control sequences.
When it works well, the protein you want can become a huge fraction of the cell's total protein, maybe 10% or even more.
But choosing the right host system sounds important.
You mentioned E. coli.
E. coli is the go-to for many reasons.
It's cheap, grows incredibly fast, and we understand its genetics really well.
There's always a but.
There is.
Sometimes eukaryotic proteins just don't fold correctly in bacteria.
They can misfold and clump together into insoluble blobs called inclusion bodies.
Useless goo.
Pretty much.
And bacteria can't do many of the post-translational modifications that eukaryotic proteins often need to function properly, like adding sugars or phosphates.
Are there workarounds?
Yeah, researchers have engineered strains with extra helper proteins, chaperones, to aid folding.
Or they use inducible promoters, like from the lac operon or bacteriophage T7, so you only switch on protein production when the cells are ready.
What about other hosts?
Yeast?
Yeast, S. cerevisiae, is often a better bet for eukaryotic proteins.
Being a eukaryote itself, it can perform some of those modifications and folding steps more accurately.
And insect cells.
I've heard of those being used.
Insect cells, using baculovirus vectors, can be amazing.
They can produce huge amounts of protein, often with very good eukaryotic-like modifications.
There are even bacmids now, vectors that combine parts of the baculovirus and plasmids, making them easier to handle in E. coli first.
And mammalian cells.
They're the best for getting modifications absolutely right, especially for therapeutic proteins.
But they're slow and expensive to grow, so often used more for testing function than for mass production.
Beyond just making the protein, can we use these techniques to change it, to study how structure relates to function?
Definitely.
That's the power of site-directed mutagenesis.
You can make very precise changes in the DNA sequence to alter a single amino acid in the final protein.
How's that done?
One way is to synthesize a short DNA fragment with a desired mutation and swap it into the gene.
Another common method uses PCR with primers that contain the mismatch, creating the change during amplification.
Like changing one specific amino acid residue.
Exactly.
The example in the book is changing a lysine to an arginine in the RecA protein.
That tiny change stops it from hydrolyzing ATP, which tells you a lot about how it works.
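At the DNA level, a lysine-to-arginine swap like that can be a single-base change. The sketch below is a minimal illustration using the standard genetic code; the 15-base "gene", the codon position, and the function names `translate` and `mutate_codon` are all invented for the example and are not taken from the book's actual sequence.

```python
# Sketch: swap one codon and translate before and after.
# The sequence is a made-up toy gene, not a real protein-coding sequence.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order (one-letter codes, * = stop).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence, reading complete codons from position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mutate_codon(dna, codon_index, new_codon):
    """Replace the codon at 0-based codon_index with new_codon."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three bases")
    start = 3 * codon_index
    return dna[:start] + new_codon.upper() + dna[start + 3:]


if __name__ == "__main__":
    gene = "ATGGGTAAAGAATAA"                 # toy gene with a lysine codon (AAA)
    mutant = mutate_codon(gene, 2, "AGA")    # AAA (Lys) becomes AGA (Arg)
    print(translate(gene))
    print(translate(mutant))
```

In practice that changed codon would be carried on a synthetic fragment or a mismatched PCR primer, as described above; the code only shows what the edit does to the protein sequence.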
And you can make bigger changes too.
Delete whole sections.
Absolutely.
Delete entire domains, or even stitch parts of different genes together to create novel fusion proteins.
Speaking of fusion proteins, what about adding tags for purification?
Ah, yes.
Terminal tags.
A very common strategy.
You genetically fuse a short peptide or even a whole protein tag onto your target protein.
Like the GST tag?
Glutathione S-transferase, right.
GST binds very tightly and specifically to glutathione.
So you make your fusion protein in cells, break them open, and run the extract over a column matrix that has glutathione stuck to it.
And only the GST-tagged protein sticks?
Pretty much.
Everything else washes through.
Then you change the conditions to release your pure tagged protein.
Often you can even engineer a cleavage site to remove the tag afterwards.
Is the His tag similar?
The histidine tag, yes.
Just a short run of histidine residues, usually six.
It binds strongly to metal ions like nickel.
So you use a nickel column for purification.
Very simple.
Very popular.
It's super convenient.
Any downsides?
The main one is that the tag itself, even if small, might affect how the protein folds or functions.
So you always need controls to check that.
But it's an incredibly powerful tool.
And PCR keeps popping up.
Does it have other roles beyond just getting the initial DNA?
Oh yeah.
RT-PCR, reverse transcriptase PCR, lets you amplify RNA sequences.
You first use the enzyme reverse transcriptase to make a DNA copy of the RNA called cDNA.
And then you amplify the cDNA with standard PCR.
Why would you want to amplify RNA?
To see which genes are actually being expressed, transcribed into RNA, in a cell at a particular time.
Or even to tell if cells in a sample are alive, making RNA, or dead.
And quantitative PCR?
qPCR.
Or real-time PCR.
That lets you estimate how much of a specific DNA or RNA sequence is present.
It uses fluorescent probes.
And the machine measures how many PCR cycles it takes for the fluorescent signal to cross a certain threshold.
So more starting material means a faster signal.
Exactly.
Faster signal, meaning a lower cycle threshold, or Ct value, means more initial copies.
Great for comparing gene expression levels, say, between normal cells and tumor cells.
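The arithmetic behind that comparison is short enough to sketch. Assuming the target doubles perfectly every cycle (an idealization; real reactions are somewhat less efficient), a sample that crosses the threshold n cycles earlier started with about 2 to the power n times more template. The function name `fold_difference` and the Ct values are hypothetical, and the sketch skips the normalization to a reference gene that real experiments typically need.

```python
# Sketch: relative starting amount from two Ct values, assuming the
# amount of product exactly doubles each cycle (100% efficiency).
def fold_difference(ct_sample, ct_reference):
    """How many times more template the sample had than the reference."""
    return 2.0 ** (ct_reference - ct_sample)


if __name__ == "__main__":
    # Hypothetical Ct values for the same gene in two samples.
    tumor_ct = 22.0
    normal_ct = 25.0
    print(fold_difference(tumor_ct, normal_ct))
```

Crossing the threshold three cycles earlier corresponds to three doublings, so roughly eight times more starting material under the doubling assumption.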
One last thing in this section.
DNA libraries.
Right.
Basically a collection of DNA clones.
You could have a genomic library representing the entire genome, or a cDNA library.
Made from mRNA using that RT step.
Correct.
So a cDNA library only represents the genes that were being actively expressed as mRNA in the cells you started with.
Gives you a snapshot of the transcriptome.
And combinatorial libraries.
Those are cool.
You intentionally create lots of variants of a single gene, maybe randomizing certain amino acids, and then screen that library for proteins with improved or altered functions.
Like evolving a better enzyme in the lab.
It feels like the level of control is just staggering.
It really is.
We can dissect function with incredible precision now.
Which brings us neatly to how we actually figure out what these proteins do.
Unpacking function.
You mentioned different levels.
Right.
We can think about function on, say, three levels.
Phenotypic function.
What's the effect on the whole organism?
Does deleting the gene make it grow slower, change its shape?
Then cellular function.
That's about the protein's role within the cell.
Its network of interactions, the pathways it's involved in.
And finally, molecular function.
The nitty-gritty.
What biochemical activity does it perform?
Does it bind DNA, catalyze a specific reaction, bind a particular molecule?
To get at all this, we need to map out what's actually in the cell, right?
The transcriptome and proteome.
Exactly.
The transcriptome is the full set of RNAs being made at a given moment.
The proteome is the full set of proteins.
Studying them gives us transcriptomics and proteomics.
And often, the sequence itself gives us clues.
Comparative genomics is huge here.
We use tools like BLAST to compare a new gene sequence to known sequences from other species.
Looking for similarities.
Right.
If your unknown human protein looks a lot like a known enzyme in yeast, chances are they do similar things.
We call these related genes across species orthologs.
As opposed to paralogs.
Which are related genes that arose from duplication within the same species.
Both give clues, but orthologs often point more directly to conserved function.
Sometimes even the order of genes on a chromosome, synteny, is conserved and gives clues.
And specific patterns in the amino acid sequence.
Motifs.
Yes, certain short sequences are known signatures for say, binding ATP or binding DNA or crossing a membrane.
Structural motifs.
But function also depends on when and where a protein is active.
Absolutely.
RNA-seq is a powerful tool here.
You sequence all the mRNA in a cell sample.
Tells you which genes are on and how strongly under specific conditions.
Even in single cells now.
scRNA-seq.
Yes.
Which is incredible for looking at complex tissues like the brain.
Or identifying rare cell types in tumors.
Huge diagnostic potential.
And mass spectrometry.
Does that look directly at proteins?
It does.
Mass spec can identify and quantify thousands of proteins in a sample.
It can even tell you about modifications like phosphorylation, which often regulate protein activity.
It complements the RNA data.
What about seeing where a protein is inside a cell?
The star player there is GFP, green fluorescent protein, from jellyfish.
The glowing one?
That's the one.
It glows green all by itself when you shine blue light on it.
You can genetically fuse the GFP gene to your gene of interest.
The cell makes a fusion protein that lights up wherever your protein normally goes.
In living cells.
In living cells.
In real time.
Roger Tsien won a Nobel for developing variants in a whole rainbow of colors.
It revolutionized cell biology.
What if GFP doesn't work for some reason?
Then you can use immunofluorescence.
This uses antibodies.
You get a primary antibody that sticks specifically to your protein.
Then you add a secondary antibody that sticks to the first one.
And that secondary antibody has a fluorescent tag.
So you paint the protein's location with fluorescence.
Requires fixing the cells though?
Usually yes.
Less dynamic than GFP, but still very powerful for localization.
Okay, location is one thing.
What about interactions?
Who does a protein work with?
Crucial question.
One way is immunoprecipitation, or IP.
If you have an antibody to your protein, maybe using one of those tags we discussed, you can use it to pull your protein out of a cell extract.
And anything stuck to it comes along for the ride.
Exactly.
You then use mass spec to identify the proteins that co-precipitated, the potential interaction partners.
But you might pull down things that aren't specific.
True.
That's why methods like TAP tags, tandem affinity purification, were developed.
Uses two different tags and two purification steps in sequence.
Much cleaner, fewer false positives.
And there's a clever genetic method too.
Yeast two-hybrid.
Ah yes, the yeast two-hybrid system.
Really ingenious.
It relies on a transcription factor called Gal4 that has two essential parts.
A DNA binding domain, BD, and an activation domain, AD.
Which need to come together to turn on a gene.
Right, so you fuse your protein X to the Gal4-BD and a potential partner protein Y to the Gal4-AD.
You put both fusions into yeast cells.
And if X and Y interact?
They bring the BD and AD together.
Gal4 becomes functional and turns on a reporter gene, something easy to detect, like making the yeast cell turn blue or grow on special media.
So you can screen huge libraries of potential partners this way.
Precisely.
A powerful way to map out protein interaction networks.
Okay, one more huge area.
Actually changing or deleting genes to see what happens.
That used to be really hard, right?
Very hard.
Especially in complex organisms.
Then along came CRISPR/Cas.
The gene editing revolution.
Absolutely.
It's adapted from a bacterial immune system.
CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats.
Bit of a mouthful.
How does the editing part work?
The Cas9 system?
The most common one, developed by Doudna and Charpentier, uses the Cas9 protein, which is like programmable DNA scissors, and a single guide RNA, or sgRNA.
And the guide RNA tells Cas9 where to cut.
Exactly.
The sgRNA has a sequence that matches the target DNA site in the genome.
It guides Cas9 there, and Cas9 makes a double-strand break in the DNA.
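As a rough illustration of how a target site might be picked, here is a small Python sketch that scans a sequence for candidate Cas9 sites. It relies on one detail not covered in this conversation: the commonly used Cas9 from Streptococcus pyogenes only cuts where the 20-base target is immediately followed by an NGG motif (the PAM). The function name `find_cas9_targets` and the test sequence are hypothetical, and the sketch only scans the strand it is given, not the reverse complement.

```python
import re


# Sketch: list 20-nt candidate target sites that are followed by an NGG PAM.
# The NGG requirement is an added assumption (S. pyogenes Cas9); it is not
# mentioned in the discussion. Only the given strand is scanned.
def find_cas9_targets(seq, guide_len=20):
    """Return (start, target) pairs for sites followed by NGG on this strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(match.start(), match.group(1)) for match in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence: a 20-nt site followed by TGG, with short flanks.
    example = "TT" + "GATTACAGATTACAGATTAC" + "TGG" + "AA"
    for start, target in find_cas9_targets(example):
        print(start, target)
```

The guide RNA would then carry the RNA version of the reported 20-base sequence. Real guide design also weighs off-target matches elsewhere in the genome, which this sketch ignores.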
And the cell tries to repair that break.
It does, but the common repair pathway is often sloppy, making small insertions or deletions.
This usually messes up the gene, effectively inactivating it.
Knocking it out.
Or you can provide a template for more precise changes.
Yes.
You can provide a DNA template with a desired change, and a different repair pathway can use that to make a specific edit.
Or engineer Cas9 to just make a single-strand nick to encourage specific mutations.
The implications seem enormous.
Gene drives for pest control.
That's a big one being explored.
Like the X shredder idea in mosquitoes.
Put Cas9 and a guide RNA targeting the X chromosome onto the Y chromosome.
So in sperm development.
It chews up the X chromosome.
All offspring inheriting the Y chromosome are male.
Over time, the population crashes because there are no females.
Wow.
Powerful, but also raises some serious ethical questions about releasing something like that.
Huge ethical questions.
But the potential for targeted gene modification is undeniable.
It's also used now for high-throughput screens.
You can create libraries of guide RNAs targeting every gene in the genome.
Introduce them into cells along with Cas9.
And you can systematically knock out or even activate genes on a massive scale to discover their functions.
Often using barcodes in the guide RNAs to track which gene was hit in which cell.
It feels like we have this incredible toolkit now to take biology apart and see how it works piece by piece.
It's an unprecedented level of insight and control.
So let's turn this toolkit towards ourselves.
The human genome.
What has sequencing our own blueprint revealed?
Well, since the first drafts around 2001, we've sequenced tens of thousands of human genomes.
And one of the first surprises was the gene count.
Only about 20,000 protein-coding genes.
Fewer than rice.
Yeah.
Roughly similar to worms and flies.
Seems low for our complexity, right?
That's definitely counterintuitive.
So where does the complexity come from?
A lot of it comes from how we use those genes.
Our genes are split into coding parts, exons, and non-coding spacers.
Introns.
And alternative splicing.
That's key.
We can splice those exons together in different combinations from a single gene to make multiple different protein versions.
Greatly expands our protein repertoire from a limited gene set.
And most of our DNA doesn't even code for proteins, right?
Dark matter.
Less than 1.5% codes for protein.
A huge chunk, almost half, is made of transposons.
Molecular parasites.
Jumping genes.
Essentially, yeah.
DNA sequences that can copy themselves and move around the genome.
They've played a massive role in shaping our evolution, actually.
DNA transposons.
Retrotransposons.
But it's not all junk.
The ENCODE project showed.
Right.
ENCODE suggested over 80% of our DNA is functional in some way.
Involved in regulating genes, making functional RNA molecules we're only just discovering.
It's much more active than we thought.
We also have those repetitive sequences.
SSRs and STRs.
Simple sequence repeats.
Short tandem repeats.
Highly repetitive.
Found in places like centromeres and telomeres.
And the variation in STRs between people is the basis for forensic DNA fingerprinting.
How does all this variation, like SNPs, relate to human evolution?
SNPs, single nucleotide polymorphisms, are those single base differences between individuals.
Occurring maybe every 1,000 base pairs or so.
They're the main source of our genetic diversity.
And haplotypes, groups of SNPs inherited together?
Exactly.
Because they're inherited in blocks, they act as markers.
We can track haplotypes, especially on the Y chromosome and in mitochondrial DNA, which don't recombine much, to trace human migrations and population history.
Comparing our genome to chimps, our closest relatives, it's only about a 1% difference in sequence.
About 1.2% in base pairs, yeah.
We diverged maybe 7 million years ago.
But if you include bigger rearrangements, insertions, deletions, inversions, the overall difference is more like 4%.
Like human chromosome 2 being a fusion of two ape chromosomes.
That's a classic example.
You can see where they joined.
Comparing ourselves to chimps, and then using a more distant relative, like the orangutan, as an outgroup, helps pinpoint changes that happened specifically on the human lineage after we split from chimps.
Are researchers looking for specific genes that make us human?
They are.
Looking for genes showing accelerated evolution in humans.
Or genes linked to uniquely human traits, like complex language or cognition.
Sometimes by studying cognitive disorders.
Some intriguing candidates have emerged, like RNA genes involved in brain development, such as HAR1F.
And what about disease?
Genomics must be huge there.
Immense.
Over 6,000 genetic diseases have now been mapped to specific genes.
Linkage analysis was a key early technique.
Following how a disease travels through families along with specific genetic markers?
Exactly.
If a certain marker is always inherited by family members who have the disease, the disease gene must be located nearby on the chromosome.
That's how they found the presenilin 1 (PS1) gene for early-onset Alzheimer's on chromosome 14.
And now with databases of gene function, interactions, SNP locations, it must speed things up.
Massively.
You might narrow down a region by linkage, then use databases to see which known genes in that region are plausible candidates based on their function.
Much more targeted.
So genomics traces our past migrations.
Out of Africa?
Across the globe, yeah.
We can reconstruct population movements over tens of thousands of years.
And it's even revealing interactions with ancient relatives, Neanderthals.
Absolutely fascinating stuff.
We have high quality Neanderthal genomes now, sequenced from ancient bone fragments.
And people outside Africa have Neanderthal DNA?
Up to maybe 5% in Europeans and Asians.
Seems we interbred after leaving Africa.
Some of those Neanderthal genes might influence things like our immune system or skin characteristics.
And there were Denisovans too.
Another ancient group?
Found from remains in Siberia.
People in Melanesia and Australia carry significant Denisovan ancestry.
Up to 6%.
Sequencing this ancient DNA, adapting forensic techniques, it's like opening a lost history book.
So wrapping this all up, what's the takeaway for us today and tomorrow?
The promise is huge.
Personal genomics for medicine is becoming reality as costs plummet.
Gene therapy, while still complex, is advancing with better delivery methods.
It feels like these technologies are just going to keep shaping our future.
Without a doubt.
Understanding and manipulating our own genome and the genomes of other organisms.
It's hard to think of a scientific field that will have a bigger impact on our species going forward.
So that brings our deep dive on DNA -based information technologies to a close.
What an incredible journey from the basics of cutting and pasting DNA.
Through the vectors, the expression systems, altering proteins.
To mapping protein functions, interactions, and finally, using genomics to read our own history and maybe even shape our future.
It really highlights how we've moved from just observing life to, well, actively understanding it, manipulating it, maybe even redefining it in some ways.
So the thought to leave you with is, as these tools get even more powerful and accessible, what completely new insights are just around the corner?
About life, about disease, about what makes us us.
The revolution definitely isn't over yet.
Thank you so much for joining us on this deep dive.
We really appreciate you being part of the Last Minute Lecture family.
This audio and summary are simplified educational interpretations and are not a substitute for the original text.