Chapter 3: Exploring Proteins & Proteomes
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Okay, let's unpack this.
Our mission today is a deep dive into, well, the very foundation of molecular biology,
proteins.
They are the workhorses, you know, the catalysts, the signalers, the structural supports.
But the real question is, how do we as biochemists actually study them?
We're looking at a huge stack of sources here and they're all about the essential toolkit, the actual methods that let us go from just a blob of tissue to a complete atomic level understanding of a single molecule's function.
That's right.
And this dive really focuses on the crucial chemical and physical methods that you need to isolate, to identify, to sequence, and then finally to determine the three-dimensional structures of these, well, tens of thousands of distinct molecules that define life.
So we're moving beyond just saying proteins are important.
Exactly.
We are exploring the analytical techniques that let us define their function and structure in atomic detail.
And the sources, they open with a really critical concept, a distinction you hear a lot, but it's so worth emphasizing.
The difference between the genome and the proteome.
We're all obsessed with the genome, but why is the proteome the truly complex dynamic story we need to crack?
Well, the genome is the hardwired static blueprint.
In humans, it's about 23,000 genes, and fundamentally every single cell in your body has the same genome.
Okay, the instruction manual.
The instruction manual.
But the proteome is the dynamic functional expression of that information.
It includes not just the inventory of proteins present, but their functions, their interactions, and this is critical, their chemical modifications.
So if the genome is the script, the proteome is the play currently being performed on stage.
Precisely.
And that performance is constantly changing.
And that's where the complexity just explodes.
It really does.
The proteome is not a fixed characteristic.
Because it represents the functional state of the cell, it varies constantly with cell type.
I mean, a liver cell's proteome is totally different from a nerve cell's.
It changes throughout development and it reacts immediately to the environment, like if hormones are present or nutrients change.
And on top of that, nearly all gene products are proteins, and they can be modified in hundreds of ways after translation.
Wow.
So the proteome is just vastly larger and more complicated than the genome could ever suggest.
And understanding this dynamic reality requires high-resolution techniques.
So our exploration has to begin with the most fundamental and honestly often the most tedious challenge, getting a single protein completely alone.
Right.
The classic saying, never waste pure thoughts on an impure protein.
So before we can determine function or structure, we have to isolate our target.
Purity is everything.
But if you start with this crude soup of thousands of proteins, how do you even know if you're isolating the right one as you go through some complex purification scheme?
That's where the assay comes in.
It's a specific quantitative test for a unique identifying property of the protein.
And the goal is to find an assay that's as specific, sensitive, and as rapid as possible because you're going to have to perform it hundreds of times throughout the purification.
A positive result tells you, okay, the protein is here and it's active.
So for enzymes, which are protein catalysts, the assay would just measure its activity, right?
Its ability to do its job.
Exactly.
It measures the ability to promote a specific chemical reaction.
Let's make that concrete.
Can you walk us through the actual steps of an assay for a specific enzyme, something like lactate dehydrogenase?
Absolutely.
So lactate dehydrogenase catalyzes the conversion of lactate to pyruvate.
But here's the trick.
The key to the assay isn't measuring the lactate or the pyruvate directly.
That's hard.
Okay.
So what do you measure?
Instead, we measure the change in the associated coenzyme.
In this reaction, a molecule called NAD plus is reduced to NADH.
And the chemical property we exploit is super simple.
NAD plus does not absorb light at a wavelength of 340 nanometers, but NADH strongly does.
Uh-huh.
That is a perfect biochemical hack.
So you don't measure the main event, you measure the easily detectable side reaction that happens at the same time.
Precisely.
The assay is literally just measuring the increase in light absorbance at 340 nanometers over a short period of time, say one minute after you add the enzyme.
And the rate of that increase tells you how much active enzyme is there.
It's directly proportional.
This gives us our measure of what we call total activity in standardized units.
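The arithmetic behind that assay can be sketched in a few lines. This is a minimal illustration, not a lab protocol: it assumes the Beer-Lambert law, a 1 cm cuvette, and the commonly cited molar absorptivity of NADH at 340 nanometers (6220 per molar per centimeter). The function name and the example numbers are hypothetical.

```python
# Commonly cited molar absorptivity of NADH at 340 nm, in M^-1 cm^-1 (assumed here).
NADH_EXTINCTION_M_CM = 6220.0


def ldh_activity_umol_per_min(delta_a340_per_min, assay_volume_ml, path_length_cm=1.0):
    """Convert the rate of absorbance increase at 340 nm into micromoles of NADH formed per minute."""
    # Beer-Lambert: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l), in mol/L per minute.
    molar_rate = delta_a340_per_min / (NADH_EXTINCTION_M_CM * path_length_cm)
    # Multiply by the assay volume in liters to get mol/min, then convert to micromoles.
    return molar_rate * (assay_volume_ml / 1000.0) * 1e6


# Hypothetical reading: absorbance rises by 0.622 per minute in a 1 mL assay.
activity = ldh_activity_umol_per_min(0.622, assay_volume_ml=1.0)
print(f"{activity:.3f} micromoles of NADH per minute")
```

One micromole of product per minute is the usual definition of one enzyme unit, so the number printed here can be read directly as units of activity in the cuvette.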
And once we have that activity, we need to know how effective our separation is, you know, compared to all the junk we've left behind.
And that's where specific activity comes in.
Correct.
We quantify the efficiency using specific activity, which is just the ratio of that total enzyme activity to the total amount of protein in the mixture.
So it's a measure of purity.
It is.
As we purify our target protein, its concentration relative to all the other contaminating proteins should go up dramatically.
So the ultimate goal of purification is just to maximize this ratio until it hits a constant value.
And when it stops increasing.
That's when you know you've likely achieved homogeneity, a sample containing only one type of molecule.
Okay.
So we have a way to measure our progress.
Now we have to actually break open the cell and start separating things.
You start with a crude homogenate.
That's right.
And the first sort of broad stroke separation is differential centrifugation.
This technique just exploits differences in the density and sedimentation rate of all the cellular components.
So walk us through that.
Let's imagine we've just ground up some liver tissue.
Okay.
So we start with a pretty low centrifugal force, maybe 500 times the force of gravity, 500 × g, for about 10 minutes.
This gives you a dense little pellet at the bottom, which is usually the heaviest stuff like nuclei and cytoskeletal bits.
And you check the liquid on top, the supernatant.
Right.
If our target protein is still in that liquid, we crank up the force.
And that pulls down smaller, less dense material.
Exactly.
We increase the force, maybe to 10,000 × g for 20 minutes.
Now the pellet has the next densest layer, so mitochondria and lysosomes.
And you just keep going.
You keep going.
A really big increase, maybe up to 100,000 × g for an hour, pellets the microsomal fraction, which is just bits of fragmented ER and Golgi.
And what's left in the final supernatant is the soluble proteins, the cytoplasm.
And at each step, you're running your assay.
You have to.
You assay each fraction, find the one that's enriched with your target, and you take that fraction and move on to the more discriminating techniques.
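The spin-then-assay loop described above can be sketched as data plus one selection step. The three spins are the ones quoted in the discussion. The assay numbers and the function name are made up for illustration, and real fractions overlap far more than this tidy picture suggests.

```python
# (relative centrifugal force in x g, minutes, what ends up in the pellet), as quoted above.
SPINS = [
    (500, 10, "nuclear pellet"),
    (10_000, 20, "mitochondrial pellet"),
    (100_000, 60, "microsomal pellet"),
]


def most_enriched_fraction(assay_results):
    """Return the fraction with the highest specific activity (activity units per mg protein).

    assay_results maps a fraction name to (total_activity_units, total_protein_mg).
    """
    return max(
        assay_results,
        key=lambda name: assay_results[name][0] / assay_results[name][1],
    )


# Hypothetical assay results for each pellet plus the final supernatant.
results = {
    "nuclear pellet": (50.0, 400.0),
    "mitochondrial pellet": (900.0, 300.0),
    "microsomal pellet": (100.0, 200.0),
    "final supernatant": (150.0, 1500.0),
}
print(most_enriched_fraction(results))
```

Ranking by specific activity rather than by total activity is a deliberate choice here: the fraction you carry forward should be the one where the target makes up the largest share of the protein.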
Okay.
So once we have our enriched fraction, the real work begins.
And we start using more discriminating methods based on four characteristics.
Solubility, size, charge, and binding affinity.
Let's start with solubility control or salting out.
Right.
Salting out exploits the fact that proteins are less soluble at very high salt concentrations.
But to really get the mechanism, you have to remember how proteins interact with water.
They're normally surrounded by this hydration shell of water molecules that keeps them soluble.
And when we add a bunch of salt, like ammonium sulfate, what happens to that water shell?
Well, when you add high concentrations of salt, the salt ions basically start competing with the protein for that water.
The salt ions start stripping away the water molecules from the protein's stabilizing hydration shell.
Okay.
This exposes the hydrophobic sort of oily patches on the protein surface, which then aggregate with the exposed hydrophobic patches of their neighbors.
And the whole complex just crashes out of solution.
It precipitates.
So it's driven by hydrophobic interactions when the water shield is gone?
The amount of salt needed is different for every protein.
Absolutely.
Something like fibrinogen precipitates at only 0.8 M ammonium sulfate, while serum albumin needs 2.4 M.
This gives you a really nice, clean initial fractionation step.
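A toy model of a single salt cut, using the two precipitation points just quoted. It assumes each protein precipitates sharply at its threshold, which is a simplification; real precipitation is gradual and depends on protein concentration, pH, and temperature. The function name is hypothetical.

```python
# Ammonium sulfate concentrations (molar) at which each protein precipitates, as quoted above.
PRECIPITATION_POINT_M = {"fibrinogen": 0.8, "serum albumin": 2.4}


def salt_cut(concentration_m, thresholds=PRECIPITATION_POINT_M):
    """Split proteins into (pellet, supernatant) at a given ammonium sulfate concentration."""
    pellet = sorted(name for name, point in thresholds.items() if point <= concentration_m)
    supernatant = sorted(name for name, point in thresholds.items() if point > concentration_m)
    return pellet, supernatant


# At 1.0 M, fibrinogen has precipitated while serum albumin is still in solution.
print(salt_cut(1.0))
```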
Okay.
So after salting out, we have our protein, but it's mixed with a ton of salt.
We need to get rid of that salt without losing the protein.
That is the job of dialysis.
It's a beautifully simple and non-destructive technique.
We just put the protein-salt mixture inside a semi-permeable membrane, basically a cellulose bag with tiny pores.
So the pores are small enough to trap the protein?
Right.
They're sized to be slightly smaller than the smallest protein, but much, much larger than salt ions.
The protein is trapped, and the contaminants, the salt ions, just diffuse down their concentration gradient out into the surrounding salt-free buffer.
And the protein is left behind clean.
Exactly.
You just repeat it a few times until the salt is gone.
Now we get to the high-resolution, column-based methods, starting with separating by size using gel filtration chromatography.
This is a fantastic and often pretty counterintuitive method.
The separation depends on whether the protein can get inside the packing material.
The packing material being these little porous polymer beads.
So walk me through the mechanics here.
Why do the biggest molecules flow out the fastest?
Okay.
So imagine the column packed with these porous beads is like an obstacle course.
The small molecules are able to enter the internal labyrinth of these beads.
So they go inside the beads?
They do.
And this forces them to take this long, winding, tortuous path.
They get slowed down by taking this massive detour, which means they emerge from the column last.
And the big guys?
The large protein molecules, the ones that are bigger than the pore size, they can't get in at all, they're excluded.
They are forced to stay in the solution that's flowing between the beads.
Ah, the express highway.
The express highway.
They flow rapidly and emerge first.
Gel filtration separates based purely on size, with the biggest coming out first.
That analogy of the express highway versus the labyrinth really helped.
Okay.
Next up, separating proteins by charge.
This is ion exchange chromatography.
It leverages the protein's overall net charge at a specific pH.
You just have to choose the pH carefully so that your target protein has a charge opposite to the charge on the column matrix.
So if we're hunting for a really positive protein, we'd use a negatively charged column, which is cation exchange.
That's right.
If the beads have a negative charge, say from carboxymethylcellulose, then the positively charged proteins, the cations will bind tightly, and the negatively charged proteins?
They just flow right through.
They flow right through.
They elute first.
Then, to get our bound positive proteins off, we just increase the salt concentration.
And the salt ions compete with the proteins for those binding spots on the beads.
Precisely.
We flush it with a high concentration of sodium ions, which are positive.
These sodium ions compete with the positive groups on the protein for binding to the beads.
The proteins with a low density of positive charge come off first, and the ones with the highest charge come off last.
And the reverse, anion exchange, just uses positively charged beads to bind negative proteins.
Same principle, opposite charges.
Okay.
Finally, the ultimate level of selectivity, affinity chromatography.
This takes advantage of a protein's natural biological function.
This is arguably the most powerful technique because it exploits the high specific affinity of a protein for a particular chemical group.
Maybe its substrate or a cofactor or a tightly binding inhibitor.
So you stick its biological partner onto the beads, the target protein gets trapped, and everything else just washes away.
Exactly.
You attach the binding partner, the ligand, covalently to the column beads.
Your target protein binds super tightly, everything else is flushed out.
Then to release the target, you add a high concentration of that same ligand, but this time it's soluble.
So the free-floating ligand competes for the binding site.
It displaces the column-attached ligand from the binding sites on the protein, and your pure protein flows out.
You can get massive purification levels in a single step with this.
Now affinity chromatography used to be limited because you had to know the protein's binding partner.
But you mentioned a fantastic synergy with recombinant DNA.
Yes, the affinity tag revolution.
This is a game changer.
Recombinant technology lets us genetically engineer a little sequence that codes for an affinity tag.
A short string of amino acids right onto our protein.
The most common one is the His tag.
A string of histidines.
A string of six to eight histidine residues.
And what do these His tags do?
They bind very tightly to immobilized metal ions, usually nickel, that we've attached to the column beads.
So the tagged protein gets trapped.
It gets trapped.
Everything else flows through.
Then we easily elute the target by adding a competitor molecule like imidazole, which just mimics the binding properties of histidine.
It's a simple universal trick that lets you use affinity chromatography even for a brand new protein whose partners are totally unknown.
Speaking of resolution and speed, let's talk about the supercharged version of these column techniques.
High-performance liquid chromatography or HPLC.
HPLC is basically taking the physics of gel filtration or ion exchange and just optimizing it for extreme resolution.
The performance comes from the column material itself.
We use incredibly fine, finely divided column particles.
Why does finer material lead to better resolution?
Well, smaller particles mean the packing material has a vastly greater surface area, and so many more interaction sites within the same column length.
This increased uniformity in surface area just dramatically improves the resolving power.
The catch is, because the material is so fine, the buffer can't flow through by gravity, so you have to apply high pressure to force it through.
And the result is sharper peaks, faster separation.
Much higher resolution, sharper, narrower peaks, and much more rapid separation.
We track the output with a UV detector, often at 220 nanometers, because that's the wavelength where the peptide bond itself absorbs light.
So you get these beautiful, distinct peaks for each protein.
You do.
Using HPLC, you could separate a mixture of five standard proteins in about 10 minutes.
It turns what used to be an all-day separation into a quick analysis.
So column chromatography is great for prep work, getting large amounts of pure protein.
But how do we quickly visualize the success of our purification plan and analyze the properties of the proteins we've isolated?
That's where gel electrophoresis comes in.
It separates molecules based on charge and size in an electric field.
The basic idea is that the velocity of migration depends on the electric field strength, the net charge on the protein, and a frictional coefficient.
And the gel itself, usually polyacrylamide, acts as a molecular sieve.
Critically, it acts as a sieve.
Okay, let's talk about the absolute standard for separating proteins by mass, SDS-PAGE.
The key innovation here seems to be getting rid of differences in charge and native shape.
That is the whole nugget.
You treat the protein mixture with sodium dodecyl sulfate, or SDS, which is an anionic detergent.
You also add a reducing agent, like beta-mercaptoethanol.
What does the reducing agent do?
That breaks the covalent disulfide bonds, which ensures the protein fully unfolds and separates into its individual polypeptide chains.
Okay, and the SDS.
What's its job with the charge?
The SDS is the genius step.
It disrupts almost all the non-covalent interactions, completely denaturing the protein into a linear rod.
But more importantly, SDS binds to the main chain at a constant ratio, about one SDS anion for every two amino acid residues.
So it blankets the protein in negative charge.
It blankets it, creating this massive uniform net negative charge that's directly proportional to the length of the protein chain, which means it's proportional to its mass.
So you've essentially turned every protein, no matter what its original charge or shape was, into a uniform negatively charged rod.
Exactly.
So since all the proteins now have roughly the same charge-to-mass ratio, the electric force pulling them toward the positive electrode is basically uniform.
Which means their mobility is determined almost entirely by size.
Almost entirely by their size as they move through that polyacrylamide sieve. Small proteins zip through, large ones get retarded.
And the mobility is linearly related to the logarithm of their mass, decreasing as mass goes up, which makes it a really powerful quantitative tool.
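That log-linear relationship is what lets you read an unknown mass off a gel. Below is a minimal sketch that fits a straight line of log10(mass) against relative mobility using marker proteins, then interpolates an unknown band. The marker values are invented to lie exactly on a line; real markers scatter, and the relationship only holds over the sieving range of a given gel. Function names are hypothetical.

```python
import math


def fit_log_mass_vs_mobility(standards):
    """Least-squares fit of log10(mass_kda) = slope * mobility + intercept.

    standards is a list of (relative_mobility, mass_kda) pairs for marker proteins.
    """
    xs = [mobility for mobility, _ in standards]
    ys = [math.log10(mass_kda) for _, mass_kda in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mass_kda(mobility, slope, intercept):
    """Interpolate the mass of an unknown band from its relative mobility."""
    return 10 ** (slope * mobility + intercept)


# Hypothetical markers: (relative mobility, mass in kilodaltons).
markers = [(0.2, 100.0), (0.5, 10 ** 1.5), (0.8, 10.0)]
slope, intercept = fit_log_mass_vs_mobility(markers)
print(f"unknown band at mobility 0.65 is roughly {estimate_mass_kda(0.65, slope, intercept):.1f} kDa")
```

The fitted slope comes out negative, which is the quantitative version of "small proteins zip through."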
Okay, so that separates by mass.
But we can also separate proteins based purely on their intrinsic charge properties.
That's isoelectric focusing.
This method separates proteins based on their isoelectric point, or pI.
Remind us what the pI is.
The pI is the specific pH at which a protein's net charge is zero.
And of course, when the net charge is zero, its electrophoretic mobility is also zero.
It stops moving.
So how do you set up the environment for that?
You do the electrophoresis in a gel that has a stable pre -established pH gradient.
When you apply the voltage, each protein moves through that gradient until it reaches the precise spot in the gel where the local pH exactly equals its pI.
At that point, it stops dead.
So a basic protein with a high pI would stop near the negative end, and an acidic one would stop near the positive end.
Precisely.
And this is incredibly resolving.
It can distinguish proteins that differ in pI by only 0.01 units.
Separating by mass is powerful.
Separating by charge is powerful.
What happens when you put them together?
You get the extraordinary resolving power of two-dimensional electrophoresis.
This is still a vital technique for initial explorations in proteomics.
So describe how that two-dimensional map is actually made.
Okay, so the first dimension is separation by charge: isoelectric focusing.
You run the protein sample horizontally along a narrow tube gel, separating it by pI.
Then you take that entire gel strip and you carefully place it across the top of a second larger gel slab that contains SDS-polyacrylamide.
And then the second dimension is run perpendicularly to the first.
Correct.
When you reapply the voltage, this time vertically, the proteins that were already separated by charge are now separated based on mass, perpendicular to that original separation.
And the result is a 2D map of spots.
It's a two-dimensional map where each spot represents a single protein defined by both its pI and its mass.
A technique this powerful can resolve well over a thousand different proteins in a single cellular sample.
This sounds absolutely essential for research comparing, say, a healthy cell versus a diseased cell.
Oh, absolutely.
Researchers use it all the time to compare protein extracts from normal tissue versus tumor tissue.
You just compare the 2D maps and differences in the intensity of individual spots are immediately visible.
So if a spot is brighter in the tumor sample?
It means that protein is upregulated in the disease state.
For instance, if a specific glycolytic enzyme spot is dramatically brighter in the tumor tissue, that gives you an immediate functional clue about the tumor's metabolism.
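The map comparison can be sketched as a lookup over matched spots. This is a deliberately simplified illustration: it assumes the spots on the two gels have already been matched and share identical (pI, mass) coordinates, which in practice takes image alignment software. All intensities, coordinates, and the function name are hypothetical.

```python
def upregulated_spots(normal, tumor, min_fold=2.0):
    """Return (spot, fold_change) pairs where the tumor spot is at least min_fold brighter.

    normal and tumor map a (pI, mass_kda) coordinate to a spot intensity.
    Spots missing from the normal map, or with zero intensity there, are skipped.
    """
    hits = []
    for spot, tumor_intensity in tumor.items():
        normal_intensity = normal.get(spot)
        if normal_intensity and tumor_intensity / normal_intensity >= min_fold:
            hits.append((spot, tumor_intensity / normal_intensity))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical matched spots: (pI, mass in kDa) -> intensity in arbitrary units.
normal_map = {(5.2, 42.0): 100.0, (7.6, 36.0): 40.0, (6.1, 58.0): 80.0}
tumor_map = {(5.2, 42.0): 110.0, (7.6, 36.0): 200.0, (6.1, 58.0): 85.0}
print(upregulated_spots(normal_map, tumor_map))
```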
Before we move on, let's quickly revisit purification tracking because you mentioned this constant balancing act.
You have to track your efficiency quantitatively.
This tracking is the quantitative backbone of any purification.
You track five key metrics with the crude homogenate as your 100% baseline.
You measure total protein and total activity.
And those will always go down.
They'll always go down as you lose material.
But the two core metrics of efficiency are specific activity and yield.
Specific activity, activity per mass, which you want to maximize.
And yield is the percentage of activity you've managed to retain.
Right.
And the fifth is purification level, which is just the increase in specific activity relative to the start.
And looking at the simulated data, we see that trade-off, you called it the Faustian bargain of purification.
It truly is.
The initial steps like salt fractionation, they give you low purification, maybe only threefold, but a very high yield, like 92%, because they're just crude filters.
But then the more discriminatory steps like affinity chromatography, they offer massive purification.
We saw 3,000-fold in the example, but they always come with a significant cost in yield.
It might drop to 65%.
So a good scheme doesn't just chase maximum purity at all costs.
No, not at all.
It has to balance a high degree of purification with an acceptable yield.
High purity and poor yield leaves you with no protein to study, but high yield and low purity means your sample is still contaminated.
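The five metrics fit in one small function. The sketch below takes total protein and total activity for each step and derives specific activity, yield, and purification level relative to the crude homogenate. The milligram and unit values are illustrative numbers picked so that the output reproduces the threefold, 92%, 3,000-fold, and 65% figures mentioned above; they are not measured data, and the function name is hypothetical.

```python
def purification_table(steps):
    """Compute specific activity, yield, and purification level for each step.

    steps is a list of (name, total_protein_mg, total_activity_units);
    the first entry must be the crude homogenate, which serves as the baseline.
    """
    _, baseline_protein, baseline_activity = steps[0]
    baseline_specific_activity = baseline_activity / baseline_protein
    rows = []
    for name, protein_mg, activity_units in steps:
        specific_activity = activity_units / protein_mg
        rows.append(
            {
                "step": name,
                "specific_activity": specific_activity,  # units per mg
                "yield_percent": 100.0 * activity_units / baseline_activity,
                "purification_fold": specific_activity / baseline_specific_activity,
            }
        )
    return rows


# Illustrative numbers only.
scheme = [
    ("crude homogenate", 15000.0, 150000.0),
    ("salt fractionation", 4600.0, 138000.0),
    ("affinity chromatography", 3.25, 97500.0),
]
for row in purification_table(scheme):
    print(row)
```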
And the final confirmation is looking at the SDS-PAGE gel.
It is.
A successful scheme shows fewer and fewer bands at each step, with your target band becoming dramatically more intense as the purification level rises.
We also have ultracentrifugation for both separation and some really fine-grained analysis of physical properties like mass and shape.
Yes.
This technique using extremely high rotational speeds is really valuable for precise analysis.
We quantify the particle movement using something called the sedimentation coefficient, which is measured in Svedberg units.
And a Svedberg unit is a unit of time, right?
It is, where 1 S equals 10 to the minus 13 seconds.
It reflects just how slowly these things sediment, even under incredible force.
So what are the main factors that determine how quickly a particle sediments?
There are three main factors.
First, mass.
More massive particles are driven faster.
Second, shape.
The frictional coefficient is crucial.
An elongated asymmetrical particle sediments more slowly than a compact spherical one of the same mass because it experiences more drag.
Makes sense.
And third is the density relationship.
Particles only sink if they are denser than the medium.
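Those three factors combine in the standard textbook expression s = m(1 - v̄ρ)/f, where m is the particle mass, v̄ its partial specific volume, ρ the density of the medium, and f the frictional coefficient. Here is a minimal sketch in CGS units; the input values are hypothetical and chosen only to give a round answer.

```python
SVEDBERG_SECONDS = 1e-13  # one Svedberg unit, in seconds


def sedimentation_coefficient_svedbergs(
    mass_g,
    partial_specific_volume_ml_per_g,
    medium_density_g_per_ml,
    frictional_coefficient_g_per_s,
):
    """Sedimentation coefficient s = m * (1 - vbar * rho) / f, returned in Svedberg units."""
    # The buoyancy term is positive only when the particle is denser than the medium.
    buoyancy = 1.0 - partial_specific_volume_ml_per_g * medium_density_g_per_ml
    s_seconds = mass_g * buoyancy / frictional_coefficient_g_per_s
    return s_seconds / SVEDBERG_SECONDS


# Hypothetical particle: 1e-19 g, vbar = 0.73 mL/g, water-like medium, f = 5e-8 g/s.
print(sedimentation_coefficient_svedbergs(1e-19, 0.73, 1.0, 5e-8))
```

Doubling the frictional coefficient halves s, which is the drag effect of an elongated shape. A buoyancy term at or below zero means the particle does not sink at all.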
And analytically, this leads us to something called sedimentation equilibrium.
This is a really powerful method for getting an accurate mass determination without denaturing the protein.
So the protein stays in its native shape.
Exactly.
You centrifuge at a lower speed so that the force of sedimentation pulling the particle down is perfectly counterbalanced by the force of diffusion, pushing it back up the concentration gradient.
And when that equilibrium is reached, the result tells you the mass.
Yes.
At equilibrium, the shape of the final concentration gradient depends only on the mass of the particle.
And because the protein is not denatured, the native quaternary structure, how multiple subunits are assembled, is preserved.
So you can determine the mass of the whole intact complex.
Right.
And then you compare the mass you get from sedimentation equilibrium, the native multimer, to the mass you get from SDS-PAGE, which is the denatured subunits.
Ah, and that comparison gives you the stoichiometry.
That's the crucial insight.
If the intact complex has a mass of 120 kilodaltons and your SDS-PAGE shows a single band at 30 kilodaltons, you know immediately that the native protein is a tetramer.
It's made of four identical chains.
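The stoichiometry argument is just a ratio, sketched below. It assumes a single band on the denaturing gel, meaning all subunits are identical, and it rejects ratios that are not close to a whole number, since both mass estimates carry experimental error. The tolerance value and function name are arbitrary choices for illustration.

```python
def subunit_count(native_mass_kda, subunit_mass_kda, tolerance=0.1):
    """Estimate how many identical subunits make up the native complex."""
    ratio = native_mass_kda / subunit_mass_kda
    count = round(ratio)
    if count < 1 or abs(ratio - count) > tolerance:
        raise ValueError(f"mass ratio {ratio:.2f} is not close to a whole number of subunits")
    return count


# 120 kDa native complex, 30 kDa band on the denaturing gel.
print(subunit_count(120.0, 30.0))
```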
So purification removes the protein from its native context.
That's necessary for structure.
But what if we need to know what that protein is doing inside the cell, where it actually lives and interacts?
This is where immunology steps in, using the incredible specificity of antibodies.
This is a really powerful pivot.
The specificity of the antibody lets us tag a specific protein for isolation, quantification, or visualization right there in vivo.
And an antibody is just a protein made by the immune system in response to a foreign substance, the antigen.
Right.
And the antibody recognizes a tiny specific feature on that antigen called the antigenic determinant, or epitope.
The binding is based on complementary shape recognition, and it's extremely tight.
OK.
So if we inject a protein antigen into an animal, we get a mixture of antibodies.
That's the polyclonal approach.
Yes.
Polyclonal antibodies are heterogeneous, because the antigen usually has several different epitopes.
And so the animal produces many different antibody clones, each recognizing a different site.
Which is OK for crude detection, but not for precise analysis.
Exactly.
For that, you need monoclonal antibodies.
And the monoclonal antibody breakthrough is just revolutionary.
Oh, absolutely.
Milstein and Köhler's hybridoma technique was one of the most important developments in modern biochemistry.
They realized that normal antibody-producing spleen cells are short-lived, but cancer-derived myeloma cells are immortal.
So the ultimate biological hack.
You fuse the specificity of the spleen cell with the immortality of the cancer cell.
They literally fuse the two cell types to create what are called hybridoma cells.
These immortal cells can then be grown indefinitely to produce vast quantities of a single identical monoclonal antibody that recognizes just one specific epitope.
So you have a tailor-made, highly specific tool for research.
You do.
You can attach them to beads for affinity chromatography and achieve unbelievably selective purification for incredibly scarce proteins.
So once we have these specific antibodies, how are they used to detect and quantify proteins in a high-throughput way?
We turn to the ELISA, the enzyme-linked immunosorbent assay.
This technique just links an antibody to an enzyme that converts a colorless substrate into a colored product.
So color means your target is present.
Right.
And it's fast, convenient, and fantastically sensitive.
It can detect less than a nanogram of a specific protein.
Okay, let's break down the two main formats, starting with the indirect ELISA.
Indirect ELISA is used to detect the antibody itself.
A really common application is screening for viral infection like HIV.
You coat the well surface with the viral antigen.
If the patient's antibodies are present in their blood sample, they bind to that antigen.
Then you add a second enzyme-linked antibody that is specifically designed to recognize human antibodies.
And if that binds, you get color.
You get color production, and it's proportional to the amount of the patient's antibody that was present.
Okay, and the sandwich ELISA is the format used to detect the antigen, the protein, directly.
Right.
In the sandwich format, you start by coating the well with an antibody that's specific to your target antigen.
You add your sample, and the antigen gets trapped.
Okay.
Then you add a second, different enzyme-linked antibody, also specific for the antigen, but recognizing a different epitope.
So the antigen is now effectively sandwiched between the two antibodies.
And the resulting color is proportional to the amount of antigen present.
Exactly.
It allows for very, very high sensitivity quantification.
Next up, Western blotting, which combines the power of SDS-PAGE with the specificity of the antibody.
Western blotting is what you use when you need to confirm the identity and the mass of a specific protein within a really complex mixture.
You start with your SDS-PAGE separation.
Right.
Then the resolved proteins are transferred, or blotted, from that soft gel onto a rigid polymer sheet.
Why do you need to do that transfer?
The polymer sheet just makes the proteins more stable and accessible for the antibody reactions than when they're embedded deep inside the gel matrix.
Once they're on the sheet, you introduce the primary antibody, which binds only to the specific protein band you're interested in.
And then the secondary antibody provides the actual detection.
Exactly.
After you wash away the unbound primary, you add a secondary antibody that recognizes the primary and is tagged with fluorescence, or an enzyme, or even a radioactive marker.
When you activate it, only the band corresponding to your protein of interest lights up.
So you can confirm its presence, measure its mass, and quantify its abundance all at once.
All in one shot.
And finally, using these tools to visualize proteins inside a cell, moving beyond the test tube entirely.
We use fluorescent markers to track proteins in their natural habitat.
So fluorescence microscopy uses antibodies tagged with fluorescent dyes.
That's immunofluorescence.
Okay.
For instance, a cell can be stained with a fluorescently labeled antibody targeting actin, and it reveals these intricate arrays of parallel bundles that form the cell's internal scaffolding, the cytoskeleton.
It's beautiful.
And the most important visualization innovation in the last few decades has to be green fluorescent protein, GFP.
Oh, GFP and its colored variants, they totally revolutionized live cell imaging.
We can genetically fuse the gene for our protein of interest to the GFP sequence.
So the protein itself becomes fluorescent.
When it's expressed in the cell, the resulting fusion protein is fluorescent, which lets us track its location and monitor dynamic changes in real time.
That real-time tracking must provide functional insights that purification alone could never give you.
It does.
A classic example is the mineralocorticoid receptor protein.
By fusing it to a yellow variant of GFP, researchers could actually watch what happened.
Without its steroid hormone, cortisol, the receptor just sits dormant in the cytoplasm.
But as soon as they added the hormone, they could watch the receptor physically translocate, move into the nucleus, and bind to DNA.
It was visual proof of its role as a hormone-activated transcription factor.
We've purified, we've quantified, we've visualized.
Now, let's get down to the ultimate identification.
Yeah.
Determining the mass and, most crucially, the amino acid sequence.
This leads us to mass spectrometry, or MS.
Right.
And this technology has completely transformed proteomics in the last two decades.
What is it, fundamentally?
MS is a highly precise and sensitive analytical technique that measures the mass-to-charge ratio, or m/z, of gaseous ions.
The real power is its accuracy and sensitivity.
It can identify a protein without you needing any prior knowledge of its identity.
Okay.
Every MS system has an ion source, a mass analyzer, and a detector.
The biggest historical challenge for proteins was the ion source, right?
How do you get a big nonvolatile protein into the gas phase without just destroying it?
That was the breakthrough that won the Nobel Prize.
Two techniques solved this.
The first is MALDI, matrix-assisted laser desorption/ionization.
Here, you embed your protein analyte in a volatile matrix compound.
A pulsed laser fires onto the matrix, which vaporizes and transfers charge to the analyte, gently creating gas phase ions.
And the second method.
That's electrospray ionization, or ESI.
You pass a solution of the analyte through an electrically charged nozzle, creating fine charged droplets.
As the solvent evaporates really rapidly, the droplets shrink until you're left with the ionized analyte.
So once the protein is ionized, it enters the mass analyzer.
Let's use the time-of-flight, or TOF, analyzer as our example.
How does that separate ions based on mass?
In a TOF analyzer, the ions are all accelerated by a fixed electrostatic potential.
They all receive the same kinetic energy.
The physics just dictates that for ions with the same net charge, the lighter ions will reach the detector at the end of the flight tube faster than the larger, more massive ions.
By accurately measuring the time it takes for each ion to travel that path, we can calculate its mass-to-charge ratio.
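To make that time-of-flight relationship concrete, here is a minimal sketch in Python. It assumes an idealized analyzer: every ion gains kinetic energy equal to its charge times the accelerating voltage, then drifts down a field-free tube. The 20 kV voltage, the 1 meter tube, and the function names are illustrative assumptions, not values from the sources.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # one dalton, in kilograms

def flight_time(mass_da, charge, voltage=20000.0, length_m=1.0):
    """Seconds an ion needs to drift down the tube (idealized model).

    Energy balance: charge * e * V = 1/2 * m * v^2, so lighter ions
    with the same charge leave the accelerator faster.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * E_CHARGE * voltage / mass_kg)
    return length_m / velocity

def mz_from_time(time_s, voltage=20000.0, length_m=1.0):
    """Invert the same relationship: measured time -> m/z in daltons per charge."""
    kg_per_charge = 2 * E_CHARGE * voltage * time_s ** 2 / length_m ** 2
    return kg_per_charge / DALTON

light = flight_time(1000.0, 1)
heavy = flight_time(5000.0, 1)
print(light < heavy)            # the lighter ion arrives first
print(mz_from_time(heavy))      # recovers roughly 5000
```

Notice that the measurement only ever gives you m/z. In this model, a 10,000 dalton ion carrying two charges arrives at the same time as a 5,000 dalton ion carrying one.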
The accuracy here is just astounding.
But how does MS help us with sequencing?
I mean, that used to be a long chemical process like Edman degradation.
It was.
Edman degradation, which sequentially removed and identified the N-terminal amino acid, was really limited because the yield drops off after about 50 residues.
Mass spec, through a technique called tandem mass spectrometry, or MS/MS, just dramatically increased the speed and reliability of sequencing short peptides.
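A quick back-of-the-envelope sketch shows why the Edman yield falls off like that. The 98 percent per-cycle efficiency is purely an assumed figure for illustration, and the function name is hypothetical. The point is just that a small loss per cycle compounds.

```python
def cumulative_yield(per_cycle_efficiency, cycles):
    """Fraction of chains still in register after a number of cycles."""
    return per_cycle_efficiency ** cycles

for n in (10, 25, 50):
    print(n, round(cumulative_yield(0.98, n), 3))
```

Under that assumption, only a bit more than a third of the chains are still reporting the correct residue by cycle 50, and the signal gets buried in the background.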
So MS/MS uses two mass analyzers.
Walk us through how you get a sequence from that.
Okay.
So the first analyzer selects a specific target peptide.
We call that the precursor ion.
This ion is then directed into a collision cell where it's fragmented by bombarding it with an inert gas like argon.
This collision breaks the peptide bonds in predictable ways.
Not randomly.
Not randomly.
And those fragments are passed to the second analyzer.
So how do we read the sequence from that resulting spectrum of fragments?
The key is that the peptide breaks yield predictable fragments called product ions.
Most commonly, the breaks happen right at the peptide bond itself.
Since the cleavage results in this ladder of ions, each one differing by a single amino acid residue removed from one end, the mass difference between the sequential peaks in the resulting spectrum corresponds directly to the precise mass of the amino acid residue that was lost.
That's a powerful logic puzzle.
You're just reading the sequence by the mass differences between the peaks.
Exactly.
You're reading the sequence by mass difference rather than by chemical tagging, and it's exponentially faster than the older methods.
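Here is a small sketch of that ladder-reading logic in Python. It assumes an idealized spectrum: a single clean series of singly charged fragments with no missing peaks and no noise. The residue masses are standard monoisotopic values. The helper names, the 0.01 dalton tolerance, and the arbitrary starting mass of 100 are assumptions for illustration only.

```python
# Monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def make_ladder(sequence, start=100.0):
    """Hypothetical helper: build an ideal fragment ladder for a test peptide."""
    peaks = [start]
    for aa in sequence:
        peaks.append(peaks[-1] + RESIDUE_MASS[aa])
    return peaks

def read_ladder(peaks, tol=0.01):
    """Name the residue that explains each gap between neighboring peaks."""
    peaks = sorted(peaks)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls

print(read_ladder(make_ladder("GASF")))   # one call per gap in the ladder
print(read_ladder(make_ladder("KLQ")))    # leucine and isoleucine are reported together
```

Two things fall out of this. Leucine and isoleucine have identical masses, so mass differences alone cannot tell them apart. And lysine and glutamine differ by only about 0.036 daltons, which is why the mass accuracy of the instrument matters so much. Whether the ladder reads from the N-terminus or the C-terminus depends on which fragment series you are following.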
But proteins are often hundreds or thousands of residues long.
Since MS/MS is still limited to short peptides, how do we sequence a full protein?
We have to cleave the protein into short, manageable peptides first, using sequence-specific reagents.
That preparation is crucial.
And what are the primary tools for that specific cleavage?
We use both chemical reagents and proteolytic enzymes.
A key chemical is cyanogen bromide, which specifically cleaves on the carboxyl side of every methionine residue.
And for enzymes, the most common is trypsin, which cleaves on the carboxyl side of lysine and arginine.
Or chymotrypsin, which cleaves after bulky nonpolar residues like tyrosine and phenylalanine.
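Those cleavage rules are simple enough to sketch as an in silico digest. This is a simplified model: it cuts on the carboxyl side of every listed residue and ignores the exceptions real reagents have. The example sequence is made up, and the chymotrypsin residue set is reduced to the aromatic residues for illustration.

```python
def cleave_after(sequence, residues):
    """Cut on the carboxyl side of every residue in `residues`."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])   # whatever is left at the C-terminus
    return fragments

TRYPSIN = "KR"             # lysine, arginine
CHYMOTRYPSIN = "FYW"       # simplified: aromatic residues only
CYANOGEN_BROMIDE = "M"     # methionine

protein = "GAKMFTRSYLE"    # hypothetical sequence
print(cleave_after(protein, TRYPSIN))
print(cleave_after(protein, CHYMOTRYPSIN))
print(cleave_after(protein, CYANOGEN_BROMIDE))
```

Because each reagent cuts at different residues, the same protein gives a different set of fragments every time, and that difference is exactly what the ordering step relies on.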
So once we have these two sets of short sequenced peptides, we have to put them back in the correct order.
How do you solve that massive ordering puzzle?
You solve it using overlap peptides.
Imagine the entire protein sequence is a sentence.
If you digest it once with trypsin, cleavage set A, you get these short ordered fragments A1, A2, A3.
You know their individual sequences, but you don't know the order they go in.
So you digest the same protein separately with a different enzyme like chymotrypsin to get set B.
Exactly.
Cleavage set B gives you new fragments, B1, B2, B3, cut at different spots.
The trick is that the fragments in set B must overlap multiple fragments from set A.
So a fragment from set B might contain the end of one A fragment and the beginning of the next A fragment.
That's it.
For example, if you find that the end of fragment A1 is contained within fragment B2, and the beginning of fragment A2 is also in B2, then you know the order has to be A1 followed by A2.
By analyzing all these overlaps, you can piece together the entire original primary structure.
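The overlap logic can be sketched in a few lines of Python. This is a brute-force toy, not how real assembly software works. It assumes every fragment has been sequenced without error, tries every possible order of the set A fragments, and keeps only the orders in which every set B fragment appears intact. The fragments are made up for illustration.

```python
from itertools import permutations

def order_fragments(set_a, set_b):
    """Return every full sequence consistent with both digests."""
    consistent = set()
    for ordering in permutations(set_a):
        joined = "".join(ordering)
        # A set B fragment must show up whole, spanning junctions between A fragments.
        if all(fragment in joined for fragment in set_b):
            consistent.add(joined)
    return sorted(consistent)

set_a = ["MFTR", "SYLE", "GAK"]    # hypothetical fragments, unknown order
set_b = ["TRSY", "LE", "GAKMF"]    # second digest of the same protein
print(order_fragments(set_a, set_b))
```

Here the fragment GAKMF from set B bridges GAK and MFTR, and TRSY bridges MFTR and SYLE, so only one order survives. Trying every permutation becomes impractical very quickly as the number of fragments grows, so treat this only as a picture of the reasoning.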
Okay, here's the big question in the age of rapid genome sequencing.
Since we can derive the amino acid sequence directly from the DNA sequence, why even bother with the complex destructive chemical analysis of the protein itself?
Because genomic and proteomic analyses are absolutely complementary.
They are not redundant.
How so?
The sequence you deduce from DNA is that of the nascent protein, the direct product of translation.
Only chemical analysis of the purified mature protein reveals the crucial post-translational modifications, or PTMs.
And what are some of these PTMs that the genome misses?
Well, PTMs are the chemical decorations that determine function.
They include trimming of the protein ends, cleavage of a larger precursor into its active form, the formation of disulfide links between cysteines, or specific side-chain alterations like phosphorylation, which acts like an on-off switch.
So without the chemical analysis, you have the blueprint, but no idea how the final functional machine is assembled or regulated.
Exactly.
Finally, mass spec has also enabled the field of proteomics to identify entire complexes at once, using something called peptide mass fingerprinting.
Yes.
This method leverages the uniqueness of the protein sequence.
Every single protein has a unique amino acid sequence, which means when you cleave it with a specific enzyme, like trypsin, the set of peptide fragments you get has a distinct and unique mass signature.
So you don't even have to sequence every fragment.
You just record the list of masses and treat it like a barcode.
That's a perfect analogy.
You cleave a complex protein mixture, you determine the masses of all the resulting fragments by MS, and you match that mass signature against the predicted fragment masses derived from the sequences in the entire genome database.
And that allows incredibly rapid identification of all the components.
All the components in a large macromolecular complex, even ones that were previously unknown.
It was famously demonstrated in the analysis of the yeast nuclear pore complex.
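The barcode idea can be sketched like this. It is a toy under stated assumptions: a three-entry made-up database standing in for a genome-derived one, a simplified tryptic digest, monoisotopic masses, and a scoring rule that just counts how many observed masses match a predicted peptide within a tolerance. None of the names or values come from the sources.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # a free peptide is its residues plus one water

def tryptic_peptides(sequence):
    """Simplified trypsin: cut after every lysine or arginine."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def fingerprint(sequence):
    """Predicted list of peptide masses: the 'barcode' for one protein."""
    return sorted(sum(RESIDUE_MASS[aa] for aa in p) + WATER
                  for p in tryptic_peptides(sequence))

def best_match(observed, database, tol=0.02):
    """Score each entry by how many observed masses it can explain."""
    scores = {}
    for name, sequence in database.items():
        predicted = fingerprint(sequence)
        scores[name] = sum(any(abs(o - p) <= tol for p in predicted)
                           for o in observed)
    return max(scores, key=scores.get), scores

DATABASE = {                      # hypothetical sequences
    "protein_x": "GAKMFTRSYLE",
    "protein_y": "MKTAYIAKQR",
    "protein_z": "SLEKGGFR",
}
observed = fingerprint(DATABASE["protein_x"])   # stand-in for measured masses
print(best_match(observed, DATABASE))
```

No fragment is sequenced at any point here. The identification comes entirely from matching the list of masses against what each database entry predicts.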
So before we get to structure, let's look at one final prep technique.
Building custom peptides.
We just talked about breaking them down.
How do we build them with precision?
This is the automated solid-phase method developed by R. Bruce Merrifield.
The core innovation was just brilliant.
He overcame the horrible difficulty of purifying chemical intermediates by anchoring the growing peptide chain to an insoluble resin.
So you anchor the peptide and just wash away all the leftover reagents.
You do.
It's a fantastic hack.
The C-terminal amino acid is first anchored covalently to a solid inert resin bead.
And critically, its alpha amino group is temporarily protected by a blocking group.
And then the chain starts growing.
First, you remove that protecting group.
Second, you add the next protected amino acid, which is activated by a chemical called DCC, and that facilitates the peptide bond formation.
Because the growing peptide chain is stuck to the insoluble bead, all the excess reagents and side products just get washed away.
Eliminating all those intermediate purification steps must make the process so much faster and more efficient.
Oh, absolutely.
The cycle is just repeated for each amino acid addition, often by automated machines.
And then finally, a strong acid is used to cleave the anchor and release the completed highly pure peptide from the resin.
And these synthetic peptides are really powerful research tools.
Invaluable.
They can serve as antigens to generate specific antibodies.
They can help identify hormone receptors.
And fundamentally, they let biochemists ask really precise questions about protein folding.
Does this specific sequence of amino acids intrinsically fold into an alpha helix when it's all alone?
OK, now for the grand finale.
Seeing the protein's three-dimensional structure. This is the key determinant of its function, its specificity, its mechanism of action.
We rely on two principal techniques.
Right.
X-ray crystallography and NMR spectroscopy.
So X-ray crystallography was the first technique capable of resolving protein structures in atomic detail.
It was.
And X-rays are the ideal probe because their wavelength is about the same length as a covalent bond, about 1.5 angstroms, which gives you the necessary resolution.
But the analysis first requires a highly ordered protein crystal.
Why is that crystal structure so essential to the process?
The crystal provides order.
It's a fixed repeating arrangement where all the protein molecules are oriented identically.
When the X-rays hit that crystal, the electrons in the atoms scatter the waves.
And because the scatterers are arranged in this repeating pattern, the scattered waves reinforce one another when they're in phase, and that creates a specific measurable pattern of spots or reflections on a detector.
So that pattern of spots is the raw data.
How does a scientist get from that diffraction pattern to an actual 3D map of the atoms?
This is where the mathematically intensive step comes in.
The Fourier transform.
We don't have lenses for X-rays to focus the beams into an image like with visible light.
So instead, the Fourier transform is a mathematical calculation that's applied to the measured amplitudes and phases of all those observed reflections.
Can you give us an analogy for what the Fourier transform is doing?
Think of it like a computational inverse lens.
We know the scattered waves hit the detector in these specific patterns.
The Fourier transform just translates the information encoded in those wave patterns, in those spots, back into a three-dimensional reconstruction, which we call the electron density map.
And that map shows where the electrons are.
It shows where the electrons are most localized, which in turn defines the positions of the atoms in the molecule.
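A one-dimensional toy makes the inverse-lens idea easier to see. This sketch assumes a made-up eight-point "electron density" along a single axis and uses a plain discrete Fourier transform written with the standard library. Real crystallographic data is three-dimensional, and how the phases are obtained is outside this sketch.

```python
import cmath

def dft(values):
    """Forward transform: density -> one complex 'reflection' per index k."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * x / n)
                for x, v in enumerate(values))
            for k in range(n)]

def inverse_dft(coeffs):
    """Inverse transform: reflections -> density (real part only)."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * x / n)
                for k, c in enumerate(coeffs)).real / n
            for x in range(n)]

density = [0, 0, 5, 1, 0, 0, 8, 2]           # toy density with two 'atoms'
reflections = dft(density)
amplitudes = [abs(f) for f in reflections]
phases = [cmath.phase(f) for f in reflections]

# With amplitudes AND phases, the inverse transform rebuilds the density.
rebuilt = inverse_dft([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])
print([round(v, 6) for v in rebuilt])

# With amplitudes alone (phases set to zero), the map comes out wrong.
amplitude_only = inverse_dft([complex(a, 0) for a in amplitudes])
print([round(v, 2) for v in amplitude_only])
```

The first reconstruction matches the original density, and the second does not, which is why the calculation needs both the amplitudes and the phases of the reflections.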
And the quality of that map is all about the resolution.
Resolution is absolutely paramount.
A low-resolution map, say at 6 angstroms, is really blurry.
It only reveals the coarse overall path of the polypeptide chain.
As the resolution increases, say to 2.8 to 4 angstroms, you start to see groups of atoms.
But to truly see individual atoms and visualize bond angles and side chains in atomic detail, you need the highest resolution possible, typically between 1.0 and 1.5 angstroms.
And that requires near-perfect crystals.
The crystal quality is always the limiting factor.
So X-ray crystallography gives us a snapshot of the molecule in a solid static crystal.
NMR spectroscopy provides a structure in solution, which can give us crucial insight into a protein's dynamics and flexibility.
That structural dynamic information is the key selling point for NMR.
It relies on the magnetic properties or spin of certain atomic nuclei, primarily the proton.
When you apply a powerful external magnetic field, these nuclei can exist in two energy states.
You then apply a radiofrequency pulse at the resonant frequency to flip the spin state.
How does that tell you anything about the chemical structure?
It relies on the concept of chemical shifts.
The electrons flowing around the nucleus create a little local magnetic field that shields the nucleus from the big applied field.
The degree of that shielding depends entirely on the surrounding chemical environment.
So a proton in one environment will resonate at a slightly different frequency than a proton in another.
Exactly.
A proton on an aliphatic chain versus a proton near an aromatic ring, for instance, they'll resonate at different frequencies.
Analyzing these chemical shifts gives you information about the local structure.
But local information isn't enough to define the whole 3D structure.
For that, you need long-range distance information.
For that, we use a technique called nuclear Overhauser enhancement spectroscopy, or NOESY.
This technique is brilliant because it detects pairs of protons that are physically very close, less than about five angstroms apart, regardless of how far apart they are in the primary sequence.
So you could identify a relationship between a proton on residue 10 and a proton on residue 100, telling you that a loop has folded back on itself.
Precisely.
When you run a two-dimensional NOESY spectrum, the crucial information is in the off-diagonal peaks, or cross-peaks.
These cross-peaks identify the pairs of protons that are close in space.
Those distance constraints are then used to calculate the final 3D structure.
Yes.
With hundreds of these precise distance constraints, the 3D structure can be computationally reconstructed.
The structure calculation program basically ensures that all the protons identified by those NOESY cross-peaks are held within that five angstrom constraint.
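Here is a minimal sketch of what a distance constraint looks like in practice. It does not calculate a structure. It assumes you already have candidate coordinates and only checks them against a list of proton pairs. The proton labels, the coordinates in angstroms, and the function names are all made up for illustration.

```python
import math

def cross_peak_pairs(coords, cutoff=5.0):
    """Proton pairs close enough in space to give a NOESY cross-peak."""
    names = sorted(coords)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if math.dist(coords[a], coords[b]) < cutoff]

def violations(coords, constraints, cutoff=5.0):
    """Constraints that a candidate model fails to satisfy."""
    return [(a, b) for a, b in constraints
            if math.dist(coords[a], coords[b]) >= cutoff]

# Hypothetical folded model: residue 100 has looped back next to residue 10.
folded = {
    "res10_HA": (0.0, 0.0, 0.0),
    "res11_HN": (2.5, 0.0, 0.0),
    "res100_HB": (0.0, 3.0, 2.0),
    "res55_HN": (20.0, 0.0, 0.0),
}
constraints = cross_peak_pairs(folded)
print(constraints)

# Hypothetical extended model: same protons, but residue 100 is far away.
extended = dict(folded, res100_HB=(30.0, 0.0, 0.0))
print(violations(extended, constraints))
```

The folded model satisfies every constraint, while the extended one breaks the two that involve residue 100. A structure calculation program searches for conformations that leave that list of violations empty.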
Now, the final result of an NMR analysis is often shown not as a single model, but as a family of related structures.
Why is it an ensemble?
There are a few reasons.
First, the distance constraints are approximate, not absolute points.
But more importantly, because NMR is done in solution, the final family of structures reflects the dynamic reality of the protein.
The protein is constantly moving, vibrating, adopting slightly different low-energy conformations.
The family of structures visualizes that dynamic range.
And these two methods have stocked the Protein Data Bank, the PDB, with tens of thousands of structures, constantly enriching our understanding of molecular recognition, catalysis, and evolution.
So we started this deep dive asking how biochemists study proteins.
And we've seen this remarkable interwoven journey, from brute-force homogenization, grappling with that Faustian bargain of yield versus purity, to using recombinant hacks like His tags and specific antibodies to track proteins inside a living cell, and then finally deploying light and magnetism in mass spec and NMR to map structure in atomic detail.
And these techniques are so deeply interdependent.
Recombinant DNA gives us the affinity tags that aid purification.
Sequencing relies on specific chemical cleavage.
And NMR benefits massively from isotope labeling that's made possible by recombinant methods.
The result is this dynamic understanding of the proteome, a picture that's just far more complex and nuanced than the static genome could ever convey.
What really stands out to me is the sheer speed and resolution that's possible now, resolving thousands of proteins in a single 2D gel, sequencing peptides in seconds with MS/MS, and solving structures on a daily basis.
The molecular architecture of life is just being revealed faster than ever before.
Indeed, we know the structures of tens of thousands of proteins, and that number just keeps climbing exponentially.
But here's an important question to mull over as you contemplate this enormous structural data set.
Knowing the precise atomic 3D structure often helps us predict a protein's function.
But structure alone doesn't always tell us when or where that function is activated or regulated inside the cell.
So considering the sheer volume of structures we determine now, how can future biochemical research effectively integrate this structural deluge with the dynamic, real -time context of the constantly changing living cell?
That challenge, contextualizing structure dynamically, is where the next great set of biochemical breakthroughs must occur.
A great thought to chew on as you reflect on this deep dive into the world of protein analysis.
We appreciate you bringing these sources to us.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.