Chapter 5: Quantitative & Experimental Methods in Psychiatry

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Okay, let's unpack this.

We're diving deep today into what you might call the engine room of psychiatric research.

That sounds about right.

Because, you know, if you want to understand why researchers might say certain things increase depression risk or how they prove a new drug works.

You absolutely have to understand the tools they're using, the methods.

Exactly.

That's right.

Our source material today is basically the playbook for psychiatric investigation.

It focuses on the quantitative and experimental methods used to find and validate psychiatric discoveries.

So our mission here.

Well, our mission is to take this complex college-level textbook material, the foundation of the field really, and boil it down into clear, essential concepts for you.

Makes sense.

We're concentrating on two main areas.

Epidemiology, that's how we measure disease rates.

And then the statistical and experimental design methods, how findings are actually interpreted and proven solid.

Right.

So let's start with epidemiology.

It's basically the study of how often diseases pop up in populations, how those rates shift.

And what factors might be driving those changes.

Yeah.

And it's crucial because mental disorders, they often follow chronic patterns, right?

So the models look more like those for long-term medical issues, not like, say, the flu.

Precisely.

Stuart L. Morris outlined some classic uses for this field.

Okay.

Like what?

Well, one key use is historical study, looking back over time.

But this is incredibly tricky because diagnostic practices changed so radically after the 1970s.

Right.

Apples and oranges almost.

Kind of.

But when researchers can make comparisons or look at trends, they can project future needs.

For example, with the baby boomers aging in the US.

We're seeing predictions about increases in age-related disorders.

Exactly.

Like major neurocognitive disorder, Alzheimer's type.

We expect proportional increases. That projection, that's epidemiology in action.

And that big picture view then informs things like community diagnosis, right?

Estimating the societal cost.

Yes.

The burden of illness, lost work productivity, health care costs, things like that.

And it also ties into health services utilization.

Which is where we see this huge gap, isn't it?

The sources really highlight a massive unmet need for mental health treatment globally.

They do.

But what's fascinating and kind of unique to psychiatry is that the very concept of treatment need is, well, it's unresolved.

How so?

Think about it.

If you break your leg, the need for treatment is obvious.

But for mental disorders, the diagnostic line can be fuzzy.

It often depends on how much function is impaired.

And symptoms can just go away on their own sometimes.

Right.

Mild symptoms might spontaneously remit.

So defining who truly needs treatment at a population level, it's uniquely challenging.

Okay.

So if that need is hard to pin down in the whole community, how do researchers get an accurate clinical picture?

You mentioned a caveat.

Selection bias.

Yes, a crucial one.

If your research sample only includes people already getting treatment, say from clinics or hospitals, that group is fundamentally different from the wider community.

Because they're likely the more severe cases.

Typically, yes.

More severe symptoms, longer illness duration, and often more comorbid conditions, having multiple disorders at once.

And this leads to Berkson bias.

Exactly.

Berkson bias.

It's when you draw faulty conclusions about risk factors.

Because the rates of comorbidity are artificially inflated in your treated or hospitalized sample compared to people out in the community.

You're basically studying only the sickest slice.
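As a rough illustration of the bias just described, the sketch below computes how often disorder B shows up among people with disorder A, first in the community and then in a treated sample. It assumes A and B are completely independent in the community and that each disorder separately raises the chance of entering treatment. The function name and every probability are hypothetical, not figures from the source.

```python
def comorbidity_in_treated(p_b, h_a, h_b, h_0):
    """Return P(B | A, treated), assuming A and B are independent in the community.

    p_b : community prevalence of disorder B (hypothetical)
    h_a : chance that disorder A by itself leads to treatment (hypothetical)
    h_b : chance that disorder B by itself leads to treatment (hypothetical)
    h_0 : chance of entering treatment for unrelated reasons (hypothetical)
    """
    # Probability of being in treatment for someone with A only, and with both A and B.
    treated_a_only = 1 - (1 - h_0) * (1 - h_a)
    treated_a_and_b = 1 - (1 - h_0) * (1 - h_a) * (1 - h_b)

    # Bayes' rule restricted to people who have A and are in treatment.
    with_b = p_b * treated_a_and_b
    without_b = (1 - p_b) * treated_a_only
    return with_b / (with_b + without_b)


# Under independence, P(B | A) in the community is just the prevalence of B.
community_rate = 0.10
treated_rate = comorbidity_in_treated(p_b=0.10, h_a=0.20, h_b=0.20, h_0=0.02)

print(f"B among A cases, community:      {community_rate:.1%}")
print(f"B among A cases, treated sample: {treated_rate:.1%}")
```

With these made-up inputs, the treated sample shows a noticeably higher comorbidity rate than the community even though the two disorders are unrelated by construction, which is the pattern the speakers attribute to Berkson bias.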

So to get beyond just observation and avoid these biases, epidemiologists need clear ways to measure things.

Metrics.

Okay.

We rely on two core measures of frequency.

First up is prevalence.

That's the number of existing cases in a population during a specific time window.

And because psychiatric symptoms can fluctuate a lot day to day.

Researchers often prefer something like one-month prevalence over point-in-time prevalence. It gives a slightly broader picture.

We also use lifetime prevalence.

Which sounds useful for risk factors.

It is, for studying long-term risks.

But it's very vulnerable to recall bias.

Meaning people forget.

Yeah.

Especially older folks trying to remember episodes from decades ago.

It gets hazy.

Okay, so prevalence is existing cases.

But that's only part of the story.

We also need to know about new cases appearing, right?

That's incidence.

Exactly.

Incidence measures the number of new cases that arise within a specific period.

And sometimes researchers look specifically for the very first time someone gets a disorder.

Yes.

That's called first incidence.

But since most mental disorders are relatively uncommon, finding enough first-incidence cases requires enormous long-term studies.

Very expensive.

Very complex.

Now this is where it gets really interesting for me.

The math behind it.

Prevalence, incidence, and duration are linked, aren't they?

They are.

There's a relationship.

Prevalence is roughly proportional to incidence multiplied by duration.

Think of it as P is approximately I times D.

Okay.

P equals I times D.

So if you have a treatment for a chronic incurable condition, a treatment that helps people live longer.

It extends the duration of the illness for those individuals.

And so it can actually increase the prevalence.

Even if the number of new cases stays the same.

Precisely.

The source uses the example of AIDS.

When effective treatments arrived, they extended lives dramatically.

Which counterintuitively caused AIDS prevalence rates to go up.

Right.

Because people were living longer with the condition.

It shows why just looking at prevalence alone for public health planning can be really misleading.
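The relationship between prevalence, incidence, and duration can be sketched in a few lines. This is a minimal illustration that assumes a roughly steady-state population and a fairly rare condition, which is when the approximation is usually applied. The function name and the numbers are hypothetical and are not taken from the source's AIDS example.

```python
def approx_prevalence(incidence_per_1000_per_year, mean_duration_years):
    """Approximate prevalence per 1,000 people as incidence times mean duration."""
    return incidence_per_1000_per_year * mean_duration_years


# Same number of new cases each year (hypothetical), before and after a
# treatment that lets people live longer with the condition.
incidence = 2.0  # new cases per 1,000 people per year
prevalence_before = approx_prevalence(incidence, mean_duration_years=5.0)
prevalence_after = approx_prevalence(incidence, mean_duration_years=15.0)

print(f"Before treatment: about {prevalence_before:.0f} per 1,000")
print(f"After treatment:  about {prevalence_after:.0f} per 1,000")
```

Incidence is held fixed here, so the entire rise in prevalence comes from the longer duration.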

Wow.

Okay.

What about when disorders occur together?

Comorbidity.

Yeah.

Comorbidity.

Sometimes it might just be random chance that someone has two disorders.

But often we see non-random comorbidity.

Meaning the rate of disorder A is significantly higher in people who already have disorder B.

Exactly.

This suggests they might share underlying risk factors.

Or potentially that what we call two separate conditions might actually be different expressions of the same core underlying issue.

Like depression and anxiety often appearing together.

That's a classic example.

Depression, anxiety, somatization.

They frequently show high comorbidity.

But, and this is vital, just because two things are associated...

Like female sex being a risk factor for major depression.

It does not mean one causes the other.

Causation is complex.

We usually talk about a web of causation involving many interacting factors.

So how do we quantify the strength of these associations, these risk factors?

We use ratios.

The source mentions Table 5.11, which would show things like relative risk.

That's a ratio comparing the incidence rate in an exposed group to the incidence rate in an unexposed group.

Okay.

Or if you can't easily measure incidence, like in case-control studies, you often use the odds ratio.

It approximates relative risk.

And if that ratio is greater than one.

Then the exposure is considered a risk factor.

It increases the likelihood of the outcome.
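To make the two ratios concrete, here is a small sketch that computes both from a 2x2 table. The counts are invented for illustration and the function names are hypothetical. It also shows why the odds ratio tracks the relative risk closely when the outcome is uncommon.

```python
def relative_risk(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical cohort: 1,000 exposed and 1,000 unexposed people.
table = dict(exposed_cases=30, exposed_noncases=970,
             unexposed_cases=10, unexposed_noncases=990)

rr = relative_risk(**table)
oratio = odds_ratio(**table)

print(f"Relative risk: {rr:.2f}")
print(f"Odds ratio:    {oratio:.2f}")
```

Both values come out above one, so in this made-up table the exposure would count as a risk factor, with the usual caveat that an association is not proof of causation.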

Now the strength of any study, any finding hinges entirely on whether it can actually measure the disorder accurately in the first place.

Which was a huge problem historically, right?

Before DSM-III in 1980.

A massive problem.

Psychiatric disorders lacked clear, agreed-upon definitions.

And since we rarely have objective biological tests, diagnosis still relies heavily on self-report and clinical judgment applying specific criteria.

These are approximations of the underlying illness.

Wasn't there a famous study that highlighted this?

The US-UK study?

Ah, yes.

The landmark US-UK Diagnostic Project.

It was a game changer.

Before standardized criteria, diagnostic practices were all over the place.

Like the US diagnosing schizophrenia more often.

And the UK diagnosing affective disorders more.

Exactly.

But when clinicians on both sides were trained to use the same standardized interview, the Present State Examination, or PSE, poof.

Those huge diagnostic differences disappeared.

Wow.

So it wasn't the patients.

It was the lack of standard criteria.

Precisely.

It proved the absolute necessity of objective criteria.

This momentum directly led to DSM-III, which moved away from vague descriptions to specific, observable, symptom-based criteria.

And that dramatically improved reliability, didn't it?

The consistency of measurement.

Hugely.

When we talk reliability, we mean consistency.

Like test-retest reliability, getting the same result if you measure again later.

And in clinical work, inter-rater reliability is crucial.

Do two different clinicians reach the same diagnosis?

Absolutely.

And to measure that agreement properly, we use statistics like Cohen's kappa coefficient.

It calculates agreement, but adjusts for the amount you'd expect just by chance.

Is there a benchmark for kappa?

Like, what's considered good enough?

The source notes that a kappa of 0.4 or higher is generally considered adequate reliability for a diagnostic tool in this context.

It's not perfect, but it's a reasonable standard.
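The chance-corrected agreement described above can be sketched for the simplest case: two clinicians each giving a yes/no diagnosis to the same patients. The counts below are invented and the function name is hypothetical. The calculation is the standard Cohen's kappa, observed agreement minus chance agreement, divided by one minus chance agreement.

```python
def cohens_kappa(both_yes, only_rater1_yes, only_rater2_yes, both_no):
    """Cohen's kappa for two raters making a yes/no diagnosis on the same patients."""
    total = both_yes + only_rater1_yes + only_rater2_yes + both_no

    # Raw proportion of patients on whom the two raters agree.
    observed = (both_yes + both_no) / total

    # Agreement expected by chance, from each rater's own rate of saying "yes".
    rater1_yes = (both_yes + only_rater1_yes) / total
    rater2_yes = (both_yes + only_rater2_yes) / total
    expected = rater1_yes * rater2_yes + (1 - rater1_yes) * (1 - rater2_yes)

    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 100 patients.
kappa = cohens_kappa(both_yes=40, only_rater1_yes=10, only_rater2_yes=10, both_no=40)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 80% of patients, but half of that agreement would be expected by chance alone, so kappa is lower than the raw agreement figure.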

Okay, so that's reliability, consistency.

What about validity?

Does the test measure what it's supposed to measure?

Exactly.

That's validity.

One key type is criterion validity.

How well does our measure, say a quick questionnaire, stack up against an external gold-standard criterion?

Usually that standard is something like an in-depth, semi-structured clinical interview done by an expert.

And validity involves two key terms, sensitivity and specificity.

Right.

Sensitivity is the proportion of actual cases that your test correctly identifies.

Think true positives.

And specificity.

That's the proportion of non-cases that your test correctly identifies as negative.

True negatives.

You ideally want both to be high for an accurate test.
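Both quantities fall straight out of a comparison between a screening test and the gold-standard interview. The sketch below uses invented counts and hypothetical function names, purely to show the arithmetic.

```python
def sensitivity(true_positives, false_negatives):
    """Share of real cases that the test flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of non-cases that the test correctly labels as negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical validation sample: 100 people with the disorder and 200 without,
# as judged by the gold-standard clinical interview.
sens = sensitivity(true_positives=80, false_negatives=20)
spec = specificity(true_negatives=180, false_positives=20)

print(f"Sensitivity: {sens:.0%}")
print(f"Specificity: {spec:.0%}")
```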

These concepts then drove the development of specific tools for big studies, right?

Yes.

Tools for large-scale surveys.

A major early one was the Diagnostic Interview Schedule, the DIS.

That was for the ECA program.

Yes, the Epidemiologic Catchment Area Program.

The DIS was revolutionary because it was fully structured, designed so lay interviewers, not just clinicians, could administer it.

That allowed for massive scale.

And then came others.

Later came the CIDI, the Composite International Diagnostic Interview.

It built on the DIS and has been used worldwide, like in the WHO World Mental Health Surveys.

Okay.

And then there's the SCID, the Structured Clinical Interview for DSM.

This one is semi-structured and considered a gold-standard diagnostic assessment, but it requires a trained clinician or research professional to administer it.

So who administers the test really matters for the kind of data you get?

It absolutely dictates the balance between scale, cost, and the depth or potential reliability of the information.

Okay, now we wade into statistics.

This is often where people's eyes glaze over, but it's fundamental.

The jargon can be dense.

It can.

When researchers analyze data, the most common approach traditionally has been the frequentist approach, hypothesis testing.

And this involves thinking about errors, right?

Type I and type II.

Correct.

A type I error, whose probability is denoted by alpha, is when you mistakenly reject a true null hypothesis.

You basically claim there's an effect like a drug works when there really isn't one.

And we usually set alpha low, like 0.05, to minimize that chance.

Typically, yes, a 5% chance of a false positive.

Then there's the type II error, beta. That's failing to reject a false null hypothesis: missing a real effect, saying the drug doesn't work when it actually does.

Okay.

And the flip side of beta, 1 minus beta, is the study's power.

That's the probability of correctly detecting a true effect if it exists.
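To give a feel for how alpha, beta, and power fit together, here is a minimal sketch that approximates the power of a two-sided comparison of two group means. It relies on a normal approximation, equal group sizes, and a standardized effect size; those are simplifying assumptions, and the function name and inputs are hypothetical rather than anything from the source.

```python
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison (normal approximation)."""
    z = NormalDist()  # standard normal distribution

    # Cutoff a test statistic must exceed to be called significant at this alpha.
    critical = z.inv_cdf(1 - alpha / 2)

    # Expected size of the test statistic if the true standardized effect is d.
    shift = effect_size_d * (n_per_group / 2) ** 0.5

    # Chance of landing beyond either cutoff when the effect is real.
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


power_large_trial = approx_power(effect_size_d=0.5, n_per_group=64)
power_small_trial = approx_power(effect_size_d=0.5, n_per_group=20)

print(f"Power with 64 per group: {power_large_trial:.2f}")
print(f"Power with 20 per group: {power_small_trial:.2f}")
print(f"Beta with 20 per group:  {1 - power_small_trial:.2f}")
```

The same true effect is much easier to miss in the smaller trial, which is what a high beta means in practice.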

But the concept that trips everyone up is the p value.

Ah, the infamous p value.

Right.

So I see a p value less than 0.05, say p equals 0.01.

I know that means statistically significant, but you're saying it's not the probability that my finding is true or that the null hypothesis is false.

That is correct.

And that misunderstanding is the trap.

So what is the danger of misinterpreting that 0.01?

Why does it matter?

The danger is unwarranted certainty, overconfidence.

The correct, but admittedly complex, interpretation of a p value like 0.01 is this.

If the null hypothesis were actually true, meaning no real effect exists at all, there would only be a 1% chance of observing a result as extreme as, or more extreme than, what you actually found, just due to random variation.

Okay.

That's convoluted.

It is.

It's a statement about the data conditional on the null being true.

It is absolutely incorrect to flip that around and say there's only a 1% chance the null hypothesis is true.

The frequentist approach doesn't allow that interpretation because it treats the hypothesis itself as fixed, not probabilistic.

So misinterpreting it makes findings seem much stronger than they might be.

Exactly.

Especially in studies with low statistical power, a significant p value might still be more likely to be a false positive than people realize.
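One way to see the point about low power is to ask what share of significant results reflect a real effect. The sketch below does that with a simple calculation. It assumes you can put a number on the proportion of tested hypotheses that are actually true, which is an extra assumption that is not part of the p value itself and does not come from the source. The function name and inputs are hypothetical.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Share of statistically significant findings that reflect a real effect.

    prior : assumed proportion of tested hypotheses that are truly non-null
    power : probability of detecting a real effect (1 minus beta)
    alpha : probability of a false positive when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical field where 1 in 10 tested effects is real.
ppv_well_powered = positive_predictive_value(prior=0.10, power=0.80)
ppv_underpowered = positive_predictive_value(prior=0.10, power=0.20)

print(f"Well-powered studies: {ppv_well_powered:.0%} of significant results are real")
print(f"Underpowered studies: {ppv_underpowered:.0%} of significant results are real")
```

Under these made-up inputs, most significant findings from the underpowered studies would be false positives, even though every one of them cleared the same 0.05 threshold.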

Is there an alternative approach that gives a more direct probability about the hypothesis itself?

Yes, there is.

That's the Bayesian approach.

How does that work?

Bayes' theorem combines your prior probability distribution (essentially your belief about the likelihood of an effect before seeing the new data) with the likelihood of the observed data itself.

And this calculation yields a posterior distribution.

And the crucial difference is within the Bayesian framework, you can interpret results as the probability that the null hypothesis is true or false given the data and the prior.

It aligns more intuitively with how we often think.

And it's controversial.

The main controversy surrounds the choice of the prior.

Critics argue that because the researcher chooses the prior belief, it introduces subjectivity that can bias the outcome.

Proponents argue it makes assumptions explicit.

It's an ongoing debate.
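The prior-plus-data idea, and the reason the choice of prior is debated, can be sketched with the simplest Bayesian model: a Beta prior on a response rate, updated with a count of responders. The model choice, the function name, and all numbers here are assumptions for illustration and are not drawn from the source.

```python
def update_beta(prior_a, prior_b, responders, non_responders):
    """Combine a Beta(prior_a, prior_b) prior with observed counts.

    For a Beta prior on a response rate, the posterior is again a Beta
    distribution whose parameters are the prior values plus the counts.
    """
    return prior_a + responders, prior_b + non_responders


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical trial arm: 14 of 20 patients respond.
responders, non_responders = 14, 6

# Two researchers start from different prior beliefs about the response rate.
flat_posterior = update_beta(1, 1, responders, non_responders)       # no strong prior view
skeptical_posterior = update_beta(2, 8, responders, non_responders)  # expects about 20%

print(f"Flat prior      -> posterior mean {beta_mean(*flat_posterior):.2f}")
print(f"Skeptical prior -> posterior mean {beta_mean(*skeptical_posterior):.2f}")
```

Identical data lead to different posterior estimates because the starting beliefs differ. That is the subjectivity critics point to, and it is also what proponents mean when they say the assumptions are stated out in the open.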

Okay.

Shifting gears slightly to study design.

If you want to prove something causes something else, the gold standard is?

The randomized clinical trial, or RCT.

And why is randomization so powerful?

Because when done properly, randomly assigning participants to either the treatment or control group makes the treatment assignment independent of any patient characteristics, known or unknown.

It balances the groups on average.

Which lets you isolate the effect of the treatment itself.

Correct.

It breaks the link between patient factors and outcomes, allowing for stronger conclusions about causality.

It minimizes bias.

But even the best RCTs run into real-world problems like people dropping out, missing data.

Absolutely inevitable.

And how you handle missing data depends on why it's missing.

There are different mechanisms.

MCAR is missing completely at random.

Like data lost due to a random computer glitch.

It's generally not problematic for bias.

Simple enough.

Then there's MAR, missing at random.

This means the likelihood of data being missing depends on other observed variables you have measured.

For example, maybe sicker patients whose severity you can measure are more likely to drop out.

You can often adjust for this statistically.

And the tricky one.

That's non-ignorable missing data.

This is when the probability of data being missing depends on the unobserved value itself.

For instance, if people stop reporting their mood specifically because their mood got very low, this is very hard to handle properly.

And researchers sometimes use shortcuts to fill in gaps.

Like LOCF.

Last observation carried forward.

Yes.

And the source warns against these simple imputation methods.

Why is LOCF risky?

Because it makes a strong, often wrong assumption that the patient's condition just stayed the same after their last measurement.

By carrying that last value forward, it artificially reduces the variability, the spread, in the data.

Making the results look cleaner than they are.

Exactly.

It shrinks the estimated variance, which can make statistical tests look more significant than they should be.

It can lead to seriously misleading conclusions about treatment effects.
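A toy example makes the LOCF problem visible. The sketch below fills in one patient's missing visits by carrying the last observed score forward, then compares the spread of the filled-in scores with the spread of what actually happened. The "true" trajectory is of course never available in a real trial. It is invented here, along with the function name, only to show what the method hides.

```python
from statistics import pvariance


def locf(values):
    """Fill each missing value (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        # Gaps before the first observation stay None: nothing to carry forward.
        filled.append(last_seen)
    return filled


# Hypothetical symptom scores over five visits. The patient kept changing
# after visit 2 but dropped out, so only the first two scores were recorded.
true_scores = [20, 18, 15, 13, 10]
observed_scores = [20, 18, None, None, None]

filled_scores = locf(observed_scores)

print("LOCF-filled scores:", filled_scores)
print("Variance of true scores:  ", pvariance(true_scores))
print("Variance of filled scores:", pvariance(filled_scores))
```

The filled-in series is almost flat, so its variance is far smaller than that of the real trajectory, which is the artificial tidiness the speakers warn about.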

Yikes.

Okay, this connects to another distinction.

Efficacy versus effectiveness.

A crucial distinction.

Efficacy studies are your typical RCTs.

Conducted under ideal, highly controlled conditions.

Pristine settings.

Carefully selected patients.

Good for proving something can work.

Yes.

But the findings might have limited generalizability to messy, real-world clinics.

Because the patients are often too clean: no comorbidities, good adherence, etc.

So that's where effectiveness studies come in.

Precisely.

Effectiveness studies aim to see how well a treatment works under typical, real-world, usual-care conditions.

They include the complex patients.

Often, yes.

They try to include patients with comorbidities, varying levels of adherence, the kind of patients clinicians actually see every day.

This gives a much better sense of how the treatment will likely perform in practice.

So pulling all these threads together, the epidemiology, the statistics, the study design, where does it ultimately lead?

A prime example is the Global Burden of Disease (GBD) studies.

Right, the GBD project.

What was their big insight?

They recognized, really profoundly, that the burden of disease isn't just about dying prematurely, it's also about living with disability.

Years spent unwell.

And they came up with a metric for that.

They did.

The disability-adjusted life year, or DALY.

The DALY?

How does that work?

It's a combined measure.

It sums up the years of life lost, YLL, due to premature death, and the years lived with disability, YLD, weighted by severity.

And this completely changed how we view the impact of mental illness, didn't it?

Absolutely.

Because while mental disorders might not always be the top causes of death, YLL, they are massive contributors to disability, YLD.
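The way the two components add up can be sketched as follows. This is a deliberately simplified version, assuming YLL is deaths times years of life lost per death and YLD is the number of people living with the condition times a severity weight between 0 and 1. The function names and every number are hypothetical and are not GBD estimates.

```python
def years_of_life_lost(deaths, years_lost_per_death):
    """YLL: years lost to premature death."""
    return deaths * years_lost_per_death


def years_lived_with_disability(prevalent_cases, disability_weight):
    """YLD for one year: people living with the condition, weighted by severity."""
    return prevalent_cases * disability_weight


def dalys(yll, yld):
    """DALYs are the sum of the two components."""
    return yll + yld


# Hypothetical disorder that rarely kills but is common and disabling.
yll = years_of_life_lost(deaths=10, years_lost_per_death=30)
yld = years_lived_with_disability(prevalent_cases=1000, disability_weight=0.4)
total = dalys(yll, yld)

print(f"YLL: {yll}, YLD: {yld}, DALYs: {total}")
print(f"Share of burden from disability: {yld / total:.0%}")
```

A measure based on deaths alone would count only the YLL term for this made-up disorder and miss more than half of its burden.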

So what did the GBD data show?

Well, the GBD 2019 findings, for example, showed that mental disorders (think depressive disorders, anxiety disorders, substance use disorders) collectively accounted for about 5% of the total global disease burden measured in DALYs.

And that 5% was mostly driven by?

Primarily driven by YLDs, these years lived with disability.

It cemented the understanding that mental disorders impose a huge societal burden through chronic impairment and reduced quality of life, even when they don't directly cause early death.

It's a powerful application of all these quantitative methods we've discussed.

Okay, so a final thought for our listeners to chew on.

We've talked about complexities, defining disorders, measuring reliably, proving cause.

How might the rise of machine learning and these enormous datasets like the NCS or NESARC surveys start to change things?

Could they reshape the very definitions and criteria we use for psychiatric diagnosis in the coming years?

That's a fascinating and incredibly important question.

How will big data and AI interact with these foundational methods?

Definitely something to watch.

Indeed.

Well, thank you for joining us on this deep dive into the quantitative methods that power psychiatric science.

It's been illuminating.

My pleasure.

See you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Scientific investigation in psychiatry requires systematic methodological approaches that transform clinical observations into reliable evidence supporting diagnosis, treatment, and understanding of mental disorders. Randomized controlled trials establish causal relationships by randomly assigning participants to intervention or control conditions, creating comparable groups that allow researchers to attribute outcome differences directly to the treatment under study. Cohort studies follow defined populations over time to identify risk factors and track disease progression, while case-control studies work backward from outcome to exposure by comparing individuals with and without psychiatric conditions to uncover potential causes. Cross-sectional designs capture population snapshots at single points in time, providing prevalence estimates and associations without temporal sequencing. Converting subjective psychiatric experiences into measurable constructs requires psychometric instruments and symptom rating scales that operationalize complex phenomena like mood disturbance, anxiety, or cognitive impairment into quantifiable dimensions. These measurement tools must demonstrate diagnostic validity and sensitivity to treatment changes, enabling clinicians and researchers to track meaningful shifts in clinical status. Statistical methods transform raw observational data into inferential conclusions by testing hypotheses, calculating effect sizes that quantify intervention strength, and distinguishing true relationships from random variation. Experimental psychiatry extends beyond human studies through animal models that simulate psychiatric conditions in controlled laboratory settings, neuroimaging techniques that visualize brain architecture and regional activation patterns, and genetic research strategies that examine heritability and molecular pathways underlying susceptibility to mental illness. 
Methodological rigor demands careful attention to sample size justification, blinding procedures that prevent bias, replication protocols that verify findings across independent studies, and control of confounding variables that might offer alternative explanations. Research involving human subjects carries ethical obligations including informed consent, protection from harm, and oversight by institutional review bodies. Psychiatric advancement depends fundamentally on integrating quantitative precision with clinical relevance, transforming systematic research into practice improvements that benefit patients.
