Chapter 5: Selecting Research Participants

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Okay, let's unpack this.

You want to know what a massive group of people think or do, say, all college students in the country, or everyone who uses a certain app.

But there are millions of them.

You can only ever gather data from a small fraction.

How on earth do you pick that small group so that what they tell you is actually true for the giant group you really care about?

That right there is arguably the most critical challenge in conducting research.

You're interested in a large population, but you must work with a smaller sample.

The fundamental question is how to select that sample so it truly reflects the larger population.

Get this wrong and your findings might be completely off the mark.

And that's our mission today.

We're diving deep into the absolute core of research design.

Step four in the process we've talked about before.

Selecting your participants.

Yeah, a crucial step.

We're pulling insights from Chapter 5 of Research Methods for the Behavioral Sciences 6th Ed.

Our goal is to, well, cut through the jargon and give you the essential concepts and techniques researchers use for this selection process.

We'll see why it matters so much for the validity of findings and how to spot potential red flags like, you know, bias.

It's all about ensuring your small sample acts as an honest mirror for that big population you're trying to understand.

To start, let's nail down the two key players in this whole scenario.

First, the population.

Think of the population as the entire group of individuals you're interested in.

If you want to study the impact of screen time on teenagers,

your population is likely all teenagers.

The whole group.

Exactly.

It's almost always too large, too spread out, or just plain impossible to study everyone.

Got it.

The big, sometimes theoretical group you want to make conclusions about.

Precisely.

And then there's the sample.

This is the smaller group you actually select from that population to be in your study.

You collect your data from them.

You run your analyses on their responses or behaviors.

Right.

And the entire point of doing the research on the sample is the hope that you can then take the results you find and generalize them back to that larger population.

The connection (sample drawn from the population, results generalized back to the population) is the foundational relationship you have to keep in mind.

So we have the population we dream of studying and the sample we actually study.

This brings us to the sampling challenge itself.

How do you make sure that sample is a good mirror?

Well, as you touched on, practicality is the main driver.

You simply can't test a new therapy on every single person with anxiety or interview every small business owner in the country.

Just impossible.

Right.

There are limits on time, money, access, and resources.

You have to sample.

And the source makes an important distinction about different levels of population.

There's the target population, which is the broad group you're ideally interested in, say, all adults aged 18 to 65 with a specific health condition.

Yes, the group defined by your research question.

But then there's a practical layer, the accessible population.

This is the portion of that target population that you can actually get your hands on for recruiting participants.

So maybe people nearby or people you have contact lists for.

Exactly.

If you're at a university, your accessible population might be students at that university or perhaps patients at clinics affiliated with the university.

Ah, okay.

So your sample is drawn from the accessible population.

Precisely.

And this creates a critical potential gap.

You draw your sample from the accessible population and you want to generalize your results back to your target population.

Right.

But if your accessible population isn't truly representative of your target population, say, your university students are younger or wealthier than the average adult with a health condition,

then your generalizations are already on shaky ground.

That's a crucial point.

It forces you to be really clear about who you could study versus who you want to make claims about.

It does.

And it leads directly into the concept of representativeness.

Okay.

This is just how well the characteristics of your sample match the characteristics of the population you care about.

So a representative sample looks a lot like the population it came from.

Correct.

And a biased sample looks noticeably different.

Maybe your sample has too many men or they're all from one region or they're all volunteers who are particularly enthusiastic.

And bias can happen by pure chance, I guess, like picking names randomly and accidentally getting all women.

It can, yeah, though that's less likely with larger samples.

But it's much more often due to selection bias or sampling bias.

This happens when your sampling procedure itself favors selecting certain individuals over others.

Like how you recruit people.

Exactly.

Recruiting participants from a specific online forum about gardening will give you a very biased sample if you're trying to study the general public's opinions on climate change, for instance.

Right.

The method of recruitment baked the bias right in.

Precisely.

What about sample size?

Is there a magic number?

Do you just need more people?

Well, there's no universal magic number.

But the principle captured by the law of large numbers is fundamental.

Larger samples are much more likely to be representative of the population than smaller ones.

That makes intuitive sense.

It's harder for chance to create a weirdly skewed sample if you pick a lot of people.

Exactly.

And if you look at how the average from your sample varies from the true population average, that difference shrinks as sample size increases.

So it gets more accurate.

Right.

The accuracy improves quite rapidly up to a point and then the rate of improvement slows down considerably.
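One way to see that diminishing return is through the standard error of the mean, which is the population standard deviation divided by the square root of the sample size. The sketch below is only an illustration and is not taken from the chapter: the function name `standard_error` and the standard deviation of 10 are assumptions chosen for the example.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Typical distance between a sample mean and the population mean."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

# Assumed population standard deviation, for illustration only.
SIGMA = 10.0

for n in (4, 9, 16, 25, 30, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(SIGMA, n):.2f}")
```

With these assumed numbers, going from 4 to 25 people cuts the standard error from 5 to 2, while going from 25 to 100 only takes it from 2 to 1.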

I remember seeing that improvement slows dramatically around 30 or so.

Is that why a common guideline in behavioral sciences is often 25 to 30 individuals per group or condition?

Yes.

That guideline often comes up, especially when you're comparing different treatment groups or conditions in an experiment.

But sample size needs vary hugely, depending on your specific research question, how much variability there is in the population, and how precise you need your estimate to be.

So like a political poll needs way more.

Oh, yeah.

A political poll aiming for a small margin of error needs hundreds.

Maybe even over a thousand, not just 30.
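To put rough numbers on the poll example, here is a small sketch using the standard normal-approximation margin of error for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case of a 50/50 split; the function name `margin_of_error` is made up for this illustration and none of these figures come from the chapter.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 100, 400, 1000):
    moe_points = margin_of_error(n) * 100
    print(f"n = {n:4d}  margin of error = +/- {moe_points:.1f} percentage points")
```

Under these assumptions, 30 respondents give a margin of roughly 18 points either way, while 1,000 respondents bring it down to about 3 points.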

There's also an ethical tightrope walk with sample size, isn't there?

Absolutely.

It's an ethical issue if your sample is unnecessarily large because you're using more people's time and resources than needed.

Wasting resources.

Conversely, if your sample is too small, your study might not have enough statistical power to detect a real effect if one exists.

Meaning the study was kind of pointless from the start.

It could mean that, yeah.

That means you've subjected people to the study procedures and wasted their time and resources on a study that was unlikely to yield meaningful results from the start.

Researchers grapple with this, often using power analysis to estimate the minimum needed sample size.
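As a rough idea of what a power analysis produces, the sketch below uses a common normal-approximation formula for comparing the means of two independent groups. This is an assumption made for illustration, not the procedure described in the chapter; dedicated power software uses the t distribution and usually returns slightly larger numbers. The function name `n_per_group` is hypothetical, and the effect sizes follow the conventional small, medium, and large benchmarks for a standardized mean difference.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided, two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # value matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:6s} effect (d = {d}): about {n_per_group(d)} per group")
```

The smaller the effect you expect, the more participants you need, which is why a study that is too small can be unable to detect a real effect.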

So a practical strategy is to look at similar successful studies in your field and see what sample sizes they use.

That's a very common and sensible approach.

It gives you a benchmark for what's considered adequate for similar questions, while keeping in mind that larger samples do increase your chances of both representativeness and finding statistically significant results if they are there.

All right.

So selecting participants is vital, riddled with potential bias, and the number matters.

How do researchers actually do the selecting?

There are two main roads.

Probability sampling and non-probability sampling.

These are the fundamental categories for the process of sampling: the methods used to select who's in your study.

What's the dividing line between them?

What's the core difference?

The key difference is whether you know the exact odds or probability of selecting any individual from the population.

Probability sampling methods require that you know the exact size of the population, can list all its members, and that every individual has a known non-zero probability of being selected using a truly random process.

Random process meaning unpredictable, where each outcome is equally likely, like flipping a perfect coin.

Exactly.

For probability sampling to work in theory, you need those conditions.

A known population size and list, a specifiable probability for each person, and unbiased random selection.

That sounds incredibly difficult for large real-world populations in behavioral science.

Do you really have a list of every single person in the U.S. with a certain opinion, or every child in California?

Often you don't.

And this is precisely why probability sampling, while theoretically ideal for ensuring representativeness across the whole population, is less commonly used in behavioral sciences compared to fields like large-scale government surveys or political polling.

It's often just not practical.

Right, which leads to the other main approach.

Exactly.

If you can't meet those strict requirements, you take the other road.

Non-probability sampling.

Okay.

And what defines that?

Yes.

With non-probability sampling, you don't know the population size or you can't list everyone, so you can't calculate the probability of selecting any specific individual.

The selection isn't random across the entire population.

It's based more on accessibility, convenience, or controlling certain characteristics of your sample.

And this carries a higher risk of getting a biased sample.

Generally, yes, because you're not relying on true random chance to balance out characteristics across the whole population.

However, due to those practical constraints we mentioned, non-probability methods are very common in behavioral sciences, and researchers use strategies to try and minimize the bias.

Let's look at the probability methods first, then.

The goal is using randomness to get that representative sample, sometimes with specific rules to improve the chances of representation compared to pure randomness.

The most straightforward is simple random sampling.

Where everyone in the population has an equal and independent chance of being picked.

Sounds fair.

Right.

Conceptually simple.

Define your population, get a list of everyone in it, assign each person a number, and then use a random process (a random number generator, drawing numbers from a hat) to select your sample from that list.

And there's the difference between sampling with replacement and without replacement.

Yeah, good point.

With replacement means you put the selected person back in the pool so they could potentially be chosen again.

This keeps the probability of selection constant for every choice, ensuring independence.

Without replacement means once they're selected, they're out.

So no duplicates.

Right.

It guarantees you don't have duplicates, but it technically changes the probability for the remaining people with each selection.

But without replacement is usually used in practice for large populations.

Because the change is tiny?

Yes, exactly.

Because the change in probability becomes negligible with large lists and it avoids duplicates, which is usually desired.
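The procedure just described can be sketched in a few lines. This is a minimal illustration, assuming you already have the full population list in memory; the roster of 500 made-up names and the function name `simple_random_sample` are hypothetical. It uses `random.sample` for drawing without replacement and `random.choices` for drawing with replacement.

```python
import random

def simple_random_sample(population, n, replacement=False, seed=None):
    """Give every listed individual an equal chance of selection."""
    rng = random.Random(seed)
    if replacement:
        # Each draw comes from the full list, so repeats are possible.
        return rng.choices(population, k=n)
    # Each selected person leaves the pool, so no duplicates.
    return rng.sample(population, n)

# Hypothetical population list of 500 numbered individuals.
roster = [f"person_{i:03d}" for i in range(1, 501)]

without = simple_random_sample(roster, 50, seed=1)
with_repl = simple_random_sample(roster, 50, replacement=True, seed=1)

print("unique people, without replacement:", len(set(without)))
print("unique people, with replacement:   ", len(set(with_repl)))

# Without replacement the odds shift slightly after each pick.
print(f"first pick: 1/{len(roster)}, second pick: 1/{len(roster) - 1}")
```

The last line shows why the shift is usually ignored: moving from 1 in 500 to 1 in 499 is a negligible change.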

So simple random sampling sounds perfect.

No researcher bias in selection.

What's the catch?

The main one is that while the process is unbiased, pure chance doesn't guarantee a representative sample. You could, by random chance alone, end up with a sample that doesn't quite match the population proportions on some variable.

Like flipping a coin 10 times and getting 8 heads: perfectly random, but not a perfect 50-50 reflection of the coin's true probability.
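How often would a fair coin actually do that? The short calculation below uses the binomial formula; it is added here as a worked illustration and is not part of the chapter. The function name `prob_heads` is made up for the example.

```python
from math import comb

def prob_heads(k: int, n: int = 10, p: float = 0.5) -> float:
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

exactly_8 = prob_heads(8)
at_least_8 = sum(prob_heads(k) for k in range(8, 11))

print(f"exactly 8 heads in 10 flips:  {exactly_8:.4f}")
print(f"8 or more heads in 10 flips:  {at_least_8:.4f}")
```

Exactly 8 heads comes up in 45 of the 1,024 equally likely sequences, a bit over 4% of the time, so a lopsided result from a fair process is unusual but far from impossible in a small sample.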

Gotcha.

That's where other probability methods step in to add a bit more control.

Like systematic sampling.

Exactly.

Here, you also start with a list of the population.

You pick a random starting point on the list, and then you select every nth individual after that.

Where n is your sampling interval, calculated by dividing the total population size by your desired sample size.

So if you need 50 people from a list of 500, n is 10, and you select every 10th person after a random start.

Precisely.

That's the procedure.
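Here is a minimal sketch of that procedure. It assumes the list length divides evenly by the sample size, as in the 500 and 50 example, and it picks the random start within the first interval; both are simplifying assumptions for illustration, and the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start, then every k-th individual, where k = population size // n."""
    interval = len(population) // n
    if interval < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position within the first interval
    return [population[start + i * interval] for i in range(n)]

# Hypothetical numbered list of 500 people.
people = list(range(1, 501))
sample = systematic_sample(people, 50, seed=11)

print("interval:", len(people) // 50)
print("first five selected:", sample[:5])
```

Only the starting point is random; once it is fixed, every later selection is determined.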

The source notes this technically violates the principle of independence.

How so?

It does.

Once you've selected person number 7, you know for sure you're not picking 8, 9, 10, 11, etc., until you get to number 17.

Ah, I see.

The selection of one person determines who can't be selected nearby, and who will be selected later in the sequence.

However, it's generally considered a robust probability method that produces representative samples, especially when the list isn't ordered in a way that correlates with the variable you're studying.

Okay.

Moving on to methods that explicitly try to ensure subgroups are represented.

There's stratified random sampling.

Here, you divide your population into specific subgroups, or strata, based on some relevant characteristic, perhaps age groups, gender, geographic region, or income level.

Like layers.

Exactly.

Then, you draw equal-sized simple random samples from each of those subgroups and combine them.

So if I wanted to study opinions across income levels, I divide the population into low-, medium-, and high-income strata, and then randomly pick, say, 50 people from each stratum, even if the low-income stratum is much larger in the population than the high-income one.

That's the core idea.

It's incredibly useful when your research specifically aims to describe or compare those subgroups.

Right.

Ensures you have enough people in each bucket.

It guarantees you'll have enough people in each category to do that meaningful comparison, which simple random sampling might not provide if one subgroup is very small.
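The income example can be sketched as follows. The population of 1,000 people split 600, 300, and 100 across the three strata is invented for illustration, and the function name `stratified_sample` is hypothetical; the only part taken from the discussion is the idea of drawing 50 people at random from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(people, stratum_of, n_per_stratum, seed=None):
    """Draw an equal-sized simple random sample from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)
    # random.sample raises ValueError if a stratum has fewer members than requested.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 600 low-, 300 medium-, and 100 high-income people.
population = (
    [(i, "low") for i in range(600)]
    + [(i, "medium") for i in range(600, 900)]
    + [(i, "high") for i in range(900, 1000)]
)

sample = stratified_sample(population, stratum_of=lambda p: p[1],
                           n_per_stratum=50, seed=7)
for name, members in sample.items():
    print(name, len(members))
```

In this made-up population, a high-income person has a 50 in 100 chance of selection while a low-income person has only 50 in 600, so the combined sample no longer mirrors the population's income distribution.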

But the trade-off is it can give you a distorted picture of the overall population if those subgroups aren't equally represented in reality.

If only 10% of the population is high-income, but they make up a third of your sample, your sample doesn't represent the population's true income distribution.

Precisely.

And as you noted, because individuals in smaller strata have a higher chance of being selected than those in larger strata, it's not true simple random sampling where everyone has an equal chance overall.

To fix that distortion, there's proportionate stratified random sampling.

This method still divides the population into strata, but instead of taking equal-sized samples, you determine the actual proportion of each stratum in the population.

And then sample so that the proportions in your sample exactly match those population proportions.

Okay, so if the college population is 40% female and 60% male, and I want a sample of 200, I'd randomly select exactly 80 females from the female list and 120 males from the male list.

You've got it.
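The allocation arithmetic in that example is simple enough to sketch. The population counts of 4,000 and 6,000 are assumptions chosen to give the 40/60 split, and `proportionate_allocation` is a hypothetical name. The rounding here is naive, so with awkward proportions the stratum counts may not add up exactly to the requested sample size.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Number to draw from each stratum so the sample mirrors the population."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

# Assumed college population with a 40/60 split.
college = {"female": 4000, "male": 6000}
print(proportionate_allocation(college, 200))

# A subgroup that is only 1% of the population, with a sample of 100.
with_small_group = {"small_group": 10, "everyone_else": 990}
print(proportionate_allocation(with_small_group, 100))
```

Each stratum would then be filled by simple random sampling from its own list. The second call shows how a 1% subgroup ends up with a single participant in a sample of 100.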

This is the gold standard for ensuring your sample's composition accurately reflects the population's composition, commonly used in large surveys and political polls.

It sounds like a lot more work, because you need precise population percentages for each stratum.

That data might be hard to get.

It definitely requires more detailed population data upfront.

And a potential downside is that very small subgroups in the population will result in very small numbers in your sample.

If a group is only 1% of the population and you need a sample of 100, you'll only get one person from that group.

That one person can hardly be considered representative of that specific subgroup.

Yeah, one person doesn't tell you much about a whole group.

Finally, we have cluster sampling.

This is fundamentally different because you don't select individuals directly.

You randomly select groups or clusters of individuals that already exist.

Like selecting 10 different neighborhoods in a city, or picking 20 classrooms from a school district, and then including everyone in those selected neighborhoods or classrooms.

That's the process.

You list all the potential clusters in your population, randomly select a number of those clusters, and your sample consists of all the individuals within the chosen clusters.
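A minimal sketch of that process, using the classroom example: the district of 40 classrooms with 25 students each is invented for illustration, and `cluster_sample` is a hypothetical function name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    participants = [person for name in chosen for person in clusters[name]]
    return chosen, participants

# Hypothetical district: 40 classrooms of 25 students each.
classrooms = {
    f"classroom_{c:02d}": [f"student_{c:02d}_{s:02d}" for s in range(1, 26)]
    for c in range(1, 41)
}

chosen, participants = cluster_sample(classrooms, n_clusters=20, seed=3)
print(len(chosen), "classrooms selected,", len(participants), "students in the sample")
```

The random step happens only at the classroom level; no individual student is ever selected on their own.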

That sounds incredibly efficient for getting a large sample quickly, especially if the population is geographically spread out.

That's its main advantage.

Data collection can be done on whole groups at once.

It's very practical in that sense.

But the potential issue is that individuals within a cluster might be more similar to each other than to people in other clusters.

Like students in the same classroom might have the same teacher or socioeconomic background.

Their scores might not be truly independent.

That's the primary concern.

It can compromise the assumption of independence that many statistical tests rely on.

People within a cluster might influence each other too.

And researchers often combine these methods, right?

Like a stratified approach for regions, and then cluster sampling within those regions.

Combined-strategy sampling is very common to leverage the advantages of multiple techniques and tailor the sampling to complex population structures.

Okay, deep breath.

Probability sampling methods: simple random, systematic, stratified, proportionate stratified, cluster, and combinations.

They rely on randomness and knowing the population structure to achieve representativeness, but can be complex and time consuming.

Right.

Now let's turn to the non-probability sampling methods.

The underlying principle here is convenience and sometimes control over composition, but without that full population list or true random selection across everyone.

The most common, especially in behavioral science, seems to be convenience sampling.

Absolutely.

The most frequent.

That's exactly what it sounds like.

You sample individuals who are easy to access and willing to participate.

Like using introductory psychology students at your university or people who respond to an online ad or volunteers from a community center, the people right in front of you.

Exactly.

The people who are right there, available and say yes.

The source highlights how this method can lead to highly biased samples.

Think about call-in polls on TV.

Only people watching that channel at that moment who feel strongly enough to call are represented.

Totally unrepresentative.

Or a survey mailed to subscribers of a specific magazine, again, a very specific group.

It's sometimes called accidental or haphazard sampling, which really highlights the potential for bias.

No population list needed.

No randomness.

High risk of bias.

Despite that risk, it's widely used because it's practical, fast, and often much cheaper.

The key is how researchers try to mitigate that risk and how they report what they've done.

How do they try to limit the bias, then?

The source suggests a couple of important strategies.

First, within your accessible group, try to ensure reasonable representativeness.

How?

Well, if you're using students from your university, try to get a mix of ages, genders, majors, etc., rather than just the first 50 people who show up.

The assumption here is that samples from similar, typical locations (a standard state college, a general community center) might be comparable unless the location is clearly unique, like a performing arts school.

So don't assume your convenience sample is representative of the target population, but try to make it representative within the accessible group you're drawing from.

Exactly.

The second and arguably even more important strategy is complete transparency.

Researchers need to provide a really clear, detailed description of exactly how they obtained the sample and the characteristics of the participants.

Like reporting the age range, gender breakdown, specific location, and recruitment method.

We got these people from this place this way.

Precisely.

Reporting 30 undergraduates aged 18-22 from an Introduction to Psychology course at Midwest State University, recruited via sign-up sheet, allows anyone reading the research to judge for themselves how applicable those findings might be to other groups.

It doesn't eliminate bias, but it manages it by informing the reader.

The other non-probability method is quota sampling.

How's that different?

This adds a layer of control to convenience sampling.

You identify specific subgroups you want represented and set target numbers, or quotas, for each.

Then you recruit participants using convenience methods until you meet those quotas for each subgroup.

So if I need 20 men and 20 women, I'll just stop recruiting men once I have 20, even if I'm still looking for women.

Or vice versa.

Right.

You're controlling the final composition of your sample on certain variables, mimicking the idea of stratified sampling, but without the random selection within the subgroups.

You can set quotas for equal numbers or for proportions that match known population percentages, similar to the stratified methods.

But the crucial difference from probability stratified sampling is that how you fill those quotas is still based on convenience, still grabbing whoever's easy to get.

Absolutely.

You're not randomly selecting from a list of all men or all women in the population.

You're just taking the first 20 available and willing men, and the first 20 available and willing women from your accessible pool.
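The 20 men and 20 women example can be sketched like this. Notice there is no random selection anywhere: people are taken in whatever order they show up. The arrival list is invented for illustration, and `quota_sample` is a hypothetical function name.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until every subgroup quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in arrivals:
        group = group_of(person)
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:
            break  # every quota is met, so recruiting stops
    return sample

# Hypothetical volunteers in arrival order: every third arrival is a man.
arrivals = [(i, "man" if i % 3 == 0 else "woman") for i in range(1, 91)]

sample = quota_sample(arrivals, group_of=lambda p: p[1],
                      quotas={"man": 20, "woman": 20})
men = [p for p in sample if p[1] == "man"]
women = [p for p in sample if p[1] == "woman"]
print(len(men), "men and", len(women), "women recruited")
```

In this made-up arrival order the women's quota fills first, so later women are turned away while recruiting continues for men. The final composition is controlled, but who fills each slot still depends on who was easiest to reach.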

Got it.

The source notes it's sometimes called convenience stratified sampling, and emphasizes that the method is more important than the name to understand the actual process and its limitations.

And like probability methods, non-probability methods can be combined.

You could, for instance, systematically sample every 10th person walking into a mall.

Which is a kind of convenience plus systematic approach, yeah.

Or select clusters of students from a couple of convenient schools you have access to.

Right.

Combining convenience with a cluster idea.

So non-probability methods are practical and widely used, relying on convenience and controlled composition via quotas, but they inherently carry a higher risk of sample bias because selection isn't random across the entire population you might want to generalize to.

That's the core trade-off, yes.

Okay.

That was a thorough unpacking.

We've done a deep dive into selecting research participants, which is a fundamental step in the research process.

Absolutely foundational.

We covered why sampling is necessary.

The critical distinction between target populations, accessible populations, and samples.

And the paramount importance of representativeness versus the dangers of bias.

We walked through the five primary probability sampling methods.

Simple random, systematic, stratified, proportionate stratified, and cluster sampling.

Explaining their processes, their strengths in aiming for representativeness using randomness, and their practical challenges.

And we explored the two major non-probability methods, convenience sampling and quota sampling, which are more practical and widely used in behavioral sciences, while acknowledging their higher risk of bias and the strategies researchers use to manage that risk through careful selection within accessible groups and, critically, through clear reporting.

Understanding these methods is absolutely essential for evaluating the quality and generalizability of any research findings you encounter.

The sampling method is a core component of the research design, directly impacting the data collection process and the conclusions you can draw.

We've really focused on the core mechanics and implications of this selection process as presented in the chapter.

Covered the key concepts, the different designs, how data collection relates, the different types.

That was incredibly helpful in laying out the landscape of participant selection.

Really clarified the different approaches and their pros and cons.

Good.

Before we wrap up, here's a final thought to take away.

Given how difficult true population-wide random sampling is in practice, and how common non-probability methods are, especially convenience sampling, how does knowing about the potential biases inherent in these samples change the way you approach research findings reported in everyday life?

Whether it's a news headline about a poll or a claim based on a study you see online.

What's the first question you should ask yourself about any study's result now that you know all this?

Something to ponder as you encounter research out in the world.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Selecting appropriate research participants forms a cornerstone of empirical inquiry, directly determining whether findings can be meaningfully applied beyond the immediate study. Researchers must navigate the conceptual distinction between the target population, which encompasses all individuals they theoretically wish to understand, and the accessible population, representing those practically available for inclusion. The quality of participant selection hinges on whether the resulting sample genuinely mirrors population characteristics or introduces systematic distortions that undermine research conclusions. Two contrasting approaches structure sampling decisions. Probability sampling methods, where each potential participant carries a known and calculable likelihood of selection, include simple random sampling achieved through unbiased mechanical processes, systematic sampling that selects participants at regular intervals from an ordered list, stratified random sampling which partitions the population into relevant subgroups before conducting randomization within each stratum, and cluster sampling that randomly selects naturally existing groups rather than individuals. Nonprobability sampling methods, lacking mathematically determined selection probabilities, encompass convenience sampling relying on readily available participants and quota sampling attempting to match population proportions through deliberate researcher decisions. Every sampling approach produces measurable consequences. Sampling error, the divergence between sample statistics and true population parameters, varies predictably with method and sample size. Selection bias emerges when the sampling process systematically excludes certain population segments or overrepresents others, fundamentally compromising the ability to draw valid inferences. 
Probability methods generally offer stronger justification for generalizing results to broader populations, yet nonprobability approaches persist in real-world research contexts where logistical constraints, financial limitations, or restricted access make ideal sampling impractical. Researchers must ultimately weigh methodological preferences against operational realities, employing strategic adjustments when constraints prevent implementation of gold-standard procedures while remaining transparent about resulting limitations on generalizability.
