Sampling Methods In Research: Types, Techniques, & Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Sampling methods in psychology are strategies for selecting a subset of individuals (a sample) from a larger population in order to study it and draw inferences about the entire population. Common methods include random sampling, stratified sampling, cluster sampling, and convenience sampling. Proper sampling helps ensure representative, generalizable, and valid research results.
  • Sampling: the process of selecting a representative group from the population under study.
  • Target population: the total group of individuals from which the sample might be drawn.
  • Sample: a subset of individuals selected from a larger population for study or investigation. Those included in the sample are termed “participants.”
  • Generalizability: the ability to apply research findings from a sample to the broader target population, contingent on the sample being representative of that population.

For instance, if the advert for volunteers is published in the New York Times, this limits how much the study’s findings can be generalized to the whole population, because NYT readers may not represent the entire population in certain respects (e.g., politically, socio-economically).

The Purpose of Sampling

In psychological research, we are interested in learning about large groups of people who have something in common. We call the group we are interested in studying our “target population.”

In some types of research, the target population might be as broad as all humans. Still, in other types of research, the target population might be a smaller group, such as teenagers, preschool children, or people who misuse drugs.


Studying every person in a target population is more or less impossible. Hence, psychologists select a sample or sub-group of the population that is likely to be representative of the target population we are interested in.

This is important because we want to generalize from the sample to the target population. The more representative the sample, the more confident the researcher can be that the results can be generalized to the target population.

One of the problems that can occur when selecting a sample from a target population is sampling bias. Sampling bias refers to situations where the sample does not reflect the characteristics of the target population.

Many psychology studies have a biased sample because they have used an opportunity sample that comprises university students as their participants (e.g., Asch).

OK, so you’ve thought up this brilliant psychological study and designed it perfectly. But who will you try it out on, and how will you select your participants?

There are various sampling methods. The one chosen will depend on a number of factors (such as time, money, etc.).

Probability and Non-Probability Samples

Random Sampling

Random sampling is a type of probability sampling where everyone in the entire target population has an equal chance of being selected.

This is similar to the national lottery. If the “population” is everyone who bought a lottery ticket, then everyone has an equal chance of winning the lottery (assuming they all have one ticket each).

Random samples require naming or numbering every member of the target population and then using some raffle method to choose those who will make up the sample. When it is feasible, random sampling is generally the best method of selecting a sample from the population of interest.

  • The advantage is that your sample should represent the target population and minimize sampling bias.
  • The disadvantage is that it is very difficult to achieve, because it demands time, effort, and money.
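
The numbering-and-raffle procedure described above can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 numbered participants, the sample size of 50, and the function name `simple_random_sample` are assumptions made for the example, not details from the text.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that each has the same chance of selection."""
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    return rng.sample(population, k)

# Hypothetical target population: participants numbered 1 to 1,000.
population = list(range(1, 1001))
sample = simple_random_sample(population, k=50, seed=42)

print(len(sample))       # number of participants drawn
print(sorted(sample)[:5])  # a peek at the lowest participant numbers chosen
```

Because `random.Random.sample` draws without replacement, nobody can be picked twice, which matches the idea of a raffle in which each ticket is removed once it has been drawn.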

Stratified Sampling

During stratified sampling, the researcher identifies the different types of people that make up the target population and works out the proportions needed for the sample to be representative.

A list is made of each variable (e.g., IQ, gender, etc.) that might have an effect on the research. For example, if we are interested in the money spent on books by undergraduates, then the main subject studied may be an important variable.

For instance, students studying English Literature may spend more money on books than engineering students, so if either group is over-represented in the sample, our results will not be accurate.

We have to determine the relative percentage of each group at a university, e.g., Engineering 10%, Social Sciences 15%, English 20%, Sciences 25%, Languages 10%, Law 5%, and Medicine 15%. The sample must then contain all these groups in the same proportion as the target population (university students).

  • The disadvantage of stratified sampling is that gathering such a sample would be extremely time-consuming and difficult to do. This method is rarely used in Psychology.
  • However, the advantage is that the sample should be highly representative of the target population, and therefore we can generalize from the results obtained.
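
To make the proportions concrete, the sketch below turns the subject percentages from the example into the number of students to recruit from each group. The total sample size of 200 and the function name `proportional_allocation` are assumptions chosen for illustration.

```python
def proportional_allocation(shares_percent, sample_size):
    """Return how many participants to draw from each stratum."""
    return {
        stratum: round(sample_size * share / 100)
        for stratum, share in shares_percent.items()
    }

# Percentages taken from the university example in the text.
subject_shares = {
    "Engineering": 10,
    "Social Sciences": 15,
    "English": 20,
    "Sciences": 25,
    "Languages": 10,
    "Law": 5,
    "Medicine": 15,
}

quota = proportional_allocation(subject_shares, sample_size=200)
for subject, count in quota.items():
    print(f"{subject}: {count}")
```

With these figures every quota is a whole number. With other percentages or sample sizes, rounding could leave the quotas a participant or two away from the intended total, so the result should be checked and adjusted by hand.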

Opportunity Sampling

Opportunity sampling is a method in which participants are chosen based on their ease of availability and proximity to the researcher, rather than using random or systematic criteria. It’s a type of convenience sampling.

An opportunity sample is obtained by asking members of the population of interest if they would participate in your research. An example would be selecting a sample of students from those coming out of the library.

  • This is a quick and easy way of choosing participants (advantage)
  • It may not provide a representative sample and could be biased (disadvantage).

Systematic Sampling

Systematic sampling is a method where every nth individual is selected from a list or sequence to form a sample, ensuring even and regular intervals between chosen subjects.

Participants are systematically selected (i.e., orderly/logical) from the target population, like every nth participant on a list of names.

To take a systematic sample, you list all the population members and then decide on the sample size you would like. By dividing the number of people in the population by the number of people you want in your sample, you get a number we will call n.

If you take every nth name, you will get a systematic sample of the correct size. If, for example, you wanted to sample 150 children from a school of 1,500, you would take every 10th name.

  • The advantage of this method is that it should provide a representative sample.
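
The school example above (150 children from a roll of 1,500, so every 10th name) can be sketched as follows. The made-up names on the roll, the function name `systematic_sample`, and the optional `start` offset are assumptions for illustration.

```python
def systematic_sample(names, sample_size, start=0):
    """Take every nth name, where n = population size // sample size."""
    interval = len(names) // sample_size
    return names[start::interval][:sample_size]

# Hypothetical school roll of 1,500 children.
school_roll = [f"child_{i:04d}" for i in range(1, 1501)]

sample = systematic_sample(school_roll, sample_size=150)
print(len(sample))   # size of the sample
print(sample[:3])    # first few names chosen
```

The slice `names[start::interval]` is what implements "every nth name"; the final `[:sample_size]` only matters when the population size is not an exact multiple of the sample size.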

Sample size

The sample size is a critical factor in determining the reliability and validity of a study’s findings. While increasing the sample size can enhance the generalizability of results, it’s also essential to balance practical considerations, such as resource constraints and diminishing returns from ever-larger samples.

Reliability and Validity

Reliability refers to the consistency and reproducibility of research findings across different occasions, researchers, or instruments. A small sample size may lead to inconsistent results due to increased susceptibility to random error or the influence of outliers. In contrast, a larger sample minimizes these errors, promoting more reliable results.

Validity pertains to the accuracy and truthfulness of research findings. For a study to be valid, it should accurately measure what it intends to measure. A small, unrepresentative sample can compromise external validity, meaning the results don’t generalize well to the larger population. A larger sample captures more variability, ensuring that specific subgroups or anomalies don’t overly influence results.

Practical Considerations

Resource Constraints : Larger samples demand more time, money, and resources. Data collection becomes more extensive, data analysis more complex, and logistics more challenging.

Diminishing Returns : While increasing the sample size generally leads to improved accuracy and precision, there’s a point where adding more participants yields only marginal benefits. For instance, going from 50 to 500 participants might significantly boost a study’s robustness, but jumping from 10,000 to 10,500 might not offer a comparable advantage, especially considering the added costs.
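
One way to see the diminishing returns is to compute an approximate margin of error at each of the sample sizes mentioned. The sketch below assumes a simple random sample, a survey estimating a proportion near 50%, and a 95% confidence level (z = 1.96); none of these assumptions comes from the text, and the function name `margin_of_error` is illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n people."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 500, 10_000, 10_500):
    print(f"n = {n:>6}: about ±{margin_of_error(n) * 100:.2f} percentage points")
```

Under these assumptions the margin shrinks from roughly ±14 percentage points at 50 participants to roughly ±4 at 500, but it is just under ±1 at both 10,000 and 10,500, so the extra 500 participants buy almost nothing.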


Sampling Methods | Types, Techniques, & Examples

Published on 3 May 2022 by Shona McCombes. Revised on 10 October 2022.

When you conduct research about a group of people, it’s rarely possible to collect data from every person in that group. Instead, you select a sample. The sample is the group of individuals who will actually participate in the research.

To draw valid conclusions from your results, you have to carefully decide how you will select a sample that is representative of the group as a whole. There are two types of sampling methods:

  • Probability sampling involves random selection, allowing you to make strong statistical inferences about the whole group. It minimises the risk of selection bias.
  • Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you to easily collect data.

You should clearly explain how you selected your sample in the methodology section of your paper or thesis.


First, you need to understand the difference between a population and a sample, and identify the target population of your research.

  • The population is the entire group that you want to draw conclusions about.
  • The sample is the specific group of individuals that you will collect data from.

The population can be defined in terms of geographical location, age, income, and many other characteristics.

Population vs sample

It is important to carefully define your target population according to the purpose and practicalities of your project.

If the population is very large, demographically mixed, and geographically dispersed, it might be difficult to gain access to a representative sample.

Sampling frame

The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).

You are doing research on working conditions at Company X. Your population is all 1,000 employees of the company. Your sampling frame is the company’s HR database, which lists the names and contact details of every employee.

Sample size

The number of individuals you should include in your sample depends on various factors, including the size and variability of the population and your research design. There are different sample size calculators and formulas depending on what you want to achieve with statistical analysis.


Probability sampling means that every member of the population has a chance of being selected. It is mainly used in quantitative research. If you want to produce results that are representative of the whole population, probability sampling techniques are the most valid choice.

There are four main types of probability sample.

Probability sampling

1. Simple random sampling

In a simple random sample, every member of the population has an equal chance of being selected. Your sampling frame should include the whole population.

To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.

You want to select a simple random sample of 100 employees of Company X. You assign a number to every employee in the company database from 1 to 1000, and use a random number generator to select 100 numbers.

2. Systematic sampling

Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.

All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.

If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the sample. For example, if the HR database groups employees by team, and team members are listed in order of seniority, there is a risk that your interval might skip over people in junior roles, resulting in a sample that is skewed towards senior employees.
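
The hidden-pattern risk can be demonstrated with a small sketch. The list below is entirely hypothetical: 100 teams of 10 people, each team listed from most senior (rank 1) to most junior (rank 10), so the list repeats with the same period as the sampling interval. The function name `systematic_sample` is also an assumption.

```python
import random

# Hypothetical HR list: 100 teams of 10, each ordered by seniority rank 1..10.
employees = [(team, rank) for team in range(1, 101) for rank in range(1, 11)]

def systematic_sample(frame, interval, rng):
    """Pick a random starting point in the first interval, then every nth entry."""
    start = rng.randrange(interval)
    return frame[start::interval]

rng = random.Random(6)  # fixed seed so the sketch is repeatable
sample = systematic_sample(employees, interval=10, rng=rng)

ranks_selected = {rank for _, rank in sample}
print(len(sample))        # 100 people, as intended
print(ranks_selected)     # but only one seniority rank is ever selected
```

Because the interval (10) matches the team size, every selected person holds the same seniority rank, whichever starting point is drawn. Shuffling the list first, or choosing an interval that does not line up with the grouping, would be one way to avoid this.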

3. Stratified sampling

Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.

To use this sampling method, you divide the population into subgroups (called strata) based on the relevant characteristic (e.g., gender, age range, income bracket, job role).

Based on the overall proportions of the population, you calculate how many people should be sampled from each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.

The company has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the company, so you sort the population into two strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.
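
The Company X example can be sketched in Python as below. The employee records are invented placeholders, and `stratified_sample` is an illustrative helper, not a function from any particular library.

```python
import random
from collections import Counter

def stratified_sample(frame, stratum_of, sample_size, seed=None):
    """Randomly sample each stratum in proportion to its share of the frame."""
    rng = random.Random(seed)
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)

    sample = []
    for members in strata.values():
        k = round(sample_size * len(members) / len(frame))  # proportional quota
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 800 female and 200 male employees.
employees = [(f"emp_{i}", "female" if i < 800 else "male") for i in range(1000)]

sample = stratified_sample(employees, stratum_of=lambda e: e[1], sample_size=100, seed=1)
counts = Counter(gender for _, gender in sample)
print(counts)
```

The quotas come out at 80 women and 20 men, mirroring the 80/20 split in the company, while the individuals within each group are still chosen at random.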

4. Cluster sampling

Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole population. Instead of sampling individuals from each subgroup, you randomly select entire subgroups.

If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above. This is called multistage sampling.

This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters. It’s difficult to guarantee that the sampled clusters are really representative of the whole population.

The company has offices in 10 cities across the country (all with roughly the same number of employees in similar roles). You don’t have the capacity to travel to every office to collect your data, so you use random sampling to select 3 offices – these are your clusters.
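
A rough sketch of the office example follows, covering both plain cluster sampling (everyone in the chosen offices) and the multistage variant (a random subset within each chosen office). The office names, the 100 employees per office, and the `per_cluster` value of 20 are assumptions made for illustration.

```python
import random

# Hypothetical company: 10 offices with 100 employees each.
offices = {
    f"office_{c}": [f"office_{c}_emp_{i}" for i in range(100)]
    for c in range(1, 11)
}

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose whole clusters; optionally subsample within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)                           # single-stage: take everyone
        else:
            sample.extend(rng.sample(members, per_cluster))  # multistage: take a subset
    return chosen, sample

chosen, everyone = cluster_sample(offices, n_clusters=3, seed=0)
_, multistage = cluster_sample(offices, n_clusters=3, per_cluster=20, seed=0)
print(chosen, len(everyone), len(multistage))
```

Only the three selected offices ever contribute participants, which is what makes the method cheap and also what makes it sensitive to differences between clusters.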

In a non-probability sample, individuals are selected based on non-random criteria, and not every individual has a chance of being included.

This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias. That means the inferences you can make about the population are weaker than with probability samples, and your conclusions may be more limited. If you use a non-probability sample, you should still aim to make it as representative of the population as possible.

Non-probability sampling techniques are often used in exploratory and qualitative research. In these types of research, the aim is not to test a hypothesis about a broad population, but to develop an initial understanding of a small or under-researched population.

Non probability sampling

1. Convenience sampling

A convenience sample simply includes the individuals who happen to be most accessible to the researcher.

This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the population, so it can’t produce generalisable results.

You are researching opinions about student support services in your university, so after each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but as you only surveyed students taking the same classes as you at the same level, the sample is not representative of all the students at your university.

2. Voluntary response sampling

Similar to a convenience sample, a voluntary response sample is mainly based on ease of access. Instead of the researcher choosing participants and directly contacting them, people volunteer themselves (e.g., by responding to a public online survey).

Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely to volunteer than others.

You send out the survey to all students at your university and many students decide to complete it. This can certainly give you some insight into the topic, but the people who responded are more likely to be those who have strong opinions about the student support services, so you can’t be sure that their opinions are representative of all students.

3. Purposive sampling

Purposive sampling, also known as judgement sampling, involves the researcher using their expertise to select a sample that is most useful to the purposes of the research.

It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific phenomenon rather than make statistical inferences, or where the population is very small and specific. An effective purposive sample must have clear criteria and rationale for inclusion.

You want to know more about the opinions and experiences of students with a disability at your university, so you purposely select a number of students with different support needs in order to gather a varied range of data on their experiences with student services.

4. Snowball sampling

If the population is hard to access, snowball sampling can be used to recruit participants via other participants. The number of people you have access to ‘snowballs’ as you get in contact with more people.

You are researching experiences of homelessness in your city. Since there is no list of all homeless people in the city, probability sampling isn’t possible. You meet one person who agrees to participate in the research, and she puts you in contact with other homeless people she knows in the area.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research.

For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

Statistical sampling allows you to test a hypothesis about the characteristics of a population. There are various sampling methods you can use to ensure that your sample is representative of the population as a whole.

Samples are used to make inferences about populations. Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

Probability sampling means that every member of the target population has a known chance of being included in the sample.

Probability sampling methods include simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

In non-probability sampling, the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.

Common non-probability sampling methods include convenience sampling, voluntary response sampling, purposive sampling, snowball sampling, and quota sampling.

Sampling bias occurs when some members of a population are systematically more likely to be selected in a sample than others.


McCombes, S. (2022, October 10). Sampling Methods | Types, Techniques, & Examples. Scribbr. Retrieved 22 April 2024, from https://www.scribbr.co.uk/research-methods/sampling/


What are sampling methods and how do you choose the best one?

Posted on 18th November 2020 by Mohamed Khalifa

This tutorial will introduce sampling methods and potential sampling errors to avoid when conducting medical research.

Introduction to sampling methods


It is important to understand why we sample the population. For example, studies are built to investigate the relationships between risk factors and disease; we want to find out whether an observed association is a true one, while minimising the risk of errors such as chance, bias, or confounding.

However, it would not be feasible to experiment on the whole population. Instead, we need to take a good sample and reduce the risk of errors by using a proper sampling technique.

What is a sampling frame?

A sampling frame is a record of the target population containing all participants of interest. In other words, it is a list from which we can extract a sample.

What makes a good sample?

A good sample should be a representative subset of the population we are interested in studying, with each member of that population having an equal chance of being randomly selected into the study.

We could choose a sampling method based on whether we want to account for sampling bias; a random sampling method is often preferred over a non-random method for this reason. Random sampling examples include: simple, systematic, stratified, and cluster sampling. Non-random sampling methods are liable to bias, and common examples include: convenience, purposive, snowballing, and quota sampling. For the purposes of this blog we will be focusing on random sampling methods.

Examples of different sampling methods

Simple random sampling

Example: We want to conduct an experimental trial in a small population such as: employees in a company, or students in a college. We include everyone in a list and use a random number generator to select the participants.

Advantages: Generalisable results possible, random sampling, the sampling frame is the whole population, every participant has an equal probability of being selected

Disadvantages: Less precise than stratified method, less representative than the systematic method

Simple sampling method example in stick men.

Systematic sampling

Example: Every nth patient entering the out-patient clinic is selected and included in our sample.

Advantages: More feasible than simple or stratified methods, sampling frame is not always required

Disadvantages: Generalisability may decrease if baseline characteristics repeat across every nth participant

Systematic sampling method example in stick men

Stratified sampling

Example: We have a big population (a city) and we want to ensure representativeness of all groups with a pre-determined characteristic such as: age groups, ethnic origin, and gender.

Advantages: Inclusive of strata (subgroups), reliable and generalisable results

Disadvantages: Does not work well with multiple variables

Stratified sampling method example stick men

Cluster sampling

Example: 10 schools across the county have the same number of students. We can randomly select 3 out of the 10 schools as our clusters.

Advantages: Readily doable with most budgets, does not require a sampling frame

Disadvantages: Results may be neither reliable nor generalisable

Cluster sampling method example with stick men

How can you identify sampling errors?

Non-random selection increases the probability of sampling (selection) bias if the sample does not represent the population we want to study. We could avoid this by random sampling and by ensuring that our sample is large enough to be representative.

An inadequate sample size decreases the confidence in our results, as we may conclude that there is no significant difference when there actually is one. This type II error can result from having a small sample size or from participants dropping out of the sample.
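
The link between sample size and type II error can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: two groups with normally distributed outcomes, a true difference of half a standard deviation, a known standard deviation of 1, and a two-sided z-test at the 5% level. The function name `estimated_power` is hypothetical.

```python
import math
import random

def estimated_power(n_per_group, effect=0.5, trials=2000, seed=0):
    """Share of simulated studies that detect a true difference between groups."""
    rng = random.Random(seed)
    critical_z = 1.96                           # two-sided test at the 5% level
    standard_error = math.sqrt(2 / n_per_group)  # SD of the difference in means (SD = 1)
    detected = 0
    for _ in range(trials):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / standard_error) > critical_z:
            detected += 1
    return detected / trials

small = estimated_power(10)    # 10 participants per group
large = estimated_power(100)   # 100 participants per group
print(f"power with 10 per group:  {small:.2f}")
print(f"power with 100 per group: {large:.2f}")
```

Although the difference between groups is real in every simulated study, the small studies miss it most of the time, whereas the larger ones detect it in the large majority of runs. The type II error rate is one minus the printed power.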

In medical research of disease, if we select people with certain diseases while strictly excluding participants with other co-morbidities, we run the risk of diagnostic purity bias where important sub-groups of the population are not represented.

Furthermore, measurement bias may occur during recollection of risk factors by participants (recall bias) or during assessment of outcome, where people who live longer are associated with treatment success when in fact people who died were not included in the sample or data analysis (survivorship bias).

Choosing the best sampling method

By following the steps below we could choose the best sampling method for our study in an orderly fashion.

Research objectives

Firstly, a refined research question and goal would help us define our population of interest. If our calculated sample size is small then it would be easier to get a random sample. If, however, the sample size is large, then we should check if our budget and resources can handle a random sampling method.

Sampling frame availability

Secondly, we need to check for availability of a sampling frame (Simple), if not, could we make a list of our own (Stratified). If neither option is possible, we could still use other random sampling methods, for instance, systematic or cluster sampling.

Study design

Moreover, we could consider the prevalence of the topic (exposure or outcome) in the population and what the suitable study design would be. In addition, we should check whether our target population varies widely in its baseline characteristics. For example, a population with large ethnic subgroups could best be studied using a stratified sampling method.

Random sampling

Finally, the best sampling method is always the one that could best answer our research question while also allowing for others to make use of our results (generalisability of results). When we cannot afford a random sampling method, we can always choose from the non-random sampling methods.

To sum up, we now understand that choosing between random or non-random sampling methods is multifactorial. We might often be tempted to choose a convenience sample from the start, but that would not only decrease the precision of our results but also make us miss out on producing research that is more robust and reliable.


Thank you for this overview. A concise approach for research.

really helps! am an ecology student preparing to write my lab report for sampling.

I learned a lot to the given presentation.. It’s very comprehensive… Thanks for sharing…

Very informative and useful for my study. Thank you

Oversimplified info on sampling methods. Probabilistic of the sampling and sampling of samples by chance does rest solely on the random methods. Factors such as the random visits or presentation of the potential participants at clinics or sites could be sufficiently random in nature and should be used for the sake of efficiency and feasibility. Nevertheless, this approach has to be taken only after careful thoughts. Representativeness of the study samples have to be checked at the end or during reporting by comparing it to the published larger studies or register of some kind in/from the local population.

Thank you so much Mr.mohamed very useful and informative article


Sampling Methods: Guide To All Types with Examples


Sampling is an essential part of any research project. The right sampling method can make or break the validity of your research, and it’s essential to choose the right method for your specific question. In this article, we’ll take a closer look at some of the most popular sampling methods and provide real-world examples of how they can be used to gather accurate and reliable data.


From simple random sampling to complex stratified sampling, we’ll explore each method’s pros, cons, and best practices. So, whether you’re a seasoned researcher or just starting your journey, this article is a must-read for anyone looking to master sampling methods. Let’s get started!

What is sampling?

Sampling is a technique of selecting individual members or a subset of the population to make statistical inferences from them and estimate the characteristics of the whole population. Different sampling methods are widely used by researchers in market research so that they do not need to research the entire population to collect actionable insights.

Sampling is also a time-saving and cost-effective approach and hence forms the basis of any research design. Sampling techniques can also be applied within research survey software.

For example, suppose a drug manufacturer would like to research the adverse side effects of a drug on the country’s population. In that case, it is almost impossible to conduct a research study that involves everyone. In this case, the researcher decides on a sample of people from each demographic and then researches them, giving him/her indicative feedback on the drug’s behavior.


Sampling in market research is of two types: probability sampling and non-probability sampling. Let’s take a closer look at these two methods of sampling.

  • Probability sampling: Probability sampling is a sampling technique where a researcher sets a few selection criteria and then chooses members of a population randomly. With this selection process, every member has a known, non-zero chance of being included in the sample, and in the simplest designs that chance is equal for everyone.
  • Non-probability sampling: In non-probability sampling, the researcher chooses members based on judgment or convenience rather than random selection. This sampling method has no fixed or predefined selection process, so not all population elements have an equal opportunity to be included in a sample.

This blog discusses the various probability and non-probability sampling methods you can implement in any market research study.

Probability sampling is a technique in which researchers choose samples from a larger population based on the theory of probability. This sampling method considers every member of the population and forms samples based on a fixed process.

For example, in a population of 1000 members, every member has a 1/1000 chance of being picked on any single random draw, so no member is favored over another. Probability sampling reduces sampling bias and gives every member of the population a chance to be included in the sample.
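
To make the arithmetic concrete, here is a minimal Python sketch. The sample size of 50 and the helper names are illustrative assumptions, not figures from a real study. When a simple random sample of n members is drawn without replacement from a population of N, each member's chance of ending up in the sample is n/N; the simulation is only a rough check of that figure.

```python
import random

def inclusion_probability(population_size: int, sample_size: int) -> float:
    """Chance that any one member lands in a simple random sample
    drawn without replacement."""
    return sample_size / population_size

def estimate_inclusion_rate(population_size, sample_size, member, trials, seed=0):
    """Monte Carlo check: share of repeated samples that contain `member`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(population_size), sample_size)
        if member in sample:
            hits += 1
    return hits / trials

# Assumed scenario: 1000 members, sample of 50.
exact = inclusion_probability(1000, 50)
simulated = estimate_inclusion_rate(1000, 50, member=7, trials=20_000)
print(exact, round(simulated, 3))
```

The simulated rate should sit close to the exact value of 0.05, whichever member you track.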

There are four types of probability sampling techniques:

  • Simple random sampling: One of the best probability sampling techniques that helps in saving time and resources is the Simple Random Sampling method. It is a reliable method of obtaining information where every single member of a population is chosen randomly, merely by chance. Each individual has the same probability of being chosen to be a part of a sample. For example, in an organization of 500 employees, if the HR team decides on conducting team-building activities, they would likely prefer picking chits out of a bowl. In this case, each of the 500 employees has an equal opportunity of being selected.
  • Cluster sampling: Cluster sampling is a method where the researchers divide the entire population into sections or clusters, each of which should resemble the population as a whole. Clusters are usually defined by naturally occurring groupings such as location, and a random selection of whole clusters is then included in the sample. This makes it simpler for a survey creator to collect feedback and derive inferences from it. For example, suppose the United States government wishes to evaluate the number of immigrants living in the US. In that case, it can divide the country into clusters based on states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc., and survey a random selection of those states. This way of conducting a survey will be more effective as the results will be organized by state and provide insightful immigration data.
  • Systematic sampling: Researchers use the systematic sampling method to choose the sample members of a population at regular intervals. It requires selecting a starting point for the sample and sample size determination that can be repeated at regular intervals. This type of sampling method has a predefined range; hence, this sampling technique is the least time-consuming. For example, a researcher intends to collect a systematic sample of 500 people in a population of 5000. He/she numbers each element of the population from 1-5000 and will choose every 10th individual to be a part of the sample (Total population/ Sample Size = 5000/500 = 10).
  • Stratified random sampling: Stratified random sampling is a method in which the researcher divides the population into smaller groups that don’t overlap but together represent the entire population. The researcher organizes these groups and then draws a sample from each group separately. For example, a researcher looking to analyze the characteristics of people belonging to different annual income divisions will create strata (groups) according to the annual family income, e.g., less than $20,000, $20,000 to $29,999, $30,000 to $39,999, $40,000 to $49,999, etc. By doing this, the researcher can draw conclusions about the characteristics of people belonging to different income groups. Marketers can analyze which income groups to target and which ones to eliminate to create a roadmap that would bear fruitful results.
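
The income example from the stratified sampling bullet can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the household data is made up, the bracket boundaries and the `income_bracket` and `stratified_sample` helpers are hypothetical, and the same fraction is drawn from every stratum (proportional allocation), which is only one of several ways to allocate a stratified sample.

```python
import random
from collections import Counter, defaultdict

def income_bracket(income):
    """Map an annual family income to a stratum label (brackets are illustrative)."""
    if income < 20_000:
        return "under $20,000"
    if income < 30_000:
        return "$20,000 to $29,999"
    if income < 40_000:
        return "$30,000 to $39,999"
    if income < 50_000:
        return "$40,000 to $49,999"
    return "$50,000 and over"

def stratified_sample(households, fraction, seed=0):
    """Group households by bracket, then draw the same fraction at random from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for household in households:
        strata[income_bracket(household["income"])].append(household)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

# Hypothetical population: 1,000 households with incomes from $10,000 to $59,950.
households = [{"id": i, "income": 10_000 + 50 * i} for i in range(1000)]
sample = stratified_sample(households, fraction=0.1)
per_bracket = Counter(income_bracket(h["income"]) for h in sample)
print(len(sample), dict(per_bracket))
```

Because every bracket in this made-up population holds 200 households, a 10% draw yields 20 households per bracket, so no income group can be missed by chance.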

There are multiple uses of probability sampling:

  • Reduce Sample Bias: Using the probability sampling method, the selection bias in the sample derived from a population is greatly reduced, because selection depends on chance rather than on the researcher’s understanding and inference. Probability sampling leads to higher-quality data collection as the sample is more likely to represent the population.
  • Diverse Population: When the population is vast and diverse, it is essential to have adequate representation so that the data is not skewed toward one demographic. For example, suppose Square would like to understand the people who could use their point-of-sale devices. In that case, a survey conducted with a sample of people across the US from different industries and socio-economic backgrounds helps.
  • Create an Accurate Sample: Probability sampling helps the researchers plan and create an accurate sample. This helps to obtain well-defined data.

The non-probability method is a sampling method in which the sample is collected based on a researcher’s or statistician’s selection judgment rather than on a fixed selection process. In most situations, a survey conducted with a non-probability sample leads to skewed results that may not represent the desired target population. However, there are situations, such as the preliminary stages of research or tight cost constraints, where non-probability sampling is much more useful than probability sampling.

Four types of non-probability sampling explain the purpose of this sampling method in a better manner:

  • Convenience sampling: This method depends on the ease of access to subjects such as surveying customers at a mall or passers-by on a busy street. It is usually termed as convenience sampling  because of the researcher’s ease of carrying it out and getting in touch with the subjects. Researchers have nearly no authority to select the sample elements, and it’s purely done based on proximity and not representativeness. This non-probability sampling method is used when there are time and cost limitations in collecting feedback. In situations with resource limitations, such as the initial stages of research, convenience sampling is used. For example, startups and NGOs usually conduct convenience sampling at a mall to distribute leaflets of upcoming events or promotion of a cause – they do that by standing at the mall entrance and giving out pamphlets randomly.
  • Judgmental or purposive sampling: Judgmental or purposive samples are formed at the researcher’s discretion. Researchers consider only the purpose of the study, along with their understanding of the target audience. For instance, suppose researchers want to understand the thought process of people interested in studying for their master’s degree. The selection criterion will be: “Are you interested in doing your masters in …?” and those who respond with a “No” are excluded from the sample.
  • Snowball sampling: Snowball sampling is a sampling method that researchers apply when the subjects are difficult to trace. For example, surveying homeless people or undocumented immigrants will be extremely challenging. In such cases, researchers can find a few members of the group to interview and then ask them to refer others. Researchers also implement this sampling method when the topic is highly sensitive and not openly discussed, for example, surveys to gather information about HIV/AIDS. Not many people living with the condition will readily respond to the questions. Still, researchers can contact people they might know or volunteers associated with the cause to get in touch with them and collect information.
  • Quota sampling: In quota sampling, members are selected based on a pre-set standard. Because the sample is formed based on specific attributes, the created sample will have the same proportions of those attributes as the total population. It is a rapid method of collecting samples.
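
Quota sampling, the last method in the list above, is easy to sketch in code. The following Python example is a minimal illustration with hypothetical data: the age groups, the quotas, and the `quota_sample` helper are all assumptions. Respondents are accepted in the order they arrive (not at random) until every quota is full, which is what makes this a non-probability method.

```python
def quota_sample(respondents, category_of, quotas):
    """Accept respondents in arrival order until each category's quota is full."""
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota has been filled
    return accepted

# Hypothetical stream of passers-by, in the order they were approached.
groups = ["18-34", "18-34", "55+", "18-34", "18-34", "35-54", "55+", "35-54", "35-54"]
arrivals = [{"id": i, "age_group": g} for i, g in enumerate(groups)]

quotas = {"18-34": 3, "35-54": 2, "55+": 1}
sample = quota_sample(arrivals, lambda person: person["age_group"], quotas)
print([person["id"] for person in sample])  # → [0, 1, 2, 3, 5, 7]
```

Respondents 4 and 6 are turned away because their age groups were already full when they arrived, and respondent 8 is never approached.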

Non-probability sampling is used for the following:

  • Create a hypothesis: Researchers use the non-probability sampling method to create an assumption when limited to no prior information is available. This method helps with the immediate return of data and builds a base for further research.
  • Exploratory research: Researchers use this sampling technique widely when conducting qualitative research, pilot studies, or exploratory research .
  • Budget and time constraints: The non-probability method is used when there are budget and time constraints and some preliminary data must be collected. Since the survey design is not rigid, it is easier to pick whichever respondents are available and have them take the survey or questionnaire.

For any research, it is essential to choose a sampling method accurately to meet the goals of your study. The effectiveness of your sampling relies on various factors. Here are some steps expert researchers follow to decide the best sampling method.

  • Jot down the research goals. Generally, it must be a combination of cost, precision, or accuracy.
  • Identify the effective sampling techniques that might potentially achieve the research goals.
  • Test each of these methods and examine whether they help achieve your goal.
  • Select the method that works best for the research.

We have looked at the different types of sampling methods above and their subtypes. To encapsulate the whole discussion, though, the significant differences between probability sampling methods and non-probability sampling methods are as below:

Now that we have learned how different sampling methods work and are widely used by researchers in market research so that they don’t need to research the entire population to collect actionable insights, let’s go over a tool that can help you manage these insights.

QuestionPro understands the need for an accurate, timely, and cost-effective method to select the proper sample; that’s why we bring QuestionPro Software, a set of tools that allow you to efficiently select your target audience , manage your insights in an organized, customizable repository and community management for post-survey feedback.

Grad Coach

Sampling Methods & Strategies 101

Everything you need to know (including examples)

By: Derek Jansen (MBA) | Expert Reviewed By: Kerryn Warren (PhD) | January 2023

If you’re new to research, sooner or later you’re bound to wander into the intimidating world of sampling methods and strategies. If you find yourself on this page, chances are you’re feeling a little overwhelmed or confused. Fear not – in this post we’ll unpack sampling in straightforward language, along with loads of examples.

Overview: Sampling Methods & Strategies

  • What is sampling in a research context?
  • The two overarching approaches
  • Simple random sampling
  • Stratified random sampling
  • Cluster sampling
  • Systematic sampling
  • Purposive sampling
  • Convenience sampling
  • Snowball sampling
  • How to choose the right sampling method

What (exactly) is sampling?

At the simplest level, sampling (within a research context) is the process of selecting a subset of participants from a larger group . For example, if your research involved assessing US consumers’ perceptions about a particular brand of laundry detergent, you wouldn’t be able to collect data from every single person that uses laundry detergent (good luck with that!) – but you could potentially collect data from a smaller subset of this group.

In technical terms, the larger group is referred to as the population , and the subset (the group you’ll actually engage with in your research) is called the sample . Put another way, you can look at the population as a full cake and the sample as a single slice of that cake. In an ideal world, you’d want your sample to be perfectly representative of the population, as that would allow you to generalise your findings to the entire population. In other words, you’d want to cut a perfect cross-sectional slice of cake, such that the slice reflects every layer of the cake in perfect proportion.

Achieving a truly representative sample is, unfortunately, a little trickier than slicing a cake, as there are many practical challenges and obstacles to achieving this in a real-world setting. Thankfully though, you don’t always need to have a perfectly representative sample – it all depends on the specific research aims of each study – so don’t stress yourself out about that just yet!

With the concept of sampling broadly defined, let’s look at the different approaches to sampling to get a better understanding of what it all looks like in practice.

The two overarching sampling approaches

At the highest level, there are two approaches to sampling: probability sampling and non-probability sampling . Within each of these, there are a variety of sampling methods , which we’ll explore a little later.

Probability sampling involves selecting participants (or any unit of interest) on a statistically random basis , which is why it’s also called “random sampling”. In other words, the selection of each individual participant is based on a pre-determined process (not the discretion of the researcher). As a result, this approach achieves a random sample.

Probability-based sampling methods are most commonly used in quantitative research , especially when it’s important to achieve a representative sample that allows the researcher to generalise their findings.

Non-probability sampling , on the other hand, refers to sampling methods in which the selection of participants is not statistically random . In other words, the selection of individual participants is based on the discretion and judgment of the researcher, rather than on a pre-determined process.

Non-probability sampling methods are commonly used in qualitative research , where the richness and depth of the data are more important than the generalisability of the findings.

If that all sounds a little too conceptual and fluffy, don’t worry. Let’s take a look at some actual sampling methods to make it more tangible.

Probability-based sampling methods

First, we’ll look at four common probability-based (random) sampling methods:

  • Simple random sampling
  • Stratified random sampling
  • Cluster sampling
  • Systematic sampling

Importantly, this is not a comprehensive list of all the probability sampling methods – these are just four of the most common ones. So, if you’re interested in adopting a probability-based sampling approach, be sure to explore all the options.

Simple random sampling involves selecting participants in a completely random fashion , where each participant has an equal chance of being selected. Basically, this sampling method is the equivalent of pulling names out of a hat , except that you can do it digitally. For example, if you had a list of 500 people, you could use a random number generator to draw a list of 50 numbers (each number, reflecting a participant) and then use that dataset as your sample.

Thanks to its simplicity, simple random sampling is easy to implement , and as a consequence, is typically quite cheap and efficient . Given that the selection process is completely random, the results can be generalised fairly reliably. However, this also means it can hide the impact of large subgroups within the data, which can result in minority subgroups having little representation in the results – if any at all. To address this, one needs to take a slightly different approach, which we’ll look at next.
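
The "names out of a hat" idea takes only a few lines of Python. This is a minimal sketch: the participant labels and the seed are hypothetical, and `random.sample` from the standard library does the drawing without replacement.

```python
import random

# Hypothetical list of 500 people (the population list).
participants = [f"participant_{i:03d}" for i in range(1, 501)]

# A fixed seed makes the draw reproducible; drop it for a fresh draw each run.
rng = random.Random(42)

# Draw 50 distinct participants, each with the same chance of selection.
sample = rng.sample(participants, k=50)
print(sample[:5])
```

Because the draw ignores everyone's characteristics, a small subgroup can end up with very few members in the sample, or none, purely by chance.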

Stratified random sampling is similar to simple random sampling, but it kicks things up a notch. As the name suggests, stratified sampling involves selecting participants randomly , but from within certain pre-defined subgroups (i.e., strata) that share a common trait . For example, you might divide the population into strata based on gender, ethnicity, age range or level of education, and then select randomly from each group.

The benefit of this sampling method is that it gives you more control over the impact of large subgroups (strata) within the population. For example, if a population comprises 80% males and 20% females, you may want to “balance” this skew out by randomly selecting an equal number of males and females. This would, of course, reduce the representativeness of the sample, but it would allow you to identify differences between subgroups. So, depending on your research aims, the stratified approach could work well.
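
Here is a minimal Python sketch of the 80/20 example. The population, the scores, and the `equal_allocation` helper are made up for illustration. It draws the same number of people from each stratum and then shows one common way to handle the lost representativeness: weighting each stratum's mean by its share of the population. The weighting step is an addition for illustration and is not something the example above prescribes.

```python
import random
from statistics import mean

# Hypothetical population: 800 males and 200 females with a made-up score.
population = (
    [{"gender": "male", "score": 60} for _ in range(800)]
    + [{"gender": "female", "score": 70} for _ in range(200)]
)

def equal_allocation(population, per_stratum, seed=0):
    """Draw the same number of people at random from each gender stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(person["gender"], []).append(person)
    samples = {name: rng.sample(members, per_stratum) for name, members in strata.items()}
    shares = {name: len(members) / len(population) for name, members in strata.items()}
    return samples, shares

samples, shares = equal_allocation(population, per_stratum=50)

# Pooling the balanced sample over-represents females (50% instead of 20%).
pooled_mean = mean(p["score"] for group in samples.values() for p in group)

# Weighting each stratum mean by its population share gives a population-level estimate.
weighted_mean = sum(
    shares[name] * mean(p["score"] for p in group) for name, group in samples.items()
)
print(pooled_mean, weighted_mean)
```

With these made-up scores, the pooled mean comes out at 65 while the weighted mean comes out at 62, which matches the true population average (0.8 × 60 + 0.2 × 70).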

Next on the list is cluster sampling. As the name suggests, this sampling method involves sampling from naturally occurring, mutually exclusive clusters within a population – for example, area codes within a city or cities within a country. Once the clusters are defined, a set of clusters are randomly selected and then a set of participants are randomly selected from each cluster.

Now, you’re probably wondering, “how is cluster sampling different from stratified random sampling?”. Well, let’s look at the previous example where each cluster reflects an area code in a given city.

With cluster sampling, you would collect data from clusters of participants in a handful of area codes (let’s say 5 neighbourhoods). Conversely, with stratified random sampling, you would need to collect data from all over the city (i.e., many more neighbourhoods). You’d still achieve the same sample size either way (let’s say 200 people, for example), but with stratified sampling, you’d need to do a lot more running around, as participants would be scattered across a vast geographic area. As a result, cluster sampling is often the more practical and economical option.

If that all sounds a little mind-bending, you can use the following general rule of thumb. If a population is relatively homogeneous , cluster sampling will often be adequate. Conversely, if a population is quite heterogeneous (i.e., diverse), stratified sampling will generally be more appropriate.
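
The two-stage process described above (randomly pick clusters, then randomly pick participants within each chosen cluster) might look like this in Python. It is a sketch under assumptions: the city with 30 area codes of 500 residents each is invented, and `two_stage_cluster_sample` is a hypothetical helper name.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Stage 1: pick clusters at random. Stage 2: pick people at random within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen, sample

# Hypothetical city: 30 area codes, each with 500 residents.
city = {
    f"area_{code:02d}": [f"area_{code:02d}/resident_{i}" for i in range(500)]
    for code in range(30)
}

# 5 area codes x 40 residents each = 200 participants, as in the example above.
chosen, sample = two_stage_cluster_sample(city, n_clusters=5, n_per_cluster=40)
print(chosen, len(sample))
```

All 200 participants come from just five area codes, which is what keeps fieldwork cheap, and also why the approach works best when the clusters resemble one another.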

The last probability sampling method we’ll look at is systematic sampling. This method simply involves selecting participants at a set interval , starting from a random point .

For example, if you have a list of students that reflects the population of a university, you could systematically sample that population by selecting participants at an interval of 8 . In other words, you would randomly select a starting point – let’s say student number 40 – followed by student 48, 56, 64, etc.

What’s important with systematic sampling is that the population list you select from needs to be randomly ordered . If there are underlying patterns in the list (for example, if the list is ordered by gender, IQ, age, etc.), this will result in a non-random sample, which would defeat the purpose of adopting this sampling method. Of course, you could safeguard against this by “shuffling” your population list using a random number generator or similar tool.
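
As a rough Python sketch of systematic sampling with the shuffling safeguard: the student list is hypothetical and `systematic_sample` is an assumed helper name. Note one assumption that differs slightly from the example above: the random start is drawn from within the first interval (a common convention), so every position in the list can be reached.

```python
import random

def systematic_sample(population, interval, seed=0):
    """Shuffle the list, pick a random start, then take every `interval`-th member."""
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)            # guards against hidden ordering patterns
    start = rng.randrange(interval)  # random start within the first interval
    return shuffled[start::interval]

# Hypothetical university list of 4,000 student numbers.
students = list(range(1, 4001))
sample = systematic_sample(students, interval=8)
print(len(sample))  # → 500
```

If the list is already known to be in random order, the shuffle can be skipped and the slice applied directly.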

Non-probability-based sampling methods

Right, now that we’ve looked at a few probability-based sampling methods, let’s look at three non-probability methods:

  • Purposive sampling
  • Convenience sampling
  • Snowball sampling

Again, this is not an exhaustive list of all possible sampling methods, so be sure to explore further if you’re interested in adopting a non-probability sampling approach.

First up, we’ve got purposive sampling – also known as judgment , selective or subjective sampling. Again, the name provides some clues, as this method involves the researcher selecting participants using his or her own judgement , based on the purpose of the study (i.e., the research aims).

For example, suppose your research aims were to understand the perceptions of hyper-loyal customers of a particular retail store. In that case, you could use your judgement to engage with frequent shoppers, as well as rare or occasional shoppers, to understand what judgements drive the two behavioural extremes .

Purposive sampling is often used in studies where the aim is to gather information from a small population (especially rare or hard-to-find populations), as it allows the researcher to target specific individuals who have unique knowledge or experience . Naturally, this sampling method is quite prone to researcher bias and judgement error, and it’s unlikely to produce generalisable results, so it’s best suited to studies where the aim is to go deep rather than broad .

Next up, we have convenience sampling. As the name suggests, with this method, participants are selected based on their availability or accessibility . In other words, the sample is selected based on how convenient it is for the researcher to access it, as opposed to using a defined and objective process.

Naturally, convenience sampling provides a quick and easy way to gather data, as the sample is selected based on the individuals who are readily available or willing to participate. This makes it an attractive option if you’re particularly tight on resources and/or time. However, as you’d expect, this sampling method is unlikely to produce a representative sample and will of course be vulnerable to researcher bias , so it’s important to approach it with caution.

Last but not least, we have the snowball sampling method. This method relies on referrals from initial participants to recruit additional participants. In other words, the initial subjects form the first (small) snowball and each additional subject recruited through referral is added to the snowball, making it larger as it rolls along .

Snowball sampling is often used in research contexts where it’s difficult to identify and access a particular population. For example, people with a rare medical condition or members of an exclusive group. It can also be useful in cases where the research topic is sensitive or taboo and people are unlikely to open up unless they’re referred by someone they trust.

Simply put, snowball sampling is ideal for research that involves reaching hard-to-access populations . But, keep in mind that, once again, it’s a sampling method that’s highly prone to researcher bias and is unlikely to produce a representative sample. So, make sure that it aligns with your research aims and questions before adopting this method.
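
The referral process can be pictured as walking outward through a network of contacts. The Python sketch below is purely illustrative: the names, the referral map, and the `snowball_sample` helper are hypothetical, and real recruitment depends on people agreeing to take part and to refer others.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Start from seed participants and recruit the contacts they refer, wave by wave."""
    recruited = list(dict.fromkeys(seeds))[:max_size]  # de-duplicate, keep order
    seen = set(recruited)
    queue = deque(recruited)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
                if len(recruited) == max_size:
                    break
    return recruited

# Hypothetical referral map: who each participant is willing to introduce.
referrals = {
    "Ana": ["Ben", "Caro"],
    "Ben": ["Dev"],
    "Caro": ["Dev", "Eli"],
    "Dev": ["Fay"],
}

sample = snowball_sample(referrals, seeds=["Ana"], max_size=5)
print(sample)  # → ['Ana', 'Ben', 'Caro', 'Dev', 'Eli']
```

Everyone in the sample is connected to the first participant, which illustrates why snowball samples tend to over-represent well-connected people and rarely mirror the wider population.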

How to choose a sampling method

Now that we’ve looked at a few popular sampling methods (both probability and non-probability based), the obvious question is, “ how do I choose the right sampling method for my study?”. When selecting a sampling method for your research project, you’ll need to consider two important factors: your research aims and your resources .

As with all research design and methodology choices, your sampling approach needs to be guided by and aligned with your research aims, objectives and research questions – in other words, your golden thread. Specifically, you need to consider whether your research aims are primarily concerned with producing generalisable findings (in which case, you’ll likely opt for a probability-based sampling method) or with achieving rich , deep insights (in which case, a non-probability-based approach could be more practical). Typically, quantitative studies lean toward the former, while qualitative studies aim for the latter, so be sure to consider your broader methodology as well.

The second factor you need to consider is your resources and, more generally, the practical constraints at play. If, for example, you have easy, free access to a large sample at your workplace or university and a healthy budget to help you attract participants, that will open up multiple options in terms of sampling methods. Conversely, if you’re cash-strapped, short on time and don’t have unfettered access to your population of interest, you may be restricted to convenience or referral-based methods.

In short, be ready for trade-offs – you won’t always be able to utilise the “perfect” sampling method for your study, and that’s okay. Much like all the other methodological choices you’ll make as part of your study, you’ll often need to compromise and accept practical trade-offs when it comes to sampling. Don’t let this get you down though – as long as your sampling choice is well explained and justified, and the limitations of your approach are clearly articulated, you’ll be on the right track.

Let’s recap…

In this post, we’ve covered the basics of sampling within the context of a typical research project.

  • Sampling refers to the process of defining a subgroup (sample) from the larger group of interest (population).
  • The two overarching approaches to sampling are probability sampling (random) and non-probability sampling .
  • Common probability-based sampling methods include simple random sampling, stratified random sampling, cluster sampling and systematic sampling.
  • Common non-probability-based sampling methods include purposive sampling, convenience sampling and snowball sampling.
  • When choosing a sampling method, you need to consider your research aims , objectives and questions, as well as your resources and other practical constraints .

Sampling methods review

What are sampling methods?

Bad ways to sample

  • Convenience sampling
  • Voluntary response sampling

Good ways to sample

  • Simple random sampling
  • Stratified random sampling
  • Cluster random sampling
  • Systematic random sampling

An overview of sampling methods

Last updated

27 February 2023

Reviewed by

Cathy Heath

When researching perceptions or attributes of a product, service, or people, you have two options:

Survey every person in your chosen group (the target market, or population), collate your responses, and reach your conclusions.

Select a smaller group from within your target market and use their answers to represent everyone. This option is sampling .

Sampling saves you time and money. The list of everyone in the population who could be selected is called the sampling frame; ideally, it covers the whole population being studied.

The sample you choose should represent your target market, or the sampling frame, well enough to do one of the following:

Generalize your findings across the sampling frame and use them as though you had surveyed everyone

Use the findings to decide on your next step, which might involve more in-depth sampling

How was sampling developed?

Valery Glivenko and Francesco Cantelli, two mathematicians studying probability theory in the early 1900s, helped lay the theoretical groundwork for sampling. The Glivenko-Cantelli theorem (1933) shows that, as a random sample grows, the distribution of values in the sample converges to the distribution in the whole population. In practical terms, a properly chosen sample of people can reflect the larger group’s status, opinions, and decisions.

This means you don't need to survey the entire target market, thereby saving the rest of us a lot of time and money.

Why is sampling important?

We’ve already touched on the fact that sampling saves you time and money. When you get reliable results quickly, you can act on them sooner. And the money you save can pay for something else.

It’s often easier to survey a sample than a whole population. Sample inferences can be more reliable than those you get from a very large group because you can choose your samples carefully and scientifically.

Sampling is also useful because it is often impossible to survey the entire population. You probably have no choice but to collect only a sample in the first place.

Because you’re working with fewer people, you can collect richer data, which makes your research more accurate. You can:

Ask more questions

Go into more detail

Seek opinions instead of just collecting facts

Observe user behaviors

Double-check your findings if you need to

In short, sampling works! Let's take a look at the most common sampling methods.

  • Types of sampling methods

There are two main sampling methods: probability sampling and non-probability sampling. These can be further refined, which we'll cover shortly. You can then decide which approach best suits your research project.

Probability sampling method

Probability sampling is used in quantitative research, so it provides data on the survey topic in terms of numbers. Quantitative research deals with numerical data, and probability sampling supports the statistical analysis of that data. Subjects are asked questions like:

How many boxes of candy do you buy at one time?

How often do you shop for candy?

How much would you pay for a box of candy?

This method is also called random sampling because everyone in the target market has an equal chance of being chosen for the survey. It is designed to reduce sampling error for the most important variables. You should, therefore, get results that fairly reflect the larger population.

Non-probability sampling method

In this method, not everyone has an equal chance of being part of the sample. It's usually easier (and cheaper) to select people for the sample group. You choose people who are more likely to be involved in or know more about the topic you’re researching.

Non-probability sampling is used for qualitative research. Qualitative data is generated by questions like:

Where do you usually shop for candy (supermarket, gas station, etc.?)

Which candy brand do you usually buy?

Why do you like that brand?

  • Probability sampling methods

Here are five ways of doing probability sampling:

Simple random sampling (basic probability sampling)

Systematic sampling

Stratified sampling

Cluster sampling

Multi-stage sampling

Simple random sampling

There are three basic steps to simple random sampling:

Choose your sampling frame.

Decide on your sample size. Make sure it is large enough to give you reliable data.

Randomly choose your sample participants.

You could put all their names in a hat, shake the hat to mix the names, and pull out however many names you want in your sample (without looking!)

You could be more scientific by giving each participant a number and then using a random number generator program to choose the numbers.
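The random-number approach can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the placeholder frame of 500 people, and the fixed seed are all assumptions made for the example.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct members from frame, each equally likely."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, sample_size)  # sampling without replacement


# Hypothetical sampling frame: 500 numbered participants.
frame = [f"person_{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), "participants selected")
```

Fixing the seed is optional; it simply lets you reproduce the same draw later if you need to document how the sample was chosen.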

Systematic sampling

Instead of choosing names or numbers, you decide beforehand on a selection method. For example, collect all the names in your sampling frame and start at, for example, the fifth person on the list, then choose every fourth name or every tenth name. Alternatively, you could choose everyone whose last name begins with randomly selected initials, such as A, G, or W.

Choose your system of selecting names, and away you go.

Stratified sampling

This is a more sophisticated way to choose your sample. You break the sampling frame down into important subgroups or strata. Then, decide how many you want in your sample, and choose an equal number (or a proportionate number) from each subgroup.

For example, you want to survey how many people in a geographic area buy candy, so you compile a list of everyone in that area. You then break that list down into, for example, males and females, then into pre-teens, teenagers, young adults, senior citizens, etc. who are male or female.

So, if there are 1,000 young male adults and 2,000 young female adults in the whole sampling frame, you may want to choose 100 males and 200 females to keep the proportions balanced. You then choose the individual survey participants through the systematic sampling method.
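The proportional split in this example can be worked out with a short Python sketch. The function name `proportional_allocation` and the stratum labels are assumptions for illustration; the stratum sizes and the total of 300 come from the example above.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Split total_sample across strata in proportion to each stratum's size."""
    population = sum(strata_sizes.values())
    return {
        name: round(total_sample * size / population)
        for name, size in strata_sizes.items()
    }


strata = {"young male adults": 1000, "young female adults": 2000}
allocation = proportional_allocation(strata, total_sample=300)
print(allocation)  # {'young male adults': 100, 'young female adults': 200}
```

With less tidy numbers, rounding can make the allocated counts add up to slightly more or less than the intended total, so it is worth checking the sum and adjusting one stratum by hand if needed.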

Cluster sampling

This method is used when you want to subdivide a sample into smaller groups or clusters that are geographically or organizationally related.

Let’s say you’re doing quantitative research into candy sales. You could choose your sample participants from urban, suburban, or rural populations. This would give you three geographic clusters from which to select your participants.

Multi-stage sampling

This is a more refined way of doing cluster sampling. Let’s say you have your urban cluster, which is your primary sampling unit. You can subdivide this into a secondary sampling unit, say, participants who typically buy their candy in supermarkets. You could then further subdivide this group into your ultimate sampling unit. Finally, you select the actual survey participants from this unit.

  • Uses of probability sampling

Probability sampling has three main advantages:

It helps minimize the likelihood of sampling bias. How you choose your sample determines the quality of your results. Probability sampling gives you an unbiased, randomly selected sample of your target market.

It allows you to create representative samples and subgroups within a sample out of a large or diverse target market.

It lets you use sophisticated statistical methods to select as close to perfect samples as possible.

  • Non-probability sampling methods

To recap, with non-probability sampling, you choose people for your sample in a non-random way, so not everyone in your sampling frame has an equal chance of being chosen. Your research findings, therefore, may not be as representative overall as probability sampling, but you may not want them to be.

Sampling bias is not a concern if all potential survey participants share similar traits. For example, you may want to specifically focus on young male adults who spend more than others on candy. In addition, it is usually a cheaper and quicker method because you don't have to work out a complex selection system that represents the entire population in that community.

Researchers should still weigh the strengths and limitations of each method before selecting a sampling technique.

Non-probability sampling is best for exploratory research , such as at the beginning of a research project.

There are five main types of non-probability sampling methods:

Convenience sampling

Purposive sampling

Voluntary response sampling

Snowball sampling

Quota sampling

Convenience sampling

The strategy of convenience sampling is to choose your sample quickly and efficiently, using the least effort, usually to save money.

Let's say you want to survey the opinions of 100 millennials about a particular topic. You could send out a questionnaire over the social media platforms millennials use. Ask respondents to confirm their birth year at the top of their response sheet and, when you have your 100 responses, begin your analysis. Or you could visit restaurants and bars where millennials spend their evenings and sign people up.

A drawback of convenience sampling is that it may not yield results that apply to a broader population.

Purposive sampling

This method relies on your judgment to choose the sample most likely to deliver the most useful results. You must know enough about the survey goals and the sampling frame to choose the most appropriate sample respondents.

Your knowledge and experience save you time because you know your ideal sample candidates, so you should get high-quality results.

Voluntary response sampling

This method is similar to convenience sampling, but it is based on potential sample members volunteering rather than you looking for people.

You make it known you want to do a survey on a particular topic for a particular reason and wait until enough people volunteer. Then you give them the questionnaire or arrange interviews to ask your questions directly.

Snowball sampling

Snowball sampling involves asking selected participants to refer others who may qualify for the survey. This method is best used when there is no sampling frame available. It is also useful when the researcher doesn’t know much about the target population.

Let's say you want to research a niche topic that involves people who may be difficult to locate. For our candy example, this could be young males who buy a lot of candy, go rock climbing during the day, and watch adventure movies at night. You ask each participant to name others they know who do the same things, so you can contact them. As you make contact with more people, your sample 'snowballs' until you have all the names you need.

Quota sampling

This sampling method involves collecting a specific number of units (quotas) from your predetermined subpopulations. Quota sampling is a way of ensuring that your sample accurately represents the sampling frame.

  • Uses of non-probability sampling

You can use non-probability sampling when you:

Want to do a quick test to see if a more detailed and sophisticated survey may be worthwhile

Want to explore an idea to see if it 'has legs'

Launch a pilot study

Do some initial qualitative research

Have little time or money available (half a loaf is better than no bread at all)

Want to see if the initial results will help you justify a longer, more detailed, and more expensive research project

  • The main types of sampling bias, and how to avoid them

Sampling bias can fog or limit your research results. This will have an impact when you generalize your results across the whole target market. The two main causes of sampling bias are faulty research design and poor data collection or recording. They can affect probability and non-probability sampling.

Faulty research

If a surveyor chooses participants inappropriately, the results will not reflect the population as a whole.

A famous example is the 1948 presidential race. A telephone survey was conducted to see which candidate had more support. The problem with the research design was that, in 1948, most people with telephones were wealthy, and their opinions were very different from voters as a whole. The research implied Dewey would win, but it was Truman who became president.

Poor data collection or recording

This problem speaks for itself. The survey may be well structured, the sample groups appropriate, the questions clear and easy to understand, and the cluster sizes appropriate. But if surveyors check the wrong boxes when they get an answer or if the entire subgroup results are lost, the survey results will be biased.

How do you minimize bias in sampling?

 To get results you can rely on, you must:

Know enough about your target market

Choose one or more sample surveys to cover the whole target market properly

Choose enough people in each sample so your results mirror your target market

Have content validity . This means the content of your questions must be direct and efficiently worded. If it isn’t, the viability of your survey could be questioned. That would also be a waste of time and money, so make the wording of your questions your top focus.

If using probability sampling, make sure your sampling frame includes everyone it should and that your random sampling selection process includes the right proportion of the subgroups

If using non-probability sampling, focus on fairness, equality, and completeness in identifying your samples and subgroups. Then balance those criteria against simple convenience or other relevant factors.

What are the five types of sampling bias?

Self-selection bias. If you mass-mail questionnaires to everyone in the sample, you’re more likely to get results from people with extrovert or activist personalities and not from introverts or pragmatists. So if your convenience sampling focuses on getting your quota responses quickly, it may be skewed.

Non-response bias. Unhappy customers, stressed-out employees, or other sub-groups may not want to cooperate or they may pull out early.

Undercoverage bias. If your survey is done, say, via email or social media platforms, it will miss people without internet access, such as those living in rural areas, the elderly, or lower-income groups.

Survivorship bias. Unsuccessful people are less likely to take part. Another example may be a researcher excluding results that don’t support the overall goal. If the CEO wants to tell the shareholders about a successful product or project at the AGM, some less positive survey results may go “missing” (to take an extreme example.) The result is that your data will reflect an overly optimistic representation of the truth.

Pre-screening bias. If the researcher, whose experience and knowledge are being used to pre-select respondents in a judgmental sampling, focuses more on convenience than judgment, the results may be compromised.

How do you minimize sampling bias?

Focus on the bullet points in the next section and:

Make survey questionnaires as direct, easy, short, and available as possible, so participants are more likely to complete them accurately and send them back

Follow up with the people who have been selected but have not returned their responses

Ignore any pressure that may produce bias

  • How do you decide on the type of sampling to use?

Use the ideas you've gleaned from this article to give yourself a platform, then choose the best method to meet your goals while staying within your time and cost limits.

If it isn't obvious which method you should choose, use this strategy:

Clarify your research goals

Clarify how accurate your research results must be to reach your goals

Evaluate your goals against time and budget

List the two or three most obvious sampling methods that will work for you

Confirm the availability of your resources (researchers, computer time, etc.)

Compare each of the possible methods with your goals, accuracy, precision, resource, time, and cost constraints

Make your decision

  • The takeaway

Effective market research is the basis of successful marketing, advertising, and future productivity. By selecting the most appropriate sampling methods, you will collect the most useful market data and make the most effective decisions.


Sampling methods, types & techniques.

Your comprehensive guide to the different sampling methods available to researchers – and how to know which is right for your research.

What is sampling?

In survey research, sampling is the process of using a subset of a population to represent the whole population. To help illustrate this further, let’s look at data sampling methods with examples below.

Let’s say you wanted to do some research on everyone in North America. To ask every person would be almost impossible. Even if everyone said “yes”, carrying out a survey across different states, in different languages and timezones, and then collecting and processing all the results , would take a long time and be very costly.

Sampling allows large-scale research to be carried out with a more realistic cost and time-frame because it uses a smaller number of individuals in the population with representative characteristics to stand in for the whole.

However, when you decide to sample, you take on a new task. You have to decide who is part of your sample list and how to choose the people who will best represent the whole population. How you go about that is what the practice of sampling is all about.


Sampling definitions

  • Population: The total number of people or things you are interested in
  • Sample: A smaller number within your population that will represent the whole
  • Sampling: The process and method of selecting your sample


Why is sampling important?

Although the idea of sampling is easiest to understand when you think about a very large population, it makes sense to use sampling methods in research studies of all types and sizes. After all, if you can reduce the effort and cost of doing a study, why wouldn’t you? And because sampling allows you to research larger target populations using the same resources as you would smaller ones, it dramatically opens up the possibilities for research.

Sampling is a little like having gears on a car or bicycle. Instead of always turning a set of wheels of a specific size and being constrained by their physical properties, it allows you to translate your effort to the wheels via the different gears, so you’re effectively choosing bigger or smaller wheels depending on the terrain you’re on and how much work you’re able to do.

Sampling allows you to “gear” your research so you’re less limited by the constraints of cost, time, and complexity that come with different population sizes.

It allows us to do things like carry out exit polls during elections, map the spread and effects of epidemics across geographical areas, and carry out nationwide research that provides a snapshot of society and culture.

Types of sampling

Sampling strategies in research vary widely across different disciplines and research areas, and from study to study.

There are two major types of sampling methods: probability and non-probability sampling.

  • Probability sampling , also known as random sampling , is a kind of sample selection where randomisation is used instead of deliberate choice. Each member of the population has a known, non-zero chance of being selected.
  • Non-probability sampling techniques are where the researcher deliberately picks items or individuals for the sample based on non-random factors such as convenience, geographic availability, or costs.

As we delve into these categories, it’s essential to understand the nuances and applications of each method to ensure that the chosen sampling strategy aligns with the research goals.

Probability sampling methods

There’s a wide range of probability sampling methods to explore and consider. Here are some of the best-known options.

1. Simple random sampling

With simple random sampling , every element in the population has an equal chance of being selected as part of the sample. It’s something like picking a name out of a hat. Simple random sampling can be done by anonymising the population – e.g. by assigning each item or person in the population a number and then picking numbers at random.

Pros: Simple random sampling is easy to do and cheap. Designed to ensure that every member of the population has an equal chance of being selected, it reduces the risk of bias compared to non-random sampling.

Cons: It offers no control for the researcher and may lead to unrepresentative groupings being picked by chance.


2. Systematic sampling

With systematic sampling the random selection only applies to the first item chosen. A rule then applies so that every nth item or person after that is picked.

Best practice is to sort your list in a random way to ensure that selections won’t be accidentally clustered together. This is commonly achieved using a random number generator. If that’s not available you might order your list alphabetically by first name and then pick every fifth name to eliminate bias, for example.

Next, you need to decide your sampling interval – for example, if your sample will be 10% of your full list, your sampling interval is one in 10 – and pick a random start between one and 10 – for example three. This means you would start with person number three on your list and pick every tenth person.
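The interval-and-start procedure described above can be sketched as follows. This is an illustrative sketch only: the function name `systematic_sample` and the numbered list of 100 people are assumptions, while the interval of 10 and the start of three mirror the example in the text.

```python
import random


def systematic_sample(items, interval, start=None, seed=None):
    """Pick every interval-th item, beginning at a 1-based start position."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if start is None:
        # Random start between 1 and the interval, as described above.
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    return items[start - 1::interval]


# Hypothetical list of 100 people, numbered 1 to 100.
people = list(range(1, 101))
picked = systematic_sample(people, interval=10, start=3)
print(picked)  # [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]
```

If the list has a hidden pattern that repeats with the same period as the interval, every pick will land on the same kind of entry, which is the risk noted in the cons for this method.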

Pros: Systematic sampling is efficient and straightforward, especially when dealing with populations that have a clear order. It ensures a uniform selection across the population.

Cons: There’s a potential risk of introducing bias if there’s an unrecognised pattern in the population that aligns with the sampling interval.

3. Stratified sampling

Stratified sampling involves random selection within predefined groups. It’s a useful method for researchers wanting to determine what aspects of a sample are highly correlated with what’s being measured. They can then decide how to subdivide (stratify) it in a way that makes sense for the research.

For example, you want to measure the height of students at a college where 80% of students are female and 20% are male. We know that gender is highly correlated with height, and if we took a simple random sample of 200 students (out of the 2,000 who attend the college), we could by chance get 200 females and not one male. This would bias our results and we would underestimate the height of students overall. Instead, we could stratify by gender and make sure that 20% of our sample (40 students) are male and 80% (160 students) are female.

Pros: Stratified sampling enhances the representation of all identified subgroups within a population, leading to more accurate results in heterogeneous populations.

Cons: This method requires accurate knowledge about the population’s stratification, and its design and execution can be more intricate than other methods.


4. Cluster sampling

With cluster sampling, groups rather than individual units of the target population are selected at random for the sample. These might be pre-existing groups, such as people in certain zip codes or students belonging to an academic year.

Cluster sampling can be done by selecting the entire cluster, or in the case of two-stage cluster sampling, by randomly selecting the cluster itself, then selecting at random again within the cluster.
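Two-stage cluster sampling can be sketched in Python as below. The function name `two_stage_cluster_sample`, the zip codes, and the resident labels are all hypothetical and chosen only to make the two random draws visible.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick members within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random clusters
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical clusters: residents grouped by zip code.
clusters = {
    "10001": [f"resident_a{i}" for i in range(20)],
    "10002": [f"resident_b{i}" for i in range(15)],
    "10003": [f"resident_c{i}" for i in range(30)],
    "10004": [f"resident_d{i}" for i in range(10)],
}
result = two_stage_cluster_sample(clusters, n_clusters=2, n_per_cluster=3, seed=7)
for zip_code, members in result.items():
    print(zip_code, members)
```

To select entire clusters instead (single-stage cluster sampling), you would keep every member of each chosen cluster rather than drawing again within it.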

Pros: Cluster sampling is economically beneficial and logistically easier when dealing with vast and geographically dispersed populations.

Cons: Due to potential similarities within clusters, this method can introduce a greater sampling error compared to other methods.

Non-probability sampling methods

The non-probability sampling methodology doesn’t offer the same bias-removal benefits as probability sampling, but there are times when these types of sampling are chosen for expediency or simplicity. Here are some forms of non-probability sampling and how they work.

1. Convenience sampling

People or elements in a sample are selected on the basis of their accessibility and availability. If you are doing a research survey and you work at a university, for example, a convenience sample might consist of students or co-workers who happen to be on campus with open schedules who are willing to take your questionnaire .

This kind of sample can have value, especially if it’s done as an early or preliminary step, but significant bias will be introduced.

Pros: Convenience sampling is the most straightforward method, requiring minimal planning, making it quick to implement.

Cons: Due to its non-random nature, the method is highly susceptible to biases, and the results are often lacking in their application to the real world.


2. Quota sampling

Like the probability-based stratified sampling method, this approach aims to achieve a spread across the target population by specifying who should be recruited for a survey according to certain groups or criteria.

For example, your quota might include a certain number of males and a certain number of females. Alternatively, you might want your samples to be at a specific income level or in certain age brackets or ethnic groups.

Pros: Quota sampling ensures certain subgroups are adequately represented, making it great for when random sampling isn’t feasible but representation is necessary.

Cons: The selection within each quota is non-random and researchers’ discretion can influence the representation, which both strongly increase the risk of bias.

3. Purposive sampling

Participants for the sample are chosen consciously by researchers based on their knowledge and understanding of the research question at hand or their goals.

Also known as judgment sampling, this technique is unlikely to result in a representative sample , but it is a quick and fairly easy way to get a range of results or responses.

Pros: Purposive sampling targets specific criteria or characteristics, making it ideal for studies that require specialised participants or specific conditions.

Cons: It’s highly subjective and based on researchers’ judgment, which can introduce biases and limit the study’s real-world application.

4. Snowball or referral sampling

With this approach, people recruited to be part of a sample are asked to invite those they know to take part, who are then asked to invite their friends and family and so on. The participation radiates through a community of connected individuals like a snowball rolling downhill.

Pros: Especially useful for hard-to-reach or secretive populations, snowball sampling is effective for certain niche studies.

Cons: The method can introduce bias due to the reliance on participant referrals, and the choice of initial seeds can significantly influence the final sample.


What type of sampling should I use?

Choosing the right sampling method is a pivotal aspect of any research process, but it can be a stumbling block for many.

Here’s a structured approach to guide your decision.

1) Define your research goals

If you aim to get a general sense of a larger group, simple random or stratified sampling could be your best bet. For focused insights or studying unique communities, snowball or purposive sampling might be more suitable.

2) Assess the nature of your population

The nature of the group you’re studying can guide your method. For a diverse group with different categories, stratified sampling can ensure all segments are covered. If they’re widely spread geographically , cluster sampling becomes useful. If they’re arranged in a certain sequence or order, systematic sampling might be effective.

3) Consider your constraints

Your available time, budget and ease of accessing participants matter. Convenience or quota sampling can be practical for quicker studies, but they come with some trade-offs. If reaching everyone in your desired group is challenging, snowball or purposive sampling can be more feasible.

4) Determine the reach of your findings

Decide if you want your findings to represent a much broader group. For a wider representation, methods that include everyone fairly (like probability sampling ) are a good option. For specialised insights into specific groups, non-probability sampling methods can be more suitable.

5) Get feedback

Before fully committing, discuss your chosen method with others in your field and consider a test run.

Avoid or reduce sampling errors and bias

Using a sample is a kind of short-cut. If you could ask every single person in a population to take part in your study and have each of them reply, you’d have a highly accurate (and very labor-intensive) project on your hands.

But since that’s not realistic, sampling offers a “good-enough” solution that sacrifices some accuracy for the sake of practicality and ease. How much accuracy you lose out on depends on how well you control for sampling error, non-sampling error, and bias in your survey design . Our blog post helps you to steer clear of some of these issues.

How to choose the correct sample size

Finding the best sample size for your target population is something you’ll need to do again and again, as it’s different for every study.

To make life easier, we’ve provided a sample size calculator . To use it, you need to know your:

  • Population size
  • Confidence level
  • Margin of error (confidence interval)

If any of those terms are unfamiliar, have a look at our blog post on determining sample size for details of what they mean and how to find them.
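To show how those three inputs combine, here is a rough sketch using Cochran's formula with a finite population correction, a widely used textbook approach. It is an assumption that this matches any particular calculator; the function name `required_sample_size` and the default proportion of 0.5 (the most conservative choice) are also assumptions for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Estimate the sample size needed to estimate a proportion."""
    # z-score for a two-sided interval at the given confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a very large population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Finite population correction shrinks n0 for smaller populations
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(required_sample_size(10_000))               # 95% confidence, 5% margin
print(required_sample_size(10_000, margin=0.03))  # tighter margin, larger sample
```

Tightening the margin of error or raising the confidence level increases the required sample size, while the population size matters less and less once the population is large.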

Indian J Dermatol, v.61(5); Sep-Oct 2016

Methodology Series Module 5: Sampling Strategies

Maninder Singh Setia

Epidemiologist, MGM Institute of Health Sciences, Navi Mumbai, Maharashtra, India

Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the ‘Sampling Method’. There are essentially two types of sampling methods: 1) probability sampling – based on chance events (such as random numbers, flipping a coin etc.); and 2) non-probability sampling – based on the researcher's choice and on the population that is accessible and available. Some of the non-probability sampling methods are: purposive sampling, convenience sampling, or quota sampling. Random sampling (such as a simple random sample or a stratified random sample) is a form of probability sampling. It is important to understand the different sampling methods used in clinical studies and to mention the method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (such as using the term ‘random sample’ when the researcher has used a convenience sample). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the ‘generalizability’ of these results. In such a scenario, the researcher may want to use ‘purposive sampling’ for the study.

Introduction

The purpose of this section is to discuss various sampling methods used in research. After finalizing the research question and the research design, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the “Sampling Method” [ Figure 1 ].

Figure 1: Flowchart from “Universe” to “Sampling Method”

Why do we need to sample?

Let us answer this research question: What is the prevalence of HIV in the adult Indian population?

The best response to this question will be obtained when we test every adult Indian for HIV. However, this is logistically difficult, time consuming, expensive, and difficult for a single researcher – and do not forget the ethics of conducting such a study. The government usually conducts an exercise regularly to measure certain outcomes in the whole population – “the census.” However, as researchers, we often have limited time and resources. Hence, we will have to select a few adult Indians who will consent to be a part of the study. We will test them for HIV and present our results (as our estimates of HIV prevalence). These selected individuals are called our “sample.” We hope that we have selected the appropriate sample that is required to answer our research question.

The researcher should clearly and explicitly mention the sampling method in the manuscript. This description helps the reviewers and readers assess the validity and generalizability of the results. Furthermore, the authors should also acknowledge the limitations of their sampling method and its effects on the estimates obtained in the study.

Types of Methods

We will try to understand some of the sampling methods that are commonly used in clinical research. There are essentially two types of sampling methods: (1) Probability sampling – based on chance events (such as random numbers, flipping a coin, etc.) and (2) nonprobability sampling – based on the researcher's choice and on the population that is accessible and available.

What is a “convenience sample?”

Research question: How many patients with psoriasis also have high cholesterol levels (according to our definition)?

We plan to conduct the study in the outpatient department of our hospital.

This is a common scenario for clinical studies. The researcher recruits the participants who are easily accessible in a clinical setting – this type of sample is called a “convenience sample.” Furthermore, in such a clinic-based setting, the researcher will approach all the psoriasis patients that he/she comes across. They are informed about the study, and all those who consent to be in the study are evaluated for eligibility. If they meet the inclusion criteria (and need not be excluded as per the criteria), they are recruited for the study. Thus, this will be a “consecutive consenting sample.”

This method is relatively easy and is one of the common types of sampling methods used (particularly in postgraduate dissertations).

Since this is a clinic-based sample, the estimates from such a study may not necessarily be generalizable to the larger population. To begin with, the patients who access healthcare potentially have a different “health-seeking behavior” compared with those who do not access healthcare in these settings. Furthermore, many of the clinical cases in tertiary care centers may be severe, complicated, or recalcitrant. Thus, the estimates of biological parameters or outcomes may be different in these patients compared with the general population. The researcher should clearly discuss in the manuscript/report how the convenience sample may have biased the estimates (for example, overestimated or underestimated the outcome in the population studied).

What is a “random sample?”

A “random sample” is a probability sample where every individual has an equal and independent probability of being selected in the sample.

Please note that “random sample” does not mean arbitrary sample. For example, if the researcher selects 10–12 individuals from the waiting area (without any structure), it is not a random sample. Randomization is a specific process, and only a sample that is recruited using this process is a “random sample.”

What is a “simple random sample?”

Let us recruit a “simple random sample” in the above example. The center only allows a fixed number of patients every day. All the patients have to confirm the appointment a day in advance and should be present in the clinic between 9 and 9:30 a.m. for the appointment. Thus, by 9:30 a.m., you will have all the individuals who will be examined that day.

We wish to select 50% of these patients for a posttreatment survey.

  • Make a list of all the patients present at 9:30 a.m.
  • Give a number to each individual
  • Use a “randomization method” to select five of these numbers. Although “random tables” have been used as a method of randomization, currently, many researchers use “computer-generated lists for random selection” of participants. Most of the statistical packages have programs for random selection of population. Please state the method that you have used for random selection in the manuscript
  • Recruit the individuals whose numbers have been selected by the randomization method.
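The steps above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the article: the list of ten patients, the function name `simple_random_sample`, and the fixed seed are assumptions made for the example. The seed is there only so that the selection can be reproduced and reported.

```python
import random

def simple_random_sample(patients, fraction=0.5, seed=None):
    """Number the patients 1..N and draw a simple random sample without replacement."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    n_select = round(len(patients) * fraction)
    # Step 1-2: list everyone present and give each individual a number.
    numbered = dict(enumerate(patients, start=1))
    # Step 3: use a computer-generated random selection of the numbers.
    chosen_numbers = sorted(rng.sample(sorted(numbered), n_select))
    # Step 4: recruit the individuals whose numbers were selected.
    return [(number, numbered[number]) for number in chosen_numbers]

# Hypothetical list of the ten patients present at 9:30 a.m.
patients = [f"patient_{i}" for i in range(1, 11)]
sample = simple_random_sample(patients, fraction=0.5, seed=42)
print(sample)
```

Every patient has the same chance of being drawn, which is also why a draw consisting only of males remains possible.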

The process is described in Figure 2 .

Figure 2: Representation of Simple Random Sample

What is a major issue with this recruitment process?

As you may notice, “only males” have been recruited for the study. This scenario is possible in a simple random sample selection.

This is a limitation of this type of sampling method – population units which are smaller in number in the sampling frame may be underrepresented in this sample.

What is “stratified sample?”

In a stratified sample, the population is divided into two or more similar groups (based on demographic or clinical characteristics). The sample is recruited from each stratum. The researcher may use a simple random sample procedure within each stratum.

Let us address the limitation in the above example (selection of 50% of the participants for postprocedure survey).

  • Divide the list into two strata: Males and females
  • Use a “randomization method” to select three numbers among males and two numbers among females. As discussed earlier, the researcher may use random tables or computer generated random selection. Please state the method that you have used for random selection in the manuscript
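A minimal sketch of this stratified selection in Python follows. The six males and four females are inferred from the example (three and two selected at 50% per stratum); the function name `stratified_sample`, the tuple layout, and the seed are assumptions made for illustration.

```python
import random

def stratified_sample(units, stratum_of, fraction=0.5, seed=None):
    """Split units into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, members in strata.items():
        # Note: round() rounds halves to the nearest even integer;
        # choose an explicit rule if stratum sizes are odd.
        n_select = round(len(members) * fraction)
        sample[name] = rng.sample(members, n_select)
    return sample

# Hypothetical list: six males and four females, numbered 1 to 10.
patients = [("M", i) for i in range(1, 7)] + [("F", i) for i in range(7, 11)]
sample = stratified_sample(patients, stratum_of=lambda p: p[0], fraction=0.5, seed=1)
print(sample)
```

To oversample one stratum, the single `fraction` could be replaced by a per-stratum value; that choice should then be reported in the methods.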

The process is described in Figure 3 .

Figure 3: Representation of Stratified Random Sample

Thus, with this sampling method, we ensure that people from both sexes are included in the sample. This type of sampling method is used for sampling when we want to ensure that minority populations (in number) are adequately represented in the sample.

Kindly note that in this example, we sampled 50% of the population in each stratum. However, the researcher may oversample in one particular stratum and under-sample in the other. For instance, in this example, we may have taken three females and three males (if we want to ensure equal representation of both). All this should be discussed explicitly in the methods.

What is a “systematic sample?”

Sometimes, the researcher may decide to include study participants using a fixed pattern. For example, the researcher may recruit every second patient, or every patient whose registration number ends with an even number, or those who are admitted on certain days of the week (Tuesday/Thursday/Saturday). This type of sample is generally easy to implement. However, much of the recruitment depends on the researcher's choices and may lead to selection bias. Furthermore, patients who come to the hospital may differ on different days of the week. For example, a higher proportion of working individuals may access the hospital on Saturdays.

This is not a “random sample.” Please do not write that “we selected the participants using a random sample method” if you have selected the sample systematically.

Another type of sampling discussed by some authors is “systematic random sample.” The steps for this method are:

  • Make a list of all the potential recruits
  • Use a random method (described earlier) to select a starting point (for example, number 4)
  • Select this number and every fifth number from this starting point. Thus, the researcher will select number 9, 14, and so on.

Please note that the “skip” depends on the total number of potential participants and the total sample size. For instance, if you have a total of fifty potential participants and you wish to recruit ten participants, do not skip to every 10th patient.

Aday (1996) states that the skip depends on the total number of participants and the total sample size required.

  • Fraction = total number of participants/total sample size
  • In the above example, it will be 50/10 = 5
  • Thus, using a random table or computer-generated random number selection, the researcher will select a random number from 1 to 5
  • Suppose the number selected is two
  • The researcher selects the second patient
  • The next patient will be the fifth patient after patient number two – patient number 7
  • The next patient will be patient number 12 and so on.
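The calculation above can be written as a short Python sketch. The function name `systematic_sample` and its parameters are assumptions for illustration. It uses integer division for the skip, so it fits best when the total is a multiple of the sample size, as in the 50/10 example.

```python
import random

def systematic_sample(n_total, n_sample, start=None, seed=None):
    """Return the list positions (1-based) chosen by a systematic random sample."""
    if not 0 < n_sample <= n_total:
        raise ValueError("sample size must be between 1 and the total number of participants")
    skip = n_total // n_sample  # fraction = total participants / sample size
    if start is None:
        # Random starting point between 1 and the skip, inclusive.
        start = random.Random(seed).randint(1, skip)
    if not 1 <= start <= skip:
        raise ValueError("start must lie between 1 and the skip")
    return [start + k * skip for k in range(n_sample)]

# The worked example: 50 potential participants, 10 to recruit, random start of 2.
positions = systematic_sample(50, 10, start=2)
print(positions)  # [2, 7, 12, 17, 22, 27, 32, 37, 42, 47]
```

Only the starting point is random; every later selection is fixed by the skip.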

What is a “cluster sample?”

For some studies, the sample is selected from larger units or “clusters.” This type of method is generally used for “community-based studies.”

Research question: What is the prevalence of dermatological conditions in school children in city XXXXX?

In this study, we will select students from multiple schools. Thus, each school becomes one cluster. Each individual child in the school has much in common with other children in the same school compared with children from other schools (for example, they are more likely to have the same socioeconomic background). Thus, these children are recruited from the same cluster.

If the researcher uses “cluster sample,” he/she also performs “cluster analysis.” The statistical methods for these are different compared with nonclustered analysis (the methods we use commonly).

What is a “multistage sample?”

In many studies, we have to combine multiple methods for the appropriate and required sample.

Let us use a multistage sample to answer this research question.

Research question: What is the prevalence of dermatological conditions in school children in city XXXXX? (Assumption: The city is divided into four zones).

We have a list of all the schools in the city. How do we sample them?

Method 1: Select 10% of the schools using “simple random sample” method.

Question: What is the problem with this type of method?

Answer: As discussed earlier, it is possible that we may miss most of the schools from one particular zone.

However, we are interested to ensure that all zones are adequately represented in the sample.

  • Stage 1: List all the schools in all zones
  • Stage 2: Select 10% of schools from each zone using “random selection method” (first stratum)
  • Stage 3: List all the students in Grades VIII, IX, and X (population of interest) in each school (second stratum)
  • Stage 4: Create a separate list for males and females in each grade in each school (third stratum)
  • Stage 5: Select 10% of males and females in each grade in each school.
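The five stages can be sketched in Python as below. Everything about the data is invented for the example: the zone names, ten schools per zone, and twenty males and twenty females per grade are assumptions, as are the function names. The sketch also takes at least one unit from every non-empty group, which is a design choice and not something the stages above prescribe.

```python
import random

def sample_fraction(items, fraction, rng):
    """Randomly select a fraction of items (at least one if the list is non-empty)."""
    if not items:
        return []
    return rng.sample(items, max(1, round(len(items) * fraction)))

def multistage_sample(schools_by_zone, students_by_school, fraction=0.10, seed=None):
    rng = random.Random(seed)
    selected = []
    # Stages 1-2: within each zone, randomly select a fraction of the schools.
    for zone, schools in schools_by_zone.items():
        for school in sample_fraction(schools, fraction, rng):
            # Stages 3-4: list students separately by grade and sex in each school.
            groups = {}
            for student in students_by_school[school]:
                groups.setdefault((student["grade"], student["sex"]), []).append(student)
            # Stage 5: randomly select a fraction within each grade-by-sex list.
            for key in sorted(groups):
                selected.extend(sample_fraction(groups[key], fraction, rng))
    return selected

# Hypothetical city: four zones, ten schools per zone,
# twenty males and twenty females in each of Grades VIII, IX, and X.
zones = ["North", "South", "East", "West"]
schools_by_zone = {z: [f"{z}-school-{i}" for i in range(1, 11)] for z in zones}
students_by_school = {
    school: [
        {"zone": zone, "school": school, "grade": g, "sex": s, "id": f"{school}-{g}-{s}-{i}"}
        for g in ("VIII", "IX", "X") for s in ("M", "F") for i in range(1, 21)
    ]
    for zone, schools in schools_by_zone.items() for school in schools
}
sample = multistage_sample(schools_by_zone, students_by_school, seed=7)
print(len(sample))
```

With these toy numbers, each zone contributes one school, and each school contributes two students per grade and sex.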

Please note that this is just an example. You may have to change the proportion selected from each stratum based on the sample size and the total number of individuals in each stratum.

What are other types of sampling methods?

Although these are the common types of sampling methods that we use in clinical studies, we have also listed some other sampling methods in Table 1 .

Table 1: Some other types of sampling methods

  • It is important to understand the different sampling methods used in clinical studies. As stated earlier, please mention this method clearly in the manuscript
  • Do not misrepresent the sampling method. For example, if you have not used “random method” for selection, do not state it in the manuscript
  • Sometimes, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the “generalizability” of these results. In such a scenario, the researcher may want to use ‘purposive sampling’.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.

Chapter 5. Sampling

Introduction.

Most Americans will experience unemployment at some point in their lives. Sarah Damaske ( 2021 ) was interested in learning about how men and women experience unemployment differently. To answer this question, she interviewed unemployed people. After conducting a “pilot study” with twenty interviewees, she realized she was also interested in finding out how working-class and middle-class persons experienced unemployment differently. She found one hundred persons through local unemployment offices. She purposefully selected a roughly equal number of men and women and working-class and middle-class persons for the study. This would allow her to make the kinds of comparisons she was interested in. She further refined her selection of persons to interview:

I decided that I needed to be able to focus my attention on gender and class; therefore, I interviewed only people born between 1962 and 1987 (ages 28–52, the prime working and child-rearing years), those who worked full-time before their job loss, those who experienced an involuntary job loss during the past year, and those who did not lose a job for cause (e.g., were not fired because of their behavior at work). ( 244 )

The people she ultimately interviewed compose her sample. They represent (“sample”) the larger population of the involuntarily unemployed. This “theoretically informed stratified sampling design” allowed Damaske “to achieve relatively equal distribution of participation across gender and class,” but it came with some limitations. For one, the unemployment centers were located in primarily White areas of the country, so there were very few persons of color interviewed. Qualitative researchers must make these kinds of decisions all the time—who to include and who not to include. There is never an absolutely correct decision, as the choice is linked to the particular research question posed by the particular researcher, although some sampling choices are more compelling than others. In this case, Damaske made the choice to foreground both gender and class rather than compare all middle-class men and women or women of color from different class positions or just talk to White men. She leaves the door open for other researchers to sample differently. Because science is a collective enterprise, it is most likely someone will be inspired to conduct a similar study as Damaske’s but with an entirely different sample.

This chapter is all about sampling. After you have developed a research question and have a general idea of how you will collect data (observations or interviews), how do you go about actually finding people and sites to study? Although there is no “correct number” of people to interview, the sample should follow the research question and research design. You might remember studying sampling in a quantitative research course. Sampling is important here too, but it works a bit differently. Unlike quantitative research, qualitative research involves nonprobability sampling. This chapter explains why this is so and what qualities instead make a good sample for qualitative research.

Quick Terms Refresher

  • The population is the entire group that you want to draw conclusions about.
  • The sample is the specific group of individuals that you will collect data from.
  • Sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).
  • Sample size is how many individuals (or units) are included in your sample.

The “Who” of Your Research Study

After you have turned your general research interest into an actual research question and identified an approach you want to take to answer that question, you will need to specify the people you will be interviewing or observing. In most qualitative research, the objects of your study will indeed be people. In some cases, however, your objects might be content left by people (e.g., diaries, yearbooks, photographs) or documents (official or unofficial) or even institutions (e.g., schools, medical centers) and locations (e.g., nation-states, cities). Chances are, whatever “people, places, or things” are the objects of your study, you will not really be able to talk to, observe, or follow every single individual/object of the entire population of interest. You will need to create a sample of the population . Sampling in qualitative research has different purposes and goals than sampling in quantitative research. Sampling in both allows you to say something of interest about a population without having to include the entire population in your sample.

We begin this chapter with the case of a population of interest composed of actual people. After we have a better understanding of populations and samples that involve real people, we’ll discuss sampling in other types of qualitative research, such as archival research, content analysis, and case studies. We’ll then move to a larger discussion about the difference between sampling in qualitative research generally versus quantitative research, then we’ll move on to the idea of “theoretical” generalizability, and finally, we’ll conclude with some practical tips on the correct “number” to include in one’s sample.

Sampling People

To help think through samples, let’s imagine we want to know more about “vaccine hesitancy.” We’ve all lived through 2020 and 2021, and we know that a sizable number of people in the United States (and elsewhere) were slow to accept vaccines, even when these were freely available. By some accounts, about one-third of Americans initially refused vaccination. Why is this so? Well, as I write this in the summer of 2021, we know that some people actively refused the vaccination, thinking it was harmful or part of a government plot. Others were simply lazy or dismissed the necessity. And still others were worried about harmful side effects. The general population of interest here (all adult Americans who were not vaccinated by August 2021) may be as many as eighty million people. We clearly cannot talk to all of them. So we will have to narrow the number to something manageable. How can we do this?

First, we have to think about our actual research question and the form of research we are conducting. I am going to begin with a quantitative research question. Quantitative research questions tend to be simpler to visualize, at least when we are first starting out doing social science research. So let us say we want to know what percentage of each kind of resistance is out there and how race or class or gender affects vaccine hesitancy. Again, we don’t have the ability to talk to everyone. But harnessing what we know about normal probability distributions (see quantitative methods for more on this), we can find this out through a sample that represents the general population. We can’t really address these particular questions if we only talk to White women who go to college with us. And if you are really trying to generalize the specific findings of your sample to the larger population, you will have to employ probability sampling , a sampling technique where a researcher sets a selection of a few criteria and chooses members of a population randomly. Why randomly? If truly random, all the members have an equal opportunity to be a part of the sample, and thus we avoid the problem of having only our friends and neighbors (who may be very different from other people in the population) in the study. Mathematically, there is going to be a certain number that will be large enough to allow us to generalize our particular findings from our sample population to the population at large. It might surprise you how small that number can be. Election polls of no more than one thousand people are routinely used to predict actual election outcomes of millions of people. Below that number, however, you will not be able to make generalizations. Talking to five people at random is simply not enough people to predict a presidential election.
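To get a feel for why a thousand respondents can be enough while five cannot, here is a rough sketch of the usual margin-of-error calculation for a proportion. It assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the worst case of a 50/50 split. The function name and defaults are mine, not something a polling firm would necessarily use.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (5, 100, 1000):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

Under these assumptions, a sample of one thousand gives a margin of about 3 percentage points, while a sample of five gives a margin of more than 40 points, which tells us almost nothing.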

In order to answer quantitative research questions of causality, one must employ probability sampling. Quantitative researchers try to generalize their findings to a larger population. Samples are designed with that in mind. Qualitative researchers ask very different questions, though. Qualitative research questions are not about “how many” of a certain group do X (in this case, what percentage of the unvaccinated hesitate for concern about safety rather than reject vaccination on political grounds). Qualitative research employs nonprobability sampling . By definition, not everyone has an equal opportunity to be included in the sample. The researcher might select White women they go to college with to provide insight into racial and gender dynamics at play. Whatever is found by doing so will not be generalizable to everyone who has not been vaccinated, or even all White women who have not been vaccinated, or even all White women who have not been vaccinated who are in this particular college. That is not the point of qualitative research at all. This is a really important distinction, so I will repeat in bold: Qualitative researchers are not trying to statistically generalize specific findings to a larger population . They have not failed when their sample cannot be generalized, as that is not the point at all.

In the previous paragraph, I said it would be perfectly acceptable for a qualitative researcher to interview five White women with whom she goes to college about their vaccine hesitancy “to provide insight into racial and gender dynamics at play.” The key word here is “insight.” Rather than use a sample as a stand-in for the general population, as quantitative researchers do, the qualitative researcher uses the sample to gain insight into a process or phenomenon. The qualitative researcher is not going to be content with simply asking each of the women to state her reason for not being vaccinated and then draw conclusions that, because one in five of these women were concerned about their health, one in five of all people were also concerned about their health. That would be, frankly, a very poor study indeed. Rather, the qualitative researcher might sit down with each of the women and conduct a lengthy interview about what the vaccine means to her, why she is hesitant, how she manages her hesitancy (how she explains it to her friends), what she thinks about others who are unvaccinated, what she thinks of those who have been vaccinated, and what she knows or thinks she knows about COVID-19. The researcher might include specific interview questions about the college context, about their status as White women, about the political beliefs they hold about racism in the US, and about how their own political affiliations may or may not provide narrative scripts about “protective whiteness.” There are many interesting things to ask and learn about and many things to discover. Where a quantitative researcher begins with clear parameters to set their population and guide their sample selection process, the qualitative researcher is discovering new parameters, making it impossible to engage in probability sampling.

Looking at it this way, sampling for qualitative researchers needs to be more strategic. More theoretically informed. What persons can be interviewed or observed that would provide maximum insight into what is still unknown? In other words, qualitative researchers think through what cases they could learn the most from, and those are the cases selected to study: “What would be ‘bias’ in statistical sampling, and therefore a weakness, becomes intended focus in qualitative sampling, and therefore a strength. The logic and power of purposeful sampling lie in selecting information-rich cases for study in depth. Information-rich cases are those from which one can learn a great deal about issues of central importance to the purpose of the inquiry, thus the term purposeful sampling” ( Patton 2002:230 ; emphases in the original).

Before selecting your sample, though, it is important to clearly identify the general population of interest. You need to know this before you can determine the sample. In our example case, it is “adult Americans who have not yet been vaccinated.” Depending on the specific qualitative research question, however, it might be “adult Americans who have not been vaccinated for political reasons” or even “college students who have not been vaccinated.” What insights are you seeking? Do you want to know how politics is affecting vaccination? Or do you want to understand how people manage being an outlier in a particular setting (unvaccinated where vaccinations are heavily encouraged if not required)? More clearly stated, your population should align with your research question . Think back to the opening story about Damaske’s work studying the unemployed. She drew her sample narrowly to address the particular questions she was interested in pursuing. Knowing your questions or, at a minimum, why you are interested in the topic will allow you to draw the best sample possible to achieve insight.

Once you have your population in mind, how do you go about getting people to agree to be in your sample? In qualitative research, it is permissible to find people by convenience. Just ask for people who fit your sample criteria and see who shows up. Or reach out to friends and colleagues and see if they know anyone that fits. Don’t let the name convenience sampling mislead you; this is not exactly “easy,” and it is certainly a valid form of sampling in qualitative research. The more unknowns you have about what you will find, the more convenience sampling makes sense. If you don’t know how race or class or political affiliation might matter, and your population is unvaccinated college students, you can construct a sample of college students by placing an advertisement in the student paper or posting a flyer on a notice board. Whoever answers is your sample. That is what is meant by a convenience sample. A common variation of convenience sampling is snowball sampling . This is particularly useful if your target population is hard to find. Let’s say you posted a flyer about your study and only two college students responded. You could then ask those two students for referrals. They tell their friends, and those friends tell other friends, and, like a snowball, your sample gets bigger and bigger.

Researcher Note

Gaining Access: When Your Friend Is Your Research Subject

My early experience with qualitative research was rather unique. At that time, I needed to do a project that required me to interview first-generation college students, and my friends, with whom I had been sharing a dorm for two years, just perfectly fell into the sample category. Thus, I just asked them and easily “gained my access” to the research subject; I know them, we are friends, and I am part of them. I am an insider. I also thought, “Well, since I am part of the group, I can easily understand their language and norms, I can capture their honesty, read their nonverbal cues well, will get more information, as they will be more open to me because they trust me.” All in all, easy access with rich information. But, gosh, I did not realize that my status as an insider came with a price! When structuring the interview questions, I began to realize that rather than focusing on the unique experiences of my friends, I mostly based the questions on my own experiences, assuming we have similar if not the same experiences. I began to struggle with my objectivity and even questioned my role; am I doing this as part of the group or as a researcher? I came to know later that my status as an insider or my “positionality” may impact my research. It not only shapes the process of data collection but might heavily influence my interpretation of the data. I came to realize that although my inside status came with a lot of benefits (especially for access), it could also bring some drawbacks.

—Dede Setiono, PhD student focusing on international development and environmental policy, Oregon State University

The more you know about what you might find, the more strategic you can be. If you wanted to compare how politically conservative and politically liberal college students explained their vaccine hesitancy, for example, you might construct a sample purposively, finding an equal number of both types of students so that you can make those comparisons in your analysis. This is what Damaske ( 2021 ) did. You could still use convenience or snowball sampling as a way of recruitment. Post a flyer at the conservative student club and then ask for referrals from the one student that agrees to be interviewed. As with convenience sampling, there are variations of purposive sampling as well as other names used (e.g., judgment, quota, stratified, criterion, theoretical). Try not to get bogged down in the nomenclature; instead, focus on identifying the general population that matches your research question and then using a sampling method that is most likely to provide insight, given the types of questions you have.

There are all kinds of ways of being strategic with sampling in qualitative research. Here are a few of my favorite techniques for maximizing insight:

  • Consider using “extreme” or “deviant” cases. Maybe your college houses a prominent anti-vaxxer who has written about and demonstrated against the college’s policy on vaccines. You could learn a lot from that single case (depending on your research question, of course).
  • Consider “intensity”: people and cases and circumstances where your questions are more likely to feature prominently (but not extremely or deviantly). For example, you could compare those who volunteer at local Republican and Democratic election headquarters during an election season in a study on why party matters. Those who volunteer are more likely to have something to say than those who are more apathetic.
  • Maximize variation, as with the case of “politically liberal” versus “politically conservative,” or include an array of social locations (young vs. old; Northwest vs. Southeast region). This kind of heterogeneity sampling can capture and describe the central themes that cut across the variations: any common patterns that emerge, even in this wildly mismatched sample, are probably important to note!
  • Rather than maximize the variation, you could select a small homogenous sample to describe some particular subgroup in depth. Focus groups are often the best form of data collection for homogeneity sampling.
  • Think about which cases are “critical” or politically important—ones that “if it happens here, it would happen anywhere” or a case that is politically sensitive, as with the single “blue” (Democratic) county in a “red” (Republican) state. In both, you are choosing a site that would yield the most information and have the greatest impact on the development of knowledge.
  • On the other hand, sometimes you want to select the “typical”: the typical college student, for example. You are not trying to generalize from the typical but to illustrate aspects that may be typical of this case or group. When selecting for typicality, be clear with yourself about why the typical matches your research questions (and who might be excluded or marginalized in doing so).
  • Finally, it is often a good idea to look for disconfirming cases : if you are at the stage where you have a hypothesis (of sorts), you might select those who do not fit your hypothesis—you will surely learn something important there. They may be “exceptions that prove the rule” or exceptions that force you to alter your findings in order to make sense of these additional cases.

In addition to all these sampling variations, there is the theoretical approach taken by grounded theorists in which the researcher samples comparative people (or events) on the basis of their potential to represent important theoretical constructs. The sample, one can say, is by definition representative of the phenomenon of interest. It accompanies the constant comparative method of analysis. In the words of the founders of Grounded Theory, “Theoretical sampling is sampling on the basis of the emerging concepts, with the aim being to explore the dimensional range or varied conditions along which the properties of the concepts vary” ( Strauss and Corbin 1998:73 ).

When Your Population is Not Composed of People

I think it is easiest for most people to think of populations and samples in terms of people, but sometimes our units of analysis are not actually people. They could be places or institutions. Even so, you might still want to talk to people or observe the actions of people to understand those places or institutions. Or not! In the case of content analyses (see chapter 17), you won’t even have people involved at all but rather documents or films or photographs or news clippings. Everything we have covered about sampling applies to other units of analysis too. Let’s work through some examples.

Case Studies

When constructing a case study, it is helpful to think of your cases as sample populations in the same way that we considered people above. If, for example, you are comparing campus climates for diversity, your overall population may be “four-year college campuses in the US,” and from there you might decide to study three college campuses as your sample. Which three? Will you use purposeful sampling (perhaps [1] selecting three colleges in Oregon that are different sizes or [2] selecting three colleges across the US located in different political cultures or [3] varying the three colleges by racial makeup of the student body)? Or will you select three colleges at random, out of convenience? There are justifiable reasons for all approaches.

As with people, there are different ways of maximizing insight in your sample selection. Think about the following rationales: typical, diverse, extreme, deviant, influential, crucial, or even embodying a particular “pathway” ( Gerring 2008 ). When choosing a case or particular research site, Rubin ( 2021 ) suggests you bear in mind, first, what you are leaving out by selecting this particular case/site; second, what you might be overemphasizing by studying this case/site and not another; and, finally, whether you truly need to worry about either of those things—“that is, what are the sources of bias and how bad are they for what you are trying to do?” ( 89 ).

Once you have selected your cases, you may still want to include interviews with specific people or observations at particular sites within those cases. Then you go through possible sampling approaches all over again to determine which people will be contacted.

Content: Documents, Narrative Accounts, And So On

Although not often discussed as sampling, your selection of documents and other units to use in various content/historical analyses is subject to similar considerations. When you are asking quantitative-type questions (percentages and proportionalities of a general population), you will want to follow probabilistic sampling. For example, I created a random sample of accounts posted on the website studentloanjustice.org to delineate the types of problems people were having with student debt ( Hurst 2007 ). Even though my data was qualitative (narratives of student debt), I was actually asking a quantitative-type research question, so it was important that my sample was representative of the larger population (debtors who posted on the website). On the other hand, when you are asking qualitative-type questions, the selection process should be very different. In that case, use nonprobabilistic techniques, either convenience (where you are really new to this data and do not have the ability to set comparative criteria or even know what a deviant case would be) or some variant of purposive sampling. Let’s say you were interested in the visual representation of women in media published in the 1950s. You could select a national magazine like Time for a “typical” representation (and for its convenience, as all issues are freely available on the web and easy to search). Or you could compare one magazine known for its feminist content versus one antifeminist. The point is, sample selection is important even when you are not interviewing or observing people.

Goals of Qualitative Sampling versus Goals of Quantitative Sampling

We have already discussed some of the differences in the goals of quantitative and qualitative sampling above, but it is worth further discussion. The quantitative researcher seeks a sample that is representative of the population of interest so that they may properly generalize the results (e.g., if 80 percent of first-gen students in the sample were concerned with costs of college, then we can say there is a strong likelihood that 80 percent of first-gen students nationally are concerned with costs of college). The qualitative researcher does not seek to generalize in this way. They may want a representative sample because they are interested in typical responses or behaviors of the population of interest, but they may very well not want a representative sample at all. They might want an “extreme” or deviant case to highlight what could go wrong with a particular situation, or maybe they want to examine just one case as a way of understanding what elements might be of interest in further research. When thinking of your sample, you will have to know why you are selecting the units, and this relates back to your research question or sets of questions. It has nothing to do with having a representative sample to generalize results. You may be tempted—or it may be suggested to you by a quantitatively minded member of your committee—to create as large and representative a sample as you possibly can to earn credibility from quantitative researchers. Ignore this temptation or suggestion. The only thing you should be considering is what sample will best bring insight into the questions guiding your research. This has implications for the number of people (or units) in your study as well, which is the topic of the next section.

What is the Correct “Number” to Sample?

Because we are not trying to create a generalizable representative sample, the guidelines for the “number” of people to interview or news stories to code are also a bit more nebulous. There are some brilliant insightful studies out there with an n of 1 (meaning one person or one account used as the entire set of data). This is particularly so in the case of autoethnography, a variation of ethnographic research that uses the researcher’s own subject position and experiences as the basis of data collection and analysis. But it is true for all forms of qualitative research. There are no hard-and-fast rules here. The number to include is what is relevant and insightful to your particular study.

That said, humans do not thrive well under such ambiguity, and there are a few helpful suggestions that can be made. First, many qualitative researchers talk about “saturation” as the end point for data collection. You stop adding participants when you are no longer getting any new information (or so very little that the cost of adding another interview subject or spending another day in the field exceeds any likely benefits to the research). The term saturation was first used here by Glaser and Strauss ( 1967 ), the founders of Grounded Theory. Here is their explanation: “The criterion for judging when to stop sampling the different groups pertinent to a category is the category’s theoretical saturation. Saturation means that no additional data are being found whereby the sociologist can develop properties of the category. As he [or she] sees similar instances over and over again, the researcher becomes empirically confident that a category is saturated. [They go] out of [their] way to look for groups that stretch diversity of data as far as possible, just to make certain that saturation is based on the widest possible range of data on the category” ( 61 ).
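For readers who find a concrete rule easier to reason about, the idea of stopping when interviews no longer yield new information can be sketched as a simple counter. This is only an illustration, not a procedure from Glaser and Strauss: the function name `interviews_until_saturation`, the `patience` threshold of three consecutive interviews without a new code, and the toy codes are all assumptions made for the example.

```python
def interviews_until_saturation(coded_interviews, patience=3):
    """Return how many interviews were done when `patience` consecutive
    interviews added no new codes, or None if that never happens.

    `patience` is an illustrative assumption, not an established standard.
    """
    seen = set()        # every code observed so far
    quiet_streak = 0    # consecutive interviews with nothing new
    for count, codes in enumerate(coded_interviews, start=1):
        new_codes = set(codes) - seen
        seen |= new_codes
        quiet_streak = 0 if new_codes else quiet_streak + 1
        if quiet_streak >= patience:
            return count
    return None


# Hypothetical codes assigned to seven interviews, in the order conducted.
interviews = [
    {"cost", "family"},
    {"cost", "belonging"},
    {"family"},
    {"work", "cost"},
    {"belonging"},
    {"cost", "work"},
    {"family", "belonging"},
]
print(interviews_until_saturation(interviews))  # 7
```

In practice the judgment is qualitative rather than mechanical: a researcher would still deliberately seek out groups that stretch the diversity of the data before trusting a quiet streak like this one.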

It makes sense that the term was developed by grounded theorists, since this approach is rather more open-ended than other approaches used by qualitative researchers. With so much left open, having a guideline of “stop collecting data when you don’t find anything new” is reasonable. However, saturation can’t help much when first setting out your sample. How do you know how many people to contact to interview? What number will you put down in your institutional review board (IRB) protocol (see chapter 8)? You may guess how many people or units it will take to reach saturation, but there really is no way to know in advance. The best you can do is think about your population and your questions and look at what others have done with similar populations and questions.

Here are some suggestions to use as a starting point: For phenomenological studies, try to interview at least ten people for each major category or group of people. If you are comparing male-identified, female-identified, and gender-neutral college students in a study on gender regimes in social clubs, that means you might want to design a sample of thirty students, ten from each group. This is the minimum suggested number. Damaske’s ( 2021 ) sample of one hundred allows room for up to twenty-five participants in each of four “buckets” (e.g., working-class*female, working-class*male, middle-class*female, middle-class*male). If there is more than one comparative group (e.g., you are comparing students attending three different colleges, and you are comparing White and Black students in each), you can sometimes reduce the number for each group in your sample to five for, in this case, thirty total students. But that is really the bare minimum you will want to use. A lot of people will not trust you with only “five” cases in a bucket. Lareau ( 2021:24 ) advises a minimum of seven or nine for each bucket (or “cell,” in her words). The point is to think about what your analyses might look like and how comfortable you will be with a certain number of persons fitting each category.
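The bucket arithmetic above is easy to check with a few lines of code. The sketch below simply multiplies the number of cells (every combination of comparison categories) by a per-cell minimum; the function name `plan_sample` and the college labels are hypothetical, chosen only for illustration.

```python
from itertools import product


def plan_sample(dimensions, per_cell):
    """List every cell (combination of categories) and the total sample size."""
    cells = list(product(*dimensions.values()))
    return cells, len(cells) * per_cell


# Three gender groups, ten students each.
gender_design = {
    "gender": ["male-identified", "female-identified", "gender-neutral"],
}
cells, total = plan_sample(gender_design, per_cell=10)
print(len(cells), total)  # 3 30

# Three colleges crossed with two racial groups, five students per cell.
college_design = {
    "college": ["College A", "College B", "College C"],  # placeholder names
    "race": ["White", "Black"],
}
cells, total = plan_sample(college_design, per_cell=5)
print(len(cells), total)  # 6 30

# The same six cells at a per-cell minimum of seven.
print(plan_sample(college_design, per_cell=7)[1])  # 42
```

Laying the design out this way makes it obvious how quickly each added comparison multiplies the number of interviews you are committing to.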

Because qualitative research takes so much time and effort, it is rare for a beginning researcher to include more than thirty to fifty people or units in the study. You may not be able to conduct all the comparisons you might want simply because you cannot manage a larger sample. In that case, the limits of who you can reach or what you can include may influence you to rethink an original overcomplicated research design. Rather than include students from every racial group on a campus, for example, you might want to sample strategically, thinking about the most contrast (insightful), possibly excluding majority-race (White) students entirely, and simply using previous literature to fill in gaps in our understanding. For example, one of my former students was interested in discovering how race and class worked at a predominantly White institution (PWI). Due to time constraints, she simplified her study from an original sample frame of middle-class and working-class domestic Black and international African students (four buckets) to a sample frame of domestic Black and international African students (two buckets), allowing the complexities of class to come through individual accounts rather than from part of the sample frame. She wisely decided not to include White students in the sample, as her focus was on how minoritized students navigated the PWI. She was able to successfully complete her project and develop insights from the data with fewer than twenty interviewees. [1]

But what if you had unlimited time and resources? Would it always be better to interview more people or include more accounts, documents, and units of analysis? No! Your sample size should reflect your research question and the goals you have set yourself. Larger numbers can sometimes work against your goals. If, for example, you want to help bring out individual stories of success against the odds, adding more people to the analysis can end up drowning out those individual stories. Sometimes, the perfect size really is one (or three, or five). It really depends on what you are trying to discover and achieve in your study. Furthermore, studies of one hundred or more (people, documents, accounts, etc.) can sometimes be mistaken for quantitative research. Inevitably, the large sample size will push the researcher into simplifying the data numerically. And readers will begin to expect generalizability from such a large sample.

To summarize, “There are no rules for sample size in qualitative inquiry. Sample size depends on what you want to know, the purpose of the inquiry, what’s at stake, what will be useful, what will have credibility, and what can be done with available time and resources” ( Patton 2002:244 ).

How did you find/construct a sample?

Since qualitative researchers work with comparatively small sample sizes, getting your sample right is rather important. Yet it is also difficult to accomplish. For instance, a key question you need to ask yourself is whether you want a homogeneous or heterogeneous sample. In other words, do you want to include people in your study who are by and large the same, or do you want to have diversity in your sample?

For many years, I have studied the experiences of students who were the first in their families to attend university. There is a rather large number of sampling decisions I need to consider before starting the study. (1) Should I only talk to first-in-family students, or should I have a comparison group of students who are not first-in-family? (2) Do I need to strive for a gender distribution that matches undergraduate enrollment patterns? (3) Should I include participants that reflect diversity in gender identity and sexuality? (4) How about racial diversity? First-in-family status is strongly related to some ethnic or racial identity. (5) And how about areas of study?

As you can see, if I wanted to accommodate all these differences and get enough study participants in each category, I would quickly end up with a sample size of hundreds, which is not feasible in most qualitative research. In the end, for me, the most important decision was to maximize the voices of first-in-family students, which meant that I only included them in my sample. As for the other categories, I figured it was going to be hard enough to find first-in-family students, so I started recruiting with an open mind and an understanding that I may have to accept a lack of gender, sexuality, or racial diversity and then not be able to say anything about these issues. But I would definitely be able to speak about the experiences of being first-in-family.

—Wolfgang Lehmann, author of “Habitus Transformation and Hidden Injuries”

Examples of “Sample” Sections in Journal Articles

Think about some of the studies you have read in college, especially those with rich stories and accounts about people’s lives. Do you know how the people were selected to be the focus of those stories? If the account was published by an academic press (e.g., University of California Press or Princeton University Press) or in an academic journal, chances are that the author included a description of their sample selection. You can usually find these in a methodological appendix (book) or a section on “research methods” (article).

Here are two examples from recent books and one example from a recent article:

Example 1. In It’s Not like I’m Poor: How Working Families Make Ends Meet in a Post-welfare World, the research team employed a mixed methods approach to understand how parents use the earned income tax credit, a refundable tax credit designed to provide relief for low- to moderate-income working people ( Halpern-Meekin et al. 2015 ). At the end of their book, their first appendix is “Introduction to Boston and the Research Project.” After describing the context of the study, they include the following description of their sample selection:

In June 2007, we drew 120 names at random from the roughly 332 surveys we gathered between February and April. Within each racial and ethnic group, we aimed for one-third married couples with children and two-thirds unmarried parents. We sent each of these families a letter informing them of the opportunity to participate in the in-depth portion of our study and then began calling the home and cell phone numbers they provided us on the surveys and knocking on the doors of the addresses they provided.…In the end, we interviewed 115 of the 120 families originally selected for the in-depth interview sample (the remaining five families declined to participate). ( 22 )

Was their sample selection based on convenience or purpose? Why do you think it was important for them to tell you that five families declined to be interviewed? There is actually a trick here, as the names were pulled randomly from a survey whose sample design was probabilistic. Why is this important to know? What can we say about the representativeness or the uniqueness of whatever findings are reported here?

Example 2. In When Diversity Drops, Park ( 2013 ) examines the impact of decreasing campus diversity on the lives of college students. She does this through a case study of one student club, the InterVarsity Christian Fellowship (IVCF), at one university (“California University,” a pseudonym). Here is her description:

I supplemented participant observation with individual in-depth interviews with sixty IVCF associates, including thirty-four current students, eight former and current staff members, eleven alumni, and seven regional or national staff members. The racial/ethnic breakdown was twenty-five Asian Americans (41.6 percent), one Armenian (1.6 percent), twelve people who were black (20.0 percent), eight Latino/as (13.3 percent), three South Asian Americans (5.0 percent), and eleven people who were white (18.3 percent). Twenty-nine were men, and thirty-one were women. Looking back, I note that the higher number of Asian Americans reflected both the group’s racial/ethnic composition and my relative ease about approaching them for interviews. ( 156 )

How can you tell this is a convenience sample? What else do you note about the sample selection from this description?

Example 3. The last example is taken from an article published in the journal Research in Higher Education. Published articles tend to be more formal than books, at least when it comes to the presentation of qualitative research. In this article, Lawson ( 2021 ) is seeking to understand why female-identified college students drop out of majors that are dominated by male-identified students (e.g., engineering, computer science, music theory). Here is the entire relevant section of the article:

Method Participants Data were collected as part of a larger study designed to better understand the daily experiences of women in MDMs [male-dominated majors].…Participants included 120 students from a midsize, Midwestern University. This sample included 40 women and 40 men from MDMs—defined as any major where at least 2/3 of students are men at both the university and nationally—and 40 women from GNMs—defined as any major where 40–60% of students are women at both the university and nationally.… Procedure A multi-faceted approach was used to recruit participants; participants were sent targeted emails (obtained based on participants’ reported gender and major listings), campus-wide emails sent through the University’s Communication Center, flyers, and in-class presentations. Recruitment materials stated that the research focused on the daily experiences of college students, including classroom experiences, stressors, positive experiences, departmental contexts, and career aspirations. Interested participants were directed to email the study coordinator to verify eligibility (at least 18 years old, man/woman in MDM or woman in GNM, access to a smartphone). Sixteen interested individuals were not eligible for the study due to the gender/major combination. ( 482ff .)

What method of sample selection was used by Lawson? Why is it important to define “MDM” at the outset? How does this definition relate to sampling? Why were interested participants directed to the study coordinator to verify eligibility?

Final Words

I have found that students often find it difficult to be specific enough when defining and choosing their sample. It might help to think about your sample design and sample recruitment like a cookbook. You want all the details there so that someone else can pick up your study and conduct it as you intended. That person could be yourself, but this analogy might work better if you have someone else in mind. When I am writing down recipes, I often think of my sister and try to convey the details she would need to duplicate the dish. We share a grandmother whose recipes are full of handwritten notes in the margins, in spidery ink, that tell us what bowl to use when or where things could go wrong. Describe your sample clearly, convey the steps required accurately, and then add any other details that will help keep you on track and remind you why you have chosen to limit possible interviewees to those of a certain age or class or location. Imagine actually going out and getting your sample (making your dish). Do you have all the necessary details to get started?

Table 5.1. Sampling Type and Strategies

Further Readings

Fusch, Patricia I., and Lawrence R. Ness. 2015. “Are We There Yet? Data Saturation in Qualitative Research.” Qualitative Report 20(9):1408–1416.

Saunders, Benjamin, Julius Sim, Tom Kingstone, Shula Baker, Jackie Waterfield, Bernadette Bartlam, Heather Burroughs, and Clare Jinks. 2018. “Saturation in Qualitative Research: Exploring Its Conceptualization and Operationalization.” Quality & Quantity 52(4):1893–1907.

[1] Rubin ( 2021 ) suggests a minimum of twenty interviews (but safer with thirty) for an interview-based study and a minimum of three to six months in the field for ethnographic studies. For a content-based study, she suggests between five hundred and one thousand documents, although some will be “very small” ( 243–244 ).

The process of selecting people or other units of analysis to represent a larger population. In quantitative research, this representation is taken quite literally, as statistically representative.  In qualitative research, in contrast, sample selection is often made based on potential to generate insight about a particular topic or phenomenon.

The actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).  Sampling frames can differ from the larger population when specific exclusions are inherent, as in the case of pulling names randomly from voter registration rolls where not everyone is a registered voter.  This difference in frame and population can undercut the generalizability of quantitative results.

The specific group of individuals that you will collect data from.  Contrast population.

The large group of interest to the researcher.  Although it will likely be impossible to design a study that incorporates or reaches all members of the population of interest, this should be clearly defined at the outset of a study so that a reasonable sample of the population can be taken.  For example, if one is studying working-class college students, the sample may include twenty such students attending a particular college, while the population is “working-class college students.”  In quantitative research, clearly defining the general population of interest is a necessary step in generalizing results from a sample.  In qualitative research, defining the population is conceptually important for clarity.

A sampling strategy in which the sample is chosen to represent (numerically) the larger population from which it is drawn by random selection.  Each person in the population has an equal chance of making it into the sample.  This is often done through a lottery or other chance mechanisms (e.g., a random selection of every twelfth name on an alphabetical list of voters).  Also known as random sampling .

The selection of research participants or other data sources based on availability or accessibility, in contrast to purposive sampling .

A sample generated non-randomly by asking participants to help recruit more participants, the idea being that a person who fits your sampling criteria probably knows other people with similar criteria.

Broad codes that are assigned to the main issues emerging in the data; identifying themes is often part of initial coding . 

A form of case selection focusing on examples that do not fit the emerging patterns. This allows the researcher to evaluate rival explanations or to define the limitations of their research findings. While disconfirming cases are found (not sought out), researchers should expand their analysis or rethink their theories to include/explain them.

A methodological tradition of inquiry and approach to analyzing qualitative data in which theories emerge from a rigorous and systematic process of induction.  This approach was pioneered by the sociologists Glaser and Strauss (1967).  The elements of theory generated from comparative analysis of data are, first, conceptual categories and their properties and, second, hypotheses or generalized relations among the categories and their properties – “The constant comparing of many groups draws the [researcher’s] attention to their many similarities and differences.  Considering these leads [the researcher] to generate abstract categories and their properties, which, since they emerge from the data, will clearly be important to a theory explaining the kind of behavior under observation.” (36).

The result of probability sampling, in which a sample is chosen to represent (numerically) the larger population from which it is drawn by random selection.  Each person in the population has an equal chance of making it into the random sample.  This is often done through a lottery or other chance mechanisms (e.g., the random selection of every twelfth name on an alphabetical list of voters).  This is typically not required in qualitative research but rather essential for the generalizability of quantitative research.

A form of case selection or purposeful sampling in which cases that are unusual or special in some way are chosen to highlight processes or to illuminate gaps in our knowledge of a phenomenon.   See also extreme case .

The point at which you can conclude data collection because every person you are interviewing, the interaction you are observing, or content you are analyzing merely confirms what you have already noted.  Achieving saturation is often used as the justification for the final sample size.

The accuracy with which results or findings can be transferred to situations or people other than those originally studied.  Qualitative studies generally are unable to use (and are uninterested in) statistical generalizability where the sample population is said to be able to predict or stand in for a larger population of interest.  Instead, qualitative researchers often discuss “theoretical generalizability,” in which the findings of a particular study can shed light on processes and mechanisms that may be at play in other settings.  See also statistical generalization and theoretical generalization .

A term used by IRBs to denote all materials aimed at recruiting participants into a research study (including printed advertisements, scripts, audio or video tapes, or websites).  Copies of this material are required in research protocols submitted to IRB.

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


Sampling is the statistical process of selecting a subset—called a ‘sample’—of a population of interest for the purpose of making observations and statistical inferences about that population. Social science research is generally about inferring patterns of behaviours within specific populations. We cannot study entire populations because of feasibility and cost constraints, and hence, we must select a representative sample from the population of interest for observation and analysis. It is extremely important to choose a sample that is truly representative of the population so that the inferences derived from the sample can be generalised back to the population of interest. Improper and biased sampling is the primary reason for the often divergent and erroneous inferences reported in opinion polls and exit polls conducted by different polling groups such as CNN/Gallup Poll, ABC, and CBS, prior to every US Presidential election.

The sampling process

As Figure 8.1 shows, the sampling process comprises several stages. The first stage is defining the target population. A population can be defined as all people or items ( unit of analysis ) with the characteristics that one wishes to study. The unit of analysis may be a person, group, organisation, country, object, or any other entity that you wish to draw scientific inferences about. Sometimes the population is obvious. For example, if a manufacturer wants to determine whether finished goods manufactured at a production line meet certain quality requirements or must be scrapped and reworked, then the population consists of the entire set of finished goods manufactured at that production facility. At other times, the target population may be a little harder to understand. If you wish to identify the primary drivers of academic learning among high school students, then what is your target population: high school students, their teachers, school principals, or parents? The right answer in this case is high school students, because you are interested in their performance, not the performance of their teachers, parents, or schools. Likewise, if you wish to analyse the behaviour of roulette wheels to identify biased wheels, your population of interest is not different observations from a single roulette wheel, but different roulette wheels (i.e., their behaviour over an infinite set of wheels).

[Figure 8.1: The sampling process]

The second step in the sampling process is to choose a sampling frame. This is an accessible section of the target population—usually a list with contact information—from which a sample can be drawn. If your target population is professional employees at work, because you cannot access all professional employees around the world, a more realistic sampling frame will be employee lists of one or two local companies that are willing to participate in your study. If your target population is organisations, then the Fortune 500 list of firms or the Standard & Poor’s (S&P) list of firms registered with the New York Stock Exchange may be acceptable sampling frames.

Note that sampling frames may not entirely be representative of the population at large, and if so, inferences derived from such a sample may not be generalisable to the population. For instance, if your target population is organisational employees at large (e.g., you wish to study employee self-esteem in this population) and your sampling frame is employees at automotive companies in the American Midwest, findings from such groups may not even be generalisable to the American workforce at large, let alone the global workplace. This is because the American auto industry has been under severe competitive pressures for the last 50 years and has seen numerous episodes of reorganisation and downsizing, possibly resulting in low employee morale and self-esteem. Furthermore, the majority of the American workforce is employed in service industries or in small businesses, and not in the automotive industry. Hence, a sample of American auto industry employees is not particularly representative of the American workforce. Likewise, the Fortune 500 list includes the 500 largest American enterprises, which is not representative of all American firms, most of which are medium- or small-sized firms rather than large firms, and is therefore a biased sampling frame. In contrast, the S&P list will allow you to select large, medium, and/or small companies, depending on whether you use the S&P LargeCap, MidCap, or SmallCap lists, but it includes only publicly traded firms (not private firms) and is hence still biased. Also note that the population from which a sample is drawn may not necessarily be the same as the population about which we actually want information. For example, if a researcher wants to examine the success rate of a new ‘quit smoking’ program, then the target population is the universe of smokers who had access to this program, which may be an unknown population. Hence, the researcher may sample patients arriving at a local medical facility for smoking cessation treatment, some of whom may not have had exposure to this particular ‘quit smoking’ program, in which case, the sampling frame does not correspond to the population of interest.

The last step in sampling is choosing a sample from the sampling frame using a well-defined sampling technique. Sampling techniques can be grouped into two broad categories: probability (random) sampling and non-probability sampling. Probability sampling is ideal if generalisability of results is important for your study, but there may be unique circumstances where non-probability sampling can also be justified. These techniques are discussed in the next two sections.

Probability sampling

Probability sampling is a technique in which every unit in the population has a chance (non-zero probability) of being selected in the sample, and this chance can be accurately determined. Sample statistics thus produced, such as the sample mean or standard deviation, are unbiased estimates of population parameters, as long as the sampled units are weighted according to their probability of selection. All probability sampling techniques have two attributes in common: every unit in the population has a known non-zero probability of being sampled, and the sampling procedure involves random selection at some point. The different types of probability sampling techniques include:

Stratified sampling. In stratified sampling, the sampling frame is divided into homogeneous and non-overlapping subgroups (called ‘strata’), and a simple random sample is drawn within each subgroup. In the previous example of selecting 200 firms from a list of 1,000 firms, you can start by categorising the firms based on their size as large (more than 500 employees), medium (between 50 and 500 employees), and small (less than 50 employees). You can then randomly select about 67 firms from each subgroup to make up your sample of roughly 200 firms. However, since there are many more small firms in a sampling frame than large firms, having an equal number of small, medium, and large firms will make the sample less representative of the population (i.e., biased in favour of large firms that are fewer in number in the target population). This is called non-proportional stratified sampling because the proportion of the sample within each subgroup does not reflect the proportions in the sampling frame (or the population of interest), and the smaller subgroup (large-sized firms) is oversampled. An alternative technique is to select subgroup samples in proportion to their size in the population. For instance, if there are 100 large firms, 300 mid-sized firms, and 600 small firms, you can sample 20 firms from the ‘large’ group, 60 from the ‘medium’ group and 120 from the ‘small’ group. In this case, the proportional distribution of firms in the population is retained in the sample, and hence this technique is called proportional stratified sampling. Note that the non-proportional approach is particularly effective in representing small subgroups, such as large-sized firms, and is not necessarily less representative of the population compared to the proportional approach, as long as the findings of the non-proportional approach are weighted in accordance with each subgroup’s proportion in the overall population.
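The allocation arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only: the firm IDs, the function names, and the fixed seed are assumptions made for the example, and the weights shown are the usual "stratum size divided by stratum sample size" design weights.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    frame_size = sum(stratum_sizes.values())
    return {name: round(total_sample * size / frame_size)
            for name, size in stratum_sizes.items()}

def stratified_sample(frame, allocation, rng):
    """Draw a simple random sample of the allocated size within each stratum."""
    return {name: rng.sample(units, allocation[name])
            for name, units in frame.items()}

def design_weights(stratum_sizes, allocation):
    """Number of frame units that each sampled unit stands for."""
    return {name: stratum_sizes[name] / allocation[name]
            for name in stratum_sizes}

# Hypothetical frame of 1,000 firm IDs, split as in the text.
sizes = {"large": 100, "medium": 300, "small": 600}
frame, start = {}, 0
for name, size in sizes.items():
    frame[name] = list(range(start, start + size))
    start += size

rng = random.Random(42)  # fixed seed so the sketch is repeatable
proportional = proportional_allocation(sizes, 200)
sample = stratified_sample(frame, proportional, rng)

# Non-proportional (equal) allocation needs weights to undo the oversampling.
equal = {name: 67 for name in sizes}
weights_equal = design_weights(sizes, equal)
```

With the equal allocation, each sampled small firm carries a larger weight than each sampled large firm, which is what restores representativeness at the analysis stage. For stratum sizes that do not divide evenly, the rounded allocations may not sum exactly to the requested total.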

Cluster sampling. If you have a population dispersed over a wide geographic region, it may not be feasible to conduct a simple random sampling of the entire population. In such a case, it may be reasonable to divide the population into ‘clusters’ (usually along geographic boundaries), randomly sample a few clusters, and measure all units within those clusters. For instance, if you wish to sample city governments in the state of New York, rather than travel all over the state to interview key city officials (as you may have to do with a simple random sample), you can cluster these governments based on their counties, randomly select a set of three counties, and then interview officials from every office in those counties. However, depending on between-cluster differences, the variability of sample estimates in a cluster sample will generally be higher than that of a simple random sample, and hence the results are less generalisable to the population than those obtained from simple random samples.
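As a rough illustration of the county example, the sketch below randomly picks three clusters and keeps every unit inside them. The county and city names are made up for the example, and `cluster_sample` is a hypothetical helper, not a library function.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose n_clusters clusters and return all units inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 10 counties, each with a few city governments.
counties = {
    f"county_{i}": [f"county_{i}/city_{j}" for j in range(3 + i % 4)]
    for i in range(10)
}

rng = random.Random(7)  # fixed seed so the sketch is repeatable
chosen_counties, cities = cluster_sample(counties, 3, rng)
```

Randomness enters only at the cluster level here; every city government in a chosen county is measured.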

Matched-pairs sampling. Sometimes, researchers may want to compare two subgroups within one population based on a specific criterion. For instance, why are some firms consistently more profitable than other firms? To conduct such a study, you would have to categorise a sampling frame of firms into ‘high-profitability’ firms and ‘low-profitability’ firms based on gross margins, earnings per share, or some other measure of profitability. You would then select a simple random sample of firms in one subgroup, and match each firm in this group with a firm in the second subgroup, based on its size, industry segment, and/or other matching criteria. Now, you have two matched samples of high-profitability and low-profitability firms that you can study in greater detail. Matched-pairs sampling techniques are often an ideal way of understanding bipolar differences between different subgroups within a given population.

Multi-stage sampling. The probability sampling techniques described previously are all examples of single-stage sampling techniques. Depending on your sampling needs, you may combine these single-stage techniques to conduct multi-stage sampling. For instance, you can stratify a list of businesses based on firm size, and then conduct systematic sampling within each stratum. This is a two-stage combination of stratified and systematic sampling. Likewise, you can start with a cluster of school districts in the state of New York, and within each cluster, select a simple random sample of schools. Within each school, you can select a simple random sample of grade levels, and within each grade level, you can select a simple random sample of students for study. In this case, you have a four-stage sampling process consisting of cluster and simple random sampling.
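The first combination mentioned above (stratify by firm size, then sample systematically within each stratum) might look like the following sketch. The firm data are invented, the size thresholds are the ones used in the stratified sampling example, and systematic sampling is implemented in its common form: a random start followed by every k-th unit.

```python
import random
from collections import defaultdict

def size_class(employees):
    """Stage 1: assign a firm to a stratum by employee count."""
    if employees > 500:
        return "large"
    if employees >= 50:
        return "medium"
    return "small"

def systematic_sample(units, n, rng):
    """Stage 2: random start, then every k-th unit, with k = len(units) // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]

# Hypothetical frame of (firm_id, employees) pairs.
firms = ([(i, 1000) for i in range(100)]
         + [(i, 200) for i in range(100, 400)]
         + [(i, 10) for i in range(400, 1000)])

strata = defaultdict(list)
for firm in firms:
    strata[size_class(firm[1])].append(firm)

rng = random.Random(3)  # fixed seed so the sketch is repeatable
allocation = {"large": 20, "medium": 60, "small": 120}
two_stage = {name: systematic_sample(strata[name], allocation[name], rng)
             for name in allocation}
```

The helper assumes each stratum holds at least as many units as its allocation; a production version would validate that first.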

Non-probability sampling

Non-probability sampling is a sampling technique in which some units of the population have zero chance of selection or where the probability of selection cannot be accurately determined. Typically, units are selected based on certain non-random criteria, such as quota or convenience. Because selection is non-random, non-probability sampling does not allow the estimation of sampling errors, and may be subject to sampling bias. Therefore, information from a sample cannot be generalised back to the population. Types of non-probability sampling techniques include:

Convenience sampling. Also called accidental or opportunity sampling, this is a technique in which a sample is drawn from that part of the population that is close to hand, readily available, or convenient. For instance, if you stand outside a shopping centre and hand out questionnaire surveys to people or interview them as they walk in, the sample of respondents you will obtain will be a convenience sample. This is a non-probability sample because you are systematically excluding all people who shop at other shopping centres. The opinions that you would get from your chosen sample may reflect the unique characteristics of this shopping centre such as the nature of its stores (e.g., high-end stores will attract a more affluent demographic), the demographic profile of its patrons, or its location (e.g., a shopping centre close to a university will attract primarily university students with unique purchasing habits), and therefore may not be representative of the opinions of the shopper population at large. Hence, the scientific generalisability of such observations will be very limited. Other examples of convenience sampling are sampling students registered in a certain class or sampling patients arriving at a certain medical clinic. This type of sampling is most useful for pilot testing, where the goal is instrument testing or measurement validation rather than obtaining generalisable inferences.

Quota sampling. In this technique, the population is segmented into mutually exclusive subgroups (just as in stratified sampling), and then a non-random set of observations is chosen from each subgroup to meet a predefined quota. In proportional quota sampling, the proportion of respondents in each subgroup should match that of the population. For instance, if the American population consists of 70 per cent Caucasians, 15 per cent Hispanic-Americans, and 13 per cent African-Americans, and you wish to understand their voting preferences in a sample of 98 people, you can stand outside a shopping centre and ask people their voting preferences. But you will have to stop asking Hispanic-looking people when you have 15 responses from that subgroup (or African-Americans when you have 13 responses) even as you continue sampling other ethnic groups, so that the ethnic composition of your sample matches that of the general American population.

Non-proportional quota sampling is less restrictive in that you do not have to achieve a proportional representation, but perhaps meet a minimum size in each subgroup. In this case, you may decide to have 50 respondents from each of the three ethnic subgroups (Caucasians, Hispanic-Americans, and African-Americans), and stop when your quota for each subgroup is reached. Neither type of quota sampling will be representative of the American population, since depending on whether your study was conducted in a shopping centre in New York or Kansas, your results may be entirely different. The non-proportional technique is even less representative of the population, but may be useful in that it allows capturing the opinions of small and under-represented groups through oversampling.
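The stopping rule behind quota sampling can be sketched as follows. The stream of passers-by is simulated purely for illustration (the 2 per cent ‘Other’ share and all names are assumptions); in a real study the order of arrival is whatever happens at the shopping centre, which is exactly why the method is non-random.

```python
import itertools
import random

def simulated_passers_by(rng):
    """Hypothetical endless stream of (person_id, group) arrivals."""
    groups = ["Caucasian", "Hispanic-American", "African-American", "Other"]
    shares = [70, 15, 13, 2]
    for person_id in itertools.count():
        yield person_id, rng.choices(groups, weights=shares)[0]

def quota_sample(passers_by, quotas):
    """Accept arrivals in order until every subgroup quota is filled."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person_id, group in passers_by:
        if group in counts and counts[group] < quotas[group]:
            counts[group] += 1
            accepted.append((person_id, group))
        if counts == quotas:
            break
    return accepted, counts

rng = random.Random(11)  # fixed seed so the sketch is repeatable
quotas = {"Caucasian": 70, "Hispanic-American": 15, "African-American": 13}
accepted, counts = quota_sample(simulated_passers_by(rng), quotas)
```

For non-proportional quota sampling, only the `quotas` dictionary changes, for example 50 respondents per subgroup.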

Expert sampling. This is a technique where respondents are chosen in a non-random manner based on their expertise on the phenomenon being studied. For instance, in order to understand the impacts of a new governmental policy such as the Sarbanes-Oxley Act, you can sample a group of corporate accountants who are familiar with this Act. The advantage of this approach is that since experts tend to be more familiar with the subject matter than non-experts, opinions from a sample of experts are more credible than those from a sample that includes both experts and non-experts, although the findings are still not generalisable to the population at large.

Snowball sampling. In snowball sampling, you start by identifying a few respondents that match the criteria for inclusion in your study, and then ask them to recommend others they know who also meet your selection criteria. For instance, if you wish to survey computer network administrators and you know of only one or two such people, you can start with them and ask them to recommend others who also work in network administration. Although this method rarely leads to representative samples, it may sometimes be the only way to reach hard-to-reach populations, or the only option when no sampling frame is available.

Statistics of sampling

In the preceding sections, we introduced terms such as population parameter, sample statistic, and sampling bias. In this section, we will try to understand what these terms mean and how they are related to each other.

When you measure a certain observation from a given unit, such as a person’s response to a Likert-scaled item, that observation is called a response (see Figure 8.2). In other words, a response is a measurement value provided by a sampled unit. Each respondent will give you different responses to different items in an instrument. Responses from different respondents to the same item or observation can be graphed into a frequency distribution based on their frequency of occurrences. For a large number of responses in a sample, this frequency distribution tends to resemble a bell-shaped curve called a normal distribution, which can be used to estimate overall characteristics of the entire sample, such as sample mean (average of all observations in a sample) or standard deviation (variability or spread of observations in a sample). These sample estimates are called sample statistics (a ‘statistic’ is a value that is estimated from observed data). Populations also have means and standard deviations that could be obtained if we could sample the entire population. However, since the entire population can never be sampled, population characteristics are always unknown, and are called population parameters (and not ‘statistics’ because they are not statistically estimated from data). Sample statistics may differ from population parameters if the sample is not perfectly representative of the population. The difference between the two is called sampling error. Theoretically, if we could gradually increase the sample size so that the sample approaches closer and closer to the population, then sampling error will decrease and a sample statistic will increasingly approximate the corresponding population parameter.

If a sample is truly representative of the population, then the estimated sample statistics should be identical to the corresponding theoretical population parameters. How do we know if the sample statistics are at least reasonably close to the population parameters? Here, we need to understand the concept of a sampling distribution. Imagine that you took three different random samples from a given population, as shown in Figure 8.3, and for each sample, you derived sample statistics such as sample mean and standard deviation. If each random sample was truly representative of the population, then your three sample means from the three random samples will be identical (and equal to the population parameter), and the variability in sample means will be zero. But this is extremely unlikely, given that each random sample will likely constitute a different subset of the population, and hence, their means may be slightly different from each other. However, you can take these three sample means and plot a frequency histogram of sample means. If the number of such samples increases from three to 10 to 100, the frequency histogram becomes a sampling distribution. Hence, a sampling distribution is a frequency distribution of a sample statistic (like sample mean) from a set of samples, while the commonly referenced frequency distribution is the distribution of a response (observation) from a single sample. Just like a frequency distribution, the sampling distribution will also tend to have more sample statistics clustered around the mean (which presumably is an estimate of a population parameter), with fewer values scattered around the mean. With an infinitely large number of samples, this distribution will approach a normal distribution. The variability or spread of a sample statistic in a sampling distribution (i.e., the standard deviation of a sampling statistic) is called its standard error. In contrast, the term standard deviation is reserved for variability of an observed response from a single sample.
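A small simulation can make the distinction between standard deviation and standard error concrete. The sketch below builds a hypothetical population (normally distributed scores with mean 50 and standard deviation 10, chosen only for illustration), draws many random samples, and compares the spread of the sample means with the standard textbook approximation that the standard error of the mean is roughly the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 scores.
population = [random.gauss(50, 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)   # population parameter
pop_sd = statistics.pstdev(population)    # population parameter

def sample_means(sample_size, n_samples):
    """Mean of each of n_samples simple random samples (a sampling distribution)."""
    return [statistics.fmean(random.sample(population, sample_size))
            for _ in range(n_samples)]

means = sample_means(sample_size=100, n_samples=1000)

# Spread of the sample means = empirical standard error of the mean.
standard_error = statistics.stdev(means)
# Approximation sigma / sqrt(n), ignoring the finite-population correction.
approx_standard_error = pop_sd / math.sqrt(100)
# Average sampling error of the mean across the 1,000 samples.
mean_sampling_error = statistics.fmean(means) - pop_mean
```

Increasing `sample_size` shrinks `standard_error`, which mirrors the statement that larger samples approximate the population parameter more closely.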

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

How to Thrive as You Age

A cheap drug may slow down aging. A study will determine if it works.

Allison Aubrey

Can a pill slow down aging?

A drug taken by millions of people to control diabetes may do more than lower blood sugar.

Research suggests metformin has anti-inflammatory effects that could help protect against common age-related diseases including heart disease, cancer, and cognitive decline.

Scientists who study the biology of aging have designed a clinical study, known as The TAME Trial, to test whether metformin can help prevent these diseases and promote a longer healthspan in healthy, older adults.

Michael Cantor, an attorney, and his wife Shari Cantor, the mayor of West Hartford, Connecticut, both take metformin. "I tell all my friends about it," Michael Cantor says. "We all want to live a little longer, high-quality life if we can," he says.

Michael Cantor started on metformin about a decade ago when his weight and blood sugar were creeping up. Shari Cantor began taking metformin during the pandemic after she read that it may help protect against serious infections.

Shari and Michael Cantor both take metformin. They are both in their mid-60s and say they feel healthy and full of energy. Theresa Oberst/Michael Cantor

The Cantors are in their mid-60s and both say they feel healthy and have lots of energy. Both noticed improvements in their digestive systems, feeling more "regular" after they started on the drug.

Metformin costs less than a dollar a day, and depending on insurance, many people pay no out-of-pocket costs for the drug.

"I don't know if metformin increases lifespan in people, but the evidence that exists suggests that it very well might," says Steven Austad, a senior scientific advisor at the American Federation for Aging Research who studies the biology of aging.

An old drug with surprising benefits

Metformin was first used to treat diabetes in the 1950s in France. The drug is a derivative of guanidine , a compound found in Goat's Rue, an herbal medicine long used in Europe.

The FDA approved metformin for the treatment of type 2 diabetes in the U.S. in the 1990s. Since then, researchers have documented several surprises, including a reduced risk of cancer. "That was a bit of a shock," Austad says. A meta-analysis that included data from dozens of studies found that people who took metformin had a lower risk of several types of cancer, including gastrointestinal, urologic and blood cancers.

Austad also points to a British study that found a lower risk of dementia and mild cognitive decline among people with type 2 diabetes taking metformin. In addition, there's research pointing to improved cardiovascular outcomes in people who take metformin, including a reduced risk of cardiovascular death.

As promising as this sounds, Austad says most of the evidence is observational, pointing only to an association between metformin and the reduced risk. The evidence stops short of proving cause and effect. Also, it's unknown if the benefits documented in people with diabetes will also reduce the risk of age-related diseases in healthy, older adults.

"That's what we need to figure out," says Steve Kritchevsky, a professor of gerontology at Wake Forest School of Medicine, who is a lead investigator for the TAME Trial.

The goal is to better understand the mechanisms and pathways by which metformin works in the body. For instance, researchers are looking at how the drug may help improve energy in the cells by stimulating autophagy, which is the process of clearing out or recycling damaged bits inside cells.

Researchers also want to know more about how metformin can help reduce inflammation and oxidative stress, which may slow biological aging.

"When there's an excess of oxidative stress, it will damage the cell. And that accumulation of damage is essentially what aging is," Kritchevsky explains.

When the forces that are damaging cells are running faster than the forces that are repairing or replacing cells, that's aging, Kritchevsky says. And it's possible that drugs like metformin could slow this process down.

By targeting the biology of aging, the hope is to prevent or delay multiple diseases, says Dr. Nir Barzilai of Albert Einstein College of Medicine, who leads the effort to get the trial started.

The ultimate in preventative medicine

Back in 2015, Austad and a bunch of aging researchers began pushing for a clinical trial.

"A bunch of us went to the FDA to ask them to approve a trial for metformin," Austad recalls, and the agency was receptive. "If you could help prevent multiple problems at the same time, like we think metformin may do, then that's almost the ultimate in preventative medicine," Austad says.

The aim is to enroll 3,000 people between the ages of 65 and 79 for a six-year trial. But Dr. Barzilai says it's been slow going to get it funded. "The main obstacle with funding this study is that metformin is a generic drug, so no pharmaceutical company is standing to make money," he says.

Barzilai has turned to philanthropists and foundations, and has some pledges. The National Institute on Aging, part of the National Institutes of Health, set aside about $5 million for the research, but that's not enough to pay for the study, which is estimated to cost between $45 million and $70 million.

The frustration over the lack of funding is that if the trial points to protective effects, millions of people could benefit. "It's something that everybody will be able to afford," Barzilai says.

Currently the FDA doesn't recognize aging as a disease to treat, but the researchers hope this would usher in a paradigm shift — from treating each age-related medical condition separately, to treating these conditions together, by targeting aging itself.

For now, metformin is only approved to treat type 2 diabetes in the U.S., but doctors can prescribe it off-label for conditions other than its approved use.

Michael and Shari Cantor's doctors were comfortable prescribing it to them, given the drug's long history of safety and the possible benefits in delaying age-related disease.

"I walk a lot, I hike, and at 65 I have a lot of energy," Michael Cantor says. "I feel like the metformin helps," he says. He and Shari say they have not experienced any negative side effects.

Research shows a small percentage of people who take metformin experience GI distress that makes the drug intolerable. And some people develop a vitamin B12 deficiency. One study found people over the age of 65 who take metformin may have a harder time building new muscle.

"There's some evidence that people who exercise who are on metformin have less gain in muscle mass," says Dr. Eric Verdin, President of the Buck Institute for Research on Aging. That could be a concern for people who are under-muscled.

But Verdin says it may be possible to repurpose metformin in other ways. "There are a number of companies that are exploring metformin in combination with other drugs," he says. He points to research underway to combine metformin with a drug called galantamine for the treatment of sarcopenia, which is the medical term for age-related muscle loss. Sarcopenia affects millions of older people, especially women.

The science of testing drugs to target aging is rapidly advancing, and metformin isn't the only medicine that may treat the underlying biology.

"Nobody thinks this is the be all and end all of drugs that target aging," Austad says. He says data from the clinical trial could stimulate investment by the big pharmaceutical companies in this area. "They may come up with much better drugs," he says.

Michael Cantor knows there's no guarantee with metformin. "Maybe it doesn't do what we think it does in terms of longevity, but it's certainly not going to do me any harm," he says.

Cantor's father had his first heart attack at 51. He says he wants to do all he can to prevent disease and live a healthy life, and he thinks metformin is one tool that may help.

For now, Dr. Barzilai says the metformin clinical trial can get underway when the money comes in.

This story was edited by Jane Greenhalgh

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling schedule. While past works primarily focused on deriving efficient solvers, little attention has been given to finding optimal sampling schedules, and the entire literature relies on hand-crafted heuristics. In this work, for the first time, we propose Align Your Steps, a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs. We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets. We evaluate our novel approach on several image, video as well as 2D toy data synthesis benchmarks, using a variety of different solvers, and observe that our optimized schedules outperform previous handcrafted schedules in almost all experiments. Our method demonstrates the untapped potential of sampling schedule optimization, especially in the few-step synthesis regime.

Our optimized schedules can be used at inference time in a plug-and-play fashion. Please see our quickstart guide to get started with using our schedules in diffusers and the colab notebook for example code with Stable Diffusion 1.5 and SDXL.

The sampling schedule is iteratively optimized to reduce the discretization error. As the optimization proceeds, the generated images become sharper and more detailed.

Optimizing Sampling Schedules in Diffusion Models

Diffusion models (DMs) have proven themselves to be extremely reliable probabilistic generative models that can produce high-quality data. They have been successfully applied to tasks such as image synthesis, image super-resolution, image-to-image translation, image editing, inpainting, video synthesis, text-to-3D generation, and even planning. However, sampling from DMs corresponds to solving a generative Stochastic or Ordinary Differential Equation (SDE/ODE) in reverse time and requires multiple sequential forward passes through a large neural network, limiting their real-time applicability.

Solving SDE/ODEs within the interval \([t_{min}, t_{max}]\) works by discretizing it into \(n\) smaller sub-intervals \(t_{min} = t_0 < t_1 < \dots < t_{n}=t_{max}\) called a sampling schedule, and numerically solving the differential equation between consecutive \(t_i\) values. Currently, most prior works adopt one of a handful of heuristic schedules, such as simple polynomials and cosine functions, and little effort has gone into optimizing this schedule. We attempt to fill this gap by introducing a principled approach for optimizing the schedule in a dataset and model specific manner, resulting in improved outputs given the same compute budget.
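To make the notion of a hand-crafted schedule concrete, the sketch below builds two common heuristics as increasing lists \(t_0 < t_1 < \dots < t_n\). The polynomial form follows the schedule popularised by the EDM sampler (uniform spacing in \(t^{1/\rho}\), with \(\rho = 7\) as the usual default), and the second is uniform in \(\log t\). The function names and the values of \(t_{min}\), \(t_{max}\), and \(n\) are illustrative assumptions, and neither function is the optimized schedule proposed here.

```python
import math

def polynomial_schedule(t_min, t_max, n, rho=7.0):
    """EDM-style heuristic: points evenly spaced in t**(1/rho)."""
    lo, hi = t_min ** (1.0 / rho), t_max ** (1.0 / rho)
    return [(lo + i / n * (hi - lo)) ** rho for i in range(n + 1)]

def log_uniform_schedule(t_min, t_max, n):
    """Heuristic with points evenly spaced in log(t)."""
    lo, hi = math.log(t_min), math.log(t_max)
    return [math.exp(lo + i / n * (hi - lo)) for i in range(n + 1)]

# Illustrative values only; real models define their own noise range.
poly = polynomial_schedule(0.002, 80.0, 10)
logu = log_uniform_schedule(0.002, 80.0, 10)
```

Both lists contain \(n + 1\) noise levels and therefore define \(n\) solver steps; a sampler would typically traverse them from \(t_{max}\) down to \(t_{min}\).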

Assuming that \( P_{true} \) represents the distribution of running the reverse-time SDE (defined by the learnt model) exactly, and \( P_{disc} \) represents the distribution of solving it with Stochastic-DDIM and a sampling schedule, the Girsanov theorem yields an upper bound on the Kullback-Leibler divergence between these two distributions (simplified; see paper for details):

\[ D_{KL}(P_{true} \,\|\, P_{disc}) \leq \underbrace{ \sum_{i=1}^{n} \int_{t_{i-1}}^{t_{i}} \frac{1}{t^3} \, \mathbb{E}_{x_t \sim p_t,\; x_{t_i} \sim p_{t_i | t}} \left\| D_{\theta}(x_t, t) - D_{\theta}(x_{t_i}, t_i) \right\|_2^2 \, dt }_{= KLUB(t_0, t_1, \dots, t_n)} + \text{constant} \]

A similar Kullback-Leibler Upper Bound (KLUB) can be found for other stochastic SDE solvers. Given this, we formulate the problem of optimizing the sampling schedule as minimizing the KLUB term with respect to its time discretization, i.e. the sampling schedule. Monte-Carlo integration with importance sampling is used to estimate the expectation values, and the schedule is optimized iteratively. We showcase the benefits of optimizing schedules on a 2D toy distribution (see visualization below).
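For intuition, the KLUB term can be evaluated without a neural network in a deliberately simple toy case. The sketch below makes several assumptions that are not from the text above: the data are one-dimensional Gaussian with standard deviation \(s\), the noise level equals \(t\) so that \(x_{t_i} = x_t + \sqrt{t_i^2 - t^2}\,z\), and the denoiser is the ideal one, \(D(x, t) = \frac{s^2}{s^2 + t^2} x\). Under those assumptions the expectation inside the integral reduces to \(s^4 \left( \frac{1}{s^2 + t^2} - \frac{1}{s^2 + t_i^2} \right)\), so the integral can be approximated with a plain midpoint rule instead of Monte-Carlo sampling.

```python
import math

def klub_gaussian(schedule, data_std=1.0, points_per_interval=200):
    """Toy KLUB for 1D Gaussian data and an ideal denoiser (see assumptions)."""
    s2 = data_std ** 2
    total = 0.0
    for t_prev, t_next in zip(schedule, schedule[1:]):
        # Midpoint rule in u = log(t), so dt = t * du.
        lo, hi = math.log(t_prev), math.log(t_next)
        du = (hi - lo) / points_per_interval
        for k in range(points_per_interval):
            t = math.exp(lo + (k + 0.5) * du)
            # Closed-form E||D(x_t, t) - D(x_{t_i}, t_i)||^2 for this toy case.
            expected_sq_diff = s2 * s2 * (1.0 / (s2 + t * t)
                                          - 1.0 / (s2 + t_next * t_next))
            total += expected_sq_diff / t ** 3 * t * du
    return total

def geometric_schedule(t_min, t_max, n):
    """Hypothetical baseline schedule with a constant ratio between levels."""
    ratio = (t_max / t_min) ** (1.0 / n)
    return [t_min * ratio ** i for i in range(n + 1)]

coarse = geometric_schedule(0.002, 80.0, 5)
fine = geometric_schedule(0.002, 80.0, 10)
klub_coarse = klub_gaussian(coarse)
klub_fine = klub_gaussian(fine)
```

In this toy setting, refining the schedule lowers the bound, and an optimizer could move the interior points \(t_1, \dots, t_{n-1}\) to reduce `klub_gaussian` further. The actual method estimates the expectation with a trained denoiser and importance-sampled Monte-Carlo integration, which this sketch does not attempt.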

Modeling a 2D toy distribution: Samples in (b), (c), and (d) are generated using 8 steps of SDE-DPM-Solver++(2M) with EDM, LogSNR, and AYS schedules, respectively. Each image consists of 100,000 sampled points.

Experimental Results

To evaluate the usefulness of optimized schedules, we performed rigorous quantitative experiments on standard image generation benchmarks (CIFAR10, FFHQ, ImageNet), and found that these schedules result in consistent improvements in image quality (measured by FID) for a large variety of popular samplers. We also performed a user study for text-to-image models (specifically Stable Diffusion 1.5), and found that on average images generated with these schedules are preferred twice as often. Please see the paper for these results and evaluations.

Below, we showcase some text-to-image examples that illustrate how using an optimized schedule can generate images with more visual details and better text-alignment given the same number of forward evaluations (NFEs). We provide side-by-side comparisons of our optimized schedules against two of the most popular schedules used in practice (EDM and Time-Uniform). All images are generated with 10 steps of DPM-Solver++(2M), in either its stochastic version (prompts marked "casino") or its deterministic version (prompts marked "lock").

Stable Diffusion 1.5

casino Text prompt: "1girl, blue dress, blue hair, ponytail, studying at the library, focused" Model: Dreamshaper 8

casino Text prompt: "An enchanting forest path with sunlight filtering through the dense canopy, highlighting the vibrant greens and the soft, mossy floor"

casino Text prompt: "A digital Illustration of the Babel tower, 4k, detailed, trending in artstation, fantasy vivid colors"

casino Text prompt: "A glass-blown vase with a complex swirl of colors, illuminated by sunlight, casting a mosaic of shadows on a white table"

casino Text prompt: "A delicate glass pendant holding a single, luminous firefly, its glow casting warm, dancing shadows on the wearer's neck"

casino Text prompt: "A wise old owl wearing a velvet smoking jacket and spectacles, with a pipe in its beak, seated in a vintage leather armchair"

casino Text prompt: "A close-up portrait of a baby wearing a tiny spider-man costume, trending on artstation" Model: Dreamshaper 8

DeepFloyd-IF

casino Text prompt: "Capybara podcast neon sign"

casino Text prompt: "Long-exposure night photography of a starry sky over a mountain range, with light trails"

casino Text prompt: "A tranquil village nestled in a lush valley, with small, cozy houses dotting the landscape, surrounded by towering, snow-capped mountains under a clear blue sky. A gentle river meanders through the village, reflecting the warm glow of the sunrise"

casino Text prompt: "An ancient library buried beneath the earth, its halls lit by glowing crystals, with scrolls and tomes stacked in endless rows"

research type sampling

casino Text prompt: "A bustling spaceport on a distant planet, with ships of various designs taking off against a backdrop of twin moons"

research type sampling

casino Text prompt: "A set of ancient armor, standing as if worn by an invisible warrior, in front of a backdrop of medieval banners and weaponry."

research type sampling

casino Text prompt: "An elephant painting a colorful abstract masterpiece with its trunk, in a studio surrounded by amused onlookers."

research type sampling

casino Text prompt: "Tiger in construction gear, perched on aged wooden docks, formidable, curious, tiger on the waterfront, textured, vibrant, atmospheric, sharp focus, lifelike, professional lighting, cinematic, 8K"

research type sampling

casino Text prompt: "Cluttered house in the woods, anime, oil painting, high resolution, cottagecore, ghibli inspired, 4k"

research type sampling

casino Text prompt: "An old, creepy dollhouse in a dusty attic, with dolls posed in unsettling positions. Cobwebs, dim lighting, and the shadows of unseen presences create a chilling scene"

research type sampling

lock Text prompt: "A stunning, intricately detailed painting of a sunset in a forest valley, blending the rich, symmetrical styles of Dan Mumford and Marc Simonetti with astrophotography elements"

research type sampling

lock Text prompt: "Create a photorealistic scene of a powerful storm with swirling, dark clouds and fierce winds approaching a coastal village. Show villagers preparing for the storm, with detailed architecture reflecting a fantasy world"

research type sampling

lock Text prompt: "Cyberpunk cityscape with towering skyscrapers, neon signs, and flying cars"

Stable Video Diffusion

We also studied the effect of optimized schedules in video generation using the open-source image-to-video model Stable Video Diffusion. We find that using optimized schedules leads to more stable videos with fewer color distortions as the video progresses. The side-by-side comparisons use videos generated with 10 DDIM steps under the two different schedules.


Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis


Methodology: 2022-23 survey of Asian Americans


Table showing survey of Asian American adults margins of sampling error

The survey analysis is drawn from a national cross-sectional survey conducted for Pew Research Center by Westat. The sampling design of the survey was an address-based sampling (ABS) approach, supplemented by list samples, to reach a nationally representative group of respondents. The survey was fielded July 5, 2022, through Jan. 27, 2023. Self-administered screening interviews were conducted with a total of 36,469 U.S. adults either online or by mail, resulting in 7,006 interviews with Asian American adults. It is these 7,006 Asian Americans who are the focus of this report. After accounting for the complex sample design and loss of precision due to weighting, the margin of sampling error for these respondents is plus or minus 2.1 percentage points at the 95% level of confidence.
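As a rough illustration of what "loss of precision due to weighting" means for the margin of error, the sketch below compares the reported figure with the margin a simple random sample of the same size would give. The implied design effect is a back-of-the-envelope quantity derived here, not a number published in this statement, and the function name is hypothetical.

```python
import math

def srs_margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

n_respondents = 7006   # Asian American adult interviews (from the text)
reported_moe = 0.021   # plus or minus 2.1 percentage points (from the text)

srs_moe = srs_margin_of_error(n_respondents)
# Assumption: the gap between the two margins is attributed entirely
# to the complex design and weighting (a "design effect").
implied_design_effect = (reported_moe / srs_moe) ** 2

print(f"SRS margin of error: {srs_moe:.2%}")
print(f"Implied design effect: {implied_design_effect:.1f}")
```

Under these assumptions the weighted sample carries roughly the precision of a simple random sample about one-third its size.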

The survey was administered in two stages. In the first stage, a short screening survey was administered to a national sample of U.S. adults to collect basic demographics and determine a respondent’s eligibility for the extended survey of Asian Americans. Screener respondents were considered eligible for the extended survey if they self-identified as Asian (alone or in combination with any other race or ethnicity). Note that all individuals who self-identified as Asian were asked to complete the extended survey.

To maintain consistency with the Census Bureau’s definition of “Asian,” individuals responding as Asian but who self-identified with origins that did not meet the bureau’s official standards prior to the 2020 decennial census were considered ineligible and were not asked to complete the extended survey or were removed from the final sample. Those excluded were people solely of Southwest Asian descent (e.g., Lebanese, Saudi), those with Central Asian origins (e.g., Afghan, Uzbek) as well as various other non-Asian origins. The impact of excluding these groups is small, as together they represent about 1%-2% of the national U.S. Asian population, according to a Pew Research Center analysis of the 2021 American Community Survey.

Eligible survey respondents were asked in the extended survey how they identified ethnically (for example: Chinese, Filipino, Indian, Korean, Vietnamese, or some other ethnicity with a write-in option). Note that survey respondents were asked about their ethnicity rather than nationality. For example, those classified as Chinese in the survey are those self-identifying as of Chinese ethnicity, rather than necessarily being a citizen or former citizen of the People’s Republic of China. Since this is an ethnicity, classification of survey respondents as Chinese also includes those who are Taiwanese.

The research plan for this project was submitted to Westat’s institutional review board (IRB), which is an independent committee of experts that specializes in helping to protect the rights of research participants. Due to the minimal risks associated with this questionnaire content and the population of interest, this research underwent an expedited review and received approval (approval #FWA 00005551).

Throughout this methodology statement, the terms “extended survey” and “extended questionnaire” refer to the extended survey of Asian Americans that is the focus of this report, and “eligible adults” and “eligible respondents” refer to those individuals who met its eligibility criteria, unless otherwise noted.

The survey had a complex sample design constructed to maximize efficiency in reaching Asian American adults while also supporting reliable, national estimates for the population as a whole and for the five largest ethnic groups (Chinese, Filipino, Indian, Korean and Vietnamese). Asian American adults include those who self-identify as Asian, either alone or in combination with other races or Hispanic identity.

The main sample frame of the 2022-2023 Asian American Survey is an address-based sample (ABS). The ABS frame of addresses was derived from the USPS Computerized Delivery Sequence file. It is maintained by Marketing Systems Group (MSG) and is updated monthly. MSG geocodes its entire ABS frame, so block, block group, and census tract characteristics from the decennial census and the American Community Survey (ACS) could be appended to addresses and used for sampling and data collection.

All addresses on the ABS frame were geocoded to a census tract. Census tracts were then grouped into three strata based on the density of Asian American adults, defined as the proportion of Asian American adults among all adults in the tract. The three strata were defined as:

  • High density: Tracts with an Asian American adult density of 10% or higher
  • Medium density: Tracts with a density 3% to less than 10%
  • Low density: Tracts with a density less than 3%

Mailing addresses in census tracts from the lowest-density stratum (stratum 3) were excluded from the sampling frame. As a result, the frame excluded 54.1% of the 2020 census tracts and 49.1% of the U.S. adult population, including 9.1% of adults who self-identified as Asian alone or in combination with other races or Hispanic ethnicity. Among the five largest Asian ethnic subgroups, Filipinos had the largest percentage of excluded adults (6.8%), while Indians had the lowest (4.2%). Addresses were then sampled from the two remaining strata. This stratification and the assignment of differential sampling rates to the strata were critical design components because of the rareness of the Asian American adult population.
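The stratum definitions and the frame exclusion can be expressed as a small classifier. This is a minimal sketch; the function names are hypothetical, and the density is assumed to be supplied as a proportion between 0 and 1.

```python
def density_stratum(asian_adult_share):
    """Assign a census tract to a density stratum.

    asian_adult_share: proportion of Asian American adults among all
    adults in the tract (0.0 to 1.0).
    """
    if asian_adult_share >= 0.10:
        return "high"
    if asian_adult_share >= 0.03:
        return "medium"
    return "low"

def in_abs_frame(asian_adult_share):
    """Low-density tracts were excluded from the ABS sampling frame."""
    return density_stratum(asian_adult_share) != "low"

for share in (0.25, 0.10, 0.05, 0.029):
    print(share, density_stratum(share), in_abs_frame(share))
```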

Despite oversampling of the high- and medium-density Asian American strata, the ABS sample was not expected to efficiently yield the required number of completed interviews for some ethnic subgroups. Therefore, the ABS sample was supplemented with samples from the specialized surname list frames maintained by MSG. These list frames identify households using commercial databases linked to addresses and telephone numbers. The individuals’ surnames in these lists could be classified by likely ethnic origin. Westat asked MSG to produce five list frames: Chinese, Filipino, Indian, Korean and Vietnamese. The lists were subset to include only cases with a mailing address. Addresses sampled from the lists, unlike those sampled from the ABS frame, were not limited to high- and medium-density census tracts.

Once an address was sampled from either the ABS frame or the surname lists, an invitation was mailed to the address. The invitation requested that the adult in the household with the next birthday complete the survey.

To maximize response, the survey used a sequential mixed-mode protocol in which sampled households were first directed to respond online and later mailed a paper version of the questionnaire if they did not respond online.

Table showing sample allocation and Asian American incidence by sampling frame

The first mailing was a letter introducing the survey and providing the information necessary (URL and unique PIN) for online response. A pre-incentive of $2 was included in the mailing. This and the remaining screener recruitment letters focused on the screener survey, without mentioning the possibility of eligibility for a longer survey and the associated promised incentive, since most people would only be asked to complete the short screening survey. It was important for all households to complete the screening survey, not just those who identify as Asian American. As such, the invitation did not mention that the extended survey would focus on topics surrounding the Asian American experience. The invitation was kept generic to minimize the risk of nonresponse bias due to topic salience.

After one week, Westat sent a postcard reminder to all sampled individuals, followed three weeks later by a reminder letter to nonrespondents. Approximately 8.5 weeks after the initial mailing, Westat sent nonrespondents a paper version of the screening survey, which was a four-page booklet (one folded 11×17 sheet), along with a cover letter and a postage-paid return envelope. If no response was obtained from those four mailings, no further contact was made.
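The screener contact sequence can be laid out as a simple date schedule. The sketch below is illustrative only: the function name and the example first-mailing date are hypothetical, since the actual mailing dates are not given here.

```python
from datetime import date, timedelta

def screener_mailing_schedule(first_mailing):
    """Approximate dates for the four screener mailings described above."""
    postcard = first_mailing + timedelta(weeks=1)
    reminder_letter = postcard + timedelta(weeks=3)
    # "Approximately 8.5 weeks": date arithmetic keeps whole days only,
    # so the half day is dropped (59 days after the first mailing).
    paper_screener = first_mailing + timedelta(weeks=8.5)
    return {
        "invitation letter": first_mailing,
        "postcard reminder": postcard,
        "reminder letter": reminder_letter,
        "paper screener": paper_screener,
    }

# Hypothetical first-mailing date, used only to show the spacing.
schedule = screener_mailing_schedule(date(2022, 7, 5))
for mailing, when in schedule.items():
    print(f"{mailing}: {when.isoformat()}")
```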

Eligible adults who completed the screening interview on the web were immediately asked to continue with the extended questionnaire. If an eligible adult completed the screener online but did not complete the extended interview, Westat sent them a reminder letter. This was performed on a rolling basis when it had been at least one week since the web breakoff. Names were not collected until the end of the web survey, so these letters were addressed to “Recent Participant.”

If an eligible respondent completed a paper screener, Westat mailed them the extended survey and a postage-paid return envelope. This was sent weekly as completed paper screeners arrived. Westat followed these paper mailings with a reminder postcard. Later, Westat sent a final paper version via FedEx to eligible adults who had not completed the extended interview online or by paper.

A pre-incentive of $2 (in the form of two $1 bills) was sent to all sampled addresses with the first letter, which provided information about how to complete the survey online. This and subsequent screener invitations only referred to the pre-incentive without reference to the possibility of later promised incentives.

Respondents who completed the screening survey and were found eligible were offered a promised incentive of $10 to go on and complete the extended survey. All participants who completed the extended web survey were offered their choice of a $10 Amazon.com gift code instantly or $10 cash mailed. All participants who completed the survey via paper were mailed a $10 cash incentive.

In December 2022, a mailing was added for eligible respondents who had completed a screener questionnaire, either by web or paper, but who had not yet completed the extended survey. It was sent to those who had received their last mailing in the standard sequence at least four weeks earlier. It included a cover letter, a paper copy of the extended survey, and a business reply envelope, and was assembled in a 9×12 envelope with a $1 bill made visible through the envelope window.

In the last month of data collection, an additional mailing was added to boost the number of Vietnamese respondents. A random sample of 4,000 addresses from the Vietnamese surname list and 2,000 addresses from the ABS frame that were flagged as likely Vietnamese were sent another copy of the first invitation letter, which contained web login credentials but no paper copy of the screener. This was sent in a No. 10 envelope with a wide window and was assembled with a $1 bill visible through the envelope window.

The mail and web screening and extended surveys were developed in English and translated into Chinese (Simplified and Traditional), Hindi, Korean, Tagalog and Vietnamese. For web, the landing page was displayed in English initially but included banners at the top and bottom of the page that allowed respondents to change the displayed language. Once in the survey, a dropdown button at the top of each page was available to respondents to toggle between languages.

The paper surveys were also formatted into all six languages. Recipients thought to be more likely to use a specific language option, based on supplemental information in the sampling frame or their address location, were sent a paper screener in that language in addition to an English screener questionnaire. Those receiving a paper extended instrument were sent the extended survey in the language in which the screener was completed. For web, respondents continued in their selected language from the screener.

Household-level weighting

The first step in weighting was creating a base weight for each sampled mailing address to account for its probability of selection into the sample. The base weight for mailing address k is called BW_k and is defined as the inverse of its probability of selection. The ABS sample addresses had a probability of selection based on the stratum from which they were sampled. The supplemental samples (i.e., Chinese, Filipino, Indian, Korean and Vietnamese surname lists) also had a probability of selection from the list frames. Because all of the addresses in the list frames are also included in the ABS frame, these addresses had multiple opportunities to be selected, and the base weights include an adjustment to account for their higher probability of selection.
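A minimal sketch of a base weight with a multiple-frame adjustment follows. The exact adjustment Westat applied is not described here; this version assumes the ABS and list draws are independent, so the overall selection probability is one minus the chance of being missed by both. The function and parameter names are hypothetical.

```python
def base_weight(p_abs, p_list=0.0):
    """Inverse of the overall probability of selection for one address.

    p_abs:  selection probability from the ABS frame (0 if the address
            lies in an excluded low-density tract).
    p_list: selection probability from a surname list frame (0 if the
            address is not on any list).
    Assumption: the two draws are independent.
    """
    p_any = 1.0 - (1.0 - p_abs) * (1.0 - p_list)
    return 1.0 / p_any

print(base_weight(0.01))         # ABS only
print(base_weight(0.01, 0.05))   # on both frames: smaller weight
print(base_weight(0.0, 0.05))    # list only (e.g., low-density tract)
```

An address reachable through both frames gets a smaller weight than one reachable through the ABS frame alone, which is the direction of the adjustment the text describes.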

Each sampled mailing address was assigned to one of four categories according to its final screener disposition. The categories were 1) household with a completed screener interview, 2) household with an incomplete screener interview, 3) ineligible (i.e., not a household, which were primarily postmaster returns), and 4) addresses for which status was unknown (i.e., addresses that were not identified as undeliverable by the USPS but from which no survey response was received).

The second step in the weighting process was adjusting the base weight to account for occupied households among those with unknown eligibility (category 4). Previous ABS studies have found that about 13% of all addresses in the ABS frame were either vacant or not home to anyone in the civilian, non-institutionalized adult population. For this survey, it was assumed that 87% of all sampled addresses from the ABS frame were eligible households. However, this value was not appropriate for the addresses sampled from the list frames, which were expected to have a higher proportion of households as these were maintained lists. For the list samples, the occupied household rate was computed as the proportion of list cases in category 3 compared to all resolved list cases (i.e., the sum of categories 1 through 3). The base weights for the share of category 4 addresses (unknown eligibility) assumed to be eligible were then allocated to cases in categories 1 and 2 (known households) so that the sum of the combined category 1 and 2 base weights equaled the number of addresses assumed to be eligible in each frame. The category 3 ineligible addresses were given a weight of zero.
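The reallocation of unknown-eligibility weight can be sketched as a single ratio adjustment. How the share of category 4 weight assumed eligible is derived from the occupancy assumption is not fully spelled out above, so it is left as an input here; the function name and the example numbers are hypothetical.

```python
def adjust_for_unknown_eligibility(cases, eligible_share_of_unknown):
    """Spread the eligible part of category 4 weight over categories 1 and 2.

    cases: list of (category, base_weight) tuples, category in {1, 2, 3, 4}.
    eligible_share_of_unknown: assumed share of category 4 weight that
        belongs to real households (an input, not computed here).
    Returns adjusted weights in the same order as `cases`.
    """
    known_households = sum(w for cat, w in cases if cat in (1, 2))
    unknown = sum(w for cat, w in cases if cat == 4)
    factor = (known_households + eligible_share_of_unknown * unknown) / known_households
    # Categories 3 (ineligible) and 4 (unknown) end up with zero weight.
    return [w * factor if cat in (1, 2) else 0.0 for cat, w in cases]

# Made-up example: completed, incomplete, ineligible, unknown.
example = [(1, 100.0), (2, 50.0), (3, 20.0), (4, 150.0)]
print(adjust_for_unknown_eligibility(example, 0.8))
```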

The next step was adjusting for nonresponse for households without a completed screener interview to create a final household weight. This adjustment allocated the weights of nonrespondents (category 2) to those of respondents (category 1) within classes defined by the cross-classification of sampling strata, census region, and sample type (e.g., ABS and list supplemental samples). Those classes with fewer than 50 sampled addresses or large adjustment factors were collapsed with nearby cells within the sample type. Given the large variance in the household weights among the medium density ABS stratum, final household weights for addresses within this stratum were capped at 300.
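A weighting-class nonresponse adjustment with an optional cap might look like the sketch below. It omits the collapsing of small cells, and the cap is a plain parameter (in the survey, the cap of 300 applied only to the medium-density ABS stratum). All names and example values are hypothetical.

```python
from collections import defaultdict

def nonresponse_adjust(cases, cap=None):
    """Shift nonrespondent weight to respondents within each cell.

    cases: list of dicts with keys 'cell', 'responded' (bool), 'weight'.
    cap:   optional maximum for the adjusted weight.
    Returns adjusted weights in the same order as `cases`.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for c in cases:
        total[c["cell"]] += c["weight"]
        if c["responded"]:
            responding[c["cell"]] += c["weight"]

    adjusted = []
    for c in cases:
        if not c["responded"]:
            adjusted.append(0.0)  # nonrespondents drop out
            continue
        w = c["weight"] * total[c["cell"]] / responding[c["cell"]]
        adjusted.append(min(w, cap) if cap is not None else w)
    return adjusted

example = [
    {"cell": "medium/West/ABS", "responded": True, "weight": 100.0},
    {"cell": "medium/West/ABS", "responded": False, "weight": 100.0},
    {"cell": "medium/West/ABS", "responded": True, "weight": 200.0},
]
print(nonresponse_adjust(example))
print(nonresponse_adjust(example, cap=200.0))
```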

Weighting of extended survey respondents

The extended interview nonresponse adjustment began by assigning each case that completed the screener interview to one of three dispositions: 1) eligible adult completed the extended interview; 2) eligible adult did not complete the extended interview; and 3) not eligible for the extended interview.

An initial adult base weight was calculated for the cases with a completed extended interview as the product of the truncated number of adults in the household (max value of 3) and the household weight. This adjustment accounted for selecting one adult in each household.
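The adult base weight calculation is a one-line product; a small sketch, with hypothetical names:

```python
def adult_base_weight(household_weight, n_adults, max_adults=3):
    """Household weight times the number of adults, truncated at max_adults."""
    return household_weight * min(n_adults, max_adults)

print(adult_base_weight(120.0, 2))  # two adults in the household
print(adult_base_weight(120.0, 5))  # five adults, truncated to three
```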

The final step in the adult weighting was calibrating the adult weights for those who completed the extended interview so that the calibrated weights (i.e., the estimated number of adults) aligned with benchmarks for non-institutionalized Asian adults from the 2016-2020 American Community Surveys Public Use Microdata Sample (PUMS). Specifically, raking was used to calibrate the weights on the following dimensions:

  • Ethnic group (Chinese, Filipino, Indian, Japanese, Korean, Vietnamese, other single Asian ethnicities, and multiple Asian ethnicities)
  • Collapsed ethnic group (Chinese, Filipino, Indian, Korean, Vietnamese, all other single and multiple Asian ethnicities) by age group
  • Collapsed ethnic group by sex
  • Collapsed ethnic group by census region
  • Collapsed ethnic group by education
  • Collapsed ethnic group by housing tenure
  • Collapsed ethnic group by nativity
  • Income group by number of persons in the household

The control totals used in raking were based on the entire population of Asian American adults (including those who live in the excluded stratum) to correct for both extended interview nonresponse and undercoverage from excluding the low-density stratum in the ABS frame.
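Raking (iterative proportional fitting) repeatedly scales the weights so that each dimension's weighted totals match its control totals. The sketch below uses two toy dimensions and made-up control totals; it is not the production procedure, which raked on the eight dimensions listed above against ACS benchmarks.

```python
def rake(weights, groups_by_dim, targets_by_dim, n_iter=100, tol=1e-10):
    """Rake weights to match control totals on several dimensions.

    groups_by_dim:  one list per dimension giving each case's category.
    targets_by_dim: one dict per dimension mapping category -> control total.
    """
    w = list(weights)
    for _ in range(n_iter):
        largest_shift = 0.0
        for groups, targets in zip(groups_by_dim, targets_by_dim):
            totals = {}
            for g, wi in zip(groups, w):
                totals[g] = totals.get(g, 0.0) + wi
            factors = {g: targets[g] / totals[g] for g in totals}
            w = [wi * factors[g] for g, wi in zip(groups, w)]
            largest_shift = max(largest_shift,
                                max(abs(f - 1.0) for f in factors.values()))
        if largest_shift < tol:  # all margins already match
            break
    return w

# Toy data: four respondents, two dimensions, made-up control totals.
ethnic = ["A", "A", "B", "B"]
sex = ["F", "M", "F", "M"]
raked = rake([1.0, 2.0, 1.0, 1.0],
             [ethnic, sex],
             [{"A": 60.0, "B": 40.0}, {"F": 55.0, "M": 45.0}])
print([round(x, 2) for x in raked])
```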

Variance estimation

Because the modeled estimates used in the weighting are themselves subject to sampling error, variance estimation and tests of statistical significance were performed using the grouped jackknife estimator (JK2). One hundred sets of replicates were created by deleting a group of cases within each stratum from each replicate and doubling the weights for a corresponding set of cases in the same stratum. The entire weighting and modeling process was performed on the full sample and then separately repeated for each replicate. The result is a total of 101 separate weights for each respondent that have incorporated the variability from the complex sample design. [1]
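A textbook form of the JK2 estimator can be sketched as follows. It assumes each case carries a variance-stratum index and a group flag (0 or 1), both hypothetical inputs, and it skips the step of re-running the full weighting for every replicate, which the survey did perform.

```python
def jk2_replicate_weights(weights, var_stratum, var_group):
    """One replicate per variance stratum: zero out group 0, double group 1."""
    n_reps = max(var_stratum) + 1
    replicates = []
    for r in range(n_reps):
        rep = []
        for w, s, g in zip(weights, var_stratum, var_group):
            if s != r:
                rep.append(w)        # other strata are left unchanged
            elif g == 0:
                rep.append(0.0)      # deleted group
            else:
                rep.append(2.0 * w)  # paired group gets doubled
        replicates.append(rep)
    return replicates

def weighted_mean(y, w):
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

def jk2_variance(y, weights, replicates):
    """Sum of squared deviations of replicate estimates from the full estimate."""
    full = weighted_mean(y, weights)
    return sum((weighted_mean(y, rw) - full) ** 2 for rw in replicates)

# Toy data: two variance strata with two cases each.
y = [1, 0, 1, 1]
w = [1.0, 1.0, 1.0, 1.0]
reps = jk2_replicate_weights(w, [0, 0, 1, 1], [0, 1, 0, 1])
print(jk2_variance(y, w, reps))
```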

Response rates

Westat assigned all sampled cases a result code for their participation in the screener, and then assigned a result code for the extended questionnaire to those who were eligible for the survey of Asian Americans. Two of the dispositions warrant some discussion. One is the category “4.313 No such address.” This category is for addresses that were returned by the U.S. Postal Service as not being deliverable. This status indicates the address, which was on the USPS Delivery Sequence File at the time of sampling, currently is not occupied or no longer exists. The second category is “4.90 Other.” This category contains 588 addresses that were never mailed because they had a drop count greater than four. Drop points are addresses with multiple households that share the same address. The information available in the ABS frame on drop points is limited to the number of drop points at the address, without information on the type of households at the drop point or how they should be labeled for mailing purposes. In this survey, all drop points were eligible for sampling, but only those with drop point counts of four or fewer were mailed. Westat treated drop point counts of five or more as out of scope, and no mailing was done for those addresses.

Westat used the disposition results to compute response rates consistent with AAPOR definitions. The response rates are weighted by the base weight to account for the differential sampling in this survey. The AAPOR RR3 response rate to the screening interview was 17.0%. [2] The RR1 response rate to the extended Asian American interview (77.9%) is the number of eligible adults completing the questionnaire over the total sampled for that extended questionnaire. The overall response rate is the product of the screener response rate and the conditional response rate for the extended questionnaire. The overall response rate for the Asian American sample in the Pew Research Center survey was 13.3% (17.0% x 77.9%).
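The overall response rate is a simple product of the two stage-level rates. Multiplying the rounded rates quoted above gives a value slightly below the published 13.3%; the difference is presumably because the published figure was computed from unrounded rates, which is an assumption on our part.

```python
screener_rr3 = 0.170   # screener response rate, as published (rounded)
extended_rr1 = 0.779   # conditional extended-survey response rate (rounded)

overall = screener_rr3 * extended_rr1
print(f"Overall response rate from rounded inputs: {overall:.1%}")  # 13.2%
```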

Table showing AAPOR disposition codes

Survey analysis of Asian adults living in poverty is based on 561 respondents of the 2022-23 survey of Asian Americans whose approximate family income falls at or below the 2022 federal poverty line published by the U.S. Department of Health and Human Services (HHS).

The survey asked respondents to choose their family income brackets in the 12 months prior to the survey. These income brackets were converted into dollars in the following ways:

  • For those reporting a family income of less than $12,500, $12,499 was used as a proxy for their family income.
  • For respondents reporting income brackets that are between $12,500 and $149,999, the midpoint of the selected income bracket was used as a proxy. For example, if they chose “$12,500 to $14,999,” $13,750 was used.
  • For respondents reporting a family income of $150,000 or more, $150,000 was used.
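The three conversion rules can be captured in one small function. The bracket bounds are passed in as inputs because the full list of brackets is not reproduced here; the function name and the use of `None` for open-ended brackets are illustrative choices.

```python
def income_proxy(bracket_low, bracket_high):
    """Convert a reported income bracket to a dollar proxy.

    bracket_low is None for the lowest bracket ("less than $12,500");
    bracket_high is None for the top bracket ("$150,000 or more").
    """
    if bracket_low is None:
        return 12_499
    if bracket_high is None:
        return 150_000
    # Midpoint, treating "$12,500 to $14,999" as spanning 12,500 up to 15,000.
    return (bracket_low + bracket_high + 1) // 2

print(income_proxy(None, 12_499))     # 12499
print(income_proxy(12_500, 14_999))   # 13750
print(income_proxy(150_000, None))    # 150000
```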

The survey also asked respondents how many adults ages 18 or older live in their household including themselves, from one to 10 adults. Additionally, the survey asked how many children under 18 live in their household, from zero to 10 children. These responses were used to calculate their total family (household) size. Asian adults were categorized as “living near or below the poverty line” if their approximate family income, after being adjusted for family size, falls at or below 100% of the 2022 federal poverty line. For example, respondents with a household size of four were categorized as “living near or below the poverty line” if their approximate family income is $27,750 or less. [3] All Asian adults whose approximate family income is $12,499 were categorized as “living near or below the poverty line” regardless of family size, since those respondents have an income under the 2022 federal poverty line for a family of one. All Asian adults who meet the criteria above are used for the analysis of Asians in poverty, irrespective of their status as students or not.
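The classification rule can be sketched as below. The dollar constants are the 2022 HHS poverty guidelines for the 48 contiguous states and D.C. ($13,590 for one person plus $4,720 per additional person); they are supplied here rather than quoted in this statement, though they reproduce the $27,750 threshold for a family of four given above. Function names are hypothetical.

```python
# 2022 HHS poverty guidelines, 48 contiguous states and D.C.
# (constants added for illustration; not quoted in the text above)
HHS_2022_ONE_PERSON = 13_590
HHS_2022_PER_ADDITIONAL_PERSON = 4_720

def poverty_line_2022(family_size):
    return HHS_2022_ONE_PERSON + HHS_2022_PER_ADDITIONAL_PERSON * (family_size - 1)

def at_or_below_poverty_line(approx_family_income, n_adults, n_children):
    """True if the income proxy is at or below 100% of the poverty line."""
    family_size = n_adults + n_children
    return approx_family_income <= poverty_line_2022(family_size)

print(poverty_line_2022(4))                      # 27750
print(at_or_below_poverty_line(12_499, 1, 0))    # True
print(at_or_below_poverty_line(32_500, 2, 2))    # False
```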

A number of sensitivity checks were performed to test the robustness of the findings, and the main conclusions were consistently upheld. These sensitivity checks included using the poverty thresholds published by the Census Bureau instead of the poverty line published by HHS to define Asians in poverty, and excluding full-time students from the analysis even if their family income falls at or below the poverty line.

[1] For additional details on jackknife replication, refer to Rust, K.F., and J.N.K. Rao. 1996. “Variance estimation for complex surveys using replication techniques.” Statistical Methods in Medical Research.
[2] The weighted share of unscreened households assumed to be eligible for the screener interview (occupied “e”) was 87%.
[3] The U.S. Department of Health and Human Services has separate poverty guidelines for the 48 contiguous states and the District of Columbia, Alaska and Hawaii. For all respondents in the 2022-23 survey of Asian Americans, poverty status was determined by applying the federal poverty line for the 48 contiguous states and D.C., regardless of respondents’ state of residence.


National Center for Science and Engineering Statistics


Business R&D Performance in the United States Tops $600 Billion in 2021

September 28, 2023

Funds spent for business R&D performed in the United States, by type of R&D, source of funds, and size of company: 2018–21

i = more than 50% of the estimate is a combination of imputation and reweighting to account for nonresponse.

a Domestic R&D performance is the cost of R&D paid for and performed by the respondent company and paid for by others outside of the company and performed by the respondent company.
b R&D comprises creative and systematic work undertaken in order to increase the stock of knowledge and to devise new applications of available knowledge. This includes (1) activities aimed at acquiring new knowledge or understanding without specific immediate commercial applications or uses (basic research), (2) activities aimed at solving a specific problem or meeting a specific commercial objective (applied research), and (3) systematic work, drawing on research and practical experience and resulting in additional knowledge, which is directed to producing new processes or to improving existing products (goods or services) or processes (development).
c Includes foreign subsidiaries of U.S. companies.
d Includes companies located inside and outside the United States; U.S. state government agencies and laboratories; U.S. universities, colleges, and academic researchers; and all other organizations located inside and outside the United States.
e Includes only companies with 10 or more domestic employees.

Detail may not add to total because of rounding.

National Center for Science and Engineering Statistics and Census Bureau, Business Enterprise Research and Development Survey.

R&D Performance, by Type of R&D, Industry Sector, and Source of Funding

In 2021, of the $602 billion that companies spent on R&D, $40 billion (7%) was spent on basic research, $86 billion (14%) on applied research, and $476 billion (79%) on development (table 1). In 2021, companies in manufacturing industries performed $326 billion (54%) of domestic R&D, defined as R&D performed in the 50 states and Washington, DC (table 2). Most of the funding came from these companies’ own funds (88%). Companies in nonmanufacturing industries performed $276 billion of domestic R&D (46% of total domestic R&D performance), 87% of which was paid for from companies’ own funds.
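The percentage shares quoted above follow directly from the rounded dollar figures, as this short check illustrates (variable names are ours):

```python
total_rd = 602  # billions of dollars, 2021

by_type = {"basic research": 40, "applied research": 86, "development": 476}
by_sector = {"manufacturing": 326, "nonmanufacturing": 276}

type_shares = {k: round(100 * v / total_rd) for k, v in by_type.items()}
sector_shares = {k: round(100 * v / total_rd) for k, v in by_sector.items()}

print(type_shares)     # {'basic research': 7, 'applied research': 14, 'development': 79}
print(sector_shares)   # {'manufacturing': 54, 'nonmanufacturing': 46}
```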

The U.S. federal government was a large source of external funding for R&D (also referred to as R&D paid for by others) across all industries. Of the $75 billion paid for by others, the federal government accounted for $24 billion. Seventy-four percent of federal government funding went to three industry groups: aerospace products and parts (North American Industry Classification System [NAICS] code 3364) ($11 billion), scientific research and development services (NAICS 5417) ($4 billion), and computer and electronic products (NAICS 334) ($3 billion). Companies also received funds from other U.S. companies ($27 billion) and foreign companies, including foreign parent companies of U.S. subsidiaries ($23 billion). Eighteen billion dollars (69%) of all business R&D funded by other U.S. companies was for scientific research and development services (NAICS 5417). The distribution of foreign company R&D funding was spread more broadly across multiple industries (table 2). (See “Survey Information and Data Availability” for information on the availability of data tables with full industry detail.)

Funds spent for business R&D performed in the United States, by source of funds, selected industry, and company size: 2021

D = suppressed to avoid disclosure of confidential information; i = more than 50% of the estimate is a combination of imputation and reweighting to account for nonresponse; r = relative standard error is more than 50%.

NAICS = North American Industry Classification System; nec = not elsewhere classified.

a All R&D is the cost of R&D paid for and performed by the respondent company and paid for by others outside of the company and performed by the respondent company.
b Includes foreign subsidiaries of U.S. companies ($32.1 billion).
c Includes foreign parent companies of U.S. subsidiaries ($20.8 billion) and unaffiliated companies ($2.5 billion). Excludes funds from foreign subsidiaries to U.S. companies paid for through intercompany transactions ($32.1 billion).
d Includes U.S. state government agencies and laboratories ($0.3 billion); U.S. universities, colleges, and academic researchers (< $0.01 billion); and all other organizations located inside ($0.7 billion) and outside the United States (< $0.01 billion).
e Includes only companies with 10 or more domestic employees.

Detail may not add to total because of rounding. Industry classification was based on the dominant business code for domestic R&D performance, where available. For companies that did not report business codes, the classification used for sampling was assigned. Statistics are representative of companies located in the United States that performed or funded $50,000 or more of R&D.

National Center for Science and Engineering Statistics and Census Bureau, Business Enterprise Research and Development Survey, 2021.

Sales, R&D Intensity, and Employment of Companies That Performed or Funded R&D

U.S. companies that performed or funded R&D reported domestic net sales of $13 trillion in 2021 (table 3). For all industries, the R&D intensity (R&D-to-sales ratio) was 4.6%; for manufacturers, 5.0%; and for nonmanufacturers, 4.2%. Manufacturing industries with high levels of R&D intensity in 2021 were pharmaceuticals and medicines (NAICS 3254) (16.1%) and computer and electronic products (NAICS 334) (13.0%). Among the nonmanufacturing industries, industries with high levels of R&D intensity were scientific research and development services (NAICS 5417) (41.2%), software publishers (NAICS 5112) (12.9%), and computer systems design and related services (NAICS 5415) (10.2%).
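As a rough check on the all-industries figure, R&D intensity can be computed directly as R&D divided by domestic net sales. The sketch below uses the rounded totals quoted in this InfoBrief ($602 billion of business R&D and $13 trillion of sales); the function name `rd_intensity` is a hypothetical helper, and because the inputs are rounded the result is only approximate.

```python
def rd_intensity(rd_spending: float, net_sales: float) -> float:
    """Return the R&D-to-sales ratio as a fraction (0.05 means 5%)."""
    if net_sales <= 0:
        raise ValueError("net_sales must be positive")
    return rd_spending / net_sales


# Rounded totals from the text: $602 billion R&D, $13 trillion sales.
all_industries = rd_intensity(602e9, 13e12)
print(f"{all_industries:.1%}")  # 4.6%
```

The same function applies to any industry row, provided the R&D and sales figures are taken from the same row of table 3.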

Businesses that performed or funded R&D employed 23.7 million people in the United States in 2021 (table 3). Approximately 2.1 million (9%) were business R&D employees.

Of the 2.1 million people working on R&D in companies that performed or funded business R&D in 2021, 1.5 million were men and 0.6 million were women; 48% of the men and 45% of the women worked in manufacturing industries (table 4). Researchers (that is, scientists, engineers, and their managers) accounted for 1.4 million of the 2.1 million R&D workers (67%). Of the researchers, 130,000 (9%) held PhD degrees. R&D technicians numbered 501,000, and 205,000 were grouped as other supporting staff.

Sales, R&D, R&D intensity, and employment for companies that performed or funded business R&D in the United States, by selected industry and company size: 2021

a Dollar values are for goods sold or services rendered by R&D-performing or R&D-funding companies located in the United States to customers outside of the company, including the U.S. federal government, foreign customers, and the company's foreign subsidiaries. Included are revenues from a company’s foreign operations and subsidiaries and from discontinued operations. If a respondent company is owned by a foreign parent company, sales to the parent company and to affiliates not owned by the respondent company are included. Excluded are intracompany transfers, returns, allowances, freight charges, and excise, sales, and other revenue-based taxes. b All R&D is the cost of R&D paid for and performed by the respondent company and paid for by others outside of the company and performed by the respondent company. c R&D intensity is the cost of domestic R&D paid for by the respondent company and others outside of the company and performed by the company divided by domestic net sales of companies that performed or funded R&D. d Data recorded on 12 March represent employment figures for the year. e Headcounts of researchers, R&D managers, technicians, clerical staff, and others assigned to R&D groups. f Includes only companies with 10 or more domestic employees.

Detail may not add to total because of rounding. Industry classification was based on the dominant business code for domestic R&D performance, where available. For companies that did not report business codes, the classification used for sampling was assigned.

Domestic employment, R&D employment by sex and work activity, R&D researchers by level of education, and full-time equivalent researcher employment for companies that performed or funded business R&D in the United States, by industrial sector: 2021

NAICS = North American Industry Classification System.

a Data recorded on 12 March represent employment figures for the year. b Includes R&D scientists and engineers and their managers. c Includes clerical staff and others assigned to R&D groups. d The number of persons employed who were assigned full time to R&D, plus a prorated number of employees who worked on R&D only part of the time.

Detail may not add to total because of rounding. Industry classification was based on the dominant business code for domestic R&D performance, where available. For companies that did not report business codes, the classification used for sampling was assigned. Excludes data for federally funded research and development centers. Also available in the full set of data tables are statistics on domestic R&D employment, by state; foreign R&D personnel headcounts, by country; and headcounts of leased (i.e., external) R&D personnel, by function.

R&D Performance, by Company Size

Small- and medium-sized companies (10–249 domestic employees) performed 9.8% of the nation’s total business R&D in 2021 (table 3). For these companies as a group, the R&D intensity was 8.8%. These companies accounted for 5% of sales and employed 7% of the 23.7 million employees who worked for R&D-performing or R&D-funding companies. They employed 18% of the 2.1 million employees engaged in business R&D in the United States.

Large companies with 250–24,999 domestic employees performed 52% of the nation’s total business R&D in 2021, and their R&D intensity was 4.7%. They accounted for 51% of sales, employed 42% of those who worked for R&D-performing or R&D-funding companies, and employed 51% of R&D employees in the United States.

The largest companies (25,000 or more domestic employees) performed 38% of the nation’s total business R&D in 2021, and their R&D intensity was 4.0%. They accounted for 44% of sales, employed 51% of those who worked for R&D-performing or R&D-funding companies, and employed 31% of business R&D employees in the United States.

R&D Performance, by State

In 2021, of the $602 billion of R&D performed in the United States, businesses in California alone accounted for 35.1% (table 5). Other states with large amounts of business R&D were Washington (8.1% of the national total in 2021), Massachusetts (6.6%), Texas (4.7%), New York (4.4%), and New Jersey (4.2%).

Funds spent for business R&D performed in the United States, by state and source of funds: 2021

a All R&D is the cost of domestic R&D paid for by the respondent company and others outside of the company and performed by the respondent company. b Includes data reported that were not allocated to a specific state by multi-establishment companies. For single-establishment companies, data reported were allocated to the state in the address used to mail the survey form.

Capital Expenditures

Companies that performed or funded R&D in the United States in 2021 spent $793 billion on capital, that is, assets with expected useful lives of more than 1 year (table 6). Of this amount, $53 billion (7%) was for assets used for domestic R&D operations (i.e., land acquisitions, buildings and land improvement, equipment, capitalized software, and other assets). Companies in manufacturing industries spent $28 billion on capital for domestic R&D, and companies in nonmanufacturing industries spent $24 billion. Industries with high levels of capital expenditures on assets used for domestic R&D in 2021 were pharmaceuticals and medicines (NAICS 3254) ($7.5 billion, or 14% of national capital expenditures on assets used for R&D) and semiconductor and other electronic products (NAICS 3344) ($5 billion, or 9%). Among all types of capital assets, manufacturing industries spent the most on equipment ($15 billion, or 53% of total capital assets used for domestic R&D), and nonmanufacturing industries disbursed the most on capitalized software ($13.7 billion, or 56%).

Capital expenditures in the United States, total and used for domestic R&D, by type of expenditure, industry, and company size: 2021

* = amount < $500,000; i = more than 50% of the estimate is a combination of imputation and reweighting to account for nonresponse; r = relative standard error is more than 50%.

a Domestic R&D is the R&D paid for by the respondent company and others outside of the company and performed by the company. b Capital expenditures are payments by a business for assets that usually have a useful life of more than 1 year. The value of assets acquired or improved through capital expenditures is recorded on a company’s balance sheet. BERD Survey statistics exclude the cost of assets acquired through mergers and acquisitions. c Capital expenditures for long-lived assets used in a company’s R&D operations are not included in its R&D expense, but any depreciation recorded for those assets is included in its R&D expense. For 2021, depreciation associated with domestic R&D paid for and performed by the company was $18.4 billion and with domestic R&D performed by the company and paid for by others was $2.7 billion. d Includes the cost of purchased or improved buildings and other facilities that are fixed to the land. e Includes the cost of other capital expenditures, including purchased patents and other intangible assets, and expenditures not distributed among the categories shown. f Includes only companies with 10 or more domestic employees.

Detail may not add to total because of rounding. Industry classification was based on dominant business code for domestic R&D performance, where available. For companies that did not report business codes, the classification used for sampling was assigned. An estimate range may be displayed in place of a single estimate to avoid disclosing operations of individual companies.

National Center for Science and Engineering Statistics and U.S. Census Bureau, Business Enterprise Research and Development Survey, 2021.

Survey Information and Data Availability

The sample for the BERD Survey was selected to represent all for-profit, nonfarm companies that were publicly or privately held, had 10 or more employees in the United States, and performed or funded R&D either domestically or abroad. The estimates in this InfoBrief are based on responses from a sample of the population and may differ from actual values because of sampling variability or other factors. As a result, apparent differences between the estimates for two or more groups may not be statistically significant. All comparative statements in this InfoBrief have undergone statistical testing and are significant at the 90% confidence level unless otherwise noted. The variances of estimates in this report were calculated using design-based formulas. Also, because the statistics from the survey are based on a sample, they are subject to both sampling and nonsampling errors. (See the 2021 “Technical Notes” at https://ncses.nsf.gov/surveys/business-enterprise-research-development/.)

Beginning in survey year 2018, companies that performed or funded less than $50,000 of R&D were excluded from tabulation.

In this InfoBrief, money amounts are expressed in current U.S. dollars and are not adjusted for inflation. A company is defined as a business organization located in the United States, either U.S. owned or a U.S. affiliate of a foreign parent company, of one or more establishments under common ownership or control.

For 2020, a total of 47,500 companies were sampled to represent the population of 1,140,000 companies; for 2021, a total of 47,500 companies were sampled, representing 1,137,000 companies. The actual numbers of reporting units in the sample that remained within the scope of the survey between sample selection and tabulation were 44,500 for 2020 and 44,000 for 2021. These lower counts represent the number of reporting units that were determined to be within the scope of the survey after all data collected were processed. Reasons for the reduced counts include mergers, acquisitions, and instances where companies had fewer than 10 employees in the United States or had gone out of business in the interim. Of these in-scope reporting units, 67% were considered to have met the criteria for a complete response to the 2020 survey; 69% fulfilled the 2021 complete response criteria. Coverage of the previous year’s known positive R&D stratum for 2020 was 92%; the coverage rate for 2021 was also 92%. Industry classification was based on the dominant business activity for domestic R&D performance, where available. For reporting units that did not report business activity codes for R&D, the classification used for sampling was assigned.
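The counts in the paragraph above imply a few summary rates that are easy to derive. The following is a minimal sketch using only the rounded figures quoted in the text; the dictionary layout and the `summarize` helper are hypothetical, and the derived response counts are approximations because both the counts and the percentages are rounded in the source.

```python
# Rounded figures quoted in the text for each survey year.
SURVEY = {
    2020: {"sampled": 47_500, "population": 1_140_000,
           "in_scope": 44_500, "complete_rate": 0.67},
    2021: {"sampled": 47_500, "population": 1_137_000,
           "in_scope": 44_000, "complete_rate": 0.69},
}


def summarize(year: int) -> dict:
    """Derive sampling fraction, in-scope share, and approximate complete responses."""
    s = SURVEY[year]
    return {
        # Share of the represented population that was sampled.
        "sampling_fraction": s["sampled"] / s["population"],
        # Share of sampled units still in scope at tabulation.
        "in_scope_share": s["in_scope"] / s["sampled"],
        # Approximate number of in-scope units with a complete response.
        "approx_complete_responses": round(s["in_scope"] * s["complete_rate"]),
    }


for year in sorted(SURVEY):
    r = summarize(year)
    print(year,
          f"{r['sampling_fraction']:.1%}",
          f"{r['in_scope_share']:.1%}",
          r["approx_complete_responses"])
```

The overall sampling fraction is only an average: a stratified design such as this one typically samples some strata (for example, known R&D performers) at much higher rates than others, so it should not be read as a per-company selection probability.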

The estimation methodology for state estimates in the BERD Survey takes the form of a hybrid estimator, combining the unweighted reported amount, by state, with a weighted amount apportioned (or raked) across states with relevant industrial activity. The hybrid estimator smooths the estimate over states with R&D activity, by industry, and accounts for real observed change within a state. Table 5 shows the adjusted state estimates after this estimation methodology was applied.
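The text does not give the formula for the hybrid estimator, so the sketch below is only an illustration of the general idea, not the method NCSES and the Census Bureau actually use. It assumes a single industry, keeps each state's unweighted reported amount, and apportions the rest of a weighted industry total across states in proportion to a measure of relevant industrial activity (a simplified, one-dimensional stand-in for raking). The function name, the inputs, and the numbers are all hypothetical.

```python
def hybrid_state_estimates(reported, weighted_total, activity):
    """Illustrative hybrid estimate per state for one industry.

    reported:       dict of state -> unweighted reported R&D amount
    weighted_total: weighted estimate of the industry total (assumed given)
    activity:       dict of state -> nonnegative measure of relevant activity
    """
    remainder = weighted_total - sum(reported.values())
    if remainder < 0:
        raise ValueError("weighted_total is smaller than the reported amounts")
    total_activity = sum(activity.values())
    if total_activity <= 0:
        raise ValueError("activity must contain a positive total")
    states = sorted(set(reported) | set(activity))
    # Reported amount plus a proportional share of the unreported remainder.
    return {
        s: reported.get(s, 0.0) + remainder * activity.get(s, 0.0) / total_activity
        for s in states
    }


# Hypothetical inputs, in arbitrary units.
estimates = hybrid_state_estimates(
    reported={"CA": 50.0, "WA": 20.0},
    weighted_total=100.0,
    activity={"CA": 2.0, "WA": 1.0, "TX": 1.0},
)
print(estimates)  # {'CA': 65.0, 'TX': 7.5, 'WA': 27.5}
```

By construction the state values sum to the weighted total, and a state with observed reports moves with those reports rather than only with its activity share, which is the property the paragraph above describes.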

The full set of data tables from the 2021 survey will be available at the BERD Survey page. Individual data tables and tables with relative standard errors and imputation rates from the 2021 survey are available from the author in advance of the full release. To minimize reporting burden, survey items are rotated on and off the survey on an odd- and even-numbered year schedule. Statistics on patents, intellectual property, and technology transfer activities were rotated off the survey for 2021. Items rotated on the survey for 2021 include questions on R&D performed by others by type of performer, federal R&D by government agency, and R&D by application area.

The BERD Survey contains confidential data that are protected under Title 13 and Title 26 of the U.S. Code. Restricted microdata can be accessed at the secure Federal Statistical Research Data Centers (FSRDCs) administered by the Census Bureau. FSRDCs are partnerships between federal statistical agencies and leading research institutions. FSRDCs provide secure environments supporting qualified researchers using restricted-access data while protecting respondent confidentiality. Researchers interested in using the microdata can submit a proposal to the Census Bureau, which evaluates proposals based on their benefit to the Census Bureau, scientific merit, feasibility, and risk of disclosure. To learn more about the FSRDCs and how to apply, please visit https://www.census.gov/about/adrm/fsrdc.html .

Suggested Citation

Britt R; National Center for Science and Engineering Statistics (NCSES). 2023. Business R&D Performance in the United States Tops $600 Billion in 2021. NSF 23-350. Alexandria, VA: National Science Foundation. Available at http://ncses.nsf.gov/pubs/nsf23350.

1 NSF has cosponsored an annual business R&D survey since 1953. The Survey of Industrial Research and Development (SIRD) collected data for 1953–2007, and its successor, the Business R&D and Innovation Survey (BRDIS), collected data for 2008–16. Beginning with 2017, the collection of innovation data was moved to the Annual Business Survey (ABS), another survey cosponsored with the Census Bureau, and BRDIS became the Business Research and Development Survey (BRDS). Beginning with 2019, the business R&D data collection reported here was renamed the Business Enterprise Research and Development (BERD) Survey for international comparability.

2 Determining the amount of domestic net sales and operating revenues was left to the reporting company. However, guidance was given to include revenues from foreign operations and subsidiaries and from discontinued operations and to exclude intracompany transfers, returns, allowances, freight charges, and excise, sales, and other revenue-based taxes.

3 Employment statistics in this InfoBrief are headcounts unless they are designated as full-time equivalent (FTE) estimates. R&D employees include researchers (defined as R&D scientists and engineers and their managers) and the technicians, technologists, and support staff members who work on R&D or who provide direct support to R&D activities.

4 The number of persons employed who were assigned full time to R&D plus a prorated number of employees who worked on R&D only part of the time was 1.9 million FTEs, of which 1.3 million FTEs were R&D researchers.

5 Company size classifications changed for 2017 and subsequent years in response to the revised Frascati Manual; see Organisation for Economic Co-operation and Development (OECD). 2015. Frascati Manual: Guidelines for Collecting and Reporting Data on Research and Experimental Development. The Measurement of Scientific, Technological, and Innovation Activities. Paris: OECD Publishing. Available at https://www.oecd-ilibrary.org/science-and-technology/frascati-manual-2015_9789264239012-en. Anderson and Kindlon (2019) provide estimates of R&D performance and employment using these new classifications over 2008–15. The authors also compare the trends to those observed in SIRD for the time prior to 2008. The ABS, also cosponsored by NCSES and the Census Bureau, collects R&D data from companies with fewer than 10 employees for 2017 and beyond. See Anderson G, Kindlon A; NCSES. 2019. Indicators of R&D in Small Businesses: Data from the 2009–15 Business R&D and Innovation Survey. InfoBrief NSF 19-316. Alexandria, VA: National Science Foundation. Available at https://www.nsf.gov/statistics/2019/nsf19316/.

6 In addition to statistics for all states and for all states by industry, below-state level statistics are available in the full set of data tables and in other InfoBriefs; see Shackelford B, Wolfe R; NCSES. 2019. Over Half of U.S. Business R&D Performed in 10 Metropolitan Areas in 2015. InfoBrief NSF 19-322. Alexandria, VA: National Science Foundation. Available at https://www.nsf.gov/statistics/2019/nsf19322/. Also see Shackelford B, Wolfe R; NCSES. 2020. Businesses Performed 60% of Their U.S. R&D in 10 Metropolitan Areas in 2018. InfoBrief NSF 21-331. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/pubs/nsf21331. Information and statistics on U.S. state trends in R&D, science and engineering education, workforce, patents and publications, and knowledge-intensive industries are also available in the Science and Engineering State Indicators data tool at https://ncses.nsf.gov/indicators/states.

7 The Census Bureau reviewed the information in this InfoBrief for unauthorized disclosure of confidential information and approved the disclosure avoidance practices applied (Project No. P-7504682, Disclosure Review Board (DRB) approval number: CBDRB-FY23-0161).

Report Author

Ronda Britt Survey Manager NCSES Tel: (703) 292-7765 E-mail: [email protected]

National Center for Science and Engineering Statistics Directorate for Social, Behavioral and Economic Sciences National Science Foundation 2415 Eisenhower Avenue, Suite W14200 Alexandria, VA 22314 Tel: (703) 292-8780 FIRS: (800) 877-8339 TDD: (800) 281-8749 E-mail: [email protected]

Source Data & Analysis

Data Tables (NSF 23-351)

COMMENTS

  1. Sampling Methods

    Sampling methods are crucial for conducting reliable research. In this article, you will learn about the types, techniques and examples of sampling methods, and how to choose the best one for your study. Scribbr also offers free tools and guides for other aspects of academic writing, such as citation, bibliography, and fallacy.

  2. Sampling Methods In Reseach: Types, Techniques, & Examples

    Sampling methods in psychology refer to strategies used to select a subset of individuals (a sample) from a larger population, to study and draw inferences about the entire population. Common methods include random sampling, stratified sampling, cluster sampling, and convenience sampling. Proper sampling ensures representative, generalizable, and valid research results.

  3. Sampling Methods

    Types of Sampling Methods. Sampling can be broadly categorized into two main categories: Probability Sampling. This type of sampling is based on the principles of random selection, and it involves selecting samples in a way that every member of the population has an equal chance of being included in the sample.

  4. Sampling Methods, Types & Techniques

    Types of sampling. Sampling strategies in research vary widely across different disciplines and research areas, and from study to study. There are two major types of sampling methods: probability and non-probability sampling. Probability sampling, also known as random sampling, is a kind of sample selection where randomization is used instead ...

  5. What are Sampling Methods? Techniques, Types, and Examples

    Understand sampling methods in research, from simple random sampling to stratified, systematic, and cluster sampling. Learn how these sampling techniques boost data accuracy and representation, ensuring robust, reliable results. Check this article to learn about the different sampling method techniques, types and examples.

  6. Sampling Methods

    1. Simple random sampling. In a simple random sample, every member of the population has an equal chance of being selected. Your sampling frame should include the whole population. To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.

  7. What are sampling methods and how do you choose the best one?

    We could choose a sampling method based on whether we want to account for sampling bias; a random sampling method is often preferred over a non-random method for this reason. Random sampling examples include: simple, systematic, stratified, and cluster sampling. Non-random sampling methods are liable to bias, and common examples include ...

  8. Sampling Methods: Guide To All Types with Examples

    Sampling in market action research is of two types - probability sampling and non-probability sampling. Let's take a closer look at these two methods of sampling. Probability sampling: Probability sampling is a sampling technique where a researcher selects a few criteria and chooses members of a population randomly.

  9. Sampling Methods: Different Types in Research

    A sample is the subset of the population that you actually measure, test, or evaluate and base your results. Sampling methods are how you obtain your sample. Before beginning your study, carefully define the population because your results apply to the target population. You can define your population as narrowly as necessary to meet the needs ...

  10. Sampling Methods & Strategies 101 (With Examples)

    Stratified random sampling. Stratified random sampling is similar to simple random sampling, but it kicks things up a notch. As the name suggests, stratified sampling involves selecting participants randomly, but from within certain pre-defined subgroups (i.e., strata) that share a common trait. For example, you might divide the population into strata based on gender, ethnicity, age range or ...

  11. Types of sampling methods

    Cluster sampling- she puts 50 into random groups of 5 so we get 10 groups then randomly selects 5 of them and interviews everyone in those groups --> 25 people are asked. 2. Stratified sampling- she puts 50 into categories: high achieving smart kids, decently achieving kids, mediumly achieving kids, lower poorer achieving kids and clueless ...

  12. Sampling Methods for Research: Types, Uses, and Examples

    Evaluate your goals against time and budget. List the two or three most obvious sampling methods that will work for you. Confirm the availability of your resources (researchers, computer time, etc.) Compare each of the possible methods with your goals, accuracy, precision, resource, time, and cost constraints.

  13. Sampling methods in Clinical Research; an Educational Review

    Sampling types. There are two major categories of sampling methods (figure 1): (1) probability sampling methods where all subjects in the target population have equal chances to be selected in the sample [1, 2] and (2) non-probability sampling methods where the sample population is selected in a non-systematic process that does not guarantee ...

  14. Sampling Methods: Types, Techniques & Best Practices

    Sampling strategies vary widely across different disciplines and research areas, and from study to study. There are two major types of sampling - probability and non-probability sampling. Probability sampling, also known as random sampling, is a kind of sample selection where randomisation is used instead of deliberate choice.

  15. Methodology Series Module 5: Sampling Strategies

    We will try to understand some of these sampling methods that are commonly used in clinical research. There are essentially two types of sampling methods: (1) Probability sampling - based on chance events (such as random numbers, flipping a coin, etc.) and (2) nonprobability sampling - based on researcher's choice, population that ...

  16. Types of Sampling in Research : Journal of the Practice of

    Types of stratified sampling. There are two types: (a) Proportionate stratified random sampling: the sample drawn from each stratum is directly proportional to that stratum's share of the population, i.e., every stratum has the same sampling fraction. (b) Disproportionate stratified random sampling: the sample size is not proportional.

  17. Sampling Methods

    Abstract. Knowledge of sampling methods is essential to design quality research. Critical questions are provided to help researchers choose a sampling method. This article reviews probability and non-probability sampling methods, lists and defines specific sampling techniques, and provides pros and cons for consideration.

  18. Chapter 5. Sampling

    The population is the entire group that you want to draw conclusions about. The sample is the specific group of individuals that you will collect data from. The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population). Sample size is how many individuals (or units ...

  19. (PDF) Types of sampling in research

    Purposive sampling is a type of sampling that selects members of the sample according to the purpose of the study. It is also called judgmental or deliberate sampling (Bhardwaj, 2019).

  20. Sampling

    Sampling is the statistical process of selecting a subset—called a 'sample'—of a population of interest for the purpose of making observations and statistical inferences about that population. Social science research is generally about inferring patterns of behaviours within specific populations. We cannot study entire populations because of feasibility and cost constraints, and hence ...

  21. Types of Sampling Methods in Research: Briefly Explained

    Cluster Random Sampling. This is one of the popular sampling methods, used when the list to select members from is too large. A typical example is when a researcher wants to choose 1000 individuals from the entire population of the U.S., where it is impossible to get a complete list of every individual.

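The cluster sampling example in the list above (50 people, random groups of 5, 5 groups selected, everyone in them interviewed) and the proportionate stratified sampling description (same sampling fraction in every stratum) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the strata labels, and the 20% sampling fraction are assumptions made for the example and do not come from any of the listed sources.

```python
import random


def cluster_sample(population, cluster_size, n_clusters, seed=None):
    """Randomly group the population into clusters, pick whole clusters,
    and return every member of the chosen clusters."""
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)  # random assignment of people to groups
    clusters = [
        shuffled[i:i + cluster_size]
        for i in range(0, len(shuffled), cluster_size)
    ]
    chosen = rng.sample(clusters, n_clusters)  # select clusters, not individuals
    return [person for cluster in chosen for person in cluster]


def proportionate_stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample from each stratum using the same
    sampling fraction, so larger strata contribute more participants."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = rng.sample(list(members), k)
    return sample


# Cluster sampling: 50 people -> 10 groups of 5 -> 5 groups chosen.
people = list(range(50))
interviewed = cluster_sample(people, cluster_size=5, n_clusters=5, seed=0)
print(len(interviewed))  # 25 people are asked

# Proportionate stratified sampling with hypothetical strata sizes.
strata = {
    "high": list(range(0, 10)),     # 10 people
    "middle": list(range(10, 40)),  # 30 people
    "lower": list(range(40, 50)),   # 10 people
}
picked = proportionate_stratified_sample(strata, fraction=0.2, seed=0)
print({name: len(members) for name, members in picked.items()})
```

The design difference is visible in what gets randomly selected: cluster sampling selects whole groups and keeps everyone in them, while stratified sampling selects individuals inside every group.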